xref: /aosp_15_r20/dalvik/docs/verifier.html (revision 055d459012065f78d96b68be8421640240ddf631)
1*055d4590SKeyi Gui<html>
2*055d4590SKeyi Gui<head>
3*055d4590SKeyi Gui<title>Dalvik Bytecode Verifier Notes</title>
4*055d4590SKeyi Gui</head>
5*055d4590SKeyi Gui
6*055d4590SKeyi Gui<body>
7*055d4590SKeyi Gui<h1>Dalvik Bytecode Verifier Notes</h1>
8*055d4590SKeyi Gui
9*055d4590SKeyi Gui<p>
10*055d4590SKeyi GuiThe bytecode verifier in the Dalvik VM attempts to provide the same sorts
11*055d4590SKeyi Guiof checks and guarantees that other popular virtual machines do.  We
12*055d4590SKeyi Guiperform generally the same set of checks as are described in _The Java
13*055d4590SKeyi GuiVirtual Machine Specification, Second Edition_, including the updates
14*055d4590SKeyi Guiplanned for the Third Edition.
15*055d4590SKeyi Gui
16*055d4590SKeyi Gui<p>
17*055d4590SKeyi GuiVerification can be enabled for all classes, disabled for all, or enabled
18*055d4590SKeyi Guionly for "remote" (non-bootstrap) classes.  It should be performed for any
19*055d4590SKeyi Guiclass that will be processed with the DEX optimizer, and in fact the
20*055d4590SKeyi Guidefault VM behavior is to only optimize verified classes.
21*055d4590SKeyi Gui
22*055d4590SKeyi Gui
23*055d4590SKeyi Gui<h2>Why Verify?</h2>
24*055d4590SKeyi Gui
25*055d4590SKeyi Gui<p>
26*055d4590SKeyi GuiThe verification process adds additional time to the build and to
27*055d4590SKeyi Guithe installation of new applications.  It's fairly quick for app-sized
28*055d4590SKeyi GuiDEX files, but rather slow for the big "core" and "framework" files.
29*055d4590SKeyi GuiWhy do it all, when our system relies on UNIX processes for security?
30*055d4590SKeyi Gui<p>
31*055d4590SKeyi Gui<ol>
32*055d4590SKeyi Gui    <li>Optimizations.  The interpreter can ignore a lot of potential
33*055d4590SKeyi Gui    error cases because the verifier guarantees that they are impossible.
34*055d4590SKeyi Gui    Also, we can optimize the DEX file more aggressively if we start
35*055d4590SKeyi Gui    with a stronger set of assumptions about the bytecode.
36*055d4590SKeyi Gui    <li>"Precise" GC.  The work peformed during verification has significant
37*055d4590SKeyi Gui    overlap with the work required to compute register use maps for
38*055d4590SKeyi Gui    type-precise GC.
39*055d4590SKeyi Gui    <li>Intra-application security.  If an app wants to download bits
40*055d4590SKeyi Gui    of interpreted code over the network and execute them, it can safely
41*055d4590SKeyi Gui    do so using well-established security mechanisms.
42*055d4590SKeyi Gui    <li>3rd party app failure analysis.  We have no way to control the
43*055d4590SKeyi Gui    tools and post-processing utilities that external developers employ,
44*055d4590SKeyi Gui    so when we get bug reports with a weird exception or native crash
45*055d4590SKeyi Gui    it's very helpful to start with the assumption that the bytecode
46*055d4590SKeyi Gui    is valid.
47*055d4590SKeyi Gui</ol>
48*055d4590SKeyi Gui<p>
49*055d4590SKeyi GuiIt's also a convenient framework to deal with certain situations, notably
50*055d4590SKeyi Guireplacement of instructions that access volatile 64-bit fields with
51*055d4590SKeyi Guimore rigorous versions that guarantee atomicity.
52*055d4590SKeyi Gui
53*055d4590SKeyi Gui
54*055d4590SKeyi Gui<h2>Verifier Differences</h2>
55*055d4590SKeyi Gui
56*055d4590SKeyi Gui<p>
57*055d4590SKeyi GuiThere are a few checks that the Dalvik bytecode verifier does not perform,
58*055d4590SKeyi Guibecause they're not relevant.  For example:
59*055d4590SKeyi Gui<ul>
60*055d4590SKeyi Gui    <li>Type restrictions on constant pool references are not enforced,
61*055d4590SKeyi Gui    because Dalvik does not have a pool of typed constants.  (Dalvik
62*055d4590SKeyi Gui    uses a simple index into type-specific pools.)
63*055d4590SKeyi Gui    <li>Verification of the operand stack size is not performed, because
64*055d4590SKeyi Gui    Dalvik does not have an operand stack.
65*055d4590SKeyi Gui    <li>Limitations on <code>jsr</code> and <code>ret</code> do not apply,
66*055d4590SKeyi Gui    because Dalvik doesn't support subroutines.
67*055d4590SKeyi Gui</ul>
68*055d4590SKeyi Gui
69*055d4590SKeyi GuiIn some cases they are implemented differently, e.g.:
70*055d4590SKeyi Gui<ul>
71*055d4590SKeyi Gui    <li>In a conventional VM, backward branches and exceptions are
72*055d4590SKeyi Gui    forbidden when a local variable holds an uninitialized reference.  The
73*055d4590SKeyi Gui    restriction was changed to mark registers as invalid when they hold
74*055d4590SKeyi Gui    references to the uninitialized result of a previous invocation of the
75*055d4590SKeyi Gui    same <code>new-instance</code> instruction.
76*055d4590SKeyi Gui    This solves the same problem -- trickery potentially allowing
77*055d4590SKeyi Gui    uninitialized objects to slip past the verifier -- without unduly
78*055d4590SKeyi Gui    limiting branches.
79*055d4590SKeyi Gui</ul>
80*055d4590SKeyi Gui
81*055d4590SKeyi GuiThere are also some new ones, such as:
82*055d4590SKeyi Gui<ul>
83*055d4590SKeyi Gui    <li>The <code>move-exception</code> instruction can only appear as
84*055d4590SKeyi Gui    the first instruction in an exception handler.
85*055d4590SKeyi Gui    <li>The <code>move-result*</code> instructions can only appear
86*055d4590SKeyi Gui    immediately after an appropriate <code>invoke-*</code>
87*055d4590SKeyi Gui    or <code>filled-new-array</code> instruction.
88*055d4590SKeyi Gui</ul>
89*055d4590SKeyi Gui
90*055d4590SKeyi Gui<p>
91*055d4590SKeyi GuiThe VM is permitted but not required to enforce "structured locking"
92*055d4590SKeyi Guiconstraints, which are designed to ensure that, when a method returns, all
93*055d4590SKeyi Guimonitors locked by the method have been unlocked an equal number of times.
94*055d4590SKeyi GuiThis is not currently implemented.
95*055d4590SKeyi Gui
96*055d4590SKeyi Gui<p>
97*055d4590SKeyi GuiThe Dalvik verifier is more restrictive than other VMs in one area:
98*055d4590SKeyi Guitype safety on sub-32-bit integer widths.  These additional restrictions
99*055d4590SKeyi Guishould make it impossible to, say, pass a value outside the range
100*055d4590SKeyi Gui[-128, 127] to a function that takes a <code>byte</code> as an argument.
101*055d4590SKeyi Gui
102*055d4590SKeyi Gui
103*055d4590SKeyi Gui<h2>Monitor Verification</h2>
104*055d4590SKeyi Gui
105*055d4590SKeyi Gui<p>
106*055d4590SKeyi GuiIf a method locks an object with a <code>synchronized</code> statement, the
107*055d4590SKeyi Guiobject must be unlocked before the method returns.  At the bytecode level,
108*055d4590SKeyi Guithis means the method must execute a matching <code>monitor-exit</code>
109*055d4590SKeyi Guifor every <code>monitor-enter</code> instruction, whether the function
110*055d4590SKeyi Guicompletes normally or abnormally.  The bytecode verifier optionally
111*055d4590SKeyi Guienforces this.
112*055d4590SKeyi Gui
113*055d4590SKeyi Gui<p>
114*055d4590SKeyi GuiThe verifier uses a fairly simple-minded model.  If you enter a monitor
115*055d4590SKeyi Guiheld in register N, you can exit the monitor using register N or any
116*055d4590SKeyi Guisubsequently-made copies of register N.  The verifier does not attempt
117*055d4590SKeyi Guito identify previously-made copies, track loads and stores through
118*055d4590SKeyi Guifields, or recognize identical constant values (for example, the result
119*055d4590SKeyi Guivalues from two <code>const-class</code> instructions on the same class
120*055d4590SKeyi Guiwill be the same reference, but the verifier doesn't recognize this).
121*055d4590SKeyi Gui
122*055d4590SKeyi Gui<p>
123*055d4590SKeyi GuiFurther, you may only exit the monitor most recently entered.  "Hand
124*055d4590SKeyi Guiover hand" locking techniques, e.g. "lock A; lock B; unlock A; unlock B",
125*055d4590SKeyi Guiare not allowed.
126*055d4590SKeyi Gui
127*055d4590SKeyi Gui<p>
128*055d4590SKeyi GuiThis means that there are a number of situations in which the verifier
129*055d4590SKeyi Guiwill throw an exception on code that would execute correctly at run time.
130*055d4590SKeyi GuiThis is not expected to be an issue for compiler-generated bytecode.
131*055d4590SKeyi Gui
132*055d4590SKeyi Gui<p>
133*055d4590SKeyi GuiFor implementation convenience, the maximum nesting depth of
134*055d4590SKeyi Gui<code>synchronized</code> statements has been set to 32.  This is not
135*055d4590SKeyi Guia limitation on the recursion count.  The only way to trip this would be
136*055d4590SKeyi Guito have a single method with more than 32 nested <code>synchronized</code>
137*055d4590SKeyi Guistatements, something that is unlikely to occur.
138*055d4590SKeyi Gui
139*055d4590SKeyi Gui
140*055d4590SKeyi Gui<h2>Verification Failures</h2>
141*055d4590SKeyi Gui
142*055d4590SKeyi Gui<p>
143*055d4590SKeyi GuiThe verifier may reject a class immediately, or it may defer throwing
144*055d4590SKeyi Guian exception until the code is actually used.  For example, if a class
145*055d4590SKeyi Guiattempts to perform an illegal access on a field, the VM should throw
146*055d4590SKeyi Guian IllegalAccessError the first time the instruction is encountered.
147*055d4590SKeyi GuiOn the other hand, if a class contains an invalid bytecode, it should be
148*055d4590SKeyi Guirejected immediately with a VerifyError.
149*055d4590SKeyi Gui
150*055d4590SKeyi Gui<p>
151*055d4590SKeyi GuiImmediate VerifyErrors are accompanied by detailed, if somewhat cryptic,
152*055d4590SKeyi Guiinformation in the log file.  From this it's possible to determine the
153*055d4590SKeyi Guiexact instruction that failed, and the reason for the failure.
154*055d4590SKeyi Gui
155*055d4590SKeyi Gui<p>
156*055d4590SKeyi GuiIt's a bit tricky to implement deferred verification errors in Dalvik.
157*055d4590SKeyi GuiA few approaches were considered:
158*055d4590SKeyi Gui
159*055d4590SKeyi Gui<ol>
160*055d4590SKeyi Gui<li>We could replace the invalid field access instruction with a special
161*055d4590SKeyi Guiinstruction that generates an illegal access error, and allow class
162*055d4590SKeyi Guiverification to complete successfully.  This type of verification must
163*055d4590SKeyi Guibe deferred to first class load, rather than be performed ahead of time
164*055d4590SKeyi Guiduring DEX optimization, because some failures will depend on the current
165*055d4590SKeyi Guiexecution environment (e.g. not all classes are available at dexopt time).
166*055d4590SKeyi GuiAt that point the bytecode instructions are mapped read-only during
167*055d4590SKeyi Guiverification, so rewriting them isn't possible.
168*055d4590SKeyi Gui</li>
169*055d4590SKeyi Gui
170*055d4590SKeyi Gui<li>We can perform the access checks when the field/method/class is
171*055d4590SKeyi Guiresolved.  In a typical VM implementation we would do the check when the
172*055d4590SKeyi Guientry is resolved in the context of the current classfile, but our DEX
173*055d4590SKeyi Guifiles combine multiple classfiles together, merging the field/method/class
174*055d4590SKeyi Guiresolution results into a single large table.  Once one class successfully
175*055d4590SKeyi Guiresolves the field, every other class in the same DEX file would be able
176*055d4590SKeyi Guito access the field.  This is incorrect.
177*055d4590SKeyi Gui</li>
178*055d4590SKeyi Gui
179*055d4590SKeyi Gui<li>Perform the access checks on every field/method/class access.
180*055d4590SKeyi GuiThis adds significant overhead.  This is mitigated somewhat by the DEX
181*055d4590SKeyi Guioptimizer, which will convert many field/method/class accesses into a
182*055d4590SKeyi Guisimpler form after performing the access check.  However, not all accesses
183*055d4590SKeyi Guican be optimized (e.g. accesses to classes unknown at dexopt time),
184*055d4590SKeyi Guiand we don't currently have an optimized form of certain instructions
185*055d4590SKeyi Gui(notably static field operations).
186*055d4590SKeyi Gui</li>
187*055d4590SKeyi Gui</ol>
188*055d4590SKeyi Gui
189*055d4590SKeyi Gui<p>
190*055d4590SKeyi GuiIn early versions of Dalvik (as found in Android 1.6 and earlier), the verifier
191*055d4590SKeyi Guisimply regarded all problems as immediately fatal.  This generally worked,
192*055d4590SKeyi Guibut in some cases the VM was rejecting classes because of bits of code
193*055d4590SKeyi Guithat were never used.  The VerifyError itself was sometimes difficult to
194*055d4590SKeyi Guidecipher, because it was thrown during verification rather than at the
195*055d4590SKeyi Guipoint where the problem was first noticed during execution.
196*055d4590SKeyi Gui<p>
197*055d4590SKeyi GuiThe current version uses a variation of approach #1.  The dexopt
198*055d4590SKeyi Guicommand works the way it did before, leaving the code untouched and
199*055d4590SKeyi Guiflagging fully-correct classes as "pre-verified".  When the VM loads a
200*055d4590SKeyi Guiclass that didn't pass pre-verification, the verifier is invoked.  If a
201*055d4590SKeyi Gui"deferrable" problem is detected, a modifiable copy of the instructions
202*055d4590SKeyi Guiin the problematic method is made.  In that copy, the troubled instruction
203*055d4590SKeyi Guiis replaced with an "always throw" opcode, and verification continues.
204*055d4590SKeyi Gui
205*055d4590SKeyi Gui<p>
206*055d4590SKeyi GuiIn the example used earlier, an attempt to read from an inaccessible
207*055d4590SKeyi Guifield would result in the "field get" instruction being replaced by
208*055d4590SKeyi Gui"always throw IllegalAccessError on field X".  Creating copies of method
209*055d4590SKeyi Guibodies requires additional heap space, but since this affects very few
210*055d4590SKeyi Guimethods overall the memory impact should be minor.
211*055d4590SKeyi Gui
212*055d4590SKeyi Gui<p>
213*055d4590SKeyi Gui<address>Copyright &copy; 2008 The Android Open Source Project</address>
214*055d4590SKeyi Gui
215*055d4590SKeyi Gui</body>
216*055d4590SKeyi Gui</html>
217