1*055d4590SKeyi Gui<html> 2*055d4590SKeyi Gui<head> 3*055d4590SKeyi Gui<title>Dalvik Bytecode Verifier Notes</title> 4*055d4590SKeyi Gui</head> 5*055d4590SKeyi Gui 6*055d4590SKeyi Gui<body> 7*055d4590SKeyi Gui<h1>Dalvik Bytecode Verifier Notes</h1> 8*055d4590SKeyi Gui 9*055d4590SKeyi Gui<p> 10*055d4590SKeyi GuiThe bytecode verifier in the Dalvik VM attempts to provide the same sorts 11*055d4590SKeyi Guiof checks and guarantees that other popular virtual machines do. We 12*055d4590SKeyi Guiperform generally the same set of checks as are described in _The Java 13*055d4590SKeyi GuiVirtual Machine Specification, Second Edition_, including the updates 14*055d4590SKeyi Guiplanned for the Third Edition. 15*055d4590SKeyi Gui 16*055d4590SKeyi Gui<p> 17*055d4590SKeyi GuiVerification can be enabled for all classes, disabled for all, or enabled 18*055d4590SKeyi Guionly for "remote" (non-bootstrap) classes. It should be performed for any 19*055d4590SKeyi Guiclass that will be processed with the DEX optimizer, and in fact the 20*055d4590SKeyi Guidefault VM behavior is to only optimize verified classes. 21*055d4590SKeyi Gui 22*055d4590SKeyi Gui 23*055d4590SKeyi Gui<h2>Why Verify?</h2> 24*055d4590SKeyi Gui 25*055d4590SKeyi Gui<p> 26*055d4590SKeyi GuiThe verification process adds additional time to the build and to 27*055d4590SKeyi Guithe installation of new applications. It's fairly quick for app-sized 28*055d4590SKeyi GuiDEX files, but rather slow for the big "core" and "framework" files. 29*055d4590SKeyi GuiWhy do it all, when our system relies on UNIX processes for security? 30*055d4590SKeyi Gui<p> 31*055d4590SKeyi Gui<ol> 32*055d4590SKeyi Gui <li>Optimizations. The interpreter can ignore a lot of potential 33*055d4590SKeyi Gui error cases because the verifier guarantees that they are impossible. 34*055d4590SKeyi Gui Also, we can optimize the DEX file more aggressively if we start 35*055d4590SKeyi Gui with a stronger set of assumptions about the bytecode. 36*055d4590SKeyi Gui <li>"Precise" GC. The work peformed during verification has significant 37*055d4590SKeyi Gui overlap with the work required to compute register use maps for 38*055d4590SKeyi Gui type-precise GC. 39*055d4590SKeyi Gui <li>Intra-application security. If an app wants to download bits 40*055d4590SKeyi Gui of interpreted code over the network and execute them, it can safely 41*055d4590SKeyi Gui do so using well-established security mechanisms. 42*055d4590SKeyi Gui <li>3rd party app failure analysis. We have no way to control the 43*055d4590SKeyi Gui tools and post-processing utilities that external developers employ, 44*055d4590SKeyi Gui so when we get bug reports with a weird exception or native crash 45*055d4590SKeyi Gui it's very helpful to start with the assumption that the bytecode 46*055d4590SKeyi Gui is valid. 47*055d4590SKeyi Gui</ol> 48*055d4590SKeyi Gui<p> 49*055d4590SKeyi GuiIt's also a convenient framework to deal with certain situations, notably 50*055d4590SKeyi Guireplacement of instructions that access volatile 64-bit fields with 51*055d4590SKeyi Guimore rigorous versions that guarantee atomicity. 52*055d4590SKeyi Gui 53*055d4590SKeyi Gui 54*055d4590SKeyi Gui<h2>Verifier Differences</h2> 55*055d4590SKeyi Gui 56*055d4590SKeyi Gui<p> 57*055d4590SKeyi GuiThere are a few checks that the Dalvik bytecode verifier does not perform, 58*055d4590SKeyi Guibecause they're not relevant. For example: 59*055d4590SKeyi Gui<ul> 60*055d4590SKeyi Gui <li>Type restrictions on constant pool references are not enforced, 61*055d4590SKeyi Gui because Dalvik does not have a pool of typed constants. (Dalvik 62*055d4590SKeyi Gui uses a simple index into type-specific pools.) 63*055d4590SKeyi Gui <li>Verification of the operand stack size is not performed, because 64*055d4590SKeyi Gui Dalvik does not have an operand stack. 65*055d4590SKeyi Gui <li>Limitations on <code>jsr</code> and <code>ret</code> do not apply, 66*055d4590SKeyi Gui because Dalvik doesn't support subroutines. 67*055d4590SKeyi Gui</ul> 68*055d4590SKeyi Gui 69*055d4590SKeyi GuiIn some cases they are implemented differently, e.g.: 70*055d4590SKeyi Gui<ul> 71*055d4590SKeyi Gui <li>In a conventional VM, backward branches and exceptions are 72*055d4590SKeyi Gui forbidden when a local variable holds an uninitialized reference. The 73*055d4590SKeyi Gui restriction was changed to mark registers as invalid when they hold 74*055d4590SKeyi Gui references to the uninitialized result of a previous invocation of the 75*055d4590SKeyi Gui same <code>new-instance</code> instruction. 76*055d4590SKeyi Gui This solves the same problem -- trickery potentially allowing 77*055d4590SKeyi Gui uninitialized objects to slip past the verifier -- without unduly 78*055d4590SKeyi Gui limiting branches. 79*055d4590SKeyi Gui</ul> 80*055d4590SKeyi Gui 81*055d4590SKeyi GuiThere are also some new ones, such as: 82*055d4590SKeyi Gui<ul> 83*055d4590SKeyi Gui <li>The <code>move-exception</code> instruction can only appear as 84*055d4590SKeyi Gui the first instruction in an exception handler. 85*055d4590SKeyi Gui <li>The <code>move-result*</code> instructions can only appear 86*055d4590SKeyi Gui immediately after an appropriate <code>invoke-*</code> 87*055d4590SKeyi Gui or <code>filled-new-array</code> instruction. 88*055d4590SKeyi Gui</ul> 89*055d4590SKeyi Gui 90*055d4590SKeyi Gui<p> 91*055d4590SKeyi GuiThe VM is permitted but not required to enforce "structured locking" 92*055d4590SKeyi Guiconstraints, which are designed to ensure that, when a method returns, all 93*055d4590SKeyi Guimonitors locked by the method have been unlocked an equal number of times. 94*055d4590SKeyi GuiThis is not currently implemented. 95*055d4590SKeyi Gui 96*055d4590SKeyi Gui<p> 97*055d4590SKeyi GuiThe Dalvik verifier is more restrictive than other VMs in one area: 98*055d4590SKeyi Guitype safety on sub-32-bit integer widths. These additional restrictions 99*055d4590SKeyi Guishould make it impossible to, say, pass a value outside the range 100*055d4590SKeyi Gui[-128, 127] to a function that takes a <code>byte</code> as an argument. 101*055d4590SKeyi Gui 102*055d4590SKeyi Gui 103*055d4590SKeyi Gui<h2>Monitor Verification</h2> 104*055d4590SKeyi Gui 105*055d4590SKeyi Gui<p> 106*055d4590SKeyi GuiIf a method locks an object with a <code>synchronized</code> statement, the 107*055d4590SKeyi Guiobject must be unlocked before the method returns. At the bytecode level, 108*055d4590SKeyi Guithis means the method must execute a matching <code>monitor-exit</code> 109*055d4590SKeyi Guifor every <code>monitor-enter</code> instruction, whether the function 110*055d4590SKeyi Guicompletes normally or abnormally. The bytecode verifier optionally 111*055d4590SKeyi Guienforces this. 112*055d4590SKeyi Gui 113*055d4590SKeyi Gui<p> 114*055d4590SKeyi GuiThe verifier uses a fairly simple-minded model. If you enter a monitor 115*055d4590SKeyi Guiheld in register N, you can exit the monitor using register N or any 116*055d4590SKeyi Guisubsequently-made copies of register N. The verifier does not attempt 117*055d4590SKeyi Guito identify previously-made copies, track loads and stores through 118*055d4590SKeyi Guifields, or recognize identical constant values (for example, the result 119*055d4590SKeyi Guivalues from two <code>const-class</code> instructions on the same class 120*055d4590SKeyi Guiwill be the same reference, but the verifier doesn't recognize this). 121*055d4590SKeyi Gui 122*055d4590SKeyi Gui<p> 123*055d4590SKeyi GuiFurther, you may only exit the monitor most recently entered. "Hand 124*055d4590SKeyi Guiover hand" locking techniques, e.g. "lock A; lock B; unlock A; unlock B", 125*055d4590SKeyi Guiare not allowed. 126*055d4590SKeyi Gui 127*055d4590SKeyi Gui<p> 128*055d4590SKeyi GuiThis means that there are a number of situations in which the verifier 129*055d4590SKeyi Guiwill throw an exception on code that would execute correctly at run time. 130*055d4590SKeyi GuiThis is not expected to be an issue for compiler-generated bytecode. 131*055d4590SKeyi Gui 132*055d4590SKeyi Gui<p> 133*055d4590SKeyi GuiFor implementation convenience, the maximum nesting depth of 134*055d4590SKeyi Gui<code>synchronized</code> statements has been set to 32. This is not 135*055d4590SKeyi Guia limitation on the recursion count. The only way to trip this would be 136*055d4590SKeyi Guito have a single method with more than 32 nested <code>synchronized</code> 137*055d4590SKeyi Guistatements, something that is unlikely to occur. 138*055d4590SKeyi Gui 139*055d4590SKeyi Gui 140*055d4590SKeyi Gui<h2>Verification Failures</h2> 141*055d4590SKeyi Gui 142*055d4590SKeyi Gui<p> 143*055d4590SKeyi GuiThe verifier may reject a class immediately, or it may defer throwing 144*055d4590SKeyi Guian exception until the code is actually used. For example, if a class 145*055d4590SKeyi Guiattempts to perform an illegal access on a field, the VM should throw 146*055d4590SKeyi Guian IllegalAccessError the first time the instruction is encountered. 147*055d4590SKeyi GuiOn the other hand, if a class contains an invalid bytecode, it should be 148*055d4590SKeyi Guirejected immediately with a VerifyError. 149*055d4590SKeyi Gui 150*055d4590SKeyi Gui<p> 151*055d4590SKeyi GuiImmediate VerifyErrors are accompanied by detailed, if somewhat cryptic, 152*055d4590SKeyi Guiinformation in the log file. From this it's possible to determine the 153*055d4590SKeyi Guiexact instruction that failed, and the reason for the failure. 154*055d4590SKeyi Gui 155*055d4590SKeyi Gui<p> 156*055d4590SKeyi GuiIt's a bit tricky to implement deferred verification errors in Dalvik. 157*055d4590SKeyi GuiA few approaches were considered: 158*055d4590SKeyi Gui 159*055d4590SKeyi Gui<ol> 160*055d4590SKeyi Gui<li>We could replace the invalid field access instruction with a special 161*055d4590SKeyi Guiinstruction that generates an illegal access error, and allow class 162*055d4590SKeyi Guiverification to complete successfully. This type of verification must 163*055d4590SKeyi Guibe deferred to first class load, rather than be performed ahead of time 164*055d4590SKeyi Guiduring DEX optimization, because some failures will depend on the current 165*055d4590SKeyi Guiexecution environment (e.g. not all classes are available at dexopt time). 166*055d4590SKeyi GuiAt that point the bytecode instructions are mapped read-only during 167*055d4590SKeyi Guiverification, so rewriting them isn't possible. 168*055d4590SKeyi Gui</li> 169*055d4590SKeyi Gui 170*055d4590SKeyi Gui<li>We can perform the access checks when the field/method/class is 171*055d4590SKeyi Guiresolved. In a typical VM implementation we would do the check when the 172*055d4590SKeyi Guientry is resolved in the context of the current classfile, but our DEX 173*055d4590SKeyi Guifiles combine multiple classfiles together, merging the field/method/class 174*055d4590SKeyi Guiresolution results into a single large table. Once one class successfully 175*055d4590SKeyi Guiresolves the field, every other class in the same DEX file would be able 176*055d4590SKeyi Guito access the field. This is incorrect. 177*055d4590SKeyi Gui</li> 178*055d4590SKeyi Gui 179*055d4590SKeyi Gui<li>Perform the access checks on every field/method/class access. 180*055d4590SKeyi GuiThis adds significant overhead. This is mitigated somewhat by the DEX 181*055d4590SKeyi Guioptimizer, which will convert many field/method/class accesses into a 182*055d4590SKeyi Guisimpler form after performing the access check. However, not all accesses 183*055d4590SKeyi Guican be optimized (e.g. accesses to classes unknown at dexopt time), 184*055d4590SKeyi Guiand we don't currently have an optimized form of certain instructions 185*055d4590SKeyi Gui(notably static field operations). 186*055d4590SKeyi Gui</li> 187*055d4590SKeyi Gui</ol> 188*055d4590SKeyi Gui 189*055d4590SKeyi Gui<p> 190*055d4590SKeyi GuiIn early versions of Dalvik (as found in Android 1.6 and earlier), the verifier 191*055d4590SKeyi Guisimply regarded all problems as immediately fatal. This generally worked, 192*055d4590SKeyi Guibut in some cases the VM was rejecting classes because of bits of code 193*055d4590SKeyi Guithat were never used. The VerifyError itself was sometimes difficult to 194*055d4590SKeyi Guidecipher, because it was thrown during verification rather than at the 195*055d4590SKeyi Guipoint where the problem was first noticed during execution. 196*055d4590SKeyi Gui<p> 197*055d4590SKeyi GuiThe current version uses a variation of approach #1. The dexopt 198*055d4590SKeyi Guicommand works the way it did before, leaving the code untouched and 199*055d4590SKeyi Guiflagging fully-correct classes as "pre-verified". When the VM loads a 200*055d4590SKeyi Guiclass that didn't pass pre-verification, the verifier is invoked. If a 201*055d4590SKeyi Gui"deferrable" problem is detected, a modifiable copy of the instructions 202*055d4590SKeyi Guiin the problematic method is made. In that copy, the troubled instruction 203*055d4590SKeyi Guiis replaced with an "always throw" opcode, and verification continues. 204*055d4590SKeyi Gui 205*055d4590SKeyi Gui<p> 206*055d4590SKeyi GuiIn the example used earlier, an attempt to read from an inaccessible 207*055d4590SKeyi Guifield would result in the "field get" instruction being replaced by 208*055d4590SKeyi Gui"always throw IllegalAccessError on field X". Creating copies of method 209*055d4590SKeyi Guibodies requires additional heap space, but since this affects very few 210*055d4590SKeyi Guimethods overall the memory impact should be minor. 211*055d4590SKeyi Gui 212*055d4590SKeyi Gui<p> 213*055d4590SKeyi Gui<address>Copyright © 2008 The Android Open Source Project</address> 214*055d4590SKeyi Gui 215*055d4590SKeyi Gui</body> 216*055d4590SKeyi Gui</html> 217