The following changes (change numbers refer to perforce) were
made from version 3.1.1 to 3.1.2

Runtime
-------

Change 5641 on 2009/02/20 by [email protected]

    Release version 3.1.2 of the ANTLR C runtime.

    Updated documents and release notes will have to follow later.

Change 5639 on 2009/02/20 by [email protected]

    Fixed: ANTLR-356

    Ensure that code generation for C++ does not require casts

Change 5577 on 2009/02/12 by [email protected]

    C Runtime - Bug fixes.

    o Having moved to use an extract directly from a vector for returning
      tokens, it exposed a bug whereby the EOF boundary calculation in
      tokLT was incorrectly checking > rather than >=.
    o Changing to API initialization of tokens rather than memcmp()
      incorrectly forgot to set the input stream pointer for the
      manufactured tokens in the token factory;
    o Rewrite streams for rewriting tree parsers did not check whether the
      rewrite stream was ever assigned before trying to free it; it is now
      in line with the ordinary parser code.
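
The first fix above is an off-by-one at the end of the token buffer. A minimal sketch of the corrected bounds check follows; the names `token_t`, `EOF_TYPE` and `token_lookahead` are hypothetical and are not the runtime's actual tokLT code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical token type and EOF marker; not the runtime's real types. */
typedef struct { int type; } token_t;
#define EOF_TYPE (-1)

static token_t eof_token = { EOF_TYPE };

/* Return the k-th lookahead token (k >= 1) relative to cursor p in a
 * buffer holding `count` tokens. Valid indexes are 0..count-1, so an
 * index equal to count is already past the end: the test must be >=.
 * Testing with > would read one element beyond the buffer. */
token_t *token_lookahead(token_t *tokens, size_t count, size_t p, size_t k)
{
    size_t i;

    assert(k >= 1);
    i = p + k - 1;
    if (i >= count) {
        return &eof_token;
    }
    return &tokens[i];
}
```

With a two-token buffer, lookahead 1 and 2 from position 0 return the stored tokens, and lookahead 3 returns the EOF token instead of reading past the end.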

Change 5576 on 2009/02/11 by [email protected]

    C Runtime: Ensure that when we manufacture a new token for a missing
    token, the user-supplied custom information (if any) is copied
    from the current token.

Change 5575 on 2009/02/08 by [email protected]

    C Runtime - Vastly improve the reuse of allocated memory for nodes in
    tree rewriting.

    A problem for all targets at the moment is that the rewrite logic
    generated by ANTLR makes no attempt to reuse any resources; it merely
    guarantees that the tree shape at the end is correct. To some extent
    this is mitigated by the garbage collection systems of Java and .Net,
    even though it is still an overhead to keep creating so many nodes.

    This change implements the first of two C runtime changes that make
    best efforts to track when a node has become orphaned and will never
    be reused, based on inherent knowledge of the rewrite logic (which in
    the long term is not a great solution).

    Much of the rewrite logic consists of creating a nilNode into which
    child nodes are appended.
    At rulePost processing time, when a rewrite stream is closed, and
    when becomeRoot is called, there are many situations in which the
    root of the tree that will be manipulated, or is finished with (in
    the case of rewrite streams), is a nilNode that was just a temporary
    creation for the sake of the rewrite itself.

    In these cases we can see that the nilNode would just be left to rot in
    the node factory that tracks all the tree nodes.
    Rather than leave these in the factory to rot, we now keep a reuse
    stack and always reuse any node on this stack before claiming a new
    node from the factory pool.

    This single change alone reduces memory usage in the test case (a 20,604
    line C program and a GNU C parser) from nearly a GB to 276MB. This is
    still way more memory than we should need to do this operation, even on
    such a large input file, but the reduction results in a huge performance
    increase and greatly reduced system time spent on allocations.
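
The reuse stack described above can be sketched as follows. This is only an illustration under stated assumptions: `node_t`, `node_factory_t`, `factory_new_node` and `factory_recycle` are hypothetical names, and the real runtime claims nodes from a pooled factory rather than calling malloc per node.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical node type; the runtime's real tree node is far richer. */
typedef struct node {
    struct node *next_free;  /* link used only while on the reuse stack */
    int          payload;
} node_t;

typedef struct {
    node_t *reuse_top;   /* stack of orphaned nodes that may be reused  */
    size_t  allocated;   /* number of nodes claimed from the allocator  */
} node_factory_t;

/* Claim a node: prefer the reuse stack, fall back to a fresh allocation. */
node_t *factory_new_node(node_factory_t *f)
{
    node_t *n = f->reuse_top;

    if (n != NULL) {
        f->reuse_top = n->next_free;      /* pop */
    } else {
        n = malloc(sizeof *n);
        if (n == NULL) {
            return NULL;
        }
        f->allocated++;
    }
    n->next_free = NULL;
    n->payload   = 0;                     /* never inherit stale contents */
    return n;
}

/* Hand back a node known to be orphaned, such as a temporary nil node. */
void factory_recycle(node_factory_t *f, node_t *n)
{
    assert(n != NULL);
    n->next_free = f->reuse_top;          /* push */
    f->reuse_top = n;
}
```

The point of the design is that a recycled node is handed out again before any new memory is claimed, so temporary nil nodes stop accumulating in the factory.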

    After this optimization, comparison with gcc yields:

        time gcc -S a.c
        a.c:1026: warning: conflicting types for built-in function ‘vsprintf’
        a.c:1030: warning: conflicting types for built-in function ‘vsnprintf’
        a.c:1041: warning: conflicting types for built-in function ‘vsscanf’
        0.21user 0.01system 0:00.22elapsed 97%CPU (0avgtext+0avgdata 0maxresident)k
        0inputs+240outputs (0major+8345minor)pagefaults 0swaps

    and

        time ./jimi
        Reading a.c
        0.28user 0.11system 0:00.39elapsed 98%CPU (0avgtext+0avgdata 0maxresident)k
        0inputs+0outputs (0major+66609minor)pagefaults 0swaps

    From this we can infer that the only major difference is now the huge
    disparity in memory allocations. A future optimization of vector
    pooling, to separate node reuse from vector reuse, currently looks
    promising for further reuse of memory.

    Finally, a static analysis of the rewrite code, plus a realtime analysis
    of the heap at runtime, may well give us a reasonable memory usage
    pattern. In reality though, it is the generated rewrite logic
    that must become optimal at not continuously rewriting things that it
    need not, as it ascends the rule chain.

Change 5563 on 2009/01/28 by [email protected]

    Allow rewrite streams to use the base adaptor's vector factory and not
    try to malloc new vectors themselves.

Change 5562 on 2009/01/28 by [email protected]

    Don't use CALLOC to allocate tree pools, use malloc as there is no need
    for calloc.

Change 5561 on 2009/01/28 by [email protected]

    Prevent warnings about retval.stop not being initialized when a rule
    returns early because it is in backtracking mode

Change 5558 on 2009/01/28 by [email protected]

    Lots of optimizations (though the next one to be checked in is the huge
    win) for AST building and vector factories.

    A large part of tree rewriting was the creation of vectors to hold AST
    nodes. Although I had created a vector factory, for some reason I never
    got around to creating a proper one that pre-allocated the vectors in
    chunks and so on. I guess I just forgot to. Hence a big win here is
    prevention of calling malloc lots and lots of times to create vectors.
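
The chunked pre-allocation described above might look roughly like this sketch. The chunk size of 256 and the names `vec_t`, `vec_factory_t`, `vec_factory_new` and `vec_factory_free` are assumptions made for illustration; they are not the runtime's actual vector factory.

```c
#include <assert.h>
#include <stdlib.h>

#define POOL_CHUNK 256   /* vectors per allocation; an arbitrary choice */

/* Hypothetical, stripped-down vector header. */
typedef struct { void **items; size_t count; } vec_t;

typedef struct {
    vec_t  **chunks;        /* one entry per allocated chunk            */
    size_t   num_chunks;
    size_t   next;          /* next unused slot in the newest chunk     */
    size_t   chunk_allocs;  /* how many chunk allocations have occurred */
} vec_factory_t;

/* Hand out a vector header, allocating a whole chunk only when needed. */
vec_t *vec_factory_new(vec_factory_t *f)
{
    vec_t *v;

    if (f->num_chunks == 0 || f->next == POOL_CHUNK) {
        vec_t **grown = realloc(f->chunks, (f->num_chunks + 1) * sizeof *grown);
        vec_t  *chunk;

        if (grown == NULL) {
            return NULL;
        }
        f->chunks = grown;
        chunk = malloc(POOL_CHUNK * sizeof *chunk);
        if (chunk == NULL) {
            return NULL;
        }
        f->chunks[f->num_chunks++] = chunk;
        f->next = 0;
        f->chunk_allocs++;
    }
    v = &f->chunks[f->num_chunks - 1][f->next++];
    v->items = NULL;
    v->count = 0;
    return v;
}

/* Release every chunk at once when the factory is closed. */
void vec_factory_free(vec_factory_t *f)
{
    size_t i;

    for (i = 0; i < f->num_chunks; i++) {
        free(f->chunks[i]);
    }
    free(f->chunks);
    f->chunks = NULL;
    f->num_chunks = 0;
    f->next = 0;
}
```

Under these assumptions, creating 1000 vectors costs four chunk allocations rather than 1000 individual ones, and all of them are released together when the factory is freed.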

    A second improvement was to change the vector definition such that it
    holds a certain number of elements within the vector structure itself,
    rather than mallocing and freeing these. Currently this is set to 8, but
    may increase. For AST construction, this is generally a big win because
    AST nodes don't often have many individual children unless there has not
    been any shaping going on in the parser. But if you are not shaping, then
    you don't really need a tree.

    Other performance improvements here include not calling functions
    indirectly within token stream and common token stream. Hence tokens are
    claimed directly from the vectors. Users can override these functions of
    course, and all this means is that if you override tokenstreams then you
    pretty much have to provide all the methods, but then I think you would
    have to anyway (and I don't know of anyone that has wanted to do this as
    you can carry your own structure around with the tokens anyway and that
    is much easier).

Change 5555 on 2009/01/26 by [email protected]

    Fixed: ANTLR-288
    Correct the interpretation of the skip token such that channel, start
    index, char pos in line, start line and text are correctly reset to the
    start of the new token when the one that we just traversed was marked as
    being skipped.

    This correctly excludes the text that was matched as part of the
    SKIP()ed token from the next token in the token stream and so has the
    side effect that asking for $text of a rule no longer includes the text
    that should be skipped, but DOES include the text of tokens that were
    merely placed off the default channel.

Change 5551 on 2009/01/25 by [email protected]

    Fixed: ANTLR-287
    Most of the source files did not include the BSD license. This might
    not be that big a deal given that I don't care what people do with it
    other than take my name off it, but having the license reproduced
    everywhere at least makes things perfectly clear. Hence this mass
    change of sources and templates to include the license.

Change 5550 on 2009/01/25 by [email protected]

    Fixed: ANTLR-365
    Ensure that, as soon as we know about an input stream on the lexer,
    we borrow its string factory and use it in our EOF token in case
    anyone tries to make it a string, such as in error messages.

Change 5548 on 2009/01/25 by [email protected]

    Fixed: ANTLR-363
    At some point the Java runtime default changed from discarding
    off-channel tokens to preserving them. The fix is to make the C runtime
    also default to preserving off-channel tokens.

Change 5544 on 2009/01/24 by [email protected]

    Fixed: ANTLR-360
    Ensure that the fillBuffer function does not call any methods
    that require the cached buffer size to be recorded before we
    have actually recorded it.

Change 5543 on 2009/01/24 by [email protected]

    Fixed: ANTLR-362
    Some users have started using string factories themselves and
    exposed a flaw in the destroy method, which is intended to remove
    a string that was created by the factory and is no longer needed.
    The string was correctly removed from the vector that tracks them
    but, after the first removal, all the remaining strings were numbered
    incorrectly. Hence the destroy method has been recoded to reindex
    the strings in the factory after one is removed and everything is once
    more hunky dory.
    User suggested fix rejected.
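
The reindexing described in change 5543 can be illustrated with the sketch below. It assumes each string records its own position in the tracking vector; the fixed-size array and the names `fstring_t`, `string_factory_t`, `factory_add` and `factory_destroy` are hypothetical simplifications, not the runtime's string factory API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_STRINGS 16   /* fixed capacity keeps the sketch short */

/* Hypothetical string that remembers its slot in the factory. */
typedef struct { size_t index; const char *chars; } fstring_t;

typedef struct {
    fstring_t *strings[MAX_STRINGS];
    size_t     count;
} string_factory_t;

/* Track a string and record where it lives. */
void factory_add(string_factory_t *f, fstring_t *s)
{
    assert(f->count < MAX_STRINGS);
    s->index = f->count;
    f->strings[f->count++] = s;
}

/* Remove a string, close the gap, then renumber everything after it.
 * Without the renumbering loop, later strings keep stale indexes and
 * the next destroy call removes the wrong entry. */
void factory_destroy(string_factory_t *f, fstring_t *s)
{
    size_t i = s->index;

    assert(i < f->count && f->strings[i] == s);
    memmove(&f->strings[i], &f->strings[i + 1],
            (f->count - i - 1) * sizeof f->strings[0]);
    f->count--;
    for (; i < f->count; i++) {
        f->strings[i]->index = i;
    }
}
```

After destroying the first of three strings, the remaining two report indexes 0 and 1, so a later destroy of either one still finds the right slot.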

Change 5542 on 2009/01/24 by [email protected]

    Fixed ANTLR-366
    The recognizer state now ensures that all fields are set to NULL upon
    creation and the reset does not overwrite the tokenname array

Change 5527 on 2009/01/15 by [email protected]

    Add the C runtime for 3.1.2 beta2 to perforce

Change 5526 on 2009/01/15 by [email protected]

    Correctly define the MEMMOVE macro which was inadvertently left to be
    memcpy.

Change 5503 on 2008/12/12 by [email protected]

    Change C runtime release number to 3.1.2 beta

Change 5473 on 2008/12/01 by [email protected]

    Fixed: ANTLR-350 - C runtime use of memcpy
    Prior change to use memcpy instead of memmove in all cases missed the
    fact that the string factory can be in a situation where overlaps occur.
    We now have ANTLR3_MEMCPY and ANTLR3_MEMMOVE and use the two
    appropriately.
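
The distinction behind change 5473 is that memcpy has undefined behaviour when source and destination overlap, while memmove handles overlap correctly. The sketch below uses the standard library functions directly rather than the ANTLR3_MEMCPY and ANTLR3_MEMMOVE macros; `insert_text` is a hypothetical helper, and it assumes the inserted text does not alias the buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Insert `ins` into the NUL-terminated string in `buf` at offset `at`.
 * `cap` is the total size of buf. Returns 0 on success, -1 if the
 * result would not fit. Assumes `ins` does not point into `buf`. */
int insert_text(char *buf, size_t cap, size_t at, const char *ins)
{
    size_t len = strlen(buf);
    size_t n   = strlen(ins);

    assert(at <= len);
    if (len + n + 1 > cap) {
        return -1;
    }
    /* Shifting the tail moves bytes within one buffer, so the regions
     * overlap whenever n is smaller than the tail: memmove is required. */
    memmove(buf + at + n, buf + at, len - at + 1);
    /* The inserted text is a separate object: memcpy is sufficient. */
    memcpy(buf + at, ins, n);
    return 0;
}
```

In-place string edits of this kind are where an unconditional switch to memcpy goes wrong, which is why both operations need to remain available.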

Change 5471 on 2008/12/01 by [email protected]

    Fixed ANTLR-361
     - Ensure that ANTLR3_BOOLEAN is typedef'ed correctly when building for
       MingW

Templates
---------

Change 5637 on 2009/02/20 by [email protected]

    C runtime - make sure that ADAPTOR results are cast to the tree type on
    a rewrite

Change 5620 on 2009/02/18 by [email protected]

    Rename/Move:
    From: //depot/code/antlr/main/src/org/antlr/codegen/templates/...
    To: //depot/code/antlr/main/src/main/resources/org/antlr/codegen/templates/...

    Relocate the code generating templates to exist in the directory set
    that maven expects.

    When checking in your templates, you may find it easiest to make a copy
    of what you have, revert the change in perforce, then just check out the
    template in the new location, and copy the changes back over. Nobody has
    more than two files open at the moment.

Change 5578 on 2009/02/12 by [email protected]

    Correct the string template escape sequences for generating scope
    code in the C templates.

Change 5577 on 2009/02/12 by [email protected]

    C Runtime - Bug fixes.

    o Having moved to use an extract directly from a vector for returning
      tokens, it exposed a bug whereby the EOF boundary calculation in
      tokLT was incorrectly checking > rather than >=.
    o Changing to API initialization of tokens rather than memcmp()
      incorrectly forgot to set the input stream pointer for the
      manufactured tokens in the token factory;
    o Rewrite streams for rewriting tree parsers did not check whether the
      rewrite stream was ever assigned before trying to free it; it is now
      in line with the ordinary parser code.

Change 5567 on 2009/01/29 by [email protected]

    C Runtime - Further Optimizations

    Within grammars that used scopes and were intended to parse large
    inputs with many rule nests, the creation and deletion of the scopes
    themselves became significant. Careful analysis shows that for most
    grammars, while a parse could create and delete 20,000 scopes, the
    maximum depth of any scope was only 8.

    This change therefore changes the scope implementation so that it does
    not free scope memory when it is popped but just tracks it in a C
    runtime stack, eventually freeing it when the stack is freed. This
    change caused the allocation of only 12 scope structures instead of
    20,000 for the extreme example case.

    This change means that scope users must be careful (as ever in C) to
    initialize their scope elements correctly as:

      1) If not you may inherit values from a prior use of the scope
         structure;
      2) Scope structures are now allocated with malloc and not calloc;

    Also, when using a custom free function to clean a scope when it is
    popped, it is probably a good idea to set any free'd pointers to NULL
    (this is generally good C programming practice in any case)

Change 5566 on 2009/01/29 by [email protected]

    Remove redundant BACKTRACK checking so that MSVC9 does not get confused
    about possibly uninitialized variables

Change 5565 on 2009/01/28 by [email protected]

    Use malloc rather than calloc to allocate memory for new scopes.
    Note that this means users will have to be careful to initialize any
    values in their scopes that they expect to be 0 or NULL and I must
    document this.

Change 5564 on 2009/01/28 by [email protected]

    Use malloc rather than calloc for copying list label tokens for
    rewrites.

Change 5561 on 2009/01/28 by [email protected]

    Prevent warnings about retval.stop not being initialized when a rule
    returns early because it is in backtracking mode

Change 5560 on 2009/01/28 by [email protected]

    Add a NULL check before freeing rewrite streams used in AST rewrites
    rather than auto-rewrites.

    While the NULL check is redundant as the free cannot be called unless
    it is assigned, Visual Studio C 2008 gets it wrong and thinks that
    there is a path that can arrive at the free without it being assigned
    and that is too annoying to ignore.

Change 5559 on 2009/01/28 by [email protected]

    C target Tree rewrite optimization

    There is only one optimization in this change, but it is a huge one.

    The code generation templates were set up so that at the start of a rule,
    any rewrite streams mentioned in the rule were pre-created. However, this
    is a massive overhead for rules where only one or two of the streams are
    actually used, as we create them then free them without ever using them.
    This was basically copied from the Java templates.
    This caused literally millions of extra calls and vector allocations
    in the case of the GNU C parser given to me for testing with a 20,000
    line program.

    After this change, the following comparison is available against the gcc
    compiler:

    Before (different machines here so use the relative difference for
    comparison):

    gcc:

        real    0m0.425s
        user    0m0.384s
        sys     0m0.036s

    ANTLR C:

        real    0m1.958s
        user    0m1.284s
        sys     0m0.656s

    After the previous optimizations for vector pooling via a factory,
    plus this huge win in removing redundant code, we have the following
    (different machine to the one above):

    gcc:

        0.21user 0.01system 0:00.23elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
        0inputs+328outputs (0major+9922minor)pagefaults 0swaps

    ANTLR C:

        0.37user 0.26system 0:00.64elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
        0inputs+0outputs (0major+130944minor)pagefaults 0swaps

    The extra system time comes from the fact that although the tree
    rewriting is now optimal in terms of not allocating things it does
    not need, there is still a lot more overhead in a parser that is
    generated for generic use, including much more use of structures for
    tokens and extra copying and so on. I will continue to work on improving
    things where I can, but the next big improvement will come from Ter's
    optimization of the actual code structures we generate, including not
    doing things with rewrite streams that we do not need to do at all.

    The second machine I used is about twice as fast CPU wise as the system
    that was used originally by the user that asked about this performance.

Change 5558 on 2009/01/28 by [email protected]

    Lots of optimizations (though the next one to be checked in is the huge
    win) for AST building and vector factories.

    A large part of tree rewriting was the creation of vectors to hold AST
    nodes.
    Although I had created a vector factory, for some reason I never
    got around to creating a proper one that pre-allocated the vectors in
    chunks and so on. I guess I just forgot to. Hence a big win here is
    prevention of calling malloc lots and lots of times to create vectors.

    A second improvement was to change the vector definition such that it
    holds a certain number of elements within the vector structure itself,
    rather than mallocing and freeing these. Currently this is set to 8, but
    may increase. For AST construction, this is generally a big win because
    AST nodes don't often have many individual children unless there has not
    been any shaping going on in the parser. But if you are not shaping, then
    you don't really need a tree.

    Other performance improvements here include not calling functions
    indirectly within token stream and common token stream. Hence tokens are
    claimed directly from the vectors. Users can override these functions of
    course, and all this means is that if you override tokenstreams then you
    pretty much have to provide all the methods, but then I think you would
    have to anyway (and I don't know of anyone that has wanted to do this as
    you can carry your own structure around with the tokens anyway and that
    is much easier).
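
The in-structure element storage described in change 5558 might look roughly like the sketch below. Only the figure of 8 internal elements comes from the change note; the names `small_vec_t`, `small_vec_init`, `small_vec_add` and `small_vec_free` and the doubling growth policy are assumptions made for illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define INTERNAL_SIZE 8   /* the "currently 8" from the change note */

typedef struct {
    void  **elements;                 /* internal[] or a heap block      */
    size_t  count;
    size_t  capacity;
    void   *internal[INTERNAL_SIZE];  /* storage inside the struct itself */
} small_vec_t;

void small_vec_init(small_vec_t *v)
{
    v->elements = v->internal;
    v->count    = 0;
    v->capacity = INTERNAL_SIZE;
}

/* Append an item; the heap is touched only once the internal slots
 * are exhausted. Returns 0 on success, -1 on allocation failure. */
int small_vec_add(small_vec_t *v, void *item)
{
    assert(v->elements != NULL);
    if (v->count == v->capacity) {
        size_t  new_cap = v->capacity * 2;
        void  **heap;

        if (v->elements == v->internal) {
            heap = malloc(new_cap * sizeof *heap);
            if (heap == NULL) {
                return -1;
            }
            memcpy(heap, v->internal, v->count * sizeof *heap);
        } else {
            heap = realloc(v->elements, new_cap * sizeof *heap);
            if (heap == NULL) {
                return -1;
            }
        }
        v->elements = heap;
        v->capacity = new_cap;
    }
    v->elements[v->count++] = item;
    return 0;
}

void small_vec_free(small_vec_t *v)
{
    if (v->elements != v->internal) {
        free(v->elements);
    }
    small_vec_init(v);
}
```

A vector with eight or fewer children never calls malloc at all, which matches the observation that shaped AST nodes rarely have many children. One caveat of this layout is that the structure points into itself, so it must not be copied by value while it is using its internal storage.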

Change 5554 on 2009/01/26 by [email protected]

    Fixed: ANTLR-379
    For some reason in the past, the ruleMemoization() template had required
    that the name parameter be set to the rule name. This does not seem to be
    a requirement any more. The name=xxx override when invoking the template
    was causing all the scope names derived when cleaning up in memoization
    to be called after the rule name, which was not correct. However, this
    only affected the output when in output=AST mode.

    This template invocation is now corrected.

Change 5553 on 2009/01/26 by [email protected]

    Fixed: ANTLR-330
    Managed to get the one rule that could not see the ASTLabelType to call
    back in to the super template C.stg and ask it to construct the name. I
    am not 100% sure that this fixes all cases, but I cannot find any that
    fail. Please let me know if you find any examples of being unable to
    default the ASTLabelType option in the C target.

Change 5552 on 2009/01/25 by [email protected]

    Progress: ANTLR-327
    Fix debug code generation templates when output=AST such that code
    can at least be generated and I can debug the output code correctly.
    Note that this checkin does not implement the debugging requirements
    for tree generating parsers.

Change 5551 on 2009/01/25 by [email protected]

    Fixed: ANTLR-287
    Most of the source files did not include the BSD license. This might
    not be that big a deal given that I don't care what people do with it
    other than take my name off it, but having the license reproduced
    everywhere at least makes things perfectly clear. Hence this mass
    change of sources and templates to include the license.

Change 5549 on 2009/01/25 by [email protected]

    Fixed: ANTLR-354
    Using 0.0D as the default initialize value for a double caused
    VS 2003 C compiler to bomb out. There seems to be no reason other
    than force of habit to set this to 0.0D so I have dropped the D so
    that older compilers do not complain.

Change 5547 on 2009/01/25 by [email protected]

    Fixed: ANTLR-282
    All references are now unadorned with any type of NULL check for the
    following reasons:

        1) A NULL reference means that there is a problem with the
           grammar and we need the program to fail immediately so
           that the programmer can work out where the problem occurred;
        2) Most of the time, the only sensible value that can be
           returned is NULL or 0, which obviates the NULL check in
           the first place;
        3) If we replace a NULL reference with some value such as 0,
           then the program may blithely continue but just do something
           logically wrong, which will be very difficult for the
           grammar programmer to detect and correct.

Change 5545 on 2009/01/24 by [email protected]

    Fixed: ANTLR-357
    The bug report was correct in that the types of references to things
    like $start were being incorrectly cast, as they were not changed from
    Java style casts (and the casts are unnecessary). This is now fixed
    and references are referencing the correct, uncast, types.
    However, the bug report was wrong in that the reference in the book to
    $start.pos will only work for Java and really, it is incorrect in the
    book because it should not access the .pos member directly but should
    be using $start.getCharPositionInLine().
    Because there is no access qualification in C, one could use
    $start.charPosition; however, really this should be
    $start->getCharPositionInLine($start);

Change 5541 on 2009/01/24 by [email protected]

    Fixed: ANTLR-367
    The code generation for the free method of a recognizer was not
    distinguishing tree parsers from parsers when it came to calling delegate free
    functions.
    This is now corrected.

Change 5540 on 2009/01/24 by [email protected]

    Fixed: ANTLR-355
    Ensure that we do not attempt to free any memory that we did not
    actually allocate because the parser rule was being executed in
    backtracking mode.
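
    The ANTLR-355 guard can be pictured roughly as follows. This is only a
    sketch under stated assumptions: rule_ctx, rule_enter and rule_exit are
    hypothetical names invented for illustration and are not the runtime's
    types or the generated code. Plain malloc()/free() stand in for whatever
    the runtime really uses to create and release a rewrite stream.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for per-rule state; not the runtime's types. */
typedef struct rule_ctx
{
    int   backtracking;  /* > 0 while the rule is only being tried out */
    void *stream;        /* rewrite stream, NULL if never allocated    */
} rule_ctx;

/* On rule entry, allocate the rewrite stream only when not backtracking. */
void rule_enter(rule_ctx *ctx)
{
    assert(ctx != NULL);
    ctx->stream = NULL;
    if (ctx->backtracking == 0)
    {
        ctx->stream = malloc(64);  /* placeholder allocation */
    }
}

/* On rule exit, release only what was actually allocated.
 * Returns 1 if a stream was freed and 0 if there was nothing to free. */
int rule_exit(rule_ctx *ctx)
{
    assert(ctx != NULL);
    if (ctx->stream == NULL)
    {
        return 0;
    }
    free(ctx->stream);
    ctx->stream = NULL;
    return 1;
}
```

    Note that free(NULL) is harmless in ISO C, so the check here only
    illustrates the pattern; it matters more when the release is a call
    made through the object itself.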

Change 5539 on 2009/01/24 by [email protected]

    Fixed: ANTLR-355
    When a C targeted parser is producing in backtracking mode, then the
    creation of new stream rewrite structures should not happen if the rule is
    currently backtracking.

Change 5502 on 2008/12/11 by [email protected]

    Fixed: ANTLR-349 Ensure that all marker labels in the lexer are 64 bit
    compatible

Change 5473 on 2008/12/01 by [email protected]

    Fixed: ANTLR-350 - C runtime use of memcpy
    A prior change to use memcpy instead of memmove in all cases missed the
    fact that the string factory can be in a situation where overlaps occur. We now
    have ANTLR3_MEMCPY and ANTLR3_MEMMOVE and use the two appropriately.

Change 5387 on 2008/11/05 by [email protected]

    Fixed x+=. issue with tree grammars; added unit test

Change 5325 on 2008/10/23 by [email protected]

    We were all ref'ing backtracking==0 hardcoded instead of checking the
    @synpredgate action.
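
    For readers wondering why Change 5473 (ANTLR-350) needs both macros, a
    minimal sketch of the overlap problem follows. It is not the runtime's
    code: the macro definitions below are assumed to map straight onto
    memcpy() and memmove(), and shift_right and copy_out are hypothetical
    helpers written only for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumed definitions for illustration only; the runtime's own macros
 * may be defined differently. */
#define ANTLR3_MEMCPY(dst, src, n)   memcpy((dst), (src), (n))
#define ANTLR3_MEMMOVE(dst, src, n)  memmove((dst), (src), (n))

/* Hypothetical helper: opens a gap of `gap` bytes in front of the
 * NUL-terminated string in buf. Source and destination overlap, so
 * memmove is required; memcpy would be undefined behaviour here. */
void shift_right(char *buf, size_t cap, size_t gap)
{
    size_t len = strlen(buf) + 1;            /* include the terminator */
    assert(gap <= cap && len <= cap - gap);  /* result must fit in buf */
    ANTLR3_MEMMOVE(buf + gap, buf, len);
}

/* Hypothetical helper: copies into a separate buffer. The regions
 * cannot overlap, so memcpy is sufficient. */
void copy_out(char *dst, size_t cap, const char *src)
{
    size_t len = strlen(src) + 1;
    assert(len <= cap);
    ANTLR3_MEMCPY(dst, src, len);
}
```

    The choice between the two depends only on whether the source and
    destination can overlap: memmove is safe in both cases, while memcpy
    is the one that may be faster when they are known to be distinct.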