The following changes (change numbers refer to Perforce) were
made from version 3.1.1 to 3.1.2

Runtime
-------

Change 5641 on 2009/02/20 by [email protected]

	Release version 3.1.2 of the ANTLR C runtime.

	Updated documents and release notes will have to follow later.

Change 5639 on 2009/02/20 by [email protected]

	Fixed: ANTLR-356

	Ensure that code generation for C++ does not require casts.

Change 5577 on 2009/02/12 by [email protected]

	C Runtime - Bug fixes.

	 o Moving to extracting tokens directly from a vector when returning
	   them exposed a bug whereby the EOF boundary calculation in tokLT
	   was incorrectly checking > rather than >=.
	 o Changing to API initialization of tokens rather than memcmp()
	   forgot to set the input stream pointer for the manufactured
	   tokens in the token factory.
	 o Rewrite streams for rewriting tree parsers did not check whether the
	   rewrite stream was ever assigned before trying to free it. This is
	   now in line with the ordinary parser code.
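
	For illustration only, the boundary condition in the first bullet can
	be sketched as follows. This is not the runtime's tokLT code; the types
	and names (token, token_stream, tok_lt) are hypothetical and assume a
	buffer of count tokens with valid indices 0..count-1.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical token and stream types, for illustration only. */
typedef struct { int type; } token;

typedef struct {
    token  *tokens;    /* buffered tokens */
    size_t  count;     /* number of buffered tokens */
    size_t  pos;       /* index of the current token, i.e. LT(1) */
    token   eof_token; /* returned for any lookahead past the end */
} token_stream;

/* Return the k-th lookahead token (k >= 1), or the EOF token when the
 * request falls at or beyond the end of the buffer. */
token *tok_lt(token_stream *ts, size_t k)
{
    assert(ts != NULL && k >= 1);
    size_t i = ts->pos + k - 1;
    /* Valid indices are 0..count-1, so i == count is already out of
     * range: the test must be >=, not >. */
    if (i >= ts->count) {
        return &ts->eof_token;
    }
    return &ts->tokens[i];
}
```

	With > in place of >=, a lookahead that lands exactly one past the
	last token would index outside the buffer instead of returning EOF.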

Change 5576 on 2009/02/11 by [email protected]

	C Runtime: Ensure that when we manufacture a new token for a missing
	token, the user-supplied custom information (if any) is copied
	from the current token.

Change 5575 on 2009/02/08 by [email protected]

	C Runtime - Vastly improve the reuse of allocated memory for nodes in
	  tree rewriting.

	A problem for all targets at the moment is that the rewrite logic
	generated by ANTLR makes no attempt to reuse any resources; it merely
	guarantees that the tree shape at the end is correct. To some extent
	this is mitigated by the garbage collection systems of Java and .NET,
	even though it is still an overhead to keep creating so many nodes.

	This change implements the first of two C runtime changes that make
	best efforts to track when a node has become orphaned and will never
	be reused, based on inherent knowledge of the rewrite logic (which in
	the long term is not a great solution).

	Much of the rewrite logic consists of creating a nilNode into which
	child nodes are appended. At rulePost processing time, when a rewrite
	stream is closed, and when becomeRoot is called, there are many
	situations where the root of the tree that will be manipulated, or is
	finished with (in the case of rewrite streams), is a nilNode that was
	just a temporary creation for the sake of the rewrite itself.

	In these cases we can see that the nilNode would just be left to rot in
	the node factory that tracks all the tree nodes. Rather than leave
	these in the factory to rot, we now keep a reuse stack and always reuse
	any node on this stack before claiming a new node from the factory pool.
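
	The idea of a reuse stack in front of the allocator can be sketched
	roughly as below. This is an assumption-laden illustration, not the
	runtime's node factory: node, node_factory, factory_new_node and
	factory_recycle are hypothetical names, and plain malloc stands in
	for the factory pool.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical tree node; 'next_free' links nodes on the reuse stack. */
typedef struct node {
    int          type;
    struct node *next_free;
} node;

typedef struct {
    node  *reuse;     /* stack of orphaned nodes available for reuse */
    size_t allocated; /* number of nodes claimed from the allocator */
} node_factory;

/* Claim a node: pop the reuse stack first, allocate only if it is empty. */
node *factory_new_node(node_factory *f)
{
    assert(f != NULL);
    node *n = f->reuse;
    if (n != NULL) {
        f->reuse = n->next_free;
    } else {
        n = malloc(sizeof *n);
        if (n == NULL) return NULL;
        f->allocated++;
    }
    n->type = 0;          /* recycled memory is stale, so reset it */
    n->next_free = NULL;
    return n;
}

/* Hand back a node (such as a temporary nil node) known to be orphaned. */
void factory_recycle(node_factory *f, node *n)
{
    assert(f != NULL && n != NULL);
    n->next_free = f->reuse;
    f->reuse = n;
}
```

	In this sketch a recycled node is handed out again before any new
	allocation happens, so a nil node that is created and orphaned on
	every rewrite costs one allocation in total rather than one per use.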

	This single change alone reduces memory usage in the test case (a
	20,604 line C program and a GNU C parser) from nearly a GB to 276MB.
	This is still way more memory than we should need to do this operation,
	even on such a large input file, but the reduction results in a huge
	performance increase and greatly reduced system time spent on
	allocations.

	After this optimization, comparison with gcc yields:

	time gcc -S a.c
	a.c:1026: warning: conflicting types for built-in function ‘vsprintf’
	a.c:1030: warning: conflicting types for built-in function ‘vsnprintf’
	a.c:1041: warning: conflicting types for built-in function ‘vsscanf’
	0.21user 0.01system 0:00.22elapsed 97%CPU (0avgtext+0avgdata 0maxresident)k
	0inputs+240outputs (0major+8345minor)pagefaults 0swaps

	and

	time ./jimi
	Reading a.c
	0.28user 0.11system 0:00.39elapsed 98%CPU (0avgtext+0avgdata 0maxresident)k
	0inputs+0outputs (0major+66609minor)pagefaults 0swaps

	From this we can infer that the only major difference is now the huge
	disparity in memory allocations. A future optimization of vector
	pooling, to separate node reuse from vector reuse, currently looks
	promising for further reuse of memory.

	Finally, a static analysis of the rewrite code, plus a realtime analysis
	of the heap at runtime, may well give us a reasonable memory usage
	pattern. In reality though, it is the generated rewrite logic
	that must become optimal at not continuously rewriting things that it
	need not, as it ascends the rule chain.

Change 5563 on 2009/01/28 by [email protected]

	Allow rewrite streams to use the base adaptor's vector factory and not
	try to malloc new vectors themselves.

Change 5562 on 2009/01/28 by [email protected]

	Don't use CALLOC to allocate tree pools; use malloc as there is no need
	for calloc.

Change 5561 on 2009/01/28 by [email protected]

	Prevent warnings about retval.stop not being initialized when a rule
	returns early because it is in backtracking mode.

Change 5558 on 2009/01/28 by [email protected]

	Lots of optimizations (though the next one to be checked in is the huge
	win) for AST building and vector factories.

	A large part of tree rewriting was the creation of vectors to hold AST
	nodes. Although I had created a vector factory, for some reason I never got
	around to creating a proper one that pre-allocated the vectors in chunks and
	so on. I guess I just forgot to. Hence a big win here is prevention of calling
	malloc lots and lots of times to create vectors.

	A second improvement was to change the vector definition such that it
	holds a certain number of elements within the vector structure itself, rather
	than mallocing and freeing these. Currently this is set to 8, but may increase.
	For AST construction, this is generally a big win because AST nodes don't often
	have many individual children unless there has not been any shaping going on in
	the parser. But if you are not shaping, then you don't really need a tree.
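
	A vector that keeps its first few elements inside the structure might
	look something like the sketch below. It is not the runtime's vector
	definition: small_vector, sv_init, sv_add, sv_free and INLINE_SLOTS are
	hypothetical names, and the doubling growth policy is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define INLINE_SLOTS 8  /* the inline size mentioned above */

/* Hypothetical vector: the first INLINE_SLOTS elements live in the struct.
 * Because 'elements' may point into the struct itself, a small_vector
 * must not be copied by value while it is in use. */
typedef struct {
    void  **elements;   /* points at 'inline_store' or at heap memory */
    size_t  count;
    size_t  capacity;
    void   *inline_store[INLINE_SLOTS];
} small_vector;

void sv_init(small_vector *v)
{
    assert(v != NULL);
    v->elements = v->inline_store;
    v->count = 0;
    v->capacity = INLINE_SLOTS;
}

/* Append an element; returns 0 on success, -1 if memory runs out. */
int sv_add(small_vector *v, void *item)
{
    assert(v != NULL);
    if (v->count == v->capacity) {
        size_t new_cap = v->capacity * 2;
        void **heap;
        if (v->elements == v->inline_store) {
            /* First spill: move the inline elements to a heap block. */
            heap = malloc(new_cap * sizeof *heap);
            if (heap == NULL) return -1;
            memcpy(heap, v->inline_store, v->count * sizeof *heap);
        } else {
            heap = realloc(v->elements, new_cap * sizeof *heap);
            if (heap == NULL) return -1;
        }
        v->elements = heap;
        v->capacity = new_cap;
    }
    v->elements[v->count++] = item;
    return 0;
}

/* Release heap storage, if any, and return to the empty inline state. */
void sv_free(small_vector *v)
{
    assert(v != NULL);
    if (v->elements != v->inline_store) free(v->elements);
    sv_init(v);
}
```

	A node with eight or fewer children never touches malloc for its
	child list in this sketch; only wider nodes pay for a heap block.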

	Other performance improvements here include not calling functions
	indirectly within token stream and common token stream. Hence tokens are
	claimed directly from the vectors. Users can override these functions of course,
	and all this means is that if you override tokenstreams then you pretty much
	have to provide all the methods, but then I think you would have to anyway (and
	I don't know of anyone that has wanted to do this as you can carry your own
	structure around with the tokens anyway and that is much easier).

Change 5555 on 2009/01/26 by [email protected]

	Fixed: ANTLR-288
	Correct the interpretation of the skip token such that channel, start
	index, char pos in line, start line and text are correctly reset to the start of
	the new token when the one that we just traversed was marked as being skipped.

	This correctly excludes the text that was matched as part of the
	SKIP()ed token from the next token in the token stream and so has the side
	effect that asking for $text of a rule no longer includes the text that should
	be skipped, but DOES include the text of tokens that were merely placed off the
	default channel.

Change 5551 on 2009/01/25 by [email protected]

	Fixed: ANTLR-287
	Most of the source files did not include the BSD license. This might
	not be that big a deal given that I don't care what people do with it
	other than take my name off it, but having the license reproduced
	everywhere at least makes things perfectly clear. Hence this mass change of
	sources and templates to include the license.

Change 5550 on 2009/01/25 by [email protected]

	Fixed: ANTLR-365
	Ensure that as soon as we know about an input stream on the lexer
	we borrow its string factory and use it in our EOF token in case
	anyone tries to make it a string, such as in error messages for
	instance.

Change 5548 on 2009/01/25 by [email protected]

	Fixed: ANTLR-363
	At some point the Java runtime default changed from discarding off-channel
	tokens to preserving them. The fix is to make the C runtime also
	default to preserving off-channel tokens.

Change 5544 on 2009/01/24 by [email protected]

	Fixed: ANTLR-360
	Ensure that the fillBuffer function does not call any methods
	that require the cached buffer size to be recorded before we
	have actually recorded it.

Change 5543 on 2009/01/24 by [email protected]

	Fixed: ANTLR-362
	Some users have started using string factories themselves and
	exposed a flaw in the destroy method, which is intended to remove
	a string that was created by the factory and is no longer needed.
	The string was correctly removed from the vector that tracks them,
	but after the first removal all the remaining strings were numbered
	incorrectly. Hence the destroy method has been recoded to reindex
	the strings in the factory after one is removed and everything is once
	more hunky dory.
	User suggested fix rejected.
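
	The reindexing step can be pictured with the following sketch. It is
	a simplified stand-in, not the runtime's string factory: fstring,
	string_factory, factory_track and factory_destroy are hypothetical
	names, and a fixed-size array replaces the tracking vector.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_STRINGS 16

/* Hypothetical tracked string: 'index' records its slot in the factory. */
typedef struct {
    const char *chars;
    size_t      index;
} fstring;

typedef struct {
    fstring *strings[MAX_STRINGS]; /* tracking vector */
    size_t   count;
} string_factory;

/* Track a string and stamp it with its slot number. */
int factory_track(string_factory *f, fstring *s)
{
    assert(f != NULL && s != NULL);
    if (f->count == MAX_STRINGS) return -1;
    s->index = f->count;
    f->strings[f->count++] = s;
    return 0;
}

/* Remove a string from the tracking vector, then renumber every string
 * that followed it so that each index matches its slot again. */
void factory_destroy(string_factory *f, fstring *s)
{
    assert(f != NULL && s != NULL);
    assert(s->index < f->count && f->strings[s->index] == s);
    size_t i = s->index;
    /* Close the gap; the ranges overlap, hence memmove. */
    memmove(&f->strings[i], &f->strings[i + 1],
            (f->count - i - 1) * sizeof f->strings[0]);
    f->count--;
    for (; i < f->count; i++) {
        f->strings[i]->index = i;
    }
}
```

	Without the renumbering loop, a second destroy would look up a stale
	index and remove the wrong entry.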

Change 5542 on 2009/01/24 by [email protected]

	Fixed ANTLR-366
	The recognizer state now ensures that all fields are set to NULL upon
	creation and the reset does not overwrite the tokenname array.

Change 5527 on 2009/01/15 by [email protected]

	Add the C runtime for 3.1.2 beta2 to Perforce

Change 5526 on 2009/01/15 by [email protected]

	Correctly define the MEMMOVE macro which was inadvertently left to be
	memcpy.

Change 5503 on 2008/12/12 by [email protected]

	Change C runtime release number to 3.1.2 beta

Change 5473 on 2008/12/01 by [email protected]

	Fixed: ANTLR-350 - C runtime use of memcpy
	A prior change to use memcpy instead of memmove in all cases missed the
	fact that the string factory can be in a situation where overlaps occur. We now
	have ANTLR3_MEMCPY and ANTLR3_MEMMOVE and use the two appropriately.
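
	As a reminder of why the distinction matters, here is a small
	standalone sketch using the standard memmove and memcpy directly
	(the ANTLR3_ macros are not used here). The insert_text helper is
	hypothetical and is not how the string factory is written.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Insert 'len' bytes of 'text' at offset 'at' in the NUL-terminated
 * buffer 'buf' of total size 'cap'. 'text' is assumed not to point into
 * 'buf'. Returns 0 on success, -1 if the result would not fit. */
int insert_text(char *buf, size_t cap, size_t at, const char *text, size_t len)
{
    assert(buf != NULL && text != NULL);
    size_t cur = strlen(buf);
    if (at > cur || cur + len + 1 > cap) return -1;
    /* Opening the gap shifts bytes within the same buffer, so source and
     * destination overlap: memmove is required, memcpy would be undefined. */
    memmove(buf + at + len, buf + at, cur - at + 1);
    /* The new text comes from a separate object, so memcpy is fine. */
    memcpy(buf + at, text, len);
    return 0;
}
```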

Change 5471 on 2008/12/01 by [email protected]

	Fixed ANTLR-361
	 - Ensure that ANTLR3_BOOLEAN is typedef'ed correctly when building for
	   MinGW

Templates
---------

Change 5637 on 2009/02/20 by [email protected]

	C runtime - make sure that ADAPTOR results are cast to the tree type on
	a rewrite

Change 5620 on 2009/02/18 by [email protected]

	Rename/Move:
	From: //depot/code/antlr/main/src/org/antlr/codegen/templates/...
	To: //depot/code/antlr/main/src/main/resources/org/antlr/codegen/templates/...

	Relocate the code generating templates to exist in the directory set
	that Maven expects.

	When checking in your templates, you may find it easiest to make a copy
	of what you have, revert the change in Perforce, then just check out the
	template in the new location, and copy the changes back over. Nobody has more
	than two files open at the moment.

Change 5578 on 2009/02/12 by [email protected]

	Correct the string template escape sequences for generating scope
	code in the C templates.

Change 5577 on 2009/02/12 by [email protected]

	C Runtime - Bug fixes.

	 o Moving to extracting tokens directly from a vector when returning
	   them exposed a bug whereby the EOF boundary calculation in tokLT
	   was incorrectly checking > rather than >=.
	 o Changing to API initialization of tokens rather than memcmp()
	   forgot to set the input stream pointer for the manufactured
	   tokens in the token factory.
	 o Rewrite streams for rewriting tree parsers did not check whether the
	   rewrite stream was ever assigned before trying to free it. This is
	   now in line with the ordinary parser code.

Change 5567 on 2009/01/29 by [email protected]

	C Runtime - Further Optimizations

	Within grammars that used scopes and were intended to parse large
	inputs with many rule nests, the creation and deletion of the scopes
	themselves became significant. Careful analysis shows that for most
	grammars, while a parse could create and delete 20,000 scopes, the
	maximum depth of any scope was only 8.

	This change therefore changes the scope implementation so that it does
	not free scope memory when it is popped but just tracks it in a C
	runtime stack, eventually freeing it when the stack is freed. This
	change caused the allocation of only 12 scope structures instead of
	20,000 for the extreme example case.

	This change means that scope users must be careful (as ever in C) to
	initialize their scope elements correctly as:

	1) If not you may inherit values from a prior use of the scope
	    structure;
	2) Scope structures are now allocated with malloc and not calloc.

	Also, when using a custom free function to clean a scope when it is
	popped, it is probably a good idea to set any free'd pointers to NULL
	(this is generally good C programming practice in any case).
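
	The pattern can be sketched as below. This is only an illustration of
	the idea, not the generated scope code: my_scope, scope_stack,
	scope_push, scope_pop and scope_stack_free are hypothetical names, and
	the explicit field initialization that users are asked to do is folded
	into scope_push here for brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define MAX_DEPTH 64

/* Hypothetical user scope with one owned pointer and one counter. */
typedef struct {
    char *name;
    int   count;
} my_scope;

typedef struct {
    my_scope *slots[MAX_DEPTH]; /* allocated once per depth, then reused */
    size_t    depth;            /* number of scopes currently pushed */
    size_t    allocated;        /* number of malloc calls made so far */
} scope_stack;

/* Push a scope, reusing the block left behind at this depth if any.
 * The memory comes from malloc, so every field is set explicitly. */
my_scope *scope_push(scope_stack *st)
{
    assert(st != NULL && st->depth < MAX_DEPTH);
    my_scope *s = st->slots[st->depth];
    if (s == NULL) {
        s = malloc(sizeof *s);
        if (s == NULL) return NULL;
        st->slots[st->depth] = s;
        st->allocated++;
    }
    st->depth++;
    s->name = NULL;   /* do not inherit values from a prior use */
    s->count = 0;
    return s;
}

/* Pop a scope: release what it owns but keep the block for reuse. */
void scope_pop(scope_stack *st)
{
    assert(st != NULL && st->depth > 0);
    my_scope *s = st->slots[--st->depth];
    free(s->name);
    s->name = NULL;   /* avoid a dangling pointer in the recycled block */
}

/* Free every block when the stack itself is destroyed. */
void scope_stack_free(scope_stack *st)
{
    assert(st != NULL);
    while (st->depth > 0) scope_pop(st);
    for (size_t i = 0; i < MAX_DEPTH; i++) {
        free(st->slots[i]);
        st->slots[i] = NULL;
    }
}
```

	In this sketch the number of allocations tracks the maximum nesting
	depth reached, not the number of pushes.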

Change 5566 on 2009/01/29 by [email protected]

	Remove redundant BACKTRACK checking so that MSVC9 does not get confused
	about possibly uninitialized variables

Change 5565 on 2009/01/28 by [email protected]

	Use malloc rather than calloc to allocate memory for new scopes. Note
	that this means users will have to be careful to initialize any values in their
	scopes that they expect to be 0 or NULL and I must document this.

Change 5564 on 2009/01/28 by [email protected]

	Use malloc rather than calloc for copying list label tokens for
	rewrites.

Change 5561 on 2009/01/28 by [email protected]

	Prevent warnings about retval.stop not being initialized when a rule
	returns early because it is in backtracking mode.

Change 5560 on 2009/01/28 by [email protected]

	Add a NULL check before freeing rewrite streams used in AST rewrites
	rather than auto-rewrites.

	While the NULL check is redundant as the free cannot be called unless
	it is assigned, Visual Studio C 2008 gets it wrong and thinks that
	there is a path that can arrive at the free without it being assigned,
	and that is too annoying to ignore.

Change 5559 on 2009/01/28 by [email protected]

	C target Tree rewrite optimization

	There is only one optimization in this change, but it is a huge one.

	The code generation templates were set up so that at the start of a rule,
	any rewrite streams mentioned in the rule were pre-created. However, this
	is a massive overhead for rules where only one or two of the streams are
	actually used, as we create them then free them without ever using them.
	This was basically copied from the Java templates.
	This caused literally millions of extra calls and vector allocations
	in the case of the GNU C parser given to me for testing with a 20,000
	line program.
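
	Creating a stream on first use rather than at rule entry can be
	sketched like this. The code is a hypothetical illustration, not what
	the templates emit: rewrite_stream, stream_new, stream_add and
	stream_release are invented names, and streams_created exists only to
	make the saving visible.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical rewrite stream; a real one would hold a vector of elements. */
typedef struct {
    int element_count;
} rewrite_stream;

size_t streams_created = 0; /* instrumentation for the sketch only */

rewrite_stream *stream_new(void)
{
    rewrite_stream *s = malloc(sizeof *s);
    if (s != NULL) {
        s->element_count = 0;
        streams_created++;
    }
    return s;
}

/* Add an element, creating the stream on first use instead of at rule
 * entry. Returns 0 on success, -1 if memory runs out. */
int stream_add(rewrite_stream **slot, int element)
{
    assert(slot != NULL);
    (void)element; /* a real stream would store the element */
    if (*slot == NULL) {
        *slot = stream_new();
        if (*slot == NULL) return -1;
    }
    (*slot)->element_count++;
    return 0;
}

/* Free only streams that were actually created. */
void stream_release(rewrite_stream **slot)
{
    assert(slot != NULL);
    if (*slot != NULL) {
        free(*slot);
        *slot = NULL;
    }
}
```

	A rule would start with every stream pointer set to NULL; a stream
	that no alternative touches is then never allocated and never freed.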

	After this change, the following comparison is available against the gcc
	compiler:

	Before (different machines here so use the relative difference for
	comparison):

	gcc:

	real    0m0.425s
	user    0m0.384s
	sys     0m0.036s

	ANTLR C:

	real    0m1.958s
	user    0m1.284s
	sys     0m0.656s

	After the previous optimizations for vector pooling via a factory,
	plus this huge win in removing redundant code, we have the following
	(different machine to the one above):

	gcc:

	0.21user 0.01system 0:00.23elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
	0inputs+328outputs (0major+9922minor)pagefaults 0swaps

	ANTLR C:

	0.37user 0.26system 0:00.64elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
	0inputs+0outputs (0major+130944minor)pagefaults 0swaps

	The extra system time comes from the fact that although the tree
	rewriting is now optimal in terms of not allocating things it does
	not need, there is still a lot more overhead in a parser that is generated
	for generic use, including much more use of structures for tokens and extra
	copying and so on. I will continue to work on improving things where I
	can, but the next big improvement will come from Ter's optimization of
	the actual code structures we generate, including not doing things with
	rewrite streams that we do not need to do at all.

	The second machine I used is about twice as fast CPU wise as the system
	that was used originally by the user that asked about this performance.

Change 5558 on 2009/01/28 by [email protected]

	Lots of optimizations (though the next one to be checked in is the huge
	win) for AST building and vector factories.

	A large part of tree rewriting was the creation of vectors to hold AST
	nodes. Although I had created a vector factory, for some reason I never got
	around to creating a proper one that pre-allocated the vectors in chunks and
	so on. I guess I just forgot to. Hence a big win here is prevention of calling
	malloc lots and lots of times to create vectors.

	A second improvement was to change the vector definition such that it
	holds a certain number of elements within the vector structure itself, rather
	than mallocing and freeing these. Currently this is set to 8, but may increase.
	For AST construction, this is generally a big win because AST nodes don't often
	have many individual children unless there has not been any shaping going on in
	the parser. But if you are not shaping, then you don't really need a tree.

	Other performance improvements here include not calling functions
	indirectly within token stream and common token stream. Hence tokens are
	claimed directly from the vectors. Users can override these functions of course,
	and all this means is that if you override tokenstreams then you pretty much
	have to provide all the methods, but then I think you would have to anyway (and
	I don't know of anyone that has wanted to do this as you can carry your own
	structure around with the tokens anyway and that is much easier).

Change 5554 on 2009/01/26 by [email protected]

	Fixed: ANTLR-379
	For some reason in the past, the ruleMemoization() template had required
	that the name parameter be set to the rule name. This does not seem to be a
	requirement any more. The name=xxx override when invoking the template was
	causing all the scope names derived when cleaning up in memoization to be
	called after the rule name, which was not correct. However, this only affected
	the output when in output=AST mode.

	This template invocation is now corrected.

Change 5553 on 2009/01/26 by [email protected]

	Fixed: ANTLR-330
	Managed to get the one rule that could not see the ASTLabelType to call
	back in to the super template C.stg and ask it to construct the name. I am not
	100% sure that this fixes all cases, but I cannot find any that fail. Please
	let me know if you find any examples of being unable to default the
	ASTLabelType option in the C target.

Change 5552 on 2009/01/25 by [email protected]

	Progress: ANTLR-327
	Fix debug code generation templates when output=AST such that code
	can at least be generated and I can debug the output code correctly.
	Note that this checkin does not implement the debugging requirements
	for tree generating parsers.

Change 5551 on 2009/01/25 by [email protected]

	Fixed: ANTLR-287
	Most of the source files did not include the BSD license. This might
	not be that big a deal given that I don't care what people do with it
	other than take my name off it, but having the license reproduced
	everywhere at least makes things perfectly clear. Hence this mass change of
	sources and templates to include the license.

467*16467b97STreehugger RobotChange 5549 on 2009/01/25 by [email protected]
468*16467b97STreehugger Robot
469*16467b97STreehugger Robot	Fixed: ANTLR-354
470*16467b97STreehugger Robot	Using 0.0D as the default initialization value for a double caused the
471*16467b97STreehugger Robot	VS 2003 C compiler to bomb out. There seems to be no reason other
472*16467b97STreehugger Robot	than force of habit to set this to 0.0D so I have dropped the D so
473*16467b97STreehugger Robot	that older compilers do not complain.
474*16467b97STreehugger Robot
475*16467b97STreehugger RobotChange 5547 on 2009/01/25 by [email protected]
476*16467b97STreehugger Robot
477*16467b97STreehugger Robot	Fixed: ANTLR-282
478*16467b97STreehugger Robot	All references are now unadorned with any type of NULL check for the
479*16467b97STreehugger Robot	following reasons:
480*16467b97STreehugger Robot
481*16467b97STreehugger Robot		1) A NULL reference means that there is a problem with the
482*16467b97STreehugger Robot		   grammar and we need the program to fail immediately so
483*16467b97STreehugger Robot		   that the programmer can work out where the problem occurred;
484*16467b97STreehugger Robot		2) Most of the time, the only sensible value that can be
485*16467b97STreehugger Robot		   returned is NULL or 0 which
486*16467b97STreehugger Robot		   obviates the NULL check in the first place;
487*16467b97STreehugger Robot		3) If we replace a NULL reference with some value such as 0,
488*16467b97STreehugger Robot		   then the program may blithely continue but just do something
489*16467b97STreehugger Robot		   logically wrong, which will be very difficult for the
490*16467b97STreehugger Robot		   grammar programmer to detect and correct.
491*16467b97STreehugger Robot
492*16467b97STreehugger RobotChange 5545 on 2009/01/24 by [email protected]
493*16467b97STreehugger Robot
494*16467b97STreehugger Robot	Fixed: ANTLR-357
495*16467b97STreehugger Robot	The bug report was correct in that the types of references to things
496*16467b97STreehugger Robot	like $start were being incorrectly cast as they were not changed from
497*16467b97STreehugger Robot	Java style casts (and the casts are unnecessary). This is now fixed
498*16467b97STreehugger Robot	and references are referencing the correct, uncast, types.
499*16467b97STreehugger Robot	However, the bug report was wrong in that the reference in the book to
500*16467b97STreehugger Robot	$start.pos will only work for Java and really, it is incorrect in the
501*16467b97STreehugger Robot	book because it should not access the .pos member directly but should
502*16467b97STreehugger Robot	be using $start.getCharPositionInLine().
503*16467b97STreehugger Robot	Because there is no access qualification in C, one could use
504*16467b97STreehugger Robot	$start.charPosition, however
505*16467b97STreehugger Robot	really this should be $start->getCharPositionInLine($start);
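
The calling convention in the last line above, where the token is passed back to its own accessor, can be sketched with a mock type. Everything named `sketch_*` below is hypothetical and exists only for illustration; it is not the runtime's token structure, which has many more members.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mock token type for illustration only; NOT the runtime's real token struct. */
typedef struct sketch_token {
    /* Raw field: reachable directly because C has no access qualification. */
    int32_t charPosition;
    /* Accessor stored as a function pointer; the token is passed explicitly. */
    int32_t (*getCharPositionInLine)(struct sketch_token *tok);
} sketch_token;

/* Accessor implementation: reads the field on behalf of the caller. */
int32_t sketch_get_char_position(sketch_token *tok)
{
    return tok->charPosition;
}

/* Set the position and install the accessor function pointer. */
void sketch_token_init(sketch_token *tok, int32_t pos)
{
    assert(tok != NULL);
    tok->charPosition = pos;
    tok->getCharPositionInLine = sketch_get_char_position;
}
```

With a `sketch_token *start` initialised this way, `start->getCharPositionInLine(start)` and `start->charPosition` yield the same value. Going through the accessor keeps action code independent of how the position is stored.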
506*16467b97STreehugger Robot
507*16467b97STreehugger RobotChange 5541 on 2009/01/24 by [email protected]
508*16467b97STreehugger Robot
509*16467b97STreehugger Robot	Fixed: ANTLR-367
510*16467b97STreehugger Robot	The code generation for the free method of a recognizer was not
511*16467b97STreehugger Robot	distinguishing tree parsers from parsers when it came to calling delegate free
512*16467b97STreehugger Robot	functions.
513*16467b97STreehugger Robot	This is now corrected.
514*16467b97STreehugger Robot
515*16467b97STreehugger RobotChange 5540 on 2009/01/24 by [email protected]
516*16467b97STreehugger Robot
517*16467b97STreehugger Robot	Fixed: ANTLR-355
518*16467b97STreehugger Robot	Ensure that we do not attempt to free any memory that we did not
519*16467b97STreehugger Robot	actually allocate because the parser rule was being executed in
520*16467b97STreehugger Robot	backtracking mode.
521*16467b97STreehugger Robot
522*16467b97STreehugger RobotChange 5539 on 2009/01/24 by [email protected]
523*16467b97STreehugger Robot
524*16467b97STreehugger Robot	Fixed: ANTLR-355
525*16467b97STreehugger Robot	When a C targeted parser is producing in backtracking mode, then the
526*16467b97STreehugger Robot	creation of new stream rewrite structures should not happen if the rule is
527*16467b97STreehugger Robot	currently backtracking.
528*16467b97STreehugger Robot
529*16467b97STreehugger RobotChange 5502 on 2008/12/11 by [email protected]
530*16467b97STreehugger Robot
531*16467b97STreehugger Robot	Fixed: ANTLR-349 Ensure that all marker labels in the lexer are 64 bit
532*16467b97STreehugger Robot	compatible
533*16467b97STreehugger Robot
534*16467b97STreehugger RobotChange 5473 on 2008/12/01 by [email protected]
535*16467b97STreehugger Robot
536*16467b97STreehugger Robot	Fixed: ANTLR-350 - C runtime use of memcpy
537*16467b97STreehugger Robot	Prior change to use memcpy instead of memmove in all cases missed the
538*16467b97STreehugger Robot	fact that the string factory can be in a situation where overlaps occur. We now
539*16467b97STreehugger Robot	have ANTLR3_MEMCPY and ANTLR3_MEMMOVE and use the two appropriately.
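
A minimal sketch of why both operations are needed, assuming nothing about the string factory's internals: inserting bytes into a buffer first shifts the tail within the same buffer (overlapping regions, so memmove), then copies the new bytes in from a separate buffer (no overlap, so memcpy). The `SKETCH_*` macros and `sketch_insert` are hypothetical stand-ins, not the runtime's own definitions of ANTLR3_MEMCPY and ANTLR3_MEMMOVE.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for ANTLR3_MEMCPY / ANTLR3_MEMMOVE; the runtime's
 * actual macro definitions may differ. */
#define SKETCH_MEMCPY(dst, src, n)  memcpy((void *)(dst), (const void *)(src), (size_t)(n))
#define SKETCH_MEMMOVE(dst, src, n) memmove((void *)(dst), (const void *)(src), (size_t)(n))

/* Insert ins_len bytes from ins at offset at in buf, which holds len bytes.
 * The caller must ensure buf has room for len + ins_len bytes and that ins
 * does not point into buf. Returns the new length. */
size_t sketch_insert(char *buf, size_t len, size_t at,
                     const char *ins, size_t ins_len)
{
    assert(at <= len);

    /* Source and destination are in the same buffer and may overlap:
     * memcpy would be undefined behaviour here, memmove is required. */
    SKETCH_MEMMOVE(buf + at + ins_len, buf + at, len - at);

    /* Source is a separate buffer: no overlap, so memcpy is sufficient. */
    SKETCH_MEMCPY(buf + at, ins, ins_len);

    return len + ins_len;
}
```

For example, inserting "XYZ" at offset 2 of a 16-byte buffer holding "abcdef" leaves "abXYZcdef" in the first nine bytes.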
540*16467b97STreehugger Robot
541*16467b97STreehugger RobotChange 5387 on 2008/11/05 by [email protected]
542*16467b97STreehugger Robot
543*16467b97STreehugger Robot	Fixed x+=. issue with tree grammars; added unit test
544*16467b97STreehugger Robot
545*16467b97STreehugger RobotChange 5325 on 2008/10/23 by [email protected]
546*16467b97STreehugger Robot
547*16467b97STreehugger Robot	We were all ref'ing backtracking==0 hardcoded instead of checking the
548*16467b97STreehugger Robot	@synpredgate action.
549*16467b97STreehugger Robot
550*16467b97STreehugger Robot