[/==============================================================================
    Copyright (C) 2001-2011 Joel de Guzman
    Copyright (C) 2001-2011 Hartmut Kaiser

    Distributed under the Boost Software License, Version 1.0. (See accompanying
    file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)
===============================================================================/]

[section:lexer_static_model The /Static/ Lexer Model]

So far, the documentation of __lex__ has mostly described the features of
the /dynamic/ model, where the tables needed for lexical analysis are generated
from the regular expressions at runtime. The big advantage of the dynamic model
is its flexibility and its integration with the __spirit__ library and the C++
host language. Its big disadvantage is the additional runtime spent on
generating the tables, which can be a limitation especially for larger lexical
analyzers. The /static/ model builds upon the same smooth integration with
__spirit__ and C++, and reuses large parts of the __lex__ library as described
so far, while avoiding the additional runtime cost by using
pre-generated tables and tokenizer routines. To make the code generation as
simple as possible, the static model reuses the token definition types developed
for the /dynamic/ model without any changes. As will be shown in this
section, building a code generator based on an existing token definition type
is a matter of writing 3 lines of code.

Assuming you have already built a dynamic lexer for your problem, two more
steps are needed to create a static lexical analyzer using __lex__:

# generating the C++ code for the static analyzer (including the tokenization
  function and corresponding tables), and
# modifying the dynamic lexical analyzer to use the generated code.

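
To make the difference between the two models concrete, here is a small,
self-contained sketch. It deliberately does not use the __lex__ API. The table
layout and the names `Table`, `build_table_at_runtime`, `write_table_source`,
`static_table`, `count_words`, and `check_models_agree` are hypothetical and
exist only for this illustration. The sketch assumes a toy two-state automaton
that counts whitespace-separated words. It shows the same tokenizer routine
driven first by a table computed at runtime and then by a table that a
generator has written out as C++ source ahead of time.

```cpp
#include <array>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <sstream>
#include <string>

// Hypothetical toy automaton, NOT the tables Spirit.Lex generates.
constexpr std::size_t state_count = 2;  // 0: between words, 1: inside a word
constexpr std::size_t class_count = 2;  // 0: whitespace,    1: anything else
using Table = std::array<std::array<std::size_t, class_count>, state_count>;

// "Dynamic model" stand-in: the table is computed every time the program runs.
Table build_table_at_runtime()
{
    Table t{};
    for (std::size_t s = 0; s < state_count; ++s) {
        t[s][0] = 0;  // whitespace always leads to "between words"
        t[s][1] = 1;  // any other character leads to "inside a word"
    }
    return t;
}

// "Code generator" stand-in: writes the table as C++ source text to a stream.
// The suffix becomes part of the generated name. Returns false on stream failure.
bool write_table_source(Table const& t, std::ostream& out,
                        std::string const& suffix)
{
    out << "constexpr Table static_table_" << suffix << " = {{";
    for (std::size_t s = 0; s < state_count; ++s) {
        out << (s ? ", " : " ") << "{{";
        for (std::size_t c = 0; c < class_count; ++c)
            out << (c ? ", " : "") << t[s][c];
        out << "}}";
    }
    out << " }};\n";
    return out.good();
}

// "Static model" stand-in: the table as the generator above would emit it,
// fixed at compile time so no table construction happens at runtime.
constexpr Table static_table = {{ {{0, 1}}, {{0, 1}} }};

// The tokenizer routine is identical for both models; it only reads the table.
std::size_t count_words(Table const& t, std::string const& text)
{
    std::size_t state = 0, words = 0;
    for (char ch : text) {
        std::size_t cls = std::isspace(static_cast<unsigned char>(ch)) ? 0 : 1;
        std::size_t next = t[state][cls];
        if (state == 0 && next == 1)
            ++words;  // entering a word from whitespace starts a new word
        state = next;
    }
    return words;
}

// Both models must produce the same result for any input.
void check_models_agree(std::string const& text)
{
    assert(count_words(build_table_at_runtime(), text) ==
           count_words(static_table, text));
}
```

In this sketch `count_words` plays the role of the tokenization function, and
only the origin of the table differs between the two models. The code that
__lex__ actually generates is considerably more elaborate.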

Both steps are described in more detail in the two sections below. The full
source code used in this example is available here:
[@../../example/lex/static_lexer/word_count_tokens.hpp the common token definition],
[@../../example/lex/static_lexer/word_count_generate.cpp the code generator],
[@../../example/lex/static_lexer/word_count_static.hpp the generated code], and
[@../../example/lex/static_lexer/word_count_static.cpp the static lexical analyzer].

[import ../example/lex/static_lexer/word_count_tokens.hpp]
[import ../example/lex/static_lexer/word_count_static.cpp]
[import ../example/lex/static_lexer/word_count_generate.cpp]

But first we provide the code snippets needed to understand the
descriptions. Both the definition of the token identifiers and the
token definition class in this example are put into a separate header file to
make them available to the code generator and the static lexical analyzer.

[wc_static_tokenids]

The important point here is that the token definition class is no different
from a similar class used for a dynamic lexical analyzer. The library
has been designed so that all components (dynamic lexical analyzer, code
generator, and static lexical analyzer) can reuse the very same token definition
syntax.

[wc_static_tokendef]

The only thing that changes between the three use cases is the template
parameter used to instantiate a concrete token definition. For the dynamic
model and the code generator you will probably use the __class_lexertl_lexer__
template, whereas for the static model you will use the
__class_lexertl_static_lexer__ type as the template parameter.

This example not only shows how to build a static lexer, but also
demonstrates how such a lexer can be used for parsing in conjunction with a
__qi__ grammar. For completeness, we provide the simple grammar used in this
example.
As you can see, this grammar does not depend on the
static lexical analyzer, and for this reason it is no different from a grammar
used either without a lexer or with a dynamic lexical analyzer as described
before.

[wc_static_grammar]


[heading Generating the Static Analyzer]

The first additional step required to create a static lexical
analyzer is to write a small standalone program that creates the lexer tables
and the corresponding tokenization function. For this purpose the __lex__
library exposes a special API: the function __api_generate_static__. It
implements the whole code generator; no further code is needed. All it
takes to invoke this function is a token definition instance, an
output stream to write the generated code to, and an optional string to be used
as a suffix for the name of the generated function. All in all, just a couple of
lines of code.

[wc_static_generate_main]

The code generator shown above produces output that should be stored in a file
for later inclusion into the static lexical analyzer, as shown in the next
topic (the full generated code can be viewed
[@../../example/lex/static_lexer/word_count_static.hpp here]).

[note The generated code has the version number of the
      current __lex__ library compiled in. This version number is used at
      compilation time of your static lexer object to ensure it is compiled
      using exactly the same version of the __lex__ library as the one the
      lexer tables were generated with. If the versions do not match you will
      see a compilation error mentioning an `incompatible_static_lexer_version`.
]

[heading Modifying the Dynamic Analyzer]

The second step required to convert an existing dynamic lexer into a static one
is to change your main program in two places.
First, you need to change the
type of the lexer used (that is, the template parameter used while instantiating
your token definition class). While in the dynamic model we have been using the
__class_lexertl_lexer__ template, we now need to change that to the
__class_lexertl_static_lexer__ type. The second change is closely related to
the first one and involves changing the corresponding `#include` statement to:

[wc_static_include]

Otherwise the main program is no different from an equivalent program using
the dynamic model. This makes it easy to develop the lexer in dynamic
mode and to switch to the static mode once the code has stabilized.
The simple generator application shown above makes it possible to integrate the
code generator into any existing build process. The following code snippet
provides the overall main function, highlighting the code to be changed.

[wc_static_main]

[important The generated code for the static lexer contains the token ids as
           they have been assigned, either explicitly by the programmer or
           implicitly during lexer construction. It is your responsibility
           to make sure that all instances of a particular static lexer
           type use exactly the same token ids. The constructor of the lexer
           object has a second (default) parameter that designates a
           starting token id to be used while assigning the ids to the token
           definitions. The requirement above is fulfilled by default
           as long as no `first_id` is specified during construction of the
           static lexer instances.
]


[endsect]
