[/==============================================================================
    Copyright (C) 2001-2011 Joel de Guzman
    Copyright (C) 2001-2011 Hartmut Kaiser

    Distributed under the Boost Software License, Version 1.0. (See accompanying
    file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)
===============================================================================/]

[section:lexer_semantic_actions Lexer Semantic Actions]

The main task of a lexer normally is to recognize tokens in the input.
Traditionally this has been complemented with the possibility of executing
arbitrary code whenever a certain token has been detected. __lex__ has been
designed to support this mode of operation as well. We borrow from the concept
of semantic actions for parsers (__qi__) and generators (__karma__). Lexer
semantic actions may be attached to any token definition. These are C++
functions or function objects that are called whenever a token definition
successfully recognizes a portion of the input. Given a token definition
`D` and a C++ function `f`, you can make the lexer call `f` whenever `D`
matches some input by attaching `f`:

    D[f]

The expression above links `f` to the token definition `D`. The required
prototype of `f` is:

    void f (Iterator& start, Iterator& end, pass_flag& matched, Idtype& id, Context& ctx);

[variablelist where:
    [[`Iterator& start`]    [This is the iterator pointing to the beginning of
                            the matched range in the underlying input
                            sequence. The type of the iterator is the same as
                            specified while defining the type of the
                            `lexertl::actor_lexer<...>` (its first template
                            parameter). The semantic action is allowed to
                            change the value of this iterator, influencing the
                            matched input sequence.]]
    [[`Iterator& end`]      [This is the iterator pointing to the end of the
                            matched range in the underlying input sequence.
                            The type of the iterator is the same as specified
                            while defining the type of the
                            `lexertl::actor_lexer<...>` (its first template
                            parameter). The semantic action is allowed to
                            change the value of this iterator, influencing the
                            matched input sequence.]]
    [[`pass_flag& matched`] [This value is pre-initialized to `pass_normal`.
                            If the semantic action sets it to `pass_fail`, the
                            lexer behaves as if the token had not been matched
                            in the first place. If the semantic action sets it
                            to `pass_ignore`, the lexer ignores the current
                            token and tries to match the next token from the
                            input.]]
    [[`Idtype& id`]         [This is the token id of type `Idtype` (most of
                            the time this will be a `std::size_t`) for the
                            matched token. The semantic action is allowed to
                            change the value of this token id, influencing the
                            id of the created token.]]
    [[`Context& ctx`]       [This is a reference to a lexer specific,
                            unspecified type, providing the context for the
                            current lexer state. It can be used to access
                            different internal data items and is needed for
                            lexer state control from inside a semantic
                            action.]]
]

When using a plain C++ function as the semantic action, the following
prototypes are allowed as well:

    void f (Iterator& start, Iterator& end, pass_flag& matched, Idtype& id);
    void f (Iterator& start, Iterator& end, pass_flag& matched);
    void f (Iterator& start, Iterator& end);
    void f ();

[important In order to use lexer semantic actions you need to use the type
           `lexertl::actor_lexer<>` as your lexer class (instead of the type
           `lexertl::lexer<>` described in earlier examples).]

[heading The context of a lexer semantic action]

The last parameter passed to any lexer semantic action is a reference to an
unspecified type (see the `Context` type in the table above). This type is
unspecified because it depends on the token type returned by the lexer.
The context type is implemented in the internals of the iterator type exposed
by the lexer. Nevertheless, any context type is expected to expose a couple of
functions allowing you to influence the behavior of the lexer. The following
table gives an overview and a short description of the available
functionality.

[table Functions exposed by any context passed to a lexer semantic action
    [[Name]                                 [Description]]
    [[`Iterator const& get_eoi() const`]
        [The function `get_eoi()` may be used to access the end iterator of
         the input stream the lexer has been initialized with.]]
    [[`void more()`]
        [The function `more()` tells the lexer that the next time it matches
         a rule, the corresponding token should be appended onto the current
         token value rather than replacing it.]]
    [[`Iterator const& less(Iterator const& it, int n)`]
        [The function `less()` returns an iterator positioned to the nth
         input character beyond the current token start iterator (i.e. by
         passing the return value to the parameter `end` it is possible to
         return all but the first n characters of the current token back to
         the input stream).]]
    [[`bool lookahead(std::size_t id)`]
        [The function `lookahead()` can be used to implement lookahead for
         lexer engines not supporting constructs like flex's `a/b`
         (match `a`, but only when followed by `b`). It invokes the lexer on
         the input following the current token without actually moving
         forward in the input stream.
         The function returns whether the lexer was able to match a token
         with the given token id `id`.]]
    [[`std::size_t get_state() const` and `void set_state(std::size_t state)`]
        [The functions `get_state()` and `set_state()` may be used to
         introspect and change the current lexer state.]]
    [[`token_value_type get_value() const` and `void set_value(Value const&)`]
        [The functions `get_value()` and `set_value()` may be used to
         introspect and change the current token value.]]
]

[heading Lexer Semantic Actions Using Phoenix]

Even though it is possible to write your own function object implementations
(e.g. using Boost.Lambda or Boost.Bind), the preferred way of defining lexer
semantic actions is to use __phoenix__. In this case you can access the
parameters described above by using the predefined __spirit__ placeholders:

[table Predefined Phoenix placeholders for lexer semantic actions
    [[Placeholder]  [Description]]
    [[`_start`]
        [Refers to the iterator pointing to the beginning of the matched
         input sequence. Any modifications to this iterator value will be
         reflected in the generated token.]]
    [[`_end`]
        [Refers to the iterator pointing past the end of the matched input
         sequence. Any modifications to this iterator value will be reflected
         in the generated token.]]
    [[`_pass`]
        [References the value signaling the outcome of the semantic action.
         This is pre-initialized to `lex::pass_flags::pass_normal`. If this
         is set to `lex::pass_flags::pass_fail`, the lexer will behave as if
         no token has been matched; if it is set to
         `lex::pass_flags::pass_ignore`, the lexer will ignore the current
         match and proceed trying to match tokens from the input.]]
    [[`_tokenid`]
        [Refers to the token id of the token to be generated.
         Any modifications to this value will be reflected in the generated
         token.]]
    [[`_val`]
        [Refers to the value the next token will be initialized from. Any
         modifications to this value will be reflected in the generated
         token.]]
    [[`_state`]
        [Refers to the lexer state the input has been matched in. Any
         modifications to this value will be reflected in the lexer itself
         (the next match will start in the new state). The currently
         generated token is not affected by changes to this variable.]]
    [[`_eoi`]
        [References the end iterator of the overall lexer input. This value
         cannot be changed.]]
]

The context object passed as the last parameter to any lexer semantic action
is not directly accessible while using __phoenix__ expressions. Instead, we
provide predefined Phoenix functions allowing you to invoke the different
support functions mentioned above. The following table lists the available
support functions and describes their functionality:

[table Support functions usable from Phoenix expressions inside lexer semantic actions
    [[Plain function]   [Phoenix function]  [Description]]
    [[`ctx.more()`]
        [`more()`]
        [The function `more()` tells the lexer that the next time it matches
         a rule, the corresponding token should be appended onto the current
         token value rather than replacing it.]]
    [[`ctx.less()`]
        [`less(n)`]
        [The function `less()` takes a single integer parameter `n` and
         returns an iterator positioned to the nth input character beyond the
         current token start iterator (i.e. by assigning the return value to
         the placeholder `_end` it is possible to return all but the first
         `n` characters of the current token back to the input stream).]]
    [[`ctx.lookahead()`]
        [`lookahead(std::size_t)` or `lookahead(token_def)`]
        [The function `lookahead()` takes a single parameter specifying the
         token to match in the input.
         The function can be used, for instance, to implement lookahead for
         lexer engines not supporting constructs like flex's `a/b`
         (match `a`, but only when followed by `b`). It invokes the lexer on
         the input following the current token without actually moving
         forward in the input stream. The function returns whether the lexer
         was able to match the specified token.]]
]

[endsect]