:tocdepth: 3

.. _module-pw_tokenizer-tokenization:

============
Tokenization
============
.. pigweed-module-subpage::
   :name: pw_tokenizer

Tokenization converts a string literal to a token. If it's a printf-style
string, its arguments are encoded along with it. The results of tokenization can
be sent off device or stored in place of a full string.

--------
Concepts
--------
See :ref:`module-pw_tokenizer-get-started-overview` for a high-level
explanation of how ``pw_tokenizer`` works.

Token generation: fixed length hashing at compile time
======================================================
String tokens are generated using a modified version of the x65599 hash used by
the SDBM project. All hashing is done at compile time.

In C code, strings are hashed with a preprocessor macro.
For compatibility with macros, the hash must be limited to a fixed maximum
number of characters. This value is set by ``PW_TOKENIZER_CFG_C_HASH_LENGTH``.
Increasing ``PW_TOKENIZER_CFG_C_HASH_LENGTH`` increases the compilation time
for C due to the complexity of the hashing macros.

In C++, the tokenization macros hash strings with a ``constexpr`` function
instead of a preprocessor macro. This function works with any length of string
and has lower compilation time impact than the C macros. For consistency, C++
tokenization uses the same hash algorithm, but the calculated values will
differ between C and C++ for strings longer than
``PW_TOKENIZER_CFG_C_HASH_LENGTH`` characters.

Token encoding
==============
The token is a 32-bit hash calculated during compilation. The string is encoded
little-endian with the token followed by arguments, if any. For example, the
31-byte string ``You can go about your business.`` hashes to 0xdac9a244.
This is encoded as 4 bytes: ``44 a2 c9 da``.

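
The byte order described above can be sketched in a few lines of standard C++.
``EncodeTokenLittleEndian`` is a hypothetical helper used only for
illustration; it is not part of the ``pw_tokenizer`` API.

.. code-block:: cpp

   #include <array>
   #include <cstdint>

   // Hypothetical helper (not a pw_tokenizer API): splits a 32-bit token into
   // four bytes, least significant byte first.
   constexpr std::array<std::uint8_t, 4> EncodeTokenLittleEndian(
       std::uint32_t token) {
     return {static_cast<std::uint8_t>(token & 0xFFu),
             static_cast<std::uint8_t>((token >> 8) & 0xFFu),
             static_cast<std::uint8_t>((token >> 16) & 0xFFu),
             static_cast<std::uint8_t>((token >> 24) & 0xFFu)};
   }

Calling ``EncodeTokenLittleEndian(0xdac9a244)`` yields the bytes
``44 a2 c9 da``, matching the example token above.
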

Arguments are encoded as follows:

* **Integers** (1--10 bytes) --
  `ZigZag and varint encoded <https://developers.google.com/protocol-buffers/docs/encoding#signed-ints>`_,
  similarly to Protocol Buffers. Smaller values take fewer bytes.
* **Floating point numbers** (4 bytes) -- Single precision floating point.
* **Strings** (1--128 bytes) -- Length byte followed by the string contents.
  The top bit of the length indicates whether the string was truncated. The
  remaining 7 bits encode the string length, with a maximum of 127 bytes.

.. TODO(hepler): insert diagram here!

.. tip::
   ``%s`` arguments can quickly fill a tokenization buffer. Keep ``%s``
   arguments short or avoid encoding them as strings (e.g. encode an enum as an
   integer instead of a string). See also
   :ref:`module-pw_tokenizer-nested-arguments`.

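
As a rough illustration of the integer and string argument encodings listed
above, the following standalone sketch implements ZigZag, varint, and
length-prefixed string encoding. All function names are hypothetical and are
not the ``pw_tokenizer`` API. The sketch also assumes that truncation happens
only at the 127-byte limit; it does not model a full output buffer.

.. code-block:: cpp

   #include <cstddef>
   #include <cstdint>
   #include <string_view>
   #include <vector>

   // ZigZag maps signed integers to unsigned ones so that values near zero
   // stay small: 0 -> 0, -1 -> 1, 1 -> 2, -2 -> 3, ...
   constexpr std::uint64_t ZigZagEncode(std::int64_t value) {
     const std::uint64_t sign_mask = value < 0 ? ~std::uint64_t{0} : 0u;
     return (static_cast<std::uint64_t>(value) << 1) ^ sign_mask;
   }

   // Appends `value` as a varint: 7 payload bits per byte, least significant
   // group first, with the top bit set on every byte except the last.
   void AppendVarint(std::uint64_t value, std::vector<std::uint8_t>& out) {
     while (value >= 0x80u) {
       out.push_back(static_cast<std::uint8_t>((value & 0x7Fu) | 0x80u));
       value >>= 7;
     }
     out.push_back(static_cast<std::uint8_t>(value));
   }

   // Integer argument: ZigZag, then varint (1 to 10 bytes).
   void AppendInteger(std::int64_t value, std::vector<std::uint8_t>& out) {
     AppendVarint(ZigZagEncode(value), out);
   }

   // String argument: one length byte, then at most 127 bytes of content.
   // The top bit of the length byte is set if the string was truncated.
   void AppendString(std::string_view text, std::vector<std::uint8_t>& out) {
     constexpr std::size_t kMaxLength = 127;
     const bool truncated = text.size() > kMaxLength;
     const std::size_t length = truncated ? kMaxLength : text.size();
     out.push_back(
         static_cast<std::uint8_t>(length | (truncated ? 0x80u : 0u)));
     out.insert(out.end(), text.begin(), text.begin() + length);
   }

With these definitions, ``-1`` encodes to the single byte ``01`` and ``300``
encodes to the two bytes ``d8 04``, which shows why small magnitudes are cheap
to send.
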
.. _module-pw_tokenizer-proto:

Tokenized fields in protocol buffers
====================================
Text may be represented in a few different ways:

- Plain ASCII or UTF-8 text (``This is plain text``)
- Base64-encoded tokenized message (``$ibafcA==``)
- Binary-encoded tokenized message (``89 b6 9f 70``)
- Little-endian 32-bit integer token (``0x709fb689``)

``pw_tokenizer`` provides the ``pw.tokenizer.format`` protobuf field option.
This option may be applied to a protobuf field to indicate that it may contain a
tokenized string. A string that is optionally tokenized is represented with a
single ``bytes`` field annotated with ``(pw.tokenizer.format) =
TOKENIZATION_OPTIONAL``.

For example, the following protobuf has one field that may contain a tokenized
string.

.. code-block:: protobuf

   import "pw_tokenizer_proto/options.proto";

   message MessageWithOptionallyTokenizedField {
     bytes just_bytes = 1;
     bytes maybe_tokenized = 2 [(pw.tokenizer.format) = TOKENIZATION_OPTIONAL];
     string just_text = 3;
   }

-----------------------
Tokenization in C++ / C
-----------------------
To tokenize a string, include ``pw_tokenizer/tokenize.h`` and invoke one of the
``PW_TOKENIZE_*`` macros.

Tokenize string literals outside of expressions
===============================================
``pw_tokenizer`` provides macros for tokenizing string literals with no
arguments:

* :c:macro:`PW_TOKENIZE_STRING`
* :c:macro:`PW_TOKENIZE_STRING_DOMAIN`
* :c:macro:`PW_TOKENIZE_STRING_MASK`

The tokenization macros above cannot be used inside other expressions.

.. admonition:: **Yes**: Assign :c:macro:`PW_TOKENIZE_STRING` to a ``constexpr`` variable.
   :class: checkmark

   .. code-block:: cpp

      constexpr uint32_t kGlobalToken = PW_TOKENIZE_STRING("Wowee Zowee!");

      void Function() {
        constexpr uint32_t local_token = PW_TOKENIZE_STRING("Wowee Zowee?");
      }

.. admonition:: **No**: Use :c:macro:`PW_TOKENIZE_STRING` in another expression.
   :class: error

   .. code-block:: cpp

      void BadExample() {
        ProcessToken(PW_TOKENIZE_STRING("This won't compile!"));
      }

   Use :c:macro:`PW_TOKENIZE_STRING_EXPR` instead.

Tokenize inside expressions
===========================
An alternate set of macros is provided for use inside expressions.
These make use of lambda functions, so while they can be used inside
expressions, they require C++ and cannot be assigned to constexpr variables or
be used with special function variables like ``__func__``.

* :c:macro:`PW_TOKENIZE_STRING_EXPR`
* :c:macro:`PW_TOKENIZE_STRING_DOMAIN_EXPR`
* :c:macro:`PW_TOKENIZE_STRING_MASK_EXPR`

.. admonition:: When to use these macros

   Use :c:macro:`PW_TOKENIZE_STRING` and related macros to tokenize string
   literals that do not need %-style arguments encoded.

.. admonition:: **Yes**: Use :c:macro:`PW_TOKENIZE_STRING_EXPR` within other expressions.
   :class: checkmark

   .. code-block:: cpp

      void GoodExample() {
        ProcessToken(PW_TOKENIZE_STRING_EXPR("This will compile!"));
      }

.. admonition:: **No**: Assign :c:macro:`PW_TOKENIZE_STRING_EXPR` to a ``constexpr`` variable.
   :class: error

   .. code-block:: cpp

      constexpr uint32_t wont_work = PW_TOKENIZE_STRING_EXPR("This won't compile!");

   Instead, use :c:macro:`PW_TOKENIZE_STRING` to assign to a ``constexpr`` variable.

.. admonition:: **No**: Tokenize ``__func__`` in :c:macro:`PW_TOKENIZE_STRING_EXPR`.
   :class: error

   .. code-block:: cpp

      void BadExample() {
        // This compiles, but __func__ will not be the outer function's name, and
        // there may be compiler warnings.
        const uint32_t wont_work = PW_TOKENIZE_STRING_EXPR(__func__);
      }

   Instead, use :c:macro:`PW_TOKENIZE_STRING` to tokenize ``__func__`` or similar macros.

Tokenize a message with arguments to a buffer
=============================================
* :c:macro:`PW_TOKENIZE_TO_BUFFER`
* :c:macro:`PW_TOKENIZE_TO_BUFFER_DOMAIN`
* :c:macro:`PW_TOKENIZE_TO_BUFFER_MASK`

.. admonition:: Why use this macro

   - Encode a tokenized message for consumption within a function.
   - Encode a tokenized message into an existing buffer.

   Avoid using ``PW_TOKENIZE_TO_BUFFER`` in widely expanded macros, such as a
   logging macro, because it will result in larger code size than passing the
   tokenized data to a function.

.. _module-pw_tokenizer-nested-arguments:

Tokenize nested arguments
=========================
Encoding ``%s`` string arguments is inefficient, since ``%s`` strings are
encoded 1:1, with no tokenization. Tokens can therefore be used to replace
string arguments to tokenized format strings.

* :c:macro:`PW_TOKEN_FMT`

.. admonition:: Logging nested tokens

   Users will typically interact with nested token arguments during logging.
   In this case there is a slightly different interface described by
   :ref:`module-pw_log-tokenized-args` that does not generally invoke
   ``PW_TOKEN_FMT`` directly.


The format specifier for a token is given by the PRI-style macro
``PW_TOKEN_FMT()``, which is concatenated to the rest of the format string by
the C preprocessor.

.. code-block:: cpp

   PW_TOKENIZE_FORMAT_STRING("margarine_domain",
                             UINT32_MAX,
                             "I can't believe it's not " PW_TOKEN_FMT() "!",
                             PW_TOKENIZE_STRING_EXPR("butter"));

This feature is currently only supported by the Python detokenizer.

Nested token format
-------------------
Nested tokens have the following format within strings:

.. code-block::

   $[{DOMAIN}][BASE#]TOKEN

The ``$`` is a common prefix required for all nested tokens. It is possible to
configure a different common prefix if necessary, but using the default ``$``
character is strongly recommended.

The optional ``DOMAIN`` specifies the token domain.
If this option is omitted, the default (empty) domain is assumed.

The optional ``BASE`` defines the numeric base encoding of the token. Accepted
values are 8, 10, 16, and 64. If the hash symbol ``#`` is used without
specifying a number, the base is assumed to be 16. If the base option is
omitted entirely, the base defaults to 64 for backward compatibility. All
encodings except Base64 are not case sensitive. This may be expanded to support
other bases in the future.

Non-Base64 tokens are encoded strictly as 32-bit integers with padding.
Base64 data may additionally encode string arguments for the detokenized token,
and therefore does not have a maximum width.

The meaning of ``TOKEN`` depends on the current phase of transformation for the
current tokenized format string. Within the format string's entry in the token
database, when the actual value of the token argument is not known, ``TOKEN`` is
a printf argument specifier (e.g. ``%08x`` for a base-16 token with correct
padding). The actual tokens that will be used as arguments have separate
entries in the token database.


After the top-level format string has been detokenized and formatted, ``TOKEN``
should be the value of the token argument in the specified base, with any
necessary padding. This is the final format of a nested token if it cannot be
detokenized.

.. list-table:: Example tokens
   :widths: 10 25 25

   * - Base
     - | Token database
       | (within format string entry)
     - Partially detokenized
   * - 10
     - ``$10#%010d``
     - ``$10#0086025943``
   * - 16
     - ``$#%08x``
     - ``$#0000001A``
   * - 64
     - ``%s``
     - ``$QA19pfEQ``

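
To make the "partially detokenized" column concrete, here is a minimal sketch
that renders a 32-bit token value in base 10 or base 16 with the padding shown
in the table. ``FormatNestedToken`` and ``NestedTokenBase`` are hypothetical
names, not part of ``pw_tokenizer``. The sketch omits the optional
``{DOMAIN}`` part and Base64, and it assumes the token is printed as an
unsigned value.

.. code-block:: cpp

   #include <cinttypes>
   #include <cstdint>
   #include <cstdio>
   #include <string>

   // Hypothetical names for illustration only.
   enum class NestedTokenBase { kBase10, kBase16 };

   // Renders `token` as a nested token string, zero-padded to the full width
   // of a 32-bit value: 10 digits in base 10, 8 digits in base 16.
   std::string FormatNestedToken(std::uint32_t token, NestedTokenBase base) {
     char buffer[16];  // "$10#" + 10 digits + terminating NUL fits.
     if (base == NestedTokenBase::kBase10) {
       std::snprintf(buffer, sizeof(buffer), "$10#%010" PRIu32, token);
     } else {
       std::snprintf(buffer, sizeof(buffer), "$#%08" PRIX32, token);
     }
     return std::string(buffer);
   }

For the table's values, ``FormatNestedToken(86025943, NestedTokenBase::kBase10)``
produces ``$10#0086025943`` and ``FormatNestedToken(0x1A,
NestedTokenBase::kBase16)`` produces ``$#0000001A``.
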
.. _module-pw_tokenizer-custom-macro:

Tokenizing enums
================
Logging enums is one common special case where tokenization is particularly
appropriate: enum values are conceptually already tokens mapping to their
names, assuming no duplicate values.

:c:macro:`PW_TOKENIZE_ENUM` will take in a fully qualified enum name along with all
of the associated enum values. This macro will create database entries that
include the domain name (fully qualified enum name), enum value, and a tokenized
form of the enum value.

The macro also supports returning the string version of the enum value in the
case that there is a non-tokenizing backend, using
:cpp:func:`pw::tokenizer::EnumToString`.

All enum values in the enum declaration must be present in the macro, and the
macro must be in the same namespace as the enum to be able to use the
:cpp:func:`pw::tokenizer::EnumToString` function and avoid compiler errors.

.. literalinclude:: enum_test.cc
   :language: cpp
   :start-after: [pw_tokenizer-examples-enum]
   :end-before: [pw_tokenizer-examples-enum]

:c:macro:`PW_TOKENIZE_ENUM_CUSTOM` is an alternative version of
:c:macro:`PW_TOKENIZE_ENUM` that tokenizes custom strings instead of a
stringified form of the enum value name. It takes a fully qualified enum
name along with all the associated enum values and a custom string for each
value. This macro will create database entries that include the domain name
(fully qualified enum name), enum value, and a tokenized form of the custom
string for the enum value.

.. literalinclude:: enum_test.cc
   :language: cpp
   :start-after: [pw_tokenizer-examples-enum-custom]
   :end-before: [pw_tokenizer-examples-enum-custom]

Tokenize a message with arguments in a custom macro
===================================================
Projects can leverage the tokenization machinery in whichever way best suits
their needs.
The most efficient way to use ``pw_tokenizer`` is to pass tokenized
data to a global handler function. A project's custom tokenization macro can
handle tokenized data in a function of its choosing. The function may accept
any arguments, but its final arguments must be:

* The 32-bit token (:cpp:type:`pw_tokenizer_Token`)
* The argument types (:cpp:type:`pw_tokenizer_ArgTypes`)
* Variadic arguments, if any

``pw_tokenizer`` provides two low-level macros to help projects create custom
tokenization macros:

* :c:macro:`PW_TOKENIZE_FORMAT_STRING`
* :c:macro:`PW_TOKENIZER_REPLACE_FORMAT_STRING`

.. caution::

   Note the spelling difference! The first macro begins with ``PW_TOKENIZE_``
   (no ``R``) whereas the second begins with ``PW_TOKENIZER_``.

Use these macros to invoke an encoding function with the token, argument types,
and variadic arguments.
The function can then encode the tokenized message to a
buffer using helpers in ``pw_tokenizer/encode_args.h``:

.. Note: pw_tokenizer_EncodeArgs is a C function so you would expect to
.. reference it as :c:func:`pw_tokenizer_EncodeArgs`. That doesn't work because
.. it's defined in a header file that mixes C and C++.

* :cpp:func:`pw::tokenizer::EncodeArgs`
* :cpp:class:`pw::tokenizer::EncodedMessage`
* :cpp:func:`pw_tokenizer_EncodeArgs`

Example
-------
The following example implements a custom tokenization macro similar to
:ref:`module-pw_log_tokenized`.

.. code-block:: cpp

   #include "pw_tokenizer/tokenize.h"

   #ifdef __cplusplus
   extern "C" {
   #endif

   void EncodeTokenizedMessage(uint32_t metadata,
                               pw_tokenizer_Token token,
                               pw_tokenizer_ArgTypes types,
                               ...);

   #ifdef __cplusplus
   }  // extern "C"
   #endif

   #define PW_LOG_TOKENIZED_ENCODE_MESSAGE(metadata, format, ...)          \
     do {                                                                  \
       PW_TOKENIZE_FORMAT_STRING("logs", UINT32_MAX, format, __VA_ARGS__); \
       EncodeTokenizedMessage(                                             \
           metadata, PW_TOKENIZER_REPLACE_FORMAT_STRING(__VA_ARGS__));     \
     } while (0)

In this example, the ``EncodeTokenizedMessage`` function would handle encoding
and processing the message.
Encoding is done by the
:cpp:class:`pw::tokenizer::EncodedMessage` class or
:cpp:func:`pw::tokenizer::EncodeArgs` function from
``pw_tokenizer/encode_args.h``. The encoded message can then be transmitted or
stored as needed.

.. code-block:: cpp

   #include <cstdarg>

   #include "pw_log_tokenized/log_tokenized.h"
   #include "pw_tokenizer/encode_args.h"

   void HandleTokenizedMessage(pw::log_tokenized::Metadata metadata,
                               pw::span<std::byte> message);

   extern "C" void EncodeTokenizedMessage(const uint32_t metadata,
                                          const pw_tokenizer_Token token,
                                          const pw_tokenizer_ArgTypes types,
                                          ...) {
     va_list args;
     va_start(args, types);
     pw::tokenizer::EncodedMessage<kLogBufferSize> encoded_message(token, types, args);
     va_end(args);

     HandleTokenizedMessage(metadata, encoded_message);
   }

admonition:: Why use a custom macro 409*61c4878aSAndroid Build Coastguard Worker 410*61c4878aSAndroid Build Coastguard Worker - Optimal code size. Invoking a free function with the tokenized data results 411*61c4878aSAndroid Build Coastguard Worker in the smallest possible call site. 412*61c4878aSAndroid Build Coastguard Worker - Pass additional arguments, such as metadata, with the tokenized message. 413*61c4878aSAndroid Build Coastguard Worker - Integrate ``pw_tokenizer`` with other systems. 414*61c4878aSAndroid Build Coastguard Worker 415*61c4878aSAndroid Build Coastguard WorkerTokenizing function names 416*61c4878aSAndroid Build Coastguard Worker========================= 417*61c4878aSAndroid Build Coastguard WorkerThe string literal tokenization functions support tokenizing string literals or 418*61c4878aSAndroid Build Coastguard Workerconstexpr character arrays (``constexpr const char[]``). In GCC and Clang, the 419*61c4878aSAndroid Build Coastguard Workerspecial ``__func__`` variable and ``__PRETTY_FUNCTION__`` extension are declared 420*61c4878aSAndroid Build Coastguard Workeras ``static constexpr char[]`` in C++ instead of the standard ``static const 421*61c4878aSAndroid Build Coastguard Workerchar[]``. This means that ``__func__`` and ``__PRETTY_FUNCTION__`` can be 422*61c4878aSAndroid Build Coastguard Workertokenized while compiling C++ with GCC or Clang. 423*61c4878aSAndroid Build Coastguard Worker 424*61c4878aSAndroid Build Coastguard Worker.. code-block:: cpp 425*61c4878aSAndroid Build Coastguard Worker 426*61c4878aSAndroid Build Coastguard Worker // Tokenize the special function name variables. 
   constexpr uint32_t function = PW_TOKENIZE_STRING(__func__);
   constexpr uint32_t pretty_function = PW_TOKENIZE_STRING(__PRETTY_FUNCTION__);

Note that ``__func__`` and ``__PRETTY_FUNCTION__`` are not string literals.
They are defined as static character arrays, so they cannot be implicitly
concatenated with string literals. For example, ``printf(__func__ ": %d",
123);`` will not compile.

Calculate minimum required buffer size
======================================
See :cpp:func:`pw::tokenizer::MinEncodingBufferSizeBytes`.

.. _module-pw_tokenizer-base64-format:

Encoding Base64
===============
The tokenizer encodes messages to a compact binary representation. Applications
may desire a textual representation of tokenized strings. This makes it easy to
use tokenized messages alongside plain text messages, but comes at a small
efficiency cost: encoded Base64 messages occupy about 4/3 (133%) as much memory
as binary messages.
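
The 4/3 figure follows from Base64 itself: every three input bytes become four
output characters, rounded up to a full group, plus one ``$`` prefix character.
As a rough illustration, the standalone sketch below computes the encoded size
and produces the prefixed text. The names ``PrefixedBase64Size`` and
``SketchPrefixedBase64`` are hypothetical and are not part of ``pw_tokenizer``;
the sketch assumes the standard padded Base64 alphabet. Production code would
call ``pw::tokenizer::PrefixedBase64Encode`` rather than reimplementing it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: size of the "$"-prefixed Base64 text for a binary
// message of `binary_size` bytes (one prefix character plus padded Base64).
constexpr std::size_t PrefixedBase64Size(std::size_t binary_size) {
  return 1 + 4 * ((binary_size + 2) / 3);
}

// Hypothetical standalone encoder, for illustration only.
std::string SketchPrefixedBase64(const std::vector<std::uint8_t>& data) {
  static constexpr char kAlphabet[] =
      "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
  std::string out = "$";
  for (std::size_t i = 0; i < data.size(); i += 3) {
    const std::size_t remaining = data.size() - i;
    // Pack up to three bytes into a 24-bit group, zero-filling missing bytes.
    std::uint32_t group = static_cast<std::uint32_t>(data[i]) << 16;
    if (remaining > 1) group |= static_cast<std::uint32_t>(data[i + 1]) << 8;
    if (remaining > 2) group |= data[i + 2];
    // Emit four 6-bit symbols, using '=' where no input byte contributed.
    out += kAlphabet[(group >> 18) & 0x3F];
    out += kAlphabet[(group >> 12) & 0x3F];
    out += remaining > 1 ? kAlphabet[(group >> 6) & 0x3F] : '=';
    out += remaining > 2 ? kAlphabet[group & 0x3F] : '=';
  }
  assert(out.size() == PrefixedBase64Size(data.size()));
  return out;
}
```

With these definitions, a 5-byte binary message encodes to 9 characters, so the
relative overhead is largest for very short messages.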

The Base64 format consists of a ``$`` character followed by the
Base64-encoded contents of the tokenized message. For example, consider
tokenizing the string ``This is an example: %d!`` with the argument -1. The
string's token is 0x4b016e66.

.. code-block:: text

   Source code: PW_LOG("This is an example: %d!", -1);

    Plain text: This is an example: -1! [23 bytes]

        Binary: 66 6e 01 4b 01          [ 5 bytes]

        Base64: $Zm4BSwE=               [ 9 bytes]

To encode with the Base64 format, add a call to
``pw::tokenizer::PrefixedBase64Encode`` or ``pw_tokenizer_PrefixedBase64Encode``
in the tokenizer handler function. For example:

.. code-block:: cpp

   void TokenizedMessageHandler(const uint8_t encoded_message[],
                                size_t size_bytes) {
     pw::InlineBasicString base64 = pw::tokenizer::PrefixedBase64Encode(
         pw::span(encoded_message, size_bytes));

     TransmitLogMessage(base64.data(), base64.size());
   }

.. _module-pw_tokenizer-masks:

Reduce token size with masking
==============================
``pw_tokenizer`` uses 32-bit tokens. On 32-bit or 64-bit architectures, using
fewer than 32 bits does not improve runtime or code size efficiency. However,
when tokens are packed into data structures or stored in arrays, the size of the
token directly affects memory usage. In those cases, every bit counts, and it
may be desirable to use fewer bits for the token.

``pw_tokenizer`` allows users to provide a mask to apply to the token. This
masked token is used in both the token database and the code. The masked token
is not a masked version of the full 32-bit token; the masked token is the token.
This makes it trivial to decode tokens that use fewer than 32 bits.

Masking functionality is provided through the ``*_MASK`` versions of the macros:

* :c:macro:`PW_TOKENIZE_STRING_MASK`
* :c:macro:`PW_TOKENIZE_STRING_MASK_EXPR`
* :c:macro:`PW_TOKENIZE_TO_BUFFER_MASK`

For example, the following generates 16-bit tokens and packs them into an
existing value.

.. code-block:: cpp

   constexpr uint32_t token = PW_TOKENIZE_STRING_MASK("domain", 0xFFFF, "Pigweed!");
   uint32_t packed_word = (other_bits << 16) | token;

Tokens are hashes, so tokens of any size have a collision risk. The fewer bits
used for tokens, the more likely two strings are to hash to the same token. See
:ref:`module-pw_tokenizer-collisions`.

Masked tokens without arguments may be encoded in fewer bytes. For example, the
16-bit token ``0x1234`` may be encoded as two little-endian bytes (``34 12``)
rather than four (``34 12 00 00``).
The detokenizer tools zero-pad data smaller
than four bytes. Tokens with arguments must always be encoded as four bytes.

.. _module-pw_tokenizer-domains:

Keep tokens from different sources separate with domains
========================================================
``pw_tokenizer`` supports having multiple tokenization domains. A domain is a
string label associated with each tokenized string. This allows projects to keep
tokens from different sources separate. Potential use cases include the
following:

* Keep large sets of tokenized strings separate to avoid collisions.
* Create a separate database for a small number of strings that use truncated
  tokens, for example only 10 or 16 bits instead of the full 32 bits.

Whitespace in domain names is ignored and removed from the database.

If no domain is specified, the domain is empty (``""``). For many projects, this
default domain is sufficient, so no additional configuration is required.

.. code-block:: cpp

   // Tokenizes this string to the default ("") domain.
   PW_TOKENIZE_STRING("Hello, world!");

   // Tokenizes this string to the "my_custom_domain" domain.
   PW_TOKENIZE_STRING_DOMAIN("my_custom_domain", "Hello, world!");

The database and detokenization command line tools default to loading tokens
from all domains. The domain may be specified for ELF files by appending
``#DOMAIN_NAME_REGEX`` to the file path. Use ``#`` to only read from the default
domain. For example, the following reads strings in ``some_domain`` from
``my_image.elf``.

.. code-block:: sh

   ./database.py create --database my_db.csv "path/to/my_image.elf#some_domain"

See :ref:`module-pw_tokenizer-managing-token-databases` for information about
the ``database.py`` command line tool.

Limitations, bugs, and future work
==================================

.. _module-pw_tokenizer-gcc-template-bug:

GCC bug: tokenization in template functions
-------------------------------------------
GCC releases prior to 14 incorrectly ignore the section attribute for template
`functions <https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70435>`_ and `variables
<https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88061>`_. The bug causes tokenized
strings in template functions to be emitted into ``.rodata`` instead of the
tokenized string section, so they cannot be extracted for detokenization.

Fortunately, this is simple to work around in the linker script.
``pw_tokenizer_linker_sections.ld`` includes a statement that pulls tokenized
string entries from ``.rodata`` into the tokenized string section. See
`b/321306079 <https://issues.pigweed.dev/issues/321306079>`_ for details.

If tokenization is working, but strings in templates are not appearing in token
databases, check the following:

- The full contents of the latest version of ``pw_tokenizer_linker_sections.ld``
  are included with the linker script.
  The linker script was updated in
  `pwrev.dev/188424 <http://pwrev.dev/188424>`_.
- The ``-fdata-sections`` compilation option is in use. This places each
  variable in its own section, which is necessary for pulling tokenized string
  entries from ``.rodata`` into the proper section.

64-bit tokenization
-------------------
The Python and C++ detokenizing libraries currently assume that strings were
tokenized on a system with 32-bit ``long``, ``size_t``, ``intptr_t``, and
``ptrdiff_t``. Decoding may not work correctly for these types if a 64-bit
device performed the tokenization.

Supporting detokenization of strings tokenized on 64-bit targets would be
simple. This could be done by adding an option to switch the 32-bit types to
64-bit. The tokenizer stores the sizes of these types in the
``.pw_tokenizer.info`` ELF section, so the sizes of these types can be verified
by checking the ELF file, if necessary.

Tokenization in headers
-----------------------
Tokenizing code in header files (inline functions or templates) may trigger
warnings such as ``-Wlto-type-mismatch`` under certain conditions. This
is because tokenization requires declaring a character array for each tokenized
string. If the tokenized string includes macros that change value, the size of
this character array changes, which means the same static variable is defined
with different sizes. It should be safe to suppress these warnings, but, when
possible, code that tokenizes strings with macros that can change value should
be moved to source files rather than headers.

----------------------
Tokenization in Python
----------------------
The Python ``pw_tokenizer.encode`` module has limited support for encoding
tokenized messages with the :func:`pw_tokenizer.encode.encode_token_and_args`
function. This function requires that a string's token has already been
calculated. Typically these tokens are provided by a database, but they can be
manually created using the tokenizer hash.
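
The hash is small enough to reimplement when a token must be computed by hand.
The following C++ sketch is an illustration only, written under stated
assumptions rather than copied from ``pw_tokenizer``: ``SketchTokenHash`` and
its ``hash_length`` parameter are hypothetical names, ``hash_length`` stands in
for ``PW_TOKENIZER_CFG_C_HASH_LENGTH``, and the sketch assumes the full string
length seeds the hash even when only a prefix of the characters is hashed.
Verify generated tokens against a real token database before relying on them.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string_view>

// Multiplier of the x65599 (SDBM-style) hash.
inline constexpr std::uint32_t kHashConstant = 65599u;

// Hypothetical sketch of the token hash. Pass std::string_view::npos as
// `hash_length` to hash every character (the C++ behavior); pass a fixed
// limit to mimic the C macros.
constexpr std::uint32_t SketchTokenHash(
    std::string_view string,
    std::size_t hash_length = std::string_view::npos) {
  // Assumption: the untruncated length is hashed as if it were the first
  // character.
  std::uint32_t hash = static_cast<std::uint32_t>(string.size());
  std::uint32_t coefficient = kHashConstant;

  // Arithmetic is modulo 2^32; unsigned wraparound is intentional.
  for (char ch : string.substr(0, hash_length)) {
    hash += coefficient * static_cast<std::uint8_t>(ch);
    coefficient *= kHashConstant;
  }
  return hash;
}
```

Under this sketch, two strings of equal length that differ only after the first
``hash_length`` characters produce the same token, which is why a fixed hash
length raises the collision risk for long strings.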

:func:`pw_tokenizer.tokens.pw_tokenizer_65599_hash` is particularly useful
for offline token database generation in cases where tokenized strings in a
binary cannot be embedded as parsable ``pw_tokenizer`` entries.

.. note::
   In C, the hash length of a string has a fixed limit controlled by
   ``PW_TOKENIZER_CFG_C_HASH_LENGTH``. To match tokens produced by C (as opposed
   to C++) code, ``pw_tokenizer_65599_hash()`` should be called with a matching
   hash length limit. When creating an offline database, it's a good idea to
   generate tokens for both C and C++ and merge the databases.

.. _module-pw_tokenizer-cli-encoding:

-----------------
Encoding CLI tool
-----------------
The ``pw_tokenizer.encode`` command line tool can be used to encode
format strings and optional arguments.

.. code-block:: bash

   python -m pw_tokenizer.encode [-h] FORMAT_STRING [ARG ...]

Example:

.. code-block:: text

   $ python -m pw_tokenizer.encode "There's... %d many of %s!" 2 them
         Raw input: "There's... %d many of %s!" % (2, 'them')
   Formatted input: There's... 2 many of them!
             Token: 0xb6ef8b2d
           Encoded: b'-\x8b\xef\xb6\x04\x04them' (2d 8b ef b6 04 04 74 68 65 6d) [10 bytes]
   Prefixed Base64: $LYvvtgQEdGhlbQ==

See ``--help`` for full usage details.

--------
Appendix
--------

Case study
==========
.. note:: This section discusses the implementation, results, and lessons
   learned from a real-world deployment of ``pw_tokenizer``.

The tokenizer module was developed to bring tokenized logging to an
in-development product. The product already had an established text-based
logging system. Deploying tokenization was straightforward and had substantial
benefits.

Results
-------
* Log contents shrank by over 50%, even with Base64 encoding.

  * Significant size savings for encoded logs, even using the less-efficient
    Base64 encoding required for compatibility with the existing log system.
  * Freed valuable communication bandwidth.
  * Allowed storing many more logs in crash dumps.

* Substantial flash savings.

  * Reduced the size of firmware images by up to 18%.

* Simpler logging code.

  * Removed CPU-heavy ``snprintf`` calls.
  * Removed complex code for forwarding log arguments to a low-priority task.

This section describes the tokenizer deployment process and highlights key
insights.

Firmware deployment
-------------------
* In the project's logging macro, calls to the underlying logging function were
  replaced with a tokenized log macro invocation.
* The log level was passed as the payload argument to facilitate runtime log
  level control.
* For this project, it was necessary to encode the log messages as text. In
  the handler function the log messages were encoded in the $-prefixed
  :ref:`module-pw_tokenizer-base64-format`, then dispatched as normal log messages.
* Asserts were tokenized using a callback-based API that has been removed (a
  :ref:`custom macro <module-pw_tokenizer-custom-macro>` is a better
  alternative).

.. attention::
   Do not encode line numbers in tokenized strings. This results in a huge
   number of lines being added to the database, since every time code moves,
   new strings are tokenized. If :ref:`module-pw_log_tokenized` is used, line
   numbers are encoded in the log metadata. Line numbers may also be included
   by adding ``"%d"`` to the format string and passing ``__LINE__``.

.. _module-pw_tokenizer-database-management:

Database management
-------------------
* The token database was stored as a CSV file in the project's Git repo.
* The token database was automatically updated as part of the build, and
  developers were expected to check in the database changes alongside their code
  changes.
* A presubmit check verified that all strings added by a change were added to
  the token database.
* The token database included logs and asserts for all firmware images in the
  project.
* No strings were purged from the token database.

.. tip::
   Merge conflicts may be a frequent occurrence with an in-source CSV database.
   Use the :ref:`module-pw_tokenizer-directory-database-format` instead.

Decoding tooling deployment
---------------------------
* The Python detokenizer in ``pw_tokenizer`` was deployed to two places:

  * Product-specific Python command line tools, using
    ``pw_tokenizer.Detokenizer``.
  * Standalone script for decoding prefixed Base64 tokens in files or
    live output (e.g. from ``adb``), using ``detokenize.py``'s command line
    interface.

* The C++ detokenizer library was deployed to two Android apps with a Java
  Native Interface (JNI) layer.

  * The binary token database was included as a raw resource in the APK.
  * In one app, the built-in token database could be overridden by copying a
    file to the phone.

.. tip::
   Make the tokenized logging tools simple to use for your project.

   * Provide simple wrapper shell scripts that fill in arguments for the
     project. For example, point ``detokenize.py`` to the project's token
     databases.
   * Use ``pw_tokenizer.AutoUpdatingDetokenizer`` to decode in
     continuously-running tools, so that users don't have to restart the tool
     when the token database updates.
   * Integrate detokenization everywhere it is needed. Integrating the tools
     takes just a few lines of code, and token databases can be embedded in APKs
     or binaries.