.. _seed-0105:

===============================================
0105: Nested Tokens and Tokenized Log Arguments
===============================================

.. seed::
   :number: 105
   :name: Nested Tokens and Tokenized Log Arguments
   :status: Accepted
   :proposal_date: 2023-07-10
   :cl: 154190
   :authors: Gwyneth Chen
   :facilitator: Wyatt Hepler

-------
Summary
-------
This SEED describes a number of extensions to the `pw_tokenizer <https://pigweed.dev/pw_tokenizer/>`_
and `pw_log_tokenized <https://pigweed.dev/pw_log_tokenized>`_ modules to
improve support for nesting tokens and to add facilities for tokenizing log
arguments such as strings and enums. This SEED primarily addresses C/C++
tokenization and Python/C++ detokenization.

----------
Motivation
----------
Currently, ``pw_tokenizer`` and ``pw_log_tokenized`` enable devices with limited
memory to store long log format strings as hashed 32-bit tokens. When logs are
moved off-device, host tooling can recover the full logs using token databases
that were created when building the device image. However, logs may still have
runtime string arguments that are stored and transferred 1:1 without additional
encoding. This SEED aims to extend tokenization to these arguments to further
reduce the weight of logging for embedded applications.

The proposed changes affect both the tokenization module itself and the logging
facilities built on top of tokenization.

--------
Proposal
--------
Logging enums such as ``pw::Status`` is one common special case where
tokenization is particularly appropriate: enum values are conceptually
already tokens mapping to their names, assuming no duplicate values.
Logging enums frequently entails creating functions and string names that
occupy space exclusively for logging purposes, which this proposal seeks to
mitigate. Here, ``pw::Status::NotFound()`` illustrates the several
transformations that strings undergo during tokenization and detokenization,
which nested tokens further complicate in the proposed design.

.. list-table:: Enum Tokenization/Detokenization Phases
   :widths: 20 45

   * - (1) Source code
     - ``PW_LOG("Status: " PW_LOG_ENUM_FMT(pw::Status), status.code())``
   * - (2) Token database entries (token, string, domain)
     - | ``16170adf, "Status: ${pw::Status}#%08x", ""``
       | ``5, "PW_STATUS_NOT_FOUND", "pw::Status"``
   * - (3) Wire format
     - ``df 0a 17 16 0a`` (5 bytes)
   * - (4) Top-level detokenized and formatted
     - ``"Status: ${pw::Status}#00000005"``
   * - (5) Fully detokenized
     - ``"Status: PW_STATUS_NOT_FOUND"``

Compared to log tokenization without nesting, string literals in token
database entries may not be
identical to what is typed in source code due
to the use of macros and preprocessor string concatenation. The
detokenizer also takes an additional step to recursively detokenize any
nested tokens. In exchange for this added complexity, nested enum tokenization
provides the readability of logging value names with zero additional
runtime space or performance cost compared to logging the integral values
directly with ``pw_log_tokenized``.

.. note::
   Without nested enum token support, users can select either readability or
   reduced binary and transmission size, but not easily both:

   .. list-table::
      :widths: 15 20 20
      :header-rows: 1

      * -
        - Raw integers
        - String names
      * - (1) Source code
        - ``PW_LOG("Status: %x", status.code())``
        - ``PW_LOG("Status: %s", pw_StatusString(status))``
      * - (2) Token database entries (token, string, domain)
        - ``03a83461, "Status: %x", ""``
        - ``069c3ef0, "Status: %s", ""``
      * - (3) Wire format
        - ``61 34 a8 03 0a`` (5 bytes)
        - ``f0 3e 9c 06 09 4e 4f 54 5f 46 4f 55 4e 44`` (14 bytes)
      * - (4) Top-level detokenized and formatted
        - ``"Status: 5"``
        - ``"Status: NOT_FOUND"``
      * - (5) Fully detokenized
        - ``"Status: 5"``
        - ``"Status: NOT_FOUND"``

Tokenization (C/C++)
====================
The ``pw_log_tokenized`` module exposes a set of macros for creating and
formatting nested tokens.
Within format strings in the source code, tokens
are specified using function-like PRI-style macros. These can be used to
encode static information like the token domain or a numeric base encoding
and are macro-expanded to string literals that are concatenated with the
rest of the format string during preprocessing. Since ``pw_log`` generally
uses printf syntax, only bases 8, 10, and 16 are supported for integer token
arguments via ``%[odiuxX]``.

The provided macros enforce the token specifier syntax and keep the argument
types in sync when switching to other ``pw_log`` backends such as
``pw_log_basic``. The macros for basic usage are as follows:

* ``PW_LOG_TOKEN`` and ``PW_LOG_TOKEN_EXPR`` are used to tokenize string args.
* ``PW_LOG_TOKEN_FMT`` is used inside the format string to specify a token arg.
* ``PW_LOG_TOKEN_TYPE`` is used if the type of a tokenized arg needs to be
  referenced, e.g. as a ``ToString`` function return type.

.. code-block:: cpp

   #include "pw_log/log.h"
   #include "pw_log/tokenized_args.h"

   // Token with default options (base 16, empty domain).
   // token database literal: "The sun will come out $#%08x!"
   PW_LOG("The sun will come out " PW_LOG_TOKEN_FMT() "!", PW_LOG_TOKEN_EXPR("tomorrow"));
   // after detokenization: "The sun will come out tomorrow!"

Additional macros are also provided specifically for enum handling. The
``TOKENIZE_ENUM`` macro creates ELF token database entries for each enum
value with the specified token domain to prevent token collisions between
multiple tokenized enums. This macro is kept separate from the enum
definition so that, for example, a preexisting enum defined in an
external dependency can also be tokenized.

.. code-block:: cpp

   // enums
   namespace foo {

     enum class Color { kRed, kGreen, kBlue };

     // syntax TBD
     TOKENIZE_ENUM(
       foo::Color,
       kRed,
       kGreen,
       kBlue
     )

   } // namespace foo

   void LogColor(foo::Color color) {
     // token database literal:
     // "Color: [${foo::Color}10#%010d]"
     // Scoped enums must be cast to an integer before being logged.
     PW_LOG("Color: [" PW_LOG_ENUM_FMT(foo::Color, 10) "]", static_cast<int>(color));
     // after detokenization:
     // e.g. "Color: [kRed]"
   }

.. admonition:: Nested Base64 tokens

   ``PW_LOG_TOKEN_FMT`` can accept 64 as the base encoding for an argument, in
   which case the argument should be a pre-encoded Base64 string argument
   (e.g. ``QAzF39==``).
   However, pre-encoded Base64 arguments should be avoided when possible to
   maximize space savings. Fully formatted Base64, including the token prefix,
   may also be logged with ``%s`` as before.

Detokenization (Python)
=======================
``Detokenizer.detokenize`` in Python (``Detokenizer::Detokenize`` in C++)
will automatically and recursively detokenize tokens of all known formats,
rather than requiring a separate call to ``detokenize_base64`` or similar.

To support detokenizing domain-specific tokens, token databases support multiple
domains, and ``database.py create`` will build a database with tokens from all
domains by default. Specifying a domain during database creation will cause
that domain to be treated as the default.

When detokenization fails, tokens appear as-is in logs. If the detokenizer has
the ``show_errors`` option set to ``True``, error messages may be printed
inline following the raw token.
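
To give a rough idea of what such raw tokens look like, the following sketch
renders a 32-bit token in the fixed-width ``$[{DOMAIN}][BASE#]TOKEN`` form
described in the Tokens section. It is a hypothetical illustration, not part of
``pw_tokenizer``: the function name ``FormatNestedToken`` is assumed, and only
bases 8, 10, and 16 are handled, since Base64 tokens may also carry encoded
arguments.

.. code-block:: cpp

   #include <cstdint>
   #include <cstdio>
   #include <string>

   // Hypothetical helper (not part of pw_tokenizer): renders a token as it
   // would appear at stage 4, i.e. $[{DOMAIN}][BASE#]TOKEN, zero-padded to
   // the maximum width of a 32-bit integer in the given base.
   // Returns an empty string for unsupported bases such as 64.
   std::string FormatNestedToken(uint32_t token, int base,
                                 const std::string& domain = "") {
     const unsigned long value = token;
     char digits[16];
     std::string prefix = "$";
     if (!domain.empty()) {
       prefix += "{" + domain + "}";  // the domain is wrapped in curly braces
     }
     switch (base) {
       case 8:  // 11 octal digits cover 0xFFFFFFFF
         std::snprintf(digits, sizeof(digits), "%011lo", value);
         return prefix + "8#" + digits;
       case 10:  // 10 decimal digits cover 4294967295
         std::snprintf(digits, sizeof(digits), "%010lu", value);
         return prefix + "10#" + digits;
       case 16:  // a bare '#' implies base 16
         std::snprintf(digits, sizeof(digits), "%08lX", value);
         return prefix + "#" + digits;
       default:
         return "";
     }
   }

For example, ``FormatNestedToken(0x1A, 16, "foo_namespace::MyEnum")`` yields
``${foo_namespace::MyEnum}#0000001A`` and ``FormatNestedToken(86025943, 10)``
yields ``$10#0086025943``, the same raw forms listed later in this document.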

Tokens
======
Most of the details described here are handled by the ``PW_LOG_TOKEN_FMT``
macro, so users typically should not need to format tokens manually. However,
if detokenization fails for any reason, tokens will appear with the following
format in the final logs and should be easily recognizable.

Nested tokens have the following structure in partially detokenized logs
(transformation stage 4):

.. code-block::

   $[{DOMAIN}][BASE#]TOKEN

The ``$`` is a common prefix required for all nested tokens. It is possible to
configure a different common prefix if necessary, but using the default ``$``
character is strongly recommended.

.. list-table:: Options
   :widths: 10 30

   * - ``{DOMAIN}``
     - Specifies the token domain. If this option is omitted, the default
       (empty) domain is assumed.
   * - ``BASE#``
     - Defines the numeric base encoding of the token. Accepted values are 8,
       10, 16, and 64. If the hash symbol ``#`` is used without specifying a
       number, the base is assumed to be 16. If the base option is omitted
       entirely, the base defaults to 64 for backward compatibility. Only
       the Base64 encoding is case sensitive.

       This option may be expanded to support other bases in the future.
   * - ``TOKEN`` (required)
     - The numeric representation of the token in the given base encoding. All
       encodings except Base64 are left-padded with zeroes to the maximum width
       of a 32-bit integer in the given base (11 digits for base 8, 10 for
       base 10, and 8 for base 16). Base64 data may additionally encode string
       arguments for the detokenized token, and therefore does not have a
       maximum width. Padding is handled automatically by ``PW_LOG_TOKEN_FMT``
       for supported bases.

When used in conjunction with ``pw_log_tokenized``, the token prefix (including
any domain and base specifications) is tokenized as part of the log format
string and therefore incurs zero additional memory or transmission cost over
that of the original format string.
Over the wire, tokens in bases 8, 10, and
16 are transmitted as varint-encoded integers up to 5 bytes in size. Base64
tokens continue to be encoded as strings.

.. warning::
   Tokens generally do not have a terminating character, which is why they
   must be formatted with a fixed width. Otherwise, following a token
   immediately with alphanumeric characters that are valid in its base
   encoding would cause detokenization errors.

.. admonition:: Recognizing raw nested tokens in strings

   When a string is fully detokenized, there should no longer be any indication
   of tokenization in the final result; detokenized logs should read the
   same as plain string logs. However, if nested tokens cannot be detokenized for
   any reason, they will appear in their raw form as below:

   .. code-block::

      // Base64 token with no arguments and empty domain
      $QA19pfEQ

      // Base-10 token
      $10#0086025943

      // Base-16 token with specified domain
      ${foo_namespace::MyEnum}#0000001A

      // Base64 token with specified domain
      ${bar_namespace::MyEnum}QAQQQQ==

---------------------
Problem investigation
---------------------
Complex embedded device projects are perpetually short on RAM. Even a handful
of long, descriptive string arguments can take up hundreds of bytes, frequently
used exclusively for logging, with no impact on device functionality.

One of the most common potential use cases is logging enum values.
Inspection of one project revealed that enums accounted for some 90% of the
string log arguments.
We have encountered instances where, to save space,
developers have avoided logging descriptive names in favor of raw enum values,
forcing readers of logs to look up or memorize the meaning of each number. As
with log format strings, the set of possible string values that might be
emitted in the final logs is known at compile time, so these values can be
extracted into a token database.

Another major challenge is maintaining a user interface that is easy to
understand and use. The current primary interface through ``pw_log`` provides
printf-style formatting, which is familiar and succinct for basic applications.

We also have to contend with the interchangeable backends of ``pw_log``. The
``pw_log`` facade is intended as an opaque interface layer; adding syntax
specifically for tokenized logging will break this abstraction barrier. Either
this additional syntax would be ignored by other backends, or it might simply
be incompatible (e.g. logging raw integer tokens instead of strings).

Pigweed already supports one form of nested tokens via Base64 encoding.
Base64 tokens begin with ``'$'``, followed by Base64-encoded data, and may be
padded with one or two trailing ``'='`` symbols. The Python
``Detokenizer.detokenize_base64`` method recursively detokenizes Base64 by
running a regex replacement on the formatted results of each iteration. Base64
is not merely a token format, however; it can encode any binary data in a text
format at the cost of reduced efficiency. Therefore, Base64 tokens may include
not only a database token that may detokenize to a format string but also
binary-encoded arguments. Other token types are not expected to include this
additional argument data.

---------------
Detailed design
---------------

Tokenization
============
``pw_tokenizer`` and ``pw_log_tokenized`` already provide much of the necessary
functionality to support tokenized arguments. The proposed API is fully
backward-compatible with non-nested tokenized logging.

Token arguments are indicated in log format strings via PRI-style macros that
are exposed by a new ``pw_log/tokenized_args.h`` header. ``PW_LOG_TOKEN_FMT``
supplies the ``$`` token prefix, the curly braces around the domain, the base
specifier, and the printf-style specifier including padding and width, i.e.
``%011o`` for base-8, ``%010u`` for base-10, and ``%08X`` for base-16.

For free-standing string arguments, such as those whose literals are defined
in the log statements themselves, tokenization is performed with macros from
``pw_log/tokenized_args.h``. With the tokenized logging backend, these macros
simply alias the corresponding ``PW_TOKENIZE`` macros, but they revert to
basic string formatting for other backends. This is achieved by placing an
empty header file in the local ``public_overrides`` directory of
``pw_log_tokenized`` and checking for it in ``pw_log/tokenized_args.h`` using
the ``__has_include`` directive.

For variable string arguments, the API is split across locations.
The string
literals are tokenized wherever they are defined, and the string format macros
appear in the log format strings corresponding to those string arguments.

When tokens use non-default domains, additional work may be required to create
the domain name and store associated tokens in the ELF.

Enum Tokenization
-----------------
We use existing ``pw_tokenizer`` utilities to record the raw enum values as
tokens corresponding to their string names in the ELF. No change is required
in the backend implementation: we simply skip the token calculation step,
since the enum value already serves as the token. Specifying a token domain is
generally required to prevent token collisions between multiple enums.

For ease of use, we can also provide a macro that wraps the enum value list
and encapsulates the recording of each token value-string pair in the ELF.

When actually logging the values, users pass the enum type name as the domain
to the format specifier macro ``PW_LOG_ENUM_FMT()``, and the enum values can be
passed as-is to ``PW_LOG`` (casting to integers as necessary for scoped enums).
347*61c4878aSAndroid Build Coastguard WorkerSince integers are varint-encoded over the wire, this will only require a 348*61c4878aSAndroid Build Coastguard Workersingle byte for most enums. 349*61c4878aSAndroid Build Coastguard Worker 350*61c4878aSAndroid Build Coastguard Worker.. admonition:: Logging pw::status 351*61c4878aSAndroid Build Coastguard Worker 352*61c4878aSAndroid Build Coastguard Worker Note that while this immediately reduces transmission size, the code 353*61c4878aSAndroid Build Coastguard Worker space occupied by the string names in ``pw::Status::str()`` cannot be 354*61c4878aSAndroid Build Coastguard Worker recovered unless an entire project is converted to log ``pw::Status`` 355*61c4878aSAndroid Build Coastguard Worker as tokens. 356*61c4878aSAndroid Build Coastguard Worker 357*61c4878aSAndroid Build Coastguard Worker .. code-block:: cpp 358*61c4878aSAndroid Build Coastguard Worker 359*61c4878aSAndroid Build Coastguard Worker #include "pw_log/log.h" 360*61c4878aSAndroid Build Coastguard Worker #include "pw_log/tokenized_args.h" 361*61c4878aSAndroid Build Coastguard Worker #include "pw_status/status.h" 362*61c4878aSAndroid Build Coastguard Worker 363*61c4878aSAndroid Build Coastguard Worker pw::Status status = pw::Status::NotFound(); 364*61c4878aSAndroid Build Coastguard Worker 365*61c4878aSAndroid Build Coastguard Worker // "pw::Status: ${pw::Status}#%08d" 366*61c4878aSAndroid Build Coastguard Worker PW_LOG("pw::Status: " PW_LOG_TOKEN(pw::Status), status.code) 367*61c4878aSAndroid Build Coastguard Worker // "pw::Status: NOT_FOUND" 368*61c4878aSAndroid Build Coastguard Worker 369*61c4878aSAndroid Build Coastguard WorkerSince the token mapping entries in the ELF are optimized out of the final 370*61c4878aSAndroid Build Coastguard Workerbinary, the enum domains are tokenized away as part of the log format strings, 371*61c4878aSAndroid Build Coastguard Workerand we don't need to store separate tokens for each enum value, this addition 
372*61c4878aSAndroid Build Coastguard Workerto the API would would provide enum value names in logs with zero additional 373*61c4878aSAndroid Build Coastguard WorkerRAM cost. Compared to logging strings with ``ToString``-style functions, we 374*61c4878aSAndroid Build Coastguard Workersave space on the string names as well as the functions themselves. 375*61c4878aSAndroid Build Coastguard Worker 376*61c4878aSAndroid Build Coastguard WorkerToken Database 377*61c4878aSAndroid Build Coastguard Worker============== 378*61c4878aSAndroid Build Coastguard WorkerToken databases will be expanded to include a column for domains, so that 379*61c4878aSAndroid Build Coastguard Workermultiple domains can be encompassed in a single database rather than requiring 380*61c4878aSAndroid Build Coastguard Workerseparate databases for each domain. This is important because domains are being 381*61c4878aSAndroid Build Coastguard Workerused to categorize tokens within a single project, rather than merely keeping 382*61c4878aSAndroid Build Coastguard Workerseparate projects distinct from each other. When creating a database 383*61c4878aSAndroid Build Coastguard Workerfrom an ELF, a domain may be specified as the default domain instead of the 384*61c4878aSAndroid Build Coastguard Workerempty domain. A list of domains or path to a file with a list of domains may 385*61c4878aSAndroid Build Coastguard Workeralso separately be specified to define which domains are to be included in 386*61c4878aSAndroid Build Coastguard Workerthe database; all domains are now included by default. 387*61c4878aSAndroid Build Coastguard Worker 388*61c4878aSAndroid Build Coastguard WorkerWhen accessing a token database, both a domain and token value may be specified 389*61c4878aSAndroid Build Coastguard Workerto access specific values. If a domain is not specified, the default domain 390*61c4878aSAndroid Build Coastguard Workerwill be assumed, retaining the same behavior as before. 

Detokenization
==============
Detokenization is relatively straightforward. When the detokenizer is called,
it will first detokenize and format the top-level token and binary argument
data. The detokenizer will then find and replace nested tokens in the resulting
formatted string, then rescan the result for more nested tokens, up to a fixed
number of rescans.

For each token type or format, ``pw_tokenizer`` defines a regular expression to
match the expected formatted output token and a helper function to convert a
token from a particular format to its mapped value. The regular expressions for
each token type are combined into a single regex that matches any one of the
formats. At each recursive step, each detokenization format will be attempted
for every match, stopping at the first token type that succeeds. All nested
tokens in the result are then replaced recursively. Only tokens that encode
full data, such as Base64 tokens, will also require string and argument
formatting as part of the recursive step.
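
The find, replace, and rescan loop can be sketched with ``std::regex``. This
is a hypothetical, self-contained illustration and not the ``pw_tokenizer``
implementation. It assumes that a nested token consists of ``$``, an optional
``{domain}``, and then either ``10#`` followed by up to 10 decimal digits or
``#`` (optionally ``16#``) followed by up to 8 hexadecimal digits. It also
represents the token database as an in-memory map. The names ``Database``,
``Lookup``, ``Resolve``, ``Scan``, and ``DetokenizeNested`` are invented for
this example. Top-level detokenization, Base64 tokens, and argument formatting
are omitted. Unresolved tokens are left as-is. When ``show_errors`` is set,
they are annotated with the error strings listed in this section.

.. code-block:: cpp

   #include <cstddef>
   #include <cstdint>
   #include <map>
   #include <regex>
   #include <string>
   #include <vector>

   // Hypothetical in-memory database: domain -> token -> candidate strings.
   // More than one candidate for a token represents a hash collision.
   using Database =
       std::map<std::string, std::map<uint32_t, std::vector<std::string>>>;

   // Assumed syntax: "$", optional "{domain}", then "10#" + 1-10 decimal digits
   // or "#" / "16#" + 1-8 hex digits. Groups: 1 = domain, 2 = dec, 3 = hex.
   const std::regex kNestedToken(
       R"(\$(?:\{([^}]*)\})?(?:10#([0-9]{1,10})|(?:16)?#([0-9A-Fa-f]{1,8})))");

   enum class Lookup { kFound, kCollision, kMissingDatabase, kNotFound };

   // Looks up a single matched token; sets `out` only on success.
   Lookup Resolve(const Database& db, const std::smatch& m, std::string& out) {
     const std::string domain = m[1].matched ? m[1].str() : "";
     const bool decimal = m[2].matched;
     const unsigned long long value = std::stoull(
         decimal ? m[2].str() : m[3].str(), nullptr, decimal ? 10 : 16);
     const auto domain_it = db.find(domain);
     if (domain_it == db.end()) return Lookup::kMissingDatabase;
     if (value > UINT32_MAX) return Lookup::kNotFound;
     const auto token_it = domain_it->second.find(static_cast<uint32_t>(value));
     if (token_it == domain_it->second.end() || token_it->second.empty()) {
       return Lookup::kNotFound;
     }
     if (token_it->second.size() > 1) return Lookup::kCollision;
     out = token_it->second.front();
     return Lookup::kFound;
   }

   const char* ErrorText(Lookup result) {
     switch (result) {
       case Lookup::kCollision: return "(token collision)";
       case Lookup::kMissingDatabase: return "(missing database)";
       default: return "(token not found)";
     }
   }

   // One scan over `text`. With `annotate` false, resolvable tokens are
   // replaced; with `annotate` true, unresolvable tokens get an error suffix.
   std::string Scan(const Database& db, const std::string& text, bool annotate,
                    bool& replaced) {
     std::string out;
     std::size_t last = 0;
     for (std::sregex_iterator it(text.begin(), text.end(), kNestedToken), end;
          it != end; ++it) {
       const std::smatch& m = *it;
       const auto pos = static_cast<std::size_t>(m.position(0));
       out.append(text, last, pos - last);  // Text before the token.
       std::string value;
       const Lookup result = Resolve(db, m, value);
       if (!annotate && result == Lookup::kFound) {
         out += value;
         replaced = true;
       } else {
         out += m.str(0);  // Unresolved tokens are printed as-is.
         if (annotate && result != Lookup::kFound) out += ErrorText(result);
       }
       last = pos + static_cast<std::size_t>(m.length(0));
     }
     out.append(text, last, std::string::npos);  // Text after the last token.
     return out;
   }

   // Replaces nested tokens, rescanning up to `max_rescans` times because a
   // replacement may itself contain nested tokens.
   std::string DetokenizeNested(const Database& db, std::string text,
                                bool show_errors, int max_rescans = 4) {
     for (int pass = 0; pass <= max_rescans; ++pass) {
       bool replaced = false;
       text = Scan(db, text, /*annotate=*/false, replaced);
       if (!replaced) break;  // Nothing left that can be resolved.
     }
     if (show_errors) {
       bool unused = false;
       text = Scan(db, text, /*annotate=*/true, unused);
     }
     return text;
   }

Error annotation is deferred to a single final scan. This prevents a token
that fails on one pass from being annotated again on every rescan. The
``max_rescans`` bound prevents self-referential tokens from looping forever.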

For non-Base64 tokens, a token's base encoding as specified by ``BASE#``
determines its set of permissible alphanumeric characters and the
maximum token width for regex matching.

If nested detokenization fails for any reason, the formatted token will be
printed as-is in the output logs. If ``show_errors`` is true for the
detokenizer, errors will appear in parentheses immediately following the
token. Supported errors include:

* ``(token collision)``
* ``(missing database)``
* ``(token not found)``

------------
Alternatives
------------

Protobuf-based Tokenization
===========================
Tokenization may be expanded to function on structured data via protobufs.
This can be used to make logging more flexible, as all manner of compile-time
metadata can be freely attached to log arguments at effectively no cost.
This will most likely involve a separate build process to generate and tokenize
partially populated protos, and will significantly change the user API. It
will also be a large break from the existing implementation, as the current
system relies only on existing C preprocessor and C++ constexpr tricks to
function.

In this model, the token domain would likely be a fully qualified namespace
for, or path to, the proto definition.

Implementing this approach also requires a method of passing ordered arguments
to a partially filled detokenized protobuf in a manner similar to printf-style
string formatting. This would allow argument data to be efficiently encoded and
transmitted alongside the protobuf's token, and would allow the arguments to a
particular proto to be disambiguated from arguments to the rest of a log
statement.

This approach will also most likely preclude plain string logging as is
currently supported by ``pw_log``, as the implementations diverge dramatically.
However, if pursued, this would likely be made the default logging schema
across all platforms, including host devices.

Custom Detokenization
=====================
Theoretically, individual projects could implement their own regex replacement
schemes on top of Pigweed's detokenizer, allowing them to more flexibly define
complex relationships between logged tokens via custom log format string
syntax. However, Pigweed should provide utilities for nested tokenization in
common cases such as logging enums.

The changes proposed do not preclude additional custom detokenization schemas
if absolutely necessary, and such practices do not appear to have been popular
thus far in any case.

--------------
Open questions
--------------
Missing API definitions:

* Updated APIs for creating and accessing token databases with multiple domains
* Python nested tokenization
* C++ nested detokenization