.. _seed-0105:

===============================================
0105: Nested Tokens and Tokenized Log Arguments
===============================================

.. seed::
   :number: 105
   :name: Nested Tokens and Tokenized Log Arguments
   :status: Accepted
   :proposal_date: 2023-07-10
   :cl: 154190
   :authors: Gwyneth Chen
   :facilitator: Wyatt Hepler

-------
Summary
-------
This SEED describes a number of extensions to the `pw_tokenizer <https://pigweed.dev/pw_tokenizer/>`_
and `pw_log_tokenized <https://pigweed.dev/pw_log_tokenized>`_ modules to
improve support for nesting tokens and add facilities for tokenizing log
arguments such as strings and enums. This SEED primarily addresses C/C++
tokenization and Python/C++ detokenization.

----------
Motivation
----------
Currently, ``pw_tokenizer`` and ``pw_log_tokenized`` enable devices with limited
memory to store long log format strings as hashed 32-bit tokens. When logs are
moved off-device, host tooling can recover the full logs using token databases
that were created when building the device image. However, logs may still have
runtime string arguments that are stored and transferred 1:1 without additional
encoding. This SEED aims to extend tokenization to these arguments to further
reduce the weight of logging for embedded applications.

The proposed changes affect both the tokenization module itself and the logging
facilities built on top of tokenization.

--------
Proposal
--------
Logging enums such as ``pw::Status`` is one common special case where
tokenization is particularly appropriate: enum values are conceptually
already tokens mapping to their names, assuming no duplicate values. Logging
enums frequently entails creating functions and string names that occupy space
exclusively for logging purposes, which this proposal seeks to mitigate.
Here, ``pw::Status::NotFound()`` is presented as an illustrative example of
the several transformations that strings undergo during tokenization and
detokenization, further complicated in the proposed design by nested tokens.

.. list-table:: Enum Tokenization/Detokenization Phases
   :widths: 20 45

   * - (1) Source code
     - ``PW_LOG("Status: " PW_LOG_ENUM_FMT(pw::Status), status.code())``
   * - (2) Token database entries (token, string, domain)
     - | ``16170adf, "Status: ${pw::Status}#%08x", ""``
       | ``5       , "PW_STATUS_NOT_FOUND"       , "pw::Status"``
   * - (3) Wire format
     - ``df 0a 17 16 0a`` (5 bytes)
   * - (4) Top-level detokenized and formatted
     - ``"Status: ${pw::Status}#00000005"``
   * - (5) Fully detokenized
     - ``"Status: PW_STATUS_NOT_FOUND"``

Compared to log tokenization without nesting, string literals in token
database entries may not be identical to what is typed in source code due
to the use of macros and preprocessor string concatenation. The
detokenizer also takes an additional step to recursively detokenize any
nested tokens. In exchange for this added complexity, nested enum tokenization
allows us to gain the readability of logging value names with zero additional
runtime space or performance cost compared to logging the integral values
directly with ``pw_log_tokenized``.

.. note::
  Without nested enum token support, users can select either readability or
  reduced binary and transmission size, but not easily both:

  .. list-table::
    :widths: 15 20 20
    :header-rows: 1

    * -
      - Raw integers
      - String names
    * - (1) Source code
      - ``PW_LOG("Status: %x", status.code())``
      - ``PW_LOG("Status: %s", pw_StatusString(status))``
    * - (2) Token database entries (token, string, domain)
      - ``03a83461, "Status: %x", ""``
      - ``069c3ef0, "Status: %s", ""``
    * - (3) Wire format
      - ``61 34 a8 03 0a`` (5 bytes)
      - ``f0 3e 9c 06 09 4e 4f 54 5f 46 4f 55 4e 44`` (14 bytes)
    * - (4) Top-level detokenized and formatted
      - ``"Status: 5"``
      - ``"Status: NOT_FOUND"``
    * - (5) Fully detokenized
      - ``"Status: 5"``
      - ``"Status: NOT_FOUND"``
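
The integer wire formats in the tables above can be reproduced with a short
sketch. It assumes the usual ``pw_tokenizer`` argument encoding of a 4-byte
little-endian token followed by a ZigZag- and varint-encoded integer, which
is consistent with the bytes shown. ``ZigZag`` and ``EncodeTokenAndInt`` are
illustrative names, not part of any Pigweed API.

.. code-block:: cpp

   #include <cstdint>
   #include <vector>

   // Maps signed values to unsigned ones so that small magnitudes stay small.
   // Assumes arithmetic right shift for negative values (true on mainstream
   // compilers, guaranteed since C++20).
   constexpr uint32_t ZigZag(int32_t value) {
     return (static_cast<uint32_t>(value) << 1) ^
            static_cast<uint32_t>(value >> 31);
   }

   // Hypothetical helper: encodes a token and a single integer argument.
   std::vector<uint8_t> EncodeTokenAndInt(uint32_t token, int32_t arg) {
     std::vector<uint8_t> out;
     // The 32-bit token is written in little-endian byte order.
     for (int shift = 0; shift < 32; shift += 8) {
       out.push_back(static_cast<uint8_t>(token >> shift));
     }
     // The argument is written as a varint: 7 bits per byte, with the high
     // bit set on every byte except the last.
     uint32_t remaining = ZigZag(arg);
     while (remaining >= 0x80) {
       out.push_back(static_cast<uint8_t>(remaining | 0x80));
       remaining >>= 7;
     }
     out.push_back(static_cast<uint8_t>(remaining));
     return out;
   }

Under these assumptions, ``EncodeTokenAndInt(0x16170adf, 5)`` yields
``df 0a 17 16 0a``. The nested enum token therefore occupies the same five
bytes on the wire as logging the raw integer with ``%x``.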

Tokenization (C/C++)
====================
The ``pw_log_tokenized`` module exposes a set of macros for creating and
formatting nested tokens. Within format strings in the source code, tokens
are specified using function-like PRI-style macros. These can be used to
encode static information like the token domain or a numeric base encoding
and are macro-expanded to string literals that are concatenated with the
rest of the format string during preprocessing. Since ``pw_log`` generally
uses printf syntax, only bases 8, 10, and 16 are supported for integer token
arguments via ``%[odiuxX]``.

The provided macros enforce the token specifier syntax and keep the argument
types in sync when switching to other ``pw_log`` backends such as
``pw_log_basic``. These macros for basic usage are as follows:

* ``PW_LOG_TOKEN`` and ``PW_LOG_TOKEN_EXPR`` are used to tokenize string args.
* ``PW_LOG_TOKEN_FMT`` is used inside the format string to specify a token arg.
* ``PW_LOG_TOKEN_TYPE`` is used if the type of a tokenized arg needs to be
  referenced, e.g. as a ``ToString`` function return type.

.. code-block:: cpp

   #include "pw_log/log.h"
   #include "pw_log/tokenized_args.h"

   // Token with the default options: base-16 and empty domain.
   // Token database literal: "The sun will come out $#%08x!"
   PW_LOG("The sun will come out " PW_LOG_TOKEN_FMT() "!", PW_LOG_TOKEN_EXPR("tomorrow"));
   // After detokenization: "The sun will come out tomorrow!"

Additional macros are also provided specifically for enum handling. The
``TOKENIZE_ENUM`` macro creates ELF token database entries for each enum
value with the specified token domain to prevent token collision between
multiple tokenized enums. This macro is kept separate from the enum
definition to allow things like tokenizing a preexisting enum defined in an
external dependency.

.. code-block:: cpp

   // enums
   namespace foo {

     enum class Color { kRed, kGreen, kBlue };

     // syntax TBD
     TOKENIZE_ENUM(
       foo::Color,
       kRed,
       kGreen,
       kBlue
     )

   } // namespace foo

   void LogColor(foo::Color color) {
     // Token database literal:
     // "Color: [${foo::Color}10#%010d]"
     // Scoped enums are cast to an integer before being passed as an argument.
     PW_LOG("Color: [" PW_LOG_ENUM_FMT(foo::Color, 10) "]", static_cast<int>(color));
     // After detokenization:
     // e.g. "Color: [kRed]"
   }

.. admonition:: Nested Base64 tokens

  ``PW_LOG_TOKEN_FMT`` can accept 64 as the base encoding for an argument, in
  which case the argument should be a pre-encoded Base64 string argument
  (e.g. ``QAzF39==``). However, this should be avoided when possible to
  maximize space savings. Fully-formatted Base64 including the token prefix
  may also be logged with ``%s`` as before.

Detokenization (Python)
=======================
``Detokenizer.detokenize`` in Python (``Detokenizer::Detokenize`` in C++)
will automatically recursively detokenize tokens of all known formats rather
than requiring a separate call to ``detokenize_base64`` or similar.

To support detokenizing domain-specific tokens, token databases support multiple
domains, and ``database.py create`` will build a database with tokens from all
domains by default. Specifying a domain during database creation will cause
that domain to be treated as the default.

When detokenization fails, tokens appear as-is in logs. If the detokenizer has
the ``show_errors`` option set to ``True``, error messages may be printed
inline following the raw token.
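
The recursive behavior described in this section can be illustrated with a
minimal sketch. It is written in C++ for consistency with the other examples
and is not the ``Detokenizer`` implementation: ``ExampleDatabase`` and
``DetokenizeNested`` are hypothetical names, only base-16 tokens are handled,
and the pass limit is an assumption added to guard against tokens that expand
to themselves.

.. code-block:: cpp

   #include <cstddef>
   #include <cstdint>
   #include <map>
   #include <regex>
   #include <string>
   #include <utility>

   // Hypothetical in-memory database keyed by (domain, token).
   using ExampleDatabase =
       std::map<std::pair<std::string, uint32_t>, std::string>;

   // Replaces base-16 nested tokens ("$#XXXXXXXX" or "${domain}#XXXXXXXX")
   // and repeats until a pass makes no replacement or max_passes is reached.
   std::string DetokenizeNested(const ExampleDatabase& db, std::string text,
                                int max_passes = 4) {
     static const std::regex kToken(R"(\$(?:\{([^}]*)\})?#([0-9A-Fa-f]{8}))");
     for (int pass = 0; pass < max_passes; ++pass) {
       std::string result;
       std::size_t last = 0;
       bool replaced = false;
       for (std::sregex_iterator it(text.begin(), text.end(), kToken), end;
            it != end; ++it) {
         const std::smatch& match = *it;
         const auto start = static_cast<std::size_t>(match.position());
         result.append(text, last, start - last);
         const auto token = static_cast<uint32_t>(
             std::stoul(match[2].str(), nullptr, 16));
         // An absent domain group yields "", i.e. the default domain.
         const auto entry = db.find({match[1].str(), token});
         if (entry != db.end()) {
           result += entry->second;
           replaced = true;
         } else {
           result += match.str();  // Unknown tokens appear as-is.
         }
         last = start + static_cast<std::size_t>(match.length());
       }
       result.append(text, last, std::string::npos);
       text = std::move(result);
       if (!replaced) {
         break;
       }
     }
     return text;
   }

Each pass rescans the output of the previous one, so a token whose database
string itself contains a nested token is resolved on the following pass.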

Tokens
======
Many details described here are provided via the ``PW_LOG_TOKEN_FMT`` macro, so
users should typically not be manually formatting tokens. However, if
detokenization fails for any reason, tokens will appear with the following
format in the final logs and should be easily recognizable.

Nested tokens have the following structure in partially detokenized logs
(transformation stage 4):

.. code-block::

   $[{DOMAIN}][BASE#]TOKEN

The ``$`` is a common prefix required for all nested tokens. It is possible to
configure a different common prefix if necessary, but using the default ``$``
character is strongly recommended.

.. list-table:: Options
   :widths: 10 30

   * - ``{DOMAIN}``
     - Specifies the token domain. If this option is omitted, the default
       (empty) domain is assumed.
   * - ``BASE#``
     - Defines the numeric base encoding of the token. Accepted values are 8,
       10, 16, and 64. If the hash symbol ``#`` is used without specifying a
       number, the base is assumed to be 16. If the base option is omitted
       entirely, the base defaults to 64 for backward compatibility. All
       encodings except Base64 are not case sensitive.

       This option may be expanded to support other bases in the future.
   * - ``TOKEN`` (required)
     - The numeric representation of the token in the given base encoding. All
       encodings except Base64 are left-padded with zeroes to the maximum width
       of a 32-bit integer in the given base. Base64 data may additionally encode
       string arguments for the detokenized token, and therefore does not have a
       maximum width. This is automatically handled by ``PW_LOG_TOKEN_FMT`` for
       supported bases.

When used in conjunction with ``pw_log_tokenized``, the token prefix (including
any domain and base specifications) is tokenized as part of the log format
string and therefore incurs zero additional memory or transmission cost over
that of the original format string. Over the wire, tokens in bases 8, 10, and
16 are transmitted as varint-encoded integers up to 5 bytes in size. Base64
tokens continue to be encoded as strings.

.. warning::
  Tokens do not have a terminating character in general, which is why we
  require them to be formatted with fixed width. Otherwise, following them
  immediately with alphanumeric characters valid in their base encoding
  will cause detokenization errors.
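
As a rough illustration of the fixed-width rule, the sketch below formats a
raw nested token for bases 8, 10, and 16. ``FormatNestedToken`` is a
hypothetical helper, not a Pigweed function. The widths follow from the
largest 32-bit value in each base: 11 octal digits, 10 decimal digits, and
8 hexadecimal digits.

.. code-block:: cpp

   #include <cstdint>
   #include <cstdio>
   #include <string>

   // Formats a token as "$[{DOMAIN}][BASE#]TOKEN", zero-padded to the maximum
   // width of a 32-bit integer. Returns "" for bases other than 8, 10, and 16.
   std::string FormatNestedToken(const std::string& domain, int base,
                                 uint32_t token) {
     const unsigned int value = token;
     char digits[16] = {};
     switch (base) {
       case 8:
         std::snprintf(digits, sizeof(digits), "8#%011o", value);
         break;
       case 10:
         std::snprintf(digits, sizeof(digits), "10#%010u", value);
         break;
       case 16:
         // A bare "#" with no number means base 16.
         std::snprintf(digits, sizeof(digits), "#%08X", value);
         break;
       default:
         return "";
     }
     std::string out = "$";
     if (!domain.empty()) {
       out += "{" + domain + "}";
     }
     return out + digits;
   }

Because every token in a given base has the same number of digits, a
detokenizer can consume exactly that many characters and treat whatever
follows as ordinary text, even when it is a valid digit in that base.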

.. admonition:: Recognizing raw nested tokens in strings

  When a string is fully detokenized, there should no longer be any indication
  of tokenization in the final result, e.g. detokenized logs should read the
  same as plain string logs. However, if nested tokens cannot be detokenized for
  any reason, they will appear in their raw form as below:

  .. code-block::

     // Base64 token with no arguments and empty domain
     $QA19pfEQ

     // Base-10 token
     $10#0086025943

     // Base-16 token with specified domain
     ${foo_namespace::MyEnum}#0000001A

     // Base64 token with specified domain
     ${bar_namespace::MyEnum}QAQQQQ==


---------------------
Problem investigation
---------------------
Complex embedded device projects are perpetually seeking more RAM. Even a
handful of longer descriptive string arguments can take up hundreds of bytes
that frequently exist exclusively for logging purposes, without any impact on
function.

One of the most common potential use cases is for logging enum values.
Inspection of one project revealed that enums accounted for some 90% of the
string log arguments. We have encountered instances where, to save space,
developers have avoided logging descriptive names in favor of raw enum values,
forcing readers of logs to look up or memorize the meanings of each number. As
with log format strings, we do know the set of possible string values that
might be emitted in the final logs, so they should be able to be extracted
into a token database at compile time.

Another major challenge overall is maintaining a user interface
that is easy to understand and use. The current primary interface through
``pw_log`` provides printf-style formatting, which is familiar and succinct
for basic applications.

We also have to contend with the interchangeable backends of ``pw_log``. The
``pw_log`` facade is intended as an opaque interface layer; adding syntax
specifically for tokenized logging will break this abstraction barrier. Either
this additional syntax would be ignored by other backends, or it might simply
be incompatible (e.g. logging raw integer tokens instead of strings).

Pigweed already supports one form of nested tokens via Base64 encoding. Base64
tokens begin with ``'$'``, followed by Base64-encoded data, and may be padded
with one or two trailing ``'='`` symbols. The Python
``Detokenizer.detokenize_base64`` method recursively detokenizes Base64 by
running a regex replacement on the formatted results of each iteration. Base64
is not merely a token format, however; it can encode any binary data in a text
format at the cost of reduced efficiency. Therefore, Base64 tokens may include
not only a database token that may detokenize to a format string but also
binary-encoded arguments. Other token types are not expected to include this
additional argument data.

---------------
Detailed design
---------------

Tokenization
============
``pw_tokenizer`` and ``pw_log_tokenized`` already provide much of the necessary
functionality to support tokenized arguments. The proposed API is fully
backward-compatible with non-nested tokenized logging.

Token arguments are indicated in log format strings via PRI-style macros that
are exposed by a new ``pw_log/tokenized_args.h`` header. ``PW_LOG_TOKEN_FMT``
supplies the ``$`` token prefix, brackets around the domain, the base specifier,
and the printf-style specifier including padding and width, i.e. ``%011o`` for
base-8, ``%010u`` for base-10, and ``%08X`` for base-16.
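
To make the PRI-style expansion concrete, here is a minimal sketch of how
such macros can build the token database literal through string literal
concatenation. ``EXAMPLE_TOKEN_FMT`` and ``EXAMPLE_ENUM_FMT`` are hypothetical
stand-ins. The real ``PW_LOG_TOKEN_FMT`` may be defined differently, and the
lowercase ``%08x`` is chosen only to match the database entries shown in the
proposal tables.

.. code-block:: cpp

   #include <string_view>

   // Hypothetical stand-ins for the proposed format macros.
   // Default domain, base 16.
   #define EXAMPLE_TOKEN_FMT() "$#%08x"
   // The domain is stringified and wrapped in braces, base 16.
   #define EXAMPLE_ENUM_FMT(domain) "${" #domain "}#%08x"

   // Adjacent string literals are concatenated at compile time, so each
   // format string below is a single literal by the time it is tokenized.
   constexpr std::string_view kSunFormat =
       "The sun will come out " EXAMPLE_TOKEN_FMT() "!";
   constexpr std::string_view kStatusFormat =
       "Status: " EXAMPLE_ENUM_FMT(pw::Status);

With these definitions, ``kStatusFormat`` holds
``Status: ${pw::Status}#%08x``. This is why the database entry for a log can
differ from the text typed in the source code.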

For free-standing string arguments such as those where the literals are defined
in the log statements themselves, tokenization is performed with macros from
``pw_log/tokenized_args.h``. With the tokenized logging backend, these macros
simply alias the corresponding ``PW_TOKENIZE`` macros, but they also revert to
basic string formatting for other backends. This is achieved by placing an
empty header file in the local ``public_overrides`` directory of
``pw_log_tokenized`` and checking for it in ``pw_log/tokenized_args.h`` using
the ``__has_include`` directive.
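
The backend detection described above might look roughly like the following
sketch. The marker header path and the ``EXAMPLE_`` macro names are
hypothetical, since the actual file name is not specified here, and
``__has_include`` is assumed to be available (it is standard since C++17).

.. code-block:: cpp

   #include <string_view>

   // Hypothetical marker header that only the tokenized backend would place
   // on the include path. The header is empty; only its presence matters.
   #if __has_include("example_log_backend/tokenized_marker.h")
   #define EXAMPLE_BACKEND_IS_TOKENIZED 1
   // Tokenized backend: the argument is a token formatted in base 16.
   #define EXAMPLE_LOG_TOKEN_FMT() "$#%08x"
   #else
   #define EXAMPLE_BACKEND_IS_TOKENIZED 0
   // Any other backend: the argument stays a plain string.
   #define EXAMPLE_LOG_TOKEN_FMT() "%s"
   #endif

   // The same source line yields a different format string per backend.
   constexpr std::string_view kFormat =
       "The sun will come out " EXAMPLE_LOG_TOKEN_FMT() "!";

Keying the switch on the presence of a header lets the log statement stay
unchanged across backends, which keeps the ``pw_log`` facade opaque to users.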

For variable string arguments, the API is split across locations. The string
literals are tokenized wherever they are defined, and the string format macros
appear in the log format strings corresponding to those string arguments.

When tokens use non-default domains, additional work may be required to create
the domain name and store associated tokens in the ELF.

Enum Tokenization
-----------------
We use existing ``pw_tokenizer`` utilities to record the raw enum values as
tokens corresponding to their string names in the ELF. There is no change
required for the backend implementation; we simply skip the token calculation
step, since we already have a value to use, and specifying a token domain is
generally required to isolate multiple enums from token collision.

For ease of use, we can also provide a macro that wraps the enum value list
and encapsulates the recording of each token value-string pair in the ELF.

When actually logging the values, users pass the enum type name as the domain
to the format specifier macro ``PW_LOG_ENUM_FMT()``, and the enum values can be
passed as-is to ``PW_LOG`` (casting to integers as necessary for scoped enums).
Since integers are varint-encoded over the wire, this will only require a
single byte for most enums.

.. admonition:: Logging pw::Status

  Note that while this immediately reduces transmission size, the code
  space occupied by the string names in ``pw::Status::str()`` cannot be
  recovered unless an entire project is converted to log ``pw::Status``
  as tokens.

  .. code-block:: cpp

     #include "pw_log/log.h"
     #include "pw_log/tokenized_args.h"
     #include "pw_status/status.h"

     pw::Status status = pw::Status::NotFound();

     // Token database literal: "pw::Status: ${pw::Status}#%08x"
     PW_LOG("pw::Status: " PW_LOG_ENUM_FMT(pw::Status), status.code());
     // After detokenization: "pw::Status: NOT_FOUND"

The token mapping entries in the ELF are optimized out of the final binary,
the enum domains are tokenized away as part of the log format strings, and no
separate tokens need to be stored for each enum value. As a result, this
addition to the API would provide enum value names in logs with zero additional
RAM cost. Compared to logging strings with ``ToString``-style functions, we
save space on the string names as well as the functions themselves.

Token Database
==============
Token databases will be expanded to include a column for domains, so that
multiple domains can be encompassed in a single database rather than requiring
separate databases for each domain. This is important because domains are being
used to categorize tokens within a single project, rather than merely keeping
separate projects distinct from each other. When creating a database
from an ELF, a domain may be specified as the default domain instead of the
empty domain. A list of domains, or a path to a file with a list of domains,
may also be specified separately to define which domains are to be included in
the database; all domains are now included by default.

When accessing a token database, both a domain and token value may be specified
to access specific values. If a domain is not specified, the default domain
will be assumed, retaining the same behavior as before.
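
The lookup behavior described above can be sketched with a small in-memory
model. ``DomainDatabase`` and its methods are hypothetical names chosen for
illustration. They are not the ``pw_tokenizer`` database API, whose updated
form this SEED leaves as an open question.

.. code-block:: python

   from collections import defaultdict
   from typing import DefaultDict, Iterable, List, Tuple

   DEFAULT_DOMAIN = ""  # The empty domain, used when no domain is given.


   class DomainDatabase:
       """Hypothetical stand-in for a token database with a domain column."""

       def __init__(self) -> None:
           self._entries: DefaultDict[Tuple[str, int], List[str]] = defaultdict(list)

       def add(self, token: int, string: str, domain: str = DEFAULT_DOMAIN) -> None:
           self._entries[(domain, token)].append(string)

       def lookup(self, token: int, domain: str = DEFAULT_DOMAIN) -> List[str]:
           """Returns every string stored for this (domain, token) pair."""
           return list(self._entries.get((domain, token), []))

       def restricted_to(self, domains: Iterable[str]) -> "DomainDatabase":
           """Returns a copy that contains only the listed domains."""
           allowed = set(domains)
           subset = DomainDatabase()
           for (domain, token), strings in self._entries.items():
               if domain in allowed:
                   for string in strings:
                       subset.add(token, string, domain)
           return subset

Keying entries on the ``(domain, token)`` pair lets the same token value map to
different strings in different domains, while a lookup without a domain
continues to consult only the default domain.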

Detokenization
==============
Detokenization is relatively straightforward. When the detokenizer is called,
it will first detokenize and format the top-level token and binary argument
data. The detokenizer will then find and replace nested tokens in the resulting
formatted string, then rescan the result for more nested tokens up to a fixed
number of rescans.
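
The find, replace, and rescan loop can be sketched as follows. This is a
simplified model, not the ``pw_tokenizer`` implementation. It assumes a single
nested-token syntax of ``$#`` followed by eight hexadecimal digits, a plain
dictionary in place of a token database, and an arbitrary limit of four passes.

.. code-block:: python

   import re
   from typing import Dict

   # Assumed syntax for this sketch: "$#" followed by eight hex digits.
   NESTED_TOKEN = re.compile(r"\$#([0-9a-fA-F]{8})")
   MAX_PASSES = 4  # Arbitrary bound chosen for illustration.


   def detokenize_nested(text: str, database: Dict[int, str]) -> str:
       """Expands nested tokens in already-formatted text, rescanning the result."""

       def replace(match: "re.Match") -> str:
           token = int(match.group(1), 16)
           # Unknown tokens are left in the output unchanged.
           return database.get(token, match.group(0))

       for _ in range(MAX_PASSES):
           expanded = NESTED_TOKEN.sub(replace, text)
           if expanded == text:
               break  # Nothing left to expand.
           text = expanded
       return text

Bounding the number of passes keeps a token whose string refers back to itself
from expanding forever.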

For each token type or format, ``pw_tokenizer`` defines a regular expression to
match the expected formatted output token and a helper function to convert a
token from a particular format to its mapped value. The regular expressions for
each token type are combined into a single regex that matches any one of the
formats. At each recursive step, every detokenization format is attempted for
each match, stopping at the first token type that succeeds; all nested tokens
in the result are then replaced recursively. Only tokens that encode full data,
such as Base64 tokens, will also require string/argument formatting as part of
the recursive step.

For non-Base64 tokens, a token's base encoding as specified by ``BASE#``
determines its set of permissible alphanumeric characters and the
maximum token width for regex matching.
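
As a rough illustration of how a base could determine both properties, the
sketch below derives a character class and a maximum width from the base. It
assumes 32-bit tokens, bases 2 through 36, and case-insensitive matching; the
function names are hypothetical and the real patterns may differ.

.. code-block:: python

   import re
   import string

   TOKEN_BITS = 32  # Assumed token size for this sketch.


   def max_token_width(base: int) -> int:
       """Returns the number of digits needed for the largest token in a base."""
       largest = (1 << TOKEN_BITS) - 1
       width = 0
       while largest:
           largest //= base
           width += 1
       return width


   def token_pattern(base: int) -> "re.Pattern":
       """Builds a regex that matches the digits of one token in the given base."""
       if not 2 <= base <= 36:
           raise ValueError("this sketch only handles bases 2 through 36")
       digits = (string.digits + string.ascii_uppercase)[:base]
       return re.compile(f"[{digits}]{{1,{max_token_width(base)}}}", re.IGNORECASE)

With these assumptions, a base-16 token is at most 8 characters wide and a
base-10 token at most 10.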

If nested detokenization fails for any reason, the formatted token will be
printed as-is in the output logs. If ``show_errors`` is true for the
detokenizer, errors will appear in parentheses immediately following the
token. Supported errors include:

* ``(token collision)``
* ``(missing database)``
* ``(token not found)``
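
The failure handling above can be sketched as a single decision function.
``resolve_nested`` is a hypothetical helper, not part of ``pw_tokenizer``. It
assumes the caller passes ``None`` when no database is available and otherwise
a list of every string that matched the token.

.. code-block:: python

   from typing import List, Optional


   def resolve_nested(
       formatted_token: str,
       matches: Optional[List[str]],
       show_errors: bool = False,
   ) -> str:
       """Returns the detokenized string, or the token as-is on failure."""
       if matches is None:
           error = "missing database"
       elif not matches:
           error = "token not found"
       elif len(matches) > 1:
           error = "token collision"
       else:
           return matches[0]
       # On failure the token is kept, optionally followed by the error.
       return f"{formatted_token}({error})" if show_errors else formatted_token

For instance, an unknown token ``$#00000005`` stays ``$#00000005`` by default
and becomes ``$#00000005(token not found)`` when ``show_errors`` is set.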

------------
Alternatives
------------

Protobuf-based Tokenization
===========================
Tokenization may be expanded to function on structured data via protobufs.
This can be used to make logging more flexible, as all manner of compile-time
metadata can be freely attached to log arguments at effectively no cost.
This will most likely involve a separate build process to generate and tokenize
partially-populated protos and will significantly change the user API. It
will also be a large break from the existing process in implementation, as
the current system relies only on existing C preprocessor and C++ constexpr
tricks to function.

In this model, the token domain would likely be the fully-qualified
namespace of, or the path to, the proto definition.

Implementing this approach also requires a method of passing ordered arguments
to a partially-filled detokenized protobuf in a manner similar to printf-style
string formatting, so that argument data can be efficiently encoded and
transmitted alongside the protobuf's token, and the arguments to a particular
proto can be disambiguated from arguments to the rest of a log statement.

This approach will also most likely preclude plain string logging as is
currently supported by ``pw_log``, as the implementations diverge dramatically.
However, if pursued, this would likely be made the default logging schema
across all platforms, including host devices.

Custom Detokenization
=====================
Theoretically, individual projects could implement their own regex replacement
schemes on top of Pigweed's detokenizer, allowing them to more flexibly define
complex relationships between logged tokens via custom log format string
syntax. However, Pigweed should provide utilities for nested tokenization in
common cases such as logging enums.

The changes proposed do not preclude additional custom detokenization schemas
if absolutely necessary, and such practices do not appear to have been popular
thus far in any case.

--------------
Open questions
--------------
Missing API definitions:

* Updated APIs for creating and accessing token databases with multiple domains
* Python nested tokenization
* C++ nested detokenization