:tocdepth: 3

.. _module-pw_tokenizer-detokenization:

==============
Detokenization
==============
.. pigweed-module-subpage::
   :name: pw_tokenizer

Detokenization is the process of expanding a token to the string it represents
and decoding its arguments. ``pw_tokenizer`` provides Python, C++ and
TypeScript detokenization libraries.

--------------------------------
Example: decoding tokenized logs
--------------------------------
A project might tokenize its log messages with the
:ref:`module-pw_tokenizer-base64-format`. Consider the following log file, which
has four tokenized logs and one plain text log:

.. code-block:: text

   20200229 14:38:58 INF $HL2VHA==
   20200229 14:39:00 DBG $5IhTKg==
   20200229 14:39:20 DBG Crunching numbers to calculate probability of success
   20200229 14:39:21 INF $EgFj8lVVAUI=
   20200229 14:39:23 ERR $DFRDNwlOT1RfUkVBRFk=

The project's log strings are stored in a database like the following:

.. code-block::

   1c95bd1c,          ,"Initiating retrieval process for recovery object"
   2a5388e4,          ,"Determining optimal approach and coordinating vectors"
   3743540c,          ,"Recovery object retrieval failed with status %s"
   f2630112,          ,"Calculated acceptable probability of success (%.2f%%)"

Using the detokenizing tools with the database, the logs can be decoded:

.. code-block:: text

   20200229 14:38:58 INF Initiating retrieval process for recovery object
   20200229 14:39:00 DBG Determining optimal approach and coordinating vectors
   20200229 14:39:20 DBG Crunching numbers to calculate probability of success
   20200229 14:39:21 INF Calculated acceptable probability of success (32.33%)
   20200229 14:39:23 ERR Recovery object retrieval failed with status NOT_READY

.. note::

   This example uses the :ref:`module-pw_tokenizer-base64-format`, which
   occupies about 4/3 (133%) as much space as the default binary format when
   encoded. For projects that wish to interleave tokenized messages with plain
   text, using Base64 is a worthwhile tradeoff. The relationship between the
   two formats is illustrated below.

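The prefixed Base64 form of a message is simply a ``$`` prefix followed by the
Base64 encoding of the binary tokenized message, which starts with the 4-byte
little-endian token. The short Python sketch below reproduces the payload of
the first log line above from the example database entry ``1c95bd1c``:

.. code-block:: python

   import base64
   import struct

   # Token from the example database for the string
   # "Initiating retrieval process for recovery object".
   token = 0x1C95BD1C

   # The binary message is the 4-byte little-endian token followed by the
   # encoded arguments (this string takes no arguments).
   binary_message = struct.pack('<I', token)

   # The prefixed Base64 form is '$' plus the Base64 encoding of those bytes.
   base64_message = '$' + base64.b64encode(binary_message).decode('ascii')

   assert base64_message == '$HL2VHA=='  # The first log line's payload.
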
------------------------
Detokenization in Python
------------------------
To detokenize in Python, import ``Detokenizer`` from the ``pw_tokenizer``
package, and instantiate it with paths to token databases or ELF files.

.. code-block:: python

   import pw_tokenizer

   detokenizer = pw_tokenizer.Detokenizer('path/to/database.csv', 'other/path.elf')

   def process_log_message(log_message):
       result = detokenizer.detokenize(log_message.payload)
       print(result)

The ``pw_tokenizer`` package also provides the ``AutoUpdatingDetokenizer``
class, which can be used in place of the standard ``Detokenizer``. This class
monitors database files for changes and automatically reloads them when they
change. This is helpful for long-running tools that use detokenization. The
class also supports filtering token domains for the given database files in the
``<path>#<domain>`` format.

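As a minimal sketch, an ``AutoUpdatingDetokenizer`` can be constructed from the
``pw_tokenizer.detokenize`` module; the file paths and the ``log_domain`` token
domain below are placeholders:

.. code-block:: python

   from pw_tokenizer import detokenize

   # Databases are reloaded automatically when the files change. The optional
   # '#<domain>' suffix restricts a database file to a single token domain.
   detokenizer = detokenize.AutoUpdatingDetokenizer(
       'path/to/database.csv',
       'other/path.elf#log_domain',
   )

   def process_log_message(log_message):
       result = detokenizer.detokenize(log_message.payload)
       print(result)
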
For messages that are optionally tokenized and may be encoded as binary,
Base64, or plaintext UTF-8, use
:func:`pw_tokenizer.proto.decode_optionally_tokenized`. This will attempt to
determine the correct method to detokenize and always provide a printable
string.

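As a sketch, assuming :func:`pw_tokenizer.proto.decode_optionally_tokenized`
accepts a detokenizer and the raw bytes and returns a printable string:

.. code-block:: python

   from pw_tokenizer import proto

   def process_optionally_tokenized(log_message):
       # Handles binary tokenized, prefixed Base64 tokenized, and plain
       # UTF-8 payloads, using the detokenizer constructed above.
       text = proto.decode_optionally_tokenized(detokenizer, log_message.payload)
       print(text)
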
.. _module-pw_tokenizer-base64-decoding:

Decoding Base64
===============
The Python ``Detokenizer`` class supports decoding and detokenizing prefixed
Base64 messages with ``detokenize_base64`` and related methods.

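For example, a log line containing prefixed Base64 messages can be decoded as
in the sketch below, which assumes ``detokenize_base64`` accepts bytes and
returns the input with recognized tokens replaced by their detokenized text:

.. code-block:: python

   import pw_tokenizer

   detokenizer = pw_tokenizer.Detokenizer('path/to/database.csv')

   line = b'20200229 14:38:58 INF $HL2VHA=='

   # Unrecognized tokens are left as-is so they can be decoded later.
   print(detokenizer.detokenize_base64(line).decode('utf-8'))
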
.. tip::
   The Python detokenization tools support recursive detokenization for prefixed
   Base64 text. Tokenized strings found in detokenized text are detokenized, so
   prefixed Base64 messages can be passed as ``%s`` arguments.

   For example, the tokenized string for "Wow!" is ``$RhYjmQ==``. This could be
   passed as an argument to the printf-style string ``Nested message: %s``, which
   encodes to ``$pEVTYQkkUmhZam1RPT0=``. The detokenizer would decode the message
   as follows:

   ::

     "$pEVTYQkkUmhZam1RPT0=" → "Nested message: $RhYjmQ==" → "Nested message: Wow!"

Base64 decoding is supported in C++ or C with the
``pw::tokenizer::PrefixedBase64Decode`` or ``pw_tokenizer_PrefixedBase64Decode``
functions.

Investigating undecoded Base64 messages
---------------------------------------
Tokenized messages cannot be decoded if the token is not recognized. The Python
package includes the ``parse_message`` tool, which parses tokenized Base64
messages without looking up the token in a database. This tool attempts to guess
the types of the arguments and displays potential ways to decode them.

This tool can be used to extract argument information from an otherwise unusable
message. It could help identify which statement in the code produced the
message. This tool is not particularly helpful for tokenized messages without
arguments, since all it can do is show the value of the unknown token.

The tool is executed by passing Base64 tokenized messages, with or without the
``$`` prefix, to ``pw_tokenizer.parse_message``. Pass ``-h`` or ``--help`` to
see full usage information.

Example
^^^^^^^
.. code-block::

   $ python -m pw_tokenizer.parse_message '$329JMwA=' koSl524TRkFJTEVEX1BSRUNPTkRJVElPTgJPSw== --specs %s %d

   INF Decoding arguments for '$329JMwA='
   INF Binary: b'\xdfoI3\x00' [df 6f 49 33 00] (5 bytes)
   INF Token:  0x33496fdf
   INF Args:   b'\x00' [00] (1 bytes)
   INF Decoding with up to 8 %s or %d arguments
   INF   Attempt 1: [%s]
   INF   Attempt 2: [%d] 0

   INF Decoding arguments for '$koSl524TRkFJTEVEX1BSRUNPTkRJVElPTgJPSw=='
   INF Binary: b'\x92\x84\xa5\xe7n\x13FAILED_PRECONDITION\x02OK' [92 84 a5 e7 6e 13 46 41 49 4c 45 44 5f 50 52 45 43 4f 4e 44 49 54 49 4f 4e 02 4f 4b] (28 bytes)
   INF Token:  0xe7a58492
   INF Args:   b'n\x13FAILED_PRECONDITION\x02OK' [6e 13 46 41 49 4c 45 44 5f 50 52 45 43 4f 4e 44 49 54 49 4f 4e 02 4f 4b] (24 bytes)
   INF Decoding with up to 8 %s or %d arguments
   INF   Attempt 1: [%d %s %d %d %d] 55 FAILED_PRECONDITION 1 -40 -38
   INF   Attempt 2: [%d %s %s] 55 FAILED_PRECONDITION OK

.. _module-pw_tokenizer-protobuf-tokenization-python:

Detokenizing protobufs
======================
The :py:mod:`pw_tokenizer.proto` Python module defines functions that may be
used to detokenize protobuf objects in Python. The function
:py:func:`pw_tokenizer.proto.detokenize_fields` detokenizes all fields
annotated as tokenized, replacing them with their detokenized version. For
example:

.. code-block:: python

   my_detokenizer = pw_tokenizer.Detokenizer(some_database)

   my_message = SomeMessage(tokenized_field=b'$YS1EMQ==')
   pw_tokenizer.proto.detokenize_fields(my_detokenizer, my_message)

   assert my_message.tokenized_field == b'The detokenized string! Cool!'

Decoding optionally tokenized strings
-------------------------------------
The encoding used for an optionally tokenized field is not recorded in the
protobuf. Despite this, the text can reliably be decoded. This is accomplished
by attempting to decode the field as binary or Base64 tokenized data before
treating it like plain text.

The following diagram describes the decoding process for optionally tokenized
fields in detail.

.. mermaid::

  flowchart TD
     start([Received bytes]) --> binary

     binary[Decode as<br>binary tokenized] --> binary_ok
     binary_ok{Detokenizes<br>successfully?} -->|no| utf8
     binary_ok -->|yes| done_binary([Display decoded binary])

     utf8[Decode as UTF-8] --> utf8_ok
     utf8_ok{Valid UTF-8?} -->|no| base64_encode
     utf8_ok -->|yes| base64

     base64_encode[Encode as<br>tokenized Base64] --> display
     display([Display encoded Base64])

     base64[Decode as<br>Base64 tokenized] --> base64_ok

     base64_ok{Fully<br>or partially<br>detokenized?} -->|no| is_plain_text
     base64_ok -->|yes| base64_results

     is_plain_text{Text is<br>printable?} -->|no| base64_encode
     is_plain_text -->|yes| plain_text

     base64_results([Display decoded Base64])
     plain_text([Display text])

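The same flow can be expressed as a simplified Python sketch. This is purely
illustrative, not the actual :py:mod:`pw_tokenizer.proto` implementation; it
assumes the detokenizer's result reports success with ``ok()`` and that
``detokenize_base64`` returns the input bytes with recognized tokens replaced:

.. code-block:: python

   import base64

   def display_optionally_tokenized(detokenizer, data: bytes) -> str:
       # 1. Try to decode the bytes as a binary tokenized message.
       result = detokenizer.detokenize(data)
       if result.ok():
           return str(result)

       # 2. If the bytes are not valid UTF-8, re-encode them as prefixed
       #    Base64 so that they can still be decoded later.
       try:
           text = data.decode('utf-8')
       except UnicodeDecodeError:
           return '$' + base64.b64encode(data).decode('ascii')

       # 3. Try to detokenize prefixed Base64 messages embedded in the text.
       detokenized = detokenizer.detokenize_base64(data).decode('utf-8', 'replace')
       if detokenized != text:  # Fully or partially detokenized.
           return detokenized

       # 4. Otherwise display printable text as-is; fall back to Base64.
       if text.isprintable():
           return text
       return '$' + base64.b64encode(data).decode('ascii')
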
Potential decoding problems
---------------------------
The decoding process for optionally tokenized fields will yield correct results
in almost every situation. In rare circumstances, it is possible for it to fail,
but these can be avoided with a low-overhead mitigation if desired.

There are two ways in which the decoding process may fail.

Accidentally interpreting plain text as tokenized binary
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
If a plain-text string happens to decode as a binary tokenized message, the
incorrect message could be displayed. This is very unlikely to occur. While many
tokens will incidentally end up being valid UTF-8 strings, it is highly unlikely
that a device will happen to log one of these strings as plain text. The
overwhelming majority of these strings will be nonsense.

An implementation that wishes to guard against this extremely improbable
situation can prevent it by appending 0xFF (or another byte never valid in
UTF-8) to binary tokenized data that happens to be valid UTF-8 (or to all
binary tokenized messages, if desired). When decoding, an extra trailing 0xFF
byte is simply discarded.

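A minimal sketch of this mitigation, written as plain Python helpers rather
than any existing ``pw_tokenizer`` API:

.. code-block:: python

   def escape_if_utf8(tokenized: bytes) -> bytes:
       """Appends 0xFF to binary tokenized data that is also valid UTF-8."""
       try:
           tokenized.decode('utf-8')
       except UnicodeDecodeError:
           return tokenized  # Cannot be mistaken for plain text; leave as-is.
       return tokenized + b'\xff'  # 0xFF never appears in valid UTF-8.

   def unescape(data: bytes) -> bytes:
       """Drops the trailing 0xFF escape byte, if present, before decoding."""
       return data[:-1] if data.endswith(b'\xff') else data
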
Displaying undecoded binary as plain text instead of Base64
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
If a message fails to decode as binary tokenized and it is not valid UTF-8, it
is displayed as tokenized Base64. This makes it easily recognizable as a
tokenized message and makes it simple to decode later from the text output (for
example, with an updated token database).

A binary message for which the token is not known may coincidentally be valid
UTF-8 or ASCII. 6.25% of 4-byte sequences are composed only of ASCII
characters, since each of the four bytes independently has a 128 in 256 (50%)
chance of being ASCII. When decoding with an out-of-date token database, it is
possible that some binary tokenized messages will be displayed as plain text
rather than tokenized Base64.

This situation is likely to occur, but should be infrequent. Even if it does
happen, it is not a serious issue. A very small number of strings will be
displayed incorrectly, but these strings cannot be decoded anyway. One nonsense
string (e.g. ``a-D1``) would be displayed instead of another (``$YS1EMQ==``).
Updating the token database would resolve the issue, though the non-Base64 logs
would be difficult to decode later from a log file.

This situation can be avoided with the same approach described in
`Accidentally interpreting plain text as tokenized binary`_. Appending
an invalid UTF-8 character prevents the undecoded binary message from being
interpreted as plain text.

---------------------
Detokenization in C++
---------------------
The C++ detokenization libraries can be used in C++ or any language that can
call into C++ with a C-linkage wrapper, such as Java or Rust. A reference
Java Native Interface (JNI) implementation is provided.

The C++ detokenization library uses binary-format token databases (created with
``database.py create --type binary``). Read a binary format database from a
file or include it in the source code. Pass the database array to
``TokenDatabase::Create``, and construct a detokenizer.

.. code-block:: cpp

   Detokenizer detokenizer(TokenDatabase::Create(token_database_array));

   std::string ProcessLog(span<uint8_t> log_data) {
     return detokenizer.Detokenize(log_data).BestString();
   }

The ``TokenDatabase`` class verifies that its data is valid before using it. If
it is invalid, ``TokenDatabase::Create`` returns an empty database for which
``ok()`` returns false. If the token database is included in the source code,
this check can be done at compile time.

.. code-block:: cpp

   // This line fails to compile with a static_assert if the database is invalid.
   constexpr TokenDatabase kDefaultDatabase = TokenDatabase::Create<kData>();

   Detokenizer OpenDatabase(std::string_view path) {
     std::vector<uint8_t> data = ReadWholeFile(path);

     TokenDatabase database = TokenDatabase::Create(data);

     // This checks if the file contained a valid database. It is safe to use a
     // TokenDatabase that failed to load (it will be empty), but it may be
     // desirable to provide a default database or otherwise handle the error.
     if (database.ok()) {
       return Detokenizer(database);
     }
     return Detokenizer(kDefaultDatabase);
   }

----------------------------
Detokenization in TypeScript
----------------------------
To detokenize in TypeScript, import ``Detokenizer`` from the ``pigweedjs``
package, and instantiate it with a CSV token database.

.. code-block:: typescript

   import { pw_tokenizer, pw_hdlc } from 'pigweedjs';
   const { Detokenizer } = pw_tokenizer;
   const { Frame } = pw_hdlc;

   const detokenizer = new Detokenizer(String(tokenCsv));

   function processLog(frame: Frame) {
     const result = detokenizer.detokenize(frame);
     console.log(result);
   }

For messages that are encoded in Base64, use ``Detokenizer::detokenizeBase64``.
``detokenizeBase64`` will also attempt to detokenize nested Base64 tokens.
There is also ``detokenizeUint8Array``, which works just like ``detokenize``
but expects a ``Uint8Array`` instead of a ``Frame`` argument.

.. _module-pw_tokenizer-cli-detokenizing:

---------------------
Detokenizing CLI tool
---------------------
``pw_tokenizer`` provides two standalone command line utilities for detokenizing
Base64-encoded tokenized strings.

* ``detokenize.py`` -- Detokenizes Base64-encoded strings in files or from
  stdin.
* ``serial_detokenizer.py`` -- Detokenizes Base64-encoded strings from a
  connected serial device.

If the ``pw_tokenizer`` Python package is installed, these tools may be executed
as runnable modules. For example:

.. code-block::

   # Detokenize Base64-encoded strings in a file
   python -m pw_tokenizer.detokenize -i input_file.txt

   # Detokenize Base64-encoded strings in output from a serial device
   python -m pw_tokenizer.serial_detokenizer --device /dev/ttyACM0

See the ``--help`` options for these tools for full usage information.

--------
Appendix
--------

.. _module-pw_tokenizer-python-detokenization-c99-printf-notes:

Python detokenization: C99 ``printf`` compatibility notes
=========================================================
This implementation is designed to align with the
`C99 specification, section 7.19.6
<https://www.dii.uchile.cl/~daespino/files/Iso_C_1999_definition.pdf>`_.
Notably, that specification differs slightly from what most compilers
implement, because each compiler chooses to interpret undefined behavior in
slightly different ways. Treat the following description as the source of
truth.

This implementation supports:

- Overall Format: ``%[flags][width][.precision][length][specifier]``
- Flags (Zero or More)
   - ``-``: Left-justify within the given field width; right justification is
     the default (see the width modifier).
   - ``+``: Forces the result to be preceded with a plus or minus sign (``+``
     or ``-``) even for positive numbers. By default, only negative numbers
     are preceded with a ``-`` sign.
   - (space): If no sign is going to be written, a blank space is inserted
     before the value.
   - ``#``: Specifies that an alternative print syntax should be used.
      - Used with ``o``, ``x``, or ``X`` specifiers, the value is preceded with
        ``0``, ``0x``, or ``0X``, respectively, for values other than zero.
      - Used with ``a``, ``A``, ``e``, ``E``, ``f``, ``F``, ``g``, or ``G``, it
        forces the written output to contain a decimal point even if no more
        digits follow. By default, if no digits follow, no decimal point is
        written.
   - ``0``: Left-pads the number with zeroes (``0``) instead of spaces when
     padding is specified (see the width sub-specifier).
- Width (Optional)
   - ``(number)``: Minimum number of characters to be printed. If the value to
     be printed is shorter than this number, the result is padded with blank
     spaces or ``0`` if the ``0`` flag is present. The value is not truncated
     even if the result is larger. If the value is negative and the ``0`` flag
     is present, the ``0``\s are padded after the ``-`` symbol.
   - ``*``: The width is not specified in the format string, but as an
     additional integer value argument preceding the argument that has to be
     formatted.
- Precision (Optional)
   - ``.(number)``
      - For ``d``, ``i``, ``o``, ``u``, ``x``, ``X``, specifies the minimum
        number of digits to be written. If the value to be written is shorter
        than this number, the result is padded with leading zeros. The value is
        not truncated even if the result is longer.

        - A precision of ``0`` means that no character is written for the value
          ``0``.

      - For ``a``, ``A``, ``e``, ``E``, ``f``, and ``F``, specifies the number
        of digits to be printed after the decimal point. By default, this is
        ``6``.

      - For ``g`` and ``G``, specifies the maximum number of significant digits
        to be printed.

      - For ``s``, specifies the maximum number of characters to be printed. By
        default all characters are printed until the ending null character is
        encountered.

      - If the period is specified without an explicit value for precision,
        ``0`` is assumed.
   - ``.*``: The precision is not specified in the format string, but as an
     additional integer value argument preceding the argument that has to be
     formatted.
- Length (Optional)
   - ``hh``: Usable with ``d``, ``i``, ``o``, ``u``, ``x``, or ``X`` specifiers
     to convey the argument will be a ``signed char`` or ``unsigned char``.
     However, this is largely ignored in the implementation due to it not being
     necessary for Python or argument decoding (since the argument is always
     encoded at least as a 32-bit integer).
   - ``h``: Usable with ``d``, ``i``, ``o``, ``u``, ``x``, or ``X`` specifiers
     to convey the argument will be a ``signed short int`` or
     ``unsigned short int``. However, this is largely ignored in the
     implementation due to it not being necessary for Python or argument
     decoding (since the argument is always encoded at least as a 32-bit
     integer).
   - ``l``: Usable with ``d``, ``i``, ``o``, ``u``, ``x``, or ``X`` specifiers
     to convey the argument will be a ``signed long int`` or
     ``unsigned long int``. It is also usable with ``c`` and ``s`` to specify
     that the arguments will be encoded with ``wchar_t`` values (which isn't
     different from normal ``char`` values). However, this is largely ignored in
     the implementation due to it not being necessary for Python or argument
     decoding (since the argument is always encoded at least as a 32-bit
     integer).
   - ``ll``: Usable with ``d``, ``i``, ``o``, ``u``, ``x``, or ``X`` specifiers
     to convey the argument will be a ``signed long long int`` or
     ``unsigned long long int``. This is required to properly decode the
     argument as a 64-bit integer.
   - ``L``: Usable with ``a``, ``A``, ``e``, ``E``, ``f``, ``F``, ``g``, or
     ``G`` conversion specifiers to convey the argument will be a
     ``long double``. However, this is ignored in the implementation because
     the floating point value encoding is unaffected by bit width.
   - ``j``: Usable with ``d``, ``i``, ``o``, ``u``, ``x``, or ``X`` specifiers
     to convey the argument will be an ``intmax_t`` or ``uintmax_t``.
   - ``z``: Usable with ``d``, ``i``, ``o``, ``u``, ``x``, or ``X`` specifiers
     to convey the argument will be a ``size_t``. This will force the argument
     to be decoded as an unsigned integer.
   - ``t``: Usable with ``d``, ``i``, ``o``, ``u``, ``x``, or ``X`` specifiers
     to convey the argument will be a ``ptrdiff_t``.
   - If a length modifier is provided for an incorrect specifier, it is ignored.
- Specifier (Required)
   - ``d`` / ``i``: Used for signed decimal integers.

   - ``u``: Used for unsigned decimal integers.

   - ``o``: Used for unsigned integers and specifies formatting should be as an
     octal number.

   - ``x``: Used for unsigned integers and specifies formatting should be as a
     hexadecimal number using all lowercase letters.

   - ``X``: Used for unsigned integers and specifies formatting should be as a
     hexadecimal number using all uppercase letters.

   - ``f``: Used for floating-point values and specifies to use lowercase,
     decimal floating point formatting.

     - Default precision is ``6`` decimal places unless explicitly specified.

   - ``F``: Used for floating-point values and specifies to use uppercase,
     decimal floating point formatting.

     - Default precision is ``6`` decimal places unless explicitly specified.

   - ``e``: Used for floating-point values and specifies to use lowercase,
     exponential (scientific) formatting.

     - Default precision is ``6`` decimal places unless explicitly specified.

   - ``E``: Used for floating-point values and specifies to use uppercase,
     exponential (scientific) formatting.

     - Default precision is ``6`` decimal places unless explicitly specified.

   - ``g``: Used for floating-point values and specifies to use ``f`` or ``e``
     formatting depending on which would be the shortest representation.

     - Precision specifies the number of significant digits, not just digits
       after the decimal place.

     - If the precision is specified as ``0``, it is interpreted to mean ``1``.

     - ``e`` formatting is used if the exponent would be less than ``-4`` or
       is greater than or equal to the precision.

     - Trailing zeros are removed unless the ``#`` flag is set.

     - A decimal point only appears if it is followed by a digit.

     - ``NaN`` or infinities always follow ``f`` formatting.

   - ``G``: Used for floating-point values and specifies to use ``F`` or ``E``
     formatting depending on which would be the shortest representation.

     - Precision specifies the number of significant digits, not just digits
       after the decimal place.

     - If the precision is specified as ``0``, it is interpreted to mean ``1``.

     - ``E`` formatting is used if the exponent would be less than ``-4`` or
       is greater than or equal to the precision.

     - Trailing zeros are removed unless the ``#`` flag is set.

     - A decimal point only appears if it is followed by a digit.

     - ``NaN`` or infinities always follow ``F`` formatting.

   - ``c``: Used for formatting a ``char`` value.

   - ``s``: Used for formatting a string of ``char`` values.

     - If width is specified, the null terminator character is included as a
       character for width count.

     - If precision is specified, no more ``char``\s than that value will be
       written from the string (padding is used to fill additional width).

   - ``p``: Used for formatting a pointer address.

   - ``%``: Prints a single ``%``. Only valid as ``%%`` (supports no flags,
     width, precision, or length modifiers).

Underspecified details:

- If both ``+`` and (space) flags appear, the (space) is ignored.
- The ``+`` and (space) flags will error if used with ``c`` or ``s``.
- The ``#`` flag will error if used with ``d``, ``i``, ``u``, ``c``, ``s``, or
  ``p``.
- The ``0`` flag will error if used with ``c``, ``s``, or ``p``.
- Both ``+`` and (space) can work with the unsigned integer specifiers ``u``,
  ``o``, ``x``, and ``X``.
- If a length modifier is provided for an incorrect specifier, it is ignored.
- The ``z`` length modifier will decode arguments as signed as long as ``d`` or
  ``i`` is used.
- ``p`` is implementation defined.

  - For this implementation, the pointer value is printed with a ``0x`` prefix
    followed by the value formatted as ``%08X``.

  - ``p`` supports the ``+``, ``-``, and (space) flags, but not the ``#`` or
    ``0`` flags.

  - None of the length modifiers are usable with ``p``.

  - This implementation will try to adhere to user-specified width (assuming the
    width provided is larger than the guaranteed minimum of ``10``).

  - Specifying precision for ``p`` is considered an error.
- Only ``%%`` is allowed with no other modifiers. Things like ``%+%`` will fail
  to decode. Some C stdlib implementations accept modifiers between the two
  ``%`` characters but ignore them in the output.
- If a width is specified with the ``0`` flag for a negative value, the padded
  ``0``\s will appear after the ``-`` symbol.
- A precision of ``0`` for ``d``, ``i``, ``u``, ``o``, ``x``, or ``X`` means
  that no character is written for the value ``0``.
- Precision cannot be specified for ``c``.
- Using ``*`` or fixed precision with the ``s`` specifier still requires the
  string argument to be null-terminated. This is due to argument encoding
  happening on the C/C++-side while the precision value is not read or
  otherwise used until decoding happens in this Python code.

Non-conformant details:

- ``n`` specifier: We do not support the ``n`` specifier, since it is
  impossible to retroactively tell the original program how many characters
  were printed; decoding happens long after the device sent the message,
  usually on a separate processing device entirely.