:tocdepth: 3

.. _module-pw_tokenizer-detokenization:

==============
Detokenization
==============
.. pigweed-module-subpage::
   :name: pw_tokenizer

Detokenization is the process of expanding a token to the string it represents
and decoding its arguments. ``pw_tokenizer`` provides Python, C++, and
TypeScript detokenization libraries.

--------------------------------
Example: decoding tokenized logs
--------------------------------
A project might tokenize its log messages with the
:ref:`module-pw_tokenizer-base64-format`. Consider the following log file,
which has four tokenized logs and one plain text log:

.. code-block:: text

   20200229 14:38:58 INF $HL2VHA==
   20200229 14:39:00 DBG $5IhTKg==
   20200229 14:39:20 DBG Crunching numbers to calculate probability of success
   20200229 14:39:21 INF $EgFj8lVVAUI=
   20200229 14:39:23 ERR $DFRDNwlOT1RfUkVBRFk=

The project's log strings are stored in a database like the following:

.. code-block::

   1c95bd1c, ,"Initiating retrieval process for recovery object"
   2a5388e4, ,"Determining optimal approach and coordinating vectors"
   3743540c, ,"Recovery object retrieval failed with status %s"
   f2630112, ,"Calculated acceptable probability of success (%.2f%%)"

Using the detokenizing tools with the database, the logs can be decoded:

.. code-block:: text

   20200229 14:38:58 INF Initiating retrieval process for recovery object
   20200229 14:39:00 DBG Determining optimal approach and coordinating vectors
   20200229 14:39:20 DBG Crunching numbers to calculate probability of success
   20200229 14:39:21 INF Calculated acceptable probability of success (32.33%)
   20200229 14:39:23 ERR Recovery object retrieval failed with status NOT_READY

.. note::

   This example uses the :ref:`module-pw_tokenizer-base64-format`, which
   occupies about 4/3 (133%) as much space as the default binary format when
   encoded.
   For projects that wish to interleave tokenized messages with plain text,
   using Base64 is a worthwhile tradeoff.

------------------------
Detokenization in Python
------------------------
To detokenize in Python, import ``Detokenizer`` from the ``pw_tokenizer``
package, and instantiate it with paths to token databases or ELF files.

.. code-block:: python

   import pw_tokenizer

   detokenizer = pw_tokenizer.Detokenizer('path/to/database.csv', 'other/path.elf')

   def process_log_message(log_message):
       result = detokenizer.detokenize(log_message.payload)
       print(str(result))

The ``pw_tokenizer`` package also provides the ``AutoUpdatingDetokenizer``
class, which can be used in place of the standard ``Detokenizer``. This class
monitors database files for changes and automatically reloads them when they
change. This is helpful for long-running tools that use detokenization. The
class also supports filtering token domains for the given database files in
the ``<path>#<domain>`` format.

For messages that are optionally tokenized and may be encoded as binary,
Base64, or plaintext UTF-8, use
:func:`pw_tokenizer.proto.decode_optionally_tokenized`. This will attempt to
determine the correct method to detokenize and always provide a printable
string.

.. _module-pw_tokenizer-base64-decoding:

Decoding Base64
===============
The Python ``Detokenizer`` class supports decoding and detokenizing prefixed
Base64 messages with ``detokenize_base64`` and related methods.

.. tip::

   The Python detokenization tools support recursive detokenization for
   prefixed Base64 text. Tokenized strings found in detokenized text are
   detokenized, so prefixed Base64 messages can be passed as ``%s``
   arguments.

   For example, the tokenized string for "Wow!" is ``$RhYjmQ==``. This could
   be passed as an argument to the printf-style string
   ``Nested message: %s``, which encodes to ``$pEVTYQkkUmhZam1RPT0=``.
   The detokenizer would decode the message as follows:

   ::

      "$pEVTYQkkUmhZam1RPT0=" → "Nested message: $RhYjmQ==" → "Nested message: Wow!"

Base64 decoding is supported in C++ or C with the
``pw::tokenizer::PrefixedBase64Decode`` or ``pw_tokenizer_PrefixedBase64Decode``
functions.

Investigating undecoded Base64 messages
---------------------------------------
Tokenized messages cannot be decoded if the token is not recognized. The
Python package includes the ``parse_message`` tool, which parses tokenized
Base64 messages without looking up the token in a database. This tool
attempts to guess the types of the arguments and displays potential ways to
decode them.

This tool can be used to extract argument information from an otherwise
unusable message. It could help identify which statement in the code produced
the message. This tool is not particularly helpful for tokenized messages
without arguments, since all it can do is show the value of the unknown
token.

The tool is executed by passing Base64 tokenized messages, with or without
the ``$`` prefix, to ``pw_tokenizer.parse_message``. Pass ``-h`` or
``--help`` to see full usage information.

Example
^^^^^^^

.. code-block::

   $ python -m pw_tokenizer.parse_message '$329JMwA=' koSl524TRkFJTEVEX1BSRUNPTkRJVElPTgJPSw== --specs %s %d

   INF Decoding arguments for '$329JMwA='
   INF Binary: b'\xdfoI3\x00' [df 6f 49 33 00] (5 bytes)
   INF Token: 0x33496fdf
   INF Args: b'\x00' [00] (1 bytes)
   INF Decoding with up to 8 %s or %d arguments
   INF   Attempt 1: [%s]
   INF   Attempt 2: [%d] 0

   INF Decoding arguments for '$koSl524TRkFJTEVEX1BSRUNPTkRJVElPTgJPSw=='
   INF Binary: b'\x92\x84\xa5\xe7n\x13FAILED_PRECONDITION\x02OK' [92 84 a5 e7 6e 13 46 41 49 4c 45 44 5f 50 52 45 43 4f 4e 44 49 54 49 4f 4e 02 4f 4b] (28 bytes)
   INF Token: 0xe7a58492
   INF Args: b'n\x13FAILED_PRECONDITION\x02OK' [6e 13 46 41 49 4c 45 44 5f 50 52 45 43 4f 4e 44 49 54 49 4f 4e 02 4f 4b] (24 bytes)
   INF Decoding with up to 8 %s or %d arguments
   INF   Attempt 1: [%d %s %d %d %d] 55 FAILED_PRECONDITION 1 -40 -38
   INF   Attempt 2: [%d %s %s] 55 FAILED_PRECONDITION OK

.. _module-pw_tokenizer-protobuf-tokenization-python:

Detokenizing protobufs
======================
The :py:mod:`pw_tokenizer.proto` Python module defines functions that may be
used to detokenize protobuf objects in Python. The function
:py:func:`pw_tokenizer.proto.detokenize_fields` detokenizes all fields
annotated as tokenized, replacing them with their detokenized version. For
example:

.. code-block:: python

   my_detokenizer = pw_tokenizer.Detokenizer(some_database)

   my_message = SomeMessage(tokenized_field=b'$YS1EMQ==')
   pw_tokenizer.proto.detokenize_fields(my_detokenizer, my_message)

   assert my_message.tokenized_field == b'The detokenized string! Cool!'

Decoding optionally tokenized strings
-------------------------------------
The encoding used for an optionally tokenized field is not recorded in the
protobuf. Despite this, the text can reliably be decoded.
This is accomplished by attempting to decode the field as binary or Base64
tokenized data before treating it like plain text.

The following diagram describes the decoding process for optionally tokenized
fields in detail.

.. mermaid::

   flowchart TD
      start([Received bytes]) --> binary

      binary[Decode as<br>binary tokenized] --> binary_ok

      binary_ok{Detokenizes<br>successfully?} -->|no| utf8
      binary_ok -->|yes| done_binary([Display decoded binary])

      utf8[Decode as UTF-8] --> utf8_ok

      utf8_ok{Valid UTF-8?} -->|no| base64_encode
      utf8_ok -->|yes| base64

      base64_encode[Encode as<br>tokenized Base64] --> display

      display([Display encoded Base64])

      base64[Decode as<br>Base64 tokenized] --> base64_ok

      base64_ok{Fully<br>or partially<br>detokenized?} -->|no| is_plain_text
      base64_ok -->|yes| base64_results

      is_plain_text{Text is<br>printable?} -->|no| base64_encode
      is_plain_text -->|yes| plain_text

      base64_results([Display decoded Base64])
      plain_text([Display text])

Potential decoding problems
---------------------------
The decoding process for optionally tokenized fields will yield correct
results in almost every situation. In rare circumstances, it is possible for
it to fail, but these failures can be avoided with a low-overhead mitigation
if desired.

There are two ways in which the decoding process may fail.

Accidentally interpreting plain text as tokenized binary
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
If a plain-text string happens to decode as a binary tokenized message, the
incorrect message could be displayed. This is very unlikely to occur. While
many tokens will incidentally end up being valid UTF-8 strings, it is highly
unlikely that a device will happen to log one of these strings as plain text.
The overwhelming majority of these strings will be nonsense.
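
To make the ambiguity concrete, here is a quick sketch in plain Python (it
deliberately avoids the ``pw_tokenizer`` API): the same four bytes can be
read either as plain text or as the little-endian 32-bit token that begins
every binary tokenized message.

.. code-block:: python

   import base64
   import struct

   # The plain-text bytes "a-D1" are indistinguishable from a binary
   # tokenized message with no arguments: the first four bytes of any
   # tokenized message are a little-endian 32-bit token.
   message = b'a-D1'
   token = struct.unpack('<I', message)[0]
   assert token == 0x31442D61  # A plausible-looking token value.

   # Base64-encoding the same bytes yields the prefixed form used elsewhere
   # in this document.
   assert '$' + base64.b64encode(message).decode() == '$YS1EMQ=='
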

If an implementation wishes to guard against this extremely improbable
situation, it can be prevented by appending 0xFF (or another byte that is
never valid in UTF-8) to binary tokenized data that happens to be valid UTF-8
(or to all binary tokenized messages, if desired). When decoding, if there is
an extra 0xFF byte, it is discarded.

Displaying undecoded binary as plain text instead of Base64
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
If a message fails to decode as binary tokenized and it is not valid UTF-8,
it is displayed as tokenized Base64. This makes it easily recognizable as a
tokenized message and simple to decode later from the text output (for
example, with an updated token database).

A binary message for which the token is not known may coincidentally be valid
UTF-8 or ASCII: 6.25% of 4-byte sequences are composed only of ASCII
characters. When decoding with an out-of-date token database, it is possible
that some binary tokenized messages will be displayed as plain text rather
than tokenized Base64.

This situation can occur, but should be infrequent. Even if it does happen,
it is not a serious issue. A very small number of strings will be displayed
incorrectly, but these strings cannot be decoded anyway. One nonsense string
(e.g. ``a-D1``) would be displayed instead of another (``$YS1EMQ==``).
Updating the token database would resolve the issue, though the non-Base64
logs would be difficult to decode later from a log file.

This situation can be avoided with the same approach described in
`Accidentally interpreting plain text as tokenized binary`_. Appending an
invalid UTF-8 character prevents the undecoded binary message from being
interpreted as plain text.
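
As a rough illustration of this mitigation, consider the following
plain-Python sketch. The helper names are hypothetical and are not part of
the ``pw_tokenizer`` API:

.. code-block:: python

   def guard_tokenized(data: bytes) -> bytes:
       # Append 0xFF, a byte that never appears in valid UTF-8, so binary
       # tokenized data that happens to be valid UTF-8 cannot be mistaken
       # for plain text. (A simpler variant appends 0xFF to every message.)
       try:
           data.decode('utf-8')
       except UnicodeDecodeError:
           return data  # Already invalid UTF-8; no guard byte needed.
       return data + b'\xff'

   def unguard_tokenized(data: bytes) -> bytes:
       # Discard the extra 0xFF byte, if present, before detokenizing.
       return data[:-1] if data.endswith(b'\xff') else data

   # Tokenized bytes that look like text receive the guard byte...
   assert guard_tokenized(b'a-D1') == b'a-D1\xff'
   assert unguard_tokenized(guard_tokenized(b'a-D1')) == b'a-D1'
   # ...while bytes that are already invalid UTF-8 are left untouched.
   assert guard_tokenized(b'\x92\x84\xa5\xe7') == b'\x92\x84\xa5\xe7'
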

---------------------
Detokenization in C++
---------------------
The C++ detokenization libraries can be used in C++ or any language that can
call into C++ with a C-linkage wrapper, such as Java or Rust. A reference
Java Native Interface (JNI) implementation is provided.

The C++ detokenization library uses binary-format token databases (created
with ``database.py create --type binary``). Read a binary-format database
from a file or include it in the source code. Pass the database array to
``TokenDatabase::Create``, and construct a detokenizer.

.. code-block:: cpp

   Detokenizer detokenizer(TokenDatabase::Create(token_database_array));

   std::string ProcessLog(span<uint8_t> log_data) {
     return detokenizer.Detokenize(log_data).BestString();
   }

The ``TokenDatabase`` class verifies that its data is valid before using it.
If the data is invalid, ``TokenDatabase::Create`` returns an empty database
for which ``ok()`` returns false. If the token database is included in the
source code, this check can be done at compile time.

.. code-block:: cpp

   // This line fails to compile with a static_assert if the database is invalid.
   constexpr TokenDatabase kDefaultDatabase = TokenDatabase::Create<kData>();

   Detokenizer OpenDatabase(std::string_view path) {
     std::vector<uint8_t> data = ReadWholeFile(path);

     TokenDatabase database = TokenDatabase::Create(data);

     // This checks if the file contained a valid database. It is safe to use a
     // TokenDatabase that failed to load (it will be empty), but it may be
     // desirable to provide a default database or otherwise handle the error.
     if (database.ok()) {
       return Detokenizer(database);
     }
     return Detokenizer(kDefaultDatabase);
   }

----------------------------
Detokenization in TypeScript
----------------------------
To detokenize in TypeScript, import ``Detokenizer`` from the ``pigweedjs``
package, and instantiate it with a CSV token database.

.. code-block:: typescript

   import { pw_tokenizer, pw_hdlc } from 'pigweedjs';
   const { Detokenizer } = pw_tokenizer;
   const { Frame } = pw_hdlc;

   const detokenizer = new Detokenizer(String(tokenCsv));

   function processLog(frame: Frame) {
     const result = detokenizer.detokenize(frame);
     console.log(result);
   }

For messages that are encoded in Base64, use ``Detokenizer.detokenizeBase64``,
which also attempts to detokenize nested Base64 tokens. There is also
``detokenizeUint8Array``, which works just like ``detokenize`` but expects a
``Uint8Array`` instead of a ``Frame`` argument.

.. _module-pw_tokenizer-cli-detokenizing:

---------------------
Detokenizing CLI tool
---------------------
``pw_tokenizer`` provides two standalone command-line utilities for
detokenizing Base64-encoded tokenized strings.

* ``detokenize.py`` -- Detokenizes Base64-encoded strings in files or from
  stdin.
* ``serial_detokenizer.py`` -- Detokenizes Base64-encoded strings from a
  connected serial device.

If the ``pw_tokenizer`` Python package is installed, these tools may be
executed as runnable modules. For example:

.. code-block::

   # Detokenize Base64-encoded strings in a file
   python -m pw_tokenizer.detokenize -i input_file.txt

   # Detokenize Base64-encoded strings in output from a serial device
   python -m pw_tokenizer.serial_detokenizer --device /dev/ttyACM0

See the ``--help`` options for these tools for full usage information.

--------
Appendix
--------

.. _module-pw_tokenizer-python-detokenization-c99-printf-notes:

Python detokenization: C99 ``printf`` compatibility notes
=========================================================
This implementation is designed to align with the
`C99 specification, section 7.19.6
<https://www.dii.uchile.cl/~daespino/files/Iso_C_1999_definition.pdf>`_.
Notably, this specification is slightly different from what most compilers
implement, because each compiler interprets undefined behavior in slightly
different ways. Treat the following description as the source of truth.

This implementation supports:

- Overall Format: ``%[flags][width][.precision][length][specifier]``

- Flags (Zero or More)

  - ``-``: Left-justify within the given field width; right justification is
    the default (see the width modifier).

  - ``+``: Forces the result to be preceded by a plus or minus sign (``+`` or
    ``-``) even for positive numbers. By default, only negative numbers are
    preceded by a ``-`` sign.

  - (space): If no sign is going to be written, a blank space is inserted
    before the value.

  - ``#``: Specifies that an alternative print syntax should be used.

    - Used with the ``o``, ``x`` or ``X`` specifiers, the value is preceded
      with ``0``, ``0x`` or ``0X``, respectively, for values other than zero.

    - Used with ``a``, ``A``, ``e``, ``E``, ``f``, ``F``, ``g``, or ``G``, it
      forces the written output to contain a decimal point even if no more
      digits follow. By default, if no digits follow, no decimal point is
      written.

  - ``0``: Left-pads the number with zeroes (``0``) instead of spaces when
    padding is specified (see the width sub-specifier).

- Width (Optional)

  - ``(number)``: Minimum number of characters to be printed. If the value to
    be printed is shorter than this number, the result is padded with blank
    spaces, or with ``0`` if the ``0`` flag is present.
    The value is not truncated even if the result is larger. If the value is
    negative and the ``0`` flag is present, the ``0``\s are padded after the
    ``-`` symbol.

  - ``*``: The width is not specified in the format string, but as an
    additional integer value argument preceding the argument that has to be
    formatted.

- Precision (Optional)

  - ``.(number)``

    - For ``d``, ``i``, ``o``, ``u``, ``x``, ``X``: specifies the minimum
      number of digits to be written. If the value to be written is shorter
      than this number, the result is padded with leading zeros. The value is
      not truncated even if the result is longer.

    - A precision of ``0`` means that no character is written for the value
      ``0``.

    - For ``a``, ``A``, ``e``, ``E``, ``f``, and ``F``: specifies the number
      of digits to be printed after the decimal point. By default, this is
      ``6``.

    - For ``g`` and ``G``: specifies the maximum number of significant digits
      to be printed.

    - For ``s``: specifies the maximum number of characters to be printed. By
      default, all characters are printed until the terminating null
      character is encountered.

    - If the period is specified without an explicit value for the precision,
      ``0`` is assumed.

  - ``.*``: The precision is not specified in the format string, but as an
    additional integer value argument preceding the argument that has to be
    formatted.

- Length (Optional)

  - ``hh``: Usable with the ``d``, ``i``, ``o``, ``u``, ``x``, or ``X``
    specifiers to convey that the argument will be a ``signed char`` or
    ``unsigned char``. However, this is largely ignored in the implementation
    because it is not necessary for Python or argument decoding (the argument
    is always encoded as at least a 32-bit integer).

  - ``h``: Usable with the ``d``, ``i``, ``o``, ``u``, ``x``, or ``X``
    specifiers to convey that the argument will be a ``signed short int`` or
    ``unsigned short int``.
    However, this is largely ignored in the implementation because it is not
    necessary for Python or argument decoding (the argument is always encoded
    as at least a 32-bit integer).

  - ``l``: Usable with the ``d``, ``i``, ``o``, ``u``, ``x``, or ``X``
    specifiers to convey that the argument will be a ``signed long int`` or
    ``unsigned long int``. It is also usable with ``c`` and ``s`` to specify
    that the arguments will be encoded with ``wchar_t`` values (which is no
    different from normal ``char`` values). However, this is largely ignored
    in the implementation because it is not necessary for Python or argument
    decoding (the argument is always encoded as at least a 32-bit integer).

  - ``ll``: Usable with the ``d``, ``i``, ``o``, ``u``, ``x``, or ``X``
    specifiers to convey that the argument will be a ``signed long long int``
    or ``unsigned long long int``. This is required to properly decode the
    argument as a 64-bit integer.

  - ``L``: Usable with the ``a``, ``A``, ``e``, ``E``, ``f``, ``F``, ``g``,
    or ``G`` conversion specifiers; applies to a ``long double`` argument.
    However, this is ignored in the implementation because the floating-point
    value encoding is unaffected by bit width.

  - ``j``: Usable with the ``d``, ``i``, ``o``, ``u``, ``x``, or ``X``
    specifiers to convey that the argument will be an ``intmax_t`` or
    ``uintmax_t``.

  - ``z``: Usable with the ``d``, ``i``, ``o``, ``u``, ``x``, or ``X``
    specifiers to convey that the argument will be a ``size_t``. This forces
    the argument to be decoded as an unsigned integer.

  - ``t``: Usable with the ``d``, ``i``, ``o``, ``u``, ``x``, or ``X``
    specifiers to convey that the argument will be a ``ptrdiff_t``.

  - If a length modifier is provided for an incorrect specifier, it is
    ignored.

- Specifier (Required)

  - ``d`` / ``i``: Used for signed decimal integers.

  - ``u``: Used for unsigned decimal integers.

  - ``o``: Used for unsigned integers and specifies formatting as an octal
    number.

  - ``x``: Used for unsigned integers and specifies formatting as a
    hexadecimal number using all lowercase letters.

  - ``X``: Used for unsigned integers and specifies formatting as a
    hexadecimal number using all uppercase letters.

  - ``f``: Used for floating-point values and specifies lowercase, decimal
    floating-point formatting.

    - The default precision is ``6`` decimal places unless explicitly
      specified.

  - ``F``: Used for floating-point values and specifies uppercase, decimal
    floating-point formatting.

    - The default precision is ``6`` decimal places unless explicitly
      specified.

  - ``e``: Used for floating-point values and specifies lowercase,
    exponential (scientific) formatting.

    - The default precision is ``6`` decimal places unless explicitly
      specified.

  - ``E``: Used for floating-point values and specifies uppercase,
    exponential (scientific) formatting.

    - The default precision is ``6`` decimal places unless explicitly
      specified.

  - ``g``: Used for floating-point values and specifies ``f`` or ``e``
    formatting, whichever is the shortest representation.

    - Precision specifies the number of significant digits, not just the
      digits after the decimal place.

    - If the precision is specified as ``0``, it is interpreted to mean
      ``1``.

    - ``e`` formatting is used if the exponent would be less than ``-4`` or
      is greater than or equal to the precision.

    - Trailing zeros are removed unless the ``#`` flag is set.

    - A decimal point only appears if it is followed by a digit.

    - ``NaN`` or infinities always follow ``f`` formatting.

  - ``G``: Used for floating-point values and specifies ``F`` or ``E``
    formatting, whichever is the shortest representation.

    - Precision specifies the number of significant digits, not just the
      digits after the decimal place.

    - If the precision is specified as ``0``, it is interpreted to mean
      ``1``.

    - ``E`` formatting is used if the exponent would be less than ``-4`` or
      is greater than or equal to the precision.

    - Trailing zeros are removed unless the ``#`` flag is set.

    - A decimal point only appears if it is followed by a digit.

    - ``NaN`` or infinities always follow ``F`` formatting.

  - ``c``: Used for formatting a ``char`` value.

  - ``s``: Used for formatting a string of ``char`` values.

    - If a width is specified, the null terminator character is included as a
      character for the width count.

    - If a precision is specified, no more ``char``\s than that value will be
      written from the string (padding is used to fill additional width).

  - ``p``: Used for formatting a pointer address.

  - ``%``: Prints a single ``%``. Only valid as ``%%`` (supports no flags,
    width, precision, or length modifiers).

Underspecified details:

- If both the ``+`` and (space) flags appear, the (space) is ignored.
- The ``+`` and (space) flags will error if used with ``c`` or ``s``.
- The ``#`` flag will error if used with ``d``, ``i``, ``u``, ``c``, ``s``,
  or ``p``.
- The ``0`` flag will error if used with ``c``, ``s``, or ``p``.
- Both ``+`` and (space) can work with the unsigned integer specifiers
  ``u``, ``o``, ``x``, and ``X``.
- If a length modifier is provided for an incorrect specifier, it is ignored.
- The ``z`` length modifier will decode arguments as signed as long as ``d``
  or ``i`` is used.
- ``p`` is implementation defined.

  - For this implementation, it prints a ``0x`` prefix followed by the
    pointer value formatted with ``%08X``.

  - ``p`` supports the ``+``, ``-``, and (space) flags, but not the ``#`` or
    ``0`` flags.

  - None of the length modifiers are usable with ``p``.

  - This implementation will try to adhere to a user-specified width
    (assuming the width provided is larger than the guaranteed minimum of
    ``10``).

  - Specifying precision for ``p`` is considered an error.

- Only ``%%`` is allowed with no other modifiers. Things like ``%+%`` will
  fail to decode. Some C stdlib implementations accept modifiers between the
  two ``%`` characters, but ignore them in the output.
- If a width is specified with the ``0`` flag for a negative value, the
  padded ``0``\s will appear after the ``-`` symbol.
- A precision of ``0`` for ``d``, ``i``, ``u``, ``o``, ``x``, or ``X`` means
  that no character is written for the value ``0``.
- Precision cannot be specified for ``c``.
- Using ``*`` or a fixed precision with the ``s`` specifier still requires
  the string argument to be null-terminated. This is because argument
  encoding happens on the C/C++ side, while the precision value is not read
  or otherwise used until decoding happens in this Python code.

Non-conformant details:

- ``n`` specifier: The ``n`` specifier is not supported, since it is
  impossible to retroactively tell the original program how many characters
  have been printed; decoding happens long after the device sent the message,
  usually on a separate processing device entirely.
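
Because Python's built-in ``%`` operator implements most of the same C99
rules, several of the behaviors listed above can be spot-checked directly in
Python. Note that this is only an informal illustration: Python's formatter
is not the ``pw_tokenizer`` decoder, and it diverges from C99 in a few places
(for example, ``%#o`` produces a ``0o`` prefix).

.. code-block:: python

   # '+' forces a sign; (space) inserts a blank for non-negative values.
   assert '%+d' % 42 == '+42'
   assert '% d' % 42 == ' 42'

   # With the '0' flag, the zeros are padded after the '-' symbol.
   assert '%05d' % -42 == '-0042'

   # '#' with 'x' prefixes nonzero values with '0x'.
   assert '%#x' % 255 == '0xff'

   # '#' with a float forces the decimal point even with no digits after it.
   assert '%#.0f' % 3.0 == '3.'

   # Precision with 's' caps the number of characters written.
   assert '%.3s' % 'hello' == 'hel'

   # Width pads with spaces; a value is never truncated by width alone.
   assert '%6.2f' % 3.14159 == '  3.14'
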