.. _module-pw_snapshot-setup:

==============================
Setting up a Snapshot Pipeline
==============================

-------------------
Crash Handler Setup
-------------------
The Snapshot proto was designed first and foremost as a crash reporting format.
This section covers how to set up a crash handler to capture Snapshots.

.. image:: images/generic_crash_flow.svg
  :width: 600
  :alt: Generic crash handler flow

A typical crash handler has two entry points:

1. A software entry path through developer-written ASSERT() or CHECK() calls
   that indicate a device should go down for a crash if a condition is not met.
2. A hardware-triggered exception handler path that is initiated when a CPU
   encounters a fault signal (invalid memory access, bad instruction, etc.).

Before deferring to a common crash handler, these entry paths should disable
interrupts to force the system into a single-threaded execution mode. This
prevents other threads from operating on potentially bad data or clobbering
system state that could be useful for debugging.

The first step in a crash handler should always be a check for nested crashes to
prevent infinitely recursive crashes. Once it's deemed safe to continue,
the crash handler can re-initialize logging, initialize storage for crash report
capture, and then build a snapshot to later be retrieved from the device. Once
the crash report collection process is complete, some post-crash callbacks can
be run on a best-effort basis to clean up the system before rebooting. For
devices with debug port access, it's helpful to optionally hold the device in
an infinite loop rather than resetting to allow developers to access the device
via a hardware debugger.

Assert Handler Setup
====================
:ref:`pw_assert <module-pw_assert>` is Pigweed's entry point for software
crashes.
Route any existing assert functions through pw_assert to centralize the
software crash path. You’ll need to create a :ref:`pw_assert backend
<module-pw_assert-backend_api>` or a custom :ref:`pw_assert_basic handler
<module-pw_assert_basic-custom_handler>` to pass collected information to a more
sophisticated crash handler. One way to do this is to collect the data into a
statically allocated struct that is passed to a common crash handler. It’s
important to immediately disable interrupts to prevent the system from doing
other things while in an impacted state.

.. code-block:: cpp

  // This can be directly accessed by a crash handler.
  static CrashData crash_data;
  extern "C" void pw_assert_basic_HandleFailure(const char* file_name,
                                                int line_number,
                                                const char* format,
                                                ...) {
    // Always disable interrupts first! How this is done depends
    // on your platform.
    __disable_irq();

    va_list args;
    va_start(args, format);
    crash_data.file_name = file_name;
    crash_data.line_number = line_number;
    crash_data.reason_fmt = format;
    crash_data.reason_args = &args;
    crash_data.cpu_state = nullptr;

    HandleCrash(crash_data);
    PW_UNREACHABLE;
  }

Exception Handler Setup
=======================
:ref:`pw_cpu_exception <module-pw_cpu_exception>` is Pigweed's recommended entry
point for CPU-triggered faults (divide by zero, invalid memory access, etc.).
You will need to provide a definition for pw_cpu_exception_DefaultHandler() that
passes the exception state produced by pw_cpu_exception to your common crash
handler.

.. code-block:: cpp

  static CrashData crash_data;

  // This helper turns a format string to a va_list that can be used by the
  // common crash handling path.
  void HandleExceptionWithString(pw_cpu_exception_State& state,
                                 const char* fmt,
                                 ...) {
    va_list args;
    va_start(args, fmt);
    // CrashData stores a pointer, so take the address of the reference.
    crash_data.cpu_state = &state;
    crash_data.file_name = nullptr;
    crash_data.reason_fmt = fmt;
    crash_data.reason_args = &args;

    HandleCrash(crash_data);
    PW_UNREACHABLE;
  }

  extern "C" void pw_cpu_exception_DefaultHandler(
      pw_cpu_exception_State* state) {
    // Always disable interrupts first! How this is done depends
    // on your platform.
    __disable_irq();

    // The CFSR is an extremely useful register for understanding ARMv7-M and
    // ARMv8-M CPU faults. Other architectures should put something else here.
    HandleExceptionWithString(*state,
                              "Exception encountered, cfsr=0x%08x",
                              static_cast<unsigned int>(state->extended.cfsr));
  }

Common Crash Handler Setup
==========================
To minimize duplication of crash handling logic, it's good practice to route the
pw_assert and pw_cpu_exception handlers to a common crash handling codepath.
Ensure you can pass both pw_cpu_exception's CPU state and pw_assert's assert
information to the shared handler.

.. code-block:: cpp

  struct CrashData {
    pw_cpu_exception_State* cpu_state;
    const char* reason_fmt;
    // Non-const so the list can be consumed by vsnprintf() and friends.
    va_list* reason_args;
    const char* file_name;
    int line_number;
  };

  // This function assumes interrupts are properly disabled BEFORE it is called.
  [[noreturn]] void HandleCrash(CrashData& crash_info) {
    // Handle crash
  }

In the crash handler your project can re-initialize a minimal subset of the
system needed to safely capture a snapshot before rebooting the device. The
remainder of this section focuses on ways you can improve the reliability and
usability of your project's crash handler.

Check for Nested Crashes
------------------------
It’s important to include crash handler checks that prevent infinite recursive
nesting of crashes. Maintain a static variable that tracks the crash nesting
depth.
After one or two nested crashes, abort crash handling entirely and reset
the device or sit in an infinite loop to wait for a hardware debugger to attach.
It’s simpler to put this logic at the beginning of the shared crash handler, but
if your assert/exception handlers are complex it might be safer to inject the
checks earlier in both codepaths.

.. code-block:: cpp

  [[noreturn]] void HandleCrash(CrashData &crash_info) {
    static size_t crash_depth = 0;
    if (crash_depth > kMaxCrashDepth) {
      Abort(/*run_callbacks=*/false);
    }
    crash_depth++;
    ...
  }

Re-initialize Logging (Optional)
--------------------------------
Logging can be helpful for debugging your crash handler, but depending on your
device/system design may be challenging to safely support at crash time. To
re-initialize logging, you’ll need to re-construct C++ objects and re-initialize
any systems/hardware in the logging codepath. You may even need an entirely
separate logging pipeline that is single-threaded and interrupt-safe. Depending
on your system’s design, this may be difficult to set up.

Reinitialize Dependencies
-------------------------
It's good practice to design a crash handler that can run before C++ static
constructors have run. This means any initialization (whether manual or through
constructors) that your crash handler depends on should be manually invoked at
crash time. If an initialization step might not be safe, evaluate if it's
possible to omit the dependency.

System Cleanup
--------------
After collecting a snapshot, some parts of your system may benefit from some
cleanup before explicitly resetting a device. This might include flushing
buffers or safely shutting down attached hardware.
The order of shutdown should
be deterministic, keeping in mind that any of these steps could cause a nested
crash that skips the remainder of the handlers and forces the device to reset
immediately.

----------------------
Snapshot Storage Setup
----------------------
Use a storage class with a ``pw::stream::Writer`` interface to simplify
capturing a pw_snapshot proto. This can be a :ref:`pw::BlobStore
<module-pw_blob_store>`, an in-memory buffer that is flushed to flash, or a
:ref:`pw::PersistentBuffer <module-pw_persistent_ram-persistent_buffer>` that
lives in persistent memory. It's good practice to use lazy initialization for
storage objects used by your Snapshot capture codepath.

.. code-block:: cpp

  // Persistent RAM objects are highly available. They don't rely on
  // their constructor being run, and require no initialization.
  PW_PLACE_IN_SECTION(".noinit")
  pw::persistent_ram::PersistentBuffer<2048> persistent_snapshot;

  void CaptureSnapshot(CrashData& crash_info) {
    ...
    persistent_snapshot.clear();
    // Hold the writer by value so it stays valid for the rest of the function.
    auto writer = persistent_snapshot.GetWriter();
    ...
  }

----------------------
Snapshot Capture Setup
----------------------

.. note::

  These instructions do not yet use the ``pw::protobuf::StreamEncoder``.

Capturing a snapshot is as simple as encoding any other proto message. Some
modules provide helper functions that populate parts of a Snapshot, which
reduces the custom work each project must set up.

Capture Reason
==============
A snapshot's "reason" should be considered the single most important field in a
captured snapshot. If a snapshot capture was triggered by a crash, this should
be the assert string.
Other entry paths should describe here why the snapshot
was captured ("Host communication buffer full!", "Exception encountered at
0x00000004", etc.).

.. code-block:: cpp

  Status CaptureSnapshot(CrashData& crash_info) {
    // Temporary buffer for encoding "reason" to.
    static std::byte temp_buffer[500];
    // Temporary buffer to encode serialized proto to before dumping to the
    // final ``pw::stream::Writer``.
    static std::byte proto_encode_buffer[512];
    ...
    pw::protobuf::NestedEncoder<kMaxDepth> proto_encoder(proto_encode_buffer);
    pw::snapshot::Snapshot::Encoder snapshot_encoder(&proto_encoder);
    // The reason arrives as a format string plus a va_list, so use vsnprintf().
    const int written = vsnprintf(reinterpret_cast<char*>(temp_buffer),
                                  sizeof(temp_buffer),
                                  crash_info.reason_fmt,
                                  *crash_info.reason_args);
    // vsnprintf() returns the untruncated length (or a negative value on
    // error), so clamp it to what actually fits in the buffer.
    size_t length = written < 0 ? 0 : static_cast<size_t>(written);
    if (length >= sizeof(temp_buffer)) {
      length = sizeof(temp_buffer) - 1;
    }
    snapshot_encoder.WriteReason(temp_buffer, length);

    // Final encode and write.
    Result<ConstByteSpan> encoded_proto = proto_encoder.Encode();
    PW_TRY(encoded_proto.status());
    PW_TRY(writer.Write(encoded_proto.value()));
    ...
  }

Capture CPU State
=================
When using pw_cpu_exception, exceptions will automatically collect CPU state
that can be directly dumped into a snapshot. As it's not always easy to describe
a CPU exception in a single "reason" string, this captures the information
needed to automatically generate a more detailed reason at analysis time, once
the snapshot is retrieved from the device.

.. code-block:: cpp

  Status CaptureSnapshot(CrashData& crash_info) {
    ...

    proto_encoder.clear();

    // Write CPU state.
    if (crash_info.cpu_state) {
      PW_TRY(DumpCpuStateProto(snapshot_encoder.GetArmv7mCpuStateEncoder(),
                               *crash_info.cpu_state));

      // Final encode and write.
      Result<ConstByteSpan> encoded_proto = proto_encoder.Encode();
      PW_TRY(encoded_proto.status());
      PW_TRY(writer.Write(encoded_proto.value()));
    }

    return OkStatus();
  }

-----------------------
Snapshot Transfer Setup
-----------------------
Pigweed’s pw_rpc system is well suited for retrieving a snapshot from a device.
Pigweed does not yet provide a generalized transfer service for moving files
to/from a device. When this feature is added to Pigweed, this section will be
updated to include guidance for connecting a storage system to a transfer
service.

----------------------
Snapshot Tooling Setup
----------------------
When using the upstream ``Snapshot`` proto, you can directly use
``pw_snapshot.process`` to process snapshots into human-readable dumps. If
you've opted to extend Pigweed's snapshot proto, you'll likely want to extend
the processing tooling to handle custom project data as well. This can be done
by creating a light wrapper around
``pw_snapshot.processor.process_snapshots()``.

.. code-block:: python

  import pw_snapshot.processor

  # wheel_state_pb2 is this example's project-specific generated proto module.


  def _process_hw_failures(serialized_snapshot: bytes) -> str:
      """Custom handler that checks wheel state."""
      wheel_state = wheel_state_pb2.WheelStateSnapshot()
      output = []
      wheel_state.ParseFromString(serialized_snapshot)

      if len(wheel_state.wheels) != 2:
          output.append(f'Expected 2 wheels, found {len(wheel_state.wheels)}')

      if len(wheel_state.wheels) < 2:
          output.append('Wheels fell off!')

      # And more...

      return '\n'.join(output)


  def process_my_snapshots(serialized_snapshot: bytes) -> str:
      """Runs the snapshot processor with a custom callback."""
      return pw_snapshot.processor.process_snapshots(
          serialized_snapshot, user_processing_callback=_process_hw_failures)
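
The wrapper above depends on generated proto code and a snapshot captured from
a device, so it can be awkward to try out in isolation. The following is a
minimal, self-contained sketch of the same callback pattern under stated
assumptions: ``fake_process_snapshots()`` is a hypothetical stand-in for
``pw_snapshot.processor.process_snapshots()`` (it is not Pigweed's
implementation and does not decode protos), and ``check_wheel_count()`` is a
hypothetical callback that treats the first byte of the input as a wheel count.

.. code-block:: python

  from typing import Callable, Optional


  def fake_process_snapshots(
      serialized_snapshot: bytes,
      user_processing_callback: Optional[Callable[[bytes], str]] = None,
  ) -> str:
      """Hypothetical stand-in for the real processor (an assumption)."""
      output = [f'Snapshot size: {len(serialized_snapshot)} bytes']
      if user_processing_callback is not None:
          # Append project-specific output after the generic dump.
          output.append(user_processing_callback(serialized_snapshot))
      return '\n'.join(output)


  def check_wheel_count(serialized_snapshot: bytes) -> str:
      """Hypothetical callback: treats the first byte as a wheel count."""
      wheel_count = serialized_snapshot[0] if serialized_snapshot else 0
      if wheel_count < 2:
          return f'Expected 2 wheels, found {wheel_count}. Wheels fell off!'
      return 'Wheel count OK'


  print(fake_process_snapshots(b'\x01\x00', check_wheel_count))

Keeping project-specific checks in a callback that takes the raw serialized
bytes and returns a string means they can be unit tested on their own, without
invoking the full snapshot processor.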