# Breakpad Processor Library

## Objective

The Breakpad processor library is an open-source framework for accessing the
information contained within crash dumps for multiple platforms, and for using
that information to produce stack traces showing the call chain of each thread
in a process. After processing, this data is made available to users of the
library.

## Background

The Breakpad processor is intended to sit at the core of a comprehensive
crash-reporting system that does not require debugging information to be
provided to those running applications being monitored. Some existing
crash-reporting systems, such as [GNOME](http://www.gnome.org/)’s Bug-Buddy and
[Apple](http://www.apple.com/)’s
[CrashReporter](http://developer.apple.com/technotes/tn2004/tn2123.html),
require symbolic information to be present on the end user’s computer; in the
case of CrashReporter, the reports are transmitted only to Apple, not to
third-party developers. Other systems, such as
[Microsoft](http://www.microsoft.com/)’s
[Windows Error Reporting](http://msdn.microsoft.com/isv/resources/wer/) and
SupportSoft’s Talkback, transmit only a snapshot of a crashed process’ state,
which can later be combined with symbolic debugging information without the need
for it to be present on end users’ computers. Because symbolic debugging
information consumes a large amount of space and is otherwise not needed during
the normal operation of software, and because some developers are reluctant to
release debugging symbols to their customers, Breakpad follows the latter
approach.

We know of no currently maintained crash-reporting systems that meet our
requirements, which are to:

*   allow for symbols to be separate from the application,
*   handle crash reports from multiple platforms,
*   allow developers to operate their own crash-reporting platform, and to
*   be open-source.

Windows Error Reporting only functions for Microsoft products, and requires the
involvement of Microsoft’s servers. Talkback, while cross-platform, has not been
maintained and at this point does not support Mac OS X on x86, which we consider
to be a significant platform. Talkback is also closed-source commercial
software, and has very specific requirements for its server platform.

We are aware of Windows-only crash-reporting systems that leverage Microsoft’s
debugging interfaces. Such systems, even if extended to support dumps from other
platforms, are tied to using Windows for at least a portion of the processor
platform.

## Overview

The Breakpad processor itself is written in standard C++ and will work on a
variety of platforms. The dumps it accepts may also have been created on a
variety of systems. The library is able to combine dumps with symbolic debugging
information to create stack traces that include function signatures. The
processor library includes simple command-line tools to examine dumps and
process them, producing stack traces. It also exposes several layers of APIs
enabling crash-reporting systems to be built around the Breakpad processor.

## Detailed Design

### Dump Files

In the processor, the dump data is of primary significance. Dumps typically
contain:

*   CPU context (register data) as it was at the time the crash occurred, and an
    indication of which thread caused the crash. General-purpose registers are
    included, as are special-purpose registers such as the instruction pointer
    (program counter).
*   Information about each thread of execution within a crashed process,
    including:
    *   The memory region used for each thread’s stack.
    *   CPU context for each thread, which for various reasons is not the same
        as the crash context in the case of the crashed thread.
*   A list of loaded code segments (or modules), including:
    *   The name of the file (`.so`, `.exe`, `.dll`, etc.) which provides the
        code.
    *   The boundaries of the memory region in which the code segment is visible
        to the process.
    *   A reference to the debugging information for the code module, when such
        information is available.
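The module boundaries in this list are what let the processor attribute an instruction address found on a stack to the module that supplied it. The sketch below models that lookup under the assumption that the module list has already been parsed; `ModuleRange` and `FindModuleForAddress` are hypothetical names for illustration, not Breakpad’s own interfaces.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Hypothetical, simplified model of one entry in a dump's module list.
struct ModuleRange {
  std::string code_file;  // e.g. "libfoo.so" or "foo.dll"
  uint64_t base_address;  // start of the region where the module is mapped
  uint64_t size;          // length of that region in bytes
};

// Returns the module whose [base_address, base_address + size) range
// contains `address`, or std::nullopt if no loaded module covers it.
// A linear scan keeps the sketch short; a sorted structure would give
// logarithmic lookups for dumps with many modules.
std::optional<ModuleRange> FindModuleForAddress(
    const std::vector<ModuleRange>& modules, uint64_t address) {
  for (const ModuleRange& module : modules) {
    // Comparing the offset against the size avoids overflow in base + size.
    if (address >= module.base_address &&
        address - module.base_address < module.size) {
      return module;
    }
  }
  return std::nullopt;
}
```

An address that falls outside every range (for example, in JIT-generated code) simply has no module, which a processor has to tolerate rather than treat as an error.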

Ordinarily, dumps are produced as a result of a crash, but other triggers may be
set to produce dumps at any time a developer deems appropriate. The Breakpad
processor can handle dumps in the minidump format, generated either by a
[Breakpad client “handler”](client_design.md) implementation or by another
implementation that produces dumps in this format. The
[DbgHelp.dll!MiniDumpWriteDump](http://msdn2.microsoft.com/en-us/library/ms680360.aspx)
function on Windows produces dumps in this format, and is the basis for the
Breakpad handler implementation on that platform.

The [minidump format](http://msdn.microsoft.com/en-us/library/ms679293%28VS.85%29.aspx) is
essentially a simple container format, organized as a series of streams. Each
stream contains some type of data relevant to the crash. A typical “normal”
minidump contains streams for the thread list, the module list, the CPU context
at the time of the crash, and various bits of additional system information.
Other types of minidump can be generated, such as a full-memory minidump, which
in addition to stack memory contains snapshots of all of a process’ mapped
memory regions.
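To make the container structure concrete, here is a minimal sketch that reads a minidump’s fixed header and stream directory from a byte buffer. It assumes a little-endian file and follows the publicly documented `MINIDUMP_HEADER` and `MINIDUMP_DIRECTORY` layouts: a 32-byte header whose stream count and directory offset sit at byte offsets 8 and 12, followed by 12-byte directory entries of stream type, data size, and file offset. The function and type names are illustrative and do not correspond to Breakpad’s `Minidump` classes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// One entry of the stream directory: which stream it is, and where it lives.
struct StreamDirectoryEntry {
  uint32_t stream_type;  // e.g. 3 = thread list, 4 = module list
  uint32_t data_size;    // size of the stream's data in bytes
  uint32_t rva;          // file offset of the stream's data
};

constexpr uint32_t kMinidumpSignature = 0x504d444d;  // "MDMP", little-endian
constexpr size_t kHeaderSize = 32;          // size of the fixed file header
constexpr size_t kDirectoryEntrySize = 12;  // three 32-bit fields

// Reads a little-endian 32-bit value; callers must check bounds first.
uint32_t ReadU32LE(const std::vector<uint8_t>& data, size_t offset) {
  return static_cast<uint32_t>(data[offset]) |
         (static_cast<uint32_t>(data[offset + 1]) << 8) |
         (static_cast<uint32_t>(data[offset + 2]) << 16) |
         (static_cast<uint32_t>(data[offset + 3]) << 24);
}

// Returns the stream directory of an in-memory minidump, or std::nullopt if
// the signature is wrong or the directory runs past the end of the data.
std::optional<std::vector<StreamDirectoryEntry>> ReadStreamDirectory(
    const std::vector<uint8_t>& data) {
  if (data.size() < kHeaderSize || ReadU32LE(data, 0) != kMinidumpSignature) {
    return std::nullopt;
  }
  const uint32_t stream_count = ReadU32LE(data, 8);    // NumberOfStreams
  const uint32_t directory_rva = ReadU32LE(data, 12);  // StreamDirectoryRva
  // 64-bit arithmetic so a corrupt count cannot overflow the bounds check.
  const uint64_t directory_end =
      uint64_t{directory_rva} + uint64_t{stream_count} * kDirectoryEntrySize;
  if (directory_end > data.size()) return std::nullopt;

  std::vector<StreamDirectoryEntry> entries;
  for (uint32_t i = 0; i < stream_count; ++i) {
    const size_t offset = directory_rva + size_t{i} * kDirectoryEntrySize;
    entries.push_back({ReadU32LE(data, offset), ReadU32LE(data, offset + 4),
                       ReadU32LE(data, offset + 8)});
  }
  return entries;
}
```

Because every stream is reached through the directory, a reader can skip stream types it does not recognize, which is what makes the container extensible.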

The minidump format was chosen as Breakpad’s dump format because it has an
established track record on Windows, and it can be adapted to meet the needs of
the other platforms that Breakpad supports. Most other operating systems use
“core” files as their native dump formats, but the capabilities of core files
vary across platforms, and because core files are usually presented in a
platform’s native executable format, there are complications involved in
accessing the data contained therein without the benefit of the header files
that define an executable format’s entire structure. Because minidumps are
leaner than a typical executable format, a redefinition of the format in a
cross-platform header file, `minidump_format.h`, was a straightforward task.
Similarly, the capabilities of the minidump format are well understood, and
because it provides an extensible container, any of Breakpad’s needs that could
not be met directly by the standard minidump format could likely be met by
extending it as needed. Finally, using this format means that the dump file is
compatible with native debugging tools, at least on Windows. A possible future
avenue for exploration is the conversion of minidumps to core files, to enable
this same benefit on other platforms.

We have already provided an extension to the minidump format that allows it to
carry dumps generated on systems with PowerPC processors. The format already
accommodates multiple CPU types, so our work in this area was limited to
defining a context structure sufficient to represent the execution state of a
PowerPC. We have also defined an extension that allows non-crash dumps to
indicate which thread of execution requested that the dump be produced.
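The thread-identification extension is easiest to see as a small record with validity flags. The sketch below mirrors the field names of `MDRawBreakpadInfo` in `minidump_format.h`; the bit value used for the requesting-thread flag is our reading of that header and should be checked against it. The `ThreadOfInterest` selection logic is a hypothetical illustration, not Breakpad’s processing code.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Sketch of the Breakpad-specific record; the field names follow
// MDRawBreakpadInfo in minidump_format.h.
struct BreakpadInfo {
  uint32_t validity;              // bit flags: which fields below are valid
  uint32_t dump_thread_id;        // thread that wrote the dump
  uint32_t requesting_thread_id;  // thread that asked for the dump
};

// Validity bit for requesting_thread_id (bit 0 covers dump_thread_id).
// Assumed value; verify against minidump_format.h.
constexpr uint32_t kValidRequestingThreadId = 1u << 1;

// Hypothetical helper: chooses the thread a report should highlight. That is
// the crashing thread when the dump records an exception, otherwise the
// thread that requested a non-crash dump, if the dump says which one it was.
std::optional<uint32_t> ThreadOfInterest(
    std::optional<uint32_t> crashing_thread_id,
    const std::optional<BreakpadInfo>& breakpad_info) {
  if (crashing_thread_id) return crashing_thread_id;
  if (breakpad_info && (breakpad_info->validity & kValidRequestingThreadId)) {
    return breakpad_info->requesting_thread_id;
  }
  return std::nullopt;
}
```

The validity bits matter because a writer may fill in only some fields; a reader should never trust a thread ID whose flag is clear.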

Often, the information contained within a dump alone is sufficient to produce a
full stack backtrace for each thread. Certain optimizations that compilers
employ in producing code frustrate this process. Specifically, the “frame
pointer omission” optimization of x86 compilers can make it impossible to
produce useful stack traces given only a stack snapshot and CPU context. In
these cases, however, compiler-emitted debugging information can aid in
producing useful stack traces. The Breakpad processor is able to take advantage
of this debugging information as supplied by Microsoft’s C/C++ compiler, the
only compiler to apply such optimizations by default. As a result, the Breakpad
processor can produce useful stack traces even from code with frame pointer
omission optimizations as produced by this compiler.
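To see why frame pointer omission matters, consider the classic frame-pointer walk that needs no debugging information at all. This is a minimal sketch for 32-bit x86, assuming conventional `push ebp; mov ebp, esp` prologues and a little-endian stack snapshot; the names are hypothetical, and Breakpad’s own x86 stackwalker is more elaborate, since it can also consult frame data from symbol files.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// A captured region of one thread's stack memory, as a dump would record it.
struct StackMemory {
  uint32_t base;               // lowest captured address
  std::vector<uint8_t> bytes;  // little-endian stack contents

  std::optional<uint32_t> ReadU32(uint32_t address) const {
    // 64-bit arithmetic so addresses near the top of memory cannot wrap.
    if (address < base || uint64_t{address} - base + 4 > bytes.size()) {
      return std::nullopt;
    }
    const size_t off = address - base;
    return uint32_t{bytes[off]} | (uint32_t{bytes[off + 1]} << 8) |
           (uint32_t{bytes[off + 2]} << 16) | (uint32_t{bytes[off + 3]} << 24);
  }
};

// Walks a 32-bit x86 stack by following the saved frame pointer chain: in a
// conventional frame, [EBP] holds the caller's EBP and [EBP + 4] holds the
// return address into the caller. Returns the instruction address of each
// frame, innermost first.
std::vector<uint32_t> WalkWithFramePointers(const StackMemory& stack,
                                            uint32_t eip, uint32_t ebp) {
  std::vector<uint32_t> frames = {eip};
  while (true) {
    const std::optional<uint32_t> caller_ebp = stack.ReadU32(ebp);
    const std::optional<uint32_t> return_address = stack.ReadU32(ebp + 4);
    // Code built with frame pointer omission uses EBP as an ordinary
    // register, so from such a frame this chain skips callers or leads to
    // garbage; here we simply stop when the chain leaves the snapshot.
    if (!caller_ebp || !return_address || *return_address == 0) break;
    frames.push_back(*return_address);
    // The stack grows downward, so each caller's frame must sit at a higher
    // address; anything else means the chain has ended or is corrupt.
    if (*caller_ebp <= ebp) break;
    ebp = *caller_ebp;
  }
  return frames;
}
```

When a function omits the frame pointer, the only way to find its caller’s frame is to know how much stack that function uses at each instruction, which is exactly the kind of information the compiler’s debugging data provides.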

### Symbol Files

The [symbol files](symbol_files.md) that the Breakpad processor accepts can
carry frame pointer omission data, but this is only one of their capabilities.
Each symbol file also includes information about the functions, source files,
and source code line numbers for a single module of code. A module is an
individually loadable chunk of code: these can be executables containing a main
program (`.exe` files on Windows) or shared libraries (`.so` files on Linux,
`.dylib` files, frameworks, and bundles on Mac OS X, and `.dll` files on
Windows). Dumps contain information about which of these modules were loaded at
the time the dump was produced, and given this information, the Breakpad
processor attempts to locate debugging symbols for the module through a
user-supplied function embodied in a “symbol supplier.” Breakpad includes a
sample symbol supplier, called `SimpleSymbolSupplier`, that is used by its
command-line tools; this supplier locates symbol files by pathname.
`SimpleSymbolSupplier` is also available to other users of the Breakpad
processor library. This allows for the use of a simple reference implementation,
but preserves flexibility for users who may have more demanding symbol file
storage needs.
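As an illustration of a pathname-based supplier, the sketch below builds the location where such a supplier might look for a module’s symbol file. The directory layout `<root>/<debug file>/<debug identifier>/<debug file without .pdb>.sym` is our understanding of the convention `SimpleSymbolSupplier` follows; treat it as an assumption and confirm it against the Breakpad sources. `SymbolFilePath` is a hypothetical helper.

```cpp
#include <cassert>
#include <string>

// Returns the final path component, accepting either '/' or '\\' separators
// because debug file names may come from Windows or Unix-like systems.
std::string BaseName(const std::string& path) {
  const std::string::size_type slash = path.find_last_of("/\\");
  return slash == std::string::npos ? path : path.substr(slash + 1);
}

// Hypothetical helper: builds the path at which a pathname-based symbol
// supplier would look for a module's symbol file, assuming the layout
//   <root>/<debug file>/<debug identifier>/<debug file minus ".pdb">.sym
// (the ".pdb" match is case-sensitive in this sketch).
std::string SymbolFilePath(const std::string& root,
                           const std::string& debug_file,
                           const std::string& debug_identifier) {
  const std::string file = BaseName(debug_file);
  std::string stem = file;
  const std::string pdb = ".pdb";
  if (stem.size() >= pdb.size() &&
      stem.compare(stem.size() - pdb.size(), pdb.size(), pdb) == 0) {
    stem.erase(stem.size() - pdb.size());
  }
  return root + "/" + file + "/" + debug_identifier + "/" + stem + ".sym";
}
```

Including the debug identifier in the path lets symbols for many builds of the same module coexist. A supplier with more demanding storage needs, such as one backed by a database or a remote store, would replace exactly this lookup step.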

Breakpad’s symbol file format is text-based, and was defined to be fairly
human-readable and to encompass the needs of multiple platforms. The Breakpad
processor itself does not operate directly on native symbol formats
([DWARF](http://dwarf.freestandards.org/) and
[STABS](http://sourceware.org/gdb/current/onlinedocs/stabs.html) on most
Unix-like systems,
[.pdb files](http://msdn2.microsoft.com/en-us/library/yd4f8bd1(VS.80).aspx) on
Windows), because of the complications in accessing potentially complex symbol
formats with slight variations between platforms, stored within different types
of binary formats. In the case of `.pdb` files, the debugging format is not even
documented. Instead, Breakpad’s symbol files are produced on each platform by
tools that use platform-specific debugging APIs, where available, to convert
native symbols to Breakpad’s cross-platform format.
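The text format is easiest to appreciate by parsing a fragment of it. Below is a deliberately tiny, hypothetical resolver (`TinySymbolFile` is our name) covering only `FILE`, `FUNC`, and line records as we understand them from the [symbol file documentation](symbol_files.md): addresses and sizes are hex and relative to the module’s load address, while line and file numbers are decimal. Breakpad’s own `SourceLineResolver` implementations handle many more record types, including the `STACK` records that carry frame pointer omission data.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstdint>
#include <iterator>
#include <map>
#include <optional>
#include <sstream>
#include <string>
#include <vector>

struct LineRecord { uint64_t address, size; int line, file; };
struct Function {
  uint64_t address = 0, size = 0;
  std::string name;
  std::vector<LineRecord> lines;
};
struct SourceLocation { std::string function, file; int line; };

// True for tokens made only of hex digits, i.e. the start of a line record.
bool IsHex(const std::string& token) {
  return !token.empty() &&
         std::all_of(token.begin(), token.end(),
                     [](unsigned char c) { return std::isxdigit(c) != 0; });
}

// Hypothetical resolver for a subset of the text format: FILE records, FUNC
// records, and the line records that follow each FUNC. Other record types
// (MODULE, PUBLIC, STACK, ...) are ignored.
class TinySymbolFile {
 public:
  explicit TinySymbolFile(const std::string& text) {
    std::istringstream input(text);
    std::string row;
    Function* current = nullptr;  // FUNC that following line records belong to
    while (std::getline(input, row)) {
      std::istringstream fields(row);
      std::string keyword;
      fields >> keyword;
      if (keyword == "FILE") {
        int number = 0;
        std::string name;
        fields >> number;
        std::getline(fields >> std::ws, name);  // file names may contain spaces
        files_[number] = name;
      } else if (keyword == "FUNC") {
        std::string token;
        fields >> token;
        if (token == "m") fields >> token;  // optional "multiple" marker
        Function function;
        function.address = std::stoull(token, nullptr, 16);
        uint64_t parameter_size = 0;
        fields >> std::hex >> function.size >> parameter_size;
        std::getline(fields >> std::ws, function.name);
        current = &(functions_[function.address] = function);
      } else if (current != nullptr && IsHex(keyword)) {
        LineRecord record{static_cast<uint64_t>(std::stoull(keyword, nullptr, 16)), 0, 0, 0};
        fields >> std::hex >> record.size >> std::dec >> record.line >> record.file;
        current->lines.push_back(record);
      }
    }
  }

  // Maps a module-relative address to its function, source file, and line.
  std::optional<SourceLocation> Lookup(uint64_t address) const {
    auto next = functions_.upper_bound(address);  // first FUNC past `address`
    if (next == functions_.begin()) return std::nullopt;
    const Function& function = std::prev(next)->second;
    if (address - function.address >= function.size) return std::nullopt;
    SourceLocation location{function.name, "", 0};
    for (const LineRecord& record : function.lines) {
      if (address >= record.address && address - record.address < record.size) {
        auto file = files_.find(record.file);
        if (file != files_.end()) location.file = file->second;
        location.line = record.line;
        break;
      }
    }
    return location;
  }

 private:
  std::map<int, std::string> files_;
  std::map<uint64_t, Function> functions_;
};
```

Keeping functions in a map keyed by start address turns each lookup into a single `upper_bound` followed by a bounds check against the function’s size.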

### Processing

Most commonly, a developer will enable an application to use Breakpad by
building it with a platform-specific [client “handler”](client_design.md)
library. After building the application, the developer will create symbol files
for Breakpad using the included `dump_syms` or `symupload` tools, or another
suitable tool, and place the symbol files where the processor’s symbol supplier
will be able to locate them.

When a dump file is given to the processor’s `MinidumpProcessor` class, it will
read it using its included minidump reader, contained in the `Minidump` family
of classes. It will collect information about the operating system and CPU that
produced the dump, and determine whether the dump was produced as a result of a
crash or at the direct request of the application itself. It then loops over all
of the threads in the process, attempting to walk the stack associated with each
thread. Stack walking is performed by the processor’s `Stackwalker` components,
of which there is a slightly different implementation for each CPU type that the
processor is able to handle dumps from. Beginning with a thread’s context, and
possibly using debugging data, the stackwalker produces a list of stack frames,
each recording the address of the instruction executing at that level of the
call chain (for caller frames, the address the callee will return to). These
instructions are matched up with the modules that contributed them to the
process, and the `SymbolSupplier` is invoked to locate a symbol file. The symbol
file is given to a `SourceLineResolver`, which matches the instruction up with a
specific function name, source file, and line number, resulting in a
representation of a stack frame that can easily be used to identify which code
was executing.
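The per-frame half of the loop described above can be summarized in a short sketch. It is hypothetical and heavily simplified: it starts from stacks that have already been walked, the supplier is a plain callback returning a toy symbol table, and resolution ignores function sizes and source lines. None of the names are Breakpad’s; the real work is split across `MinidumpProcessor`, `Stackwalker`, `SymbolSupplier`, and `SourceLineResolver`, whose interfaces differ.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <iterator>
#include <map>
#include <optional>
#include <string>
#include <vector>

// Hypothetical, simplified stand-ins for the processor's data types.
struct Module { std::string name; uint64_t base, size; };
struct Frame {
  uint64_t instruction;
  std::string module;    // empty if no module contains the instruction
  std::string function;  // empty if no symbols were found
};
struct Thread { std::vector<Frame> frames; };
struct SimpleProcessState { std::vector<Thread> threads; };

// A "symbol file" reduced to function start addresses (module-relative).
using SymbolTable = std::map<uint64_t, std::string>;
// The supplier maps a module name to its symbols, if any are available.
using SymbolSupplierFn =
    std::function<std::optional<SymbolTable>(const std::string&)>;

// Picks the nearest function starting at or below `relative`; function sizes
// are ignored to keep the sketch short.
std::string Resolve(const SymbolTable& table, uint64_t relative) {
  auto next = table.upper_bound(relative);
  return next == table.begin() ? "" : std::prev(next)->second;
}

// Symbolizes stacks that have already been walked: for each frame of each
// thread, find the owning module, obtain its symbols, and name the function.
SimpleProcessState Symbolize(
    const std::vector<std::vector<uint64_t>>& walked_stacks,
    const std::vector<Module>& modules, const SymbolSupplierFn& supplier) {
  SimpleProcessState state;
  // Ask the supplier about each module at most once.
  std::map<std::string, std::optional<SymbolTable>> cache;
  for (const std::vector<uint64_t>& stack : walked_stacks) {
    Thread thread;
    for (uint64_t instruction : stack) {
      Frame frame{instruction, "", ""};
      for (const Module& module : modules) {
        if (instruction < module.base || instruction - module.base >= module.size)
          continue;
        frame.module = module.name;
        auto cached = cache.find(module.name);
        if (cached == cache.end())
          cached = cache.emplace(module.name, supplier(module.name)).first;
        if (cached->second)
          frame.function = Resolve(*cached->second, instruction - module.base);
        break;
      }
      thread.frames.push_back(frame);
    }
    state.threads.push_back(thread);
  }
  return state;
}
```

Caching the supplier’s answer per module is a design choice of this sketch: many frames typically fall within the same few modules, so each symbol file is located and loaded only once, and a missing symbol file still leaves the frame with its module name.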

The results of processing are made available in a `ProcessState` object, which
contains a vector of threads, each containing a vector of stack frames.

For small-scale use of the Breakpad processor, and for testing and debugging,
the `minidump_stackwalk` tool is provided. It invokes the processor and displays
the full results of processing, optionally allowing symbols to be provided to
the processor by a pathname-based symbol supplier, `SimpleSymbolSupplier`.

For lower-level testing and debugging, the processor library also includes a
`minidump_dump` tool, which walks through an entire minidump file and displays
its contents in somewhat readable form.

### Platform Support

The Breakpad processor library is able to process dumps produced on Mac OS X
systems running on x86, x86-64, and PowerPC processors, on Windows and Linux
systems running on x86 or x86-64 processors, and on Android systems running ARM
or x86 processors. The processor library itself is written in standard C++, and
should function properly in most Unix-like environments. It has been tested on
Linux and Mac OS X.

## Future Plans

There are currently no firm plans or timetables to implement any of these
features, although they are possible avenues for future exploration.

The symbol file format can be extended to carry information about the locations
of parameters and local variables as stored in stack frames and registers, and
the processor can use this information to provide enhanced stack traces showing
function arguments and variable values.

On Mac OS X and Linux, we could provide tools to convert files from the minidump
format into the native core format. This would enable developers to open dump
files in a native debugger, just as they are presently able to do with minidumps
on Windows.