# Breakpad Client Libraries

## Objective

The Breakpad client libraries are responsible for monitoring an application for
crashes (exceptions), handling them when they occur by generating a dump, and
providing a means to upload dumps to a crash reporting server. These tasks are
divided between the “handler” (short for “exception handler”) library, which is
linked into the application being monitored for crashes, and the “sender”
library, which is intended to be linked into a separate external program.

## Background

As one of the chief tasks of the client handler is to generate a dump, an
understanding of [dump files](processor_design.md) will aid in understanding the
handler.

## Overview

Breakpad provides client libraries for each of its target platforms. Currently,
these exist for Windows on x86 and Mac OS X on both x86 and PowerPC. A Linux
implementation has been written and is currently under review.

Because the mechanisms for catching exceptions and the methods for obtaining the
information that a dump contains vary between operating systems, each target
operating system requires a completely different handler implementation. Where
multiple CPUs are supported for a single operating system, the handler
implementation will likely also require separate code for each processor type to
extract CPU-specific information. One of the goals of the Breakpad handler is to
provide a prepackaged cross-platform system that masks many of these
system-level differences and quirks from the application developer. Although the
underlying implementations differ, the handler library for each system follows
the same set of principles and exposes a similar interface.

Code that wishes to take advantage of Breakpad should be linked against the
handler library, and should, at an appropriate time, install a Breakpad handler.
For applications, it is generally desirable to install the handler as early in
the start-up process as possible. Developers of library code that uses Breakpad
to monitor itself may wish to install a Breakpad handler when the library is
loaded, or may only want to install a handler when calls are made into the
library.

The handler can be triggered to generate a dump either by catching an exception
or at the request of the application itself. The latter case may be useful in
debugging assertions or other conditions where developers want to know how a
program got into a specific non-crash state. After generating a dump, the
handler calls a user-specified callback function. The callback function may
collect additional data about the program’s state, quit the program, launch a
crash reporter application, or perform other tasks. Allowing for this
functionality to be dictated by a callback function preserves flexibility.

The sender library also has a separate implementation for each supported
platform, because of the varying interfaces for accessing network resources on
different operating systems. The sender transmits a dump along with other
application-defined information to a crash report server via HTTP. Because dumps
may contain sensitive data, the sender allows for the use of HTTPS.

The canonical example of the entire client system would be for a monitored
application to link against the handler library, install a Breakpad handler from
its main function, and provide a callback to launch a small crash reporter
program. The crash reporter program would be linked against the sender library,
and would send the crash dump when launched. A separate process is recommended
for this function because of the unreliability inherent in doing any significant
amount of work from a crashed process.

## Detailed Design

### Exception Handler Installation

The mechanisms for installing an exception handler vary between operating
systems. On Windows, it’s a relatively simple matter of making one call to
register a [top-level exception
filter](http://msdn.microsoft.com/library/en-us/debug/base/setunhandledexceptionfilter.asp)
callback function. On most Unix-like systems such as Linux, processes are
informed of exceptions by the delivery of a signal, so an exception handler
takes the form of a signal handler. The native mechanism to catch exceptions on
Mac OS X requires a large amount of code to set up a Mach port, identify it as
the exception port, and assign a thread to listen for an exception on that port.

Just as the preparation of exception handlers differs, the manner in which they
are called differs as well. On Windows and most Unix-like systems, the handler
is called on the thread that caused the exception. On Mac OS X, the thread
listening to the exception port is notified that an exception has occurred. The
different implementations of the Breakpad handler libraries perform these tasks
in the appropriate ways on each platform, while exposing a similar interface on
each.

A Breakpad handler is embodied in an `ExceptionHandler` object. Because it’s a
C++ object, an `ExceptionHandler` may be created as a local variable, allowing
it to be installed and removed as functions are called and return. This provides
one possible way for a developer to monitor only a portion of an application for
crashes.

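The scope-bound installation described above can be illustrated with a minimal
sketch. The `ScopedHandler` class below is a hypothetical stand-in, not
Breakpad’s `ExceptionHandler`; it only tracks which handler is “current” and
assumes handlers are created and destroyed in strictly nested order on a single
thread.

```cpp
// Hypothetical stand-in for a crash handler object. This is NOT Breakpad's
// ExceptionHandler; it only illustrates installation tied to object lifetime.
class ScopedHandler {
 public:
  explicit ScopedHandler(const char* name)
      : name_(name), previous_(current_) {
    current_ = this;  // "Install": this object becomes the active handler.
  }

  ~ScopedHandler() {
    current_ = previous_;  // "Remove": restore whatever was active before.
  }

  ScopedHandler(const ScopedHandler&) = delete;
  ScopedHandler& operator=(const ScopedHandler&) = delete;

  const char* name() const { return name_; }

  // Returns the innermost handler still in scope, or nullptr if none.
  static const ScopedHandler* Current() { return current_; }

 private:
  const char* name_;
  ScopedHandler* previous_;
  static ScopedHandler* current_;
};

ScopedHandler* ScopedHandler::current_ = nullptr;

// Only the body of this function is monitored: the handler is installed on
// entry and removed automatically when the function returns.
bool RiskyOperationIsMonitored() {
  ScopedHandler handler("risky-operation");
  return ScopedHandler::Current() == &handler;
}
```

Before and after a call to `RiskyOperationIsMonitored()`,
`ScopedHandler::Current()` is null, so code outside the function is not covered
by this handler.
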
### Exception Basics

Once an application encounters an exception, it is in an indeterminate and
possibly hazardous state. Consequently, any code that runs after an exception
occurs must take extreme care to avoid performing operations that might fail,
hang, or cause additional exceptions. This task is not at all straightforward,
and the Breakpad handler library seeks to do it properly, accounting for all of
the minute details while allowing other application developers, even those with
little systems programming experience, to reap the benefits. All of the Breakpad
handler code that executes after an exception occurs has been written according
to the following guidelines for safety at exception time:

*   Use of the application heap is forbidden. The heap may be corrupt or
    otherwise unusable, and allocators may not function.
*   Resource allocation must be severely limited. The handler may create a new
    file to contain the dump, and it may attempt to launch a process to continue
    handling the crash.
*   Execution on the thread that caused the exception is significantly limited.
    The only code permitted to execute on this thread is the code necessary to
    transition handling to a dedicated preallocated handler thread, and the code
    to return from the exception handler.
*   Handlers shouldn’t handle crashes by attempting to walk stacks themselves,
    as stacks may be in inconsistent states. Dump generation should be performed
    by interfacing with the operating system’s memory manager and code module
    manager.
*   Library code, including runtime library code, must be avoided unless it
    provably meets the above guidelines. For example, this means that the STL
    string class may not be used, because it performs operations that attempt to
    allocate and use heap memory. It also means that many C runtime functions
    must be avoided, particularly on Windows, because of heap operations that
    they may perform.

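As an illustration of the guidelines against heap use and runtime library
calls, the following sketch formats an integer as hexadecimal into a
caller-supplied buffer. The function name `FormatHex` is hypothetical and is not
taken from Breakpad; the point is only that the routine allocates nothing and
calls no library code, so it stays within the rules listed above.

```cpp
#include <cstddef>
#include <cstdint>

// Writes `value` as lowercase hexadecimal into `out` and NUL-terminates it.
// Returns the number of digits written, or 0 if `out` is null or too small.
// No heap allocation and no runtime library calls are made.
std::size_t FormatHex(std::uint64_t value, char* out, std::size_t out_size) {
  static const char kDigits[] = "0123456789abcdef";

  // Count how many hex digits are needed (at least one, for the value 0).
  std::size_t digits = 1;
  for (std::uint64_t v = value >> 4; v != 0; v >>= 4) {
    ++digits;
  }
  if (out == nullptr || out_size < digits + 1) {
    return 0;  // Not enough room for the digits plus the terminating NUL.
  }

  // Fill the buffer from the least significant digit backwards.
  for (std::size_t i = 0; i < digits; ++i) {
    out[digits - 1 - i] = kDigits[value & 0xf];
    value >>= 4;
  }
  out[digits] = '\0';
  return digits;
}
```
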
A dedicated handler thread is used to preserve the state of the exception thread
when an exception occurs: during dump generation, it is difficult if not
impossible for a thread to accurately capture its own state. Performing all
exception-handling functions on a separate thread is also critical when handling
stack-limit-exceeded exceptions. It would be hazardous to run out of stack space
while attempting to handle an exception. Because of the rule against allocating
resources at exception time, the Breakpad handler library creates its handler
thread when it installs its exception handler. On Mac OS X, this handler thread
is created during the normal setup of the exception handler, and the handler
thread will be signaled directly in the event of an exception. On Windows and
Linux, the handler thread is signaled by a small amount of code that executes on
the exception thread. Because the code that executes on the exception thread in
this case is small and safe, this does not pose a problem. Even when an
exception is caused by exceeding stack size limits, this code is sufficiently
compact to execute entirely within the stack’s guard page without causing an
exception.

The handler thread may also be triggered directly by a user call, even when no
exception occurs, to allow dumps to be generated at any point deemed
interesting.

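One way such a hand-off could look on a POSIX system is sketched below. This is
an assumption-laden illustration, not Breakpad’s implementation: the names
`InstallNotifier`, `NotifyHandlerThread`, and `WaitForNotification` are
hypothetical, a pipe is chosen as the wake-up channel, and the dedicated thread
itself is omitted. The pipe is created at install time, so the code that runs on
the signaled thread does nothing except one async-signal-safe `write()`.

```cpp
#include <cerrno>
#include <signal.h>
#include <unistd.h>

// Wake-up channel, created at install time and never at exception time.
static int g_wake_pipe[2] = {-1, -1};

// Runs on the thread that received the signal. It performs a single
// async-signal-safe write and returns; all real work belongs elsewhere.
void NotifyHandlerThread(int signal_number) {
  const int saved_errno = errno;  // write() may overwrite errno.
  const unsigned char byte = static_cast<unsigned char>(signal_number);
  ssize_t ignored = write(g_wake_pipe[1], &byte, 1);
  (void)ignored;  // Nothing safe can be done about a failure here.
  errno = saved_errno;
}

// Preallocates the pipe and installs the signal handler.
bool InstallNotifier(int signal_number) {
  if (pipe(g_wake_pipe) != 0) {
    return false;
  }
  struct sigaction action = {};
  action.sa_handler = NotifyHandlerThread;
  sigemptyset(&action.sa_mask);
  return sigaction(signal_number, &action, nullptr) == 0;
}

// What a dedicated handler thread would block on. Returns the signal number
// that was delivered, or -1 if the read failed.
int WaitForNotification() {
  unsigned char byte = 0;
  return read(g_wake_pipe[0], &byte, 1) == 1 ? byte : -1;
}
```

In a fuller design, a thread started during installation would loop on
`WaitForNotification()` and generate the dump, while the signaled thread waits
for it to finish before returning from the handler.
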
### Filter Callback

When the handler thread begins handling an exception, it calls an optional
user-defined filter callback function, which is responsible for judging whether
Breakpad’s handler should continue handling the exception or not. This mechanism
is provided for the benefit of library or plug-in code, whose developers may not
be interested in reports of crashes that occur outside of their modules but
within processes hosting their code. If the filter callback indicates that it is
not interested in the exception, the Breakpad handler arranges for it to be
delivered to any previously-installed handler.

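The filter decision can be sketched as follows. The types and signatures here
are hypothetical and simplified; Breakpad’s real callback signatures differ by
platform and are not reproduced. The sketch also assumes that the filter is
given the faulting address and that a plug-in knows its own address range,
which is one plausible way to decide whether a crash is “inside” a module.

```cpp
#include <cstdint>

// Hypothetical description of the address range occupied by one module.
struct ModuleRange {
  std::uintptr_t start;
  std::uintptr_t end;  // One past the last byte of the module.
};

// Hypothetical filter signature: return true to let handling continue.
using FilterCallback = bool (*)(void* context, std::uintptr_t fault_address);

enum class Disposition { kHandledHere, kForwardedToPrevious };

// Example filter for plug-in code: only crashes inside the plug-in matter.
bool FaultIsInModule(void* context, std::uintptr_t fault_address) {
  const ModuleRange* range = static_cast<const ModuleRange*>(context);
  return fault_address >= range->start && fault_address < range->end;
}

// The filter is optional. A missing filter means "always handle"; a filter
// that declines causes the exception to go to the previous handler.
Disposition Dispatch(FilterCallback filter, void* context,
                     std::uintptr_t fault_address) {
  if (filter != nullptr && !filter(context, fault_address)) {
    return Disposition::kForwardedToPrevious;
  }
  return Disposition::kHandledHere;
}
```

With a plug-in occupying `[0x1000, 0x2000)`, a fault at `0x1800` is handled
here, while a fault at `0x2000` is forwarded to the previously installed
handler.
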
### Dump Generation

Assuming that the filter callback approves (or does not exist), the handler
writes a dump in a directory specified by the application developer when the
handler was installed, using a previously generated unique identifier to avoid
name collisions. The mechanics of dump generation also vary between platforms,
but in general, the process involves enumerating each thread of execution, and
capturing its state, including processor context and the active portion of its
stack area. The dump also includes a list of the code modules loaded into the
application, and an indicator of which thread generated the exception or
requested the dump. In order to avoid allocating memory during this process, the
dump is written in place on disk.

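The naming step can be sketched without any allocation, as below. The helper
names `Append` and `BuildDumpPath`, the `/` separator, and the `.dmp` suffix are
assumptions made for illustration. Both the dump directory and the unique
identifier are taken to exist already, as the text describes, so only a bounded
copy into a fixed buffer happens at exception time.

```cpp
#include <cstddef>

// Appends `src` to `dst` at offset `*pos`, never writing past `cap` bytes.
// Returns false if the result (including the NUL) would not fit.
bool Append(char* dst, std::size_t cap, std::size_t* pos, const char* src) {
  for (; *src != '\0'; ++src) {
    if (*pos + 1 >= cap) {
      return false;  // Keep one byte free for the terminating NUL.
    }
    dst[(*pos)++] = *src;
  }
  dst[*pos] = '\0';
  return true;
}

// Builds "<dump_dir>/<unique_id>.dmp" in `out`. `unique_id` is assumed to
// have been generated before any exception occurred.
bool BuildDumpPath(const char* dump_dir, const char* unique_id, char* out,
                   std::size_t cap) {
  if (out == nullptr || cap == 0) {
    return false;
  }
  std::size_t pos = 0;
  out[0] = '\0';
  return Append(out, cap, &pos, dump_dir) && Append(out, cap, &pos, "/") &&
         Append(out, cap, &pos, unique_id) && Append(out, cap, &pos, ".dmp");
}
```

For example, a directory of `/tmp/dumps` and an identifier of `0a1b2c3d` yield
`/tmp/dumps/0a1b2c3d.dmp`, provided the buffer is large enough.
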
### Post-Dump Behavior

Upon completion of writing the dump, a second callback function is called. This
callback may be used to launch a separate crash reporting program or to collect
additional data from the application. The callback may also be used to influence
whether Breakpad will treat the exception as handled or unhandled. Even after a
dump is successfully generated, Breakpad can be made to behave as though it
didn’t actually handle an exception. This capability may be useful for
developers who want to test their applications with Breakpad enabled but still
retain the ability to use traditional debugging techniques. It also allows a
Breakpad-enabled application to coexist with a platform’s native crash reporting
system, such as Mac OS X’s [CrashReporter](http://developer.apple.com/technotes/tn2004/tn2123.html)
and [Windows Error Reporting](http://msdn.microsoft.com/isv/resources/wer/).

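A minimal sketch of how a post-dump callback could steer the outcome follows.
The signature and names are hypothetical, and the sketch assumes that the
callback’s return value carries the handled/unhandled decision. Returning
`false` even after a successful dump models the case where a debugger or the
platform’s native crash reporter should still see the exception.

```cpp
// Hypothetical post-dump callback: return true to treat the exception as
// handled, false to treat it as unhandled.
using PostDumpCallback = bool (*)(const char* dump_path, bool dump_succeeded);

enum class Outcome { kHandled, kUnhandled };

// Decides the final outcome once the dump has been written (or has failed).
// Without a callback, the outcome simply follows the dump result.
Outcome FinishHandling(const char* dump_path, bool dump_succeeded,
                       PostDumpCallback callback) {
  bool handled = dump_succeeded;
  if (callback != nullptr) {
    handled = callback(dump_path, dump_succeeded);
  }
  return handled ? Outcome::kHandled : Outcome::kUnhandled;
}

// Always reports "unhandled", so that the native crash reporter or an
// attached debugger still receives the exception after the dump is written.
bool DeferToNativeReporter(const char* /*dump_path*/, bool /*succeeded*/) {
  return false;
}

// Reports "handled" only if a dump was actually produced.
bool AcceptIfWritten(const char* /*dump_path*/, bool dump_succeeded) {
  return dump_succeeded;
}
```
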
Typically, when Breakpad handles an exception fully and no debuggers are
involved, the crashed process will terminate.

Authors of both callback functions that execute within a Breakpad handler are
cautioned that their code will be run at exception time, and that as a result,
they should observe the same programming practices that the Breakpad handler
itself adheres to. Notably, if a callback is to be used to collect additional
data from an application, it should take care to read only “safe” data. This
might involve accessing only static memory locations that are updated
periodically during the course of normal program execution.

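The “safe data” pattern might look like the sketch below. All names are
hypothetical. The record lives in static storage and is updated during normal
execution; the function intended for use from a callback only loads from it. The
sketch assumes that 32-bit atomics are lock-free on the target platform, which
is common but not guaranteed by the language.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical application state worth attaching to a crash report.
struct CrashContext {
  std::atomic<std::uint32_t> documents_open{0};
  std::atomic<std::uint32_t> last_command_id{0};
};

// Static storage: exists before any crash and never touches the heap.
CrashContext g_crash_context;

// Called during normal program execution to keep the record current.
void RecordCommand(std::uint32_t command_id) {
  g_crash_context.last_command_id.store(command_id, std::memory_order_relaxed);
}

void RecordDocumentsOpen(std::uint32_t count) {
  g_crash_context.documents_open.store(count, std::memory_order_relaxed);
}

// Plain copy of the record, suitable for writing out next to the dump.
struct CrashSnapshot {
  std::uint32_t documents_open;
  std::uint32_t last_command_id;
};

// Intended for use from a callback at exception time: it performs only
// loads from static memory, with no allocation and no locking.
CrashSnapshot ReadCrashContext() {
  CrashSnapshot snapshot;
  snapshot.documents_open =
      g_crash_context.documents_open.load(std::memory_order_relaxed);
  snapshot.last_command_id =
      g_crash_context.last_command_id.load(std::memory_order_relaxed);
  return snapshot;
}
```

The callback never walks application data structures such as lists or maps,
which may be half-updated or corrupt when the exception occurs.
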
### Sender Library

The Breakpad sender library provides a single function to send a crash report to
a crash server. It accepts a crash server’s URL, a map of key-value parameters
that will accompany the dump, and the path to a dump file itself. Each of the
key-value parameters and the dump file are sent as distinct parts of a multipart
HTTP POST request to the specified URL using the platform’s native HTTP
facilities. On Linux, [libcurl](http://curl.haxx.se/) is used for this function,
as it is the closest thing to a standard HTTP library available on that
platform.

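To make the request layout concrete, the sketch below assembles a
`multipart/form-data` body with one part per key-value parameter and a final
part for the dump file. It is an illustration only: the function name, the
part and file names passed in, and the boundary are hypothetical, and the sender
library itself delegates this work to the platform’s HTTP facilities. Because
the sender runs in a separate, healthy process, ordinary heap-using types such
as `std::string` are acceptable here.

```cpp
#include <map>
#include <string>

// Builds a multipart/form-data body. The caller must choose a `boundary`
// that does not occur in any value or in the file contents, and names are
// assumed not to contain quotes or line breaks (no escaping is done).
std::string BuildMultipartBody(
    const std::map<std::string, std::string>& parameters,
    const std::string& file_part_name, const std::string& file_name,
    const std::string& file_contents, const std::string& boundary) {
  std::string body;

  // One part per key-value parameter.
  for (const auto& entry : parameters) {
    body += "--" + boundary + "\r\n";
    body += "Content-Disposition: form-data; name=\"" + entry.first +
            "\"\r\n\r\n";
    body += entry.second + "\r\n";
  }

  // The dump file travels as its own binary part.
  body += "--" + boundary + "\r\n";
  body += "Content-Disposition: form-data; name=\"" + file_part_name +
          "\"; filename=\"" + file_name + "\"\r\n";
  body += "Content-Type: application/octet-stream\r\n\r\n";
  body += file_contents + "\r\n";

  // Closing delimiter.
  body += "--" + boundary + "--\r\n";
  return body;
}
```

The resulting body would be posted with a request header of the form
`Content-Type: multipart/form-data; boundary=<boundary>`, using the same
boundary string.
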
## Future Plans

Although we’ve had great success with in-process dump generation by following
our guidelines for safe code at exception time, we are exploring options for
allowing dumps to be generated in a separate process, to further enhance the
handler library’s robustness.

On Windows, we intend to offer tools to make it easier for Breakpad’s settings
to be managed by the native group policy management system.

We also plan to offer tools that many developers would find desirable in the
context of handling crashes, such as a mechanism to determine at launch if the
program last terminated in a crash, and a way to calculate “crashiness” in terms
of crashes over time or the number of application launches between crashes.

We are also investigating methods to capture crashes that occur early in an
application’s launch sequence, including crashes that occur before a program’s
main function begins executing.