# Introduction

Given a minidump file, the Breakpad processor produces stack traces that include
function names and source locations. However, minidump files contain only the
byte-by-byte contents of threads' registers and stacks, without function names
or machine-code-to-source mapping data. The processor consults Breakpad symbol
files for the information it needs to produce human-readable stack traces from
the binary-only minidump file.

The platform-specific symbol dumping tools parse the debugging information the
compiler provides (whether as DWARF or STABS sections in an ELF file or as
stand-alone PDB files), and write that information back out in the Breakpad
symbol file format. This format is much simpler and less detailed than compiler
debugging information, and values legibility over compactness.

# Overview

Breakpad symbol files are ASCII text files, with lines delimited as appropriate
for the host platform. Each line is a _record_, divided into fields by single
spaces; in some cases, the last field of the record can contain spaces. The
first field is a string indicating what sort of record the line represents
(except for line records; these are so common that omitting the type string
saves space). Some fields hold decimal or hexadecimal numbers; hexadecimal
numbers have no "0x" prefix, and use lower-case letters.

Breakpad symbol files contain the following record types. With some
restrictions, these may appear in any order.

*   A `MODULE` record describes the executable file or shared library from which
    this data was derived, for use by symbol suppliers. A `MODULE` record should
    be the first record in the file.

*   A `FILE` record gives a source file name, and assigns it a number by which
    other records can refer to it.

*   An `INLINE_ORIGIN` record holds an inline function name for `INLINE` records
    to refer to.

*   A `FUNC` record describes a function present in the source code.

*   An `INLINE` record describes the inline function's nest level, call site
    line and call site source file to which the given ranges of machine code
    should be attributed.

*   A line record indicates to which source file and line a given range of
    machine code should be attributed. The line is attributed to the function
    defined by the most recent `FUNC` record.

*   A `PUBLIC` record gives the address of a linker symbol.

*   A `STACK` record provides information necessary to produce stack traces.
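
The rule that the first field names the record type, with line records as the
keyword-less default, can be sketched in a few lines of Python. This is only an
illustration: `record_type` and the `"LINE"` label are hypothetical names, not
part of Breakpad, and the helper assumes the line terminator has already been
removed.

```python
# Keywords taken from the record types listed above. Anything else is treated
# as a line record, which has no leading keyword.
RECORD_KEYWORDS = (
    "MODULE", "FILE", "INLINE_ORIGIN", "FUNC", "INLINE", "PUBLIC", "STACK",
)


def record_type(line: str) -> str:
    """Return the record type of one symbol-file line (hypothetical helper)."""
    first_field = line.split(" ", 1)[0]
    if first_field in RECORD_KEYWORDS:
        return first_field
    # "LINE" is an illustrative label for the keyword-less line record.
    return "LINE"
```

Comparing the whole first field, rather than testing a prefix, keeps `INLINE`
and `INLINE_ORIGIN` distinct.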

# `MODULE` records

A `MODULE` record provides meta-information about the module (the executable or
shared library) the symbol file describes. It has the form:

> `MODULE` _operatingsystem_ _architecture_ _id_ _name_

For example: `MODULE Linux x86 D3096ED481217FD4C16B29CD9BC208BA0 firefox-bin`

A symbol supplier might use this information to find the correct symbol files
to use to interpret a given minidump, or to perform other sorts of validation.
If present, a `MODULE` record should be the first line in the file.

The fields are separated by spaces, and cannot contain spaces themselves, except
for _name_.

*   The _operatingsystem_ field names the operating system on which the
    executable or shared library was intended to run. This field should have one
    of the following values:

    | **Value** | **Meaning** |
    |:----------|:--------------------|
    | Linux | Linux |
    | mac | Mac OS X |
    | windows | Microsoft Windows |

*   The _architecture_ field indicates what processor architecture the
    executable or shared library contains machine code for. This field should
    have one of the following values:

    | **Value** | **Instruction Set Architecture** |
    |:----------|:---------------------------------|
    | x86 | Intel IA-32 |
    | x86\_64 | AMD64/Intel 64 |
    | ppc | 32-bit PowerPC |
    | ppc64 | 64-bit PowerPC |
    | unknown | unknown |

*   The _id_ field is a sequence of hexadecimal digits that identifies the exact
    executable or library whose contents the symbol file describes. The way in
    which it is computed varies from platform to platform.

*   The _name_ field contains the base name (the final component of the
    directory path) of the executable or library. It may contain spaces, and
    extends to the end of the line.
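
Because only _name_ may contain spaces, a reader can split a `MODULE` record on
the first four spaces and keep the remainder intact. The following Python sketch
shows one way to do that; `ModuleRecord` and `parse_module` are hypothetical
names chosen for this example.

```python
from typing import NamedTuple


class ModuleRecord(NamedTuple):
    operating_system: str
    architecture: str
    module_id: str
    name: str


def parse_module(line: str) -> ModuleRecord:
    """Parse one MODULE record (hypothetical helper, not Breakpad code)."""
    # Split on at most four spaces so that spaces inside the name survive.
    # A line with too few fields raises ValueError when unpacking.
    keyword, operating_system, architecture, module_id, name = line.split(" ", 4)
    if keyword != "MODULE":
        raise ValueError("not a MODULE record")
    return ModuleRecord(operating_system, architecture, module_id, name)
```

The _id_ is kept as a string here, since the text above only says it is a
sequence of hexadecimal digits whose computation varies by platform.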

# `FILE` records

A `FILE` record holds a source file name for other records to refer to. It has
the form:

> `FILE` _number_ _name_

For example: `FILE 2 /home/jimb/mc/in/browser/app/nsBrowserApp.cpp`

A `FILE` record provides the name of a source file, and assigns it a number
which other records (line records, in particular) can use to refer to that file
name. The _number_ field is a decimal number. The _name_ field is the name of
the file; it may contain spaces.

# `INLINE_ORIGIN` records

An `INLINE_ORIGIN` record holds an inline function name for `INLINE` records to
refer to. It has the form:

> `INLINE_ORIGIN` _number_ _name_

For example: `INLINE_ORIGIN 2 nsQueryInterfaceWithError::operator()(nsID const&,
void**) const`

An `INLINE_ORIGIN` record provides the name of an inline function, and assigns
it a number which other records (`INLINE` records, in particular) can use to
refer to that function name. The _number_ field is a decimal number. The _name_
field is the name of the inline function; it may contain spaces.

# `FUNC` records

A `FUNC` record describes a source-language function. It has the form:

> `FUNC` _[m]_ _address_ _size_ _parameter\_size_ _name_

For example: `FUNC m c184 30 0 nsQueryInterfaceWithError::operator()(nsID const&,
void**) const`

The _m_ field is optional. If present, it indicates that multiple symbols
reference this function's instructions, in which case only one symbol name is
mentioned within the Breakpad file. Multiple symbols may reference the same
instructions because of identical code folding by the linker.

The _address_ and _size_ fields are hexadecimal numbers indicating the start
address and length in bytes of the machine code instructions the function
occupies. (Breakpad symbol files cannot accurately describe functions whose code
is not contiguous.) The start address is relative to the module's load address.

The _parameter\_size_ field is a hexadecimal number indicating the size, in
bytes, of the arguments pushed on the stack for this function. Some calling
conventions, like the Microsoft Windows `stdcall` convention, require the called
function to pop parameters passed to it on the stack from its caller before
returning. The stack walker uses this value, along with data from `STACK`
records, to step from the called function's frame to the caller's frame.

The _name_ field is the name of the function. In languages that mangle linker
symbol names, such as C++, this should be the source language name (the
"unmangled" form). This field may contain spaces.
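
As an illustration of the field layout just described, here is a minimal Python
sketch that parses a `FUNC` record, including the optional _m_ marker.
`FuncRecord` and `parse_func` are hypothetical names; the sketch relies on the
fact that `m` is not a valid hexadecimal number, so it cannot be confused with
_address_.

```python
from typing import NamedTuple


class FuncRecord(NamedTuple):
    multiple: bool        # True when the optional "m" field is present
    address: int          # relative to the module's load address
    size: int
    parameter_size: int
    name: str


def parse_func(line: str) -> FuncRecord:
    """Parse one FUNC record (hypothetical helper, not Breakpad code)."""
    fields = line.split(" ")
    if fields[0] != "FUNC":
        raise ValueError("not a FUNC record")
    rest = fields[1:]
    multiple = rest[0] == "m"
    if multiple:
        rest = rest[1:]
    # address, size and parameter_size are hexadecimal without a "0x" prefix.
    address, size, parameter_size = (int(field, 16) for field in rest[:3])
    # Everything after the numeric fields is the name, spaces included.
    name = " ".join(rest[3:])
    return FuncRecord(multiple, address, size, parameter_size, name)
```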

# `INLINE` records

An `INLINE` record describes the inline function's nest level, call site line
and call site source file to which the given ranges of machine code should be
attributed. It has the form:

> `INLINE` _inline\_nest\_level_ _call\_site\_line_ _call\_site\_file\_num_
> _origin\_num_ [_address_ _size_]+

For example: `INLINE 0 10 3 4 d30 2a fa1 b`

The _inline\_nest\_level_ field is a decimal number giving the depth of
inlining. A record with a nonzero _inline\_nest\_level_ is inlined into the
function described by a previous `INLINE` record whose _inline\_nest\_level_ is
one less. In the example below, the first and third `INLINE` records have
_inline\_nest\_level_ 0, which means they are inlined inside the function
described by the `FUNC` record. The second `INLINE` record has
_inline\_nest\_level_ 1, which means that it is inlined into the inline function
described by the first `INLINE` record.

```
FUNC ...
INLINE 0 ...
INLINE 1 ...
INLINE 0 ...
```

The _call\_site\_line_ and _call\_site\_file\_num_ fields are decimal numbers
indicating the line and source file from which this inline function is called.

The _origin\_num_ field refers to an `INLINE_ORIGIN` record that has the name
of the inline function.

The _address_ and _size_ fields are hexadecimal numbers indicating the start
address and length in bytes of the machine code. The address is relative to the
module's load address. There can be more than one [_address_ _size_] range
pair, since inline functions can have discontinuous address ranges. The ranges
of an `INLINE` record are always inside the ranges described by its parent
record (a `FUNC` record or an `INLINE` record).

The `INLINE` record is assumed to belong to the function described by the last
preceding `FUNC` record. `INLINE` records may not appear before the first `FUNC`
record.
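
The sketch below, in Python, parses an `INLINE` record and then works out each
record's parent from the nest levels. All names are hypothetical. The parent
rule is an interpretation of the text above: it assumes the parent is the *most
recent* preceding `INLINE` record whose nest level is one less, and it uses
`None` to stand for the enclosing `FUNC` record.

```python
from typing import List, NamedTuple, Optional, Tuple


class InlineRecord(NamedTuple):
    nest_level: int
    call_site_line: int
    call_site_file_num: int
    origin_num: int
    ranges: List[Tuple[int, int]]   # (address, size) pairs


def parse_inline(line: str) -> InlineRecord:
    """Parse one INLINE record (hypothetical helper, not Breakpad code)."""
    fields = line.split(" ")
    if fields[0] != "INLINE":
        raise ValueError("not an INLINE record")
    # The first four fields are decimal.
    nest_level, call_line, call_file, origin = (int(f) for f in fields[1:5])
    range_fields = fields[5:]
    if not range_fields or len(range_fields) % 2 != 0:
        raise ValueError("expected one or more address/size pairs")
    # The remaining fields are hexadecimal address/size pairs.
    ranges = [
        (int(address, 16), int(size, 16))
        for address, size in zip(range_fields[0::2], range_fields[1::2])
    ]
    return InlineRecord(nest_level, call_line, call_file, origin, ranges)


def parent_indices(nest_levels: List[int]) -> List[Optional[int]]:
    """For the INLINE records following one FUNC, return each record's parent.

    None means the parent is the FUNC record itself (assumed convention).
    """
    last_seen = {}
    parents = []
    for index, level in enumerate(nest_levels):
        parents.append(None if level == 0 else last_seen[level - 1])
        last_seen[level] = index
    return parents
```

For the three-record example above, `parent_indices([0, 1, 0])` gives
`[None, 0, None]`: the second record nests inside the first, and the other two
belong directly to the `FUNC` record.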

# Line records

A line record describes the source file and line number to which a given range
of machine code should be attributed. It has the form:

> _address_ _size_ _line_ _filenum_

For example: `c184 7 59 4`

Because they are so common, line records do not begin with a string indicating
the record type. All other record types' names use upper-case letters;
hexadecimal numbers, like a line record's _address_, use lower-case letters.

The _address_ and _size_ fields are hexadecimal numbers indicating the start
address and length in bytes of the machine code. The address is relative to the
module's load address.

The _line_ field is the line number to which the machine code should be
attributed, in decimal; the first line of the source file is line number 1. The
_filenum_ field is a decimal number appearing in a prior `FILE` record; the name
given in that record is the source file name for the machine code.

The line is assumed to belong to the function described by the last preceding
`FUNC` record. Line records may not appear before the first `FUNC` record.

No two line records in a symbol file cover the same range of addresses. However,
there may be many line records with identical line and file numbers, as a given
source line may contribute many non-contiguous blocks of machine code.
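
Since line records never overlap, mapping an address to a source line is a
sorted-range lookup. The Python sketch below shows one possible approach using
binary search; `parse_line_record` and `find_line` are hypothetical names, and
the caller is assumed to pass records already sorted by address.

```python
import bisect
from typing import List, Optional, Tuple

# (address, size, line, filenum)
LineRecord = Tuple[int, int, int, int]


def parse_line_record(text: str) -> LineRecord:
    """Parse one line record (hypothetical helper, not Breakpad code)."""
    address, size, line, filenum = text.split(" ")
    # address and size are hexadecimal; line and filenum are decimal.
    return int(address, 16), int(size, 16), int(line), int(filenum)


def find_line(records: List[LineRecord], address: int) -> Optional[Tuple[int, int]]:
    """Return (line, filenum) for a module-relative address, or None.

    `records` must be sorted by start address.
    """
    starts = [record[0] for record in records]
    index = bisect.bisect_right(starts, address) - 1
    if index < 0:
        return None
    start, size, line, filenum = records[index]
    if address < start + size:
        return line, filenum
    return None
```

The returned _filenum_ would then be looked up among the `FILE` records to
obtain the source file name.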

# `PUBLIC` records

A `PUBLIC` record describes a publicly visible linker symbol, such as that used
to identify an assembly language entry point or region of memory. It has the
form:

> `PUBLIC` _[m]_ _address_ _parameter\_size_ _name_

For example: `PUBLIC m 2160 0 Public2_1`

The Breakpad processor essentially treats a `PUBLIC` record as defining a
function with no line number data and an indeterminate size: the code extends to
the next address mentioned. If a given address is covered by both a `PUBLIC`
record and a `FUNC` record, the processor uses the `FUNC` data.
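
The precedence rule above can be sketched in Python as follows. This is a
simplified reading of the paragraph, not the processor's actual implementation:
`resolve_symbol` is a hypothetical name, and "the next address mentioned" is
assumed here to mean the start of the next `PUBLIC` or `FUNC` record.

```python
from typing import List, Optional, Tuple


def resolve_symbol(
    address: int,
    funcs: List[Tuple[int, int, str]],   # (start, size, name)
    publics: List[Tuple[int, str]],      # (address, name)
) -> Optional[str]:
    """Name the code at a module-relative address (hypothetical helper)."""
    # A FUNC record that covers the address always wins.
    for start, size, name in funcs:
        if start <= address < start + size:
            return name
    # Otherwise fall back to the nearest PUBLIC symbol at or below the address.
    candidates = [(a, n) for a, n in publics if a <= address]
    if not candidates:
        return None
    public_address, public_name = max(candidates)
    # Assumption: the PUBLIC symbol's code ends where a later FUNC begins.
    if any(public_address < start <= address for start, _, _ in funcs):
        return None
    return public_name
```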

The _m_ field is optional. If present, it indicates that multiple symbols
reference this function's instructions, in which case only one symbol name is
mentioned within the Breakpad file. Multiple symbols may reference the same
instructions because of identical code folding by the linker.

The _address_ field is a hexadecimal number indicating the symbol's address,
relative to the module's load address.

The _parameter\_size_ field is a hexadecimal number indicating the size of the
parameters passed to the code whose entry point the symbol marks, if known. This
field has the same meaning as the _parameter\_size_ field of a `FUNC` record;
see that description for more details.

The _name_ field is the name of the symbol. In languages that mangle linker
symbol names, such as C++, this should be the source language name (the
"unmangled" form). This field may contain spaces.

# `STACK WIN` records

Given a stack frame, a `STACK WIN` record indicates how to find the frame that
called it. It has the form:

> `STACK WIN` _type_ _rva_ _code\_size_ _prologue\_size_ _epilogue\_size_
> _parameter\_size_ _saved\_register\_size_ _local\_size_ _max\_stack\_size_
> _has\_program\_string_ _program\_string\_OR\_allocates\_base\_pointer_

For example: `STACK WIN 4 2170 14 1 0 0 0 0 0 1 $eip 4 + ^ = $esp $ebp 8 + =
$ebp $ebp ^ =`

All fields of a `STACK WIN` record, except for the last, are hexadecimal
numbers.

The _type_ field indicates what sort of stack frame data this record holds. Its
value should be one of the values of the
[StackFrameTypeEnum](http://msdn.microsoft.com/en-us/library/bc5207xw%28VS.100%29.aspx)
type in Microsoft's
[Debug Interface Access (DIA)](http://msdn.microsoft.com/en-us/library/x93ctkx8%28VS.100%29.aspx) API.
Breakpad uses only records of type 4 (`FrameTypeFrameData`) and 0
(`FrameTypeFPO`); it ignores others. These types differ only in whether the last
field is an _allocates\_base\_pointer_ flag (`FrameTypeFPO`) or a program string
(`FrameTypeFrameData`). If more than one record covers a given address, Breakpad
prefers `FrameTypeFrameData` records over `FrameTypeFPO` records.

The _rva_ and _code\_size_ fields give the starting address and length in bytes
of the machine code covered by this record. The starting address is relative to
the module's load address.

The _prologue\_size_ and _epilogue\_size_ fields give the length, in bytes, of
the prologue and epilogue machine code within the record's range. Breakpad does
not use these values.

The _parameter\_size_ field gives the number of argument bytes this function
expects to have been passed. This field has the same meaning as the
_parameter\_size_ field of a `FUNC` record; see that description for more
details.

The _saved\_register\_size_ field gives the number of bytes in the stack frame
dedicated to preserving the values of any callee-saves registers used by this
function.

The _local\_size_ field gives the number of bytes in the stack frame dedicated
to holding the function's local variables and temporary values.

The _max\_stack\_size_ field gives the maximum number of bytes pushed on the
stack in the frame. Breakpad does not use this value.

If the _has\_program\_string_ field is zero, then the `STACK WIN` record's final
field is an _allocates\_base\_pointer_ flag, as a hexadecimal number; this is
expected for records whose _type_ is 0. Otherwise, the final field is a program
string.

## Interpreting a `STACK WIN` record

Given the register values for a frame F, we can find the calling frame as
follows:

*   If the _has\_program\_string_ field of a `STACK WIN` record is zero, then
    the final field is _allocates\_base\_pointer_, a flag indicating whether the
    frame uses the frame pointer register, `%ebp`, as a general-purpose
    register.
    *   If _allocates\_base\_pointer_ is true, then `%ebp` does not point to the
        frame's base address. Instead,
        *   Let _next\_parameter\_size_ be the parameter size of the function
            frame F called (**not** this record's _parameter\_size_ field), or
            zero if F is the youngest frame on the stack. You must find this
            value in F's callee's `FUNC`, `STACK WIN`, or `PUBLIC` records.
        *   Let _frame\_size_ be the sum of the _local\_size_ field, the
            _saved\_register\_size_ field, and _next\_parameter\_size_.

        With those definitions in place, we can recover the calling frame as
        follows:

        *   F's return address is at `%esp +`_frame\_size_,
        *   the caller's value of `%ebp` is saved at
            `%esp +`_next\_parameter\_size_`+`_saved\_register\_size_`- 8`, and
        *   the caller's value of `%esp` just before the call instruction was
            `%esp +`_frame\_size_`+ 4`.

        (Why do we include _next\_parameter\_size_ in the sum when computing
        _frame\_size_ and the address of the saved `%ebp`? When a function A
        has called a function B, the arguments that A pushed for B are
        considered part of A's stack frame: A's value for `%esp` points at the
        last argument pushed for B. Thus, we must include the size of those
        arguments (given by the debugging info for B) along with the size of
        A's register save area and local variable area (given by the debugging
        info for A) when computing the overall size of A's frame.)
    *   If _allocates\_base\_pointer_ is false, then F's function doesn't use
        `%ebp` at all. You may recover the calling frame as above, except that
        the caller's value of `%ebp` is the same as F's value for `%ebp`, so no
        steps are necessary to recover it.
*   If the _has\_program\_string_ field of a `STACK WIN` record is not zero,
    then the record's final field is a string containing a program to be
    interpreted to recover the caller's frame. The comments in the
    [postfix\_evaluator.h](../src/processor/postfix_evaluator.h#40)
    header file explain the language in which the program is written. You should
    place the following variables in the dictionary before interpreting the
    program:
    *   `$ebp` and `$esp` should be the values of the `%ebp` and `%esp`
        registers in F.
    *   `.cbParams`, `.cbSavedRegs`, and `.cbLocals` should be the values of
        the `STACK WIN` record's _parameter\_size_, _saved\_register\_size_, and
        _local\_size_ fields.
    *   `.raSearchStart` should be set to the address on the stack to begin
        scanning for a return address, if necessary. The Breakpad processor sets
        this to the value of `%esp` in F, plus the _frame\_size_ value mentioned
        above.

    If the program stores values for `$eip`, `$esp`, `$ebp`, `$ebx`, `$esi`, or
    `$edi`, then those are the values of the given registers in the caller. If
    the value of `$eip` is zero, that indicates that the end of the stack has
    been reached.

The Breakpad processor checks that the value yielded by the above for the
calling frame's instruction address refers to known code; if the address seems
to be bogus, then it uses a heuristic search to find F's return address and
stack base.
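
For the case where _has\_program\_string_ is zero, the address arithmetic
described above can be written out as a short Python sketch. The function name
and dictionary keys are hypothetical, chosen only for this example; the formulas
are the ones given in the list above. The `saved_ebp_at` value is only
meaningful when _allocates\_base\_pointer_ is true.

```python
def fpo_caller_frame(esp, local_size, saved_register_size, next_parameter_size):
    """Locate the caller's frame from frame F's %esp (hypothetical helper).

    next_parameter_size is the parameter size of the function F called,
    or zero if F is the youngest frame on the stack.
    """
    frame_size = local_size + saved_register_size + next_parameter_size
    return {
        # Stack address holding F's return address.
        "return_address_at": esp + frame_size,
        # Stack address holding the caller's %ebp (when F saved it).
        "saved_ebp_at": esp + next_parameter_size + saved_register_size - 8,
        # The caller's %esp just before the call instruction.
        "caller_esp": esp + frame_size + 4,
    }
```

With `esp = 0x1000`, `local_size = 0x10`, `saved_register_size = 0xc` and
`next_parameter_size = 0x8`, the frame size is `0x24`, so the return address is
read from `0x1024` and the caller's `%esp` was `0x1028`.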

# `STACK CFI` records

`STACK CFI` ("Call Frame Information") records describe how to walk the stack
when execution is at a given machine instruction. These records take one of two
forms:

> `STACK CFI INIT` _address_ _size_ _register<sub>1</sub>_:
> _expression<sub>1</sub>_ _register<sub>2</sub>_: _expression<sub>2</sub>_ ...
>
> `STACK CFI` _address_ _register<sub>1</sub>_: _expression<sub>1</sub>_
> _register<sub>2</sub>_: _expression<sub>2</sub>_ ...

For example:

```
STACK CFI INIT 804c4b0 40 .cfa: $esp 4 + $eip: .cfa 4 - ^
STACK CFI 804c4b1 .cfa: $esp 8 + $ebp: .cfa 8 - ^
```

The _address_ and _size_ fields are hexadecimal numbers. Each
_register_<sub>i</sub> is the name of a register or pseudoregister. Each
_expression_<sub>i</sub> is a Breakpad postfix expression, which may contain
spaces, but never ends with a colon. (The appropriate register names for a given
architecture are determined when `STACK CFI` records are first enabled for that
architecture, and should be documented in the appropriate
`stackwalker_`_architecture_`.cc` source file.)
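
Because register names end with a colon and expression tokens never do, the
rule list can be split without any further delimiters. The Python sketch below
illustrates this; `parse_cfi_rules` and `parse_stack_cfi` are hypothetical
names, and the returned tuple layout is just one convenient choice.

```python
from typing import Dict, List, Optional, Tuple


def parse_cfi_rules(rule_text: str) -> Dict[str, str]:
    """Split 'reg1: expr1 reg2: expr2 ...' into {register: expression}."""
    tokens_by_register: Dict[str, List[str]] = {}
    register = None
    for token in rule_text.split():
        if token.endswith(":"):
            # A trailing colon marks the start of a new register's rule.
            register = token[:-1]
            tokens_by_register[register] = []
        elif register is None:
            raise ValueError("expression token before any register name")
        else:
            tokens_by_register[register].append(token)
    return {reg: " ".join(toks) for reg, toks in tokens_by_register.items()}


def parse_stack_cfi(line: str) -> Tuple[int, Optional[int], Dict[str, str]]:
    """Return (address, size, rules); size is None for non-INIT records."""
    fields = line.split(" ")
    if fields[:2] != ["STACK", "CFI"]:
        raise ValueError("not a STACK CFI record")
    if fields[2] == "INIT":
        address, size = int(fields[3], 16), int(fields[4], 16)
        rest = fields[5:]
    else:
        address, size = int(fields[2], 16), None
        rest = fields[3:]
    return address, size, parse_cfi_rules(" ".join(rest))
```

Applied to the first example record above, this yields the address `0x804c4b0`,
the size `0x40`, and the rules `{".cfa": "$esp 4 +", "$eip": ".cfa 4 - ^"}`.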
412*9712c20fSFrederick Mayle
413*9712c20fSFrederick MayleSTACK CFI records describe, at each machine instruction in a given function, how
414*9712c20fSFrederick Mayleto recover the values the machine registers had in the function's caller.
415*9712c20fSFrederick MayleNaturally, some registers' values are simply lost, but there are three cases in
416*9712c20fSFrederick Maylewhich they can be recovered:
417*9712c20fSFrederick Mayle
418*9712c20fSFrederick Mayle*   You can always recover the program counter, because that's the function's
419*9712c20fSFrederick Mayle    return address. If the function is ever going to return, the PC must be
420*9712c20fSFrederick Mayle    saved somewhere.
421*9712c20fSFrederick Mayle
422*9712c20fSFrederick Mayle*   You can always recover the stack pointer. The function is responsible for
423*9712c20fSFrederick Mayle    popping its stack frame before it returns to the caller, so it must be able
424*9712c20fSFrederick Mayle    to restore this, as well.
425*9712c20fSFrederick Mayle
426*9712c20fSFrederick Mayle*   You should be able to recover the values of callee-saves registers. These
427*9712c20fSFrederick Mayle    are registers whose values the callee must preserve, either by saving them
428*9712c20fSFrederick Mayle    in its own stack frame before using them and re-loading them before
429*9712c20fSFrederick Mayle    returning, or by not using them at all.
430*9712c20fSFrederick Mayle
431*9712c20fSFrederick Mayle(As an exception, note that functions which never return may not save any of
432*9712c20fSFrederick Maylethis data. It may not be possible to walk the stack past such functions' stack
433*9712c20fSFrederick Mayleframes.)

Given rules for recovering the values of a function's caller's registers, we can
walk up the stack. Starting with the current set of registers (the PC of the
instruction we're currently executing, the current stack pointer, etc.), we use
CFI to recover the values those registers had in the caller of the current
frame. This gives us a PC in the caller whose CFI we can look up; we apply the
process again to find that function's caller; and so on.
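
The loop described above can be sketched in a few lines of Python. This is only
an illustration, not Breakpad's implementation: `walk_stack`, `find_rule`, and
`fixed_frame` are hypothetical names, the recovery rules are modeled as plain
Python callables rather than parsed CFI, and the toy frames assume the return
address sits in the top four bytes of each frame.

```python
def walk_stack(regs, memory, find_rule, max_frames=64):
    """Return register sets for each frame, innermost first."""
    frames = [dict(regs)]
    while len(frames) < max_frames:
        # Look up the recovery rule covering the current frame's PC.
        rule = find_rule(frames[-1]["pc"])
        if rule is None:
            break  # no CFI for this address: stop walking
        # Apply the rule to get the caller's registers, then repeat.
        frames.append(rule(frames[-1], memory))
    return frames


def fixed_frame(size):
    """Toy rule: frame of `size` bytes, return address in its top 4 bytes."""
    def rule(regs, memory):
        cfa = regs["sp"] + size
        return {"pc": memory[cfa - 4], "sp": cfa}
    return rule


# Hypothetical code ranges (start, end, rule) standing in for CFI data.
RANGES = [(0x1000, 0x1010, fixed_frame(8)), (0x2000, 0x2010, fixed_frame(16))]


def find_rule(pc):
    for start, end, rule in RANGES:
        if start <= pc < end:
            return rule
    return None


memory = {0x104: 0x2005, 0x114: 0x3001}  # saved return addresses
frames = walk_stack({"pc": 0x1004, "sp": 0x100}, memory, find_rule)
print([hex(f["pc"]) for f in frames])  # ['0x1004', '0x2005', '0x3001']
```

The walk stops at the third frame because no rule covers `0x3001`; a real stack
walker would fall back to another strategy at that point.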

Concretely, CFI records represent a table with a row for each machine
instruction address and a column for each register. The table entry for a given
address and register contains a rule describing how, when the PC is at that
address, to restore the value that register had in the caller.

There are some special columns:

*   A column named `.cfa`, for "Canonical Frame Address", tells how to compute
    the base address of the frame; other entries can refer to the CFA in their
    rules.

*   A column named `.ra` represents the return address.

For example, suppose we have a machine with 32-bit registers, one-byte
instructions, a stack that grows downwards, and an assembly language that
resembles C. Suppose further that we have a function whose machine code looks
like this:

```
func:                                ; entry point; return address at sp
func+0:      sp -= 16                ; allocate space for stack frame
func+1:      sp[12] = r0             ; save 4-byte r0 at sp+12
             ...                     ; stuff that doesn't affect stack
func+10:     sp -= 4; *sp = x        ; push some 4-byte x on the stack
             ...                     ; stuff that doesn't affect stack
func+20:     r0 = sp[16]             ; restore saved r0
func+21:     sp += 20                ; pop whole stack frame
func+22:     pc = *sp; sp += 4       ; pop return address and jump to it
```

The following table would describe the function above:

| **code address** | **.cfa** | **r0**    | **r1** | ... | **.ra**  |
|:-----------------|:---------|:----------|:-------|:----|:---------|
| func+0           | sp       |           |        |     | `cfa[0]` |
| func+1           | sp+16    |           |        |     | `cfa[0]` |
| func+2           | sp+16    | `cfa[-4]` |        |     | `cfa[0]` |
| func+11          | sp+20    | `cfa[-4]` |        |     | `cfa[0]` |
| func+21          | sp+20    |           |        |     | `cfa[0]` |
| func+22          | sp       |           |        |     | `cfa[0]` |

Some things to note here:

*   Each row describes the state of affairs **before** executing the instruction
    at the given address. Thus, the row for func+0 describes the state before we
    execute the first instruction, which allocates the stack frame. In the next
    row, the formula for computing the CFA has changed, reflecting the
    allocation.

*   The other entries are written in terms of the CFA; this allows them to
    remain unchanged as the stack pointer gets bumped around. For example, to
    find the caller's value for r0 at func+2, we would first compute the CFA by
    adding 16 to the sp, and then subtract four from that to find the address
    at which r0 was saved.

*   Although the example doesn't show this, most calling conventions designate
    "callee-saves" and "caller-saves" registers. The callee must restore the
    values of "callee-saves" registers before returning (if it uses them at
    all), whereas the callee is free to use "caller-saves" registers without
    restoring their values. A function that uses caller-saves registers
    typically does not save their original values at all; in this case, the CFI
    marks such registers' values as "unrecoverable".

*   Exactly where the CFA points in the frame (at the return address, below
    it, or at some fixed point within the frame) is a question of definition
    that depends on the architecture and ABI in use. But by definition, the CFA
    remains constant throughout the lifetime of the frame. It's up to
    architecture-specific code to know what significance to assign the CFA, if
    any.

To save space, the most common type of CFI record only mentions the table
entries at which changes take place. So for the above, the CFI data would
mention only the non-blank entries here (the `r0` entry at func+21 says that
the caller's value is once again in the register itself):

| **code address** | **.cfa** | **r0**    | **r1** | ... | **.ra**  |
|:-----------------|:---------|:----------|:-------|:----|:---------|
| func+0           | sp       |           |        |     | `cfa[0]` |
| func+1           | sp+16    |           |        |     |          |
| func+2           |          | `cfa[-4]` |        |     |          |
| func+11          | sp+20    |           |        |     |          |
| func+21          |          | r0        |        |     |          |
| func+22          | sp       |           |        |     |          |
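
The relationship between the two tables can be sketched as follows: walking the
change-only rows in address order and carrying each rule forward reproduces the
full table. The names `SPARSE` and `expand` are hypothetical, and the rules are
kept as the informal strings used in the tables above rather than real CFI
expressions.

```python
# Change-only rows: (offset from func, {column: new rule}).
SPARSE = [
    (0,  {".cfa": "sp", ".ra": "cfa[0]"}),
    (1,  {".cfa": "sp+16"}),
    (2,  {"r0": "cfa[-4]"}),
    (11, {".cfa": "sp+20"}),
    (21, {"r0": "r0"}),
    (22, {".cfa": "sp"}),
]


def expand(sparse):
    """Carry unchanged rules forward to rebuild one full row per change."""
    rows, current = [], {}
    for offset, changes in sparse:
        current = {**current, **changes}  # later rules override earlier ones
        rows.append((offset, dict(current)))
    return rows


for offset, rules in expand(SPARSE):
    print(f"func+{offset}: {rules}")
```

In the expanded rows, `r0` at func+21 and func+22 carries the rule `r0`, which
corresponds to the blank `r0` cells in the full table: the value is already in
the register and needs no recovery from the stack.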

A `STACK CFI INIT` record indicates that, at the machine instruction at
_address_, belonging to some function, the value that _register<sub>n</sub>_ had
in that function's caller can be recovered by evaluating
_expression<sub>n</sub>_. The values of any callee-saves registers not mentioned
are assumed to be unchanged. (`STACK CFI` records never mention caller-saves
registers.) These rules apply starting at _address_ and continue up to, but not
including, the address given in the next `STACK CFI` record. The _size_ field is
the total number of bytes of machine code covered by this record and any
subsequent `STACK CFI` records (until the next `STACK CFI INIT` record). The
_address_ field is relative to the module's load address.

A `STACK CFI` record (no `INIT`) is the same, except that it mentions only those
registers whose recovery rules have changed from the previous CFI record. There
must be a prior `STACK CFI INIT` or `STACK CFI` record in the symbol file. The
_address_ field of this record must be greater than that of the previous record,
and it must not be at or beyond the end of the range given by the most recent
`STACK CFI INIT` record. The address is relative to the module's load address.
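
As a rough illustration of how the two record types differ, the sketch below
splits a record into its fields. It assumes the field order `STACK CFI INIT`
_address_ _size_ followed by name/expression pairs (with no _size_ field on
plain `STACK CFI` records) and hexadecimal numbers without a `0x` prefix;
`parse_cfi` is a hypothetical helper, not a Breakpad function, and the sample
lines are made up for the example.

```python
def parse_cfi(line):
    """Split a STACK CFI record into (is_init, address, size, rules)."""
    tokens = line.split()
    if tokens[:2] != ["STACK", "CFI"]:
        raise ValueError("not a STACK CFI record")
    tokens = tokens[2:]
    is_init = bool(tokens) and tokens[0] == "INIT"
    if is_init:
        tokens = tokens[1:]
    address = int(tokens[0], 16)
    size = None
    rest = tokens[1:]
    if is_init:
        size = int(rest[0], 16)  # only INIT records carry a size
        rest = rest[1:]
    rules, name = {}, None
    for tok in rest:
        if tok.endswith(":"):
            # A token ending in a colon starts a new register rule.
            name = tok[:-1]
            rules[name] = []
        elif name is None:
            raise ValueError("expression token before any register name")
        else:
            rules[name].append(tok)
    return is_init, address, size, {n: " ".join(e) for n, e in rules.items()}


print(parse_cfi("STACK CFI INIT 1000 17 .cfa: $sp .ra: .cfa ^"))
print(parse_cfi("STACK CFI 1002 $r0: .cfa 4 - ^"))
```

Because register names always end with a colon and expression tokens never do,
the name/expression pairs can be separated without any further delimiters.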

Each expression is a Breakpad-style postfix expression. Expressions may contain
spaces, but their tokens may not end with colons. When an expression mentions a
register, it refers to the value of that register in the callee, even if a prior
name/expression pair gives that register's value in the caller. The exception is
`.cfa`, which refers to the canonical frame address computed by the `.cfa` rule
in force at the current instruction.

The special expression `.undef` indicates that the given register's value cannot
be recovered.
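
A minimal sketch of evaluating such expressions is shown below. It is not
Breakpad's evaluator: `evaluate` is a hypothetical function that supports only
decimal literals, the binary operators `+`, `-`, and `*`, the dereference
operator `^`, register names, `.cfa`, and `.undef`. The register values and
memory contents are invented for the example.

```python
def evaluate(expr, regs, memory, cfa=None):
    """Evaluate a postfix expression; return None for `.undef`."""
    if expr.strip() == ".undef":
        return None  # the register's value cannot be recovered
    stack = []
    for tok in expr.split():
        if tok in ("+", "-", "*"):
            right = stack.pop()
            left = stack.pop()
            if tok == "+":
                stack.append(left + right)
            elif tok == "-":
                stack.append(left - right)
            else:
                stack.append(left * right)
        elif tok == "^":
            stack.append(memory[stack.pop()])  # load the word at that address
        elif tok == ".cfa":
            if cfa is None:
                raise ValueError(".cfa used before it was computed")
            stack.append(cfa)
        elif tok in regs:
            stack.append(regs[tok])  # register value in the callee
        else:
            stack.append(int(tok))  # decimal literal
    if len(stack) != 1:
        raise ValueError("malformed expression")
    return stack[0]


regs = {"$sp": 0x7000, "$r0": 5}
memory = {0x700C: 0xABCD, 0x7010: 0x2222}
cfa = evaluate("$sp 16 +", regs, memory)
print(hex(cfa))                                          # 0x7010
print(hex(evaluate(".cfa 4 - ^", regs, memory, cfa)))    # 0xabcd
print(hex(evaluate(".cfa ^", regs, memory, cfa)))        # 0x2222
```

The `.cfa` rule is evaluated first so that the other rules can refer to its
result, while `$sp` and `$r0` always read the callee's values.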

The register names preceding the expressions are always followed by colons. The
expressions themselves never contain tokens ending with colons.

There are two special register names:

*   `.cfa` ("Canonical Frame Address") is the base address of the stack frame.
    Other registers' rules may refer to this. If no rule is provided for the
    stack pointer, the value of `.cfa` is the caller's stack pointer.

*   `.ra` is the return address. This is the value of the restored program
    counter. We use `.ra` instead of the architecture-specific name for the
    program counter.

The Breakpad stack walker requires that there be rules in force for `.cfa` and
`.ra` at every code address from which it unwinds. If those rules are not
present, the stack walker will ignore the `STACK CFI` data, and try to use a
different strategy.

So the CFI for the example function above would be as follows, if `func` were at
address 0x1000 (relative to the module's load address) and covered 0x17 bytes
(func+0 through func+22):

```
STACK CFI INIT 1000 17 .cfa: $sp .ra: .cfa ^
STACK CFI      1001 .cfa: $sp 16 +
STACK CFI      1002 $r0: .cfa 4 - ^
STACK CFI      100b .cfa: $sp 20 +
STACK CFI      1015 $r0: $r0
STACK CFI      1016 .cfa: $sp
```
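
To see how these records combine, the sketch below finds the rules in force at
a given address and applies the requirement that both `.cfa` and `.ra` be
present. The records are written out by hand as Python tuples, `rules_at` is a
hypothetical helper, and `FUNC_SIZE` assumes the function is 23 one-byte
instructions long.

```python
# (address, rules) pairs for the example function; the first is the INIT record.
RECORDS = [
    (0x1000, {".cfa": "$sp", ".ra": ".cfa ^"}),
    (0x1001, {".cfa": "$sp 16 +"}),
    (0x1002, {"$r0": ".cfa 4 - ^"}),
    (0x100B, {".cfa": "$sp 20 +"}),
    (0x1015, {"$r0": "$r0"}),
    (0x1016, {".cfa": "$sp"}),
]
FUNC_SIZE = 0x17  # assumption: func+0 through func+22, one byte each


def rules_at(pc, records=RECORDS, size=FUNC_SIZE):
    """Return the rules in force at `pc`, or None if CFI cannot be used."""
    start = records[0][0]
    if not start <= pc < start + size:
        return None  # outside the range covered by the INIT record
    in_force = {}
    for address, rules in records:
        if address > pc:
            break
        in_force.update(rules)  # later records override earlier rules
    if ".cfa" not in in_force or ".ra" not in in_force:
        return None  # the stack walker needs both to unwind
    return in_force


print(rules_at(0x1005))
print(rules_at(0x1017))  # None: past the end of the function
```

At 0x1005 the result combines the `.ra` rule from the `INIT` record with the
`.cfa` rule from 0x1001 and the `$r0` rule from 0x1002.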