# Breakpad Processor Library

## Objective

The Breakpad processor library is an open-source framework to access the
information contained within crash dumps for multiple platforms, and to use that
information to produce stack traces showing the call chain of each thread in a
process. After processing, this data is made available to users of the library.

## Background

The Breakpad processor is intended to sit at the core of a comprehensive
crash-reporting system that does not require debugging information to be
provided to the users of the applications being monitored. Some existing
crash-reporting systems, such as [GNOME](http://www.gnome.org/)’s Bug-Buddy and
[Apple](http://www.apple.com/)’s
[CrashReporter](http://developer.apple.com/technotes/tn2004/tn2123.html),
require symbolic information to be present on the end user’s computer; in the
case of CrashReporter, the reports are transmitted only to Apple, not to
third-party developers. Other systems, such as
[Microsoft](http://www.microsoft.com/)’s
[Windows Error Reporting](http://msdn.microsoft.com/isv/resources/wer/) and
SupportSoft’s Talkback, transmit only a snapshot of a crashed process’ state,
which can later be combined with symbolic debugging information without the need
for it to be present on end users’ computers. Because symbolic debugging
information consumes a large amount of space and is otherwise not needed during
the normal operation of software, and because some developers are reluctant to
release debugging symbols to their customers, Breakpad follows the latter
approach.

We know of no currently-maintained crash-reporting systems that meet our
requirements, which are to:

*   allow for symbols to be separate from the application,
*   handle crash reports from multiple platforms,
*   allow developers to operate their own crash-reporting platform, and to
*   be open-source.

Windows Error Reporting only functions for Microsoft products, and requires the
involvement of Microsoft’s servers. Talkback, while cross-platform, has not been
maintained and at this point does not support Mac OS X on x86, which we consider
to be a significant platform. Talkback is also closed-source commercial
software, and has very specific requirements for its server platform.

We are aware of Windows-only crash-reporting systems that leverage Microsoft’s
debugging interfaces. Such systems, even if extended to support dumps from other
platforms, are tied to using Windows for at least a portion of the processor
platform.

## Overview

The Breakpad processor itself is written in standard C++ and will work on a
variety of platforms. The dumps it accepts may also have been created on a
variety of systems. The library is able to combine dumps with symbolic debugging
information to create stack traces that include function signatures. The
processor library includes simple command-line tools to examine dumps and
process them, producing stack traces. It also exposes several layers of APIs
enabling crash-reporting systems to be built around the Breakpad processor.

## Detailed Design

### Dump Files

In the processor, the dump data is of primary significance. Dumps typically
contain:

*   CPU context (register data) as it was at the time the crash occurred, and an
    indication of which thread caused the crash. General-purpose registers are
    included, as are special-purpose registers such as the instruction pointer
    (program counter).
*   Information about each thread of execution within a crashed process,
    including:
    *   The memory region used for each thread’s stack.
    *   CPU context for each thread, which for various reasons is not the same
        as the crash context in the case of the crashed thread.
*   A list of loaded code segments (or modules), including:
    *   The name of the file (`.so`, `.exe`, `.dll`, etc.) which provides the
        code.
    *   The boundaries of the memory region in which the code segment is visible
        to the process.
    *   A reference to the debugging information for the code module, when such
        information is available.

Ordinarily, dumps are produced as a result of a crash, but other triggers may be
set to produce dumps at any time a developer deems appropriate. The Breakpad
processor can handle dumps in the minidump format, generated either by a
[Breakpad client “handler”](client_design.md) implementation or by another
implementation that produces dumps in this format. The
[DbgHelp.dll!MiniDumpWriteDump](http://msdn2.microsoft.com/en-us/library/ms680360.aspx)
function on Windows produces dumps in this format, and is the basis for the
Breakpad handler implementation on that platform.

The [minidump format](http://msdn.microsoft.com/en-us/library/ms679293%28VS.85%29.aspx)
is essentially a simple container format, organized as a series of streams. Each
stream contains some type of data relevant to the crash. A typical “normal”
minidump contains streams for the thread list, the module list, the CPU context
at the time of the crash, and various bits of additional system information.
Other types of minidump can be generated, such as a full-memory minidump, which
in addition to stack memory contains snapshots of all of a process’ mapped
memory regions.
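
The container structure can be made concrete with a short C++ sketch that lists
the stream directory of a minidump already loaded into memory. It assumes the
documented `MINIDUMP_HEADER` layout (signature, version, stream count, and
directory offset at byte offsets 0, 4, 8, and 12) and 12-byte
`MINIDUMP_DIRECTORY` entries. `StreamEntry`, `ReadU32`, and `ListStreams` are
hypothetical names and are not part of Breakpad’s API. Fields are decoded byte by
byte so that the sketch does not depend on the host’s byte order.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

constexpr uint32_t kMinidumpSignature = 0x504d444d;  // "MDMP" read little-endian
constexpr size_t kDirectoryEntrySize = 12;  // stream type, data size, offset

// One entry of the stream directory (hypothetical mirror of MINIDUMP_DIRECTORY).
struct StreamEntry {
  uint32_t type;  // e.g. 3 = thread list, 4 = module list
  uint32_t size;  // size of the stream's data in bytes
  uint32_t rva;   // file offset of the stream's data
};

// Decodes a little-endian uint32 at `offset`, or nullopt if out of bounds.
std::optional<uint32_t> ReadU32(const std::vector<uint8_t>& buf, size_t offset) {
  if (offset > buf.size() || buf.size() - offset < 4) return std::nullopt;
  return uint32_t(buf[offset]) | uint32_t(buf[offset + 1]) << 8 |
         uint32_t(buf[offset + 2]) << 16 | uint32_t(buf[offset + 3]) << 24;
}

// Lists the streams described by a minidump's header and directory.
// Returns nullopt if the signature is wrong or the directory is truncated.
// The stream data itself is neither read nor bounds-checked here.
std::optional<std::vector<StreamEntry>> ListStreams(
    const std::vector<uint8_t>& dump) {
  auto signature = ReadU32(dump, 0);
  auto count = ReadU32(dump, 8);       // NumberOfStreams
  auto directory = ReadU32(dump, 12);  // StreamDirectoryRva
  if (!signature || *signature != kMinidumpSignature || !count || !directory)
    return std::nullopt;

  std::vector<StreamEntry> streams;
  for (uint32_t i = 0; i < *count; ++i) {
    size_t entry = size_t(*directory) + size_t(i) * kDirectoryEntrySize;
    auto type = ReadU32(dump, entry);
    auto size = ReadU32(dump, entry + 4);
    auto rva = ReadU32(dump, entry + 8);
    if (!type || !size || !rva) return std::nullopt;
    streams.push_back({*type, *size, *rva});
  }
  return streams;
}
```

Stream types 3 and 4 are the documented values for the thread list and module
list streams. A complete reader would also check that each stream’s data lies
within the file before reading it.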

The minidump format was chosen as Breakpad’s dump format because it has an
established track record on Windows, and it can be adapted to meet the needs of
the other platforms that Breakpad supports. Most other operating systems use
“core” files as their native dump formats, but the capabilities of core files
vary across platforms, and because core files are usually presented in a
platform’s native executable format, there are complications involved in
accessing the data contained therein without the benefit of the header files
that define an executable format’s entire structure. Because minidumps are
leaner than a typical executable format, redefining the format in a
cross-platform header file, `minidump_format.h`, was a straightforward task.
Similarly, the capabilities of the minidump format are understood, and because
it provides an extensible container, any of Breakpad’s needs that could not be
met directly by the standard minidump format could likely be met by extending it
as needed. Finally, using this format means that the dump file is compatible
with native debugging tools, at least on Windows. A possible future avenue for
exploration is the conversion of minidumps to core files, to enable this same
benefit on other platforms.

We have already provided an extension to the minidump format that allows it to
carry dumps generated on systems with PowerPC processors. The format already
accommodates multiple CPU types, so our work in this area was limited to
defining a context structure sufficient to represent the execution state of a
PowerPC. We have also defined an extension that allows a non-crash minidump to
indicate which thread of execution requested that the dump be produced.
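
The requesting-thread extension can be illustrated with a small sketch. Here
`DumpRequestInfo`, `kRequestingThreadValid`, and `ThreadOfInterest` are
hypothetical stand-ins. Breakpad’s actual record is declared in
`minidump_format.h` and may be laid out differently. The idea shown is that a
validity bit records whether the requesting thread’s ID is present, so a
processor can distinguish a missing value from a thread whose ID happens to be
zero.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical extension record; the real one in minidump_format.h may differ.
struct DumpRequestInfo {
  uint32_t validity;              // bit flags saying which fields are present
  uint32_t requesting_thread_id;  // thread that asked for a non-crash dump
};

// Hypothetical flag: set when requesting_thread_id holds a meaningful value.
constexpr uint32_t kRequestingThreadValid = 1u << 0;

// Chooses the thread to highlight in a report: the crashing thread when the
// dump records a crash, otherwise the thread that requested the dump, if the
// extension recorded one.
std::optional<uint32_t> ThreadOfInterest(std::optional<uint32_t> crashing_thread,
                                         const DumpRequestInfo& info) {
  if (crashing_thread) return crashing_thread;
  if (info.validity & kRequestingThreadValid) return info.requesting_thread_id;
  return std::nullopt;
}
```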

Often, the information contained within a dump alone is sufficient to produce a
full stack backtrace for each thread. Certain optimizations that compilers
employ in producing code frustrate this process. Specifically, the “frame
pointer omission” optimization of x86 compilers can make it impossible to
produce useful stack traces given only a stack snapshot and CPU context. In
these cases, however, compiler-emitted debugging information can aid in
producing useful stack traces. The Breakpad processor is able to take advantage
of this debugging information as supplied by Microsoft’s C/C++ compiler, the
only compiler to apply such optimizations by default. As a result, the Breakpad
processor can produce useful stack traces even from code with frame pointer
omission optimizations as produced by this compiler.
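
A minimal sketch helps show why frame pointer omission matters. It walks a
32-bit x86 frame-pointer chain through a stack snapshot, assuming the
conventional prologue in which `[ebp]` holds the caller’s saved `ebp` and
`[ebp + 4]` holds the return address. `StackMemory` and `WalkFramePointers` are
hypothetical. Breakpad’s own x86 stackwalker is considerably more involved and
can use compiler-emitted frame data instead of relying on this chain.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical 32-bit stack snapshot: address -> 4-byte word at that address.
using StackMemory = std::map<uint32_t, uint32_t>;

// Follows the saved-frame-pointer chain and returns one instruction address
// per frame, innermost first. Stops when the chain leaves the snapshot, a
// return address is zero, or a saved ebp fails to move toward the stack base
// (the stack grows down, so callers' frames sit at higher addresses).
std::vector<uint32_t> WalkFramePointers(const StackMemory& stack, uint32_t eip,
                                        uint32_t ebp, size_t max_frames = 64) {
  std::vector<uint32_t> frames{eip};
  while (frames.size() < max_frames) {
    auto saved_ebp = stack.find(ebp);           // caller's frame pointer
    auto return_address = stack.find(ebp + 4);  // where the caller resumes
    if (saved_ebp == stack.end() || return_address == stack.end() ||
        return_address->second == 0)
      break;
    frames.push_back(return_address->second);
    if (saved_ebp->second <= ebp) break;  // end of chain or corrupt data
    ebp = saved_ebp->second;
  }
  return frames;
}
```

If a function in the chain had omitted its frame pointer, `ebp` would still hold
its caller’s value, so this walk would silently skip that function or follow a
meaningless chain. Compiler-emitted frame data closes exactly that gap.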

### Symbol Files

The [symbol files](symbol_files.md) that the Breakpad processor accepts can
carry frame pointer omission data, but this is only one of their capabilities.
Each symbol file also includes information about the functions, source files,
and source code line numbers for a single module of code. A module is an
individually loadable chunk of code: these can be executables containing a main
program (`.exe` files on Windows) or shared libraries (`.so` files on Linux,
`.dylib` files, frameworks, and bundles on Mac OS X, and `.dll` files on
Windows). Dumps contain information about which of these modules were loaded at
the time the dump was produced, and given this information, the Breakpad
processor attempts to locate debugging symbols for each module through a
user-supplied function embodied in a “symbol supplier.” Breakpad includes a
sample symbol supplier, called `SimpleSymbolSupplier`, that is used by its
command-line tools; this supplier locates symbol files by pathname.
`SimpleSymbolSupplier` is also available to other users of the Breakpad
processor library. This allows for the use of a simple reference implementation,
but preserves flexibility for users who may have more demanding symbol file
storage needs.
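
The split between a pluggable supplier and a pathname-based default can be
sketched as follows. `ModuleInfo`, `SymbolLocator`, and `PathSymbolLocator` are
hypothetical names, not the real `SymbolSupplier` interface. The directory
layout, `<root>/<debug_file>/<debug_identifier>/<name>.sym` with a trailing
`.pdb` removed from `<name>`, is a simplified version of the layout
`SimpleSymbolSupplier` uses. This sketch only builds the path and does not touch
the filesystem.

```cpp
#include <optional>
#include <string>
#include <utility>

// What a symbol supplier needs to know about a loaded module.
struct ModuleInfo {
  std::string debug_file;        // e.g. "app.pdb" or "libfoo.so"
  std::string debug_identifier;  // build identifier recorded in the dump
};

// Hypothetical pluggable interface: each deployment decides where symbols live.
class SymbolLocator {
 public:
  virtual ~SymbolLocator() = default;
  virtual std::optional<std::string> SymbolPath(const ModuleInfo& module) const = 0;
};

// Pathname-based locator using a layout modeled on SimpleSymbolSupplier's:
//   <root>/<debug_file>/<debug_identifier>/<name>.sym
// where <name> is the debug file name with a trailing ".pdb" removed.
// Simplified: only a lowercase ".pdb" suffix is recognized.
class PathSymbolLocator : public SymbolLocator {
 public:
  explicit PathSymbolLocator(std::string root) : root_(std::move(root)) {}

  std::optional<std::string> SymbolPath(const ModuleInfo& module) const override {
    if (module.debug_file.empty() || module.debug_identifier.empty())
      return std::nullopt;  // not enough information to build a path
    std::string name = module.debug_file;
    const std::string pdb = ".pdb";
    if (name.size() > pdb.size() &&
        name.compare(name.size() - pdb.size(), pdb.size(), pdb) == 0)
      name.erase(name.size() - pdb.size());
    return root_ + "/" + module.debug_file + "/" + module.debug_identifier +
           "/" + name + ".sym";
  }

 private:
  std::string root_;
};
```

A deployment that keeps symbols in a database or fetches them over the network
would implement the same interface differently, which is the flexibility the
supplier abstraction is meant to preserve.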

Breakpad’s symbol file format is text-based, and was defined to be fairly
human-readable and to encompass the needs of multiple platforms. The Breakpad
processor itself does not operate directly on native symbol formats
([DWARF](http://dwarf.freestandards.org/) and
[STABS](http://sourceware.org/gdb/current/onlinedocs/stabs.html) on most
Unix-like systems,
[.pdb files](http://msdn2.microsoft.com/en-us/library/yd4f8bd1(VS.80).aspx) on
Windows), because of the complications in accessing potentially complex symbol
formats with slight variations between platforms, stored within different types
of binary formats. In the case of `.pdb` files, the debugging format is not even
documented. Instead, Breakpad’s symbol files are produced on each platform,
using specific debugging APIs where available, to convert native symbols to
Breakpad’s cross-platform format.
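
Because the format is plain text, reading it is straightforward. The sketch
below parses the `FILE`, `FUNC`, and line records of a symbol file. It assumes
the record layouts described in the symbol file documentation: hexadecimal
addresses and sizes, decimal line and file numbers, an optional `m` flag on
`FUNC`, and line records that directly follow their `FUNC`. It skips every other
record type and does not validate input. The struct and function names are
hypothetical.

```cpp
#include <cstdint>
#include <istream>
#include <map>
#include <sstream>
#include <string>
#include <vector>

struct LineRecord {
  uint64_t address, size;  // module-relative, hexadecimal in the file
  int line, file;          // decimal line number and FILE index
};

struct FunctionRecord {
  uint64_t address = 0, size = 0;
  std::string name;
  std::vector<LineRecord> lines;  // the line records that follow this FUNC
};

struct SymbolData {
  std::map<int, std::string> files;  // FILE index -> source path
  std::vector<FunctionRecord> functions;
};

// True if `token` is a non-empty run of hexadecimal digits.
bool IsHex(const std::string& token) {
  return !token.empty() &&
         token.find_first_not_of("0123456789abcdefABCDEF") == std::string::npos;
}

// Parses FILE, FUNC, and line records; other record types are skipped.
// Malformed addresses make std::stoull throw; real code would validate input.
SymbolData ParseSymbols(std::istream& in) {
  SymbolData data;
  bool in_function = false;  // line records belong to the preceding FUNC
  std::string text;
  while (std::getline(in, text)) {
    std::istringstream fields(text);
    std::string first;
    if (!(fields >> first)) continue;  // blank line
    if (first == "FILE") {
      int number = 0;
      std::string name;
      fields >> number >> std::ws;
      std::getline(fields, name);  // rest of the line; may contain spaces
      data.files[number] = name;
      in_function = false;
    } else if (first == "FUNC") {
      std::string token;
      fields >> token;
      if (token == "m") fields >> token;  // optional flag before the address
      FunctionRecord func;
      uint64_t parameter_size = 0;  // read to advance the stream; unused here
      func.address = std::stoull(token, nullptr, 16);
      fields >> std::hex >> func.size >> parameter_size >> std::ws;
      std::getline(fields, func.name);
      data.functions.push_back(func);
      in_function = true;
    } else if (in_function && IsHex(first)) {
      LineRecord record{};
      record.address = std::stoull(first, nullptr, 16);
      fields >> std::hex >> record.size >> std::dec >> record.line >> record.file;
      data.functions.back().lines.push_back(record);
    } else {
      in_function = false;  // MODULE, PUBLIC, STACK, ... end a FUNC's lines
    }
  }
  return data;
}
```

Line records carry module-relative addresses, so a processor must subtract the
module’s load address from an instruction address before searching them.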

### Processing

Most commonly, a developer will enable an application to use Breakpad by
building it with a platform-specific [client “handler”](client_design.md)
library. After building the application, the developer will create symbol files
for Breakpad with the included `dump_syms` or `symupload` tools, or another
suitable tool, and place the symbol files where the processor’s symbol supplier
will be able to locate them.

When a dump file is given to the processor’s `MinidumpProcessor` class, it will
read it using its included minidump reader, contained in the `Minidump` family
of classes. It will collect information about the operating system and CPU that
produced the dump, and determine whether the dump was produced as a result of a
crash or at the direct request of the application itself. It then loops over all
of the threads in the process, attempting to walk the stack associated with each
thread. Stack walking is performed by the processor’s `Stackwalker` components,
of which there are slightly different implementations for each CPU type whose
dumps the processor can handle. Beginning with a thread’s context, and possibly
using debugging data, the stackwalker produces a list of stack frames, each
identifying the instruction executing in that frame of the call chain. These
instructions are matched up with the modules that contributed them to the
process, and the `SymbolSupplier` is invoked to locate each module’s symbol
file. The symbol file is given to a `SourceLineResolver`, which matches the
instruction up with a specific function name, source file, and line number,
resulting in a representation of a stack frame that can easily be used to
identify which code was executing.
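
The two lookups described above can be sketched with ordered maps. The first
finds the module whose address range contains an instruction, and the second
finds the line record covering that instruction. The sketch assumes, as Breakpad
symbol files do, that symbol addresses are relative to the module’s load
address, and that neither modules nor line records overlap. `Module`,
`SourceLine`, `ModuleForAddress`, and `ResolveLine` are hypothetical names
rather than the `SourceLineResolver` interface.

```cpp
#include <cstdint>
#include <map>
#include <optional>
#include <string>

// A loaded module's name and the address range it occupies in the process.
struct Module {
  std::string name;
  uint64_t base, size;
};

// One line record joined with its function and file names.
struct SourceLine {
  uint64_t address, size;  // module-relative range covered by this record
  std::string function, file;
  int line;
};

// Finds the module whose [base, base + size) range contains `address`.
// `modules` is keyed by base address.
const Module* ModuleForAddress(const std::map<uint64_t, Module>& modules,
                               uint64_t address) {
  auto it = modules.upper_bound(address);  // first module starting above it
  if (it == modules.begin()) return nullptr;
  --it;  // the last module starting at or below the address
  const Module& module = it->second;
  return address - module.base < module.size ? &module : nullptr;
}

// Finds the line record covering a module-relative address.
// `lines` is keyed by each record's start address.
std::optional<SourceLine> ResolveLine(const std::map<uint64_t, SourceLine>& lines,
                                      uint64_t relative_address) {
  auto it = lines.upper_bound(relative_address);
  if (it == lines.begin()) return std::nullopt;
  --it;
  const SourceLine& candidate = it->second;
  if (relative_address - candidate.address >= candidate.size) return std::nullopt;
  return candidate;
}
```

Both lookups are logarithmic in the number of entries, which matters when a
module’s symbol file holds many thousands of line records.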

The results of processing are made available in a `ProcessState` object, which
contains a vector of threads, each containing a vector of stack frames.
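
A simplified, hypothetical mirror of that nesting shows how a client might turn
processing results into a readable trace. `ProcessSummary`, `Thread`, `Frame`,
`Describe`, `ReportThread`, and the output format are assumptions made for
illustration. The real `ProcessState` class exposes more information through its
own accessors.

```cpp
#include <cstddef>
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical, simplified stand-ins for the processor's results.
struct Frame {
  uint64_t instruction = 0;  // instruction address for this frame
  std::string module;        // empty if no loaded module contains it
  std::string function;      // empty if no symbols were available
  std::string source_file;
  int source_line = 0;
};

struct Thread {
  std::vector<Frame> frames;  // innermost frame first
};

struct ProcessSummary {
  bool crashed = false;          // crash dump, or requested by the application
  int thread_of_interest = -1;   // crashing or requesting thread, if known
  std::vector<Thread> threads;
};

// Formats a frame as "module!function [file:line]" when symbols were found,
// falling back to the raw address (and module name, if any) otherwise.
std::string Describe(const Frame& frame) {
  std::ostringstream out;
  if (!frame.function.empty()) {
    out << frame.module << "!" << frame.function;
    if (!frame.source_file.empty())
      out << " [" << frame.source_file << ":" << frame.source_line << "]";
  } else {
    out << "0x" << std::hex << frame.instruction;
    if (!frame.module.empty()) out << " (" << frame.module << ")";
  }
  return out.str();
}

// Returns one formatted line per frame of the thread at `index`.
std::vector<std::string> ReportThread(const ProcessSummary& state, size_t index) {
  std::vector<std::string> lines;
  if (index >= state.threads.size()) return lines;
  for (const Frame& frame : state.threads[index].frames)
    lines.push_back(Describe(frame));
  return lines;
}
```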

For small-scale use of the Breakpad processor, and for testing and debugging,
the `minidump_stackwalk` tool is provided. It invokes the processor and displays
the full results of processing, optionally allowing symbols to be provided to
the processor by a pathname-based symbol supplier, `SimpleSymbolSupplier`.

For lower-level testing and debugging, the processor library also includes a
`minidump_dump` tool, which walks through an entire minidump file and displays
its contents in a somewhat readable form.

### Platform Support

The Breakpad processor library is able to process dumps produced on Mac OS X
systems running on x86, x86-64, and PowerPC processors, on Windows and Linux
systems running on x86 or x86-64 processors, and on Android systems running ARM
or x86 processors. The processor library itself is written in standard C++, and
should function properly in most Unix-like environments. It has been tested on
Linux and Mac OS X.

## Future Plans

There are currently no firm plans or timetables to implement any of the
following features, although they are possible avenues for future exploration.

The symbol file format can be extended to carry information about the locations
of parameters and local variables as stored in stack frames and registers, and
the processor can use this information to provide enhanced stack traces showing
function arguments and variable values.

On Mac OS X and Linux, we can provide tools to convert files from the minidump
format into the native core format. This will enable developers to open dump
files in a native debugger, just as they are presently able to do with minidumps
on Windows.