<!--*
# Document freshness: For more information, see go/fresh-source.
freshness: { owner: 'haberman' reviewed: '2023-02-24' }
*-->

# upb vs. C++ Protobuf Design

[upb](https://github.com/protocolbuffers/upb) is a small C protobuf library.
While some of the design follows in the footsteps of the C++ Protobuf Library,
upb departs from C++'s design in several key ways.  This document compares
and contrasts the two libraries on several design points.

## Design Goals

Before we begin, it is worth calling out that upb and C++ have different design
goals, and this motivates some of the differences we will see.

C++ protobuf is a user-level library: it is designed to be used directly by C++
applications.  These applications will expect a full-featured C++ API surface
that uses C++ idioms.  The C++ library is also willing to add features to
increase server performance, even if these features would add size or complexity
to the library.  Because C++ protobuf is a user-level library, API stability is
of utmost importance: breaking API changes are rare and carefully managed when
they do occur.  The focus on C++ also means that ABI compatibility with C is not
a priority.

upb, on the other hand, is designed primarily to be wrapped by other languages.
It is a C protobuf kernel that forms the basis on which a user-level protobuf
library can be built.  This means we prefer to keep the API surface as small and
orthogonal as possible.  While upb supports all protobuf features required for
full conformance, upb prioritizes simplicity and small code size, and avoids
adding features like lazy fields that can accelerate some use cases but at great
cost in terms of complexity.  As upb is not aimed directly at users, there is
much more freedom to make API-breaking changes when necessary, which helps the
core to stay small and simple.  We want to be compatible with all FFI
interfaces, so C ABI compatibility is a must.

Despite these differences, C++ protos and upb offer [roughly the same core set
of features](https://github.com/protocolbuffers/upb#features).

## Arenas

upb and C++ protos both offer arena allocation, but there are some key
differences.

### C++

As a matter of history, when C++ protos were open-sourced in 2008, they did not
support arenas.  Originally there was only unique ownership, whereby each
message uniquely owns all child messages and will free them when the parent is
freed.

Arena allocation was added as a feature in 2014 as a way of dramatically
reducing allocation and (especially) deallocation costs.  But the library was
not at liberty to remove the unique ownership model, because it would break far
too many users.  As a result, C++ has supported a **hybrid allocation model**
ever since, allowing users to allocate messages either directly from the
stack/heap or from an arena.  The library attempts to ensure that there are
no dangling pointers by performing automatic copies in some cases (for example
`a->set_allocated_b(b)`, where `a` and `b` are on different arenas).

C++'s arena object, `google::protobuf::Arena`, is **thread-safe** by
design, which allows users to allocate from multiple threads simultaneously
without external synchronization.  The user can supply an initial block of
memory to the arena, and can choose some parameters to control the arena block
size.  The user can also supply block alloc/dealloc functions, but the alloc
function is expected to always return some memory.  The C++ library in general
does not attempt to handle out-of-memory conditions.

### upb

upb uses **arena allocation exclusively**. All messages must be allocated from
an arena, and can only be freed by freeing the arena.  It is entirely the user's
responsibility to ensure that there are no dangling pointers: when a user sets a
message field, this will always trivially overwrite the pointer and will never
perform an implicit copy.

upb's `upb_Arena` is **thread-compatible**, which means it cannot be used
concurrently without synchronization. The arena can be seeded with an initial
block of memory, but it does not explicitly support any parameters for choosing
block size. It supports a custom alloc/dealloc function, and this function is
allowed to return `NULL` if no dynamic memory is available. This allows upb
arenas to have a max/fixed size, and makes it possible in theory to write code
that is tolerant to out-of-memory errors.
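The out-of-memory behavior described above can be illustrated with a small, self-contained sketch.  This is not upb's implementation or API: `FixedArena`, `FixedArena_Init`, and `FixedArena_Malloc` are hypothetical names, and the sketch assumes the caller's block is 8-byte aligned.  It only shows the shape of an arena that is seeded with a single block, never grows, and reports exhaustion by returning a null pointer instead of aborting.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical toy arena (not upb's API): bump-allocates out of one
// caller-supplied block and never asks the system for more memory.
struct FixedArena {
  char* ptr;  // Next free byte.
  char* end;  // One past the last usable byte.
};

// Assumes `block` is at least 8-byte aligned.
inline FixedArena FixedArena_Init(void* block, std::size_t size) {
  assert(block != nullptr);
  char* p = static_cast<char*>(block);
  return FixedArena{p, p + size};
}

// Returns nullptr when the block is exhausted, so the caller can recover
// from out-of-memory instead of crashing.
inline void* FixedArena_Malloc(FixedArena* a, std::size_t size) {
  const std::size_t kAlign = 8;
  std::size_t rounded = (size + kAlign - 1) & ~(kAlign - 1);
  if (rounded < size) return nullptr;  // Rounding up overflowed.
  if (static_cast<std::size_t>(a->end - a->ptr) < rounded) return nullptr;
  void* ret = a->ptr;
  a->ptr += rounded;
  return ret;
}
```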

upb's arena also supports a novel operation known as **fuse**, which joins two
arenas together into a single lifetime.  Though both arenas must still be freed
separately, none of the memory will actually be freed until *both* arenas have
been freed.  This is useful for avoiding dangling pointers when reparenting a
message onto a parent that may live on a different arena.
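The lifetime rule for fuse can be modeled in a few lines.  The following is a toy sketch under stated assumptions, not upb's implementation: `ToyArena`, `Group`, and the `ToyArena_*` functions are hypothetical names, the model is single-threaded, and fusing always succeeds.  Each arena belongs to a group, fusing merges two groups, and a group's memory is released only when its last arena is freed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <vector>

// Hypothetical toy model of arena "fuse"; none of these names are upb APIs.
struct Group {
  Group* parent = nullptr;       // Non-null once merged into another group.
  int live_arenas = 1;           // Arenas in this group not yet freed.
  std::vector<void*> blocks;     // Memory owned by the whole group.
  std::vector<Group*> absorbed;  // Groups that were merged into this one.
};

struct ToyArena {
  Group* group = new Group;
};

inline Group* FindRoot(Group* g) {
  while (g->parent != nullptr) g = g->parent;
  return g;
}

inline void* ToyArena_Malloc(ToyArena* a, std::size_t size) {
  void* p = std::malloc(size);
  if (p != nullptr) FindRoot(a->group)->blocks.push_back(p);
  return p;
}

// Joins the lifetimes of `a` and `b`: afterwards they share one group.
inline void ToyArena_Fuse(ToyArena* a, ToyArena* b) {
  assert(a != nullptr && b != nullptr);
  Group* ra = FindRoot(a->group);
  Group* rb = FindRoot(b->group);
  if (ra == rb) return;  // Already fused.
  ra->live_arenas += rb->live_arenas;
  ra->blocks.insert(ra->blocks.end(), rb->blocks.begin(), rb->blocks.end());
  ra->absorbed.insert(ra->absorbed.end(), rb->absorbed.begin(),
                      rb->absorbed.end());
  ra->absorbed.push_back(rb);
  rb->blocks.clear();
  rb->absorbed.clear();
  rb->parent = ra;
}

// Frees the arena handle.  Returns true only if this call actually released
// the group's memory, i.e. it was the last live arena in the group.
inline bool ToyArena_Free(ToyArena* a) {
  Group* root = FindRoot(a->group);
  delete a;
  if (--root->live_arenas > 0) return false;
  for (void* p : root->blocks) std::free(p);
  for (Group* g : root->absorbed) delete g;
  delete root;
  return true;
}
```

In this model, freeing `b` after `ToyArena_Fuse(a, b)` releases nothing: memory allocated from either arena stays valid until `a` has been freed as well.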

### Comparison

**hybrid allocation vs. arena-only**

* The C++ hybrid allocation model introduces a great deal of complexity and
  unpredictability into the library.  upb benefits from having a much simpler
  and more predictable design.
* Some of the complexity in C++'s hybrid model arises because arenas were
  retrofitted onto an existing design.  Designing for a hybrid model from the
  outset would likely yield a simpler result.
* Unique ownership does support some usage patterns that arenas cannot directly
  accommodate.  For example, you can reparent a message and the child will precisely
  follow the lifetime of its new parent.  An arena would require you to either
  perform a deep copy or extend the lifetime.

**thread-compatible vs. thread-safe arena**

* A thread-safe arena (as in C++) is safer and easier to use.  A thread-compatible
  arena requires that the user prove that the arena cannot be used concurrently.
* [Thread Sanitizer](https://github.com/google/sanitizers/wiki/ThreadSanitizerCppManual)
  is far more accessible than it was in 2014 (when C++ introduced a thread-safe
  arena).  We now have more tools at our disposal to ensure that we do not trigger
  data races in a thread-compatible arena like upb's.
* Thread-compatible arenas are more performant, since allocation does not pay
  for any synchronization.
* Thread-compatible arenas have a far simpler implementation.  The C++ thread-safe
  arena relies on thread-local variables, which introduce complications on some
  platforms.  It also requires far more subtle reasoning for correctness and
  performance.

**fuse vs. no fuse**

* The `upb_Arena_Fuse()` operation is a key part of how upb supports reparenting
  of messages when the parent may be on a different arena.  Without this, upb has
  no way of supporting `foo.bar = bar` in dynamic languages without performing a
  deep copy.
* A downside of `upb_Arena_Fuse()` is that passing an arena to a function can allow
  that function to extend the lifetime of the arena in potentially
  unpredictable ways.  This can be prevented if necessary, as fuse can fail, e.g. if
  one arena has an initial block.  But this adds some complexity by requiring callers
  to handle the case where fuse fails.

## Code Generation vs. Tables

The C++ protobuf library has always been built around code generation, while upb
generates only tables.  In other words, `foo.pb.cc` files contain functions,
whereas `foo.upb.c` files emit only data structures.

### C++

C++ generated code emits a large number of functions into `foo.pb.cc` files.
An incomplete list:

* `FooMsg::FooMsg()` (constructor): initializes all fields to their default value.
* `FooMsg::~FooMsg()` (destructor): frees any present child messages.
* `FooMsg::Clear()`: clears all fields back to their default/empty value.
* `FooMsg::_InternalParse()`: generated code for parsing a message.
* `FooMsg::_InternalSerialize()`: generated code for serializing a message.
* `FooMsg::ByteSizeLong()`: calculates serialized size, as a first pass before serializing.
* `FooMsg::MergeFrom()`: copies/appends present fields from another message.
* `FooMsg::IsInitialized()`: checks whether required fields are set.

This code lives in the `.text` section and contains function calls to the generated
classes for child messages.

### upb

upb does not generate any code into `foo.upb.c` files, only data structures.  upb uses a
compact data table known as a *mini table* to represent the schema and all fields.
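To give a feel for the table-driven approach, here is a heavily simplified, self-contained sketch.  It does not reflect upb's real mini table layout or API: `ToyField`, `ToyTable`, `ToyDecode`, and the `Point` message are hypothetical, and the decoder handles only varint-encoded `uint32` fields.  The point is that the per-message output is pure data, while a single generic routine does the work for every message type.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical, simplified stand-in for a mini table.
struct ToyField {
  uint32_t number;     // Field number from the .proto file.
  std::size_t offset;  // Byte offset of the field within the message struct.
};

struct ToyTable {
  const ToyField* fields;
  std::size_t field_count;
  std::size_t message_size;
};

// What a "tables only" generated file might contain for:
//   message Point { uint32 x = 1; uint32 y = 2; }
struct Point {
  uint32_t x;
  uint32_t y;
};
static const ToyField kPointFields[] = {
    {1, offsetof(Point, x)},
    {2, offsetof(Point, y)},
};
static const ToyTable kPointTable = {kPointFields, 2, sizeof(Point)};

// Reads one varint; returns nullptr on truncated or overlong input.
inline const uint8_t* ReadVarint(const uint8_t* p, const uint8_t* end,
                                 uint64_t* out) {
  uint64_t result = 0;
  for (int shift = 0; p < end && shift < 64; shift += 7) {
    uint8_t byte = *p++;
    result |= static_cast<uint64_t>(byte & 0x7f) << shift;
    if ((byte & 0x80) == 0) {
      *out = result;
      return p;
    }
  }
  return nullptr;
}

// One generic parser serves every message type: the schema is the table.
// For brevity, unknown fields and non-varint wire types are treated as errors.
inline bool ToyDecode(const uint8_t* buf, std::size_t size, void* msg,
                      const ToyTable* table) {
  assert(msg != nullptr && table != nullptr);
  std::memset(msg, 0, table->message_size);
  const uint8_t* p = buf;
  const uint8_t* end = buf + size;
  while (p < end) {
    uint64_t tag = 0;
    uint64_t value = 0;
    p = ReadVarint(p, end, &tag);
    if (p == nullptr || (tag & 7) != 0) return false;  // Wire type 0 only.
    p = ReadVarint(p, end, &value);
    if (p == nullptr) return false;
    const ToyField* field = nullptr;
    for (std::size_t i = 0; i < table->field_count; i++) {
      if (table->fields[i].number == (tag >> 3)) field = &table->fields[i];
    }
    if (field == nullptr) return false;
    uint32_t v = static_cast<uint32_t>(value);
    std::memcpy(static_cast<char*>(msg) + field->offset, &v, sizeof(v));
  }
  return true;
}
```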

upb uses mini tables to perform all of the operations that would traditionally be done
with generated code.  Revisiting the list from the previous section:

* `FooMsg::FooMsg()` (constructor): upb instead initializes all messages with `memset(msg, 0, size)`.
  Non-zero defaults are injected in the accessors.
* `FooMsg::~FooMsg()` (destructor): upb messages are freed by freeing the arena.
* `FooMsg::Clear()`: can be performed with `memset(msg, 0, size)`.
* `FooMsg::_InternalParse()`: upb's parser uses mini tables as data, instead of generating code.
* `FooMsg::_InternalSerialize()`: upb's serializer also uses mini tables instead of generated code.
* `FooMsg::ByteSizeLong()`: upb performs serialization in reverse so that an initial pass is not required.
* `FooMsg::MergeFrom()`: upb supports this via serialize+parse from the other message.
* `FooMsg::IsInitialized()`: upb's encoder and decoder have special flags to check for required fields.
  A util library `upb/util/required_fields.h` handles the corner cases.
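The reverse-serialization point in the list above is the least obvious, so here is a minimal sketch of the idea.  It is not upb's encoder: `ReverseWriter`, `Inner`, `Outer`, and the `Encode*` functions are hypothetical, and the buffer is a fixed 64 bytes.  Because output is written from the end of the buffer toward the front, a submessage's payload is already in place when its length prefix is written, so no separate size-calculation pass is needed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical toy encoder that fills a fixed buffer back to front.
struct ReverseWriter {
  uint8_t buf[64];
  uint8_t* ptr = buf + sizeof(buf);  // Output is [ptr, buf + sizeof(buf)).

  std::size_t size() const {
    return static_cast<std::size_t>(buf + sizeof(buf) - ptr);
  }

  // Prepends `v` as a varint in front of everything written so far.
  void PrependVarint(uint64_t v) {
    uint8_t tmp[10];
    std::size_t n = 0;
    do {
      uint8_t byte = v & 0x7f;
      v >>= 7;
      if (v != 0) byte |= 0x80;
      tmp[n++] = byte;
    } while (v != 0);
    assert(static_cast<std::size_t>(ptr - buf) >= n);  // Toy capacity check.
    ptr -= n;
    std::memcpy(ptr, tmp, n);
  }
};

// message Inner { uint32 a = 1; }
// message Outer { Inner inner = 1; }
struct Inner {
  uint32_t a;
};
struct Outer {
  Inner inner;
};

inline void EncodeInner(const Inner& msg, ReverseWriter* w) {
  w->PrependVarint(msg.a);         // The value goes in first...
  w->PrependVarint((1 << 3) | 0);  // ...then the tag that precedes it.
}

inline void EncodeOuter(const Outer& msg, ReverseWriter* w) {
  std::size_t before = w->size();
  EncodeInner(msg.inner, w);             // Payload is written first,
  w->PrependVarint(w->size() - before);  // so its length is already known.
  w->PrependVarint((1 << 3) | 2);        // Tag: field 1, length-delimited.
}
```

For `inner.a = 150`, the bytes from `ptr` onward are `0a 03 08 96 01`, the same as a forward encoder would produce.  A fuller encoder built this way would presumably also visit the fields of each message last-to-first, so that the finished output still appears in declaration order.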

### Comparison

If we compare compiled code size, upb is far smaller.  Here is a comparison of the code
size of a trivial binary that does nothing but a parse and serialize of `descriptor.proto`.
This means we are seeing both the overhead of the core library itself as well as the
generated code (or table) for `descriptor.proto`.  (For extra clarity we should break this
down by generated code vs. core library in the future.)

| Library         | `.text` | `.data` | `.bss` |
|-----------------|---------|---------|--------|
| upb             |  26Ki   | 0.6Ki   | 0.01Ki |
| C++ (lite)      | 187Ki   | 2.8Ki   | 1.25Ki |
| C++ (code size) | 904Ki   | 6.1Ki   | 1.88Ki |
| C++ (full)      | 983Ki   | 6.1Ki   | 1.88Ki |

"C++ (code size)" refers to protos compiled with `optimize_for = CODE_SIZE`, a mode
in which the generated classes are minimal and rely on shared, reflection-based
code for operations such as parsing and serialization, in an attempt to make the
generated code smaller.  Because of this, it requires the full runtime instead
of the lite runtime.

## Bifurcated vs. Optional Reflection

upb and C++ protos both offer reflection without making it mandatory.  However
the models for enabling/disabling reflection are very different.

### C++

C++ messages offer full reflection by default.  Messages in C++ generally
derive from `Message`, and the base class provides a member function
`const Reflection* Message::GetReflection() const` which returns the reflection object.

It follows that any message deriving from `Message` will always have reflection
linked into the binary, whether or not the reflection object is ever used.
Because `GetReflection()` is a function on the base class, it is not possible
to statically determine if a given message's reflection is used:

```c++
#include "google/protobuf/message.h"

const google::protobuf::Reflection* GetReflection(
    const google::protobuf::Message& message) {
    // Can refer to any message in the whole binary.
    return message.GetReflection();
}
```

The C++ library does provide a way of omitting reflection: `MessageLite`.  We can
cause a message to be lite in two different ways:

* `optimize_for = LITE_RUNTIME` in a `.proto` file will cause all messages in that
  file to be lite.
* `lite` as a codegen param: this will force all messages to lite, even if the
  `.proto` file does not have `optimize_for = LITE_RUNTIME`.

A lite message will derive from `MessageLite` instead of `Message`.  Since
`MessageLite` has no `GetReflection()` function, this means no reflection is
available, so we can avoid taking the code size hit.

### upb

upb does not have the `Message` vs. `MessageLite` bifurcation.  There is only one
kind of message type, `upb_Message`, which means there is no need to configure in
a `.proto` file which messages will need reflection and which will not.
Every message has the *option* to link in reflection from a separate `foo.upbdefs.o`
file, without needing to change the message itself in any way.

upb does not provide the equivalent of `Message::GetReflection()`: there is no
facility for retrieving the reflection of a message whose type is not known statically.
It would be possible to layer such a facility on top of the upb core, though this
would probably require some kind of code generation.

### Comparison

* Most messages in C++ will not bother to declare themselves as "lite".  This means
  that many C++ messages will link in reflection even when it is never used, bloating
  binaries unnecessarily.
* `optimize_for = LITE_RUNTIME` is difficult to use in practice, because it prevents
  any non-lite protos from `import`ing that file.
* Forcing all protos to lite via a codegen parameter (for example, when building for
  mobile) is more practical than `optimize_for = LITE_RUNTIME`.  But this will break
  the compile for any code that tries to upcast to `Message`, or tries to use a
  non-lite method.
* The one major advantage of the C++ model is that it can support `msg.DebugString()`
  on a type-erased proto.  For upb you have to explicitly pass the `upb_MessageDef*`
  separately if you want to perform an operation like printing a proto to text format.
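The trade-off in the last bullet can be sketched with a toy model.  Nothing here is either library's real API: `ToyDef`, `ToyMessage`, `FooMsg`, `PlainFoo`, and `DebugName` are hypothetical names chosen for illustration.  The first model reaches schema information through a virtual call on a common base class; the second keeps the message as plain data and has the caller pass the schema explicitly.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for a descriptor / message def.
struct ToyDef {
  const char* full_name;
};

// Model 1 (C++-like): every message can reach its own reflection data through
// a virtual call, so type-erased code needs nothing but the message.
struct ToyMessage {
  virtual ~ToyMessage() = default;
  virtual const ToyDef* GetDef() const = 0;
};

struct FooMsg : ToyMessage {
  const ToyDef* GetDef() const override {
    static const ToyDef def = {"pkg.Foo"};
    return &def;
  }
};

inline std::string DebugName(const ToyMessage& msg) {
  return msg.GetDef()->full_name;
}

// Model 2 (upb-like): the message is plain data with no pointer back to its
// schema, so the caller must supply the def explicitly.
struct PlainFoo {
  int value;
};

static const ToyDef kPlainFooDef = {"pkg.Foo"};

inline std::string DebugName(const void* msg, const ToyDef* def) {
  assert(msg != nullptr && def != nullptr);
  return def->full_name;
}
```

In the first model the def is reachable from every `FooMsg` through its vtable, so it has to stay in the binary whether or not anyone asks for it.  In the second, `kPlainFooDef` is only needed by code that names it, at the cost of threading the def through every type-erased call site.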

## Explicit Registration vs. Globals

TODO