xref: /aosp_15_r20/external/leveldb/doc/index.md (revision 9507f98c5f32dee4b5f9e4a38cd499f3ff5c4490)
1*9507f98cSAndroid Build Coastguard Workerleveldb
2*9507f98cSAndroid Build Coastguard Worker=======
3*9507f98cSAndroid Build Coastguard Worker
4*9507f98cSAndroid Build Coastguard Worker_Jeff Dean, Sanjay Ghemawat_
5*9507f98cSAndroid Build Coastguard Worker
6*9507f98cSAndroid Build Coastguard WorkerThe leveldb library provides a persistent key value store. Keys and values are
7*9507f98cSAndroid Build Coastguard Workerarbitrary byte arrays.  The keys are ordered within the key value store
8*9507f98cSAndroid Build Coastguard Workeraccording to a user-specified comparator function.
9*9507f98cSAndroid Build Coastguard Worker
10*9507f98cSAndroid Build Coastguard Worker## Opening A Database
11*9507f98cSAndroid Build Coastguard Worker
12*9507f98cSAndroid Build Coastguard WorkerA leveldb database has a name which corresponds to a file system directory. All
13*9507f98cSAndroid Build Coastguard Workerof the contents of database are stored in this directory. The following example
14*9507f98cSAndroid Build Coastguard Workershows how to open a database, creating it if necessary:
15*9507f98cSAndroid Build Coastguard Worker
16*9507f98cSAndroid Build Coastguard Worker```c++
17*9507f98cSAndroid Build Coastguard Worker#include <cassert>
18*9507f98cSAndroid Build Coastguard Worker#include "leveldb/db.h"
19*9507f98cSAndroid Build Coastguard Worker
20*9507f98cSAndroid Build Coastguard Workerleveldb::DB* db;
21*9507f98cSAndroid Build Coastguard Workerleveldb::Options options;
22*9507f98cSAndroid Build Coastguard Workeroptions.create_if_missing = true;
23*9507f98cSAndroid Build Coastguard Workerleveldb::Status status = leveldb::DB::Open(options, "/tmp/testdb", &db);
24*9507f98cSAndroid Build Coastguard Workerassert(status.ok());
25*9507f98cSAndroid Build Coastguard Worker...
26*9507f98cSAndroid Build Coastguard Worker```
27*9507f98cSAndroid Build Coastguard Worker
28*9507f98cSAndroid Build Coastguard WorkerIf you want to raise an error if the database already exists, add the following
29*9507f98cSAndroid Build Coastguard Workerline before the `leveldb::DB::Open` call:
30*9507f98cSAndroid Build Coastguard Worker
31*9507f98cSAndroid Build Coastguard Worker```c++
32*9507f98cSAndroid Build Coastguard Workeroptions.error_if_exists = true;
33*9507f98cSAndroid Build Coastguard Worker```
34*9507f98cSAndroid Build Coastguard Worker
35*9507f98cSAndroid Build Coastguard Worker## Status
36*9507f98cSAndroid Build Coastguard Worker
37*9507f98cSAndroid Build Coastguard WorkerYou may have noticed the `leveldb::Status` type above. Values of this type are
38*9507f98cSAndroid Build Coastguard Workerreturned by most functions in leveldb that may encounter an error. You can check
39*9507f98cSAndroid Build Coastguard Workerif such a result is ok, and also print an associated error message:
40*9507f98cSAndroid Build Coastguard Worker
41*9507f98cSAndroid Build Coastguard Worker```c++
42*9507f98cSAndroid Build Coastguard Workerleveldb::Status s = ...;
43*9507f98cSAndroid Build Coastguard Workerif (!s.ok()) cerr << s.ToString() << endl;
44*9507f98cSAndroid Build Coastguard Worker```
45*9507f98cSAndroid Build Coastguard Worker
46*9507f98cSAndroid Build Coastguard Worker## Closing A Database
47*9507f98cSAndroid Build Coastguard Worker
48*9507f98cSAndroid Build Coastguard WorkerWhen you are done with a database, just delete the database object. Example:
49*9507f98cSAndroid Build Coastguard Worker
50*9507f98cSAndroid Build Coastguard Worker```c++
51*9507f98cSAndroid Build Coastguard Worker... open the db as described above ...
52*9507f98cSAndroid Build Coastguard Worker... do something with db ...
53*9507f98cSAndroid Build Coastguard Workerdelete db;
54*9507f98cSAndroid Build Coastguard Worker```
55*9507f98cSAndroid Build Coastguard Worker
56*9507f98cSAndroid Build Coastguard Worker## Reads And Writes
57*9507f98cSAndroid Build Coastguard Worker
58*9507f98cSAndroid Build Coastguard WorkerThe database provides Put, Delete, and Get methods to modify/query the database.
59*9507f98cSAndroid Build Coastguard WorkerFor example, the following code moves the value stored under key1 to key2.
60*9507f98cSAndroid Build Coastguard Worker
61*9507f98cSAndroid Build Coastguard Worker```c++
62*9507f98cSAndroid Build Coastguard Workerstd::string value;
63*9507f98cSAndroid Build Coastguard Workerleveldb::Status s = db->Get(leveldb::ReadOptions(), key1, &value);
64*9507f98cSAndroid Build Coastguard Workerif (s.ok()) s = db->Put(leveldb::WriteOptions(), key2, value);
65*9507f98cSAndroid Build Coastguard Workerif (s.ok()) s = db->Delete(leveldb::WriteOptions(), key1);
66*9507f98cSAndroid Build Coastguard Worker```
67*9507f98cSAndroid Build Coastguard Worker
68*9507f98cSAndroid Build Coastguard Worker## Atomic Updates
69*9507f98cSAndroid Build Coastguard Worker
70*9507f98cSAndroid Build Coastguard WorkerNote that if the process dies after the Put of key2 but before the delete of
71*9507f98cSAndroid Build Coastguard Workerkey1, the same value may be left stored under multiple keys. Such problems can
72*9507f98cSAndroid Build Coastguard Workerbe avoided by using the `WriteBatch` class to atomically apply a set of updates:
73*9507f98cSAndroid Build Coastguard Worker
74*9507f98cSAndroid Build Coastguard Worker```c++
75*9507f98cSAndroid Build Coastguard Worker#include "leveldb/write_batch.h"
76*9507f98cSAndroid Build Coastguard Worker...
77*9507f98cSAndroid Build Coastguard Workerstd::string value;
78*9507f98cSAndroid Build Coastguard Workerleveldb::Status s = db->Get(leveldb::ReadOptions(), key1, &value);
79*9507f98cSAndroid Build Coastguard Workerif (s.ok()) {
80*9507f98cSAndroid Build Coastguard Worker  leveldb::WriteBatch batch;
81*9507f98cSAndroid Build Coastguard Worker  batch.Delete(key1);
82*9507f98cSAndroid Build Coastguard Worker  batch.Put(key2, value);
83*9507f98cSAndroid Build Coastguard Worker  s = db->Write(leveldb::WriteOptions(), &batch);
84*9507f98cSAndroid Build Coastguard Worker}
85*9507f98cSAndroid Build Coastguard Worker```
86*9507f98cSAndroid Build Coastguard Worker
87*9507f98cSAndroid Build Coastguard WorkerThe `WriteBatch` holds a sequence of edits to be made to the database, and these
88*9507f98cSAndroid Build Coastguard Workeredits within the batch are applied in order. Note that we called Delete before
89*9507f98cSAndroid Build Coastguard WorkerPut so that if key1 is identical to key2, we do not end up erroneously dropping
90*9507f98cSAndroid Build Coastguard Workerthe value entirely.
91*9507f98cSAndroid Build Coastguard Worker
92*9507f98cSAndroid Build Coastguard WorkerApart from its atomicity benefits, `WriteBatch` may also be used to speed up
93*9507f98cSAndroid Build Coastguard Workerbulk updates by placing lots of individual mutations into the same batch.
94*9507f98cSAndroid Build Coastguard Worker
95*9507f98cSAndroid Build Coastguard Worker## Synchronous Writes
96*9507f98cSAndroid Build Coastguard Worker
97*9507f98cSAndroid Build Coastguard WorkerBy default, each write to leveldb is asynchronous: it returns after pushing the
98*9507f98cSAndroid Build Coastguard Workerwrite from the process into the operating system. The transfer from operating
99*9507f98cSAndroid Build Coastguard Workersystem memory to the underlying persistent storage happens asynchronously. The
100*9507f98cSAndroid Build Coastguard Workersync flag can be turned on for a particular write to make the write operation
101*9507f98cSAndroid Build Coastguard Workernot return until the data being written has been pushed all the way to
102*9507f98cSAndroid Build Coastguard Workerpersistent storage. (On Posix systems, this is implemented by calling either
103*9507f98cSAndroid Build Coastguard Worker`fsync(...)` or `fdatasync(...)` or `msync(..., MS_SYNC)` before the write
104*9507f98cSAndroid Build Coastguard Workeroperation returns.)
105*9507f98cSAndroid Build Coastguard Worker
106*9507f98cSAndroid Build Coastguard Worker```c++
107*9507f98cSAndroid Build Coastguard Workerleveldb::WriteOptions write_options;
108*9507f98cSAndroid Build Coastguard Workerwrite_options.sync = true;
109*9507f98cSAndroid Build Coastguard Workerdb->Put(write_options, ...);
110*9507f98cSAndroid Build Coastguard Worker```
111*9507f98cSAndroid Build Coastguard Worker
112*9507f98cSAndroid Build Coastguard WorkerAsynchronous writes are often more than a thousand times as fast as synchronous
113*9507f98cSAndroid Build Coastguard Workerwrites. The downside of asynchronous writes is that a crash of the machine may
114*9507f98cSAndroid Build Coastguard Workercause the last few updates to be lost. Note that a crash of just the writing
115*9507f98cSAndroid Build Coastguard Workerprocess (i.e., not a reboot) will not cause any loss since even when sync is
116*9507f98cSAndroid Build Coastguard Workerfalse, an update is pushed from the process memory into the operating system
117*9507f98cSAndroid Build Coastguard Workerbefore it is considered done.
118*9507f98cSAndroid Build Coastguard Worker
119*9507f98cSAndroid Build Coastguard WorkerAsynchronous writes can often be used safely. For example, when loading a large
120*9507f98cSAndroid Build Coastguard Workeramount of data into the database you can handle lost updates by restarting the
121*9507f98cSAndroid Build Coastguard Workerbulk load after a crash. A hybrid scheme is also possible where every Nth write
122*9507f98cSAndroid Build Coastguard Workeris synchronous, and in the event of a crash, the bulk load is restarted just
123*9507f98cSAndroid Build Coastguard Workerafter the last synchronous write finished by the previous run. (The synchronous
124*9507f98cSAndroid Build Coastguard Workerwrite can update a marker that describes where to restart on a crash.)
125*9507f98cSAndroid Build Coastguard Worker
126*9507f98cSAndroid Build Coastguard Worker`WriteBatch` provides an alternative to asynchronous writes. Multiple updates
127*9507f98cSAndroid Build Coastguard Workermay be placed in the same WriteBatch and applied together using a synchronous
128*9507f98cSAndroid Build Coastguard Workerwrite (i.e., `write_options.sync` is set to true). The extra cost of the
129*9507f98cSAndroid Build Coastguard Workersynchronous write will be amortized across all of the writes in the batch.
130*9507f98cSAndroid Build Coastguard Worker
131*9507f98cSAndroid Build Coastguard Worker## Concurrency
132*9507f98cSAndroid Build Coastguard Worker
133*9507f98cSAndroid Build Coastguard WorkerA database may only be opened by one process at a time. The leveldb
134*9507f98cSAndroid Build Coastguard Workerimplementation acquires a lock from the operating system to prevent misuse.
135*9507f98cSAndroid Build Coastguard WorkerWithin a single process, the same `leveldb::DB` object may be safely shared by
136*9507f98cSAndroid Build Coastguard Workermultiple concurrent threads. I.e., different threads may write into or fetch
137*9507f98cSAndroid Build Coastguard Workeriterators or call Get on the same database without any external synchronization
138*9507f98cSAndroid Build Coastguard Worker(the leveldb implementation will automatically do the required synchronization).
139*9507f98cSAndroid Build Coastguard WorkerHowever other objects (like Iterator and `WriteBatch`) may require external
140*9507f98cSAndroid Build Coastguard Workersynchronization. If two threads share such an object, they must protect access
141*9507f98cSAndroid Build Coastguard Workerto it using their own locking protocol. More details are available in the public
142*9507f98cSAndroid Build Coastguard Workerheader files.
143*9507f98cSAndroid Build Coastguard Worker
144*9507f98cSAndroid Build Coastguard Worker## Iteration
145*9507f98cSAndroid Build Coastguard Worker
146*9507f98cSAndroid Build Coastguard WorkerThe following example demonstrates how to print all key,value pairs in a
147*9507f98cSAndroid Build Coastguard Workerdatabase.
148*9507f98cSAndroid Build Coastguard Worker
149*9507f98cSAndroid Build Coastguard Worker```c++
150*9507f98cSAndroid Build Coastguard Workerleveldb::Iterator* it = db->NewIterator(leveldb::ReadOptions());
151*9507f98cSAndroid Build Coastguard Workerfor (it->SeekToFirst(); it->Valid(); it->Next()) {
152*9507f98cSAndroid Build Coastguard Worker  cout << it->key().ToString() << ": "  << it->value().ToString() << endl;
153*9507f98cSAndroid Build Coastguard Worker}
154*9507f98cSAndroid Build Coastguard Workerassert(it->status().ok());  // Check for any errors found during the scan
155*9507f98cSAndroid Build Coastguard Workerdelete it;
156*9507f98cSAndroid Build Coastguard Worker```
157*9507f98cSAndroid Build Coastguard Worker
158*9507f98cSAndroid Build Coastguard WorkerThe following variation shows how to process just the keys in the range
159*9507f98cSAndroid Build Coastguard Worker[start,limit):
160*9507f98cSAndroid Build Coastguard Worker
161*9507f98cSAndroid Build Coastguard Worker```c++
162*9507f98cSAndroid Build Coastguard Workerfor (it->Seek(start);
163*9507f98cSAndroid Build Coastguard Worker   it->Valid() && it->key().ToString() < limit;
164*9507f98cSAndroid Build Coastguard Worker   it->Next()) {
165*9507f98cSAndroid Build Coastguard Worker  ...
166*9507f98cSAndroid Build Coastguard Worker}
167*9507f98cSAndroid Build Coastguard Worker```
168*9507f98cSAndroid Build Coastguard Worker
169*9507f98cSAndroid Build Coastguard WorkerYou can also process entries in reverse order. (Caveat: reverse iteration may be
170*9507f98cSAndroid Build Coastguard Workersomewhat slower than forward iteration.)
171*9507f98cSAndroid Build Coastguard Worker
172*9507f98cSAndroid Build Coastguard Worker```c++
173*9507f98cSAndroid Build Coastguard Workerfor (it->SeekToLast(); it->Valid(); it->Prev()) {
174*9507f98cSAndroid Build Coastguard Worker  ...
175*9507f98cSAndroid Build Coastguard Worker}
176*9507f98cSAndroid Build Coastguard Worker```
177*9507f98cSAndroid Build Coastguard Worker
178*9507f98cSAndroid Build Coastguard Worker## Snapshots
179*9507f98cSAndroid Build Coastguard Worker
180*9507f98cSAndroid Build Coastguard WorkerSnapshots provide consistent read-only views over the entire state of the
181*9507f98cSAndroid Build Coastguard Workerkey-value store.  `ReadOptions::snapshot` may be non-NULL to indicate that a
182*9507f98cSAndroid Build Coastguard Workerread should operate on a particular version of the DB state. If
183*9507f98cSAndroid Build Coastguard Worker`ReadOptions::snapshot` is NULL, the read will operate on an implicit snapshot
184*9507f98cSAndroid Build Coastguard Workerof the current state.
185*9507f98cSAndroid Build Coastguard Worker
186*9507f98cSAndroid Build Coastguard WorkerSnapshots are created by the `DB::GetSnapshot()` method:
187*9507f98cSAndroid Build Coastguard Worker
188*9507f98cSAndroid Build Coastguard Worker```c++
189*9507f98cSAndroid Build Coastguard Workerleveldb::ReadOptions options;
190*9507f98cSAndroid Build Coastguard Workeroptions.snapshot = db->GetSnapshot();
191*9507f98cSAndroid Build Coastguard Worker... apply some updates to db ...
192*9507f98cSAndroid Build Coastguard Workerleveldb::Iterator* iter = db->NewIterator(options);
193*9507f98cSAndroid Build Coastguard Worker... read using iter to view the state when the snapshot was created ...
194*9507f98cSAndroid Build Coastguard Workerdelete iter;
195*9507f98cSAndroid Build Coastguard Workerdb->ReleaseSnapshot(options.snapshot);
196*9507f98cSAndroid Build Coastguard Worker```
197*9507f98cSAndroid Build Coastguard Worker
198*9507f98cSAndroid Build Coastguard WorkerNote that when a snapshot is no longer needed, it should be released using the
199*9507f98cSAndroid Build Coastguard Worker`DB::ReleaseSnapshot` interface. This allows the implementation to get rid of
200*9507f98cSAndroid Build Coastguard Workerstate that was being maintained just to support reading as of that snapshot.
201*9507f98cSAndroid Build Coastguard Worker
202*9507f98cSAndroid Build Coastguard Worker## Slice
203*9507f98cSAndroid Build Coastguard Worker
204*9507f98cSAndroid Build Coastguard WorkerThe return value of the `it->key()` and `it->value()` calls above are instances
205*9507f98cSAndroid Build Coastguard Workerof the `leveldb::Slice` type. Slice is a simple structure that contains a length
206*9507f98cSAndroid Build Coastguard Workerand a pointer to an external byte array. Returning a Slice is a cheaper
207*9507f98cSAndroid Build Coastguard Workeralternative to returning a `std::string` since we do not need to copy
208*9507f98cSAndroid Build Coastguard Workerpotentially large keys and values. In addition, leveldb methods do not return
209*9507f98cSAndroid Build Coastguard Workernull-terminated C-style strings since leveldb keys and values are allowed to
210*9507f98cSAndroid Build Coastguard Workercontain `'\0'` bytes.
211*9507f98cSAndroid Build Coastguard Worker
212*9507f98cSAndroid Build Coastguard WorkerC++ strings and null-terminated C-style strings can be easily converted to a
213*9507f98cSAndroid Build Coastguard WorkerSlice:
214*9507f98cSAndroid Build Coastguard Worker
215*9507f98cSAndroid Build Coastguard Worker```c++
216*9507f98cSAndroid Build Coastguard Workerleveldb::Slice s1 = "hello";
217*9507f98cSAndroid Build Coastguard Worker
218*9507f98cSAndroid Build Coastguard Workerstd::string str("world");
219*9507f98cSAndroid Build Coastguard Workerleveldb::Slice s2 = str;
220*9507f98cSAndroid Build Coastguard Worker```
221*9507f98cSAndroid Build Coastguard Worker
222*9507f98cSAndroid Build Coastguard WorkerA Slice can be easily converted back to a C++ string:
223*9507f98cSAndroid Build Coastguard Worker
224*9507f98cSAndroid Build Coastguard Worker```c++
225*9507f98cSAndroid Build Coastguard Workerstd::string str = s1.ToString();
226*9507f98cSAndroid Build Coastguard Workerassert(str == std::string("hello"));
227*9507f98cSAndroid Build Coastguard Worker```
228*9507f98cSAndroid Build Coastguard Worker
229*9507f98cSAndroid Build Coastguard WorkerBe careful when using Slices since it is up to the caller to ensure that the
230*9507f98cSAndroid Build Coastguard Workerexternal byte array into which the Slice points remains live while the Slice is
231*9507f98cSAndroid Build Coastguard Workerin use. For example, the following is buggy:
232*9507f98cSAndroid Build Coastguard Worker
233*9507f98cSAndroid Build Coastguard Worker```c++
234*9507f98cSAndroid Build Coastguard Workerleveldb::Slice slice;
235*9507f98cSAndroid Build Coastguard Workerif (...) {
236*9507f98cSAndroid Build Coastguard Worker  std::string str = ...;
237*9507f98cSAndroid Build Coastguard Worker  slice = str;
238*9507f98cSAndroid Build Coastguard Worker}
239*9507f98cSAndroid Build Coastguard WorkerUse(slice);
240*9507f98cSAndroid Build Coastguard Worker```
241*9507f98cSAndroid Build Coastguard Worker
242*9507f98cSAndroid Build Coastguard WorkerWhen the if statement goes out of scope, str will be destroyed and the backing
243*9507f98cSAndroid Build Coastguard Workerstorage for slice will disappear.
244*9507f98cSAndroid Build Coastguard Worker
245*9507f98cSAndroid Build Coastguard Worker## Comparators
246*9507f98cSAndroid Build Coastguard Worker
247*9507f98cSAndroid Build Coastguard WorkerThe preceding examples used the default ordering function for key, which orders
248*9507f98cSAndroid Build Coastguard Workerbytes lexicographically. You can however supply a custom comparator when opening
249*9507f98cSAndroid Build Coastguard Workera database.  For example, suppose each database key consists of two numbers and
250*9507f98cSAndroid Build Coastguard Workerwe should sort by the first number, breaking ties by the second number. First,
251*9507f98cSAndroid Build Coastguard Workerdefine a proper subclass of `leveldb::Comparator` that expresses these rules:
252*9507f98cSAndroid Build Coastguard Worker
253*9507f98cSAndroid Build Coastguard Worker```c++
254*9507f98cSAndroid Build Coastguard Workerclass TwoPartComparator : public leveldb::Comparator {
255*9507f98cSAndroid Build Coastguard Worker public:
256*9507f98cSAndroid Build Coastguard Worker  // Three-way comparison function:
257*9507f98cSAndroid Build Coastguard Worker  //   if a < b: negative result
258*9507f98cSAndroid Build Coastguard Worker  //   if a > b: positive result
259*9507f98cSAndroid Build Coastguard Worker  //   else: zero result
260*9507f98cSAndroid Build Coastguard Worker  int Compare(const leveldb::Slice& a, const leveldb::Slice& b) const {
261*9507f98cSAndroid Build Coastguard Worker    int a1, a2, b1, b2;
262*9507f98cSAndroid Build Coastguard Worker    ParseKey(a, &a1, &a2);
263*9507f98cSAndroid Build Coastguard Worker    ParseKey(b, &b1, &b2);
264*9507f98cSAndroid Build Coastguard Worker    if (a1 < b1) return -1;
265*9507f98cSAndroid Build Coastguard Worker    if (a1 > b1) return +1;
266*9507f98cSAndroid Build Coastguard Worker    if (a2 < b2) return -1;
267*9507f98cSAndroid Build Coastguard Worker    if (a2 > b2) return +1;
268*9507f98cSAndroid Build Coastguard Worker    return 0;
269*9507f98cSAndroid Build Coastguard Worker  }
270*9507f98cSAndroid Build Coastguard Worker
271*9507f98cSAndroid Build Coastguard Worker  // Ignore the following methods for now:
272*9507f98cSAndroid Build Coastguard Worker  const char* Name() const { return "TwoPartComparator"; }
273*9507f98cSAndroid Build Coastguard Worker  void FindShortestSeparator(std::string*, const leveldb::Slice&) const {}
274*9507f98cSAndroid Build Coastguard Worker  void FindShortSuccessor(std::string*) const {}
275*9507f98cSAndroid Build Coastguard Worker};
276*9507f98cSAndroid Build Coastguard Worker```
277*9507f98cSAndroid Build Coastguard Worker
278*9507f98cSAndroid Build Coastguard WorkerNow create a database using this custom comparator:
279*9507f98cSAndroid Build Coastguard Worker
280*9507f98cSAndroid Build Coastguard Worker```c++
281*9507f98cSAndroid Build Coastguard WorkerTwoPartComparator cmp;
282*9507f98cSAndroid Build Coastguard Workerleveldb::DB* db;
283*9507f98cSAndroid Build Coastguard Workerleveldb::Options options;
284*9507f98cSAndroid Build Coastguard Workeroptions.create_if_missing = true;
285*9507f98cSAndroid Build Coastguard Workeroptions.comparator = &cmp;
286*9507f98cSAndroid Build Coastguard Workerleveldb::Status status = leveldb::DB::Open(options, "/tmp/testdb", &db);
287*9507f98cSAndroid Build Coastguard Worker...
288*9507f98cSAndroid Build Coastguard Worker```
289*9507f98cSAndroid Build Coastguard Worker
290*9507f98cSAndroid Build Coastguard Worker### Backwards compatibility
291*9507f98cSAndroid Build Coastguard Worker
292*9507f98cSAndroid Build Coastguard WorkerThe result of the comparator's Name method is attached to the database when it
293*9507f98cSAndroid Build Coastguard Workeris created, and is checked on every subsequent database open. If the name
294*9507f98cSAndroid Build Coastguard Workerchanges, the `leveldb::DB::Open` call will fail. Therefore, change the name if
295*9507f98cSAndroid Build Coastguard Workerand only if the new key format and comparison function are incompatible with
296*9507f98cSAndroid Build Coastguard Workerexisting databases, and it is ok to discard the contents of all existing
297*9507f98cSAndroid Build Coastguard Workerdatabases.
298*9507f98cSAndroid Build Coastguard Worker
299*9507f98cSAndroid Build Coastguard WorkerYou can however still gradually evolve your key format over time with a little
300*9507f98cSAndroid Build Coastguard Workerbit of pre-planning. For example, you could store a version number at the end of
301*9507f98cSAndroid Build Coastguard Workereach key (one byte should suffice for most uses). When you wish to switch to a
302*9507f98cSAndroid Build Coastguard Workernew key format (e.g., adding an optional third part to the keys processed by
303*9507f98cSAndroid Build Coastguard Worker`TwoPartComparator`), (a) keep the same comparator name (b) increment the
304*9507f98cSAndroid Build Coastguard Workerversion number for new keys (c) change the comparator function so it uses the
305*9507f98cSAndroid Build Coastguard Workerversion numbers found in the keys to decide how to interpret them.
306*9507f98cSAndroid Build Coastguard Worker
307*9507f98cSAndroid Build Coastguard Worker## Performance
308*9507f98cSAndroid Build Coastguard Worker
309*9507f98cSAndroid Build Coastguard WorkerPerformance can be tuned by changing the default values of the types defined in
310*9507f98cSAndroid Build Coastguard Worker`include/options.h`.
311*9507f98cSAndroid Build Coastguard Worker
312*9507f98cSAndroid Build Coastguard Worker### Block size
313*9507f98cSAndroid Build Coastguard Worker
314*9507f98cSAndroid Build Coastguard Workerleveldb groups adjacent keys together into the same block and such a block is
315*9507f98cSAndroid Build Coastguard Workerthe unit of transfer to and from persistent storage. The default block size is
316*9507f98cSAndroid Build Coastguard Workerapproximately 4096 uncompressed bytes.  Applications that mostly do bulk scans
317*9507f98cSAndroid Build Coastguard Workerover the contents of the database may wish to increase this size. Applications
318*9507f98cSAndroid Build Coastguard Workerthat do a lot of point reads of small values may wish to switch to a smaller
319*9507f98cSAndroid Build Coastguard Workerblock size if performance measurements indicate an improvement. There isn't much
320*9507f98cSAndroid Build Coastguard Workerbenefit in using blocks smaller than one kilobyte, or larger than a few
321*9507f98cSAndroid Build Coastguard Workermegabytes. Also note that compression will be more effective with larger block
322*9507f98cSAndroid Build Coastguard Workersizes.
323*9507f98cSAndroid Build Coastguard Worker
324*9507f98cSAndroid Build Coastguard Worker### Compression
325*9507f98cSAndroid Build Coastguard Worker
326*9507f98cSAndroid Build Coastguard WorkerEach block is individually compressed before being written to persistent
327*9507f98cSAndroid Build Coastguard Workerstorage. Compression is on by default since the default compression method is
328*9507f98cSAndroid Build Coastguard Workervery fast, and is automatically disabled for uncompressible data. In rare cases,
329*9507f98cSAndroid Build Coastguard Workerapplications may want to disable compression entirely, but should only do so if
330*9507f98cSAndroid Build Coastguard Workerbenchmarks show a performance improvement:
331*9507f98cSAndroid Build Coastguard Worker
332*9507f98cSAndroid Build Coastguard Worker```c++
333*9507f98cSAndroid Build Coastguard Workerleveldb::Options options;
334*9507f98cSAndroid Build Coastguard Workeroptions.compression = leveldb::kNoCompression;
335*9507f98cSAndroid Build Coastguard Worker... leveldb::DB::Open(options, name, ...) ....
336*9507f98cSAndroid Build Coastguard Worker```
337*9507f98cSAndroid Build Coastguard Worker
338*9507f98cSAndroid Build Coastguard Worker### Cache
339*9507f98cSAndroid Build Coastguard Worker
340*9507f98cSAndroid Build Coastguard WorkerThe contents of the database are stored in a set of files in the filesystem and
341*9507f98cSAndroid Build Coastguard Workereach file stores a sequence of compressed blocks. If options.block_cache is
342*9507f98cSAndroid Build Coastguard Workernon-NULL, it is used to cache frequently used uncompressed block contents.
343*9507f98cSAndroid Build Coastguard Worker
344*9507f98cSAndroid Build Coastguard Worker```c++
345*9507f98cSAndroid Build Coastguard Worker#include "leveldb/cache.h"
346*9507f98cSAndroid Build Coastguard Worker
347*9507f98cSAndroid Build Coastguard Workerleveldb::Options options;
348*9507f98cSAndroid Build Coastguard Workeroptions.block_cache = leveldb::NewLRUCache(100 * 1048576);  // 100MB cache
349*9507f98cSAndroid Build Coastguard Workerleveldb::DB* db;
350*9507f98cSAndroid Build Coastguard Workerleveldb::DB::Open(options, name, &db);
351*9507f98cSAndroid Build Coastguard Worker... use the db ...
352*9507f98cSAndroid Build Coastguard Workerdelete db
353*9507f98cSAndroid Build Coastguard Workerdelete options.block_cache;
354*9507f98cSAndroid Build Coastguard Worker```
355*9507f98cSAndroid Build Coastguard Worker
356*9507f98cSAndroid Build Coastguard WorkerNote that the cache holds uncompressed data, and therefore it should be sized
357*9507f98cSAndroid Build Coastguard Workeraccording to application level data sizes, without any reduction from
358*9507f98cSAndroid Build Coastguard Workercompression. (Caching of compressed blocks is left to the operating system
359*9507f98cSAndroid Build Coastguard Workerbuffer cache, or any custom Env implementation provided by the client.)
360*9507f98cSAndroid Build Coastguard Worker
361*9507f98cSAndroid Build Coastguard WorkerWhen performing a bulk read, the application may wish to disable caching so that
362*9507f98cSAndroid Build Coastguard Workerthe data processed by the bulk read does not end up displacing most of the
363*9507f98cSAndroid Build Coastguard Workercached contents. A per-iterator option can be used to achieve this:
364*9507f98cSAndroid Build Coastguard Worker
365*9507f98cSAndroid Build Coastguard Worker```c++
366*9507f98cSAndroid Build Coastguard Workerleveldb::ReadOptions options;
367*9507f98cSAndroid Build Coastguard Workeroptions.fill_cache = false;
368*9507f98cSAndroid Build Coastguard Workerleveldb::Iterator* it = db->NewIterator(options);
369*9507f98cSAndroid Build Coastguard Workerfor (it->SeekToFirst(); it->Valid(); it->Next()) {
370*9507f98cSAndroid Build Coastguard Worker  ...
371*9507f98cSAndroid Build Coastguard Worker}
372*9507f98cSAndroid Build Coastguard Worker```
373*9507f98cSAndroid Build Coastguard Worker
374*9507f98cSAndroid Build Coastguard Worker### Key Layout
375*9507f98cSAndroid Build Coastguard Worker
376*9507f98cSAndroid Build Coastguard WorkerNote that the unit of disk transfer and caching is a block. Adjacent keys
377*9507f98cSAndroid Build Coastguard Worker(according to the database sort order) will usually be placed in the same block.
378*9507f98cSAndroid Build Coastguard WorkerTherefore the application can improve its performance by placing keys that are
379*9507f98cSAndroid Build Coastguard Workeraccessed together near each other and placing infrequently used keys in a
380*9507f98cSAndroid Build Coastguard Workerseparate region of the key space.
381*9507f98cSAndroid Build Coastguard Worker
382*9507f98cSAndroid Build Coastguard WorkerFor example, suppose we are implementing a simple file system on top of leveldb.
383*9507f98cSAndroid Build Coastguard WorkerThe types of entries we might wish to store are:
384*9507f98cSAndroid Build Coastguard Worker
385*9507f98cSAndroid Build Coastguard Worker    filename -> permission-bits, length, list of file_block_ids
386*9507f98cSAndroid Build Coastguard Worker    file_block_id -> data
387*9507f98cSAndroid Build Coastguard Worker
388*9507f98cSAndroid Build Coastguard WorkerWe might want to prefix filename keys with one letter (say '/') and the
389*9507f98cSAndroid Build Coastguard Worker`file_block_id` keys with a different letter (say '0') so that scans over just
390*9507f98cSAndroid Build Coastguard Workerthe metadata do not force us to fetch and cache bulky file contents.
391*9507f98cSAndroid Build Coastguard Worker
392*9507f98cSAndroid Build Coastguard Worker### Filters
393*9507f98cSAndroid Build Coastguard Worker
394*9507f98cSAndroid Build Coastguard WorkerBecause of the way leveldb data is organized on disk, a single `Get()` call may
395*9507f98cSAndroid Build Coastguard Workerinvolve multiple reads from disk. The optional FilterPolicy mechanism can be
396*9507f98cSAndroid Build Coastguard Workerused to reduce the number of disk reads substantially.
397*9507f98cSAndroid Build Coastguard Worker
398*9507f98cSAndroid Build Coastguard Worker```c++
399*9507f98cSAndroid Build Coastguard Workerleveldb::Options options;
400*9507f98cSAndroid Build Coastguard Workeroptions.filter_policy = NewBloomFilterPolicy(10);
401*9507f98cSAndroid Build Coastguard Workerleveldb::DB* db;
402*9507f98cSAndroid Build Coastguard Workerleveldb::DB::Open(options, "/tmp/testdb", &db);
403*9507f98cSAndroid Build Coastguard Worker... use the database ...
404*9507f98cSAndroid Build Coastguard Workerdelete db;
405*9507f98cSAndroid Build Coastguard Workerdelete options.filter_policy;
406*9507f98cSAndroid Build Coastguard Worker```
407*9507f98cSAndroid Build Coastguard Worker
408*9507f98cSAndroid Build Coastguard WorkerThe preceding code associates a Bloom filter based filtering policy with the
409*9507f98cSAndroid Build Coastguard Workerdatabase.  Bloom filter based filtering relies on keeping some number of bits of
410*9507f98cSAndroid Build Coastguard Workerdata in memory per key (in this case 10 bits per key since that is the argument
411*9507f98cSAndroid Build Coastguard Workerwe passed to `NewBloomFilterPolicy`). This filter will reduce the number of
412*9507f98cSAndroid Build Coastguard Workerunnecessary disk reads needed for Get() calls by a factor of approximately
413*9507f98cSAndroid Build Coastguard Workera 100. Increasing the bits per key will lead to a larger reduction at the cost
414*9507f98cSAndroid Build Coastguard Workerof more memory usage. We recommend that applications whose working set does not
415*9507f98cSAndroid Build Coastguard Workerfit in memory and that do a lot of random reads set a filter policy.
416*9507f98cSAndroid Build Coastguard Worker
417*9507f98cSAndroid Build Coastguard WorkerIf you are using a custom comparator, you should ensure that the filter policy
418*9507f98cSAndroid Build Coastguard Workeryou are using is compatible with your comparator. For example, consider a
419*9507f98cSAndroid Build Coastguard Workercomparator that ignores trailing spaces when comparing keys.
420*9507f98cSAndroid Build Coastguard Worker`NewBloomFilterPolicy` must not be used with such a comparator. Instead, the
421*9507f98cSAndroid Build Coastguard Workerapplication should provide a custom filter policy that also ignores trailing
422*9507f98cSAndroid Build Coastguard Workerspaces. For example:
423*9507f98cSAndroid Build Coastguard Worker
424*9507f98cSAndroid Build Coastguard Worker```c++
425*9507f98cSAndroid Build Coastguard Workerclass CustomFilterPolicy : public leveldb::FilterPolicy {
426*9507f98cSAndroid Build Coastguard Worker private:
427*9507f98cSAndroid Build Coastguard Worker  FilterPolicy* builtin_policy_;
428*9507f98cSAndroid Build Coastguard Worker
429*9507f98cSAndroid Build Coastguard Worker public:
430*9507f98cSAndroid Build Coastguard Worker  CustomFilterPolicy() : builtin_policy_(NewBloomFilterPolicy(10)) {}
431*9507f98cSAndroid Build Coastguard Worker  ~CustomFilterPolicy() { delete builtin_policy_; }
432*9507f98cSAndroid Build Coastguard Worker
433*9507f98cSAndroid Build Coastguard Worker  const char* Name() const { return "IgnoreTrailingSpacesFilter"; }
434*9507f98cSAndroid Build Coastguard Worker
435*9507f98cSAndroid Build Coastguard Worker  void CreateFilter(const Slice* keys, int n, std::string* dst) const {
436*9507f98cSAndroid Build Coastguard Worker    // Use builtin bloom filter code after removing trailing spaces
437*9507f98cSAndroid Build Coastguard Worker    std::vector<Slice> trimmed(n);
438*9507f98cSAndroid Build Coastguard Worker    for (int i = 0; i < n; i++) {
439*9507f98cSAndroid Build Coastguard Worker      trimmed[i] = RemoveTrailingSpaces(keys[i]);
440*9507f98cSAndroid Build Coastguard Worker    }
441*9507f98cSAndroid Build Coastguard Worker    return builtin_policy_->CreateFilter(trimmed.data(), n, dst);
442*9507f98cSAndroid Build Coastguard Worker  }
443*9507f98cSAndroid Build Coastguard Worker};
444*9507f98cSAndroid Build Coastguard Worker```
445*9507f98cSAndroid Build Coastguard Worker
446*9507f98cSAndroid Build Coastguard WorkerAdvanced applications may provide a filter policy that does not use a bloom
447*9507f98cSAndroid Build Coastguard Workerfilter but uses some other mechanism for summarizing a set of keys. See
448*9507f98cSAndroid Build Coastguard Worker`leveldb/filter_policy.h` for detail.
449*9507f98cSAndroid Build Coastguard Worker
450*9507f98cSAndroid Build Coastguard Worker## Checksums
451*9507f98cSAndroid Build Coastguard Worker
452*9507f98cSAndroid Build Coastguard Workerleveldb associates checksums with all data it stores in the file system. There
453*9507f98cSAndroid Build Coastguard Workerare two separate controls provided over how aggressively these checksums are
454*9507f98cSAndroid Build Coastguard Workerverified:
455*9507f98cSAndroid Build Coastguard Worker
456*9507f98cSAndroid Build Coastguard Worker`ReadOptions::verify_checksums` may be set to true to force checksum
457*9507f98cSAndroid Build Coastguard Workerverification of all data that is read from the file system on behalf of a
458*9507f98cSAndroid Build Coastguard Workerparticular read.  By default, no such verification is done.
459*9507f98cSAndroid Build Coastguard Worker
460*9507f98cSAndroid Build Coastguard Worker`Options::paranoid_checks` may be set to true before opening a database to make
461*9507f98cSAndroid Build Coastguard Workerthe database implementation raise an error as soon as it detects an internal
462*9507f98cSAndroid Build Coastguard Workercorruption. Depending on which portion of the database has been corrupted, the
463*9507f98cSAndroid Build Coastguard Workererror may be raised when the database is opened, or later by another database
464*9507f98cSAndroid Build Coastguard Workeroperation. By default, paranoid checking is off so that the database can be used
465*9507f98cSAndroid Build Coastguard Workereven if parts of its persistent storage have been corrupted.
466*9507f98cSAndroid Build Coastguard Worker
467*9507f98cSAndroid Build Coastguard WorkerIf a database is corrupted (perhaps it cannot be opened when paranoid checking
468*9507f98cSAndroid Build Coastguard Workeris turned on), the `leveldb::RepairDB` function may be used to recover as much
469*9507f98cSAndroid Build Coastguard Workerof the data as possible
470*9507f98cSAndroid Build Coastguard Worker
471*9507f98cSAndroid Build Coastguard Worker## Approximate Sizes
472*9507f98cSAndroid Build Coastguard Worker
473*9507f98cSAndroid Build Coastguard WorkerThe `GetApproximateSizes` method can used to get the approximate number of bytes
474*9507f98cSAndroid Build Coastguard Workerof file system space used by one or more key ranges.
475*9507f98cSAndroid Build Coastguard Worker
476*9507f98cSAndroid Build Coastguard Worker```c++
477*9507f98cSAndroid Build Coastguard Workerleveldb::Range ranges[2];
478*9507f98cSAndroid Build Coastguard Workerranges[0] = leveldb::Range("a", "c");
479*9507f98cSAndroid Build Coastguard Workerranges[1] = leveldb::Range("x", "z");
480*9507f98cSAndroid Build Coastguard Workeruint64_t sizes[2];
481*9507f98cSAndroid Build Coastguard Workerdb->GetApproximateSizes(ranges, 2, sizes);
482*9507f98cSAndroid Build Coastguard Worker```
483*9507f98cSAndroid Build Coastguard Worker
484*9507f98cSAndroid Build Coastguard WorkerThe preceding call will set `sizes[0]` to the approximate number of bytes of
485*9507f98cSAndroid Build Coastguard Workerfile system space used by the key range `[a..c)` and `sizes[1]` to the
486*9507f98cSAndroid Build Coastguard Workerapproximate number of bytes used by the key range `[x..z)`.
487*9507f98cSAndroid Build Coastguard Worker
488*9507f98cSAndroid Build Coastguard Worker## Environment
489*9507f98cSAndroid Build Coastguard Worker
490*9507f98cSAndroid Build Coastguard WorkerAll file operations (and other operating system calls) issued by the leveldb
491*9507f98cSAndroid Build Coastguard Workerimplementation are routed through a `leveldb::Env` object. Sophisticated clients
492*9507f98cSAndroid Build Coastguard Workermay wish to provide their own Env implementation to get better control.
493*9507f98cSAndroid Build Coastguard WorkerFor example, an application may introduce artificial delays in the file IO
494*9507f98cSAndroid Build Coastguard Workerpaths to limit the impact of leveldb on other activities in the system.
495*9507f98cSAndroid Build Coastguard Worker
496*9507f98cSAndroid Build Coastguard Worker```c++
497*9507f98cSAndroid Build Coastguard Workerclass SlowEnv : public leveldb::Env {
498*9507f98cSAndroid Build Coastguard Worker  ... implementation of the Env interface ...
499*9507f98cSAndroid Build Coastguard Worker};
500*9507f98cSAndroid Build Coastguard Worker
501*9507f98cSAndroid Build Coastguard WorkerSlowEnv env;
502*9507f98cSAndroid Build Coastguard Workerleveldb::Options options;
503*9507f98cSAndroid Build Coastguard Workeroptions.env = &env;
504*9507f98cSAndroid Build Coastguard WorkerStatus s = leveldb::DB::Open(options, ...);
505*9507f98cSAndroid Build Coastguard Worker```
506*9507f98cSAndroid Build Coastguard Worker
507*9507f98cSAndroid Build Coastguard Worker## Porting
508*9507f98cSAndroid Build Coastguard Worker
509*9507f98cSAndroid Build Coastguard Workerleveldb may be ported to a new platform by providing platform specific
510*9507f98cSAndroid Build Coastguard Workerimplementations of the types/methods/functions exported by
511*9507f98cSAndroid Build Coastguard Worker`leveldb/port/port.h`.  See `leveldb/port/port_example.h` for more details.
512*9507f98cSAndroid Build Coastguard Worker
513*9507f98cSAndroid Build Coastguard WorkerIn addition, the new platform may need a new default `leveldb::Env`
514*9507f98cSAndroid Build Coastguard Workerimplementation.  See `leveldb/util/env_posix.h` for an example.
515*9507f98cSAndroid Build Coastguard Worker
516*9507f98cSAndroid Build Coastguard Worker## Other Information
517*9507f98cSAndroid Build Coastguard Worker
518*9507f98cSAndroid Build Coastguard WorkerDetails about the leveldb implementation may be found in the following
519*9507f98cSAndroid Build Coastguard Workerdocuments:
520*9507f98cSAndroid Build Coastguard Worker
521*9507f98cSAndroid Build Coastguard Worker1. [Implementation notes](impl.md)
522*9507f98cSAndroid Build Coastguard Worker2. [Format of an immutable Table file](table_format.md)
523*9507f98cSAndroid Build Coastguard Worker3. [Format of a log file](log_format.md)
524