1*9507f98cSAndroid Build Coastguard Workerleveldb 2*9507f98cSAndroid Build Coastguard Worker======= 3*9507f98cSAndroid Build Coastguard Worker 4*9507f98cSAndroid Build Coastguard Worker_Jeff Dean, Sanjay Ghemawat_ 5*9507f98cSAndroid Build Coastguard Worker 6*9507f98cSAndroid Build Coastguard WorkerThe leveldb library provides a persistent key value store. Keys and values are 7*9507f98cSAndroid Build Coastguard Workerarbitrary byte arrays. The keys are ordered within the key value store 8*9507f98cSAndroid Build Coastguard Workeraccording to a user-specified comparator function. 9*9507f98cSAndroid Build Coastguard Worker 10*9507f98cSAndroid Build Coastguard Worker## Opening A Database 11*9507f98cSAndroid Build Coastguard Worker 12*9507f98cSAndroid Build Coastguard WorkerA leveldb database has a name which corresponds to a file system directory. All 13*9507f98cSAndroid Build Coastguard Workerof the contents of database are stored in this directory. The following example 14*9507f98cSAndroid Build Coastguard Workershows how to open a database, creating it if necessary: 15*9507f98cSAndroid Build Coastguard Worker 16*9507f98cSAndroid Build Coastguard Worker```c++ 17*9507f98cSAndroid Build Coastguard Worker#include <cassert> 18*9507f98cSAndroid Build Coastguard Worker#include "leveldb/db.h" 19*9507f98cSAndroid Build Coastguard Worker 20*9507f98cSAndroid Build Coastguard Workerleveldb::DB* db; 21*9507f98cSAndroid Build Coastguard Workerleveldb::Options options; 22*9507f98cSAndroid Build Coastguard Workeroptions.create_if_missing = true; 23*9507f98cSAndroid Build Coastguard Workerleveldb::Status status = leveldb::DB::Open(options, "/tmp/testdb", &db); 24*9507f98cSAndroid Build Coastguard Workerassert(status.ok()); 25*9507f98cSAndroid Build Coastguard Worker... 26*9507f98cSAndroid Build Coastguard Worker``` 27*9507f98cSAndroid Build Coastguard Worker 28*9507f98cSAndroid Build Coastguard WorkerIf you want to raise an error if the database already exists, add the following 29*9507f98cSAndroid Build Coastguard Workerline before the `leveldb::DB::Open` call: 30*9507f98cSAndroid Build Coastguard Worker 31*9507f98cSAndroid Build Coastguard Worker```c++ 32*9507f98cSAndroid Build Coastguard Workeroptions.error_if_exists = true; 33*9507f98cSAndroid Build Coastguard Worker``` 34*9507f98cSAndroid Build Coastguard Worker 35*9507f98cSAndroid Build Coastguard Worker## Status 36*9507f98cSAndroid Build Coastguard Worker 37*9507f98cSAndroid Build Coastguard WorkerYou may have noticed the `leveldb::Status` type above. Values of this type are 38*9507f98cSAndroid Build Coastguard Workerreturned by most functions in leveldb that may encounter an error. You can check 39*9507f98cSAndroid Build Coastguard Workerif such a result is ok, and also print an associated error message: 40*9507f98cSAndroid Build Coastguard Worker 41*9507f98cSAndroid Build Coastguard Worker```c++ 42*9507f98cSAndroid Build Coastguard Workerleveldb::Status s = ...; 43*9507f98cSAndroid Build Coastguard Workerif (!s.ok()) cerr << s.ToString() << endl; 44*9507f98cSAndroid Build Coastguard Worker``` 45*9507f98cSAndroid Build Coastguard Worker 46*9507f98cSAndroid Build Coastguard Worker## Closing A Database 47*9507f98cSAndroid Build Coastguard Worker 48*9507f98cSAndroid Build Coastguard WorkerWhen you are done with a database, just delete the database object. Example: 49*9507f98cSAndroid Build Coastguard Worker 50*9507f98cSAndroid Build Coastguard Worker```c++ 51*9507f98cSAndroid Build Coastguard Worker... open the db as described above ... 52*9507f98cSAndroid Build Coastguard Worker... do something with db ... 53*9507f98cSAndroid Build Coastguard Workerdelete db; 54*9507f98cSAndroid Build Coastguard Worker``` 55*9507f98cSAndroid Build Coastguard Worker 56*9507f98cSAndroid Build Coastguard Worker## Reads And Writes 57*9507f98cSAndroid Build Coastguard Worker 58*9507f98cSAndroid Build Coastguard WorkerThe database provides Put, Delete, and Get methods to modify/query the database. 59*9507f98cSAndroid Build Coastguard WorkerFor example, the following code moves the value stored under key1 to key2. 60*9507f98cSAndroid Build Coastguard Worker 61*9507f98cSAndroid Build Coastguard Worker```c++ 62*9507f98cSAndroid Build Coastguard Workerstd::string value; 63*9507f98cSAndroid Build Coastguard Workerleveldb::Status s = db->Get(leveldb::ReadOptions(), key1, &value); 64*9507f98cSAndroid Build Coastguard Workerif (s.ok()) s = db->Put(leveldb::WriteOptions(), key2, value); 65*9507f98cSAndroid Build Coastguard Workerif (s.ok()) s = db->Delete(leveldb::WriteOptions(), key1); 66*9507f98cSAndroid Build Coastguard Worker``` 67*9507f98cSAndroid Build Coastguard Worker 68*9507f98cSAndroid Build Coastguard Worker## Atomic Updates 69*9507f98cSAndroid Build Coastguard Worker 70*9507f98cSAndroid Build Coastguard WorkerNote that if the process dies after the Put of key2 but before the delete of 71*9507f98cSAndroid Build Coastguard Workerkey1, the same value may be left stored under multiple keys. Such problems can 72*9507f98cSAndroid Build Coastguard Workerbe avoided by using the `WriteBatch` class to atomically apply a set of updates: 73*9507f98cSAndroid Build Coastguard Worker 74*9507f98cSAndroid Build Coastguard Worker```c++ 75*9507f98cSAndroid Build Coastguard Worker#include "leveldb/write_batch.h" 76*9507f98cSAndroid Build Coastguard Worker... 77*9507f98cSAndroid Build Coastguard Workerstd::string value; 78*9507f98cSAndroid Build Coastguard Workerleveldb::Status s = db->Get(leveldb::ReadOptions(), key1, &value); 79*9507f98cSAndroid Build Coastguard Workerif (s.ok()) { 80*9507f98cSAndroid Build Coastguard Worker leveldb::WriteBatch batch; 81*9507f98cSAndroid Build Coastguard Worker batch.Delete(key1); 82*9507f98cSAndroid Build Coastguard Worker batch.Put(key2, value); 83*9507f98cSAndroid Build Coastguard Worker s = db->Write(leveldb::WriteOptions(), &batch); 84*9507f98cSAndroid Build Coastguard Worker} 85*9507f98cSAndroid Build Coastguard Worker``` 86*9507f98cSAndroid Build Coastguard Worker 87*9507f98cSAndroid Build Coastguard WorkerThe `WriteBatch` holds a sequence of edits to be made to the database, and these 88*9507f98cSAndroid Build Coastguard Workeredits within the batch are applied in order. Note that we called Delete before 89*9507f98cSAndroid Build Coastguard WorkerPut so that if key1 is identical to key2, we do not end up erroneously dropping 90*9507f98cSAndroid Build Coastguard Workerthe value entirely. 91*9507f98cSAndroid Build Coastguard Worker 92*9507f98cSAndroid Build Coastguard WorkerApart from its atomicity benefits, `WriteBatch` may also be used to speed up 93*9507f98cSAndroid Build Coastguard Workerbulk updates by placing lots of individual mutations into the same batch. 94*9507f98cSAndroid Build Coastguard Worker 95*9507f98cSAndroid Build Coastguard Worker## Synchronous Writes 96*9507f98cSAndroid Build Coastguard Worker 97*9507f98cSAndroid Build Coastguard WorkerBy default, each write to leveldb is asynchronous: it returns after pushing the 98*9507f98cSAndroid Build Coastguard Workerwrite from the process into the operating system. The transfer from operating 99*9507f98cSAndroid Build Coastguard Workersystem memory to the underlying persistent storage happens asynchronously. The 100*9507f98cSAndroid Build Coastguard Workersync flag can be turned on for a particular write to make the write operation 101*9507f98cSAndroid Build Coastguard Workernot return until the data being written has been pushed all the way to 102*9507f98cSAndroid Build Coastguard Workerpersistent storage. (On Posix systems, this is implemented by calling either 103*9507f98cSAndroid Build Coastguard Worker`fsync(...)` or `fdatasync(...)` or `msync(..., MS_SYNC)` before the write 104*9507f98cSAndroid Build Coastguard Workeroperation returns.) 105*9507f98cSAndroid Build Coastguard Worker 106*9507f98cSAndroid Build Coastguard Worker```c++ 107*9507f98cSAndroid Build Coastguard Workerleveldb::WriteOptions write_options; 108*9507f98cSAndroid Build Coastguard Workerwrite_options.sync = true; 109*9507f98cSAndroid Build Coastguard Workerdb->Put(write_options, ...); 110*9507f98cSAndroid Build Coastguard Worker``` 111*9507f98cSAndroid Build Coastguard Worker 112*9507f98cSAndroid Build Coastguard WorkerAsynchronous writes are often more than a thousand times as fast as synchronous 113*9507f98cSAndroid Build Coastguard Workerwrites. The downside of asynchronous writes is that a crash of the machine may 114*9507f98cSAndroid Build Coastguard Workercause the last few updates to be lost. Note that a crash of just the writing 115*9507f98cSAndroid Build Coastguard Workerprocess (i.e., not a reboot) will not cause any loss since even when sync is 116*9507f98cSAndroid Build Coastguard Workerfalse, an update is pushed from the process memory into the operating system 117*9507f98cSAndroid Build Coastguard Workerbefore it is considered done. 118*9507f98cSAndroid Build Coastguard Worker 119*9507f98cSAndroid Build Coastguard WorkerAsynchronous writes can often be used safely. For example, when loading a large 120*9507f98cSAndroid Build Coastguard Workeramount of data into the database you can handle lost updates by restarting the 121*9507f98cSAndroid Build Coastguard Workerbulk load after a crash. A hybrid scheme is also possible where every Nth write 122*9507f98cSAndroid Build Coastguard Workeris synchronous, and in the event of a crash, the bulk load is restarted just 123*9507f98cSAndroid Build Coastguard Workerafter the last synchronous write finished by the previous run. (The synchronous 124*9507f98cSAndroid Build Coastguard Workerwrite can update a marker that describes where to restart on a crash.) 125*9507f98cSAndroid Build Coastguard Worker 126*9507f98cSAndroid Build Coastguard Worker`WriteBatch` provides an alternative to asynchronous writes. Multiple updates 127*9507f98cSAndroid Build Coastguard Workermay be placed in the same WriteBatch and applied together using a synchronous 128*9507f98cSAndroid Build Coastguard Workerwrite (i.e., `write_options.sync` is set to true). The extra cost of the 129*9507f98cSAndroid Build Coastguard Workersynchronous write will be amortized across all of the writes in the batch. 130*9507f98cSAndroid Build Coastguard Worker 131*9507f98cSAndroid Build Coastguard Worker## Concurrency 132*9507f98cSAndroid Build Coastguard Worker 133*9507f98cSAndroid Build Coastguard WorkerA database may only be opened by one process at a time. The leveldb 134*9507f98cSAndroid Build Coastguard Workerimplementation acquires a lock from the operating system to prevent misuse. 135*9507f98cSAndroid Build Coastguard WorkerWithin a single process, the same `leveldb::DB` object may be safely shared by 136*9507f98cSAndroid Build Coastguard Workermultiple concurrent threads. I.e., different threads may write into or fetch 137*9507f98cSAndroid Build Coastguard Workeriterators or call Get on the same database without any external synchronization 138*9507f98cSAndroid Build Coastguard Worker(the leveldb implementation will automatically do the required synchronization). 139*9507f98cSAndroid Build Coastguard WorkerHowever other objects (like Iterator and `WriteBatch`) may require external 140*9507f98cSAndroid Build Coastguard Workersynchronization. If two threads share such an object, they must protect access 141*9507f98cSAndroid Build Coastguard Workerto it using their own locking protocol. More details are available in the public 142*9507f98cSAndroid Build Coastguard Workerheader files. 143*9507f98cSAndroid Build Coastguard Worker 144*9507f98cSAndroid Build Coastguard Worker## Iteration 145*9507f98cSAndroid Build Coastguard Worker 146*9507f98cSAndroid Build Coastguard WorkerThe following example demonstrates how to print all key,value pairs in a 147*9507f98cSAndroid Build Coastguard Workerdatabase. 148*9507f98cSAndroid Build Coastguard Worker 149*9507f98cSAndroid Build Coastguard Worker```c++ 150*9507f98cSAndroid Build Coastguard Workerleveldb::Iterator* it = db->NewIterator(leveldb::ReadOptions()); 151*9507f98cSAndroid Build Coastguard Workerfor (it->SeekToFirst(); it->Valid(); it->Next()) { 152*9507f98cSAndroid Build Coastguard Worker cout << it->key().ToString() << ": " << it->value().ToString() << endl; 153*9507f98cSAndroid Build Coastguard Worker} 154*9507f98cSAndroid Build Coastguard Workerassert(it->status().ok()); // Check for any errors found during the scan 155*9507f98cSAndroid Build Coastguard Workerdelete it; 156*9507f98cSAndroid Build Coastguard Worker``` 157*9507f98cSAndroid Build Coastguard Worker 158*9507f98cSAndroid Build Coastguard WorkerThe following variation shows how to process just the keys in the range 159*9507f98cSAndroid Build Coastguard Worker[start,limit): 160*9507f98cSAndroid Build Coastguard Worker 161*9507f98cSAndroid Build Coastguard Worker```c++ 162*9507f98cSAndroid Build Coastguard Workerfor (it->Seek(start); 163*9507f98cSAndroid Build Coastguard Worker it->Valid() && it->key().ToString() < limit; 164*9507f98cSAndroid Build Coastguard Worker it->Next()) { 165*9507f98cSAndroid Build Coastguard Worker ... 166*9507f98cSAndroid Build Coastguard Worker} 167*9507f98cSAndroid Build Coastguard Worker``` 168*9507f98cSAndroid Build Coastguard Worker 169*9507f98cSAndroid Build Coastguard WorkerYou can also process entries in reverse order. (Caveat: reverse iteration may be 170*9507f98cSAndroid Build Coastguard Workersomewhat slower than forward iteration.) 171*9507f98cSAndroid Build Coastguard Worker 172*9507f98cSAndroid Build Coastguard Worker```c++ 173*9507f98cSAndroid Build Coastguard Workerfor (it->SeekToLast(); it->Valid(); it->Prev()) { 174*9507f98cSAndroid Build Coastguard Worker ... 175*9507f98cSAndroid Build Coastguard Worker} 176*9507f98cSAndroid Build Coastguard Worker``` 177*9507f98cSAndroid Build Coastguard Worker 178*9507f98cSAndroid Build Coastguard Worker## Snapshots 179*9507f98cSAndroid Build Coastguard Worker 180*9507f98cSAndroid Build Coastguard WorkerSnapshots provide consistent read-only views over the entire state of the 181*9507f98cSAndroid Build Coastguard Workerkey-value store. `ReadOptions::snapshot` may be non-NULL to indicate that a 182*9507f98cSAndroid Build Coastguard Workerread should operate on a particular version of the DB state. If 183*9507f98cSAndroid Build Coastguard Worker`ReadOptions::snapshot` is NULL, the read will operate on an implicit snapshot 184*9507f98cSAndroid Build Coastguard Workerof the current state. 185*9507f98cSAndroid Build Coastguard Worker 186*9507f98cSAndroid Build Coastguard WorkerSnapshots are created by the `DB::GetSnapshot()` method: 187*9507f98cSAndroid Build Coastguard Worker 188*9507f98cSAndroid Build Coastguard Worker```c++ 189*9507f98cSAndroid Build Coastguard Workerleveldb::ReadOptions options; 190*9507f98cSAndroid Build Coastguard Workeroptions.snapshot = db->GetSnapshot(); 191*9507f98cSAndroid Build Coastguard Worker... apply some updates to db ... 192*9507f98cSAndroid Build Coastguard Workerleveldb::Iterator* iter = db->NewIterator(options); 193*9507f98cSAndroid Build Coastguard Worker... read using iter to view the state when the snapshot was created ... 194*9507f98cSAndroid Build Coastguard Workerdelete iter; 195*9507f98cSAndroid Build Coastguard Workerdb->ReleaseSnapshot(options.snapshot); 196*9507f98cSAndroid Build Coastguard Worker``` 197*9507f98cSAndroid Build Coastguard Worker 198*9507f98cSAndroid Build Coastguard WorkerNote that when a snapshot is no longer needed, it should be released using the 199*9507f98cSAndroid Build Coastguard Worker`DB::ReleaseSnapshot` interface. This allows the implementation to get rid of 200*9507f98cSAndroid Build Coastguard Workerstate that was being maintained just to support reading as of that snapshot. 201*9507f98cSAndroid Build Coastguard Worker 202*9507f98cSAndroid Build Coastguard Worker## Slice 203*9507f98cSAndroid Build Coastguard Worker 204*9507f98cSAndroid Build Coastguard WorkerThe return value of the `it->key()` and `it->value()` calls above are instances 205*9507f98cSAndroid Build Coastguard Workerof the `leveldb::Slice` type. Slice is a simple structure that contains a length 206*9507f98cSAndroid Build Coastguard Workerand a pointer to an external byte array. Returning a Slice is a cheaper 207*9507f98cSAndroid Build Coastguard Workeralternative to returning a `std::string` since we do not need to copy 208*9507f98cSAndroid Build Coastguard Workerpotentially large keys and values. In addition, leveldb methods do not return 209*9507f98cSAndroid Build Coastguard Workernull-terminated C-style strings since leveldb keys and values are allowed to 210*9507f98cSAndroid Build Coastguard Workercontain `'\0'` bytes. 211*9507f98cSAndroid Build Coastguard Worker 212*9507f98cSAndroid Build Coastguard WorkerC++ strings and null-terminated C-style strings can be easily converted to a 213*9507f98cSAndroid Build Coastguard WorkerSlice: 214*9507f98cSAndroid Build Coastguard Worker 215*9507f98cSAndroid Build Coastguard Worker```c++ 216*9507f98cSAndroid Build Coastguard Workerleveldb::Slice s1 = "hello"; 217*9507f98cSAndroid Build Coastguard Worker 218*9507f98cSAndroid Build Coastguard Workerstd::string str("world"); 219*9507f98cSAndroid Build Coastguard Workerleveldb::Slice s2 = str; 220*9507f98cSAndroid Build Coastguard Worker``` 221*9507f98cSAndroid Build Coastguard Worker 222*9507f98cSAndroid Build Coastguard WorkerA Slice can be easily converted back to a C++ string: 223*9507f98cSAndroid Build Coastguard Worker 224*9507f98cSAndroid Build Coastguard Worker```c++ 225*9507f98cSAndroid Build Coastguard Workerstd::string str = s1.ToString(); 226*9507f98cSAndroid Build Coastguard Workerassert(str == std::string("hello")); 227*9507f98cSAndroid Build Coastguard Worker``` 228*9507f98cSAndroid Build Coastguard Worker 229*9507f98cSAndroid Build Coastguard WorkerBe careful when using Slices since it is up to the caller to ensure that the 230*9507f98cSAndroid Build Coastguard Workerexternal byte array into which the Slice points remains live while the Slice is 231*9507f98cSAndroid Build Coastguard Workerin use. For example, the following is buggy: 232*9507f98cSAndroid Build Coastguard Worker 233*9507f98cSAndroid Build Coastguard Worker```c++ 234*9507f98cSAndroid Build Coastguard Workerleveldb::Slice slice; 235*9507f98cSAndroid Build Coastguard Workerif (...) { 236*9507f98cSAndroid Build Coastguard Worker std::string str = ...; 237*9507f98cSAndroid Build Coastguard Worker slice = str; 238*9507f98cSAndroid Build Coastguard Worker} 239*9507f98cSAndroid Build Coastguard WorkerUse(slice); 240*9507f98cSAndroid Build Coastguard Worker``` 241*9507f98cSAndroid Build Coastguard Worker 242*9507f98cSAndroid Build Coastguard WorkerWhen the if statement goes out of scope, str will be destroyed and the backing 243*9507f98cSAndroid Build Coastguard Workerstorage for slice will disappear. 244*9507f98cSAndroid Build Coastguard Worker 245*9507f98cSAndroid Build Coastguard Worker## Comparators 246*9507f98cSAndroid Build Coastguard Worker 247*9507f98cSAndroid Build Coastguard WorkerThe preceding examples used the default ordering function for key, which orders 248*9507f98cSAndroid Build Coastguard Workerbytes lexicographically. You can however supply a custom comparator when opening 249*9507f98cSAndroid Build Coastguard Workera database. For example, suppose each database key consists of two numbers and 250*9507f98cSAndroid Build Coastguard Workerwe should sort by the first number, breaking ties by the second number. First, 251*9507f98cSAndroid Build Coastguard Workerdefine a proper subclass of `leveldb::Comparator` that expresses these rules: 252*9507f98cSAndroid Build Coastguard Worker 253*9507f98cSAndroid Build Coastguard Worker```c++ 254*9507f98cSAndroid Build Coastguard Workerclass TwoPartComparator : public leveldb::Comparator { 255*9507f98cSAndroid Build Coastguard Worker public: 256*9507f98cSAndroid Build Coastguard Worker // Three-way comparison function: 257*9507f98cSAndroid Build Coastguard Worker // if a < b: negative result 258*9507f98cSAndroid Build Coastguard Worker // if a > b: positive result 259*9507f98cSAndroid Build Coastguard Worker // else: zero result 260*9507f98cSAndroid Build Coastguard Worker int Compare(const leveldb::Slice& a, const leveldb::Slice& b) const { 261*9507f98cSAndroid Build Coastguard Worker int a1, a2, b1, b2; 262*9507f98cSAndroid Build Coastguard Worker ParseKey(a, &a1, &a2); 263*9507f98cSAndroid Build Coastguard Worker ParseKey(b, &b1, &b2); 264*9507f98cSAndroid Build Coastguard Worker if (a1 < b1) return -1; 265*9507f98cSAndroid Build Coastguard Worker if (a1 > b1) return +1; 266*9507f98cSAndroid Build Coastguard Worker if (a2 < b2) return -1; 267*9507f98cSAndroid Build Coastguard Worker if (a2 > b2) return +1; 268*9507f98cSAndroid Build Coastguard Worker return 0; 269*9507f98cSAndroid Build Coastguard Worker } 270*9507f98cSAndroid Build Coastguard Worker 271*9507f98cSAndroid Build Coastguard Worker // Ignore the following methods for now: 272*9507f98cSAndroid Build Coastguard Worker const char* Name() const { return "TwoPartComparator"; } 273*9507f98cSAndroid Build Coastguard Worker void FindShortestSeparator(std::string*, const leveldb::Slice&) const {} 274*9507f98cSAndroid Build Coastguard Worker void FindShortSuccessor(std::string*) const {} 275*9507f98cSAndroid Build Coastguard Worker}; 276*9507f98cSAndroid Build Coastguard Worker``` 277*9507f98cSAndroid Build Coastguard Worker 278*9507f98cSAndroid Build Coastguard WorkerNow create a database using this custom comparator: 279*9507f98cSAndroid Build Coastguard Worker 280*9507f98cSAndroid Build Coastguard Worker```c++ 281*9507f98cSAndroid Build Coastguard WorkerTwoPartComparator cmp; 282*9507f98cSAndroid Build Coastguard Workerleveldb::DB* db; 283*9507f98cSAndroid Build Coastguard Workerleveldb::Options options; 284*9507f98cSAndroid Build Coastguard Workeroptions.create_if_missing = true; 285*9507f98cSAndroid Build Coastguard Workeroptions.comparator = &cmp; 286*9507f98cSAndroid Build Coastguard Workerleveldb::Status status = leveldb::DB::Open(options, "/tmp/testdb", &db); 287*9507f98cSAndroid Build Coastguard Worker... 288*9507f98cSAndroid Build Coastguard Worker``` 289*9507f98cSAndroid Build Coastguard Worker 290*9507f98cSAndroid Build Coastguard Worker### Backwards compatibility 291*9507f98cSAndroid Build Coastguard Worker 292*9507f98cSAndroid Build Coastguard WorkerThe result of the comparator's Name method is attached to the database when it 293*9507f98cSAndroid Build Coastguard Workeris created, and is checked on every subsequent database open. If the name 294*9507f98cSAndroid Build Coastguard Workerchanges, the `leveldb::DB::Open` call will fail. Therefore, change the name if 295*9507f98cSAndroid Build Coastguard Workerand only if the new key format and comparison function are incompatible with 296*9507f98cSAndroid Build Coastguard Workerexisting databases, and it is ok to discard the contents of all existing 297*9507f98cSAndroid Build Coastguard Workerdatabases. 298*9507f98cSAndroid Build Coastguard Worker 299*9507f98cSAndroid Build Coastguard WorkerYou can however still gradually evolve your key format over time with a little 300*9507f98cSAndroid Build Coastguard Workerbit of pre-planning. For example, you could store a version number at the end of 301*9507f98cSAndroid Build Coastguard Workereach key (one byte should suffice for most uses). When you wish to switch to a 302*9507f98cSAndroid Build Coastguard Workernew key format (e.g., adding an optional third part to the keys processed by 303*9507f98cSAndroid Build Coastguard Worker`TwoPartComparator`), (a) keep the same comparator name (b) increment the 304*9507f98cSAndroid Build Coastguard Workerversion number for new keys (c) change the comparator function so it uses the 305*9507f98cSAndroid Build Coastguard Workerversion numbers found in the keys to decide how to interpret them. 306*9507f98cSAndroid Build Coastguard Worker 307*9507f98cSAndroid Build Coastguard Worker## Performance 308*9507f98cSAndroid Build Coastguard Worker 309*9507f98cSAndroid Build Coastguard WorkerPerformance can be tuned by changing the default values of the types defined in 310*9507f98cSAndroid Build Coastguard Worker`include/options.h`. 311*9507f98cSAndroid Build Coastguard Worker 312*9507f98cSAndroid Build Coastguard Worker### Block size 313*9507f98cSAndroid Build Coastguard Worker 314*9507f98cSAndroid Build Coastguard Workerleveldb groups adjacent keys together into the same block and such a block is 315*9507f98cSAndroid Build Coastguard Workerthe unit of transfer to and from persistent storage. The default block size is 316*9507f98cSAndroid Build Coastguard Workerapproximately 4096 uncompressed bytes. Applications that mostly do bulk scans 317*9507f98cSAndroid Build Coastguard Workerover the contents of the database may wish to increase this size. Applications 318*9507f98cSAndroid Build Coastguard Workerthat do a lot of point reads of small values may wish to switch to a smaller 319*9507f98cSAndroid Build Coastguard Workerblock size if performance measurements indicate an improvement. There isn't much 320*9507f98cSAndroid Build Coastguard Workerbenefit in using blocks smaller than one kilobyte, or larger than a few 321*9507f98cSAndroid Build Coastguard Workermegabytes. Also note that compression will be more effective with larger block 322*9507f98cSAndroid Build Coastguard Workersizes. 323*9507f98cSAndroid Build Coastguard Worker 324*9507f98cSAndroid Build Coastguard Worker### Compression 325*9507f98cSAndroid Build Coastguard Worker 326*9507f98cSAndroid Build Coastguard WorkerEach block is individually compressed before being written to persistent 327*9507f98cSAndroid Build Coastguard Workerstorage. Compression is on by default since the default compression method is 328*9507f98cSAndroid Build Coastguard Workervery fast, and is automatically disabled for uncompressible data. In rare cases, 329*9507f98cSAndroid Build Coastguard Workerapplications may want to disable compression entirely, but should only do so if 330*9507f98cSAndroid Build Coastguard Workerbenchmarks show a performance improvement: 331*9507f98cSAndroid Build Coastguard Worker 332*9507f98cSAndroid Build Coastguard Worker```c++ 333*9507f98cSAndroid Build Coastguard Workerleveldb::Options options; 334*9507f98cSAndroid Build Coastguard Workeroptions.compression = leveldb::kNoCompression; 335*9507f98cSAndroid Build Coastguard Worker... leveldb::DB::Open(options, name, ...) .... 336*9507f98cSAndroid Build Coastguard Worker``` 337*9507f98cSAndroid Build Coastguard Worker 338*9507f98cSAndroid Build Coastguard Worker### Cache 339*9507f98cSAndroid Build Coastguard Worker 340*9507f98cSAndroid Build Coastguard WorkerThe contents of the database are stored in a set of files in the filesystem and 341*9507f98cSAndroid Build Coastguard Workereach file stores a sequence of compressed blocks. If options.block_cache is 342*9507f98cSAndroid Build Coastguard Workernon-NULL, it is used to cache frequently used uncompressed block contents. 343*9507f98cSAndroid Build Coastguard Worker 344*9507f98cSAndroid Build Coastguard Worker```c++ 345*9507f98cSAndroid Build Coastguard Worker#include "leveldb/cache.h" 346*9507f98cSAndroid Build Coastguard Worker 347*9507f98cSAndroid Build Coastguard Workerleveldb::Options options; 348*9507f98cSAndroid Build Coastguard Workeroptions.block_cache = leveldb::NewLRUCache(100 * 1048576); // 100MB cache 349*9507f98cSAndroid Build Coastguard Workerleveldb::DB* db; 350*9507f98cSAndroid Build Coastguard Workerleveldb::DB::Open(options, name, &db); 351*9507f98cSAndroid Build Coastguard Worker... use the db ... 352*9507f98cSAndroid Build Coastguard Workerdelete db 353*9507f98cSAndroid Build Coastguard Workerdelete options.block_cache; 354*9507f98cSAndroid Build Coastguard Worker``` 355*9507f98cSAndroid Build Coastguard Worker 356*9507f98cSAndroid Build Coastguard WorkerNote that the cache holds uncompressed data, and therefore it should be sized 357*9507f98cSAndroid Build Coastguard Workeraccording to application level data sizes, without any reduction from 358*9507f98cSAndroid Build Coastguard Workercompression. (Caching of compressed blocks is left to the operating system 359*9507f98cSAndroid Build Coastguard Workerbuffer cache, or any custom Env implementation provided by the client.) 360*9507f98cSAndroid Build Coastguard Worker 361*9507f98cSAndroid Build Coastguard WorkerWhen performing a bulk read, the application may wish to disable caching so that 362*9507f98cSAndroid Build Coastguard Workerthe data processed by the bulk read does not end up displacing most of the 363*9507f98cSAndroid Build Coastguard Workercached contents. A per-iterator option can be used to achieve this: 364*9507f98cSAndroid Build Coastguard Worker 365*9507f98cSAndroid Build Coastguard Worker```c++ 366*9507f98cSAndroid Build Coastguard Workerleveldb::ReadOptions options; 367*9507f98cSAndroid Build Coastguard Workeroptions.fill_cache = false; 368*9507f98cSAndroid Build Coastguard Workerleveldb::Iterator* it = db->NewIterator(options); 369*9507f98cSAndroid Build Coastguard Workerfor (it->SeekToFirst(); it->Valid(); it->Next()) { 370*9507f98cSAndroid Build Coastguard Worker ... 371*9507f98cSAndroid Build Coastguard Worker} 372*9507f98cSAndroid Build Coastguard Worker``` 373*9507f98cSAndroid Build Coastguard Worker 374*9507f98cSAndroid Build Coastguard Worker### Key Layout 375*9507f98cSAndroid Build Coastguard Worker 376*9507f98cSAndroid Build Coastguard WorkerNote that the unit of disk transfer and caching is a block. Adjacent keys 377*9507f98cSAndroid Build Coastguard Worker(according to the database sort order) will usually be placed in the same block. 378*9507f98cSAndroid Build Coastguard WorkerTherefore the application can improve its performance by placing keys that are 379*9507f98cSAndroid Build Coastguard Workeraccessed together near each other and placing infrequently used keys in a 380*9507f98cSAndroid Build Coastguard Workerseparate region of the key space. 381*9507f98cSAndroid Build Coastguard Worker 382*9507f98cSAndroid Build Coastguard WorkerFor example, suppose we are implementing a simple file system on top of leveldb. 383*9507f98cSAndroid Build Coastguard WorkerThe types of entries we might wish to store are: 384*9507f98cSAndroid Build Coastguard Worker 385*9507f98cSAndroid Build Coastguard Worker filename -> permission-bits, length, list of file_block_ids 386*9507f98cSAndroid Build Coastguard Worker file_block_id -> data 387*9507f98cSAndroid Build Coastguard Worker 388*9507f98cSAndroid Build Coastguard WorkerWe might want to prefix filename keys with one letter (say '/') and the 389*9507f98cSAndroid Build Coastguard Worker`file_block_id` keys with a different letter (say '0') so that scans over just 390*9507f98cSAndroid Build Coastguard Workerthe metadata do not force us to fetch and cache bulky file contents. 391*9507f98cSAndroid Build Coastguard Worker 392*9507f98cSAndroid Build Coastguard Worker### Filters 393*9507f98cSAndroid Build Coastguard Worker 394*9507f98cSAndroid Build Coastguard WorkerBecause of the way leveldb data is organized on disk, a single `Get()` call may 395*9507f98cSAndroid Build Coastguard Workerinvolve multiple reads from disk. The optional FilterPolicy mechanism can be 396*9507f98cSAndroid Build Coastguard Workerused to reduce the number of disk reads substantially. 397*9507f98cSAndroid Build Coastguard Worker 398*9507f98cSAndroid Build Coastguard Worker```c++ 399*9507f98cSAndroid Build Coastguard Workerleveldb::Options options; 400*9507f98cSAndroid Build Coastguard Workeroptions.filter_policy = NewBloomFilterPolicy(10); 401*9507f98cSAndroid Build Coastguard Workerleveldb::DB* db; 402*9507f98cSAndroid Build Coastguard Workerleveldb::DB::Open(options, "/tmp/testdb", &db); 403*9507f98cSAndroid Build Coastguard Worker... use the database ... 404*9507f98cSAndroid Build Coastguard Workerdelete db; 405*9507f98cSAndroid Build Coastguard Workerdelete options.filter_policy; 406*9507f98cSAndroid Build Coastguard Worker``` 407*9507f98cSAndroid Build Coastguard Worker 408*9507f98cSAndroid Build Coastguard WorkerThe preceding code associates a Bloom filter based filtering policy with the 409*9507f98cSAndroid Build Coastguard Workerdatabase. Bloom filter based filtering relies on keeping some number of bits of 410*9507f98cSAndroid Build Coastguard Workerdata in memory per key (in this case 10 bits per key since that is the argument 411*9507f98cSAndroid Build Coastguard Workerwe passed to `NewBloomFilterPolicy`). This filter will reduce the number of 412*9507f98cSAndroid Build Coastguard Workerunnecessary disk reads needed for Get() calls by a factor of approximately 413*9507f98cSAndroid Build Coastguard Workera 100. Increasing the bits per key will lead to a larger reduction at the cost 414*9507f98cSAndroid Build Coastguard Workerof more memory usage. We recommend that applications whose working set does not 415*9507f98cSAndroid Build Coastguard Workerfit in memory and that do a lot of random reads set a filter policy. 416*9507f98cSAndroid Build Coastguard Worker 417*9507f98cSAndroid Build Coastguard WorkerIf you are using a custom comparator, you should ensure that the filter policy 418*9507f98cSAndroid Build Coastguard Workeryou are using is compatible with your comparator. For example, consider a 419*9507f98cSAndroid Build Coastguard Workercomparator that ignores trailing spaces when comparing keys. 420*9507f98cSAndroid Build Coastguard Worker`NewBloomFilterPolicy` must not be used with such a comparator. Instead, the 421*9507f98cSAndroid Build Coastguard Workerapplication should provide a custom filter policy that also ignores trailing 422*9507f98cSAndroid Build Coastguard Workerspaces. For example: 423*9507f98cSAndroid Build Coastguard Worker 424*9507f98cSAndroid Build Coastguard Worker```c++ 425*9507f98cSAndroid Build Coastguard Workerclass CustomFilterPolicy : public leveldb::FilterPolicy { 426*9507f98cSAndroid Build Coastguard Worker private: 427*9507f98cSAndroid Build Coastguard Worker FilterPolicy* builtin_policy_; 428*9507f98cSAndroid Build Coastguard Worker 429*9507f98cSAndroid Build Coastguard Worker public: 430*9507f98cSAndroid Build Coastguard Worker CustomFilterPolicy() : builtin_policy_(NewBloomFilterPolicy(10)) {} 431*9507f98cSAndroid Build Coastguard Worker ~CustomFilterPolicy() { delete builtin_policy_; } 432*9507f98cSAndroid Build Coastguard Worker 433*9507f98cSAndroid Build Coastguard Worker const char* Name() const { return "IgnoreTrailingSpacesFilter"; } 434*9507f98cSAndroid Build Coastguard Worker 435*9507f98cSAndroid Build Coastguard Worker void CreateFilter(const Slice* keys, int n, std::string* dst) const { 436*9507f98cSAndroid Build Coastguard Worker // Use builtin bloom filter code after removing trailing spaces 437*9507f98cSAndroid Build Coastguard Worker std::vector<Slice> trimmed(n); 438*9507f98cSAndroid Build Coastguard Worker for (int i = 0; i < n; i++) { 439*9507f98cSAndroid Build Coastguard Worker trimmed[i] = RemoveTrailingSpaces(keys[i]); 440*9507f98cSAndroid Build Coastguard Worker } 441*9507f98cSAndroid Build Coastguard Worker return builtin_policy_->CreateFilter(trimmed.data(), n, dst); 442*9507f98cSAndroid Build Coastguard Worker } 443*9507f98cSAndroid Build Coastguard Worker}; 444*9507f98cSAndroid Build Coastguard Worker``` 445*9507f98cSAndroid Build Coastguard Worker 446*9507f98cSAndroid Build Coastguard WorkerAdvanced applications may provide a filter policy that does not use a bloom 447*9507f98cSAndroid Build Coastguard Workerfilter but uses some other mechanism for summarizing a set of keys. See 448*9507f98cSAndroid Build Coastguard Worker`leveldb/filter_policy.h` for detail. 449*9507f98cSAndroid Build Coastguard Worker 450*9507f98cSAndroid Build Coastguard Worker## Checksums 451*9507f98cSAndroid Build Coastguard Worker 452*9507f98cSAndroid Build Coastguard Workerleveldb associates checksums with all data it stores in the file system. There 453*9507f98cSAndroid Build Coastguard Workerare two separate controls provided over how aggressively these checksums are 454*9507f98cSAndroid Build Coastguard Workerverified: 455*9507f98cSAndroid Build Coastguard Worker 456*9507f98cSAndroid Build Coastguard Worker`ReadOptions::verify_checksums` may be set to true to force checksum 457*9507f98cSAndroid Build Coastguard Workerverification of all data that is read from the file system on behalf of a 458*9507f98cSAndroid Build Coastguard Workerparticular read. By default, no such verification is done. 459*9507f98cSAndroid Build Coastguard Worker 460*9507f98cSAndroid Build Coastguard Worker`Options::paranoid_checks` may be set to true before opening a database to make 461*9507f98cSAndroid Build Coastguard Workerthe database implementation raise an error as soon as it detects an internal 462*9507f98cSAndroid Build Coastguard Workercorruption. Depending on which portion of the database has been corrupted, the 463*9507f98cSAndroid Build Coastguard Workererror may be raised when the database is opened, or later by another database 464*9507f98cSAndroid Build Coastguard Workeroperation. By default, paranoid checking is off so that the database can be used 465*9507f98cSAndroid Build Coastguard Workereven if parts of its persistent storage have been corrupted. 466*9507f98cSAndroid Build Coastguard Worker 467*9507f98cSAndroid Build Coastguard WorkerIf a database is corrupted (perhaps it cannot be opened when paranoid checking 468*9507f98cSAndroid Build Coastguard Workeris turned on), the `leveldb::RepairDB` function may be used to recover as much 469*9507f98cSAndroid Build Coastguard Workerof the data as possible 470*9507f98cSAndroid Build Coastguard Worker 471*9507f98cSAndroid Build Coastguard Worker## Approximate Sizes 472*9507f98cSAndroid Build Coastguard Worker 473*9507f98cSAndroid Build Coastguard WorkerThe `GetApproximateSizes` method can used to get the approximate number of bytes 474*9507f98cSAndroid Build Coastguard Workerof file system space used by one or more key ranges. 475*9507f98cSAndroid Build Coastguard Worker 476*9507f98cSAndroid Build Coastguard Worker```c++ 477*9507f98cSAndroid Build Coastguard Workerleveldb::Range ranges[2]; 478*9507f98cSAndroid Build Coastguard Workerranges[0] = leveldb::Range("a", "c"); 479*9507f98cSAndroid Build Coastguard Workerranges[1] = leveldb::Range("x", "z"); 480*9507f98cSAndroid Build Coastguard Workeruint64_t sizes[2]; 481*9507f98cSAndroid Build Coastguard Workerdb->GetApproximateSizes(ranges, 2, sizes); 482*9507f98cSAndroid Build Coastguard Worker``` 483*9507f98cSAndroid Build Coastguard Worker 484*9507f98cSAndroid Build Coastguard WorkerThe preceding call will set `sizes[0]` to the approximate number of bytes of 485*9507f98cSAndroid Build Coastguard Workerfile system space used by the key range `[a..c)` and `sizes[1]` to the 486*9507f98cSAndroid Build Coastguard Workerapproximate number of bytes used by the key range `[x..z)`. 487*9507f98cSAndroid Build Coastguard Worker 488*9507f98cSAndroid Build Coastguard Worker## Environment 489*9507f98cSAndroid Build Coastguard Worker 490*9507f98cSAndroid Build Coastguard WorkerAll file operations (and other operating system calls) issued by the leveldb 491*9507f98cSAndroid Build Coastguard Workerimplementation are routed through a `leveldb::Env` object. Sophisticated clients 492*9507f98cSAndroid Build Coastguard Workermay wish to provide their own Env implementation to get better control. 493*9507f98cSAndroid Build Coastguard WorkerFor example, an application may introduce artificial delays in the file IO 494*9507f98cSAndroid Build Coastguard Workerpaths to limit the impact of leveldb on other activities in the system. 495*9507f98cSAndroid Build Coastguard Worker 496*9507f98cSAndroid Build Coastguard Worker```c++ 497*9507f98cSAndroid Build Coastguard Workerclass SlowEnv : public leveldb::Env { 498*9507f98cSAndroid Build Coastguard Worker ... implementation of the Env interface ... 499*9507f98cSAndroid Build Coastguard Worker}; 500*9507f98cSAndroid Build Coastguard Worker 501*9507f98cSAndroid Build Coastguard WorkerSlowEnv env; 502*9507f98cSAndroid Build Coastguard Workerleveldb::Options options; 503*9507f98cSAndroid Build Coastguard Workeroptions.env = &env; 504*9507f98cSAndroid Build Coastguard WorkerStatus s = leveldb::DB::Open(options, ...); 505*9507f98cSAndroid Build Coastguard Worker``` 506*9507f98cSAndroid Build Coastguard Worker 507*9507f98cSAndroid Build Coastguard Worker## Porting 508*9507f98cSAndroid Build Coastguard Worker 509*9507f98cSAndroid Build Coastguard Workerleveldb may be ported to a new platform by providing platform specific 510*9507f98cSAndroid Build Coastguard Workerimplementations of the types/methods/functions exported by 511*9507f98cSAndroid Build Coastguard Worker`leveldb/port/port.h`. See `leveldb/port/port_example.h` for more details. 512*9507f98cSAndroid Build Coastguard Worker 513*9507f98cSAndroid Build Coastguard WorkerIn addition, the new platform may need a new default `leveldb::Env` 514*9507f98cSAndroid Build Coastguard Workerimplementation. See `leveldb/util/env_posix.h` for an example. 515*9507f98cSAndroid Build Coastguard Worker 516*9507f98cSAndroid Build Coastguard Worker## Other Information 517*9507f98cSAndroid Build Coastguard Worker 518*9507f98cSAndroid Build Coastguard WorkerDetails about the leveldb implementation may be found in the following 519*9507f98cSAndroid Build Coastguard Workerdocuments: 520*9507f98cSAndroid Build Coastguard Worker 521*9507f98cSAndroid Build Coastguard Worker1. [Implementation notes](impl.md) 522*9507f98cSAndroid Build Coastguard Worker2. [Format of an immutable Table file](table_format.md) 523*9507f98cSAndroid Build Coastguard Worker3. [Format of a log file](log_format.md) 524