January 31st, 2013 | Published in Google Open Source
We’re pleased to announce C++ B-Tree, a C++ template library that implements B-tree containers with an analogous interface to the standard STL map, set, multimap, and multiset containers. B-trees are well-known data structures for organizing secondary storage, because they are optimized for reading and writing large blocks of data. But the same property that makes B-trees appropriate for use with databases and file systems also makes them appropriate for use in main-memory, just with smaller blocks.
C++ standard libraries commonly implement the map and set data structures using Red-Black trees, which store one element per node, requiring three pointers plus one bit of information per element to keep the tree balanced. By comparison, B-tree containers store a number of elements per node, thereby reducing pointer overhead and saving a significant amount of memory. For small data types, B-tree containers typically reduce memory use by 50 to 80% compared with Red-Black tree containers.
insertion time as a function of tree size
(256 byte node)
Storing multiple elements per node can also improve performance, especially in large containers with inexpensive key-compare functions (e.g., integer keys using less-than), relative to Red-Black tree containers. This is because performance in these cases is effectively governed by the number of cache misses, not the number of key-compare operations. For large data sets, using these B-tree containers will save memory and improve performance.
insertion time as a function of node size
(10M element tree)
These templates are nearly drop-in compatible with the STL map, set, multimap, and multiset types, with one significant exception. Unlike the standard container types, insertions and deletions invalidate outstanding iterators (we provide “safe” alternatives to handle this case).
By Josh MacDonald, 20% Contributor