Cache-efficient index access for large in-memory datasets
Many important applications, such as key-value caches and in-memory databases, can serve requests at extremely high rates (e.g., 50 million requests per second), which is made possible by hosting all data in main memory and delivering it over a fast network. These applications rely on index structures to organize their data. Because the index must be traversed for each request before the target data can be located, the efficiency of index access is performance-critical. Unfortunately, the index can be too large for the CPU cache even with a moderate dataset. Worse, index walks can compromise the target data's temporal access locality, and hashing, if used, can compromise its spatial locality as well. While reaching the target data itself takes only one sequential memory access, the index traversal can induce tens, sometimes hundreds, of LLC misses. Consequently, these applications spend most of their time on index access rather than on delivering the target data over the network, resulting in compromised performance.
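To make the cost concrete, consider a lookup in a chained hash table, shown below as a minimal C sketch (the types and the hash() function are hypothetical, not taken from any particular system). Every pointer dereference on the path to the value can land on a different cache line, so one request may incur several LLC misses before the single access that actually reaches the target data.

    /* Hypothetical chained hash table; each pointer hop is a potential
     * LLC miss when the index does not fit in the CPU cache. */
    #include <string.h>

    struct entry {
        struct entry *next;   /* chaining: each hop may miss the cache */
        const char   *key;
        void         *value;  /* the target data we actually want */
    };

    struct table {
        struct entry **buckets;
        unsigned long  nbuckets;
    };

    extern unsigned long hash(const char *key);  /* assumed hash function */

    void *lookup(const struct table *t, const char *key)
    {
        struct entry *e = t->buckets[hash(key) % t->nbuckets]; /* possible miss */
        while (e) {                        /* pointer chasing down the chain */
            if (strcmp(e->key, key) == 0)  /* another possible miss per node */
                return e->value;           /* finally, the target data */
            e = e->next;
        }
        return 0;
    }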
Our efforts to improve index access in data-intensive applications:
We design a software-defined cache that selectively caches the target entries of an index, allowing the index traversal to be bypassed when accessing the most frequently used items. We name this cache the Search Lookaside Buffer, or SLB (2017 Symposium on Cloud Computing).
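The sketch below illustrates the lookaside idea under simplifying assumptions; the names (slb, cached_lookup, index_lookup) are hypothetical rather than the paper's API, and the admission policy is deliberately naive, whereas the actual SLB tracks access frequency so that only the hottest entries occupy the buffer.

    /* Hypothetical lookaside cache: consult a small, cache-resident
     * direct-mapped table first; fall back to the full index traversal
     * only on a miss. */
    #include <stdint.h>
    #include <string.h>

    #define SLB_SLOTS 4096            /* small enough to stay in CPU cache */

    struct slb_slot {
        const char *key;              /* NULL if the slot is empty */
        void       *value;
    };

    static struct slb_slot slb[SLB_SLOTS];

    extern uint64_t hash(const char *key);       /* assumed hash function */
    extern void *index_lookup(const char *key);  /* full index traversal */

    void *cached_lookup(const char *key)
    {
        struct slb_slot *s = &slb[hash(key) % SLB_SLOTS];
        if (s->key && strcmp(s->key, key) == 0)
            return s->value;          /* hit: index traversal bypassed */
        void *v = index_lookup(key);  /* miss: pay the full traversal cost */
        if (v) {                      /* naive admission: cache every hit */
            s->key   = key;
            s->value = v;
        }
        return v;
    }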
We propose a new ordered index structure, named Wormhole, that has a very low asymptotic lookup cost of O(log L), where L is the length of the search key (2019 EuroSys Conference). The low cost is achieved by simultaneously leveraging the strengths of three index structures, namely the hash table, the prefix tree, and the B+ tree, to compose a single fast ordered index.
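A minimal sketch of the idea behind the O(log L) bound, under the assumption (which holds in a prefix tree) that the indexed prefixes of any key form a contiguous range of lengths: binary-search on the prefix length, probing a hash table of prefixes at each step. The helper prefix_probe() is hypothetical, standing in for Wormhole's internal structures; the matched prefix leads to a B+-tree-like leaf holding sorted keys, where the search finishes.

    /* Binary search on prefix length: O(log L) hash probes locate the
     * leaf, instead of O(L) trie hops or O(log n) comparisons. */
    #include <stddef.h>

    struct leaf;  /* sorted leaf node, as in a B+ tree */

    /* Assumed helper: returns the leaf reachable through the indexed
     * prefix key[0..len), or NULL if that prefix is not in the index. */
    extern struct leaf *prefix_probe(const char *key, size_t len);

    struct leaf *find_leaf(const char *key, size_t keylen)
    {
        size_t lo = 0, hi = keylen;   /* invariant: prefix of length lo matches */
        struct leaf *best = prefix_probe(key, 0);  /* empty prefix always indexed */
        while (lo < hi) {             /* O(log L) iterations */
            size_t mid = (lo + hi + 1) / 2;
            struct leaf *l = prefix_probe(key, mid);
            if (l) {                  /* a longer prefix matches: go right */
                best = l;
                lo = mid;
            } else {                  /* prefix too long: go left */
                hi = mid - 1;
            }
        }
        return best;  /* the search continues inside this sorted leaf */
    }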