Cache-efficient index access for large in-memory datasets

Many important applications, such as key-value caches and in-memory databases, can serve requests at extremely high rates (e.g., 50 million requests per second) by hosting all of their data in main memory and delivering it over a fast network. These applications rely on index structures to organize their data. Because the index must be traversed for each request before the target data can be located, the efficiency of index access is performance-critical. Unfortunately, the index can be too large for the CPU cache even with a moderately sized dataset. Worse, the target data’s temporal access locality can be compromised by index walks, and its spatial locality can also be compromised if hashing is used. While it takes only one sequential memory access to reach the target data, the index traversal can induce tens, sometimes hundreds, of last-level cache (LLC) misses. Consequently, these applications spend a majority of their time on index access rather than on delivering the target data over the network, and their performance suffers accordingly.
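
To make the cost concrete, below is a minimal C sketch (ours, for illustration only; the node layout and step counter are hypothetical): a lookup in a pointer-linked index such as a balanced binary tree performs one dependent pointer dereference per level, roughly log2(N) for N keys, and once the index outgrows the CPU cache each of those dereferences can become an LLC miss.

    #include <stdio.h>

    /* Hypothetical node layout for a pointer-linked index. */
    struct node { long key; struct node *left, *right; void *value; };

    /* Each iteration dereferences one node, and the next address depends
     * on the node just read, so the accesses cannot overlap: for an index
     * that has outgrown the CPU cache, every step is a potential LLC miss. */
    void *bst_lookup(struct node *n, long key, int *steps) {
        while (n) {
            ++*steps;
            if (key == n->key) return n->value;
            n = (key < n->key) ? n->left : n->right;
        }
        return NULL;
    }

    int main(void) {
        static long payload = 42;
        struct node l = { 1, NULL, NULL, NULL };
        struct node r = { 3, NULL, NULL, &payload };
        struct node root = { 2, &l, &r, NULL };
        int steps = 0;
        long *v = bst_lookup(&root, 3, &steps);
        printf("value %ld after %d dependent dereferences\n", *v, steps);
        return 0;
    }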

Our efforts to improve index access in data-intensive applications include the following:

We design a software-defined cache that selectively caches the target entries of an index, allowing the index traversal to be bypassed for the most frequently accessed items. We name this cache the Search Lookaside Buffer, or SLB (SoCC 2017).
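
As a rough illustration of the idea (not the paper’s code; all names are hypothetical, and the always-admit policy below is a simplification of SLB’s frequency-based admission), a small, cache-resident lookaside table can be consulted before the index proper:

    #include <stdint.h>
    #include <stdio.h>

    #define SLB_SLOTS 4096              /* small enough to stay cache-resident */

    struct slb_slot { uint64_t key; void *entry; };
    static struct slb_slot slb[SLB_SLOTS];

    static uint64_t hash64(uint64_t x) {         /* a common 64-bit mixer */
        x ^= x >> 33; x *= 0xff51afd7ed558ccdULL; x ^= x >> 33;
        return x;
    }

    /* Stand-in for the full index traversal (hash table, B+ tree, ...);
     * in a real system this walk is what causes the LLC misses. */
    static long values[] = { 10, 20, 30 };
    static void *index_lookup(uint64_t key) {
        return key < 3 ? &values[key] : NULL;
    }

    void *slb_get(uint64_t key) {
        struct slb_slot *s = &slb[hash64(key) % SLB_SLOTS];
        if (s->entry && s->key == key)
            return s->entry;                   /* hit: traversal bypassed */
        void *e = index_lookup(key);           /* miss: walk the index    */
        if (e) { s->key = key; s->entry = e; } /* simplified admission    */
        return e;
    }

    int main(void) {
        printf("%ld\n", *(long *)slb_get(1));  /* cold: via the index     */
        printf("%ld\n", *(long *)slb_get(1));  /* warm: served by the SLB */
        return 0;
    }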

We propose a new ordered index structure, named Wormhole, that has a very low asymptotic lookup cost of O(log L), where L is the length of the search key. This low cost is achieved by simultaneously leveraging the strengths of three index structures, namely the hash table, the prefix tree, and the B+ tree, to orchestrate a single fast ordered index.
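
The key enabler behind the O(log L) bound is that when every prefix of the stored anchor keys is kept in a hash table (as trie nodes), the longest stored prefix of a search key can be found by a binary search on the prefix length, with each probe being a constant-time hash lookup. The C sketch below demonstrates only this search pattern; the linear-scan in_set() is a stand-in for the hash table, and the example data is made up:

    #include <stdio.h>
    #include <string.h>

    /* Stand-in for a hash table of trie nodes: the set must be
     * prefix-closed (every prefix of a member is also a member). In the
     * real structure each membership test is an O(1) hash lookup. */
    static const char *prefixes[] = {
        "", "J", "Ja", "Jam", "Jame", "James", "Jo", "Jos", "Jose"
    };

    static int in_set(const char *key, size_t len) {
        for (size_t i = 0; i < sizeof prefixes / sizeof *prefixes; i++)
            if (strlen(prefixes[i]) == len && !strncmp(prefixes[i], key, len))
                return 1;
        return 0;
    }

    /* Binary search on the length of the longest member prefix of `key`:
     * O(log L) probes for a key of length L. */
    size_t longest_prefix(const char *key) {
        size_t lo = 0, hi = strlen(key);
        while (lo < hi) {
            size_t mid = (lo + hi + 1) / 2;    /* round up to make progress */
            if (in_set(key, mid)) lo = mid; else hi = mid - 1;
        }
        return lo;
    }

    int main(void) {
        printf("%zu\n", longest_prefix("Joseph"));  /* 4, matching "Jose" */
        return 0;
    }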

Search Lookaside Buffer: Efficient Caching for Index Data Structures (SoCC 2017)

Wormhole: A Fast Ordered Index for In-memory Data Management (EuroSys 2019)