VoltDB is an in-memory database born out of the H-Store research project spearheaded by Michael Stonebraker. The project started from the premise that removing disk access, the primary limiting factor on database performance, would allow a fully transactional database to reach the best possible performance. Removing disk access entirely yields a fully in-memory database, which in turn enables further optimizations: removing write-ahead logging, buffer management, and locks and latches. This effort resulted in the research database H-Store, which was commercialized as the in-memory database VoltDB. This article takes a deeper dive into VoltDB to understand how it works and where you may benefit from this approach.
VoltDB is based on research from Michael Stonebraker, whose work centers on the core idea of building specialized database systems to solve specialized problems. With VoltDB the fundamental question was how to build an OLTP database with the highest possible performance. The fundamental shift from traditional databases was to put all database data in memory. In the paper "OLTP Through the Looking Glass, and What We Found There", Stonebraker et al. started by simply placing an existing database in memory and measuring what happened. It turns out that just running MySQL in memory does not provide an immediate benefit because of how a traditional disk-backed database is designed.
Instead, a traditional database spends a significant amount of time on disk-based consistency and concurrency control that can simply be avoided through clever optimizations. Let's look at each of these optimizations one at a time.
Buffer Management

Because disk access requires a significant amount of I/O time, databases typically buffer data in an in-memory buffer manager as a performance optimization and form of caching. In a typical implementation, the buffer manager handles all requests for blocks of data in the database. If requested data is not already in memory, the buffer manager reads that block from disk and stores it in memory for future requests. Future requests for a block look in memory first, and fall back to disk only when necessary.
Since the buffer manager is a performance optimization for dealing with the slow access to disk, once the entire dataset is moved from disk to in-memory, the buffer manager can be removed from the database design completely.
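The behavior described above can be sketched as a small cache that intercepts all block reads. This is a minimal, hypothetical illustration (the `read_block_from_disk` callback, capacity, and LRU policy are assumptions, not VoltDB's or MySQL's actual implementation):

```python
from collections import OrderedDict

class BufferManager:
    """Hypothetical sketch of a buffer pool: every block read goes
    through an in-memory cache, falling back to disk on a miss,
    with least-recently-used eviction when the pool is full."""

    def __init__(self, read_block_from_disk, capacity=128):
        self.read_block_from_disk = read_block_from_disk  # assumed disk-read callback
        self.capacity = capacity
        self.cache = OrderedDict()  # block_id -> block contents

    def get_block(self, block_id):
        if block_id in self.cache:               # hit: serve from memory
            self.cache.move_to_end(block_id)
            return self.cache[block_id]
        block = self.read_block_from_disk(block_id)  # miss: go to disk
        self.cache[block_id] = block
        if len(self.cache) > self.capacity:      # evict least recently used
            self.cache.popitem(last=False)
        return block
```

Once the entire dataset fits in memory, every `get_block` call is a hit and this whole layer, along with its bookkeeping overhead, can be deleted.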
Concurrency Control (Locking and Latching)
Significant amounts of database research and implementation effort have been spent on concurrency control in the form of locks and latches. Concurrency control is primarily used to keep changes from multiple users from conflicting with each other by preventing access to data updates that have not yet been durably stored on disk. In effect, concurrency control ensures that database transactions are performed without violating data integrity constraints.
It turns out that most concurrency control implementations are built as a performance optimization to deal with the fact that transactions accessing disk can be slow. If transactions are slow, we don’t want to force users to wait for one transaction to complete before letting them continue on with their work. But what if transactions are fast? VoltDB researchers found that in-memory transactions involving hundreds of records can be completed in milliseconds, and forcing users to wait milliseconds for a transaction to complete is actually a viable solution for in-memory databases. VoltDB therefore made the choice to remove concurrency control completely, and build the database using a single-threaded core.
Running single threaded and avoiding concurrency control enables these optimizations, but it also limits what the database can do compared to a traditional disk-based database. In particular, some transactions are controlled by the user, and depending on how a transaction is written or run, there may be a significant user-controlled pause before it is committed. In a single-threaded system, such a transaction would force all other users to block before proceeding. VoltDB therefore removed the ability to control transactions through ad-hoc SQL queries. Instead, transaction logic that would normally be expressed in SQL is written as stored procedures, which the database engine can optimize to satisfy concurrency requirements.
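The single-threaded model can be sketched as a queue of pre-registered stored procedures drained by one worker thread. This is a hypothetical illustration of the idea, not VoltDB's engine (the `SerialExecutor` class and its method names are invented for this example):

```python
import queue
import threading

class SerialExecutor:
    """Hypothetical sketch: one worker thread runs whole transactions
    (stored procedures) back to back, so no locks or latches are needed.
    Only the worker thread ever touches the in-memory data."""

    def __init__(self):
        self.data = {}        # the in-memory store, owned by one thread
        self.procedures = {}  # name -> registered stored procedure
        self.requests = queue.Queue()
        threading.Thread(target=self._run, daemon=True).start()

    def register(self, name, proc):
        self.procedures[name] = proc

    def call(self, name, *args):
        done = queue.Queue(maxsize=1)
        self.requests.put((name, args, done))
        return done.get()  # caller blocks for the few ms the transaction takes

    def _run(self):
        while True:
            name, args, done = self.requests.get()
            # the entire transaction runs to completion before the next starts
            done.put(self.procedures[name](self.data, *args))
```

Because every transaction runs to completion before the next begins, callers simply wait their turn, which is viable precisely because in-memory transactions finish in milliseconds.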
One objection to single-threaded designs is that they do not take advantage of the multi-core processors available in servers today. The solution is to shard data by CPU core and rely on the scheduler to route queries to the VoltDB instance running on the correct core. Partitioning data is typically required for scalable data access anyway; by choosing to partition by CPU core, VoltDB addresses this problem early in the system's design, allowing data to scale out over multiple machines or multiple cores on the same machine.
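Partitioning by core can be sketched as hashing each row's partition key to pick the shard that exclusively owns it. A minimal, hypothetical sketch (the `PartitionedStore` class is invented here; a real system would run one single-threaded engine per shard):

```python
import os

class PartitionedStore:
    """Hypothetical sketch: shard rows by hashing the partition key,
    with one shard per CPU core. Each shard is owned exclusively by
    one single-threaded engine, so shards never contend with each other."""

    def __init__(self, n_partitions=None):
        self.n = n_partitions or os.cpu_count()
        self.partitions = [dict() for _ in range(self.n)]  # one per core

    def partition_for(self, key):
        # every query touching `key` is routed to exactly one shard
        return hash(key) % self.n

    def put(self, key, value):
        self.partitions[self.partition_for(key)][key] = value

    def get(self, key):
        return self.partitions[self.partition_for(key)].get(key)
```

A transaction that touches only one partition key stays on one core; the same routing function works whether the shards live on one machine or many.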
Durability

The biggest remaining question for an in-memory database is durability. In a traditional database, write-ahead logging persists any changes to a disk-based log before they are applied to database blocks or buffers. If a disaster occurs, the database state can be replayed from the log to recover the correct state. If we want to avoid disk access completely, how can we ensure durability in the face of disaster? The answer is to replicate data in multiple locations rather than log it for recovery.
VoltDB provides configurable data replication strategies that copy data changes to multiple locations, along with a routing layer that directs queries to the correct master dataset. Data is replicated across multiple nodes in an active-active pattern: a transaction is written to every replica of a partition at the same time and confirmed committed before returning to the user. For reads, if a query reaches a node that does not hold the data, a consistent hashing function redirects the query to the appropriate node.
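The read-routing step can be sketched with a standard consistent-hash ring. This is a generic illustration of the technique, not VoltDB's actual routing code (the `ConsistentRing` class and its virtual-node count are assumptions):

```python
import bisect
import hashlib

class ConsistentRing:
    """Hypothetical sketch of consistent-hash routing: each node owns
    arcs of a hash ring, and a key is routed to the first node at or
    after the key's position, wrapping around at the end."""

    def __init__(self, nodes, vnodes=16):
        # place several virtual points per node to spread load evenly
        self.ring = sorted(
            (self._hash(f"{node}#{i}"), node)
            for node in nodes
            for i in range(vnodes)
        )
        self.keys = [h for h, _ in self.ring]

    @staticmethod
    def _hash(s):
        return int(hashlib.md5(s.encode()).hexdigest(), 16)

    def node_for(self, key):
        i = bisect.bisect(self.keys, self._hash(key)) % len(self.ring)
        return self.ring[i][1]
```

A node that receives a read for data it does not own computes `node_for(key)` and forwards the query; because the function is deterministic, every node agrees on who owns which key without coordination.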
Putting it all together
VoltDB is an interesting outcome of asking "What if?" and pursuing the answer with rigour. By asking "what if we move a database entirely in-memory?", a host of additional questions opened up, resulting in a complete rethink of database fundamentals. Starting from a disk-based design, VoltDB made successive changes to move everything in memory: removing client-controlled ad-hoc transactions, running single threaded, partitioning data by CPU core, and using replication for durability. Together, these changes yield over an 80% performance improvement compared to disk-based databases, while still providing strong ACID guarantees.