Title and Author of Paper

Cap Twelve Years Later: How the “Rules” Have Changed. Eric Brewer.

Summary

This article provides an exploration of the CAP Theorem and how it relates to database system design. The author argues that, since partitions are likely to happen, the system designer can introduce methods for safely recovering from partitions to compensate. This strategy allows the database to continue to provide availability during a partition, and enforce consistency once the partition is resolved.

Recapping CAP

To start, let’s recap the CAP theorem. It states that database that is available over a network can have at most two out of the following three properties:

  • ( C ) consistency: meaning a single up-to-date copy of the data exists.
  • ( A ) high availability: update operations are always available.
  • ( P ) partition tolerance: the database operates in the face of network partitions.

In essence, the CAP theorem prohibits the design of a database with “perfect availability and consistency in the presence of partitions”.

Dealing with Partitions

The paper equates partition tolerance with latency (and timeout) by adding in the concept of a partition decision. The partition decision is a choice made by the system when faced with an operation that “times out”. Do you choose to (1) cancel the operation and decrease availability, or (2) proceed with the operation and risk inconsistency. Retrying indefinitely is in essence choosing C (consistency) over A (availability). By this argument, a partition is a time bound on communication, and inconsistencies may arise whenever two sides of the system are moving forward while being partitioned (i.e., without communication). Since we cannot guarantee the absence of timeouts, the challenge then is to mitigate the effect of the effects of a partition on consistency and availability.

Partition Recovery

At some point, communication between the system components will be restored and (1) the state on both sides of the partition must be made consistent and (2) any mistakes must be undone. The rest of the paper discusses strategies for achieving these goals.

The first strategy is using commutative replicated data types (CRDTs). CRDTs are a class of data structures that provably converge to a consistent state after a partition is resolved.

A second strategy is using compensating transactions to fix any inconsistencies after a partition is resolved. The compensating transaction is responsible for recovering a consistent state by reconciling the view of the system that exists on each side of the partition.

Wrapping Up

Any time a shared-data system is distributed over multiple machines you are faced with the possibility of a network partition. This paper outlines a few methods for reasoning about system state given this reality. The discussion of methods for handling partition recovery is brief but the additional resources in the paper provide a jumping off point for additional research.