Paper Review: Consistency Analysis in Bloom: a CALM and Collected Approach

Title and Author of Paper

Consistency Analysis in Bloom: a CALM and Collected Approach. Alvaro et al.

Summary

Distributed programming is difficult for even experienced developers to get correct. Understanding the tradeoff between consistency, availability, and latency, while guaranteeing data correctness, provides a wealth of problems for the application developer. This paper presents a language and method for programmatically verifying distributed consistency.

CALM - Consistency and Logical Monotonicity

There is a connection between distributed consistency algorithms and logical monotonicity, that is, our programs must be correct even in the face of the delay and re-ordering of messages and data across different nodes in a system. By designing programs with order independence, we can ensure that the logic of the system is maintained within a distributed environment.

To understand this connection, consider two types of programs: monotonic, and non-monotonic. Monotonic programs are those where the order of inputs to a system do not affect the output. In database terms, this includes selections, projections, and joins. Non-monotonic programs, on the other hand, are those where the order of inputs to a system does affect the output. Examples of this type of operation are groupings or aggregations.

The authors make the observation that there is a tight relationship between consistency and monotonicity. Particularly, monotonic programs are guaranteed to be eventually consistency regardless of how computation or message delivery are interleaved. In contrast, non-monotonic programs requires coordination schemes that must wait until inputs are complete before guaranteeing a correct output. This relationship is termed the CALM principle: Consistency and Logical Monotonicity.

Bloom

Traditional imperative programming assumes a logical order of statements that are executed in sequence. This model is a bad match to parallel and distributed systems, where ordering cannot be guaranteed except through expensive coordination algorithms. The authors present the Bloom language designed to model distributed algorithms that do not rely on order. The language uses unordered sets to represent state, and declarative rules model the computation to occur on these sets. Bloom is a purely declarative language that could be implemented using any underlying evaluation engine to produce correct programs.

Bloom programs are bundles of declarative statements about tuples. Each statement in Bloom is defined with respect to an atomic timestep. In each timestamp, certain “facts” are available as messages from outside systems. Each statement uses available facts and tuples to derive new facts or tuples. Bloom also specifies how facts persist between timesteps.

Each statement in Bloom defines the content of any derived collections of data. They can be viewed as inserting or accumulating results into a collections. No mutable state is allowed so no unknown side-effects are allowed to exist outside of the statements and their facts.

Bloom is similar in principle to Prolog, but geared towards distributed systems. Rather than rehash the syntax and semantics of Bloom, interested readers are encouraged to consult the original research paper. The key takeaways are: reasoning about logical time as part of the language, and using declarative rather than imperative statements.

Analysis

A major contribution of Bloom is the ability to perform analysis on a program to identify points of coordination. By viewing a Bloom program as a dataflow graph, external inputs can be shown as sources, outputs as sinks, and collections of data as nodes within the graph. Computation rules are edges within the graph. By viewing the algorithm as a dataflow graph, it is easy to visualize dependencies on data that require coordination between multiple systems.

Conclusions

Given the continued interest in distributed systems, and the necessity to deploy distributed systems that meet the needs of growing data sets in the Internet age, analysis and verification of distributed algorithms provides an enormous research potential. Bloom presents one such opportunity in this area. Personally, I get very excited about these tools and how they can be leveraged to design and develop correct distributed systems.

Like this post? Subscribe via RSS or email to never miss an update.