TrueTime

TrueTime is Google’s solution to providing to globally consistent timestamps to determine ordering of events. Originally developed to support the Spanner distributed database, TrueTime is a clock implementation that depends on two key factors: well engineered and accurate GPS and atomic clocks and the representation of time as an interval of uncertainty.

Representing TrueTime

TrueTime explicitly represents each timestamp an interval that includes bounded time uncertainty. TrueTime represents time as an interval of the type TTinterval, which includes two timestamps for the beginning and the end of the interval. These timestamps have type TTstamp. With this in mind, the TrueTime API provides the following methods reproduced from Spanner: Google’s Globally-Distributed Database:

Method Returns
\( TT.now() \) \( TTinterval: [earliest, latest] \)
\( TT.after(t) \) true if \( t \) had definitely passed
\( TT.before(t) \) true if \( t \) has definitely not arrived

A call to \( TT.now() \) returns a \( TTinterval \) that is guaranteed to contain the absolute time when the \( TT.now() \) method was executed. TrueTime guarantees that for an invocation of \( tt = TT.now() \) that \( tt.earliest <= t_{abs}(e) \ <= tt.latest \), where \(t_{abs}(e)\) is the absolute time when event \( e \) happened. The interval bounds \(earliest\) and \(latest\) are within a bounded uncertainty \( \epsilon \).

assets/truetime-api.png

The TrueTime API

Reference Time Sources

The time references used by TrueTime are GPS and atomic clocks deployed at all Google data centres. Both time references because each have their own failure modes that are independent (save for extreme events impact both at once). For instance, GPS time receivers may be impacted by radio interference where an atomic clocks would not be impacted by the same condition.

Every datacenter across geography has one or more time server or time master. There are two kinds of time master:

  • GPS Time Master: These nodes are equipped with GPS receivers which receive GPS signals include time information directly from satellites.
  • Armageddon Master: These nodes are equipped with local Atomic clocks. Atomic clocks are used as a supplement to GPS time masters in case satellite connections become unavailable.
assets/reference-time.jpeg

A time master in a Google data centre

TrueTime Architecture and Implementation

TrueTime is implemented by a set of time masters that are connected to a reference time source and timeslave daemons that are deployed per machine in the data centre. All master’s time references are regularly compared to each other using an algorithm (similar to NTP). In addition, each master also checks the integrity of their local clock and will voluntarily remove itself from the set of master time servers if the checks fail.

To reduce the possibility of error from any time master, the client daemons poll time information from GPS and Atomic time masters belonging to nearby as well as farther datacenters, as in the following figure.

assets/client-polling.png

Clients polling nearby band distant sources

Each daemon regularly polls a variety of masters, and uses a statistical algorithm to reject outliers and promote the best candidate time from the set (as in NTP). Between each poll to retrieve an accurate time, the daemon advertises slowly increasing uncertainty for their local time derived from a works-case local clock drift. Because Google controls the network environment for TrueTime masters and daemons, in practice the uncertainty interval varies between 1ms and 7ms

Daemons apply a variant of Marzullo’s algorithm to detect and reject liars, and synchronize the local machine clocks to the non liars.

Spanner: Google’s Globally-Distributed Database

Leveraging the Uncertainty Interval

So far, the TrueTime system is reminiscent of NTP, with the added advantage of atomic and GPS clocks at each master, and extremely low-latency data centre to data centre traffic producing time uncertainty intervals of between 1 and 7 milliseconds.

To take what we have covered so far and make TrueTime truly useful, we need to be able to take advantage of the uncertainty interval. First, consider the following case, where System A and System B both receive data that includes a transaction with the value 5. If both systems commit the transaction inside their own uncertainty intervals, it can appear like value 6 in System B happens before value 5 in System A.

assets/uncertainty-error.png

Not respecting the uncertainty interval can result in inconsistent snapshots

The strategy to leverage the uncertainty interval is actually very simple: simply wait for the uncertainty time period to complete before committing any transactions. Since all the transactions wait, and because the uncertainty interval has been reduced to be very small, this level of lag is acceptable in return for a strongly consistent viewpoint of time.

Summary

TrueTime does not necessarily represent a theoretical paradigm shift in representing time, instead relying on algorithms similar to NTP. However, the engineering effort to deploy and maintain time masters and their associated network infrastructure allows the NTP-like algorithm to be extended by providing guaranteed error bounds on timestamps. These guarantees mean that in Google networks, engineers can use TrueTime across its services to properly order events and transactions using a vastly simplified programming model.

Resources

See also