TrueTime is Google’s solution to providing to globally consistent timestamps to determine ordering of events. Originally developed to support the Spanner distributed database, TrueTime is a clock implementation that depends on two key factors: well engineered and accurate GPS and atomic clocks and the representation of time as an interval of uncertainty.
Representing TrueTime
TrueTime explicitly represents each timestamp an interval that includes
bounded time uncertainty. TrueTime represents time as an interval of the
type TTinterval
, which includes two timestamps for the beginning and the
end of the interval. These timestamps have type TTstamp
. With this in
mind, the TrueTime API provides the following methods reproduced from
Spanner: Google’s Globally-Distributed
Database:
Method | Returns |
---|---|
\( TT.now() \) | \( TTinterval: [earliest, latest] \) |
\( TT.after(t) \) | true if \( t \) had definitely passed |
\( TT.before(t) \) | true if \( t \) has definitely not arrived |
A call to \( TT.now() \) returns a \( TTinterval \) that is guaranteed to contain the absolute time when the \( TT.now() \) method was executed. TrueTime guarantees that for an invocation of \( tt = TT.now() \) that \( tt.earliest <= t_{abs}(e) \ <= tt.latest \), where \(t_{abs}(e)\) is the absolute time when event \( e \) happened. The interval bounds \(earliest\) and \(latest\) are within a bounded uncertainty \( \epsilon \).
Reference Time Sources
The time references used by TrueTime are GPS and atomic clocks deployed at all Google data centres. Both time references because each have their own failure modes that are independent (save for extreme events impact both at once). For instance, GPS time receivers may be impacted by radio interference where an atomic clocks would not be impacted by the same condition.
Every datacenter across geography has one or more time server or time master. There are two kinds of time master:
- GPS Time Master: These nodes are equipped with GPS receivers which receive GPS signals include time information directly from satellites.
- Armageddon Master: These nodes are equipped with local Atomic clocks. Atomic clocks are used as a supplement to GPS time masters in case satellite connections become unavailable.
TrueTime Architecture and Implementation
TrueTime is implemented by a set of time masters that are connected to a reference time source and timeslave daemons that are deployed per machine in the data centre. All master’s time references are regularly compared to each other using an algorithm (similar to NTP). In addition, each master also checks the integrity of their local clock and will voluntarily remove itself from the set of master time servers if the checks fail.
To reduce the possibility of error from any time master, the client daemons poll time information from GPS and Atomic time masters belonging to nearby as well as farther datacenters, as in the following figure.
Each daemon regularly polls a variety of masters, and uses a statistical
algorithm to reject outliers and promote the best candidate time from the
set (as in NTP). Between each poll to retrieve an accurate time, the
daemon advertises slowly increasing uncertainty for their local time
derived from a works-case local clock drift. Because Google controls the
network environment for TrueTime masters and daemons, in practice the
uncertainty interval varies between 1ms
and 7ms
Daemons apply a variant of Marzullo’s algorithm to detect and reject liars, and synchronize the local machine clocks to the non liars.
Leveraging the Uncertainty Interval
So far, the TrueTime system is reminiscent of NTP, with the added advantage of atomic and GPS clocks at each master, and extremely low-latency data centre to data centre traffic producing time uncertainty intervals of between 1 and 7 milliseconds.
To take what we have covered so far and make TrueTime truly useful, we
need to be able to take advantage of the uncertainty interval. First,
consider the following case, where System A and System B both receive data
that includes a transaction with the value 5
. If both systems commit the
transaction inside their own uncertainty intervals, it can appear like
value 6
in System B happens before value 5
in System A.
The strategy to leverage the uncertainty interval is actually very simple: simply wait for the uncertainty time period to complete before committing any transactions. Since all the transactions wait, and because the uncertainty interval has been reduced to be very small, this level of lag is acceptable in return for a strongly consistent viewpoint of time.
Summary
TrueTime does not necessarily represent a theoretical paradigm shift in representing time, instead relying on algorithms similar to NTP. However, the engineering effort to deploy and maintain time masters and their associated network infrastructure allows the NTP-like algorithm to be extended by providing guaranteed error bounds on timestamps. These guarantees mean that in Google networks, engineers can use TrueTime across its services to properly order events and transactions using a vastly simplified programming model.