Title and Author of Paper
CQL is a derivation of the SQL query language developed for running continuous queries over streams of data. The goals of the system are to provide a precise set of language semantics for running such continuous stream workloads. The paper starts by defining precise abstract semantics for continuous queries that cover two data types — streams and relations — and three classes of operators: ones that produce a relation from a stream, one that produces a relation from other relations, and one that produces a stream from a relation. These semantics are defined independent of the underlying implementation. The second portion of this paper defines how CQL instantiates these abstract semantics using existing SQL specifications and some new CQL additions.
The paper works through a running example based on a road traffic management application to illustrate various aspects of the query language and its implementation. The traffic management application uses variable tolling that adapts the cost of using a road to real-time traffic conditions. It is assumed that the application has knowledge of each vehicle’s position and speed. The application then computes tolls in real time and transmits those tolls back to interested parties. In the terminology of streams, this application has a single input stream — the positions and speeds of vehicles, a single continuous query computing toll charges for vehicles, and a single output stream returning the tolls to interested parties.
Streams and Relations
Streams and relations are presented using a formal model.
- A stream
Sis a (possibly infinite) bag of elements
(s, t), where
sis a tuple belonging to the schema of
tis the timestamp of the element.
In our example application, the input measuring vehicle position and speed
is a Stream with the sample schema
PosSpeedStr(vehicleId, speed, xPos).
- A relation
Ris a mapping from each time instant to a finite bag of tuples belonging to the scehma of
This definition differs from the standard definition: in the standard relational model a relation is a set of tuples, with no notion of time.
In our example application, relations are derived from continuous query
operators. One such example relation is the cost of a toll for a congested
segment of road, which is based on the number of vehicles using the road.
This can be represented using a relation based on the current number of
vehicles using the segment at a particular point in time:
The abstract semantics define how to write continuous queries and their expected results. There are three different types of operators defined on streams:
stream-to-relationoperator takes a stream
Sas input and produces a relation
Ras output with the same schema as
relation-to-relationoperator takes one or more relations
R1, ..., RNas input and produces a relation
relation-to-streamoperator takes a relation
Ras input and produces a stream
Sas output with the same schema as
Each of these operators can also be defined over a particular instant in
time, defined as an instantaneous relation or stream. There is no
stream-to-stream operator because the authors try to keep thing framed
in the context of existing relational knowledge.
Informally, we can think of continuous queries as letting time move
forward through the time domain available in the dataset. At a particular
point in time
t, all inputs up to
t are processed and the continuous
query emits any new stream results with this timestamp. Time
whenever all input up to time
t-1 have been processed. This does present
the issue of clock or timestamp skew but for defining query semantics we
assume that no such skew exists.
Continuous Query Language
CQL defines an implementation for the abstract semantics presented. The authors’ goal is to provide the bulk of operators using relational operators, and use new stream and relation operators to convert between streams and relations. The advantage of this approach is to reuse existing relational foundations as much as possible.
stream-to-relation operators are based on a sliding window of time
over a stream of data. This sliding window can be time-based, tuple-based,
or partitioned, or defined in new ways depending on the use case. The
general idea is to somehow extract a subset of the data based on the
current time period and run a query over that subset of data, thereby
converting a potentially infinite stream to a finite relation.
relation-to-relation operators are derived from SQL queries by
mapping them to an instant in time. Anywhere a traditional relation is
used in an SQL query, a relation can be used in CQL.
There are three
relation-to-stream operators in CQL, operating on
different types of streams: inserts, deletes, and relations. The insert
stream adds a tuple to the stream at a point in time
t, if that tuple
exists in the relation at
t or any point in time earlier, i.e. whenever
it is within the current sliding window. The delete stream adds a tuple to
the stream whenever that tuple is outside of the current sliding window of
time. Lastly, the relation stream adds a tuple to the stream whenever
a tuple is in the relation at exactly the point in time
Together, these operators create a stream from a relation by selecting tuples based on their relation to the current sliding window of time.
By restricting streams to focus on points in time or sliding windows of time., the authors leverage existing work in relational algebra to the concept of infinite sized streams.