Title and Author of Paper
The CQL continuous query language: semantic foundations and query execution. Arasu et al.
Summary
CQL is a derivation of the SQL query language developed for running continuous queries over streams of data. The goals of the system are to provide a precise set of language semantics for running such continuous stream workloads. The paper starts by defining precise abstract semantics for continuous queries that cover two data types — streams and relations — and three classes of operators: ones that produce a relation from a stream, one that produces a relation from other relations, and one that produces a stream from a relation. These semantics are defined independent of the underlying implementation. The second portion of this paper defines how CQL instantiates these abstract semantics using existing SQL specifications and some new CQL additions.
The paper works through a running example based on a road traffic management application to illustrate various aspects of the query language and its implementation. The traffic management application uses variable tolling that adapts the cost of using a road to real-time traffic conditions. It is assumed that the application has knowledge of each vehicle’s position and speed. The application then computes tolls in real time and transmits those tolls back to interested parties. In the terminology of streams, this application has a single input stream — the positions and speeds of vehicles, a single continuous query computing toll charges for vehicles, and a single output stream returning the tolls to interested parties.
Streams and Relations
Streams and relations are presented using a formal model.
- Stream
- A stream
S
is a (possibly infinite) bag of elements(s, t)
, wheres
is a tuple belonging to the schema ofS
andt
is the timestamp of the element.
In our example application, the input measuring vehicle position and speed
is a Stream with the sample schema PosSpeedStr(vehicleId, speed, xPos)
.
- Relation
- A relation
R
is a mapping from each time instant to a finite bag of tuples belonging to the scehma ofR
.
This definition differs from the standard definition: in the standard relational model a relation is a set of tuples, with no notion of time.
In our example application, relations are derived from continuous query
operators. One such example relation is the cost of a toll for a congested
segment of road, which is based on the number of vehicles using the road.
This can be represented using a relation based on the current number of
vehicles using the segment at a particular point in time:
SegVolRel(segNo, numVehicles)
.
Abstract Semantics
The abstract semantics define how to write continuous queries and their expected results. There are three different types of operators defined on streams:
- A
stream-to-relation
operator takes a streamS
as input and produces a relationR
as output with the same schema asS
. - A
relation-to-relation
operator takes one or more relationsR1, ..., RN
as input and produces a relationR
as output - A
relation-to-stream
operator takes a relationR
as input and produces a streamS
as output with the same schema asR
.
Each of these operators can also be defined over a particular instant in
time, defined as an instantaneous relation or stream. There is no
stream-to-stream
operator because the authors try to keep thing framed
in the context of existing relational knowledge.
Informally, we can think of continuous queries as letting time move
forward through the time domain available in the dataset. At a particular
point in time t
, all inputs up to t
are processed and the continuous
query emits any new stream results with this timestamp. Time t
advances
whenever all input up to time t-1
have been processed. This does present
the issue of clock or timestamp skew but for defining query semantics we
assume that no such skew exists.
Continuous Query Language
CQL defines an implementation for the abstract semantics presented. The authors’ goal is to provide the bulk of operators using relational operators, and use new stream and relation operators to convert between streams and relations. The advantage of this approach is to reuse existing relational foundations as much as possible.
Stream-to-relation operators
All stream-to-relation
operators are based on a sliding window of time
over a stream of data. This sliding window can be time-based, tuple-based,
or partitioned, or defined in new ways depending on the use case. The
general idea is to somehow extract a subset of the data based on the
current time period and run a query over that subset of data, thereby
converting a potentially infinite stream to a finite relation.
Relation-to-relation operators
All relation-to-relation
operators are derived from SQL queries by
mapping them to an instant in time. Anywhere a traditional relation is
used in an SQL query, a relation can be used in CQL.
Relation-to-stream operators
There are three relation-to-stream
operators in CQL, operating on
different types of streams: inserts, deletes, and relations. The insert
stream adds a tuple to the stream at a point in time t
, if that tuple
exists in the relation at t
or any point in time earlier, i.e. whenever
it is within the current sliding window. The delete stream adds a tuple to
the stream whenever that tuple is outside of the current sliding window of
time. Lastly, the relation stream adds a tuple to the stream whenever
a tuple is in the relation at exactly the point in time t
.
Together, these operators create a stream from a relation by selecting tuples based on their relation to the current sliding window of time.
Conclusions
By restricting streams to focus on points in time or sliding windows of time., the authors leverage existing work in relational algebra to the concept of infinite sized streams.