Paper Review: The CQL continuous query language: semantic foundations and query execution

Title and Author of Paper

The CQL continuous query language: semantic foundations and query execution. Arasu et al.

Summary

CQL is a derivation of the SQL query language developed for running continuous queries over streams of data. The goals of the system are to provide a precise set of language semantics for running such continuous stream workloads. The paper starts by defining precise abstract semantics for continuous queries that cover two data types — streams and relations — and three classes of operators: ones that produce a relation from a stream, one that produces a relation from other relations, and one that produces a stream from a relation. These semantics are defined independent of the underlying implementation. The second portion of this paper defines how CQL instantiates these abstract semantics using existing SQL specifications and some new CQL additions.

The paper works through a running example based on a road traffic management application to illustrate various aspects of the query language and its implementation. The traffic management application uses variable tolling that adapts the cost of using a road to real-time traffic conditions. It is assumed that the application has knowledge of each vehicle’s position and speed. The application then computes tolls in real time and transmits those tolls back to interested parties. In the terminology of streams, this application has a single input stream — the positions and speeds of vehicles, a single continuous query computing toll charges for vehicles, and a single output stream returning the tolls to interested parties.

Streams and Relations

Streams and relations are presented using a formal model.

Stream
A stream S is a (possibly infinite) bag of elements (s, t), where s is a tuple belonging to the schema of S and t is the timestamp of the element.

In our example application, the input measuring vehicle position and speed is a Stream with the sample schema PosSpeedStr(vehicleId, speed, xPos).

Relation
A relation R is a mapping from each time instant to a finite bag of tuples belonging to the scehma of R.

This definition differs from the standard definition: in the standard relational model a relation is a set of tuples, with no notion of time.

In our example application, relations are derived from continuous query operators. One such example relation is the cost of a toll for a congested segment of road, which is based on the number of vehicles using the road. This can be represented using a relation based on the current number of vehicles using the segment at a particular point in time: SegVolRel(segNo, numVehicles).

Abstract Semantics

The abstract semantics define how to write continuous queries and their expected results. There are three different types of operators defined on streams:

  1. A stream-to-relation operator takes a stream S as input and produces a relation R as output with the same schema as S.
  2. A relation-to-relation operator takes one or more relations R1, ..., RN as input and produces a relation R as output
  3. A relation-to-stream operator takes a relation R as input and produces a stream S as output with the same schema as R.

Each of these operators can also be defined over a particular instant in time, defined as an instantaneous relation or stream. There is no stream-to-stream operator because the authors try to keep thing framed in the context of existing relational knowledge.

Informally, we can think of continuous queries as letting time move forward through the time domain available in the dataset. At a particular point in time t, all inputs up to t are processed and the continuous query emits any new stream results with this timestamp. Time t advances whenever all input up to time t-1 have been processed. This does present the issue of clock or timestamp skew but for defining query semantics we assume that no such skew exists.

Continuous Query Language

CQL defines an implementation for the abstract semantics presented. The authors’ goal is to provide the bulk of operators using relational operators, and use new stream and relation operators to convert between streams and relations. The advantage of this approach is to reuse existing relational foundations as much as possible.

Stream-to-relation operators

All stream-to-relation operators are based on a sliding window of time over a stream of data. This sliding window can be time-based, tuple-based, or partitioned, or defined in new ways depending on the use case. The general idea is to somehow extract a subset of the data based on the current time period and run a query over that subset of data, thereby converting a potentially infinite stream to a finite relation.

Relation-to-relation operators

All relation-to-relation operators are derived from SQL queries by mapping them to an instant in time. Anywhere a traditional relation is used in an SQL query, a relation can be used in CQL.

Relation-to-stream operators

There are three relation-to-stream operators in CQL, operating on different types of streams: inserts, deletes, and relations. The insert stream adds a tuple to the stream at a point in time t, if that tuple exists in the relation at t or any point in time earlier, i.e. whenever it is within the current sliding window. The delete stream adds a tuple to the stream whenever that tuple is outside of the current sliding window of time. Lastly, the relation stream adds a tuple to the stream whenever a tuple is in the relation at exactly the point in time t.

Together, these operators create a stream from a relation by selecting tuples based on their relation to the current sliding window of time.

Conclusions

By restricting streams to focus on points in time or sliding windows of time., the authors leverage existing work in relational algebra to the concept of infinite sized streams.

Like this post? Subscribe via RSS or email to never miss an update.