## Title and Author of Paper

The CQL continuous query language: semantic foundations and query execution. Arasu et al.

## Summary

CQL is a derivation of the SQL query language developed for running continuous queries over streams of data. The goals of the system are to provide a precise set of language semantics for running such continuous stream workloads. The paper starts by defining precise abstract semantics for continuous queries that cover two data types — streams and relations — and three classes of operators: ones that produce a relation from a stream, one that produces a relation from other relations, and one that produces a stream from a relation. These semantics are defined independent of the underlying implementation. The second portion of this paper defines how CQL instantiates these abstract semantics using existing SQL specifications and some new CQL additions.

The paper works through a running example based on a road traffic
management application to illustrate various aspects of the query language
and its implementation. The traffic management application uses *variable
tolling* that adapts the cost of using a road to real-time traffic
conditions. It is assumed that the application has knowledge of each
vehicle’s position and speed. The application then computes tolls in real
time and transmits those tolls back to interested parties. In the
terminology of streams, this application has a single input stream — the
positions and speeds of vehicles, a single continuous query computing toll
charges for vehicles, and a single output stream returning the tolls to
interested parties.

### Streams and Relations

Streams and relations are presented using a formal model.

- Stream
- A stream
`S`

is a (possibly infinite) bag of elements`(s, t)`

, where`s`

is a tuple belonging to the schema of`S`

and`t`

is the timestamp of the element.

In our example application, the input measuring vehicle position and speed
is a Stream with the sample schema `PosSpeedStr(vehicleId, speed, xPos)`

.

- Relation
- A relation
`R`

is a mapping from each time instant to a finite bag of tuples belonging to the scehma of`R`

.

This definition differs from the standard definition: in the standard relational model a relation is a set of tuples, with no notion of time.

In our example application, relations are derived from continuous query
operators. One such example relation is the cost of a toll for a congested
segment of road, which is based on the number of vehicles using the road.
This can be represented using a relation based on the current number of
vehicles using the segment at a particular point in time:
`SegVolRel(segNo, numVehicles)`

.

### Abstract Semantics

The abstract semantics define how to write continuous queries and their expected results. There are three different types of operators defined on streams:

- A
`stream-to-relation`

operator takes a stream`S`

as input and produces a relation`R`

as output with the same schema as`S`

. - A
`relation-to-relation`

operator takes one or more relations`R1, ..., RN`

as input and produces a relation`R`

as output - A
`relation-to-stream`

operator takes a relation`R`

as input and produces a stream`S`

as output with the same schema as`R`

.

Each of these operators can also be defined over a particular instant in
time, defined as an *instantaneous* relation or stream. There is no
`stream-to-stream`

operator because the authors try to keep thing framed
in the context of existing relational knowledge.

Informally, we can think of continuous queries as letting time move
forward through the time domain available in the dataset. At a particular
point in time `t`

, all inputs up to `t`

are processed and the continuous
query emits any new stream results with this timestamp. Time `t`

advances
whenever all input up to time `t-1`

have been processed. This does present
the issue of clock or timestamp skew but for defining query semantics we
assume that no such skew exists.

### Continuous Query Language

CQL defines an implementation for the abstract semantics presented. The authors’ goal is to provide the bulk of operators using relational operators, and use new stream and relation operators to convert between streams and relations. The advantage of this approach is to reuse existing relational foundations as much as possible.

#### Stream-to-relation operators

All `stream-to-relation`

operators are based on a sliding window of time
over a stream of data. This sliding window can be time-based, tuple-based,
or partitioned, or defined in new ways depending on the use case. The
general idea is to somehow extract a subset of the data based on the
current time period and run a query over that subset of data, thereby
converting a potentially infinite stream to a finite relation.

#### Relation-to-relation operators

All `relation-to-relation`

operators are derived from SQL queries by
mapping them to an instant in time. Anywhere a traditional relation is
used in an SQL query, a relation can be used in CQL.

#### Relation-to-stream operators

There are three `relation-to-stream`

operators in CQL, operating on
different types of streams: inserts, deletes, and relations. The insert
stream adds a tuple to the stream at a point in time `t`

, if that tuple
exists in the relation at `t`

or any point in time earlier, i.e. whenever
it is within the current sliding window. The delete stream adds a tuple to
the stream whenever that tuple is outside of the current sliding window of
time. Lastly, the relation stream adds a tuple to the stream whenever
a tuple is in the relation at exactly the point in time `t`

.

Together, these operators create a stream from a relation by selecting tuples based on their relation to the current sliding window of time.

### Conclusions

By restricting streams to focus on points in time or sliding windows of time., the authors leverage existing work in relational algebra to the concept of infinite sized streams.