Amazon Kinesis has a feature set that makes it tempting to use for a variety of applications. However, it is really designed for a particular set of use cases, and those use cases must be carefully considered before adopting Kinesis. In this article, we compare Kinesis with Amazon’s Simple Queue Service (SQS), showing the benefits and drawbacks of each system, and comparing their ideal uses.
Kinesis - Streaming Data
Kinesis’ primary use case is collecting, storing and processing real-time continuous data streams. Kinesis is designed for large-scale data ingestion and processing, with the ability to maximize throughput for large volumes of data. Because of this emphasis on large volumes of data, using Kinesis can present some operational challenges.
Benefits of Kinesis
If you need the absolute maximum throughput for data ingestion or processing, Kinesis is the choice. The delay between writing a data record and being able to read it from the Stream is often less than one second, regardless of how much data you need to write.
Complicated Producer and Consumer Libraries
For maximum performance, Kinesis requires deploying producer and consumer libraries alongside your application. As a producer, you deploy a C++ binary with a Java interface for writing data records to a Kinesis stream. As a consumer, you deploy a Java application that can communicate with other programming languages through an interface built on top of standard input and standard output. In either case, adding new producers or consumers to a Kinesis stream requires an investment in development and maintenance.
Kinesis allows each consumer to read from the stream independently. This requires each consumer to track its own position in the stream, that is, how far it has read. Scaling out to multiple consumers running the same workload requires those consumers to coordinate on which records each of them reads from Kinesis. The Kinesis Client Library (KCL) accomplishes this by storing consumer metadata in a DynamoDB table. This overhead makes it possible to scale out the number of consumers of a stream, but it requires additional logic and resources to deploy.
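To make the idea of independent read positions concrete, here is a minimal in-memory sketch. It is not the Kinesis API: `InMemoryStream`, `read`, and `checkpoint` are hypothetical names invented for illustration, the stream is a single list rather than a set of shards, and a plain dictionary stands in for the DynamoDB table in which the consumer library keeps its metadata.

```python
from collections import defaultdict


class InMemoryStream:
    """Toy stand-in for a one-shard stream: an append-only list of records."""

    def __init__(self):
        self._records = []
        # One checkpoint per consumer application: index of the next record to read.
        # (Hypothetical stand-in for the metadata the consumer library stores.)
        self._checkpoints = defaultdict(int)

    def put(self, record):
        self._records.append(record)

    def read(self, app_name, limit=10):
        """Return up to `limit` records starting at the application's checkpoint."""
        start = self._checkpoints[app_name]
        return self._records[start:start + limit]

    def checkpoint(self, app_name, count):
        """Advance the application's position by `count` processed records."""
        self._checkpoints[app_name] += count


stream = InMemoryStream()
for i in range(5):
    stream.put(f"event-{i}")

# Two applications read the same records without affecting each other.
analytics_batch = stream.read("analytics", limit=3)
stream.checkpoint("analytics", len(analytics_batch))
archiver_batch = stream.read("archiver", limit=5)
```

Reading never removes anything from the stream; only the per-application checkpoint moves, which is why a second application still sees every record from the beginning.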
Although it is easy to get started with Kinesis, it does present an operational burden when you need to manage shards for the data. When you create a new stream, you specify the number of shards it contains; each shard serves as a grouping of data records. Since reads and writes are applied to shards, the number of shards in a stream determines the maximum throughput you can achieve over the entire stream. At the time of writing, Kinesis does not support auto-scaling, so it is up to the application developer to track shard usage and re-shard the Kinesis stream when necessary.
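Because throughput is bounded per shard, sizing a stream comes down to simple arithmetic. The sketch below assumes the commonly documented per-shard limits for provisioned streams (1 MB/s or 1,000 records/s for writes, 2 MB/s for reads); treat those constants as assumptions to check against the current AWS documentation. The function name `shards_needed` is hypothetical.

```python
import math

# Assumed per-shard limits; verify against the current AWS documentation.
WRITE_MB_PER_SHARD = 1.0
WRITE_RECORDS_PER_SHARD = 1000
READ_MB_PER_SHARD = 2.0


def shards_needed(write_mb_per_s, records_per_s, read_mb_per_s):
    """Return the smallest shard count that satisfies all three limits."""
    return max(
        1,  # a stream always has at least one shard
        math.ceil(write_mb_per_s / WRITE_MB_PER_SHARD),
        math.ceil(records_per_s / WRITE_RECORDS_PER_SHARD),
        math.ceil(read_mb_per_s / READ_MB_PER_SHARD),
    )


# Large records: the byte limits dominate.
large_records = shards_needed(write_mb_per_s=4.5, records_per_s=3000, read_mb_per_s=9.0)

# Many tiny records: the record-count limit dominates.
tiny_records = shards_needed(write_mb_per_s=0.2, records_per_s=2500, read_mb_per_s=0.4)
```

Under these assumptions the first workload needs 5 shards and the second needs 3. Re-running this calculation as traffic changes is the kind of tracking that falls to the application developer.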
Kinesis is designed for large-scale data processing over streaming data. This includes data that is generated continuously by thousands of data sources such as log files, telemetry, or click-stream activity.
SQS - Durable Messaging
SQS is a queuing service for reliably communicating among distributed software components and microservices. It provides scalable messaging middleware for point-to-point queuing between producers and consumers. SQS is useful for decoupling and coordinating distributed applications.
Ease of Use
SQS is dead-simple to use. Simply create a queue, and send messages to it. In contrast to Kinesis, you do not need any special libraries to read from or write to an SQS queue. You also do not need to coordinate among consumers, or manage scaling out.
SQS easily scales to handle a large volume of messages, without user intervention.
With SQS, once a consumer has processed and deleted a message from the queue, that message is gone and no other consumer can read it. This means that SQS does not support multiple consumer applications reading the same set of messages from the same queue. To provide such functionality, you would need to write messages to multiple queues, potentially using SNS as a broadcast mechanism to replicate a message to multiple queues.
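The fan-out pattern described above can be sketched with two small in-memory classes. These are illustrative stand-ins, not the SQS or SNS APIs: `InMemoryQueue` and `InMemoryTopic` are hypothetical names, and receiving and deleting are collapsed into one step, whereas a real queue hides a received message for a period and removes it only when the consumer deletes it.

```python
from collections import deque


class InMemoryQueue:
    """Toy queue: a message is gone once a consumer has received and deleted it."""

    def __init__(self):
        self._messages = deque()

    def send(self, body):
        self._messages.append(body)

    def receive_and_delete(self):
        """Return the next message, or None if the queue is empty."""
        return self._messages.popleft() if self._messages else None


class InMemoryTopic:
    """Toy broadcast topic: copies each published message to every subscribed queue."""

    def __init__(self):
        self._queues = []

    def subscribe(self, queue):
        self._queues.append(queue)

    def publish(self, body):
        for queue in self._queues:
            queue.send(body)


# Each consumer application gets its own queue, fed by the shared topic.
billing_queue = InMemoryQueue()
email_queue = InMemoryQueue()
orders_topic = InMemoryTopic()
orders_topic.subscribe(billing_queue)
orders_topic.subscribe(email_queue)
orders_topic.publish("order-42 created")

billing_msg = billing_queue.receive_and_delete()
email_msg = email_queue.receive_and_delete()
second_billing_read = billing_queue.receive_and_delete()  # nothing left to read
```

Both applications see the message because each holds its own copy; within a single queue, the second read finds nothing.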
Since messages are removed after they are processed, SQS does not support replaying messages that have already been published. If you need to support message replay, you will need to write messages to an alternate store as they are published, and have a mechanism to allow interested consumers to replay that history.
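One way to picture that arrangement is a publisher that records every message in a second store before queuing it. This is a sketch under stated assumptions: `ArchivingPublisher` is a hypothetical class, a deque stands in for the live queue, and a Python list stands in for whatever durable store you would actually choose.

```python
from collections import deque


class ArchivingPublisher:
    """Toy publisher that keeps a replayable history alongside the live queue."""

    def __init__(self):
        self.queue = deque()  # stands in for the live queue
        self.archive = []     # stands in for the alternate durable store

    def publish(self, body):
        # Archive first, so nothing reaches the queue without being recorded.
        self.archive.append(body)
        self.queue.append(body)

    def consume(self):
        """Remove and return the next live message, or None if there is none."""
        return self.queue.popleft() if self.queue else None

    def replay(self, from_index=0):
        """Return a copy of the archived history starting at `from_index`."""
        return list(self.archive[from_index:])


publisher = ArchivingPublisher()
for body in ["signup", "login", "purchase"]:
    publisher.publish(body)

# Drain the live queue; the messages are now gone from it.
consumed = [publisher.consume() for _ in range(3)]

# A consumer that joined late can still read history from the archive.
replayed = publisher.replay(from_index=1)
```

The queue is empty after the three consumes, yet the archive can still serve the last two messages to any interested consumer.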
SQS is designed for communication between distributed services. SQS is well suited for moving data between distributed applications, and can serve as the central messaging system in a distributed architecture.
There is a wealth of tools available from cloud providers with which you can build your application, and half of the job in designing software to leverage the cloud is researching the tools at your disposal and understanding how they can be deployed. This article compares SQS and Kinesis, two seemingly similar technologies with vastly different use cases.