When designing Workiva’s durable messaging system we took a hard look at using Amazon’s Kinesis as the message storage and delivery mechanism. At first glance, Kinesis has a feature set that looks like it can solve any problem: it can store terabytes of data, it can replay old messages, and it can support multiple message consumers. But if you dig a little deeper you will find that Kinesis is well suited for a very particular use case, and if your application doesn’t fit this use case, Kinesis may be a lot more trouble than it’s worth.
In this article, I compare Kinesis with Amazon’s Simple Queue Service (SQS), showing the benefits and drawbacks of each system, and highlighting the difference between data streams and queueing. This article should make clear why we built our durable messaging system using SQS, and why your application might benefit from SQS too.
Data Streams - The Kinesis Sweet Spot
Kinesis’ primary use case is collecting, storing and processing real-time continuous data streams. Data streams are data that are generated continuously by thousands of data sources, which typically send in the data records simultaneously, and in small sizes (order of Kilobytes).
Typical data streams include log files, e-commerce analytics, in-game player activity, information from social networks, financial trading floors, or geospatial services, and telemetry from connected devices or instrumentation in data centers. At Workiva, we use Kinesis to handle the collection and processing of telemetry, logging, and analytics data streams — a great choice for this type of pplicationproblem.
Kinesis is designed for large scale data ingestion and processing, with the ability to maximize write throughput for large volumes of data.
Kinesis Use Cases
- Log and Event Data Collection
- Real-time Analytics
- Mobile Data Capture
- “Internet of Things” Data Feed
Benefits of Kinesis
If you need the absolute maximum throughput for data ingestion or processing, Kinesis is the choice. The delay between writing a data record and being able to read it from the Stream is often less than one second, regardless of how much data you need to write.
If you need to handle terabytes of a data per day in a single Stream, Kinesis can do that for you. You can push data from many data producers, as it is generated, into a reliable, highly scalable service. This data can be then stored for later processing or read out in real-time.
Drawbacks of Kinesis
Although it is easy to get started with Kinesis, it does present an operational burden when you need to manage shards for the data. When you create a new stream, you specify the number of shards it contains — each shard serves as a grouping of data records. Since reads and writes are applied to shards, the number of shards in a stream determines the maximum throughput you can achieve over the entire stream. At this point in time, Kinesis does not support auto-scaling, so it is up to the application developer to track shard usage and re-shard the Kinesis stream when necessary.
Limited Read Throughput
Kinesis has a limit of 5 reads per second from a shard, with a maximum of read output of 2MB/sec. So, if we wanted to fan-out a message to five consumers that would need to read the same data and process from a shard, we would have already reached the Kinesis fan-out limit, requiring us to manually re-shard the data stream to allow for more consumers.
Update: 27 April 2020
In August 2018, Kinesis introduced a feature called enhanced fan-out allowing each shard consumer to receive their own 2MB/second pipe of read throughput per shard.
Complicated Producer and Consumer Libraries
For maximum performance, Kinesis requires deploying producer and consumer libraries alongside your application. As a producer, you deploy a C++ binary with a Java interface for reading and writing data records to a Kinesis stream. As a consumer, you deploy a Java application that can communicate with other programming languages through an interface built on top of the shell’s standard in and standard out. In either of these cases, adding new producers or consumers to a Kinesis stream presents some investment in development and maintenance.
Kinesis allows each consumer to read from the stream independently. This requires each consumer to mark their own position in the stream, and to track how far in the stream they have read. To scale out to multiple consumers running the same workload requires that each of the consumers coordinate on the set of records being read from Kinesis. The Kinesis Consumer Library accomplishes this by storing consumer metadata in a DynamoDB table. This required overhead helps to scale out the number of consumers of a stream, but requires additional logic and resources to deploy.
Queue’s — The Bread and Butter of Messaging
A message queue makes it easy to decouple and scale microservices, distributed systems, and serverless applications. Using a queue, you can send, store, and receive messages between software components at any volume, without losing messages or requiring other services to be always available. Building applications from individual components that each perform a discrete function improves scalability and reliability, and is best practice design for modern applications. A reliable queue placed between components allows you to leverage many integration patterns for connecting services.
If you are looking for a message queue system, Amazon’s SQS fits that role. SQS delivers reliable and scalable message queues without the overhead of managing message-oriented middleware. It provides the necessary plumbing to reliably connect services in a service-oriented or microservice architecture.
SQS Use Cases
- Application integration
- Decoupling microservices
- Allocate tasks to multiple worker nodes
- Decouple live user requests from intensive background work
- Batch messages for future processing
Ease of Use
SQS is dead-simple to use. Simply create a queue, and send messages to it. In contrast to Kinesis, you do not need any special libraries to read from or write to an SQS queue. You also do not need to coordinate among consumers, or manage scaling out. SQS is reliable, supports encryption, and is extremely scalable.
SQS easily scales to handle a large volume of messages, without user intervention. It allows you to dynamically increase read throughput by scaling the number of tasks reading from a queue. This requires no pre-provisioning or scale-out of AWS resources. SQS buffers requests to transparently handle spikes in load.
With SQS, once a consumer has processed a message from the queue, that message is removed and no other consumer can read that message. This means that SQS does not support multiple consumer applications reading the same set of messages from the same queue. To provide such functionality, you would need to write messages to multiple queues, using SNS or another broadcast mechanism to replicate your message to multiple queues.
Since messages are removed after they are processed, SQS does not support replaying messages that have already been published. If you need to support message replay, you will need to write messages to an alternate store as they are published, and have a mechanism to allow interested consumers to replay that history.
There are a wealth of tools available from cloud providers with which you can build your application, and half of the job in designing software to leverage the cloud is researching the tools at your disposal, and understanding how they can be deployed. This article compares SQS and Kinesis, too seemingly similar technologies with vastly different use cases.
If you are considering adopting Kinesis to solve your problem, consider whether or not you are acting on a continuous data stream of very large size. If you are, Kinesis is the right choice. If not, consider SQS.