Ordered Messaging in an Unordered World

At Workiva, we’ve been using NATS to provide a highly-available, scalable message delivery service. The tradeoff for having these properties is that there are restrictions on the order in which messages are received by subscribers. NATS delivers messages from a single publisher in the order in which they were published. However, once you introduce multiple publishers, no such guarantees exist. Furthermore, even if messages are successfully delivered, they may not be successfully processed — if a subscriber fails to process a message correctly when it was sent, it may become out-of-order after retrying. In general, this means that message ordering is not guaranteed unless your service runs under very restrictive conditions.

This article includes an overview of what message ordering really means, and the tradeoffs required to achieve message ordering. It also provides some techniques for inducing order in an unordered messaging system.

What Is Order?

On the surface, the idea of messages being in order is a simple one. The following image shows a 10,000 foot view of message flow through an abstract messaging service:

Message Flow

In this figure, a publisher sends a message on a topic through a messaging system. The message is then delivered to one or more subscribers via a subscription. If we have a single synchronous publisher, a single synchronous subscriber, and a single synchronous message delivery server — all running on a synchronous transport layer — the notion of order is simple: messages are ordered by when the publisher successfully publishes them, by some measure of time or sequence.

However, even in this simple case, guaranteed ordering of messages would put severe constraints on throughput. Why? At any point between the messaging system receiving a message and a subscriber consuming it, something may fail. The only way to truly guarantee order of messages would be for the messaging system to deliver the messages one at a time to the subscriber, waiting to deliver the next message until the messaging system knows the subscriber has received and processed the current message via an acknowledgement.

The messaging system could instead only guarantee that the first delivery of any message is in-order, allowing redelivery attempts to happen at any time and allowing multiple messages to be sent to the subscriber at once. However, even if ordering constraints are relaxed in this way, “order” makes less sense as you introduce multiple publishers, multiple message delivery servers, and multiple subscribers.

Multiple Subscribers

Defining message order can be complicated, depending on your publishers and subscribers. For example, it is possible you have multiple subscribers processing messages on a single subscription:

Multiple Subscribers

In this case, even if messages are published to a topic in order, there are no guarantees of the order in which the messages will be processed by your subscribers. If order of processing is important, the subscribers would need to coordinate through a transactional storage system.

Multiple Publishers

Similarly, multiple publishers sending messages on the same topic makes ordering difficult:

Multiple Publishers

In this case, how do you assign an order to messages published from different publishers? Either the publishers themselves have to coordinate to assign order, or the messaging system itself has to attach a notion of order to every incoming message. In either case, any timestamp assigned to a message must be acquired from the same source to avoid clock drift, and any sequence number must be obtained from a single data source with ACID transactional guarantees.

Multiple Messaging System Servers

If the abstract message delivery service used in the examples above were a single, synchronous server, then the service itself could guarantee order. However, NATS is not a single server, and if you introduce a network topology between publishers and subscribers, there may be many paths a single message can take to get from a publisher to a subscriber. The benefit of such an architecture is that it is highly available and scalable. The drawback is limitations on message ordering.

Although we’ve chosen NATS as the message delivery service within the Messaging Platform, any distributed messaging middleware comes with the same set of limitations.

How Should I Handle Message Order Then?

Given these constraints, how do I handle message order? Simply put, if you want the highest amount of availability, throughput, and scalability, you must minimize your requirements on message order. This requirement for message order can take several forms, and depending on your use case, several alternative solutions can be used.

1. Order does not matter

Use cases where order does not matter at all are perfect for NATS (and pub/sub in general). For example, if you have a set independent tasks that need to be performed by your subscribers as part of a work group, the subscriber that receives the message performs the action without any need to coordinate on message order. As another example, if you want to compute aggregate statistics, you can publish a message for each event and then collate and update results using your persistent storage system.

Typical Use Cases: Queue of independent tasks, collection of statistics on events

2. Order matters in the final result

Sometimes, the order in which messages are processed does not matter as long as the end result is ordered properly. For example, consider logging messages from a set of distributed services. Log events can come from multiple publishers and can be processed in any order — as long as the end result can be accessed sorted by time. By attaching a timestamp to every event and storing the data in a system that allows sorting by timestamp, you can effectively induce ordering after messages are processed. This scenario assumes that system clocks are synchronized or that variations due to clock skew are acceptable in the end result.

This use case also includes updates to application state that only require access to the most recent state. For example, if you want to track changes to a document, but only care about the most recent update, you could attach a timestamp to each change event and only update your application if new messages are more recent than ones you have already processed.

Typical Use Cases: Logs, state updates

3. Order matters

Complete dependence on the order in which messages are processed is the most complicated case. Any solution that enforces strict ordering of messages is going to come at the expense of performance and throughput. You should only depend on order when it is absolutely necessary, and when you are sure that you won’t need to scale to a large number of messages per second.

There are generally three methods for guaranteeing message processing in order: knowing the entire list of outstanding messages and their order, knowing if there are outstanding messages yet to be processed, or knowing the sequence of all messages. Solutions to each of these methods requires careful coordination between subscribers using a persistent storage system.

Knowing your messages

You can guarantee the order of message processing by knowing the entire list of outstanding messages and the order in which they must be processed. You implement this by assigning each message a unique identifier (such as a hash of the message contents) and storing that in some persistent place.

A subscriber would check the persistent storage to know the next message it must process and ensure that it only processes that message next, waiting to process other messages it has received when they come up in the complete ordering. This implies that competing subscribers to the same topic must wait idle while each message is processed by individual subscribers one-by-one. At that point, it is worth considering using the persistent storage itself as the message queue and not relying on a messaging system at all.

Knowing in-flight messages

A second way to guarantee the order of message processing is by having subscribers track messages they have received and know if there are messages a subscriber hasn’t seen that need to be processed first. In this case, a subscriber could temporarily put all messages received in persistent storage and acknowledge message receipt. The subscriber would then periodically check with the messaging system to see if there are any unacked messages that the subscriber expects. If the subscriber has received all messages in the current time period, those can be processed in order and removed from temporary storage.

Knowing the sequence

Finally, if you restrict your use case to a single, synchronous publisher and a single subscriber, you could use a sequence number to ensure ordering. For each message, the publisher performs the attaches a sequence number to the message. The subscriber then keeps a buffer of unprocessed messages and resequences them as they arrive.

Summary

Having messages delivered in order simplifies the code necessary to process messages where order matters. However, providing messages in order comes at great cost to availability and scalability — no matter what messaging system you use. If availability and scalability are important features, you may need to consider dropping the requirement for ordered message processing. You will then be able to scale easily, with the Messaging Platform scaling to deliver all of your messages quickly and reliably.

Like this post? Subscribe via RSS or email to never miss an update.