
Guaranteed event ordering when using Amazon EventBridge as your Enterprise Service Bus

Example use case of populating ordered event-driven domain read stores using the ‘flexible domain events’ pattern, with code examples in the AWS CDK and TypeScript

Serverless Advocate
12 min read · Sep 27, 2022


Contents

  1. Introduction
  2. (Recap) What is a Domain and what are Serverless Architecture Layers
  3. What is a local domain read store and why do we need them?
  4. What is the problem statement?
  5. What is SNS FIFO Pub Sub Fan Out to SQS FIFO queues?
  6. What are we building?
  7. Let’s break this down!
  8. Why not force the ordering problem onto consumers?
  9. Great DevX with custom CDK constructs!

Introduction

We are going to cover why we need to use the flexible domain events pattern when using Amazon EventBridge as our enterprise service bus (ESB) within our organisation, guaranteeing ordering and deduplication of messages for consuming domains ‘where required’.

In our example, we will use it to populate other domain read stores based on events; but in reality this could be for any domain business logic which needs guaranteed ordering of deduplicated events:

Example of what we are building and discussing throughout the article


What is a Domain and what are Serverless Architecture Layers?

Prior to reading this article it would be great if you had an understanding of the Serverless Architecture Layers and Serverless Domain Driven Design:

What is a local domain read store and why do we need them? 💭

Now that we know about the Serverless Architecture Layers and DDD, let’s look at domain read stores and why we need them, since this is the specific use case we cover in this article.

What is a domain read store?

When we build our domain services we want them to be loosely coupled, with the majority of the communications between them asynchronous.

💡 Note: In reality this is not always the case, and there will always be reasons to call back into a domain synchronously via its well-versioned REST API. Where possible, we should take an event-driven approach first.

To help further keep the domains decoupled we can look at how read stores help below:

Example showing why domains may need read stores of other domains based on their domain events

Why do we need them?

As you can see from the diagram above, rather than calling back to a domain synchronously every time we need data that belongs to it (example one), we can build up our own read store. It is eventually consistent in nature, but it allows us to read from our local copy instead of calling out synchronously (example two).

This shows that if there are issues with another domain we should be unaffected as they are loosely coupled. It also means that we put less stress on the other domain service when not calling it synchronously at high load, as well as typically reducing service costs.

Eventual consistency is a consistency model used in distributed computing to achieve high availability that informally guarantees that, if no new updates are made to a given data item, eventually all accesses to that item will return the last updated value.
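To make ‘example two’ a little more concrete, here is a minimal sketch of a local read store being built from another domain’s events. The `OrderEvent` shape, the field names and the in-memory `Map` (standing in for the consuming domain’s own table) are all assumptions for illustration and are not taken from the example repo.

```typescript
// Hypothetical event shape; the field names are assumptions for illustration.
type OrderEvent =
  | { type: 'OrderCreated' | 'OrderUpdated'; orderId: string; status: string }
  | { type: 'OrderDeleted'; orderId: string };

interface OrderView {
  orderId: string;
  status: string;
}

// In-memory stand-in for a read store table owned by the consuming domain.
class OrderReadStore {
  private readonly orders = new Map<string, OrderView>();

  // Apply one event from the Orders domain to the local copy.
  apply(event: OrderEvent): void {
    if (event.type === 'OrderDeleted') {
      this.orders.delete(event.orderId);
      return;
    }
    this.orders.set(event.orderId, { orderId: event.orderId, status: event.status });
  }

  // Local read: no synchronous call back into the Orders domain.
  get(orderId: string): OrderView | undefined {
    return this.orders.get(orderId);
  }
}

const inOrder: OrderEvent[] = [
  { type: 'OrderCreated', orderId: 'o-1', status: 'PLACED' },
  { type: 'OrderUpdated', orderId: 'o-1', status: 'SHIPPED' },
];

// Events applied in the order they happened: the local copy ends up correct.
const store = new OrderReadStore();
inOrder.forEach((event) => store.apply(event));

// The same events applied in reverse: the local copy ends up stale.
const reversed = new OrderReadStore();
[...inOrder].reverse().forEach((event) => reversed.apply(event));
```

In this sketch `store` ends up with the order as SHIPPED, while `reversed` is left holding the stale PLACED status, which is why the ordering of events matters so much for read stores.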

What is the problem statement? 🚩

So let’s look at the problem statement at hand around read stores with the Serverless Architecture Layers pattern with Amazon EventBridge.

Our requirements:

✔️ Consumers are generally happy with EventBridge domain events which are not ordered (this will facilitate 90% of what they require, for example ‘OrderCreated’, ‘PaymentSuccess’ etc).

✔️ We want to use Amazon EventBridge for our cross-account EDA strategy due to the cost, performance, event schema registry, reliability, audit and replaying of events, direct integration with the majority of services, Global endpoints, API Destinations, decoupled publishers and subscribers and more… (see here https://aws.amazon.com/eventbridge/features)

✔️ Some consumers additionally may have a requirement of receiving deduplicated, exactly once delivery, and guaranteed ordering of domain events for certain business logic, or for building their own read stores (based on events in other domains). This could be incrementing or decrementing stock values for example.

✔️ We need to ensure that domain events are always published onto the main central shared event bus in a reliable manner.

✔️ Some consumers may have a requirement for both ordered and unordered domain events, so we need to be flexible when using Amazon EventBridge.
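The stock requirement above is a good illustration of why deduplication matters. EventBridge delivers events at least once, so a consumer can occasionally see the same event twice. The sketch below uses a hypothetical `StockAdjusted` event (the name and fields are assumptions) to show what a redelivery does to a counter that is not protected.

```typescript
// Hypothetical stock adjustment event; the names are illustrative only.
interface StockAdjusted {
  eventId: string;
  sku: string;
  delta: number; // negative when stock is reserved, positive when returned
}

// Naive consumer: every delivery changes the count, so a redelivery is applied twice.
function applyNaively(stock: Map<string, number>, events: StockAdjusted[]): void {
  for (const event of events) {
    stock.set(event.sku, (stock.get(event.sku) ?? 0) + event.delta);
  }
}

// Deduplicating consumer: remembers the event IDs it has already applied.
function applyOnce(
  stock: Map<string, number>,
  seen: Set<string>,
  events: StockAdjusted[],
): void {
  for (const event of events) {
    if (seen.has(event.eventId)) continue; // already applied, skip the duplicate
    seen.add(event.eventId);
    stock.set(event.sku, (stock.get(event.sku) ?? 0) + event.delta);
  }
}

const reserve: StockAdjusted = { eventId: 'evt-1', sku: 'sku-42', delta: -1 };
const deliveries = [reserve, reserve]; // the same event delivered twice

const naive = new Map<string, number>([['sku-42', 10]]);
applyNaively(naive, deliveries);

const deduped = new Map<string, number>([['sku-42', 10]]);
applyOnce(deduped, new Set<string>(), deliveries);
```

Starting from ten items, the naive counter drops to 8 while the deduplicating one correctly lands on 9. Every consumer with this need would have to build and persist something like the `seen` set themselves, which is the work we would rather hand to the platform.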

What problems do we have to work through?

❌ Amazon EventBridge does not guarantee the ordering of events delivered to targets. On AWS, the services that give us ordered, deduplicated delivery end to end are SNS FIFO and SQS FIFO (where ordering applies within a message group). Amazon MSK and Amazon Kinesis, for example, only guarantee ordering per partition or shard.

❌ It is far too complex and error prone (if not almost impossible to make 100% reliable) to force the ordering and processing of events solely onto the consumers (other domains) when using Amazon EventBridge. More on this later.

As we talk through this, you can find the basic code examples here:

https://github.com/leegilmorecode/serverless-event-driven-read-stores

⚠️ Note: This code is not production ready. I have kept all of the relevant code together in three stacks to make it easier to navigate and walk through, and added verbose comments for readers.

What is SNS FIFO Pub Sub Fan Out to SQS FIFO queues?

So far we have covered the problem statement, including the fact that EventBridge does not guarantee the ordering of events. Our alternative is SNS FIFO and SQS FIFO working in tandem as a fan-out pub/sub pattern.

An Amazon SNS FIFO topic delivers messages to subscribed Amazon SQS FIFO queues in the exact order that the messages are published to the topic. With an SQS FIFO queue, the queue’s consumers receive the messages in the exact order that the messages are sent to the queue. This setup preserves end-to-end message ordering.

Diagram showing how the fan out works with preserved ordering
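As a rough idea of what this fan out looks like in the AWS CDK, here is a single-account sketch using `aws-cdk-lib` v2. The stack name, construct IDs and resource names are assumptions for illustration and are not taken from the example repo. The cross-account version discussed in this article needs additional topic and queue policies which are left out here.

```typescript
import { Stack, StackProps } from 'aws-cdk-lib';
import * as sns from 'aws-cdk-lib/aws-sns';
import * as subscriptions from 'aws-cdk-lib/aws-sns-subscriptions';
import * as sqs from 'aws-cdk-lib/aws-sqs';
import { Construct } from 'constructs';

// Single-account sketch; construct IDs and resource names are illustrative only.
class OrdersFanOutStack extends Stack {
  constructor(scope: Construct, id: string, props?: StackProps) {
    super(scope, id, props);

    // FIFO topic that the producing domain publishes its ordered messages to.
    const topic = new sns.Topic(this, 'OrdersTopic', {
      topicName: 'orders-topic.fifo',
      fifo: true,
      contentBasedDeduplication: true,
    });

    // A FIFO queue needs a FIFO dead letter queue.
    const deadLetterQueue = new sqs.Queue(this, 'StockOrdersDlq', {
      queueName: 'stock-orders-dlq.fifo',
      fifo: true,
    });

    // FIFO queue that a consuming domain reads from, in order.
    const queue = new sqs.Queue(this, 'StockOrdersQueue', {
      queueName: 'stock-orders.fifo',
      fifo: true,
      deadLetterQueue: { queue: deadLetterQueue, maxReceiveCount: 3 },
    });

    // Fan out: each subscribed FIFO queue receives the messages in publish order.
    topic.addSubscription(
      new subscriptions.SqsSubscription(queue, { rawMessageDelivery: true }),
    );
  }
}
```

Adding a second consumer is then a matter of creating another FIFO queue and another subscription on the same topic.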

You can read more about this setup in detail using the following link below:

What are we building? 🔧

The high level design (HLD) shows how this works below using the flexible domain events pattern, which allows us to be flexible for our consumers with how they want to consume our domain events:

This high level diagram shows how we use SNS FIFO and SQS FIFO (cross account) when Domain Service consumers need ordered events which are deduplicated

Our example is going to be specifically with our:

🛍️ Orders Domain: this deals with all orders-related business logic, and is our event ‘producing’ domain.

📦 Stock Domain: this deals with all stock-related business logic. This is a ‘consumer’.

📟 Communications (Comms) Domain: this is a centralised domain for sending SMS and emails. This is a ‘consumer’.

We can see that this pattern works by using:

✔️ The SNS FIFO and SQS FIFO fan-out pattern (cross account), used only when consumers need specific deduplicated, ordered events. A great example of this is building a local read store from other domains’ events, where the ordering of inserts, updates and deletes matters.

✔️ An internal domain SQS FIFO queue which also raises the Domain Events onto the shared enterprise event bus (ESB) for domain consumers that don’t need specific ordering.

💡 Note: Typically, unordered events from Amazon EventBridge are great for most domain consumer use cases, and some domains may require both (ordered and unordered), so this pattern is very flexible (especially as we look to wrap this in a custom CDK construct).

Now let’s look at the Low Level Design (LLD), which goes into the specific AWS Serverless details:

The blue highlights show areas where we may want to create custom constructs

Let’s quickly run through each of the points:

✔️ (1) Users can create, update and delete orders through Amazon API Gateway

✔️ (2) The orders are stored in Amazon DynamoDB, and we use streams and a Lambda function to write the changes to an Emitter SNS FIFO Topic using a Change Data Capture pattern (CDC).

✔️ (3) The SNS FIFO Topic has cross-account SQS FIFO queue subscribers in different domain accounts, with these messages deduplicated, ordered, and delivered exactly once. (As shown in the diagram, this could be a custom L3 CDK construct which abstracts all of this complexity away; more on this later.)

💡 Note: There is also an internal SQS FIFO queue subscriber for raising the domain events to the organisation shared event bus.

✔️ (4) There is a rule on the shared event bus which targets the other domains’ event buses with the Order Events, i.e. in this example the Stock Domain. (These are unordered events.)

✔️ (5/6) The cross-account ordered messages go to the Comms Domain and Stock Domain from SNS to SQS. (The subscription to the SNS FIFO Topic can be done using a CDK custom construct allowing a lookup on Domain(s) metadata, abstracting away the complexities from developers. More on this later)

✔️ (7/8) The Comms Domain and Stock Domain both use the ordered, deduplicated messages to populate their own domain read stores. A Lambda function consumes from each SQS FIFO queue with a batch size of one, and a DLQ captures any messages that fail processing.

💡 Note: SNS FIFO + SQS FIFO is no substitute for EventBridge, as the first is a messaging solution and the second is an event bus. We could however look to centralise the SNS topics into a Shared Account in a similar way to our central ESB, meaning that if we had 11 domains we would have 11 topics in one place. We don’t cover this approach in the article, but it would lower the overall account dependencies.

👇 Before we go any further — please connect with me on LinkedIn for future blog posts and Serverless news https://www.linkedin.com/in/lee-james-gilmore/

Let’s break this down!

Let’s discuss this further and break it down with some visuals.

Each of the domain services uses the Change Data Capture (CDC) pattern to listen for database changes using DynamoDB Streams (or the database-specific equivalent), with a Lambda function subscribed to the stream publishing messages to the SNS FIFO topic in the domain account.

More info on the Change Data Capture pattern (CDC) in Serverless can be viewed here:

This essentially ensures that any database changes are captured and published as create, update and delete events.
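A minimal sketch of the mapping step inside the stream Lambda might look like the following. It only builds the parameters for an SNS FIFO publish call (`TopicArn`, `Message`, `MessageGroupId` and `MessageDeduplicationId` are real SNS Publish parameters). The local types, the `id` partition key, the ‘OrderUpdated’ and ‘OrderDeleted’ event names and the message body layout are assumptions for illustration.

```typescript
// Minimal local type with only the DynamoDB stream record fields used here.
// The 'id' partition key is an assumption about the orders table.
interface StreamRecord {
  eventID: string;
  eventName: 'INSERT' | 'MODIFY' | 'REMOVE';
  dynamodb: {
    Keys: { id: { S: string } };
    NewImage?: Record<string, unknown>;
  };
}

// The parameters we would hand to an SNS FIFO publish call.
interface FifoPublishInput {
  TopicArn: string;
  Message: string;
  MessageGroupId: string;
  MessageDeduplicationId: string;
}

// Hypothetical mapping from the stream operation to a domain event name.
const EVENT_TYPES = {
  INSERT: 'OrderCreated',
  MODIFY: 'OrderUpdated',
  REMOVE: 'OrderDeleted',
} as const;

function toPublishInput(record: StreamRecord, topicArn: string): FifoPublishInput {
  const orderId = record.dynamodb.Keys.id.S;
  return {
    TopicArn: topicArn,
    Message: JSON.stringify({
      type: EVENT_TYPES[record.eventName],
      orderId,
      data: record.dynamodb.NewImage ?? null, // a REMOVE has no new image
    }),
    // Ordering is preserved within a message group, here one group per order.
    MessageGroupId: orderId,
    // The stream record's event ID is unique, so a retried record is deduplicated.
    MessageDeduplicationId: record.eventID,
  };
}
```

Grouping by order ID is a design choice in this sketch: it keeps every change to a single order in sequence while letting different orders flow independently. Using one constant group ID instead would give a single ordered sequence across all orders, at the cost of throughput.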

The Change Data Capture pattern (CDC) in motion to raise events

The internal Order Domain SNS FIFO Topic has an internal Orders SQS FIFO Queue, which has a Lambda subscriber to read the messages and publish them as ‘Order Events’ to the central external shared enterprise service bus (Amazon EventBridge), as shown below.

💡 Note: At the time of writing there is a hard limit of 300 published messages per second on an SNS FIFO topic, or 10MB per second. We can also use the PublishBatch SDK method (up to 10 messages per call) to lower the overall cost, i.e. 300 messages per second in 30 API calls. As we are consuming off the DynamoDB stream we can throttle this accordingly to make sure we never hit that limit. There is also a limit of 100 subscriptions per SNS FIFO topic, which is a lot of domains, so this will be fine!
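The batching arithmetic in the note can be sketched with a small helper that splits messages into groups of ten, which is the maximum number of entries a single PublishBatch call accepts. The `chunk` helper and the sample messages are illustrative and not from the example repo.

```typescript
// SNS PublishBatch accepts up to 10 entries per call.
const MAX_BATCH_ENTRIES = 10;

// Split a list into consecutive batches, preserving the original order.
function chunk<T>(items: T[], size: number = MAX_BATCH_ENTRIES): T[][] {
  if (size < 1) throw new Error('size must be at least 1');
  const result: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    result.push(items.slice(i, i + size));
  }
  return result;
}

// 300 messages become 30 batches of 10 entries each.
const messages = Array.from({ length: 300 }, (_, i) => `message-${i}`);
const batches = chunk(messages);
```

Because `slice` keeps the items in their original sequence, sending the batches one after another keeps the publish order intact.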

Example showing how the Order Domain raises any Order events via the central ESB for other domains — however these events are not order guaranteed

This means that other domains can now listen to those Order Domain Events within their own domains (i.e. they can subscribe to Order events on the main ESB), however they will not be ordered as EventBridge does not guarantee ordering. These events will be suitable for the majority of domains and workloads (90% of domain needs).
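For the internal Lambda that raises these events onto the shared bus, the mapping from an SQS message body to an EventBridge entry could look roughly like this. `EventBusName`, `Source`, `DetailType` and `Detail` are real PutEvents entry fields. The `orders-domain` source name, the message body layout and the assumption that raw message delivery is enabled on the subscription (so the body is the published message itself) are mine.

```typescript
// Local type with only the PutEvents entry fields used in this sketch.
interface PutEventsEntry {
  EventBusName: string;
  Source: string;
  DetailType: string;
  Detail: string; // EventBridge expects the detail as a JSON string
}

// Hypothetical body of the message published to the SNS FIFO topic.
interface OrderMessage {
  type: string;
  orderId: string;
  data: unknown;
}

// Assumes raw message delivery, so the SQS body is the published message itself.
function toEventBridgeEntry(sqsBody: string, eventBusName: string): PutEventsEntry {
  const message = JSON.parse(sqsBody) as OrderMessage;
  return {
    EventBusName: eventBusName,
    Source: 'orders-domain', // hypothetical source name
    DetailType: message.type, // e.g. 'OrderCreated'
    Detail: JSON.stringify({ orderId: message.orderId, data: message.data }),
  };
}
```

Consumers on the shared bus can then write rules against the source and detail type, accepting that delivery through EventBridge is not ordered.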

Other domains can subscribe their own external SQS FIFO Queues (cross AWS account), enabling them to get guaranteed deduplicated and ordered events/messages; which is exactly what we need in specific scenarios such as building read stores based on other domains.

💡 Note: This is a Storage First Pattern from the point of view of the consuming domains.

Guaranteed ordering and deduplication of messages/events for other domains to subscribe to so we can build read stores, or process tasks correctly such as increment/decrement of stock etc

As shown in the diagram above, each of the other domains can subscribe to domain events which are deduplicated and guaranteed to be in order, so we can build out order-specific read stores in those domains.

Why not force the ordering problem onto consumers?

We could look at just using Amazon EventBridge without introducing SNS and SQS FIFO queues/topics; but this puts the onus on the consuming Domain Services to process them in order. A very interesting Twitter thread below on exactly that:

Some of the things we would need to contend with are:

❌ What if we get an event out of order? Do we delay processing it? Do we stash it for now somewhere? Where and how?

❌ How do we know that the event is out of order? Do we need to understand what the previous event was, and what the next should be? (i.e. monotonic counters).

❌ If we delay the event, then how long for? What if it is lost completely — do consumers only give the delay a certain timeframe? How do they carry on from the delay with the missing event? This would be the case even if we added sequence numbers…

❌ We can’t use timestamps as we really don’t know if we have events missing (just that one event is after or before another — but we don’t know if there should be events in between or not)
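To give a feel for what the points above mean in practice, here is a minimal sketch of the kind of resequencing buffer every consumer would have to own if the producer added sequence numbers. The `SequencedEvent` shape and the `Resequencer` class are hypothetical, and a real version would also need durable storage, timeouts and a recovery strategy.

```typescript
// Hypothetical event carrying a producer-assigned sequence number starting at 1.
interface SequencedEvent {
  sequence: number;
  payload: string;
}

class Resequencer {
  private nextExpected = 1;
  private readonly stash = new Map<number, SequencedEvent>();

  // Returns the events that can now be processed in order (possibly none).
  accept(event: SequencedEvent): SequencedEvent[] {
    if (event.sequence < this.nextExpected) return []; // duplicate of a processed event
    this.stash.set(event.sequence, event);

    // Release every consecutive event starting from the one we are waiting for.
    const ready: SequencedEvent[] = [];
    let next = this.stash.get(this.nextExpected);
    while (next !== undefined) {
      ready.push(next);
      this.stash.delete(this.nextExpected);
      this.nextExpected += 1;
      next = this.stash.get(this.nextExpected);
    }
    return ready;
  }

  // Events waiting behind a gap; if the missing event never arrives they wait forever.
  pendingCount(): number {
    return this.stash.size;
  }
}

const resequencer = new Resequencer();
const first = resequencer.accept({ sequence: 2, payload: 'OrderUpdated' }); // stashed, nothing released
const second = resequencer.accept({ sequence: 1, payload: 'OrderCreated' }); // releases 1 then 2
const third = resequencer.accept({ sequence: 4, payload: 'OrderDeleted' }); // stuck behind the missing 3
```

Even in this tiny version the last event is stuck until sequence 3 turns up, and nothing here answers how long to wait or what to do if it never does. Multiply that by every consuming domain and the case for solving ordering once, on the producer side, becomes clearer.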

The following tweet in the chain sums this up nicely in my opinion:

There is also a great Medium article here on events in distributed systems and the problems at hand:

Great DevX with custom CDK constructs!

As this is a common pattern for most domains, we could spend some time in our platform teams (layer 5 of the Serverless Architecture Layers) to wrap the producer and consumer infrastructure into shared reusable CDK L3 constructs:

Example of how the custom CDK constructs could work

✔️ (1) At deploy time the Event Producer CDK Construct pushes the account and topic metadata to a shared Secrets Manager account so consumers can then pull from this for subscription and policy creation (using a consumer CDK construct).

✔️ (2) At build time the Event Consumer CDK Construct pulls the account and topic metadata from the shared Secrets Manager account to perform the subscription to the topic and the policy creation etc.

💡 Note: We use AWS Secrets Manager as this service works across AWS accounts, allowing us to store the metadata in one place. Unfortunately, at the time of writing, other services like AWS Systems Manager Parameter Store are account specific.

These shared constructs would build the required infrastructure as well as setting the correct cross-account policies.
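As a sketch of what the two constructs could share, the snippet below shows a possible metadata shape and the queue policy statement a consumer construct might generate from it. The `TopicMetadata` fields, the secret naming convention and both helper functions are hypothetical. The statement itself follows the usual form for letting a specific SNS topic send to an SQS queue.

```typescript
// Hypothetical metadata the producer construct could publish for its topic.
interface TopicMetadata {
  domain: string;
  accountId: string;
  topicArn: string;
}

// Hypothetical naming convention for the shared secret holding a domain's metadata.
function metadataSecretName(domain: string): string {
  return `/domains/${domain}/ordered-events-topic`;
}

// Queue policy statement allowing only the producer's topic to send to the consumer queue.
function queuePolicyStatement(queueArn: string, metadata: TopicMetadata) {
  return {
    Effect: 'Allow',
    Principal: { Service: 'sns.amazonaws.com' },
    Action: 'sqs:SendMessage',
    Resource: queueArn,
    Condition: { ArnEquals: { 'aws:SourceArn': metadata.topicArn } },
  };
}

// Example values; the account IDs and ARNs are placeholders.
const ordersTopic: TopicMetadata = {
  domain: 'orders',
  accountId: '111111111111',
  topicArn: 'arn:aws:sns:eu-west-1:111111111111:orders-topic.fifo',
};

const secretName = metadataSecretName(ordersTopic.domain);
const statement = queuePolicyStatement(
  'arn:aws:sqs:eu-west-1:222222222222:stock-orders.fifo',
  ordersTopic,
);
```

On the producer side, the topic policy would typically also need to allow the consumer accounts to subscribe. That half is left out of this sketch.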

Summary

I hope you found this useful. It shows a basic pattern for using Amazon EventBridge as our enterprise service bus, whilst still having guaranteed ordering of events/messages where needed, i.e. the flexible domain events pattern.

Wrapping up 👋

Please go and subscribe on my YouTube channel for similar content!

I would love to connect with you also on any of the following:

https://www.linkedin.com/in/lee-james-gilmore/
https://twitter.com/LeeJamesGilmore

If you enjoyed the posts please follow my profile Lee James Gilmore for further posts/series, and don’t forget to connect and say Hi 👋

Please also use the ‘clap’ feature at the bottom of the post if you enjoyed it! (You can clap more than once!!)

About me

Hi, I’m Lee, an AWS Community Builder, Blogger, AWS certified cloud architect and Enterprise Serverless Architect based in the UK; currently working for City Electrical Factors (UK) & City Electric Supply (US), having worked primarily in full-stack JavaScript on AWS for the past 6 years.

I consider myself a serverless advocate with a love of all things AWS, innovation, software architecture and technology.

*** The information provided represents my own personal views and I accept no responsibility for the use of the information. ***


Global Head of Technology & Architecture | Serverless Advocate | Mentor | Blogger | AWS x 7 Certified 🚀