
Serverless Architecture Layers & DDD (Part 4) — The Data Layer

Talking through the 5 Serverless Architecture Layers in detail, this time covering the Data Layer in part 4.

Serverless Advocate
20 min read · Dec 31, 2022


Contents

⚪ Part 1 — Experience Layer
⚪ Part 2 — Cross-cutting Layer
⚪ Part 3 — Domain Layer
✔️ Part 4 — Data Layer
⚪ Part 5 — Platform Layer

What we will be covering in Part 4:

✔️ Introduction.
✔️ Quick Recap.
✔️ The Data Layer.
✔️ Technical considerations & deep dive.

Introduction

In this article we do a deeper dive into the Data Layer, which is layer four of the five Serverless Architecture Layers for enterprise organisations, all of which are based on domain-driven design (DDD).

👇 Before we go any further — please connect with me on LinkedIn for future blog posts and Serverless news https://www.linkedin.com/in/lee-james-gilmore/

Quick recap

Below is an overview of the five Serverless Architecture Layers which are discussed in the following article:

Example diagram showing the five Serverless Architecture Layers

In the next section let’s cover what DDD is with Serverless as a recap, and why it is so important to the Data Layer.

“The Data Layer is typically delivered by both ‘Complicated Subsystem’ and ‘Platform’ teams in Team Topologies, depending on the type, i.e. ML, AI, BI, ESB etc. Complicated Subsystem teams are described as ‘where significant mathematics/calculation/technical expertise is needed’.

Platform teams are described as ‘a grouping of other team types that provide a compelling internal product to accelerate delivery by Stream-aligned teams’, one of which could be a central ESB.” — Lee Gilmore

A prime example of this would be a single ‘platform’ in the Data Layer allowing for asynchronous communication using events between our microservices. In most enterprise organisations this is underpinned by technologies such as Amazon EventBridge, Kafka, RabbitMQ etc.

What is Domain Driven Design in Serverless?

If you are confident you understand DDD in detail, you can move to the next section. If not, I would suggest reading the article linked below to see how it applies to Serverless and clean code.

The following article covers Domain Driven Design (DDD) with Serverless from a clean code perspective:

The Data Layer 🔌

Note: If you are comfortable with Amazon EventBridge as an Enterprise Service Bus you can skip to the next section on Technical Considerations.

The Data Layer is not represented in the blue book; however, in today’s data- and event-driven world, we need to think about this at an enterprise level. This includes key areas of the enterprise such as our Enterprise Service Bus, Data Lakes, Data Warehousing, Business Intelligence (BI), AI and Machine Learning (ML).

The key area we will focus on in this article is the ESB, or ‘Enterprise Service Bus’, which is the glue between all of our domains and experiences so they can communicate asynchronously.

“…the ‘Enterprise Service Bus’, which is the glue between all of our domains and experiences so they can communicate asynchronously”

An ESB can be described as:

“An enterprise service bus (ESB) implements a communication system between mutually interacting software applications in a service-oriented architecture (SOA). ESB promotes agility and flexibility with regard to high-level protocol communication between applications.” — Wikipedia

An example of an ESB within our Data Layer is shown below:

An example of where the ESB sits in the five layers

The diagram below shows that using this ESB allows our domains and experiences (microservices) to subscribe to domain events and perform some action off the back of it in an eventually consistent manner:

Eventual consistency is a consistency model used in distributed computing to achieve high availability that informally guarantees that, if no new updates are made to a given data item, eventually all accesses to that item will return the last updated value.

An event-driven architecture (or EDA) is an architecture pattern that uses events and eventual consistency to decouple applications. When building modern applications with microservices, event-driven architecture is an effective way to create loosely coupled communication between microservices. Loosely coupled microservices are able to scale and fail independently, increasing the resilience of the application.

What is a Domain Event?

You may ask, what is a ‘Domain Event’? Martin Fowler has his own description of it:

“I go to Babur’s for a meal on Tuesday, and pay by credit card. This might be modelled as an event, whose event type is ‘make purchase’, whose subject is my credit card, and whose occurred date is Tuesday.” — Martin Fowler

I would personally describe a Domain Event as something significant that has happened in a domain (immutable and historic) that other sub domains can subscribe to in order to perform some work. An example could be approving an Order in the Order Domain when notified that a Payment was successful in the Payment Domain.

“I would describe a Domain Event as something significant that has happened in a domain (immutable and historic) that other sub domains can subscribe to in order to perform some work.” — Lee Gilmore

Types of domain events are shown below:

An Event is a change of state within a domain (something that has happened in the past and is immutable). An example is ‘order created’ or ‘invoice generated’. This typically means one or more consumers can react to that event.

A Command is an intent aimed at another domain which results in some output (something that will happen in the future). An example is ‘send email’ or ‘generate pdf’. This is typically a one-to-one mapping, and the producer expects the consumer to deal with retries and failures.

More info: https://martinfowler.com/eaaDev/EventNarrative.html#EventsAndCommands
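
To make the distinction concrete, below is a minimal TypeScript sketch of the two message shapes. The field names, sources and detail types are illustrative assumptions rather than a prescribed standard:

```typescript
// Illustrative shapes only; the field names, source and detail types are
// assumptions rather than a prescribed standard.

// A domain event: something significant that *has* happened, named in the past tense.
export interface OrderCreatedEvent {
  source: 'com.acme.orders';   // the owning domain (hypothetical)
  detailType: 'OrderCreated';  // past tense - an immutable fact
  detail: {
    orderId: string;
    customerId: string;
    totalAmount: number;
    createdAt: string;         // ISO 8601 timestamp of when it occurred
  };
}

// A command: an intent aimed at a single consumer, named in the imperative.
export interface SendOrderConfirmationEmailCommand {
  source: 'com.acme.orders';
  detailType: 'SendOrderConfirmationEmail'; // something that *will* happen
  detail: {
    orderId: string;
    emailAddress: string;
  };
}
```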

AWS and ESB’s = a match made in heaven ❤️

My go-to service for Serverless Architecture Layers is Amazon EventBridge, which is described by AWS as:

Amazon EventBridge is a serverless event bus that lets you receive, filter, transform, route, and deliver events. — AWS

Utilising an enterprise service bus such as Amazon EventBridge means that domains can interact with each other using well-defined, versioned events, and that backend-for-frontend APIs (BFFs) in the experience layer can also raise events which other domains may be interested in.

Amazon EventBridge is perfect for enterprise organisations thanks to the following key features:

✔️ Fully managed and scalable event bus — Amazon EventBridge is a serverless, fully managed, and scalable event bus that allows applications to communicate using events. There is no infrastructure to manage and no capacity to provision.

✔️ Built-in event sources and targets — Amazon EventBridge is directly integrated with over 130 event sources and over 35 targets, including AWS Lambda, Amazon SQS, Amazon SNS, AWS Step Functions, Amazon Kinesis Data Streams, and Amazon Kinesis Data Firehose.

✔️ Schema Registry — The EventBridge schema registry stores event schemas in a registry that other developers can easily search and access in your organisation, so you don’t have to find events and their structure manually.

✔️ Decoupled event publishers and subscribers — Amazon EventBridge makes it easy for you to build event-driven application architectures. Applications or microservices can publish events to the event bus without awareness of subscribers, and can subscribe to events without awareness of the publisher. You can also send events from your own applications to an event bus via the service’s PutEvents API (see the publishing sketch after this list).

✔️ Archive and Replay Events — Event Replay is a feature of Amazon EventBridge that allows you to reprocess past events back to an event bus or a specific EventBridge rule. This feature enables developers to debug their applications quickly, extend them by hydrating targets with historic events, and recover from errors.
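
As mentioned above, publishers send events to a bus via the PutEvents API. Below is a minimal sketch of publishing an ‘OrderCreated’ domain event using the AWS SDK for JavaScript v3; the bus name and event source are assumptions:

```typescript
import { EventBridgeClient, PutEventsCommand } from '@aws-sdk/client-eventbridge';

const client = new EventBridgeClient({});

// Publish an 'OrderCreated' domain event to the (assumed) central bus.
export const publishOrderCreated = async (orderId: string): Promise<void> => {
  await client.send(
    new PutEventsCommand({
      Entries: [
        {
          EventBusName: 'central-event-bus', // assumed bus name
          Source: 'com.acme.orders',         // the owning domain
          DetailType: 'OrderCreated',
          Detail: JSON.stringify({ orderId, createdAt: new Date().toISOString() }),
        },
      ],
    })
  );
};
```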

The following video is a great overview of Amazon EventBridge:

Why do we need an ESB and event-driven systems?

When starting out with Serverless it is fairly easy to build domain services using services like Amazon API Gateway, Amazon DynamoDB and AWS Lambda, and then to compose larger enterprise architectures by calling between them synchronously over HTTPS. However, this:

  1. Increases latency. The end user waits for all of the HTTPS requests to resolve in order.
  2. Is very brittle. The overall architecture becomes tightly coupled, so a failure in one service ripples through the rest.

This is shown in the diagram below:

When there is an issue with one of the downstream services (for example, a database having issues with CPU or memory) we find that everything breaks, as the services are all tightly coupled:

The reason for this is that all of the domain services (microservices) are aware of each other and intrinsically linked, so you get a domino effect when one service has issues. A better approach in domain-driven design is to have your services loosely coupled, communicating only through events asynchronously where possible (as shown below):

This ensures that if one system goes down or has trouble, the events can be reprocessed later when the service comes back online (i.e. eventually consistent and asynchronous), and other domain services are not affected.

This is typically done using dead letter queues (DLQs), where unprocessed events land after a configurable number of retries following errors. They can then be reprocessed safely when the domain service comes back online, and the full flow works as normal again.

You can see from the diagram above that all of the domain services remain online other than the one bottom right, but its failed events are safely kept for re-processing, so your customers are not aware of any issues.
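
As a rough illustration, below is a minimal AWS CDK sketch of a rule target configured with a retry policy and a dead letter queue, so failed events are kept for later reprocessing; the bus name, event pattern and handler are placeholders:

```typescript
import * as cdk from 'aws-cdk-lib';
import * as events from 'aws-cdk-lib/aws-events';
import * as targets from 'aws-cdk-lib/aws-events-targets';
import * as lambda from 'aws-cdk-lib/aws-lambda';
import * as sqs from 'aws-cdk-lib/aws-sqs';
import { Construct } from 'constructs';

export class OrdersConsumerStack extends cdk.Stack {
  constructor(scope: Construct, id: string, props?: cdk.StackProps) {
    super(scope, id, props);

    // Hypothetical consumer of 'OrderCreated' events within a domain account.
    const handler = new lambda.Function(this, 'OrderCreatedHandler', {
      runtime: lambda.Runtime.NODEJS_18_X,
      handler: 'index.handler',
      code: lambda.Code.fromInline('exports.handler = async () => {};'),
    });

    // Events that still fail after the configured retries land here for later replay.
    const dlq = new sqs.Queue(this, 'OrderCreatedDlq');

    const domainBus = events.EventBus.fromEventBusName(this, 'DomainBus', 'orders-domain-bus');

    new events.Rule(this, 'OrderCreatedRule', {
      eventBus: domainBus,
      eventPattern: { source: ['com.acme.orders'], detailType: ['OrderCreated'] },
      targets: [
        new targets.LambdaFunction(handler, {
          deadLetterQueue: dlq,               // unprocessed events are kept safely
          retryAttempts: 3,                   // configurable number of retries
          maxEventAge: cdk.Duration.hours(2), // how long EventBridge keeps retrying
        }),
      ],
    });
  }
}
```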

Note: We should always aim to decouple microservices using events in an eventually consistent manner; however, there are valid reasons where you will need to use synchronous API calls (for example, amending stock values or taking payments). We need to be pragmatic and use the correct communication method.

For more detailed information on the benefits of event-driven systems please read the following article:

Now let’s take a deep dive into the technical considerations within our enterprise organisation and the serverless architecture layers.

Technical considerations & deep dive

We have now covered the need for an Enterprise Service Bus within our Data Layer, why it is the glue between our services using domain events, and why our go-to service should be Amazon EventBridge; so let’s now cover the technical considerations of this within our organisation.

We will cover:

✔️ The ‘Single-bus, multi-account pattern’.
✔️ Security.
✔️ External publishers and consumers.
✔️ No guaranteed ordering.
✔️ Event Schema Validation.
✔️ Idempotency.

The ‘Single-bus, multi-account’ pattern

We have discussed the need for our services in the domain and experience layers to be able to communicate asynchronously through domain events, but when it comes to utilising Amazon EventBridge we have two architecture design options:

  1. Single Bus, multi-account.
  2. Multi-bus, multi-account.

From a Serverless Architecture Layers perspective we would utilise the ‘Single Bus, multi-account’ pattern which is shown below:

https://github.com/aws-samples/amazon-eventbridge-resource-policy-samples/blob/main/patterns/single-bus-multi-account-pattern/README.md

In effect, the AWS account in the middle is our Data Layer ESB platform, consumed by our domains and experiences, with each component in its own AWS account as best practice.

The reason we would use this specific pattern is:

  1. This is a centralised approach, allowing us to subscribe to all events in one place for auditing, logging, security, streaming scenarios etc. (more on this in the Security section below).
  2. A centralised platform team will manage the central event bus operationally, in line with Team Topologies.
  3. Domain and experience services (microservices and teams) send their events to the central event bus (for example ‘Order Created’, ‘Stock Allocated’ etc), which are then distributed to the relevant target domain buses (account to account).
  4. Routing rules live on the central event bus, which means they are managed in one place, giving more governance and less onus on service teams. (We can also use AWS CDK L3/L4 constructs to abstract this away and reduce the cognitive load on teams - more on this in Part 5.)
  5. There is less overhead in managing distributed rules and resource policies.
  6. Developer discovery of event schemas can be done via a centralised Schema Registry in the central event-bus account, which is read-only for teams (more on this later in the validation section).

Note: Regardless of approach, we always need to communicate bus to bus when going cross-account, i.e. we can’t directly invoke a Lambda in, say, account B based off an event published on a bus in account A.

Detailed example

A more detailed example is shown below for a specific scenario:

We can see that:

  1. Account A, which is a Web Store domain, publishes a ‘New Order Created’ event to the Central Event Bus (Account B). The central event bus is our platform in the Data Layer.
  2. The central event bus in Account B has target rules to send the ‘New Order Created’ event to the event bus in Account C (Invoice Processing) and Account D (Forecasting).
  3. The rule in Account C on its own event bus targets a Lambda function for invoice processing.
  4. The rule in Account D on its own event bus targets a Step Function for forecast processing.

Reference: https://github.com/aws-samples/amazon-eventbridge-resource-policy-samples
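
A minimal CDK sketch of how the routing rule in the central EDA account (steps 2 to 4 above) might look is shown below. The account IDs, bus names and event names are placeholders, and the resource policies on the target buses that allow the central account to put events to them are omitted for brevity:

```typescript
import * as cdk from 'aws-cdk-lib';
import * as events from 'aws-cdk-lib/aws-events';
import * as targets from 'aws-cdk-lib/aws-events-targets';
import { Construct } from 'constructs';

// Deployed into the central EDA account (Account B in the example above).
export class CentralBusRoutingStack extends cdk.Stack {
  constructor(scope: Construct, id: string, props?: cdk.StackProps) {
    super(scope, id, props);

    const centralBus = events.EventBus.fromEventBusName(this, 'CentralBus', 'central-event-bus');

    // The domain buses in the consuming accounts (Accounts C and D).
    const invoicingBus = events.EventBus.fromEventBusArn(
      this,
      'InvoicingBus',
      'arn:aws:events:eu-west-1:111111111111:event-bus/invoicing-domain-bus'
    );
    const forecastingBus = events.EventBus.fromEventBusArn(
      this,
      'ForecastingBus',
      'arn:aws:events:eu-west-1:222222222222:event-bus/forecasting-domain-bus'
    );

    // Route 'NewOrderCreated' events from the central bus to both domain buses.
    new events.Rule(this, 'NewOrderCreatedRule', {
      eventBus: centralBus,
      eventPattern: { source: ['com.acme.webstore'], detailType: ['NewOrderCreated'] },
      targets: [new targets.EventBus(invoicingBus), new targets.EventBus(forecastingBus)],
    });
  }
}
```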

How does this look from a Serverless Architecture Layers perspective?

The diagram below shows the architectural pattern with the Serverless Architecture Layers as a bi-directional flow:

Example async communication using our central EDA account between domains and experiences

We can see a simple logical flow in this example above from an experience domain event through to different consuming domains, all via our central event bus. Of course, the communication is bi-directional, and domains and experiences can both publish and consume events via the central bus.

The simplified view of this is shown below where we can see the experiences (integration with the outside world) wrapping around the domains (private to the experiences):

An example of the experiences being the communication to the outside world, wrapping our private domains.

As our domains are well encapsulated and private as shown above (not accessible to the outside world), the communication is typically via a VPC Endpoint to the central bus when publishing and subscribing to events between domains:

Communication between our domain services
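
As a rough sketch of this setup, the CDK snippet below adds an interface VPC endpoint for EventBridge to a domain VPC so that private services can call PutEvents without traversing the public internet; the VPC itself and the construct names are illustrative:

```typescript
import * as cdk from 'aws-cdk-lib';
import * as ec2 from 'aws-cdk-lib/aws-ec2';
import { Construct } from 'constructs';

export class DomainNetworkingStack extends cdk.Stack {
  constructor(scope: Construct, id: string, props?: cdk.StackProps) {
    super(scope, id, props);

    // Hypothetical VPC housing a private domain service.
    const vpc = new ec2.Vpc(this, 'DomainVpc', { maxAzs: 2 });

    // Interface endpoint for the EventBridge 'events' service, so PutEvents
    // calls to the central bus stay on the AWS network.
    vpc.addInterfaceEndpoint('EventBridgeEndpoint', {
      service: ec2.InterfaceVpcEndpointAwsService.CLOUDWATCH_EVENTS,
      privateDnsEnabled: true,
    });
  }
}
```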

If we then look at a cross section of architecture including both synchronous calls using our private domain API Gateways, and our asynchronous communication using domain events via our ESB, we have the following (https://leejamesgilmore.medium.com/serverless-architecture-layers-ddd-part-3-the-domain-layer-43ffce28806f):

Security

As described above, having a centralised event bus allows for collating security-related events globally, such as AWS CloudTrail events, which can be done across Regions into one specific account if required by your Enterprise Security Team:

In the example above, a company has their base of operations located in Asia Pacific (Singapore) with applications distributed across two additional Regions in US East (N. Virginia) and Europe (Frankfurt).

The applications in US East (N. Virginia) and Europe (Frankfurt) are using Amazon EventBridge for their respective applications and services. The security team in Asia Pacific (Singapore) wishes to analyse events from the respective applications as well as receive AWS CloudTrail events for specific API calls made to specific operations to monitor infrastructure security.
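
As a rough sketch of this, the rule below (deployed in a workload account) forwards selected CloudTrail API-call events from the default bus to the security team’s bus; the bus ARN, Region and event source filter are placeholder assumptions:

```typescript
import * as cdk from 'aws-cdk-lib';
import * as events from 'aws-cdk-lib/aws-events';
import * as targets from 'aws-cdk-lib/aws-events-targets';
import { Construct } from 'constructs';

// Deployed into each workload account; forwards selected CloudTrail API-call
// events from the default bus to the security team's bus (placeholder ARN).
export class SecurityForwardingStack extends cdk.Stack {
  constructor(scope: Construct, id: string, props?: cdk.StackProps) {
    super(scope, id, props);

    const securityBus = events.EventBus.fromEventBusArn(
      this,
      'SecurityAuditBus',
      'arn:aws:events:ap-southeast-1:333333333333:event-bus/security-audit-bus'
    );

    new events.Rule(this, 'CloudTrailToSecurityRule', {
      // Matches management API calls recorded by CloudTrail, filtered to IAM calls.
      eventPattern: {
        detailType: ['AWS API Call via CloudTrail'],
        detail: { eventSource: ['iam.amazonaws.com'] },
      },
      targets: [new targets.EventBus(securityBus)],
    });
  }
}
```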

External publishers and consumers

In an enterprise we typically have many integration points with external companies or services (3rd party services, legacy services, on-prem applications etc), where we need to ensure that they can both publish and consume events.

In the synchronous world, through REST and our well-defined APIs, we do this via Integration Experiences as shown below (covered in part 1):

Diagram showing the use of integration experiences — https://leejamesgilmore.medium.com/serverless-architecture-layers-ddd-part-1-the-experience-layer-fc57205153c3

This means that we have an integration experience as an anti-corruption layer between our domain services and external legacy systems, SaaS products, 3rd party suppliers etc.

In the event-driven world (asynchronous) using our central event bus in the Data Layer we have two ways of communicating with other external services:

✔️ Publishing events through the Event Gateway Pattern.

✔️ Consuming events using Amazon EventBridge API Destinations.

Publishing events through the Event Gateway Pattern.

The ‘Event Gateway Pattern’ is essentially putting an Amazon API Gateway REST API in front of your central Amazon EventBridge bus, allowing any legacy or 3rd party publisher to publish their own events via REST, using a direct service integration with Amazon EventBridge.

Example of our external system publishing domain events to our central event bus via API Gateway.

This ensures that these 3rd party systems can always publish to our dedicated central event bus, as almost all modern languages and frameworks allow for interactions using REST and HTTPS.

Domain services and experiences can then consume the domain events as shown above.
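
A minimal CDK sketch of the Event Gateway pattern is shown below, mapping a POST to the API straight onto the EventBridge PutEvents action through a direct service integration. The bus name, event source, detail type and resource path are assumptions, and concerns such as authentication, request validation and error mapping are omitted:

```typescript
import * as cdk from 'aws-cdk-lib';
import * as apigw from 'aws-cdk-lib/aws-apigateway';
import * as events from 'aws-cdk-lib/aws-events';
import * as iam from 'aws-cdk-lib/aws-iam';
import { Construct } from 'constructs';

export class EventGatewayStack extends cdk.Stack {
  constructor(scope: Construct, id: string, props?: cdk.StackProps) {
    super(scope, id, props);

    const centralBus = events.EventBus.fromEventBusName(this, 'CentralBus', 'central-event-bus');

    // Role assumed by API Gateway to call PutEvents on the central bus.
    const integrationRole = new iam.Role(this, 'ApiGatewayEventsRole', {
      assumedBy: new iam.ServicePrincipal('apigateway.amazonaws.com'),
    });
    centralBus.grantPutEventsTo(integrationRole);

    const api = new apigw.RestApi(this, 'EventGatewayApi');

    // Direct service integration: no Lambda between API Gateway and EventBridge.
    const putEventsIntegration = new apigw.AwsIntegration({
      service: 'events',
      action: 'PutEvents',
      options: {
        credentialsRole: integrationRole,
        requestParameters: {
          'integration.request.header.X-Amz-Target': "'AWSEvents.PutEvents'",
          'integration.request.header.Content-Type': "'application/x-amz-json-1.1'",
        },
        // Map the external publisher's JSON body into a PutEvents entry.
        requestTemplates: {
          'application/json': `{
            "Entries": [{
              "EventBusName": "central-event-bus",
              "Source": "com.acme.external-publisher",
              "DetailType": "ExternalEventRaised",
              "Detail": "$util.escapeJavaScript($input.json('$'))"
            }]
          }`,
        },
        integrationResponses: [{ statusCode: '200' }],
      },
    });

    api.root.addResource('events').addMethod('POST', putEventsIntegration, {
      methodResponses: [{ statusCode: '200' }],
    });
  }
}
```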

For a detailed article and example code repository please see the following article, which shows both consuming and publishing events through external services:

Consuming events using Amazon EventBridge API Destinations.

We utilise Amazon EventBridge API Destinations to integrate back out to our external services using REST API calls, allowing these external services to consume our domain events from the Domain Layer:

Example of an external service consuming domain events through API Destinations with Amazon EventBridge.

An API destination uses a Connection to manage the authentication credentials for an external target. This defines the authorisation type used, which can be an API key, OAuth client credentials grant, or a basic username and password.

This technical approach allows us to send asynchronous results back to consumers external to our organisation if required (for example order creation to a legacy on-premise application), and it allows for downstream throttling to protect against surges in traffic, as well as resilience through retries of calls for up to 24 hours (and then finally sent to a DLQ).

This ensures that if the event is not delivered within this timeframe, it is stored durably in an Amazon SQS queue for further processing. This can also be useful if the downstream API experiences an outage for extended periods of time.
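
Below is a minimal CDK sketch of this approach using a Connection and an API Destination as a rule target; the secret name, endpoint URL, bus name and event pattern are placeholder assumptions:

```typescript
import * as cdk from 'aws-cdk-lib';
import * as events from 'aws-cdk-lib/aws-events';
import * as targets from 'aws-cdk-lib/aws-events-targets';
import * as sqs from 'aws-cdk-lib/aws-sqs';
import { Construct } from 'constructs';

export class ApiDestinationStack extends cdk.Stack {
  constructor(scope: Construct, id: string, props?: cdk.StackProps) {
    super(scope, id, props);

    // The Connection holds the external target's credentials (an API key here;
    // OAuth client credentials or basic auth are also supported).
    const connection = new events.Connection(this, 'LegacyErpConnection', {
      authorization: events.Authorization.apiKey(
        'x-api-key',
        cdk.SecretValue.secretsManager('legacy-erp-api-key') // assumed secret name
      ),
    });

    const erpDestination = new events.ApiDestination(this, 'LegacyErpDestination', {
      connection,
      endpoint: 'https://erp.example.com/orders', // placeholder URL
      rateLimitPerSecond: 10, // throttle calls to protect the downstream API
    });

    const centralBus = events.EventBus.fromEventBusName(this, 'CentralBus', 'central-event-bus');

    new events.Rule(this, 'OrderCreatedToErpRule', {
      eventBus: centralBus,
      eventPattern: { source: ['com.acme.orders'], detailType: ['OrderCreated'] },
      targets: [
        new targets.ApiDestination(erpDestination, {
          // Deliveries are retried (for up to 24 hours by default) and then
          // stored durably on the DLQ for further processing.
          deadLetterQueue: new sqs.Queue(this, 'ErpDeliveryDlq'),
        }),
      ],
    });
  }
}
```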

No guaranteed ordering

Amazon EventBridge does not provide ordering guarantees for events (yet), and replays are performed in a multi-threaded manner that might result in events being delivered in a different order from the original.

Below is a summary of the various AWS messaging services with respect to ordering guarantees:

As we are utilising Amazon EventBridge as our central event bus in the Serverless Architecture Layers pattern, we need an approach for dealing with this when we specifically need guaranteed event ordering.

The diagram below shows the ‘Flexible Domain Events’ pattern which combines Amazon EventBridge with Amazon SNS FIFO and Amazon SQS FIFO (Pub/Sub Fan Out) to work around this where needed:

High level diagram showing the ‘Flexible Domain Events’ pattern.

An AWS specific detailed diagram of this is shown below for the Serverless Architecture Layers and DDD:

A detailed example of the architectural approach

In our example above this works by:

✔️ (1) The order is created via Amazon API Gateway.

✔️ (2) The orders are stored in Amazon DynamoDB, and we use streams and a Lambda function to write the changes to an Emitter SNS FIFO Topic using a Change Data Capture (CDC) pattern (a sketch of this CDC handler follows this list).

✔️ (3) The SNS FIFO Topic has cross-account SQS FIFO queue subscribers in different domain accounts, with these messages deduplicated, ordered, and delivered exactly once. (As shown in the diagram this could be a custom L3 CDK construct which abstracts all of this complexity away - more on this in Part 5.)

Note: There is also an internal SQS FIFO queue subscriber for raising the domain events to the organisation shared event bus.

✔️ (4) There is a target rule on the shared event bus which targets the other domain event buses with the Order Events, i.e. in this example the Stock Domain. (These are unordered events.)

✔️ (5/6) The cross-account ordered messages go to the Comms Domain and Stock Domain from SNS to SQS. (The subscription to the SNS FIFO Topic can be done using a CDK custom construct allowing a lookup on Domain(s) metadata, abstracting away the complexities from developers)

✔️ (7/8) The Comms Domain and Stock Domain both utilise the ordered deduplicated messages to populate their own domain read stores or perform some processing. The SQS queue has a batch size of one which the Lambda consumes, and a DLQ for any errors.
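
As a rough sketch of step 2, the Lambda handler below consumes the DynamoDB stream and publishes each change to the emitter SNS FIFO topic, using the order id as the MessageGroupId to preserve per-order ordering; the topic ARN environment variable and item shape are assumptions:

```typescript
import { DynamoDBStreamEvent } from 'aws-lambda';
import { SNSClient, PublishCommand } from '@aws-sdk/client-sns';
import { unmarshall } from '@aws-sdk/util-dynamodb';

const sns = new SNSClient({});

// Hypothetical CDC handler on the Orders table stream (step 2 above):
// every change is published to the emitter SNS FIFO topic, ordered per order.
export const handler = async (event: DynamoDBStreamEvent): Promise<void> => {
  for (const record of event.Records) {
    const newImage = record.dynamodb?.NewImage;
    if (!newImage) continue; // e.g. REMOVE records with no new image

    const order = unmarshall(newImage as any);

    await sns.send(
      new PublishCommand({
        TopicArn: process.env.EMITTER_TOPIC_ARN, // assumed environment variable
        Message: JSON.stringify(order),
        MessageGroupId: order.orderId,           // preserves ordering per order
        MessageDeduplicationId: record.eventID ?? order.orderId, // deduplication
      })
    );
  }
};
```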

For a detailed explanation of the ‘Flexible Domain Events’ pattern with accompanying code example see the following article:

Event schema validation

We should enhance the basic Amazon EventBridge schemas generated through schema discovery in our enterprise to give true ‘publisher’ and ‘consumer’ schema validation using OpenAPI 3.0, allowing us to take advantage of regexes (patterns), min and max integer values, enums and more. We also use a shared production EDA account to make our versioned schemas searchable by all consumers.

Note: The diagrams below have the bus to bus cross-account communication removed for brevity.

We should validate events when both publishing and consuming events as shown below:

As you can see from above, when using a central EDA account and Serverless Architecture Layers, if the producer is not a ‘good event citizen’ they may create malformed events that are then distributed across many, many consumers.

This then potentially causes 1..n consumers to fail validation and be unable to process those events (and what then?).

When consuming events across AWS accounts we should also validate the events as shown below:

As you can see from the diagram above, in an enterprise with many distributed services and accounts, we don’t actually know where the event originated (other than that it was routed from the central event bus).

When validating both the producing and consuming of events cross account we end up with the following:

As we have the versioned event schemas stored in a centralised AWS account in production they are easy to find and pull into consuming repos from a developer experience perspective.
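
As an illustration of what consumer-side validation could look like, the sketch below validates an incoming event detail against a JSON Schema using the ajv library (OpenAPI 3.0 schema objects are largely JSON Schema compatible). The schema, field names and patterns are illustrative assumptions rather than the actual published schema:

```typescript
import Ajv from 'ajv';

// Consumer-side guard: validate the event detail against the (assumed) enhanced
// 'OrderCreated' schema pulled from the central schema registry account.
const ajv = new Ajv({ allErrors: true });

const orderCreatedSchema = {
  type: 'object',
  required: ['orderId', 'customerId', 'totalAmount'],
  properties: {
    orderId: { type: 'string', pattern: '^ORD-[0-9]{8}$' },    // regex (pattern)
    customerId: { type: 'string' },
    totalAmount: { type: 'number', minimum: 0 },                // min/max values
    currency: { type: 'string', enum: ['GBP', 'USD', 'EUR'] },  // enums
  },
  additionalProperties: false,
};

const validateOrderCreated = ajv.compile(orderCreatedSchema);

export const assertValidOrderCreated = (detail: unknown): void => {
  if (!validateOrderCreated(detail)) {
    // In practice this would route the bad event to a DLQ / alerting,
    // rather than simply throwing.
    throw new Error(`Invalid OrderCreated event: ${ajv.errorsText(validateOrderCreated.errors)}`);
  }
};
```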

In the Serverless Architecture Layers approach, with a centralised EDA account and Schema Discovery turned on in non-production accounts, we can allow distributed teams to consume the enhanced versioned event schemas from the production account as shown below:

Multiple distributed teams can consume the versioned event schemas

The following article is a deep dive into this technical approach with an accompanying code repository:

Idempotency

Idempotency is the property of an operation whereby it can be applied multiple times without changing the result beyond the initial application. You can run an idempotent operation safely multiple times without any side effects like duplicates or inconsistent data.

When working with Serverless Architecture layers and Amazon EventBridge in the Data Layer ‘Idempotency’ becomes a key consideration.

“When working with Serverless Architecture layers and Amazon EventBridge in the Data Layer ‘Idempotency’ becomes a key consideration.”

Applied to Lambda, a function is idempotent when it can be invoked multiple times with the same event with no risk of side effects. To make a function idempotent, it must first identify that an event has already been processed. Therefore, it must extract a unique identifier, called an “idempotency key”.

Amazon EventBridge supports at-least-once semantics with respect to its targets. We need to ensure that operations are idempotent or perform deduplication after the event has been sent to the target.

This idempotency key may be in the event payload (for example, orderId), a combination of multiple fields in the payload (for example, customerId and orderId), or a dedicated idempotency key derived from the payload itself (for example, a deterministic UUID v5 of the stringified JSON payload).

The function then checks in a persistence layer (for example, Amazon DynamoDB or Amazon ElastiCache):

  • If the key is not there, then the Lambda function can proceed normally, perform the transaction, and save the idempotency key in the persistence layer. You can potentially add the result of the function in the persistence layer too, so that subsequent calls can retrieve this result directly.
  • If the key is there, then the function can return and avoid applying the transaction again.

Source: https://aws.amazon.com/blogs/compute/handling-lambda-functions-idempotency-with-aws-lambda-powertools/
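
A minimal sketch of this check is shown below, using a DynamoDB conditional write as the persistence layer; the table name, key attribute and TTL handling are assumptions, and in practice you may prefer the AWS Lambda Powertools idempotency utility referenced above:

```typescript
import { DynamoDBClient } from '@aws-sdk/client-dynamodb';
import { DynamoDBDocumentClient, PutCommand } from '@aws-sdk/lib-dynamodb';

const ddb = DynamoDBDocumentClient.from(new DynamoDBClient({}));

// Returns true if this is the first time the idempotency key has been seen,
// or false if the event has already been processed (so the handler returns early).
export const acquireIdempotencyLock = async (idempotencyKey: string): Promise<boolean> => {
  try {
    await ddb.send(
      new PutCommand({
        TableName: process.env.IDEMPOTENCY_TABLE, // assumed table name
        Item: {
          pk: idempotencyKey,
          processedAt: new Date().toISOString(),
          expiresAt: Math.floor(Date.now() / 1000) + 24 * 60 * 60, // TTL attribute
        },
        // The write only succeeds if no record with this key exists yet.
        ConditionExpression: 'attribute_not_exists(pk)',
      })
    );
    return true;
  } catch (error) {
    if (error instanceof Error && error.name === 'ConditionalCheckFailedException') {
      return false; // duplicate delivery, so skip applying the transaction again
    }
    throw error;
  }
};
```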

Summary

So let’s summarise the need for the Data Layer when it comes to Serverless Architecture Layers, and why Amazon EventBridge is the most important shared component and is key to your enterprise:

✔️ An Enterprise Service Bus (ESB) allows us to coordinate across teams in a business.
✔️ Our ESB allows us to decouple our Domain Services across our enterprise.
✔️ Amazon EventBridge Schema Registry allows other teams to view our versioned events and consume them (easy to discover, reducing cognitive load and enhancing communication)
✔️ We can report on the data and events by streaming them to data lakes or warehouses, and consuming them using tools such as Amazon Athena and Amazon QuickSight.
✔️ We can use sentiment analysis and events to build up single customer views.
✔️ An ESB reduces the cognitive load on teams and increases their speed and agility.

Wrapping up 👋

Please go and subscribe on my YouTube channel for similar content!

I would love to connect with you also on any of the following:

https://www.linkedin.com/in/lee-james-gilmore/
https://twitter.com/LeeJamesGilmore

If you enjoyed the posts please follow my profile Lee James Gilmore for further posts/series, and don’t forget to connect and say Hi 👋

Please also use the ‘clap’ feature at the bottom of the post if you enjoyed it! (You can clap more than once!!)

About me

Hi, I’m Lee, an AWS Community Builder, Blogger, AWS certified cloud architect and Global Enterprise Serverless Architect based in the UK; currently working for City Electrical Factors (UK) & City Electric Supply (US), having worked primarily in full-stack JavaScript on AWS for the past 6 years.

I consider myself a serverless advocate with a love of all things AWS, innovation, software architecture and technology.

*** The information provided are my own personal views and I accept no responsibility on the use of the information. ***

You may also be interested in the following:
