Photo by Adeolu Eletu on Unsplash

Serverless B2B Authentication — Part 2 🚀

An example of further productionising our solution from Part 1, which uses API Gateway, EventBridge API Destinations, WAF and Cognito to onboard and integrate with external customers, focusing on security, scalability and fault tolerance. The example is written in the CDK and TypeScript.

Introduction

In the second part of this series we are going to look at further productionising our Part 1 solution, adding some key elements to make it more secure and resilient to faults. Part 1 of the article can be found using the link below:

This is what we will be building out in the basic example solution

Before we go any further — please connect with me on LinkedIn for future blog posts and Serverless news:

Quick recap

As a quick recap, the following scenario is what we are building out our solution for:

🚘 Lee James Luxury Cars. This company manufactures cars, and as part of that process needs to acquire tires from a third-party supplier. When a car order is created with a status of OrderSubmitted, it creates a tire order and waits for the tire order to be completed before completing the overall car order.

🛞 LJG Tires. This company supplies car manufacturers with tires, and is called by Lee James Luxury Cars to procure tires for their cars. When a tire order is complete, a callback to the Car Orders API completes the overall car order.

⚠️ Note: The code and architecture are not production quality and are only to help illustrate some architectural concepts, code and services discussed in the article.

You can pull down the code using the link below:

Let’s look at what services we are using in the next section.

Which AWS services are we using? 💭

Let’s see which AWS services we are using in Part 2 when productionising the solution:

✔️ AWS VPC + NAT Gateway + Elastic IP

In this solution we add the following components to our Car Orders Domain Service so that all outbound traffic routed from it has a static IP address, which then allows the Tire Orders Domain Service to whitelist that traffic (i.e. block all IPs consuming its API other than the Car Orders domain).

🔵 VPC

Amazon Virtual Private Cloud (Amazon VPC) enables you to launch AWS resources into a virtual network that you’ve defined. This virtual network closely resembles a traditional network that you’d operate in your own data centre, with the benefits of using the scalable infrastructure of AWS.

🔵 NAT Gateway

A NAT gateway is a Network Address Translation (NAT) service. You can use a NAT gateway so that instances in a private subnet can connect to services outside your VPC but external services cannot initiate a connection with those instances.

🔵 Elastic IP

An Elastic IP address is a static IPv4 address designed for dynamic cloud computing. An Elastic IP address is allocated to your AWS account, and is yours until you release it.

✔️ AWS WAF (Web Application Firewall)

AWS WAF is a web application firewall that helps protect your web applications or APIs against common web exploits and bots that may affect availability, compromise security, or consume excessive resources. AWS WAF gives you control over how traffic reaches your applications by enabling you to create security rules that block or allow traffic; in our example we use it solely for whitelisting.

✔️ DynamoDB P.I.T.R (Point In Time Recovery)

You can create on-demand backups of your Amazon DynamoDB tables, or you can enable continuous backups using point-in-time recovery.

Point-in-time recovery helps protect your DynamoDB tables from accidental write or delete operations. With point-in-time recovery, you don’t have to worry about creating, maintaining, or scheduling on-demand backups. For example, suppose that a test script writes accidentally to a production DynamoDB table. With point-in-time recovery, you can restore that table to any point in time during the last 35 days. DynamoDB maintains incremental backups of your table.

✔️ EventBridge Archive and Replay

In Amazon EventBridge, you can create an archive of events so that you can easily replay them at a later time. For example, you might want to replay events to recover from errors or to validate new functionality in your application.

✔️ API Gateway Resource Policy

Amazon API Gateway resource policies are JSON policy documents that you attach to an API to control whether a specified principal (typically an IAM user or role) can invoke the API. You can use API Gateway resource policies to allow your API to be securely invoked by:

  • Users from a specified AWS account.
  • Specified source IP address ranges or CIDR blocks.
  • Specified virtual private clouds (VPCs) or VPC endpoints (in any account).

You can use resource policies for all API endpoint types in API Gateway: private, edge-optimized, and Regional.

In the next section let’s look at what we are building out specifically.

What are we building? 🔩

We are going to extend and amend the solution from Part 1 as follows:

Some of the additions are highlighted with red circles above

We have taken the example we created in Part 1 of the series and have made the following amendments:

1️⃣ Car Order Domain. We don’t want all of our Lambdas scaling out and each generating their own access token to consume the Tire Order Domain, so we have a Lambda on a CRON schedule which generates a single access token and pushes it to Parameter Store before the previous one expires. Each Lambda can then use that token as it scales out.
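As a minimal sketch of what the token-generator Lambda might do, assuming a Cognito app client with the client credentials grant enabled, the code below builds the request for the Cognito `/oauth2/token` endpoint and decides whether the stored token is close enough to expiry to refresh. The function names, scope value and refresh buffer are hypothetical, and the step that writes the token to Parameter Store is left out.

```typescript
// Hypothetical helpers for a scheduled "token generator" Lambda.

interface TokenRequest {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

// Builds an OAuth 2.0 client credentials request for a Cognito token endpoint,
// e.g. https://<your-domain>.auth.<region>.amazoncognito.com/oauth2/token
export function buildClientCredentialsRequest(
  tokenEndpoint: string,
  clientId: string,
  clientSecret: string,
  scope: string,
): TokenRequest {
  const body = new URLSearchParams({ grant_type: "client_credentials", scope }).toString();
  return {
    url: tokenEndpoint,
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/x-www-form-urlencoded",
        // HTTP Basic auth using the app client's id and secret
        Authorization: `Basic ${btoa(`${clientId}:${clientSecret}`)}`,
      },
      body,
    },
  };
}

// Returns true when the stored token expires within the safety buffer,
// meaning the CRON Lambda should fetch and store a new one.
export function shouldRefresh(expiresAtMs: number, nowMs: number, bufferMs: number): boolean {
  return expiresAtMs - nowMs <= bufferMs;
}
```

With this approach the CRON schedule only needs to run more often than the token lifetime minus the buffer, so a fresh token is always in Parameter Store before the old one expires.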

2️⃣ Car Order Domain. A user can call the create car order endpoint (POST on /orders), which invokes a Lambda within the VPC that writes the car order to DynamoDB. We have enabled P.I.T.R on the table to make us more resilient to faults. The Lambda also uses the access token and an API Key from its own Usage Plan to call the Tires Order API and create a tire order for the car.
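A sketch of how the create car order Lambda might attach both credentials when calling the Tires Order API. It assumes API Gateway defaults: the Cognito authorizer reads the token from the `Authorization` header, and API keys are sent in the `x-api-key` header. The payload shape and function name are hypothetical; in practice the token would be read from Parameter Store and the request sent with `fetch`.

```typescript
// Hypothetical shape of the tire order payload.
interface TireOrder {
  carOrderId: string;
  quantity: number;
}

export function buildTireOrderRequest(
  apiBaseUrl: string,
  accessToken: string,
  apiKey: string,
  order: TireOrder,
) {
  return {
    // Strip any trailing slash before appending the resource path.
    url: `${apiBaseUrl.replace(/\/$/, "")}/orders`,
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        // Validated by the Cognito authorizer (default identity source: Authorization header).
        Authorization: accessToken,
        // Checked by API Gateway against the API key's usage plan.
        "x-api-key": apiKey,
      },
      body: JSON.stringify(order),
    },
  };
}
```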

⚠️ Note: we have now ensured that our Lambdas sit in a private subnet, and all outbound traffic is routed via a NAT Gateway and Elastic IP so that we have static IP address(es). We have also introduced VPC Flow Logs so we can inspect traffic moving in and out of our VPC. (In an ideal world we would have done this for both solutions.)

3️⃣ Tire Order Domain. We have added a WAF (firewall) to our API which allows us to whitelist (only allow) traffic from our Car Order Domain. We also have layered security, as we have added a resource policy to the API which only allows traffic from the same source IP address.

Amazon Cognito ensures that a valid access token is sent on the API request, and API Gateway validates the API Key against its usage plan.

4️⃣ Tire Order Domain. The API call invokes a Lambda which in turn writes the tire order to a DynamoDB table.

5️⃣ Tire Order Domain. The DynamoDB table for tire orders has P.I.T.R on it as we had with the Car Orders domain above.

6️⃣ Tire Order Domain. A Lambda on a CRON completes the tire orders asynchronously (as if the work took a period of time), and the Lambda subsequently updates the tire order as complete, and raises an EventBridge event to notify consumers of this change.
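The event raised by the completion Lambda could look like the entry below, which would then be sent with `PutEventsCommand` from `@aws-sdk/client-eventbridge` (the SDK call is omitted). The source, detail-type and payload fields are hypothetical names chosen for illustration.

```typescript
// Hypothetical event source and detail-type names.
const EVENT_SOURCE = "ljg.tires.orders";
const DETAIL_TYPE = "TireOrderCompleted";

interface CompletedTireOrder {
  tireOrderId: string;
  carOrderId: string;
  completedAt: string; // ISO 8601 timestamp
}

// Mirrors the fields of an EventBridge PutEvents request entry.
interface PutEventsEntry {
  EventBusName: string;
  Source: string;
  DetailType: string;
  Detail: string; // EventBridge expects the detail as a JSON string
}

export function buildTireOrderCompletedEntry(
  eventBusName: string,
  order: CompletedTireOrder,
): PutEventsEntry {
  return {
    EventBusName: eventBusName,
    Source: EVENT_SOURCE,
    DetailType: DETAIL_TYPE,
    Detail: JSON.stringify(order),
  };
}
```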

7️⃣ Tire Order Domain. An EventBridge rule target uses API Destinations to generate an access token for the Car Orders domain service, and automatically calls it to complete the overall car order.
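A CDK sketch of this step, assuming the Car Orders API is protected by a Cognito user pool with a client credentials app client. The token endpoint, client id, secret name, scope, endpoint URL, HTTP method and event names are all hypothetical placeholders. The connection lets EventBridge obtain the access token itself, so no Lambda is needed on the Tires side for the callback.

```typescript
import { App, SecretValue, Stack, StackProps } from "aws-cdk-lib";
import * as events from "aws-cdk-lib/aws-events";
import * as targets from "aws-cdk-lib/aws-events-targets";
import { Construct } from "constructs";

export class TireOrdersEventsStack extends Stack {
  constructor(scope: Construct, id: string, props?: StackProps) {
    super(scope, id, props);

    const bus = new events.EventBus(this, "TireOrdersBus");

    // OAuth client credentials connection to the Car Orders Cognito token endpoint (hypothetical values).
    const connection = new events.Connection(this, "CarOrdersConnection", {
      authorization: events.Authorization.oauth({
        authorizationEndpoint: "https://car-orders-example.auth.eu-west-1.amazoncognito.com/oauth2/token",
        clientId: "hypothetical-client-id",
        clientSecret: SecretValue.secretsManager("car-orders/client-secret"),
        httpMethod: events.HttpMethod.POST,
        bodyParameters: {
          grant_type: events.HttpParameter.fromString("client_credentials"),
          scope: events.HttpParameter.fromString("car-orders/orders.write"),
        },
      }),
    });

    // The '*' in the endpoint is replaced by the first path parameter value on the target.
    const destination = new events.ApiDestination(this, "CarOrdersApiDestination", {
      connection,
      endpoint: "https://car-orders.example.com/prod/orders/*",
      httpMethod: events.HttpMethod.PATCH,
    });

    // Route "tire order completed" events to the Car Orders API.
    new events.Rule(this, "TireOrderCompletedRule", {
      eventBus: bus,
      eventPattern: { source: ["ljg.tires.orders"], detailType: ["TireOrderCompleted"] },
      targets: [new targets.ApiDestination(destination, { pathParameterValues: ["$.detail.carOrderId"] })],
    });
  }
}

const app = new App();
new TireOrdersEventsStack(app, "TireOrdersEventsStack");
```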

8️⃣ Car Order Domain. We swapped out the API Key from Part 1 for Amazon Cognito (client credentials grant flow), which allows us to validate the access token that API Destinations provides from the tire order account.
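On the Car Orders side, a minimal CDK sketch could look like the following. The domain prefix, resource server identifier, scope name and `PATCH /orders/{id}` route are hypothetical, and a `MockIntegration` stands in for the Lambda that completes the car order.

```typescript
import { App, CfnOutput, Stack, StackProps } from "aws-cdk-lib";
import * as apigw from "aws-cdk-lib/aws-apigateway";
import * as cognito from "aws-cdk-lib/aws-cognito";
import { Construct } from "constructs";

export class CarOrdersAuthStack extends Stack {
  constructor(scope: Construct, id: string, props?: StackProps) {
    super(scope, id, props);

    const userPool = new cognito.UserPool(this, "CarOrdersUserPool");
    // Hosted domain that serves the /oauth2/token endpoint (prefix must be globally unique).
    userPool.addDomain("CarOrdersDomain", { cognitoDomain: { domainPrefix: "car-orders-example" } });

    const writeScope = new cognito.ResourceServerScope({
      scopeName: "orders.write",
      scopeDescription: "Complete car orders",
    });
    const resourceServer = userPool.addResourceServer("CarOrdersResourceServer", {
      identifier: "car-orders",
      scopes: [writeScope],
    });

    // Machine-to-machine app client used by the Tires domain's API Destination connection.
    const tiresClient = userPool.addClient("TiresAppClient", {
      generateSecret: true,
      oAuth: {
        flows: { clientCredentials: true },
        scopes: [cognito.OAuthScope.resourceServer(resourceServer, writeScope)],
      },
    });

    const api = new apigw.RestApi(this, "CarOrdersApi");
    const authorizer = new apigw.CognitoUserPoolsAuthorizer(this, "CarOrdersAuthorizer", {
      cognitoUserPools: [userPool],
    });

    // Only access tokens carrying the required scope can call this method.
    api.root.addResource("orders").addResource("{id}").addMethod("PATCH", new apigw.MockIntegration(), {
      authorizer,
      authorizationType: apigw.AuthorizationType.COGNITO,
      authorizationScopes: ["car-orders/orders.write"],
    });

    new CfnOutput(this, "TiresAppClientId", { value: tiresClient.userPoolClientId });
  }
}

const app = new App();
new CarOrdersAuthStack(app, "CarOrdersAuthStack");
```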

9️⃣ Car Order Domain. Finally, the Lambda is invoked by API Gateway and an update is made to the DynamoDB table to update the overall car order as complete.

Let’s talk through some key code in the next section.

Talking through key code 💬

OK, so we have now discussed the overall architecture and the services used, so now let’s talk through the more interesting pieces of code.

✔️ NAT Gateway static Elastic IP for whitelisting

The code below shows adding the NAT Gateway to our Car Orders domain which can be seen here:
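A minimal CDK sketch of this, assuming two AZs, a single NAT Gateway, and Lambdas placed in `PRIVATE_WITH_EGRESS` subnets. Stack and construct names are hypothetical. It also finds the Elastic IP(s) the VPC construct allocates for its NAT Gateway(s) and outputs them, which is one way to do the "output the value(s)" step.

```typescript
import { App, CfnOutput, Stack, StackProps } from "aws-cdk-lib";
import * as ec2 from "aws-cdk-lib/aws-ec2";
import { Construct } from "constructs";

export class CarOrdersNetworkStack extends Stack {
  public readonly vpc: ec2.Vpc;

  constructor(scope: Construct, id: string, props?: StackProps) {
    super(scope, id, props);

    // Two AZs but only one NAT Gateway to keep costs down (not highly available).
    this.vpc = new ec2.Vpc(this, "CarOrdersVpc", {
      maxAzs: 2,
      natGateways: 1,
      subnetConfiguration: [
        { name: "public", subnetType: ec2.SubnetType.PUBLIC, cidrMask: 24 },
        // Lambdas go here; their outbound traffic leaves via the NAT Gateway.
        { name: "private", subnetType: ec2.SubnetType.PRIVATE_WITH_EGRESS, cidrMask: 24 },
      ],
    });

    // The VPC construct creates an Elastic IP per NAT Gateway; find them all.
    const eips = this.vpc.node
      .findAll()
      .filter((child): child is ec2.CfnEIP => child instanceof ec2.CfnEIP);

    // Ref of an EIP resolves to its public IP address.
    eips.forEach((eip, index) => {
      new CfnOutput(this, `NatGatewayElasticIp${index}`, { value: eip.ref });
    });
  }
}

const app = new App();
new CarOrdersNetworkStack(app, "CarOrdersNetworkStack");
```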

⚠️ Note: In our example, to save cost, we have only created one NAT Gateway across two AZs, although in production we would use a minimum of three AZs with a NAT Gateway in each to make this highly available. There is also some work in the CDK to get the Elastic IP of the NAT Gateway(s) and output the value(s) from the stack so we can pull them into the Tires Company stack.

We can see from the console that our NAT Gateway in the Car Orders domain has a static Elastic IP, shown below (in our case it is 46.51.181.75):

This means that all calls coming from our Car Orders domain will have the same static IP address, which can then be whitelisted in the Tire Order domain. We can see in the ‘order-stock’ Lambda in the Tire Order domain that the event shows the source IP:

Which is from the following line in the Lambda code:

console.log(`sourceIp: ${event.requestContext.identity.sourceIp}`);
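For context, a minimal sketch of where that line might sit in the handler. The local event type only covers the fields used here (a real project would typically use `APIGatewayProxyEvent` from `@types/aws-lambda`), and the status code and response body are placeholders.

```typescript
// Minimal local types covering only the fields used in this sketch.
interface ProxyEventLike {
  body: string | null;
  requestContext: { identity: { sourceIp: string } };
}

interface ProxyResultLike {
  statusCode: number;
  body: string;
}

export async function handler(event: ProxyEventLike): Promise<ProxyResultLike> {
  // With Car Orders traffic leaving via its NAT Gateway, this is that gateway's Elastic IP.
  console.log(`sourceIp: ${event.requestContext.identity.sourceIp}`);

  // ... create the tire order here (omitted) ...
  return { statusCode: 201, body: JSON.stringify({ received: true }) }; // placeholder response
}
```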

We can then subsequently block all traffic other than this IP address using Resource Policies and WAF when working with external consumers i.e. the Car Order Domain.

✔️ Resource Policy for whitelisting the Car Orders domain

We can then add a Resource Policy to our Tires Company API which ensures that the only traffic allowed through is from the Car Orders domain's static IP address(es).

⚠️ Note: We could have done this on both domain services but this is to discuss the concepts only with some high level code.

Example code for adding a resource policy to our Tires API stating that traffic can only come from the Car Orders Domain

When we put in a fictitious IP address as the Car Orders domain's source IP, we get the following error in the logs when trying to create a new car order (the Tires Orders API denies access because the source IP is not valid):
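A minimal CDK sketch of such a policy, assuming a REST API and using the Car Orders Elastic IP from earlier as the allowed CIDR. It follows a common pattern: one statement allows `execute-api:Invoke`, and a second explicitly denies any caller whose `aws:SourceIp` is not in the allow list, which is why the rejection is reported as an explicit deny. Stack names and the placeholder `MockIntegration` are hypothetical.

```typescript
import { App, Stack, StackProps } from "aws-cdk-lib";
import * as apigw from "aws-cdk-lib/aws-apigateway";
import * as iam from "aws-cdk-lib/aws-iam";
import { Construct } from "constructs";

interface TiresApiStackProps extends StackProps {
  // Car Orders NAT Gateway Elastic IP(s) in CIDR form, e.g. "46.51.181.75/32".
  allowedCidrs: string[];
}

export class TiresApiStack extends Stack {
  constructor(scope: Construct, id: string, props: TiresApiStackProps) {
    super(scope, id, props);

    const policy = new iam.PolicyDocument({
      statements: [
        // Allow invocation in general...
        new iam.PolicyStatement({
          effect: iam.Effect.ALLOW,
          principals: [new iam.AnyPrincipal()],
          actions: ["execute-api:Invoke"],
          resources: ["execute-api:/*"],
        }),
        // ...then explicitly deny any caller whose source IP is not on the allow list.
        new iam.PolicyStatement({
          effect: iam.Effect.DENY,
          principals: [new iam.AnyPrincipal()],
          actions: ["execute-api:Invoke"],
          resources: ["execute-api:/*"],
          conditions: { NotIpAddress: { "aws:SourceIp": props.allowedCidrs } },
        }),
      ],
    });

    const api = new apigw.RestApi(this, "TiresApi", { policy });
    // Placeholder integration; in the article this invokes the order-stock Lambda.
    api.root.addResource("orders").addMethod("POST", new apigw.MockIntegration());
  }
}

const app = new App();
new TiresApiStack(app, "TiresApiStack", { allowedCidrs: ["46.51.181.75/32"] });
```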

Message: 'User: anonymous is not authorized to perform: execute-api:Invoke on resource: arn:aws:execute-api:eu-west-1:********6744:xxxxxx/prod/POST/orders with an explicit deny'

⚠️ Note: We have also added the whitelisting to the WAF associated with the Tires API to show both working in tandem.

✔️ VPC Flow Logs for the Car Orders domain

The following code means that we log all of our traffic within our VPC to VPC Flow Logs in CloudWatch:

Example of adding VPC Flow Logs to our Car Orders domain
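A minimal CDK sketch, assuming the flow logs go to a dedicated CloudWatch log group with one month's retention (both assumptions, as are the construct names). When no IAM role is passed, `FlowLogDestination.toCloudWatchLogs` creates one that allows VPC Flow Logs to write to the log group.

```typescript
import { App, Stack, StackProps } from "aws-cdk-lib";
import * as ec2 from "aws-cdk-lib/aws-ec2";
import * as logs from "aws-cdk-lib/aws-logs";
import { Construct } from "constructs";

export class CarOrdersFlowLogsStack extends Stack {
  constructor(scope: Construct, id: string, props?: StackProps) {
    super(scope, id, props);

    const vpc = new ec2.Vpc(this, "CarOrdersVpc", { maxAzs: 2, natGateways: 1 });

    const logGroup = new logs.LogGroup(this, "CarOrdersVpcFlowLogs", {
      retention: logs.RetentionDays.ONE_MONTH, // retention period is an assumption
    });

    // Capture both accepted and rejected traffic for the network interfaces in the VPC.
    vpc.addFlowLog("CarOrdersFlowLog", {
      destination: ec2.FlowLogDestination.toCloudWatchLogs(logGroup),
      trafficType: ec2.FlowLogTrafficType.ALL,
    });
  }
}

const app = new App();
new CarOrdersFlowLogsStack(app, "CarOrdersFlowLogsStack");
```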

Flow logs can help you with a number of tasks, such as:

  • Diagnosing overly restrictive security group rules.
  • Monitoring the traffic that is reaching your instance.
  • Determining the direction of the traffic to and from the network interfaces.

Flow log data is collected outside of the path of your network traffic, and therefore does not affect network throughput or latency. You can create or delete flow logs without any risk of impact to network performance.

From a security perspective, we can now track all traffic coming through our VPC:

Example from the console of our VPC Flow Logs setup

✔️ DynamoDB P.I.T.R for our Tires Order domain

We add DynamoDB Point In Time Recovery to our DynamoDB table in the Tires Order Domain so we are able to recover our database if it becomes corrupt:

Example of adding P.I.T.R to our table in the Tires Order Domain
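A minimal CDK sketch, assuming an on-demand table keyed on a hypothetical `id` attribute; the relevant setting is `pointInTimeRecovery: true`.

```typescript
import { App, RemovalPolicy, Stack, StackProps } from "aws-cdk-lib";
import * as dynamodb from "aws-cdk-lib/aws-dynamodb";
import { Construct } from "constructs";

export class TireOrdersTableStack extends Stack {
  constructor(scope: Construct, id: string, props?: StackProps) {
    super(scope, id, props);

    new dynamodb.Table(this, "TireOrdersTable", {
      partitionKey: { name: "id", type: dynamodb.AttributeType.STRING }, // key name is an assumption
      billingMode: dynamodb.BillingMode.PAY_PER_REQUEST,
      // Continuous backups, so the table can be restored to a point within the recovery window.
      pointInTimeRecovery: true,
      // Keep the table if the stack is deleted.
      removalPolicy: RemovalPolicy.RETAIN,
    });
  }
}

const app = new App();
new TireOrdersTableStack(app, "TireOrdersTableStack");
```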


✔️ AWS WAF for our Tires Orders domain

We have created a WAF (firewall) with a rule and an IP set, associated with our API Gateway, which blocks all traffic except from our one static source IP address, i.e. the Car Orders domain:
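A minimal CDK sketch using the L1 `wafv2` constructs, assuming a REST API and the Car Orders Elastic IP as the only allowed address. The web ACL blocks by default and a single rule allows requests matching the IP set; construct and metric names are hypothetical.

```typescript
import { App, Stack, StackProps } from "aws-cdk-lib";
import * as apigw from "aws-cdk-lib/aws-apigateway";
import * as wafv2 from "aws-cdk-lib/aws-wafv2";
import { Construct } from "constructs";

export class TiresWafStack extends Stack {
  constructor(scope: Construct, id: string, props?: StackProps) {
    super(scope, id, props);

    const api = new apigw.RestApi(this, "TiresApi");
    api.root.addResource("orders").addMethod("POST", new apigw.MockIntegration()); // placeholder

    // IP set holding the Car Orders NAT Gateway Elastic IP(s).
    const carOrdersIps = new wafv2.CfnIPSet(this, "CarOrdersIpSet", {
      scope: "REGIONAL", // API Gateway stages use regional web ACLs
      ipAddressVersion: "IPV4",
      addresses: ["46.51.181.75/32"],
    });

    const visibility = (metricName: string) => ({
      cloudWatchMetricsEnabled: true,
      metricName,
      sampledRequestsEnabled: true,
    });

    // Block by default; only requests matching the IP set are allowed.
    const webAcl = new wafv2.CfnWebACL(this, "TiresApiWebAcl", {
      scope: "REGIONAL",
      defaultAction: { block: {} },
      visibilityConfig: visibility("TiresApiWebAcl"),
      rules: [
        {
          name: "AllowCarOrdersDomain",
          priority: 10,
          action: { allow: {} },
          statement: { ipSetReferenceStatement: { arn: carOrdersIps.attrArn } },
          visibilityConfig: visibility("AllowCarOrdersDomain"),
        },
      ],
    });

    // Attach the web ACL to the API's deployed stage.
    new wafv2.CfnWebACLAssociation(this, "TiresApiWebAclAssociation", {
      resourceArn: api.deploymentStage.stageArn,
      webAclArn: webAcl.attrArn,
    });
  }
}

const app = new App();
new TiresWafStack(app, "TiresWafStack");
```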

⚠️ Note: We could have added a WAF on both services but added this to the tires API only for simplicity and to explain the concepts through some basic code.

We could have further added to the WAF rules with AWS Managed rules. AWS Managed Rules for AWS WAF is a managed service that provides protection against common application vulnerabilities or other unwanted traffic, without having to write your own rules.

You have the option of selecting one or more rule groups from AWS Managed Rules for each web ACL, up to the allowed maximum web ACL capacity unit (WCU) limit. You can choose whether to count (monitor) or block requests that are matched by the managed rules:
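As a sketch, a managed rule group can be expressed as another entry in the web ACL's `rules` array. The helper below is hypothetical; `AWSManagedRulesCommonRuleSet` and `AWSManagedRulesKnownBadInputsRuleSet` are AWS Managed Rules groups. Rule groups use `overrideAction` rather than `action`, and rules are evaluated in ascending priority order, so the group needs a lower priority number than an IP allow rule if it should also inspect the whitelisted traffic.

```typescript
import * as wafv2 from "aws-cdk-lib/aws-wafv2";

// Builds a web ACL rule that applies an AWS Managed Rules group.
// "count" only records matches; "block" keeps the group's own rule actions.
export function managedRuleGroup(
  name: string,
  priority: number,
  mode: "count" | "block",
): wafv2.CfnWebACL.RuleProperty {
  return {
    name,
    priority,
    statement: { managedRuleGroupStatement: { vendorName: "AWS", name } },
    // Rule groups take overrideAction instead of action.
    overrideAction: mode === "count" ? { count: {} } : { none: {} },
    visibilityConfig: {
      cloudWatchMetricsEnabled: true,
      metricName: name,
      sampledRequestsEnabled: true,
    },
  };
}

// Example: monitor the core rule set first, before a higher-numbered IP allow rule.
const commonRuleSet = managedRuleGroup("AWSManagedRulesCommonRuleSet", 0, "count");
```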

✔️ AWS EventBridge Archive and Replay

We can see from the code below that we also add Archive and Replay to our event bus, allowing us to replay events if we need to, e.g. to hydrate a new service, or to replay events after a bug has been fixed.
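A minimal CDK sketch of the archive side, assuming a custom event bus and a 30-day retention period (omit `retention` to keep events indefinitely); the construct names are hypothetical. Replays themselves are started on demand, for example from the console or with `aws events start-replay`, rather than being defined in the stack.

```typescript
import { App, Duration, Stack, StackProps } from "aws-cdk-lib";
import * as events from "aws-cdk-lib/aws-events";
import { Construct } from "constructs";

export class TireOrdersArchiveStack extends Stack {
  constructor(scope: Construct, id: string, props?: StackProps) {
    super(scope, id, props);

    const bus = new events.EventBus(this, "TireOrdersBus");

    // Archive every event published to this bus from this account.
    bus.archive("TireOrdersArchive", {
      description: "Archive of tire order events for replay",
      eventPattern: { account: [this.account] },
      retention: Duration.days(30), // assumption: 30 days is enough to recover from a bad deployment
    });
  }
}

const app = new App();
new TireOrdersArchiveStack(app, "TireOrdersArchiveStack");
```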

Summary

I hope you found that useful as an example of further productionising our solution in Part 1 for security, availability and scalability. There are obviously a host of further changes we could make, and more services to use, but this basic example is to discuss some concepts and to introduce people to new features and services.

Go and subscribe to my Enterprise Serverless Newsletter here for more of the same content:

Wrapping up 👋

Please go and subscribe to my YouTube channel for similar content!

I would love to connect with you also on any of the following:

https://www.linkedin.com/in/lee-james-gilmore/
https://twitter.com/LeeJamesGilmore

If you enjoyed the posts please follow my profile Lee James Gilmore for further posts/series, and don’t forget to connect and say Hi 👋

Please also use the ‘clap’ feature at the bottom of the post if you enjoyed it! (You can clap more than once!!)

About me

Hi, I’m Lee, an AWS Community Builder, Blogger, AWS certified cloud architect and Global Serverless Architect based in the UK; currently working for City Electrical Factors (UK) & City Electric Supply (US), having worked primarily in full-stack JavaScript on AWS for the past 6 years.

I consider myself a serverless advocate with a love of all things AWS, innovation, software architecture and technology.

*** The information provided reflects my own personal views and I accept no responsibility for the use of the information. ***

You may also be interested in the following:


Principal Serverless Engineer | Enterprise Cloud Architect | Serverless Advocate | Mentor | Blogger | AWS x 6 Certified 🚀