Serverless TACTICAL DD(R) 🚀

What is TACTICAL DD(R), a tactical approach to nonfunctional requirements for Serverless solutions, and how can it complement your DoR & DoD to help make your solutions more scalable, secure, fault tolerant and compliant? This article also links to a GitHub repo containing a download for teams to use, and shows how the approach complements Serverless Threat Modelling.

Introduction

When it comes to building out serverless architectures on AWS, one of the main benefits is how easy it is to get started, and one of the biggest downsides is also how easy it is! Teams can very quickly build out scalable MVPs (Minimum Viable Products) using AWS services like Amazon API Gateway, AWS Lambda and Amazon DynamoDB, but many then overlook what it takes to productionise these solutions correctly, and miss key factors that end up biting them at a later date!

“When it comes to building out serverless architectures, one of the main benefits of getting started is how easy it is, and one of the biggest downsides is also how easy it is!”

This article discusses an approach I call Serverless TACTICAL DD(R), which complements a team's DoR (Definition of Ready) and DoD (Definition of Done) as a checklist of questions to ask, from a serverless perspective, at the right time.

We are going to continue working through building out the fictitious ‘LeeJames HR’, showing how a team can implement this as an approach, and how it affects their architecture before and after.

You can find the GitHub repo here: https://github.com/leegilmorecode/serverless-tactical-ddr

What is an NFR, DoR and DoD in the World of Agile? 💭

Before we jump into the approach, let's quickly cover some key concepts:

Having a Definition of Ready means that stories must be immediately actionable. The Team must be able to determine what needs to be done and the amount of work required to complete the User Story or PBI. — https://www.scruminc.com/definition-of-ready/

In practice, the Definition of Ready must contain all the things that have hurt you in the past or could hurt you in the future. These items must be solved for before the work begins. — https://www.leadingagile.com/

The Team must understand the “done” criteria and what tests will be performed to demonstrate that the story is complete. Think of the DoD as what the organisation requires before it can deliver a PBI to the end user. — https://www.scruminc.com/definition-of-done/

The Definition of Done describes the target state. It must be informed by the Definition of Ready. It’s that simple. The anti-pattern adopted by most teams is to neglect either. — https://www.leadingagile.com/

Nonfunctional Requirements (NFRs) define system attributes such as security, reliability, performance, maintainability, scalability, and usability. They serve as constraints or restrictions on the design of the system across the different backlogs — https://www.scaledagileframework.com/nonfunctional-requirements/

What is a serverless TACTICAL DD(R) approach? ☑️

Serverless TACTICAL DD(R) is a tactical approach to DoR and DoD which aims to ensure we ‘consider’ and ‘validate’ key areas that, in my experience, are typically forgotten when teams are trying to realise business value quickly. Unfortunately these areas are rarely back-ported, and tend to stay forgotten until issues arise!

These are not ‘gates’ to work being brought into a sprint; they are more an extension point for reflecting on whether these areas should be considered from a serverless perspective.

The comments and my own experiences suggest that there are other key areas

Asking these key questions at the right time can ensure this doesn’t happen going forward with your serverless solutions. Let’s cover what TACTICAL DD(R) stands for below:

https://github.com/leegilmorecode/serverless-tactical-ddr

How did the team approach it? 🚀

The team already had a typical, generic DoR and DoD (which took the INVEST matrix into account) that looked like this:

Definition of Ready

  1. The story should be written exactly in the ‘user story’ format.
  2. Acceptance criteria must be understood by the team.
  3. The team needs to have estimated the story.
  4. The team should understand how to provide a demo of the features.
  5. Performance criteria should be understood by the team.

Definition of Done

  1. Unit and integration tested.
  2. Releasable.
  3. Release note created.
  4. Deployed to production.

As you can see from these typical examples, a DoR and DoD allow teams to understand at a basic level when a story is ready to pick up and when it is complete. In my opinion we can complement this from a serverless perspective, as the guardrails above are not specific to serverless.

The agile coach, Ben, used the example questions from the TACTICAL DD(R) approach to add specific items to the team's DoR and DoD checklists for their serverless features. This ensured the team at least considered these areas during story creation, backlog refinement and planning. The reason the items are phrased as questions is that this gives the team more context to agree whether each one is suitable for them to adopt, and therefore whether they want to add it to their own checklists as a defined statement.
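As a minimal sketch of how such a checklist could be captured alongside a story, the Python below models each area as a question which the team marks as answered or as consciously not relevant. The area names are taken from the list of changes later in this article; the question wording, the `ChecklistItem`, `new_checklist` and `outstanding` names, and the three statuses are illustrative assumptions and are not taken from the download.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    OPEN = "open"                  # not yet discussed by the team
    ANSWERED = "answered"          # discussed and covered by the story
    NOT_RELEVANT = "not relevant"  # discussed and consciously skipped


@dataclass
class ChecklistItem:
    area: str
    question: str  # hypothetical wording, for illustration only
    status: Status = Status.OPEN


def new_checklist() -> list[ChecklistItem]:
    """Build a fresh checklist for one story."""
    questions = {
        "Threat modelling": "Has the change been threat modelled?",
        "Authorisation": "Are least-privilege scopes agreed?",
        "Compliance": "Is any PII stored or logged unnecessarily?",
        "Testing": "Is the agreed level of testing defined?",
        "Internationalisation": "Will this need localising later?",
        "Caching": "Would caching reduce latency or cost?",
        "Auditing": "Do we record who did what?",
        "Load testing": "Can downstream systems cope with the load?",
        "Disaster recovery": "Can the data be restored?",
        "Documentation": "Which documents need updating?",
        "Reporting": "What do stakeholders need to report on?",
    }
    return [ChecklistItem(area, question) for area, question in questions.items()]


def outstanding(items: list[ChecklistItem]) -> list[str]:
    """Return the areas the team has not yet considered for this story."""
    return [item.area for item in items if item.status is Status.OPEN]
```

Because an item can be marked as not relevant, the list prompts a conversation without blocking a story, which is the intent described above.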

Photo by Jason Goodman on Unsplash

Next, the team started to pick up stories for the new LeeJames HR feature, which was around managing annual leave. As they progressed the stories through the development stages, they regularly checked their progress against the items on the new Definition of Done, which was based on the download. This acted as a checklist for work being completed, and therefore added a set of serverless ‘guardrails’ to their solutions.

Over time the team regularly thought about these serverless guardrails by default when progressing work and releasing to production; so as each item became second nature to them, it was subsequently removed from the lists.

The items simply acted as a set of areas of mental focus that could be followed with a degree of common sense, rather than as a bouncer at the door preventing work from being brought in, progressed, or delivered into production.

So, Definition of Ready is not a mechanism to isolate teams and frustrate collaboration. Quite the opposite. It’s a tool that adds value at certain stages of the journey, when it is applied properly and for the right reasons.

Nor is it something we want to lock in place forever. When the organisation outgrows the need for it, the formal Definition of Ready can be simplified and possibly eliminated — https://www.leadingagile.com/

How did this affect LeeJames HR?

Now that we have discussed a serverless TACTICAL DD(R) approach, let’s look at a before and after to see how this affected how they architected and built one of the serverless domain services in ‘LeeJames HR’ for managing annual leave.

Note: This is a totally fictitious example and architecture below simply to discuss a concept.

How the solution would look without thinking about these key factors in design

And this is how it looks following using this approach:

Example of solution following using the TACTICAL DD(R) approach

So let's discuss in more detail what the main changes were following the TACTICAL DD(R) approach with Serverless:

  1. Threat Modelling. The team threat modelled the proposed architecture and found various potential gaps around security which were mitigated (discussed further below).
    For more information on Serverless Threat Modelling check this out:
    https://leejamesgilmore.medium.com/serverless-threat-modelling-df8e4028ef6d
  2. Authorisation. There were no specific acceptance criteria around authorisation, and the team would potentially have reused scopes that were already there, potentially causing security issues with the solution. A dedicated set of scopes was therefore created for least privilege, then mapped and agreed with the team and the product owner/business analysts.
  3. Compliance. Reporting had not been thought about previously, but when implementing OpenSearch it was important to redact any PII using Amazon Comprehend, as the PII was not required for reporting. The team also ensured that no unnecessary PII was logged to Amazon CloudWatch.
  4. Testing. The team ensured that the features had the correct level of agreed testing in place, which was unit test coverage, e2e testing using Artillery, and integration testing of the Amazon DynamoDB package they created against a docker container with Jest.
  5. Internationalisation. The team discussed their future growth plans to move into four other countries over the next year, so decided to implement react-intl in the React app now to allow them to localise the front end. They also ensured that the deployment region and a client ID were added to the Serverless setup to differentiate between deployments. This prevented a lot of rework in the future.
  6. Caching. The team decided to add Amazon DynamoDB Accelerator (DAX) to the solution since it was very read heavy. This would decrease latency for their customers, as well as potentially saving costs through fewer reads on Amazon DynamoDB and quicker Lambda execution times. They also added caching within Amazon CloudFront with the correct configuration for their customers.
  7. Auditing. During the threat modelling exercise, Dan from the team raised non-repudiation, and the fact that there was no auditing of what users of the system were actually doing. For this reason the team implemented DynamoDB Streams feeding a separate Audit table via Lambda, with access tied down via IAM.
  8. Load testing. Following some early load testing, it was found that the legacy downstream Finance system could not cope with the expected traffic of the new system as it couldn’t scale to the same rate. For this reason, the team implemented an SQS queue between the two services which allowed for throttling at an acceptable rate.
    For more information on Serverless Load Testing look here: https://levelup.gitconnected.com/serverless-load-testing-at-scale-with-artillery-53ef6c8b77f7
  9. Disaster Recovery. The team realised that DR had not been considered at all, and that the biggest risk lay with the databases. For this reason, the team implemented PITR (Point In Time Recovery) on their DynamoDB tables, as well as regular backups using AWS Backup. For more information see here: https://levelup.gitconnected.com/enterprise-serverless-databases-208b8790998
  10. Documentation. The team ensured that they had ADRs, code documentation and OpenAPI schemas in place for the parts of the system that required them. For more information on Serverless Documentation see here: https://levelup.gitconnected.com/documenting-your-serverless-solutions-509f1928564b
  11. Reporting. The team understood the importance of being data-driven, and the need for reporting from a stakeholder perspective. The team decided to redact PII by using Amazon Comprehend, and streamed the relevant data to OpenSearch for reporting and analytics.
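To make the Compliance and Reporting items above more concrete, here is a hedged sketch of redacting PII before a record is logged or streamed for reporting. It assumes the boto3 Amazon Comprehend client and its `detect_pii_entities` call, which returns entities carrying `Type`, `Score`, `BeginOffset` and `EndOffset`. The function names, the 0.8 score threshold and the `[TYPE]` placeholder format are assumptions for illustration, not the team's actual implementation.

```python
from __future__ import annotations


def redact(text: str, entities: list[dict], min_score: float = 0.8) -> str:
    """Replace each detected PII span with its entity type, e.g. [NAME]."""
    # Work from the end of the string so that earlier offsets stay valid
    # after each replacement changes the length of the text.
    for entity in sorted(entities, key=lambda e: e["BeginOffset"], reverse=True):
        if entity.get("Score", 1.0) < min_score:
            continue  # low-confidence detections are left untouched
        start, end = entity["BeginOffset"], entity["EndOffset"]
        text = text[:start] + f"[{entity['Type']}]" + text[end:]
    return text


def redact_with_comprehend(text: str, comprehend_client) -> str:
    """Detect PII with Amazon Comprehend, then redact it locally.

    `comprehend_client` is assumed to be `boto3.client("comprehend")`.
    """
    response = comprehend_client.detect_pii_entities(Text=text, LanguageCode="en")
    return redact(text, response["Entities"])
```

Keeping `redact` separate from the service call means the same function can be applied to log messages before they reach Amazon CloudWatch, and can be unit tested without calling AWS.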

The diagram below shows the before and after view: how the solution would have looked, compared to how it did look after using this approach in an iterative manner:

An example of the before and after of this piece of work
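The Auditing change in the list above can be sketched as a Lambda handler that turns DynamoDB Streams records into audit entries. This is a minimal sketch under stated assumptions: the record fields used (`eventID`, `eventName`, and `Keys`, `OldImage`, `NewImage` and `ApproximateCreationDateTime` under `dynamodb`) follow the standard stream record shape, while the audit item layout, the `to_audit_items` helper and the `save` callback are hypothetical and stand in for a write to the separate Audit table.

```python
from __future__ import annotations

from typing import Callable


def to_audit_items(event: dict) -> list[dict]:
    """Map DynamoDB Streams records to flat audit entries."""
    items = []
    for record in event.get("Records", []):
        change = record["dynamodb"]
        items.append(
            {
                "auditId": record["eventID"],   # unique per stream record
                "action": record["eventName"],  # INSERT, MODIFY or REMOVE
                "keys": change["Keys"],         # primary key of the changed item
                "occurredAt": change.get("ApproximateCreationDateTime"),
                # Images are only present if the stream view type includes them.
                "before": change.get("OldImage"),
                "after": change.get("NewImage"),
            }
        )
    return items


def handler(event: dict, context: object, save: Callable[[dict], None] = print) -> int:
    """Lambda entry point; `save` is a hypothetical stand-in for the audit write."""
    items = to_audit_items(event)
    for item in items:
        save(item)
    return len(items)
```

Stream records describe what changed but not which application user made the change, so for non-repudiation this sketch assumes the service writes the acting user onto each item (for example a hypothetical `updatedBy` attribute), which then appears in the `after` image.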

When taking an iterative approach, potentially through a framework like SPIDR, TACTICAL DD(R) can be applied iteratively alongside the framework in the same manner (i.e. as we look to split the stories down further). Not all items on the list are going to be relevant for all stories, and that's OK.

In my experience this approach can complement Serverless Threat Modelling, which becomes part of the Definition of Ready when looking at any significant changes to architecture, users or data flows.

Summary

I hope you find this approach useful when working within your teams. In my experience these are the key areas which are typically missed when designing and building Serverless solutions, and the items are rarely back-ported until big issues arise.

Teams may have their own DoR and DoD, at an organisational level, a team level, or anywhere in between. Agile can be a very opinionated area in my experience, but I personally believe it is whatever works for the team.

Adding this simple approach to your team's day-to-day work can help ensure that key factors in productionising your serverless solutions are not missed, and ultimately make your services more secure, resilient and compliant for your customers.

Wrapping up 👋

I hope you found that useful!

Please go and subscribe on my YouTube channel for similar content!

I would love to connect with you also on any of the following:

https://www.linkedin.com/in/lee-james-gilmore/
https://twitter.com/LeeJamesGilmore

If you found the articles inspiring or useful please feel free to support me with a virtual coffee https://www.buymeacoffee.com/leegilmore and either way let's connect and chat! ☕️

If you enjoyed the posts please follow my profile Lee James Gilmore for further posts/series, and don’t forget to connect and say Hi 👋

About me

Hi, I’m Lee, an AWS Community Builder, Blogger, AWS certified cloud architect and Principal Software Engineer based in the UK; currently working as a Technical Cloud Architect and Principal Serverless Developer, having worked primarily in full-stack JavaScript on AWS for the past 5 years.

I consider myself a serverless advocate with a love of all things AWS, innovation, software architecture and technology.

*** The information provided represents my own personal views and I accept no responsibility for the use of the information. ***
