My AWS Rocks!

How to excel at operational excellence.

Hello World, Hello Blog

Following on from my talk at the London AWS Summit in June, I thought I'd write a series on how to perform well in each pillar. In this first post I'll look at the operational excellence pillar and provide some things you need to consider to perform well from an operations perspective. Firstly, let's take a look at the design principles as outlined in the pillar: Perform operations as code; Make frequent, small, reversible changes; Refine operations procedures frequently; Anticipate failure; Learn from all operational failures. For me, the first three can be summarised as: treat everything you need for operations the same as any other part of your application. Put everything in code, even if that means scripting CLI commands, and follow a software lifecycle where you make small changes that are refined over time. I treat this as part of the "shift left" mindset.

You put what in a public subnet‽

So you've split your VPC into different tiers which is great, but why did you put THAT in your public subnet and not a private one?

It's great seeing people's designs for modern solutions, especially serverless. What is more impressive is that, where VPC services are in use, they are splitting them out into separate tiers and subnets. 😕 But why do so many people put things in public subnets that don't need to be there? In this article I'll look at what I think should be in public subnets and why you should try not to put anything in a public subnet that doesn't need to be there.

Troubleshooting with VPC Flow Logs

So you built and secured your VPC. But solutions are not working? How to understand VPC Flow Logs and use them to troubleshoot connectivity in AWS.

So you built your secure VPC, but things are not working as expected. Or maybe something changed on the infrastructure and now things are not working. And as any network engineer knows, every application fault is always due to the network! So how do we prove traffic is getting to our systems and it's not the network? The answer is VPC Flow Logs. There is great guidance on Flow Logs in the AWS VPC documentation, so I will try not to repeat it. What I will try to do is clarify some areas and explain how we can use Flow Logs to understand what is going on in our network; specifically, how we can use the AWS CloudWatch Logs console to find out what is happening in our VPC and get some pointers on what might be wrong.
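To make the idea concrete, here is a minimal sketch of reading a single flow log record in Python. It assumes the default (version 2) record format, where fields are space-separated in a fixed order; the helper names `parse_flow_log_record` and `is_rejected` and the sample values are illustrative, not taken from the post.

```python
from typing import Dict

# Field order of the default (version 2) VPC Flow Log record format.
DEFAULT_FIELDS = [
    "version", "account_id", "interface_id", "srcaddr", "dstaddr",
    "srcport", "dstport", "protocol", "packets", "bytes",
    "start", "end", "action", "log_status",
]


def parse_flow_log_record(line: str) -> Dict[str, str]:
    """Split one default-format record into a field-name -> value mapping.

    Values are kept as strings because NODATA/SKIPDATA records use "-"
    in place of most fields.
    """
    parts = line.split()
    if len(parts) != len(DEFAULT_FIELDS):
        raise ValueError(
            f"expected {len(DEFAULT_FIELDS)} fields, got {len(parts)}"
        )
    return dict(zip(DEFAULT_FIELDS, parts))


def is_rejected(record: Dict[str, str]) -> bool:
    """True when a security group or network ACL rejected the traffic."""
    return record["action"] == "REJECT"


# Illustrative sample: SSH (TCP port 22, protocol 6) that was accepted.
sample = (
    "2 123456789010 eni-1235b8ca123456789 172.31.16.139 172.31.16.21 "
    "20641 22 6 20 4249 1418530010 1418530070 ACCEPT OK"
)
record = parse_flow_log_record(sample)
print(record["srcaddr"], "->", record["dstaddr"],
      "port", record["dstport"], record["action"])
```

A record whose action is ACCEPT shows the traffic reached the network interface, which is a useful first data point when the network is being blamed. Custom log formats change the field order, so the field list would need adjusting for those.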

How to use NACLs and Security Groups

Where and how should you use NACLs and Security Groups to ensure you have a secure network on AWS.

Following up from my last post (here) on what Network Access Control Lists (NACLs) and Security Groups (SGs) are, I will now take a look at where and how I think you should use them to ensure you have a secure network. I'll use a basic scenario of a VPC (10.0.0.0/16) split into two public subnets with access to the internet (10.0.0.0/24 and 10.0.1.0/24) and two private subnets with no route to the internet (10.0.10.0/24 and 10.0.11.0/24). To discuss the options, the application runs on two EC2 instances behind an Application Load Balancer.
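The address plan for this scenario can be sanity-checked with Python's standard `ipaddress` module. The sketch below is only an illustration: the subnet names and the `check_layout` helper are hypothetical, and the second private subnet is assumed to be 10.0.11.0/24, since two subnets in one VPC cannot share a CIDR block.

```python
from ipaddress import ip_network
from itertools import combinations

vpc = ip_network("10.0.0.0/16")

# Hypothetical names for the four subnets in the scenario.
subnets = {
    "public-a": ip_network("10.0.0.0/24"),
    "public-b": ip_network("10.0.1.0/24"),
    "private-a": ip_network("10.0.10.0/24"),
    "private-b": ip_network("10.0.11.0/24"),  # assumed value
}


def check_layout(vpc, subnets):
    """Return a list of problems: subnets outside the VPC or overlapping."""
    problems = []
    for name, net in subnets.items():
        if not net.subnet_of(vpc):
            problems.append(f"{name} ({net}) is outside {vpc}")
    for (name_a, net_a), (name_b, net_b) in combinations(subnets.items(), 2):
        if net_a.overlaps(net_b):
            problems.append(f"{name_a} ({net_a}) overlaps {name_b} ({net_b})")
    return problems


print(check_layout(vpc, subnets) or "layout OK")
```

This checks only the addressing. Whether a subnet is public or private is decided by its route table, which this sketch does not model.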

Network Access Control Lists vs Security Groups

What are NACLs and Security Groups? What are the differences? How do they work?

Both are used to protect networks and resources, but there is often confusion about the difference between Network Access Control Lists (NACLs) and Security Groups, and when each should be used. This post aims to demystify the two concepts. The differences that we will cover are: Stateful vs Stateless; Inbound vs Outbound; Allow vs Deny; Rule Order. Future posts will then look at how to use this knowledge to apply both NACLs and Security Groups, and how to troubleshoot connectivity issues when NACLs and Security Groups are in place.
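As a rough illustration of "Allow vs Deny" and "Rule Order", the sketch below models how a NACL evaluates inbound rules: lowest rule number first, first match wins, and anything unmatched is denied. It is a simplified model, not AWS code; the `NaclRule` class and `evaluate_nacl` function are hypothetical, and protocol matching is left out.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network
from typing import List


@dataclass
class NaclRule:
    number: int      # lower numbers are evaluated first
    cidr: str        # source range the rule applies to
    port_from: int
    port_to: int
    allow: bool      # NACL rules can allow or deny


def evaluate_nacl(rules: List[NaclRule], src_ip: str, port: int) -> bool:
    """Return True if the traffic is allowed, False if it is denied."""
    for rule in sorted(rules, key=lambda r: r.number):
        in_range = ip_address(src_ip) in ip_network(rule.cidr)
        if in_range and rule.port_from <= port <= rule.port_to:
            return rule.allow  # first matching rule decides
    return False  # nothing matched: the final catch-all rule denies


inbound = [
    NaclRule(100, "203.0.113.0/24", 443, 443, allow=False),  # block one range
    NaclRule(200, "0.0.0.0/0", 443, 443, allow=True),        # allow everyone else
]

blocked = evaluate_nacl(inbound, "203.0.113.10", 443)
allowed = evaluate_nacl(inbound, "198.51.100.7", 443)
print("blocked range allowed?", blocked)
print("other source allowed?", allowed)
```

Swapping the two rule numbers would let the blocked range through, because the broad allow rule would match first. A Security Group has no equivalent of this: it holds only allow rules, so ordering does not matter, and because it is stateful the return traffic for an allowed connection needs no separate rule.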