AWS MSK + Glue – Part One

antique cannon covered in snow
(Photo by Author)

Over December 2022 I got it into my head that I wanted my teams to skill up on the combination of Kafka and a Schema Registry. There are plenty of resources introducing Kafka, and some on using a Schema Registry, but I was not satisfied that there were any consolidated quick-starts that experienced engineers could use as templates for building real production solutions.


Give me a shell and a place to stand…

I was thinking yesterday “hmm, I seem to have more AWS KMS keys than I’m actually using”. But how to find them when they are scattered across regions?

AWS CLI to the rescue. A trivial bash script, and voilà, a list of KMS key ARNs across my entire account:

# PROF holds the name of the AWS CLI profile to use
for REGION in $(aws --profile "$PROF" ec2 describe-regions | jq -r '.Regions[].RegionName'); do
    aws --profile "$PROF" --region "$REGION" kms list-keys | jq -r '.Keys[].KeyArn'
done

The only complicated part was remembering how to use jq to parse JSON. The syntax for that never seems to stick in my head.

If you’re not in the habit of writing tiny shell scripts for automation, get into the habit. The closer you get to the machine, the more power you have. Also, the lingua franca of Linux is ASCII text (ok, I guess technically UTF-8 now?), and getting comfortable piping results from one tool into another with appropriate text manipulation is a superpower.
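
As a rough sketch of that kind of piping, suppose the ARNs from the script above have been captured as plain text. The ARNs below are made up for illustration, and count_keys_per_region is a hypothetical helper name, not anything from the AWS CLI. Since the region is the fourth colon-separated field of an ARN, a few standard text tools are enough to count keys per region:

```shell
#!/usr/bin/env bash
# Hypothetical sample input: made-up KMS key ARNs, one per line.
arns='arn:aws:kms:us-east-1:111122223333:key/aaaa
arn:aws:kms:eu-west-1:111122223333:key/bbbb
arn:aws:kms:us-east-1:111122223333:key/cccc'

# Read ARNs on stdin, print "region count" for each region.
count_keys_per_region() {
    cut -d: -f4 |               # field 4 of an ARN is the region
        sort |                  # group identical regions together
        uniq -c |               # prefix each region with its count
        awk '{print $2, $1}'    # reorder to "region count"
}

printf '%s\n' "$arns" | count_keys_per_region
```

In practice the output of the region loop could be piped straight into the function instead of using a sample variable.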

Zookeeper on Raspberry Pi

Ok, yeah, I probably don’t really need a Zookeeper cluster running in the house, but there are a few Raspberry Pis around that are mostly idle; we use them for streaming music from our various devices to the speakers in some of the rooms.


Thibault – with added diagrams

We heard that Thibault liked diagrams, so here are some diagrams about Thibault’s diagrams…

Some years ago my partner-in-crime (and life) and I presented an introductory workshop on Thibault, and as part of preparing for that, they drew up some flowcharts for the relationships between the “circles” in Thibault’s plates, from Chapter 5 through Chapter 8.


Wasabi and AWS S3 – A comparison

Wasabi is a very interesting and compelling competitor for AWS S3, but also potentially a superb collaborator.

What are Wasabi and S3 though? Stripping these services down to their barest bones, they are cloud-based, highly available and resilient object stores with effectively unlimited storage capacity. Digging a bit further, they are both key/value stores, which means that every binary object is uniquely identified by a key, in much the same way that a file on your laptop is uniquely identified by a folder path and file name.


Lies, Damned Lies and Graphs

It appears from discussion at Wikipedia that the catchphrase “lies, damned lies and statistics” is in fact unattributed. That’s a shame, because it’s a pretty important idea – statistics are very slippery, and in this time of COVID-19 I’m seeing how easily they can be misunderstood, and misused.


AWS EC2 Instance Connect – A very neat trick

One of the problems with cloud security compared to on-premises is that there is more risk that someone unauthorised will be able to gain access to your EC2 Linux instances via SSH. That’s one of the reasons I’m keen on serverless solutions, various X-as-a-Service offerings, and on not opening up a server for access by SSH at all. It’s easier to keep bad guys off a server if you don’t let anyone onto the server.



A reasonably common scenario for a data-focussed consultancy is that a client may want to ship sensitive data from their on-premises or cloud environment to your AWS environment. There are a number of reasons that they may want to copy the data into your environment: it may be difficult for you to work with it in situ, the tools you need may not be inside their environment, there may be no ingress to their data stores from outside, or they may want to provide an extract of data rather than the raw sources. These are all valid scenarios, and in each of them the simplest solution is to dump the sensitive data into an S3 bucket under your control.


More Swarm Adventures

I recently went back to refresh my understanding of the state of Docker networking (there have been some changes over the last few years that I wanted to be sure of), and so have been working through the excellent tutorial materials they have built, and spinning off some tutorial materials of my own demonstrating automation of the setups.

For your interest, here’s a Terraform project on AWS that sets up a Docker Swarm to play with – of course in reality we’d use ECS and EKS, but this is a fun exercise in infrastructure-as-code:

Adventures with Docker Swarm

It’s been around 3 years since I last worked with Docker in any seriousness. At that time, the state of networking and deployment was quite rudimentary, and there was still reliance on deploying load balancers and similar infrastructure. So when I revisited the “getting started” tutorials, I was very impressed at how straightforward and powerful Docker Swarm now is.

I’ve built a small implementation of those tutorials to illustrate the ease with which a full stack can be deployed.