
Wasabi and AWS S3 – A comparison

Wasabi is a very interesting and compelling competitor for AWS S3, but also potentially a superb collaborator.

What are Wasabi and S3 though? Stripping these services down to their barest bones, they are cloud-based, highly available and resilient object stores with effectively unlimited storage capacity. Digging a bit further, they are both key/value stores, which means that every binary object is uniquely identified by a key, in much the same way that a file on your laptop is uniquely identified by a folder path and file name.

There are two big conceptual benefits that immediately arise from treating object storage as key/value pairs inside a container or grouping. First up, developers have long been familiar with using key/value stores in their code – all modern languages of the last few decades have some version of an in-memory key/value store, and support the same basic semantics: you can put an object in the store with a key, get an object from the store by supplying the key, remove an object using its key, or test whether the object is there.
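Those four operations map directly onto the humble dictionary. As a rough illustration in Python (the key and bytes here are purely illustrative):

```python
# The four basic object-store operations, modelled on an in-memory dict.
store = {}

# put: store an object under a key
store["photos/dog/2019/January/35.jpg"] = b"\xff\xd8\xff"  # some JPEG bytes

# get: retrieve the object by supplying its key
photo = store["photos/dog/2019/January/35.jpg"]

# test: check whether an object exists under that key
exists = "photos/dog/2019/January/35.jpg" in store

# remove: delete the object by its key
del store["photos/dog/2019/January/35.jpg"]
```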

The other big benefit is that the semantics of using a key/object store in the cloud are that the basic operations against objects map directly onto the basic operations of HTTP – you can GET an object using its unique identifier, POST a new object with a unique identifier or use PUT to provide a new version, and DELETE an object. This opens scope for the API to be directly expressed via HTTP, making it very easy to operate against a cloud-based object store with pretty well any coding language and from any sort of computing device – your IoT light bulb might be POSTing status reports that are being retrieved by GET from your phone, all over HTTP.
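That verb mapping can be sketched as a tiny function – the bucket and key names below are illustrative only, and real S3-style APIs add authentication headers on top of this:

```python
# Map the basic key/value operations onto HTTP verbs and request paths.
def http_request_line(operation: str, bucket: str, key: str) -> str:
    verbs = {
        "create": "POST",    # add a new object
        "read": "GET",       # fetch an object
        "update": "PUT",     # provide a new version of an object
        "delete": "DELETE",  # remove an object
    }
    return f"{verbs[operation]} /{bucket}/{key} HTTP/1.1"

print(http_request_line("read", "rahtest20200302", "photos/dog/2019/January/35.jpg"))
# → GET /rahtest20200302/photos/dog/2019/January/35.jpg HTTP/1.1
```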

There is an additional semantic layer in place with S3 and Wasabi (and GCP at least): object keys can contain slashes (/), which makes them strongly resemble disk path names. For example, the full unique name of an object in S3 could be

s3://rahtest20200302/photos/dog/2019/January/35.jpg

which is analogous to

D:\photos\dog\2019\January\35.jpg

In other words, the “bucket” identifier “s3://rahtest20200302” resembles the identifier for a disk or volume, and the key inside the bucket “photos/dog/2019/January/35.jpg” looks very much like a file path “photos/dog/2019/January/” plus the file name “35.jpg”.

This is a somewhat imaginary layer though. While these keys look like file path names, and while the Web interface for both Wasabi and AWS lets you click through the “folders” photos…dog…2019…January…, the truth is that the key is (almost) an arbitrary string.

Of course, the key is not entirely arbitrary – the “bucket” name identifies a discrete collection of objects in your cloud account that may have different access controls or form logically separated groups of data.
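The “folder” view is nothing more than a prefix query over flat keys. A minimal sketch of how a client can synthesise folders from arbitrary keys – this mimics the prefix/delimiter behaviour of S3-style listings, with no real folders existing anywhere:

```python
# Synthesise "folders" from flat keys, the way S3-style listings do
# when given a prefix and a "/" delimiter.
def list_folder(keys, prefix=""):
    files, folders = [], set()
    for key in keys:
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        if "/" in rest:
            # everything up to the next "/" becomes a pseudo-folder
            folders.add(prefix + rest.split("/", 1)[0] + "/")
        else:
            files.append(key)
    return sorted(files), sorted(folders)

keys = [
    "photos/dog/2019/January/35.jpg",
    "photos/dog/2019/February/01.jpg",
    "photos/cat/2019/January/07.jpg",
    "notes.txt",
]
print(list_folder(keys, "photos/dog/2019/"))
```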

So stepping back from this dive, how does Wasabi compare to AWS S3, and are they really in competition with each other?

AWS S3 was released in 2006 as one of the foundational components of the AWS platform (the other two being EC2 server instances and the SQS message queue service). It led the pack in cloud-based object storage, and because of its significant and early popularity has set the standard for how such a service should look and behave. From its early incarnations to now, the service has been enhanced to provide interesting handling of short-, medium- and long-term storage options, bulletproof encryption at rest, and a sophisticated and flexible access control mechanism down to the object level.

The AWS S3 service is definitely the 300 kg gorilla in this space, and is often the default choice for in-cloud storage, although Azure and GCP are rising in popularity. Its popularity means that many cloud providers (and indeed some on-premises storage services like OpenStack Swift) have adopted some or all of the AWS S3 API.

Wasabi is a compelling player in this space. It was founded in 2015 by David Friend and Jeff Flowers, who were the creators of the very well respected Carbonite cloud backup service, and launched for public use in May 2017. The two took what they had learned from Carbonite, and created a service focussed on performance and reliability, and purely on object storage. It appears to be a foundational belief in the company that cloud-based object storage is now a commodity market, in the same way that electricity or internet services are.

If you are not familiar with Carbonite, or with Friend and Flowers, it’s worth digging into them. One reason that Wasabi caught my eye is that these two bring very serious technical and business clout to the table, and if they say that a pure-play single-use cloud service is now a viable market offering, I am very much inclined to listen to them.

As a service, Wasabi has three key features:

  • 11 nines (99.999999999%) durability guarantees, in line with AWS S3;
  • absolute compatibility with the AWS S3 API and the AWS IAM model;
  • potentially significantly lower costs than the other big providers.

The durability guarantees are interesting mainly because they give the same levels of comfort that AWS S3 gives. To quote Wasabi’s FAQ:

if you gave AWS or Wasabi 1 million files to store, statistically they would lose one file every 659,000 years. You are about 411 times more likely to get hit by a meteor.

They offer an additional feature over the top of this that is rather interesting though: all files are checked for corruption every 90 days, relieving anxiety that a file may not be recoverable in the future.

The absolute compatibility with AWS S3 is good, but the compatibility with the AWS IAM model is even better. To begin with, all your AWS knowledge about working with S3 objects and IAM definitions carries across, and your code can often be copied verbatim. Users, Groups, Roles and Policies all use exactly the same semantics and code definitions as AWS. Going further, all of your DevOps and infrastructure-as-code tooling (such as Terraform) can be used without change – even the AWS CLI can be used directly against Wasabi just by providing Wasabi’s access/secret key pair and possibly the region endpoint:

$ aws --profile wasabi s3api list-buckets --endpoint-url=https://s3.wasabisys.com
{
    "Owner": {
        "DisplayName": "rahook",
        "ID": "091E66F1C829410928E7641125C6C29167BE5F5BDF75A946D4543DC6A34536A6"
    },
    "Buckets": [
        {
            "CreationDate": "2020-03-02T15:56:22.000Z",
            "Name": "rahtest20200302"
        }
    ]
}

$ aws --profile wasabi s3 cp . s3://rahtest20200302 --recursive --exclude "*" --include "*.jpg" --endpoint-url=https://s3.wasabisys.com
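In code, “absolute compatibility” means the only thing that changes is the endpoint URL. A minimal sketch – the region names follow Wasabi’s published s3.<region>.wasabisys.com pattern, and the boto3 usage at the end is commented out because it needs real credentials:

```python
# Build a Wasabi endpoint URL; everything else about an S3 client is unchanged.
def wasabi_endpoint(region: str = None) -> str:
    if region is None:
        return "https://s3.wasabisys.com"           # global endpoint
    return f"https://s3.{region}.wasabisys.com"     # e.g. eu-central-1 (Amsterdam)

print(wasabi_endpoint("eu-central-1"))
# → https://s3.eu-central-1.wasabisys.com

# With boto3, the identical S3 client code targets Wasabi via endpoint_url:
# import boto3
# s3 = boto3.client(
#     "s3",
#     endpoint_url=wasabi_endpoint("eu-central-1"),
#     aws_access_key_id="...",        # Wasabi access key
#     aws_secret_access_key="...",    # Wasabi secret key
# )
# s3.list_buckets()
```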

I will talk about some of the technical characteristics of Wasabi below, but would like to detour to discuss the potential cost benefits.

Wasabi make some pretty bold claims about relative pricing on their website, and I encourage you to have a look for yourself. To begin with they assert that what they term “1st generation cloud storage” is roughly cost equivalent to on-premises storage, which I am not convinced by and intend to delve into further at a later date. They further argue that GCP and AWS are roughly the same pricing, and Azure about half the cost of GCP or AWS. The headline figure though is their cost – depending on usage patterns about 10-20% of the cost of AWS.

This is a bold claim, given that the cost of storage on AWS continues to drive toward nothing, but they make good arguments to back it up, and it’s very likely that for some use cases the cost will be lower on Wasabi than on AWS.

It is rather hard to do a direct cost comparison though, as the pricing structures of the two providers are quite different. AWS provide a sophisticated variety of storage options for different use cases and access patterns, and additionally have some costs for transferring objects in and out of storage (although you need to hit very high volumes of data transfer for these costs to become significant). Wasabi’s pricing model is much simpler, being based purely on storage volume, with no ingress/egress costs. Their current pricing is $US5.99 per terabyte per month, with a minimum cost of $US5.99 per month. Pricing then increments in a very predictable way – $5.99/month for 0-1 TB, $11.98/month for 1-2 TB, $17.97/month for 2-3 TB and so on.

There remains some subtlety to Wasabi’s pricing though, which hints at optimal usage patterns. If you store 2.5 TB for a day then delete it, you will be charged the full $17.97 that month… or rather for the next three months, because every object has a retention period of 90 days. In other words, when an object is deleted it is moved into a deleted object store, where it is recoverable for 90 days – but you are charged the same for the deleted object store as for your active store.
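The tiering described above can be sketched as a small function. Note that this is an illustration of the pricing as described here only – it ignores Wasabi’s actual proration rules and any price changes since the time of writing:

```python
import math

RATE_PER_TB = 5.99  # $US per TB per month, with a one-TB minimum

def monthly_charge(peak_tb: float) -> float:
    """Charge for a month, based on the peak TB stored that month.
    Deleted objects continue to count for 90 days, so deleting data
    does not end the charge for another three billing months."""
    tiers = max(1, math.ceil(peak_tb))
    return round(tiers * RATE_PER_TB, 2)

print(monthly_charge(2.5))  # the 2.5 TB example from the text
# → 17.97
```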

The 80% cost saving they assert is not incorrect… but it’s not a straightforward story, and I advise you to explore both the Wasabi pricing tool and AWS Pricing to make comparisons.

There is an additional aspect of Wasabi’s pricing that needs to be taken into consideration – it is predicated on the assumption that in a given month you will read no more than 100% of the data in a bucket. In other words, if you store 1 TB worth of objects, they assume that you will not (for example) retrieve the entire 1 TB three times, resulting in 3 TB of object data transfers. The company invites you to discuss pricing options if that is your use case, which sounds rather ominous, but is also reasonable – for many use cases you should anticipate that you will be reading only a subset of the stored data.

So far I’ve spent only a short time experimenting with Wasabi, and I’m intending to write a follow up illustrating potential use cases for a hybrid solution using both AWS and Wasabi.

My experience so far has been very good. Signing up for the service, and configuring IAM and bucket resources was slick and smooth. It took only minutes to set MFA on the “root” account, and run up users and groups with restricted access policies, then create my first bucket. As with the current version of AWS, by default buckets are locked down hard to prevent unauthenticated or unauthorised access, but unlike AWS S3, encryption at rest is on by default. Additional security is provided by guaranteeing that all data in flight is carried across TLS, including within the service.

I created an AWS bucket in Paris, and a Wasabi bucket in Amsterdam, then from London pushed the same ~300 MB worth of photos into each bucket from the command line. This took about 26 seconds for each provider, which suggests the limiting factor was my router, not the service speed. It could be genuinely difficult to stress test either service, as the limiting factor is very likely to be in data transfer across the public internet, not the raw speed of the services themselves. I remain struck by a comment from a senior AWS engineer some years ago who admitted on stage that they don’t have a clear idea of what bandwidth S3 in a given region can sustain, because nobody has ever generated enough traffic to be throttled by the service.
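For what it’s worth, the back-of-envelope arithmetic supports the router theory – the observed rate is right around what a domestic connection sustains:

```python
# ~300 MB pushed in ~26 seconds, expressed in megabits per second.
size_mb = 300
seconds = 26
mbps = size_mb * 8 / seconds
print(f"{mbps:.1f} Mbit/s")
# → 92.3 Mbit/s – roughly a 100 Mbit/s home uplink, not a service limit
```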

Buckets in Wasabi can be configured for versioning (as can AWS S3 buckets), which means that objects can be considered to be immutable – they can be overwritten, but not updated in place, and if versioning is enabled then the previous version is retained indefinitely. The smallest file that can be written to Wasabi is 4 KB, whereas AWS allows storing even zero-byte files; however, most meaningful use cases would see you storing larger files anyway.

As with AWS, they provide a direct-connect service to allow dedicated VPN-style connections to on-premises systems, and to AWS. This is rather interesting, as it suggests that Wasabi buckets can be a drop-in replacement for AWS buckets with minimal effect on your architecture.

One big difference between the two services though is geographic distribution. AWS have many data centres around the world, and have invested heavily in providing local traffic gateways in even more places. Wasabi currently have only 4 data centres – three in the USA, and one in Amsterdam. This is potentially a problem for some customers – EU customers may be prevented from using US data centres by regulation, and may be uncomfortable with there being only one EU data centre. Having said that, we should expect that if Wasabi’s success continues, addition of more data centres and access points will be on their roadmap.

Configuring access logging in Wasabi is somewhat more straightforward than in AWS – probably because it was baked in rather than being added later – but more restricted in some ways. In particular it cannot be natively consolidated with existing CloudWatch/CloudTrail auditing and monitoring. I would tentatively suggest that using AWS Batch or a periodic Lambda to read logs from Wasabi and inject them into CloudWatch would be a reasonable integration choice.
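As a sketch of that integration – assuming Wasabi’s access logs resemble the S3 server access log line format (space-separated fields with a bracketed timestamp and a quoted request), which is my assumption rather than anything confirmed from Wasabi’s documentation – a periodic Lambda could parse each line into the event shape CloudWatch Logs expects:

```python
import re
from datetime import datetime

# Assumed log line shape, modelled on the S3 server access log format.
LINE_RE = re.compile(r'^\S+ (?P<bucket>\S+) \[(?P<time>[^\]]+)\] .* "(?P<request>[^"]+)"')

def to_log_event(line: str) -> dict:
    """Turn one access-log line into a CloudWatch Logs event dict
    (the {timestamp-in-ms, message} shape expected by put_log_events)."""
    m = LINE_RE.match(line)
    if m is None:
        raise ValueError("unrecognised log line")
    ts = datetime.strptime(m.group("time"), "%d/%b/%Y:%H:%M:%S %z")
    return {
        "timestamp": int(ts.timestamp() * 1000),
        "message": f'{m.group("bucket")} {m.group("request")}',
    }

# A synthetic, purely illustrative line:
line = ('owner rahtest20200302 [02/Mar/2020:15:56:22 +0000] 1.2.3.4 requester '
        'reqid REST.GET.OBJECT 35.jpg "GET /rahtest20200302/35.jpg HTTP/1.1"')
event = to_log_event(line)

# In a real Lambda you would then batch these and call
# boto3.client("logs").put_log_events(logGroupName=..., logStreamName=...,
#                                     logEvents=[event, ...])
```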

This of course highlights the other possible drawback to Wasabi compared to AWS S3 – being a pure-play storage service, and being outside AWS, it does not have the wealth of integration with other AWS services that S3 has. This is not a negative for Wasabi, but it does suggest that it is better for some use cases than others.

I really think that the two products are not in direct competition, but between them can be used for very interesting hybrid solutions that open up a lot of scope for price optimization and serious reliability. Wasabi has got strong cost arguments for reliable write-often, read-seldom archive storage. AWS S3 has strong feature arguments for batch-oriented data processing pipelines, and interesting boil-the-ocean data analysis. This makes for potential hybrid success.

For myself, I am considering updating my personal backup strategy to duplicate critical information across both services. I’m also considering building a hybrid solution for archiving photos and maintaining a web-based gallery – stay tuned for that!
