Sep 04 2024

Rebuilding my Website - Part 3

Update - 4 Months On

Mostly a cost breakdown (per month):

  • Route53 Hosted Zone - $0.50
  • S3 Storage and Operations - $0.01
  • Everything else - within Free Tier

This is part three of a series documenting the rework of my personal website. Here I’ll cover the infrastructure and deployment process, along with some cost-savings estimates now that I no longer use Squarespace.

Storage and Serving

As seen in the previous post on rebuilding my site, everything is static, i.e. no pages, content, or HTML are generated server-side or at request time. This means the whole website can be served Web 1.0 style, which greatly simplifies things for the time being.

I knew I wanted to use cloud object storage to host the content. Object storage makes the most sense for storing website assets of various media types, and using a service that provides an S3-compatible API guarantees an industry-standard, well-tested pattern for content management. Since I don’t want to deal with availability or with managing physical hardware or virtual servers, I looked at cloud-managed solutions.

Requirements

  • An S3-compatible API, as stated previously
  • Serving through a CDN for faster load times

Default Objects

I need to be able to define default behavior for root objects, i.e. when a request for the path foo/ is resolved, the server must look up the object with the key foo/index.html. Ideally, it should resolve the original request instead of redirecting, since redirects are inherently slower and bad UX. Additionally, I need to handle paths that have no data behind them, i.e. serve a 404 page.
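
To make the desired behavior concrete, here is what it should look like from a client’s perspective (example.com stands in for my real domain):

    # A directory path serves its index.html directly, with no redirect:
    curl -s https://example.com/foo/          # 200, body of foo/index.html

    # A path with no object behind it serves the 404 page:
    curl -sI https://example.com/no-such-page # 404, custom error page as body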

Redirection

Since I am migrating my old website to a new one, hyperlinks to my old site will break. Squarespace generates rather complicated paths to content using some kind of unique identifier (e.g. 2024-03-writing-myfirstblog-9409828491.html). I would like to redirect these to their new locations with an HTTP 301 response.
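
In other words, an old Squarespace-style URL should answer with a permanent redirect (the new location below is made up for illustration):

    curl -sI https://example.com/2024-03-writing-myfirstblog-9409828491.html
    # HTTP/2 301
    # location: /blog/my-first-blog/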

Cache Invalidations

Since my website is a work in progress, I will need a way to invalidate any cached content. This is especially true when using a CDN, and not all cloud platforms provide a good way (or any way) to perform cache invalidations.

Logs and Metrics

As of writing this, I perform zero analytics and tracking on my website. Actually, I really want to avoid tracking my users, even though it means I have a worse understanding of my audience and performance. That’s fine, as I believe internet users have a right to anonymity by default. But it would be nice to see some server-side stats and metrics. How many page views do I get in a day? Which pages receive the most traffic? I can answer all of this without even logging user IP addresses, which some regard as personal data.

AWS S3 plus CloudFront

I have decided to store my site in S3 and serve the site using CloudFront, with the S3 bucket as the origin. This setup comes out the cheapest, as seen in the analysis below. Additionally, CloudFront meets all of my requirements, though some may require additional work.

Notably, for custom redirects and additional site metrics I will need to set up CloudFront Functions or Lambda@Edge. I have not done this at the time of writing.

Invalidations can be handled by making a CreateInvalidation request to CloudFront with a set of old paths. More on this later.
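
With the AWS CLI, such a request looks roughly like this (the distribution ID and paths are placeholders):

    aws cloudfront create-invalidation \
      --distribution-id E123EXAMPLE \
      --paths "/index.html" "/blog/*"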

Other Options

CloudFront + Other S3-Compatible Service

Using CloudFront with a custom origin is possible but not simple. For example, to use it with a Backblaze B2 bucket, I would have to create a Lambda to turn CloudFront requests into Backblaze API calls, handling authentication and errors myself. It cannot be done as simply as with an S3 origin, even when the APIs are compatible, and there is additional cost due to ingress and egress. For these reasons, I avoided it.

Digital Ocean

This was actually my first choice. I love DigitalOcean, and the simplicity of their products always drives me to use them over AWS whenever possible. Specifically, I was looking at Spaces Object Storage, but it wasn’t a great fit beyond very simple storage. If I use the built-in CDN, there is no configuration available: I cannot set a default root object, configure a 404 page, or define custom redirects.

Backblaze

Backblaze offers B2 Cloud Storage, the cheapest cloud object storage I have found. But they do not provide a CDN directly, instead offering integrations with a small handful of partners. As of writing, they have deals with Cloudflare, Fastly, and bunny.net, but I have not done an in-depth analysis of any of them. I decided not to pursue this, as I have no operational experience with these partners and am not comfortable adding that complexity right now to save only a few dollars a year.

Others

I have professional experience with Heroku, Google Cloud Platform, Microsoft Azure, and a few other cloud providers, but decided to avoid them for this project. I do not currently have personal accounts with any of them, and frankly I don’t like using them unless necessary. Azure and GCP are complicated and often (extremely) poorly documented, and Google has some of the worst customer service imaginable. I’ll stick to the devil I know: Amazon.

Cost Analysis

This section is a deep dive into my cost analysis of Digital Ocean, Backblaze, and AWS S3. I estimate the storage, egress, and API costs for each service. In some cases there are minimums or a flat fee to enable a product. If you want more in-depth numbers, check out the spreadsheet.

Estimated yearly costs

  • Digital Ocean (storage only): $14.40
  • Digital Ocean + CloudFront: $62.64
  • Backblaze (storage only): $14.28*
  • Backblaze + CloudFront: $62.52
  • S3 + CloudFront: $4.08

Even though both B2 and Spaces have cheaper at-rest storage, they are more expensive than AWS once serving content is considered. The CloudFront free tier provides all the egress I will ever need, and data transfer between S3 and CloudFront is free.

Site Stats

I started by looking at my current site’s stats:

  • 4 static pages, 3 image galleries, and 7 blog posts
  • 175 MB, 112 files
  • average 8 files and 12.5 MB per page

If I add a new page every week (blog post or image gallery), the site will grow to roughly 1.4 GB and 900 files in 2 years.
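
The arithmetic behind that projection, for the curious (2 years ≈ 104 weekly pages):

    echo "112 + 8 * 104" | bc      # ≈ 944 files, call it ~900
    echo "175 + 12.5 * 104" | bc   # ≈ 1475 MB, call it ~1.4 GB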

Access Pattern

I took the average of a few access scenarios.

For cost, the worst-case scenario is heavy traffic. Let’s say 10 people view 80% of the content on my website every day. Using the projected size of the website calculated above, this amounts to approximately 175k files accessed and close to 300 GB served each month.

Another, more realistic scenario assumes only 20% of the content on the site is accessed, with maybe 1 to 5 views a day. This ends up around 20k files and 30 GB each month.

Digital Ocean

Digital Ocean Spaces costs $5/month to enable, but since I already use it for other projects, I omit this from my analysis. Digital Ocean charges $0.02 / GB for storage after the first 250 GB, and $0.01 / GB for egress after the first 1 TB. There are no charges for API invocations.

Backblaze

Backblaze B2 Cloud Storage charges a flat $6 / TB for storage. Egress is free up to 3x your average monthly storage, then $0.01 / GB. Backblaze also charges for API invocations, grouping operations into three classes. I only considered egress operations (GetObject), omitting site-maintenance operations such as uploading new content and performing invalidations.

AWS S3

AWS charges $0.023 / GB for the Standard storage class. Egress to the internet is free for the first 100 GB and $0.09 / GB after that; egress to CloudFront and some other internal services is free. AWS charges $0.0004 / 1k GET operations. As with the others, I omit site-maintenance operations such as uploads and invalidations from this analysis.

AWS CloudFront

AWS provides a generous free tier for CloudFront, always available to all accounts. This gives 1 TB of data transfer out to the internet per month and 10,000,000 HTTP or HTTPS requests per month. For the sake of this analysis, I assume I never exceed these quotas.

Fetching from any AWS origin, such as an S3 bucket, is free; fetching from a public origin is priced by region. I assume 80% of my traffic will come from NA plus EU and 20% from the rest of the world. I am uncertain whether the 1 TB of data transfer out includes transfers involving an external origin; I assume it does not in my analysis.

DNS

My DNS was originally on GoDaddy for no good reason. I transferred it to AWS Route53, which currently costs me about $0.53 / month. Originally I wanted to use Digital Ocean, but it does not support ALIAS records, which I need to point my apex domain at CloudFront.
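
For reference, this is roughly what the alias record looks like via the AWS CLI (the hosted zone ID and domain names are placeholders; Z2FDTNDATAQYW2 is CloudFront’s fixed hosted zone ID):

    aws route53 change-resource-record-sets --hosted-zone-id Z0000000EXAMPLE \
      --change-batch '{
        "Changes": [{
          "Action": "UPSERT",
          "ResourceRecordSet": {
            "Name": "example.com",
            "Type": "A",
            "AliasTarget": {
              "HostedZoneId": "Z2FDTNDATAQYW2",
              "DNSName": "d111111abcdef8.cloudfront.net",
              "EvaluateTargetHealth": false
            }
          }
        }]
      }'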

Deployment

Infrastructure

Everything is managed by Terraform, and it’s great. I don’t have much more to say, but I hope to publish a reusable “CloudFront -> S3 + single role” Terraform module soon; I’ve used this pattern for image hosting in various side projects.

Website

Deploying my website is a three-step process. I will go into detail in this section and document some of the caveats and room for future improvement; a condensed sketch of the whole flow follows the list.

  1. Build
    1. Perform a build using Nix and Parcel
    2. List all the files in the local build folder, generating a receipt file
  2. Deploy to S3
    1. List all the files in the S3 bucket, generating a receipt file
    2. Upload all assets from the local build folder to S3
    3. Take the difference of the two receipt files to determine which S3 paths are now obsolete
    4. Delete the obsolete paths
  3. Invalidate CloudFront Paths
    1. Take the intersection of the two receipt files to determine which paths are stale
    2. Filter out stale paths whose names contain content hashes. This works because Parcel includes a content hash in a file’s name for this exact reason: a changed file gets a brand-new path that has never been cached, so it never needs invalidation.
    3. Create a CloudFront invalidation for the remaining stale paths.
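
Here is a condensed sketch of that flow using the AWS CLI (the bucket and distribution IDs are placeholders, the hash filter assumes Parcel’s default 8-hex-digit hashes, and my real script is more involved):

    BUILD_DIR=dist
    BUCKET=my-site-bucket   # placeholder
    DIST_ID=E123EXAMPLE     # placeholder

    # 1. Receipt of the local build
    (cd "$BUILD_DIR" && find . -type f | sed 's|^\./||' | sort) > local.txt

    # 2. Receipt of what is currently in the bucket (the pre-deploy state)
    aws s3api list-objects-v2 --bucket "$BUCKET" \
      --query 'Contents[].Key' --output text | tr '\t' '\n' | sort > remote.txt

    # 3. Upload everything, then delete keys present remotely but not locally
    aws s3 cp "$BUILD_DIR" "s3://$BUCKET/" --recursive
    comm -13 local.txt remote.txt | while read -r key; do
      aws s3 rm "s3://$BUCKET/$key"
    done

    # 4. Invalidate paths present in both receipts, skipping content-hashed files
    comm -12 local.txt remote.txt | grep -Ev '\.[0-9a-f]{8}\.' | sed 's|^|/|' > stale.txt
    aws cloudfront create-invalidation --distribution-id "$DIST_ID" \
      --paths $(cat stale.txt)   # assumes no whitespace in paths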

To perform this consistently and reliably, I developed a bash script for use on both my local machine and CI runners. The script is slurped up by argc, an amazing lightweight CLI framework for bash written in Rust. The whole thing is part of my Nix/home-manager dotfiles repo.

Future Work

I am considering tweaking my deployment script to only upload files that are not already in the bucket. Though S3 ingress is free, LIST and PUT operations cost money; at $0.04 / 1k, though, this is effectively free. The main benefit is faster deployment times, as less data is uploaded.
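
One likely route is aws s3 sync, which only uploads files whose size or timestamp differs from the remote copy (its --delete flag would also subsume the receipt-based deletion step, though not the invalidation logic):

    aws s3 sync dist/ "s3://$BUCKET/" --delete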

Another future improvement is the ability to perform blue/green deployments. CloudFront has a feature called continuous deployments that can route a subset of traffic to a “staging” distribution, but using it seems non-trivial and is something I don’t want to explore today.