A Tale of Outage and Revival

So some of you may have noticed that Boiling Steam went dark for an extended period around the end of August 2023, for about a week and a half, give or take. We had already experienced a server issue in the first half of August that made the site unavailable for a few days. This time the problem was more serious, and we faced a larger dilemma in coming back online.

So as the site owner, I was faced with a big issue, but also a great opportunity, and I was reminded of this great saying:

"It's only after we've lost everything that we're free to do anything." (Fight Club)

To be fair, we did not lose everything. But it was still a great time to set out on an ambitious plan to completely move away from good old Wordpress, the most popular CMS on the planet that powered our previous version of Boiling Steam.

How would we do it? By using a static website generator.

A Wealth of Options

Which one? Well, after all, there’s a bunch of them. Pelican, Jekyll, Hugo, Zola and probably 2 dozen more. I had already tried to use them before, just to get my hands dirty and see what they could do. They are all great on paper with a list of features long like the beard of Karl Marx. But there’s a catch. They require:

  1. that you fit in their framework, i.e. their mindset when it comes to organizing your site.
  2. that you spend the time to learn how to bend them to your will.

I have never been too convinced by 1). Our needs for Boiling Steam are very specific - some are simple, some are more complex, and a framework that is made to support many types of sites is just not going to be the optimal approach. Sure, I can probably manage to eat spaghetti with a paintbrush, but a regular fork will make things easier.

As for 2) - customization… I had already experienced that learning exactly how to do things the way static site generators want is not as painless as you’d think. Customizing the final product was going to take a great chunk of time.

The alternative was always staring me right in the face: “Develop your own static site generator!”

After all, the reason why so many alternatives exist to generate static sites is that it’s not that hard to build one. If you have a free weekend, you can get 70% there for a simple type of website. Take 2 weeks and you’ll be very close to covering a large number of features. In my case, it was fairly easy to realize that the time it would take to learn someone else’s framework could instead be used to build something that I know exactly how to tune and customize, how to import data into, and how to evolve at a faster pace since I’m fully in control. Of course, the drawback of building your own thing is that you have to maintain it. But that has never been a real issue with my other projects.

Why Even Entertain the Idea?

A static site has tremendous advantages:

  • No need for a database.
  • No need for a complex back end relying on hundreds of dependencies.
  • Much better performance on lower end server hardware - your pages will load almost instantly for end users.
  • It’s portable and therefore scalable. You can take it and move it to another server probably in less than an hour.

As expected, you do lose a couple of nice things that you had with Wordpress:

  • No comment system (although there are ways to deal with this)
  • No online GUI editor for your articles (though we hardly used it anyway)
  • No forms or plugins, which could be added to Wordpress easily to help with things like SEO and more.

One of the main problems with Wordpress is that it’s fairly limited out of the box. By the time you have tried to cover most of your additional needs you realize your life now depends on 20 different plugins. Such plugins will have a cost on performance, stability and security. It’s happened twice before that a plugin was hijacked and used for malicious purposes over the past 10 years. Even if security is not a concern, with such a setup, your site ends up taking 4-5 seconds to load at best, which is not very impressive for what remains a relatively simple site with text and images at the end of the day.

We could expect something better.

Managing Complexity

Of course, everything in life, including a Static Site Generator, looks easier in your mind before you start the actual work. As you begin coding, you encounter problems you did not expect, hard decisions you have to take, and corner cases to resolve. The main complexity that I had to deal with was to manage the existing corpus of text. Several years ago we decided to centralize all our work on a single git repository. This single decision has made this whole endeavour a lot more realistic to start with, but came with some inconvenience.

Now, I had to make sure that my engine could take the git repository content as is, without any modification if possible, and generate a whole new site from scratch. Most other static site generators expect you to put some form of metadata in YAML or shit like that, which is great if you start from day 1 from a clean slate, but not as fun if you start with 5 years and hundreds of articles.

My engine has to deal with:

  • Articles in markdown format, formatted in a bunch of different ways (some had titles, some did not).
  • Articles in .org format, because we have an Emacs user among us (cough podiki cough)
  • Images stored within the root folder of each article or in subfolders, depending on the preferences of the author at that time.

The idea was for the engine to handle all these different situations without us having to modify anything (or at least to minimize that kind of work) in the original git folder. This meant having the code make a bunch of assumptions based on the structure of every article and every folder, and change course depending on what it finds.
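To illustrate the kind of "look at what is there and adapt" logic described above, here is a minimal sketch. It is written in Python purely for illustration (the actual engine is written in R), and the function names, the set of image extensions, and the folder-name fallback rule are all assumptions, not the engine's real behaviour.

```python
import re
from pathlib import Path
from typing import List

# Assumed conventions: a level-1 "# Heading" in markdown, "#+TITLE:" in org files.
MD_HEADING = re.compile(r"^#\s+(.+?)\s*#*\s*$")
ORG_TITLE = re.compile(r"^#\+title:\s*(.+?)\s*$", re.IGNORECASE)

# Hypothetical list of extensions treated as images.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".gif", ".webp"}


def guess_title(text: str, suffix: str, folder_name: str) -> str:
    """Return the first explicit title found, else derive one from the folder name."""
    pattern = ORG_TITLE if suffix == ".org" else MD_HEADING
    for line in text.splitlines():
        match = pattern.match(line)
        if match:
            return match.group(1)
    # No title in the file: fall back to the folder name (illustrative rule only).
    return folder_name.replace("-", " ").replace("_", " ").strip().capitalize()


def find_images(article_dir: Path) -> List[Path]:
    """Collect images from the article folder and any of its subfolders."""
    return sorted(
        p for p in article_dir.rglob("*")
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )
```

Because `find_images` searches recursively, it does not matter whether an author kept pictures next to the article or in a subfolder, which is the point of not touching the original repository.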

We also did not have the exact date of publication available for every article. But since everything was in git, an inspection of the commit history for every file, combined with a set of rules based on the commit messages, the merge timings, and the type of changes, made it possible to automate setting a proper publication date with reasonable accuracy.
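As a rough illustration of the starting point for such a heuristic, the sketch below (Python, not the engine's R code) asks git for the author timestamps of every commit that touched a file and keeps the earliest one. The extra rules mentioned above (commit messages, merge timings, type of change) are deliberately left out, and the function names are hypothetical.

```python
import subprocess
from datetime import datetime, timezone
from typing import Optional


def earliest_date(log_output: str) -> Optional[datetime]:
    """Parse one UNIX timestamp per line and return the earliest as a UTC datetime."""
    stamps = [int(line) for line in log_output.split() if line.strip()]
    if not stamps:
        return None
    return datetime.fromtimestamp(min(stamps), tz=timezone.utc)


def publication_date(repo: str, path: str) -> Optional[datetime]:
    """Earliest commit touching `path`, following renames (a simplification)."""
    result = subprocess.run(
        # %at prints the author date as a UNIX timestamp, one line per commit.
        ["git", "-C", repo, "log", "--follow", "--format=%at", "--", path],
        capture_output=True, text=True, check=True,
    )
    return earliest_date(result.stdout)
```

Splitting the parsing from the `git` call keeps the date logic easy to check on its own, and leaves room to add further rules on top of the raw commit list.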

The Language to Do It All

Several years ago I had started building a static site generator using Python. I never ended up finishing it (it worked but I had a long feature list, and no deadline to work with) so it gave me some insight as to how I should design the next one from scratch. And my first decision was not to use Python, but R instead. I find R, with its tidyverse variant, a lot more elegant to write, and speed is not an issue for this kind of application (we are not talking about making games in R). Even if R is a relatively slow language compared to Golang or Rust, it will still manage to process several hundred articles in under a minute, and that’s even before I consider any kind of serious optimization.

One of the key advantages of R is the access to a wide range of libraries, such as magick, which makes it easy to process pictures using Imagemagick bindings under the hood. This comes in handy to automate the compression and resizing of thousands of pictures (this is still something I am not done with) into more modern formats. Turns out that Wordpress does not support formats like AVIF or SVG out of the box, which is a great shame for a piece of software that is constantly updated and used by so many people around the world.

I was looking for an RSS feed generator - and while there are at least 4 different libraries in R to process RSS feeds (such as TidyRSS), there was nothing recent available to generate a feed. Turns out that was the least of my worries. An RSS feed is a simple XML document, and with a little help from the glue package I wrote a function to generate a compliant RSS feed in 20 minutes or so. I will probably distribute it as FOSS once I clean it up a bit since it might be useful for other folks out there who are interested in producing RSS feeds.
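To show why this is a small job, here is a minimal sketch of the same idea: plain string templates filled in per article. It uses Python's `str.format` as a stand-in for R's glue, and the `Article` class, the `render_feed` function, and the field names are assumptions made for this example, not the author's function.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from email.utils import format_datetime
from xml.sax.saxutils import escape


@dataclass
class Article:  # hypothetical record; the real engine's data model is not shown
    title: str
    url: str
    summary: str
    published: datetime  # should be timezone-aware


ITEM = """    <item>
      <title>{title}</title>
      <link>{url}</link>
      <guid>{url}</guid>
      <description>{summary}</description>
      <pubDate>{date}</pubDate>
    </item>"""

FEED = """<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
  <channel>
    <title>{title}</title>
    <link>{url}</link>
    <description>{description}</description>
{items}
  </channel>
</rss>
"""


def render_feed(title, url, description, articles):
    """Render an RSS 2.0 document, newest article first."""
    items = "\n".join(
        ITEM.format(
            title=escape(a.title),      # escape &, < and > so the XML stays valid
            url=escape(a.url),
            summary=escape(a.summary),
            date=format_datetime(a.published),  # RFC 2822 style date for pubDate
        )
        for a in sorted(articles, key=lambda a: a.published, reverse=True)
    )
    return FEED.format(title=escape(title), url=escape(url),
                       description=escape(description), items=items)
```

The only parts that need real care are escaping the text fields and formatting the dates; everything else is fixed boilerplate.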

While a lot of static site generators use Jinja templates to build their pages, I went for a much, much simpler solution that relies on an existing set of HTML templates with glue variables in them. Glue can handle loops for lists and so on, so at the end of the day I can generate a bunch of links, buttons or other elements that need to be replicated using simple functions.
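The approach can be sketched as follows, with Python's `string.Template` standing in for glue: an HTML template holds named placeholders, and repeated elements are produced by rendering a small fragment once per item and joining the results. The template contents, the tag URL scheme, and the function names are hypothetical.

```python
from html import escape
from string import Template

# Page template with two placeholders: $title and $tags.
PAGE = Template("""<article>
  <h1>$title</h1>
  <ul class="tags">
$tags
  </ul>
</article>""")

# Fragment rendered once per tag; this plays the role of a "loop".
TAG = Template('    <li><a href="/tags/$slug/">$name</a></li>')


def render_tags(tags):
    """Render one list item per tag and join them into a single block."""
    return "\n".join(
        TAG.substitute(slug=escape(t.lower().replace(" ", "-")), name=escape(t))
        for t in tags
    )


def render_page(title, tags):
    """Fill the page template with an escaped title and the rendered tag list."""
    return PAGE.substitute(title=escape(title), tags=render_tags(tags))
```

For example, `render_page("Hello", ["Steam Deck", "Linux"])` yields an `<article>` with two `<li>` links. There is no template language to learn here, which is the trade-off being made: less power than Jinja, but nothing to debug beyond ordinary functions.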

Which CSS Framework?

When considering CSS frameworks, I could have gone for the huge Bootstrap library, or for more bare-bones ones like Pure. Instead I opted to go back to the simplest drawing board possible by writing a simple CSS file that would handle everything we need without going overboard with grid systems and the like. We actually barely need a grid system considering the type of content we have, so nothing is lost. It’s also very easy to build something that is responsive across devices these days, with a few exceptions based on the viewport size. While things are not perfect now on September 11th at the time of writing (dark mode is NOT yet recommended), I hope you will find the site to be usable on whatever device you use it with.

HTML Generation

The HTML generation does a bunch of things. It creates directories in standard ways, finds titles where they were missing, finds out when articles were published (based on git history), lists the size and dimensions of pictures, sorts the types of files into different categories, and then processes markdown and org files for articles. It also moves pictures into the right folders, rewriting paths where necessary. The whole site is generated in a specific target folder - and that folder is initialized with its own git repository. Every new change is followed by a git add, a git commit and a git push, to track only the files that were impacted by the changes.

At the end of the day, the content of this git repository is replicated on the static site server and only the files that have changed are synchronized, to speed up the deployment. Right now this sync process is triggered manually, but I plan to automate that next using Gitea webhooks.
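The site described here relies on git to work out what changed. Purely to illustrate the underlying idea of syncing only modified files, the sketch below (Python, with hypothetical function names, and not the method used by the engine) compares content hashes of two builds and reports which files would need to be uploaded or deleted.

```python
import hashlib
from pathlib import Path
from typing import Dict, List, Tuple


def build_manifest(site_dir: Path) -> Dict[str, str]:
    """Map each file's relative path to the SHA-256 of its content."""
    manifest = {}
    for path in sorted(site_dir.rglob("*")):
        relative = path.relative_to(site_dir)
        # Skip the output folder's own git metadata.
        if path.is_file() and ".git" not in relative.parts:
            manifest[relative.as_posix()] = hashlib.sha256(path.read_bytes()).hexdigest()
    return manifest


def diff_manifests(old: Dict[str, str], new: Dict[str, str]) -> Tuple[List[str], List[str]]:
    """Return (files to upload, files to delete) between two builds."""
    changed = sorted(p for p, digest in new.items() if old.get(p) != digest)
    removed = sorted(p for p in old if p not in new)
    return changed, removed
```

A deployment step would then transfer only the `changed` list and remove the `removed` list, which is the same effect the git-based sync achieves.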

Deployment

This was the time to move away from bare Linux install methods - the server stack is now deployed using a Docker Compose file, which makes it very easy to manage and replicate. The web server I use is Caddy, which is designed to be extremely simple to use - it generates the HTTPS certificates for your domain automatically on the fly using Let’s Encrypt. In other words, you won’t face much of a learning curve to deploy an actual server with Caddy - and a simple config file means simple maintenance.

It doesn’t hurt that Caddy performs very well, and I hope you will find that the site loads very fast currently no matter where you are. Certainly much, much faster than our previous Wordpress iteration.

The Future

The objective was to come back online as soon as possible, but there’s a lot that needs to be finished still:

  • More automation all around
  • Restore the much older articles from the Wordpress backups
  • Fix as many broken links as possible (historical ones) - at least the ones generating the most traffic
  • Produce a good looking dark mode
  • Make the engine a bit smarter to support multiple images for different devices (1x, 2x, 3x and so on)
  • Compress most/all images to AVIF and build an image cache to avoid doing that for every re-generation, since it’s CPU intensive
  • Integrate more SEO related features
  • Make links look nicer on social media networks with thumbnails
  • Make sure the HTML documents meet most standards
  • Track accessibility metrics
  • Add some JavaScript where necessary for useful client-side features, while keeping the site working without it
  • and a long list of other things…

Getting to 80% is a matter of weeks, getting to 90% a matter of months, and 99% usually a year or so? That’s how it feels at this stage. But it’s also very exciting.

The door is now open for us to do things that Wordpress could not.

Outages are bad news… until they aren’t!