It’s $current_year; hosting WordPress is a "solved" problem. And even though I'm not using WordPress (Ghost is my choice of blogging software), paying someone to do it (in this case, the base ghost.org plan for $9/mo) is usually the easiest way to do, well, anything.
However, I wanted to self host my blog for several reasons:
- Control over my data: I often see people relying on a company (e.g. Dropbox, Microsoft, Tumblr) for their data and then losing it all because the company changed its mind/policy or discontinued a product. When I own the content by hosting it myself, I get to control if/when it is deleted.
- Reliability and observability: how much do you really trust your average SaaS company to be reliable? When these companies go down, there's nothing you can do about it. With your own self-hosted infrastructure, your average reliability might be lower, but you have far more power to see problems coming and to do something about them!
In the end, after much deliberation, I decided to host my blog (+ my sister's portfolio) on a really basic VPS (basically treating the Linux host as a “base” building block), running a docker-compose stack deployed from my laptop using
Sounds simple, but you’ll see how the complexity of the stack builds up to relieve the complexity of debugging and maintenance. It’s pretty cool.
First off, the actual blog. Hosting any containerized application becomes trivial once you manage to figure out its implicit dependencies and move all of its state out to the “outside world”.
Ghost CMS is a fairly standard node application, with the exception that you can apparently only run one instance at a time? Not sure why that would be the case if we just keep the actual application stateless (through the use of external data stores and console logging), but either way, we can make do with just one instance to serve any number of hits.
To do so, we can make use of CDNs or any “cache layer” (e.g. Squid (game), Varnish, etc.) on top of Ghost. It is, after all, a bog-standard CMS: it can be configured to send over Cache-Control headers, and you won’t really be making many non-GET requests to it.
You manage Ghost via its /ghost endpoint, which hosts its admin panel and is “secured” by a simple username + password. Since you need to be authenticated to use the endpoint, you won’t need to worry about accidentally caching the admin panel: the session cookies you’ll be sending over keep those requests out of the cache.
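To make the caching argument a bit more concrete, here is a minimal Python sketch of the kind of decision a cache layer in front of Ghost makes. It assumes a deliberately simplified policy (only GET/HEAD, nothing carrying a Cookie or Authorization header, and an explicit positive max-age/s-maxage); real caches like Varnish have far more nuanced rules, and the function names are made up for illustration.

```python
from typing import Mapping


def _directives(cache_control: str) -> dict[str, str]:
    """Parse a Cache-Control header into {directive: value} ('' if no value)."""
    result = {}
    for part in cache_control.split(","):
        part = part.strip().lower()
        if not part:
            continue
        name, _, value = part.partition("=")
        result[name.strip()] = value.strip().strip('"')
    return result


def is_cacheable(method: str, request_headers: Mapping[str, str],
                 response_headers: Mapping[str, str]) -> bool:
    """Simplified shared-cache policy (an assumption, not Varnish's actual VCL)."""
    if method.upper() not in ("GET", "HEAD"):
        return False
    # Header names are case-insensitive, so normalize them first.
    req = {k.lower(): v for k, v in request_headers.items()}
    resp = {k.lower(): v for k, v in response_headers.items()}
    # Authenticated traffic (e.g. the /ghost admin panel) carries a session
    # cookie or credentials, so it is passed straight through to Ghost.
    if "cookie" in req or "authorization" in req:
        return False
    cc = _directives(resp.get("cache-control", ""))
    if "no-store" in cc or "private" in cc:
        return False
    # s-maxage is meant for shared caches, so it wins over max-age.
    for key in ("s-maxage", "max-age"):
        if key in cc:
            try:
                return int(cc[key]) > 0
            except ValueError:
                return False
    return False
```

Erring on the side of "don't cache" is the safe default here: a cache miss only costs Ghost a bit of work, while caching a logged-in page could leak it to everyone.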
As for the storage, Ghost stores the actual posts you write within the database, but needs filesystem access for any themes/routes/photos you upload (it runs some compression on pictures by default). While you can use a storage adapter that does not use the filesystem (e.g. S3 or Cloudinary), it’s fine to store these in the filesystem as long as you know where the files are stored and map them to the “outside world” (i.e. the host filesystem).
And finally, the database. Ghost uses either SQLite3 or MySQL/MariaDB to save its non-file data, but it does a connection check right when it spins up, so if your database isn’t ready by the time Ghost spins up, it will throw a tantrum and die. To tame this behaviour, you’ll need to rewrite both the entrypoint and the command (it’s just… the way Ghost has things set up) to wait on MySQL (via a shell script) before it starts up. Pain in the ass, but you’ll forget it in no time.
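The wait itself is done with a shell script; as a rough equivalent, here is a Python sketch of the "block until MySQL accepts connections" loop. The function name and defaults are hypothetical, and an open port is a weaker signal than a successful login, so treat this as a starting point rather than a drop-in entrypoint.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 60.0,
                  interval: float = 1.0) -> bool:
    """Poll host:port until a TCP connection succeeds or `timeout` elapses.

    Returns True as soon as the port accepts a connection, False on timeout.
    An open port only means MySQL is listening, not that it has finished
    initializing, so Ghost may still want a retry on the very first boot.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:  # connection refused, DNS not ready yet, etc.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

An entrypoint would call something like wait_for_port("db", 3306) (service name assumed; 3306 is MySQL's default port) and only then hand off to Ghost's normal start command.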
Then, the configuration for hosting Ghost becomes as simple as this: https://github.com/JaneJeon/self-hosted/blob/master/config/blog/docker-compose.ghost.yml. Again, fairly straightforward once you “pull” the volume, database, and caching stuff out of the container itself.
Of course, we need to actually be able to reach the Ghost instance from the Wild, Wild, Web. I have an instance hosted on “some server” somewhere, and I know it has a static IP. So my DNS provider (in this case, Cloudflare) can point the domain janejeon.dev at the server’s IP. And then what?
The server (or, more accurately, the Traefik reverse proxy container, whose ports the Docker daemon publishes on the host) listens on ports 80 (HTTP) and 443 (HTTPS) for requests from the outside. Traefik handles SSL termination and hands the plain HTTP request over to the appropriate container (in this case, the Ghost container). But how does the reverse proxy know which container to forward the request to?
That is where the reverse proxy’s container awareness gimmick comes in. In short, Traefik watches the Docker daemon for containers and their labels, and forwards each request to the container whose label matches the request address (janejeon.dev/blahblahblah). And what do you know, Ghost has a label that says “hey give me all the requests for the host
While Traefik itself has… tons of complexity, to say the least, it is a stateless container, so the setup for the container and the routing is actually pretty simple: https://github.com/JaneJeon/self-hosted/blob/master/config/reverse-proxy/docker-compose.traefik.yml plus a label on the Ghost service.
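Here is a toy Python model of that label matching, just to show the idea. It only understands Host() rules on traefik.http.routers.<name>.rule labels plus the traefik.enable opt-in label (both real Traefik label names); the real Traefik rule engine also handles paths, priorities, middlewares and so on, and it builds its routing table from Docker events rather than deciding from scratch per request.

```python
import re
from typing import Mapping, Optional

# Matches v2-style rules such as Host(`janejeon.dev`) or Host(`a.dev`, `b.dev`).
_HOST_RULE = re.compile(r"Host\(([^)]*)\)")
_BACKTICKED = re.compile(r"`([^`]+)`")


def hosts_from_labels(labels: Mapping[str, str]) -> set[str]:
    """Collect every hostname named in a traefik.http.routers.<name>.rule label."""
    hosts: set[str] = set()
    for key, value in labels.items():
        if key.startswith("traefik.http.routers.") and key.endswith(".rule"):
            for group in _HOST_RULE.findall(value):
                hosts.update(_BACKTICKED.findall(group))
    return hosts


def route(request_host: str,
          containers: Mapping[str, Mapping[str, str]]) -> Optional[str]:
    """Return the container whose Host() rule matches request_host, if any.

    Only containers labelled traefik.enable=true are considered, mirroring a
    setup where Traefik's exposedByDefault option is turned off.
    """
    host = request_host.split(":", 1)[0].lower()  # drop any :port suffix
    for name, labels in containers.items():
        if labels.get("traefik.enable") != "true":
            continue
        if host in {h.lower() for h in hosts_from_labels(labels)}:
            return name
    return None
```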
Great, everything works fine! …as long as you use plain HTTP requests, where, for all intents and purposes, anyone can send anyone else anything. But nowadays, browsers all but require HTTPS, which secures and protects the contents of every request (to say I am oversimplifying would be the understatement of the century).
So… how do I “prove” that when my Ghost service (or, more accurately, the Traefik service that is actually facing the outside world) responds to a request to my blog, the contents are indeed what my Ghost service sent, and not, for instance, some NSA propaganda injected into the response by someone pretending to be me?
This is where I’m just gonna wave my hands and say “SSL certificates” and “encryption magic”. But the important part for us here is that to get the benefits of said encryption magic, we need to ask someone trusted to grant us said certificates.
While you can pay Certificate Authorities (CA) for the privilege, nowadays you can get certs for free using Let’s Encrypt, a free (as in beer) CA. But beware: no matter which CA you go with, you need to 1. prove that you own the domain, and 2. renew the certificates every once in a while (lifetimes are kept short, only 90 days for Let’s Encrypt, mainly to limit the damage if a certificate’s private key ever leaks or a cert is mis-issued, and to push everyone into automating renewal).
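For the renewal part, the check boils down to "are we inside some window before the certificate's notAfter date?". A minimal Python sketch, assuming a 30-day window (a common choice for 90-day certs, but treat the number as an assumption) and using the standard library's ssl.cert_time_to_seconds to parse OpenSSL-style dates:

```python
import ssl
from datetime import datetime, timedelta, timezone


def not_after_from_openssl(text: str) -> datetime:
    """Convert an OpenSSL-style date such as 'Jan  5 09:34:43 2018 GMT' to UTC."""
    return datetime.fromtimestamp(ssl.cert_time_to_seconds(text), tz=timezone.utc)


def needs_renewal(not_after: datetime, now: datetime,
                  window: timedelta = timedelta(days=30)) -> bool:
    """True once `now` falls inside the renewal window before expiry."""
    return now >= not_after - window
```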
And if you’re going with Let’s Encrypt, note that they have some very stringent rate limits, so it’s best to test your setup against their staging CA, and to make sure that when you deploy subsequent iterations of your infrastructure and application services configuration, you don’t accidentally wipe out the certificate files (speaking from experience, unfortunately).
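Both pieces of that advice can be scripted. Below is a hedged Python sketch: the two URLs are Let's Encrypt's ACME v2 production and staging directories, while assert_certs_intact is a hypothetical pre-deploy guard that bails out if the cert folder suddenly has no .pem files in it, before a deploy gets the chance to trigger a fresh round of issuance.

```python
from pathlib import Path

# Let's Encrypt's ACME v2 directory endpoints.
LE_PRODUCTION = "https://acme-v02.api.letsencrypt.org/directory"
LE_STAGING = "https://acme-staging-v02.api.letsencrypt.org/directory"


def acme_directory(staging: bool) -> str:
    """Use staging while iterating so mistakes don't burn production rate limits."""
    return LE_STAGING if staging else LE_PRODUCTION


def assert_certs_intact(cert_dir: str) -> list[Path]:
    """Hypothetical pre-deploy guard: abort if the cert folder looks wiped.

    The '*.pem' pattern matches certbot-style file names; adjust it for
    other ACME clients.
    """
    pems = sorted(Path(cert_dir).rglob("*.pem"))
    if not pems:
        raise RuntimeError(f"no certificates found under {cert_dir}; aborting deploy")
    return pems
```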
Okay, cool, but why am I even talking about this when Traefik “automagically” resolves the host’s certificates by dynamically requesting certs from the CA (i.e. if I have a route that specifies janejeon.dev and tell Traefik that yes, I do want TLS encryption, Traefik will go ask the configured CA for a certificate for that domain)?
- You have no control over the actual certificate generation process. Sure, you can configure it to behave in some way, but there’s no way to step in and say “nuh uh, you done fucked up” and un-fuck it if, say, Traefik hits your CA’s rate limit. You can’t retry it, renew the cert at a later time, specify the lifecycle of said certs, etc. You’re just… stuck with whatever Traefik spits out, and if it happens to be wrong, you’re fucked until you delete Traefik’s certs and start all over again (and since none of this is configurable, it will request all of the certs at once and worsen the rate limiting situation).
- Traefik stores all certs in a single acme.json file. I… I’m sure I don’t even need to explain why this is such a fucking clusterfuck, but in short: this is a single point of failure for your ENTIRE site (or sites, plural, if you use Traefik to route more than one domain like I do). If it becomes corrupted, you’re fucked. If more than one Traefik instance tries to modify it, even on a distributed filesystem, you’re fucked. And you can’t really easily manage the individual certs in the
In the Kubernetes world, this is where something like cert-manager would come in. However, we are down in Docker(-Compose) purgatory, and so we must make do with acme.sh - applications that basically do cert issuance/renewal and ONLY that (I like that classic, UNIX-style composability). These services generate the certs into a folder, and we map that onto Traefik.
I ended up going with a certbot-based container called dnsrobocert, but in reality, whatever you go with, you should be fine as long as you make sure to sync the cert folder(s).
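To make "map the folder onto Traefik" concrete: Traefik's file provider accepts a tls.certificates list of certFile/keyFile pairs. The Python sketch below generates that snippet from a certbot-style live/<domain>/ layout (fullchain.pem, privkey.pem). The /certs mount point is hypothetical, and whether your cert client uses exactly this layout is an assumption worth checking.

```python
import json
from pathlib import Path


def tls_entries(cert_root: str, mount_point: str = "/certs") -> list[dict[str, str]]:
    """One certFile/keyFile pair per domain found under <cert_root>/live/<domain>/.

    cert_root is the folder on the host; mount_point is where that same folder
    is mounted inside the Traefik container (a hypothetical path here).
    """
    entries = []
    live = Path(cert_root) / "live"
    for domain_dir in sorted(p for p in live.iterdir() if p.is_dir()):
        chain, key = domain_dir / "fullchain.pem", domain_dir / "privkey.pem"
        if chain.exists() and key.exists():
            rel = domain_dir.relative_to(cert_root).as_posix()
            entries.append({
                "certFile": f"{mount_point}/{rel}/fullchain.pem",
                "keyFile": f"{mount_point}/{rel}/privkey.pem",
            })
    return entries


def render_traefik_tls(entries: list[dict[str, str]]) -> str:
    """Render a file-provider YAML snippet; json.dumps quotes each path safely."""
    lines = ["tls:", "  certificates:"]
    for entry in entries:
        lines.append(f"    - certFile: {json.dumps(entry['certFile'])}")
        lines.append(f"      keyFile: {json.dumps(entry['keyFile'])}")
    return "\n".join(lines) + "\n"
```

Keeping one file pair per domain is the point: a single broken cert no longer takes every site down with it, unlike one shared acme.json.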
Now, to prove that I do own janejeon.dev, there are multiple ways to go about it (read more here: https://letsencrypt.org/docs/challenge-types), but for the sake of simplicity, I chose to go with the DNS challenge. Not only is it easy and wholly automated (just pass it the API tokens for managing your sites’ DNS zones), but it also handles wildcard domains (e.g.
The service has only one dependency - the certs folder - so the setup is pretty easy: https://github.com/JaneJeon/self-hosted/blob/master/config/certs/docker-compose.dnsrobocert.yml.
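For the curious, here is roughly what the DNS challenge asks for, per RFC 8555: a TXT record at _acme-challenge.<domain> whose value is the unpadded base64url SHA-256 of the key authorization (token + "." + account key thumbprint). This Python sketch only computes the record; talking to the CA and to the DNS provider's API is what tools like dnsrobocert handle for you, and the inputs in the usage below are placeholders.

```python
import base64
import hashlib


def _b64url(data: bytes) -> str:
    """Unpadded base64url, the encoding ACME uses throughout."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def dns01_record(domain: str, token: str, account_thumbprint: str) -> tuple[str, str]:
    """Return (TXT record name, TXT record value) for an ACME DNS-01 challenge.

    token comes from the CA's challenge object; account_thumbprint is the
    base64url JWK thumbprint of the ACME account key. Both are inputs here.
    """
    # A wildcard name is validated against its base domain.
    base = domain[2:] if domain.startswith("*.") else domain
    key_authorization = f"{token}.{account_thumbprint}"
    digest = hashlib.sha256(key_authorization.encode("ascii")).digest()
    return f"_acme-challenge.{base}", _b64url(digest)
```

This is also why the DNS challenge copes with wildcards while the HTTP challenge can't: there is no single web server to put a file on for "every subdomain", but there is always one DNS zone.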
A warning for those using certbot: it will make the certs folder accessible only by root, which is completely asinine. Make sure you unfuck that by chmod -R 755’ing that folder, allowing downstream services to pick up and use the cert files.
Now that we’ve managed to actually spin up the Ghost container and all of the “connectivity”/“upstream” parts, let’s address the dependencies of Ghost, starting with the database.
As I’ve said before, Ghost can use either SQLite3 or MySQL, but for the sake of doing things “properly” (not to mention the fact that the other services I would eventually end up hosting within the same docker compose stack also used MySQL), I’ve chosen to use MySQL to back Ghost’s data layer.
So… what is there to do other than just standing up the container? Quite a bit, actually. While we have already dealt with a “stateful” container and stripped out all of its stateful parts (see: Ghost container), databases are a special breed of stateful services due to the fact that they don’t “use” the filesystem directly; rather, they rely on a whole host of filesystem-based systems (journals, WALs, etc).
What this means is that in order to “snapshot” the database, you need to either do streaming replication from the WAL (see: wal-g), or do an export of everything in the database, which is costly and can cause the database to seemingly “freeze” up (due to the table locks) from the point of view of the users.
So just mount a dump folder to the host, and run the “snapshot” into it every once in a while. In https://github.com/JaneJeon/self-hosted/blob/master/config/database/docker-compose.mysql.yml, you can see the “dump” folder specifically mounted, but not the actual MySQL data folder. It’s fine if the data folder gets lost, as long as we have the snapshot dump.
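As an illustration of the "snapshot" step, here is a Python sketch that builds a mysqldump command writing into the mounted dump folder. The paths and file name are hypothetical; the flags are standard mysqldump options, and --single-transaction in particular gives a consistent snapshot of InnoDB tables without holding table locks for the whole dump, which softens the "freeze" described above.

```python
def dump_command(dump_dir: str, defaults_file: str,
                 filename: str = "all-databases.sql") -> list[str]:
    """Build a mysqldump invocation that writes one .sql file into dump_dir.

    --single-transaction only guarantees consistency for transactional
    tables such as InnoDB. Credentials come from an option file rather than
    the command line, so they don't show up in the process list.
    """
    return [
        "mysqldump",
        f"--defaults-extra-file={defaults_file}",  # must be the first option
        "--single-transaction",
        "--routines",
        "--triggers",
        "--all-databases",
        f"--result-file={dump_dir}/{filename}",
    ]
```

You'd run it with subprocess.run(cmd, check=True). Overwriting a single file each night is deliberate: the backup tool already keeps the history, so timestamped dumps would just pile up in the volume.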
As for why I’m talking about database snapshots…
Backups! If you don’t back up all of your stateful parts, you’ll come to regret it real quick. My backup setup has already saved my bacon: I accidentally overwrote my entire stateful volume (files, certs, database data). Basically, I map every stateful container onto the volumes/ folder so I can back it all up at once.
Here’s how it works.
For normal volumes, it’s easy - you just… back up the filesystem. For databases, it’s as I mentioned above - you run a snapshot before a backup to “freeze” the state of the database onto a filesystem. Usually this comes in form of a “dump” SQL script that you can run to completely restore the database.
For the actual backup, I use restic (or more specifically, resticker, something built on top of it that provides hooks and Docker integration). The backups are deduplicated and encrypted, and I try to automate as much of it as I can. The steps for using restic go something like this:
- Initialize a repository. For me, this basically means creating a Backblaze B2 bucket designated specifically for restic and creating a URL pointing to it.
- resticker runs hooks, which do things like dumping databases into their respective “dump folders” (not data directories).
- resticker takes all of the contents of the volumes/ directory (remember, literally everything stateful is mounted there, including the dumps) and backs it up.
- It runs post-completion hooks to notify me whenever it succeeds/fails.
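Stripped of the Docker integration, that flow is just "dump, back up, report". resticker drives this through its own configuration, so the following is a plain-Python model of the ordering, not its code. The repository string uses restic's b2:<bucket>:<path> syntax with a made-up bucket name; credentials would come from environment variables (RESTIC_PASSWORD, B2_ACCOUNT_ID, B2_ACCOUNT_KEY).

```python
from typing import Callable, Iterable, Optional


def run_backup(pre_hooks: Iterable[Callable[[], None]],
               backup: Callable[[], None],
               notify: Callable[[bool, Optional[str]], None]) -> bool:
    """Run dump hooks, then the backup, and always report the outcome.

    If a dump hook fails, the backup is skipped so a failed dump never
    looks like a successful nightly run.
    """
    try:
        for hook in pre_hooks:
            hook()
        backup()
    except Exception as exc:  # report every failure, whatever its type
        notify(False, f"{type(exc).__name__}: {exc}")
        return False
    notify(True, None)
    return True


def restic_backup_command(repo: str, path: str = "/volumes") -> list[str]:
    """restic CLI call for the backup step; the password comes from RESTIC_PASSWORD."""
    return ["restic", "-r", repo, "backup", path]
```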
I run these backups nightly so that the “freeze” from database locks isn’t visible to me. In addition to these steps, you may also notice in the compose file that there are additional instances of the container.
Those serve to:
- prune older backups (so that my backup bucket doesn’t blow up in size over time)
- check backups
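Pruning boils down to something like restic forget --keep-daily 7 --keep-weekly 4 --prune (the numbers are illustrative), and checking to restic check. To show what a keep-daily/keep-weekly policy actually keeps, here is a simplified Python model of the idea; restic's real implementation supports more policies and groups snapshots by host and paths, so don't treat this as its exact algorithm.

```python
from datetime import datetime


def snapshots_to_keep(times: list[datetime], keep_daily: int,
                      keep_weekly: int) -> set[datetime]:
    """Simplified keep-daily / keep-weekly retention (not restic's exact code).

    Walking from newest to oldest, each policy keeps the newest snapshot in
    each distinct day (or ISO week) until its quota is full. A snapshot
    survives if any policy keeps it; everything else is a prune candidate.
    """
    policies = [
        (keep_daily, lambda t: t.date()),
        (keep_weekly, lambda t: tuple(t.isocalendar())[:2]),  # (ISO year, week)
    ]
    newest_first = sorted(times, reverse=True)
    keep: set[datetime] = set()
    for quota, bucket_of in policies:
        seen: set = set()
        for t in newest_first:
            bucket = bucket_of(t)
            if bucket in seen:
                continue
            if len(seen) >= quota:
                break
            seen.add(bucket)
            keep.add(t)
    return keep
```

With 30 nightly snapshots from 2024-01-01 to 2024-01-30, a 7-daily/4-weekly policy keeps the last seven days plus the Sundays closing the three ISO weeks before them, nine snapshots in total.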
While restic stores backups as deduplicated chunks under the hood, it exposes everything as “snapshots”, so restoring data is easy:
- Restore the snapshot onto your local volumes/ folder to rehydrate it with data.
- Run restore scripts to ingest the dumped database snapshots.
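Those two steps map onto a couple of commands. A hedged Python sketch: restic restore latest --target / puts files back at their original absolute paths (restic recreates the full path under the target), and the .sql dumps are then fed to the mysql client one at a time. Repository name, paths and option file are placeholders.

```python
import subprocess
from pathlib import Path


def restore_command(repo: str, target: str, snapshot: str = "latest") -> list[str]:
    """restic call that writes a snapshot's files back under `target`."""
    return ["restic", "-r", repo, "restore", snapshot, "--target", target]


def dumps_to_ingest(volumes_dir: str) -> list[Path]:
    """Find the .sql dumps restored into the dump folders, in sorted order."""
    return sorted(Path(volumes_dir).rglob("*.sql"))


def ingest_all(volumes_dir: str, defaults_file: str) -> None:
    """Replay every dump through the mysql client, stopping at the first error."""
    for dump in dumps_to_ingest(volumes_dir):
        with dump.open("rb") as f:
            subprocess.run(["mysql", f"--defaults-extra-file={defaults_file}"],
                           stdin=f, check=True)
```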
It’s that easy!
Monitoring (App & Site)
Great, but a host running some… whatever is essentially a black box, and I wanted some transparency: how many resources it’s consuming, what’s actually running on it, what the status of the database is, plus a whole bunch of metrics and logs collected across all of my containers, etc.
I know the “proper” way to do this as a self-hosting enthusiast is to use Prometheus + Grafana + ELK/Loki and set up your own dashboards, ingest pipelines, yadi yadi yah.
Ain’t nobody got time for that.
So I’m using DataDog (I’ve tried New Relic for its generous free tier, but its setup + documentation was just… ugh). In particular, I’m using their Docker agent to pull all Docker logs, host metrics and processes, and metrics for individual containers (so that I can get database metrics, for instance), and use their autogenerated dashboards for metrics + their nice UIs for log & process search.
I know I’ve said above that reliability & observability is one of the reasons why I’m self hosting, and I feel that my reliance on an external SaaS vendor in this case doesn’t really contradict that.
For 99% of the time, you’re only going to use these dashboards to go “ooh pretty” (let’s not kid ourselves). But for the 1% of the time when my setup goes down, I can trust a company whose entire business is monitoring other people’s stuff to stay up, and use it to debug my setup.
And because of just how much data I’m collecting (and how many prebuilt dashboards/panes there are for searching through metrics and logs), debugging is so easy. It’s literally like a glass pane into my server (Everything is slowing down to a crawl? Watch system metrics. A container is acting up? Filter logs by that container and by ERROR/WARN level, etc).
Given that it’s a stateless service, the setup’s pretty simple, too: https://github.com/JaneJeon/self-hosted/blob/master/config/monitoring/docker-compose.datadog.yml. Additionally, because of its Docker host integration, you can just label containers for integrations, not only to “mark” them as “hey I want integration for $this”, but also to provide the integration’s configuration.
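For reference, those integration labels are JSON strings stored in Docker labels. This Python helper builds the com.datadoghq.ad.* labels (real label names from Datadog's Docker Autodiscovery); the MySQL instance keys and the %%host%% template variable follow Datadog's docs, but the exact keys should be double-checked against your integration's documentation.

```python
import json

# Illustrative MySQL instance config; verify the keys against the integration docs.
MYSQL_INSTANCE = {
    "host": "%%host%%",  # Autodiscovery template variable: the container's IP
    "port": 3306,
    "username": "datadog",
    "password": "<PASSWORD>",  # placeholder
}


def datadog_ad_labels(check: str, instance: dict, logs_source: str) -> dict[str, str]:
    """Docker labels for Datadog Autodiscovery; every value must be a JSON string."""
    return {
        "com.datadoghq.ad.check_names": json.dumps([check]),
        "com.datadoghq.ad.init_configs": json.dumps([{}]),
        "com.datadoghq.ad.instances": json.dumps([instance]),
        "com.datadoghq.ad.logs": json.dumps([{"source": logs_source, "service": check}]),
    }
```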
Great, so we’re done, right?
Not right. While the setup within our docker-compose stack is definitely robust as-is, we must not forget that the Docker daemon is still running on a host.
Unfortunately, we are not able to abstract it away (we would’ve been able to for the most part, had we used something like Elastic Container Service), and we must try to automate away the host management part of it as much as possible.
In particular, I’m going to focus on managing the Docker daemon: keeping the VPS we run on alive and healthy is really hands-free for the most part (other than UFW and auto-updates, which are one-time setups), while the Docker daemon does need ongoing maintenance.
So let’s try to cut down the number of hours I spend on Docker host maintenance.
The first big one is Portainer. I’m not using it for its “stack” management features; I’m literally only using it as a web interface for the Docker host. For the times I do need to touch the containers manually, it lets me quickly drop a shell into a container, restart it, see which ones are healthy, etc.
It really is a goddamn crutch and I really need to change my setup to rely less on it, but for the time being, it is my #1 used app. Setup is pretty simple, though there is a bug, which the core team is choosing to ignore, where the timezone is permanently stuck on UTC, which means timestamps are, for the most part, useless.
(As for why I’m still subjecting myself to this crutch, there are two reasons: 1. At this point, I have muscle memory of using Portainer whenever I need to do/check literally anything container-related, and 2. there is currently a bug in my DataDog agent setup, where DataDog wouldn’t recognize metrics for a container - all the more reason to move to a “proper” self-hosted setup using FOSS stack.)
As you add/remove services and update/change images, you will eventually have a fair share of dangling containers and images.
Yelp apparently had the same problem and wrote a nice little container for it: https://github.com/JaneJeon/self-hosted/blob/master/config/docker-gc/docker-compose.gc.yml. It basically acts as a GC for the Docker host. Set it up once, forget it forever. Neat!
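The core of any such GC is a selection rule plus a grace period, so a freshly pulled image isn't deleted out from under a deploy. Here is a simplified Python model of that rule; it is not the tool's actual logic, and the one-hour grace period is an assumption.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Image:
    image_id: str
    tags: tuple[str, ...]
    created: datetime
    in_use: bool  # referenced by any container, running or stopped


def images_to_remove(images: list[Image], now: datetime,
                     grace: timedelta = timedelta(hours=1)) -> list[str]:
    """Pick untagged ("dangling") images that no container uses and that are
    older than the grace period."""
    return [img.image_id for img in images
            if not img.tags and not img.in_use and now - img.created > grace]
```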
Docker Image Upgrades
This… is probably the hardest part, by far.
You know how you always see news about some company being hacked because they were using an outdated version of $ServiceX? To avoid that, we all know the best practice is to keep your services updated, and this is one area where SaaS just… solves it.
However, how will you know whether there are updates to a service? Or when a new version comes out? Or which version to upgrade to? How do you know what’s included in an update? What changed? Is it safe to upgrade? Etc, etc.
This is an age-old problem that package managers (and by that I mean the Linux distro kind - think yum, etc.) solved: countless man-hours spent vetting each package at each version and including it in their “stream” that you can just trust.
This trust is what allows you to just sudo apt update and forget about it, and it is this lack of trust that makes this problem so fucking hard to solve in Dockerland.
Sure, you can use something like Watchtower to automate the process of updating an image when a new one comes out, and restarting the containers using that image to reflect that. Hell, it can even automatically clean up the old image when it does update: https://github.com/JaneJeon/self-hosted/blob/master/config/docker-image-upgrade/docker-compose.watchtower.yml!
But how will you know that it won’t completely break your setup? Actually, it most likely will, especially over longer periods of time.
You might say, “Jane, that’s a strawman argument; we already have semver!”
True, some Docker images follow semver and tag their images appropriately as such. However, it falls the fuck apart on two accounts:
- You have to trust that the people making these images actually know what the fuck semver actually IS. I’ve seen WAY too many maintainers sneaking in some breaking change casually within a minor/patch release, which is fucking absurd (cough Portainer cough)!
- There’s really no way to “automatically” upgrade from a
vX.Y+1.0. Remember, Watchtower is not a package manager, there is no such concept as package versioning in Dockerland (in its infinite wisdom), and there sure as hell isn’t a standardized way of doing so! So you don’t have an automated way to tell which is the “this version + 1 patch/minor” update, meaning that unless maintainers publish a vX tag that they keep updated, Watchtower cannot update images without potentially including a major version bump!
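To see why a maintained vX tag matters, here is the lookup a registry won't do for you: given the current tag, find the newest fully specified X.Y.Z tag with the same major version (and the same variant suffix, e.g. -alpine). A simplified Python sketch that assumes tags are plain semver-ish strings; anything else (latest, rc tags, date tags) is simply skipped.

```python
import re
from typing import Iterable, Optional

# Accepts 5, 5.2, 5.2.1, v5.2.1 and 5.2.1-alpine; anything else is ignored.
_TAG = re.compile(r"^v?(\d+)(?:\.(\d+))?(?:\.(\d+))?(?:-(.+))?$")


def parse_tag(tag: str) -> Optional[tuple[tuple[int, int, int], str, bool]]:
    """Return (version, suffix, fully_specified) or None for non-semver tags."""
    m = _TAG.match(tag)
    if not m:
        return None
    major, minor, patch, suffix = m.groups()
    version = (int(major), int(minor or 0), int(patch or 0))
    return version, suffix or "", patch is not None


def latest_non_breaking(current: str, available: Iterable[str]) -> Optional[str]:
    """Newest X.Y.Z tag newer than `current` with the same major and suffix."""
    cur = parse_tag(current)
    if cur is None:
        return None
    cur_version, cur_suffix, _ = cur
    best_version, best_tag = None, None
    for tag in available:
        parsed = parse_tag(tag)
        if parsed is None:
            continue
        version, suffix, full = parsed
        if not full or suffix != cur_suffix or version[0] != cur_version[0]:
            continue
        # Compare numerically, so 5.10.2 correctly beats 5.3.0.
        if version > cur_version and (best_version is None or version > best_version):
            best_version, best_tag = version, tag
    return best_tag
```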
So for the containers that do have a vX.Y tag, awesome! Just tell Watchtower to automatically upgrade when a new image for that tag comes out. But for literally everything else? It’s a clusterfuck out there, and the only way to reliably do it without breaking your stack is to get Watchtower to let you know whether there was an update for a container, check yourself that it is indeed a non-breaking release, and then pull the latest image onto your host and restart that specific container.
Is there a better way? A way to automate the "check if there's a vX.Y.Z+1", a way to figure out (or rather, give context so I can make better guesses on) whether this update will break my stack or not, a way for me to simply update images?
As I've shown, while there isn't any way to do so from within your stack, if you loosen the definition of a "self-hosted stack" to include the repo that it's hosted in, actually, you can do something like this!
Enter: Renovate (not Dependabot because 1. it's still very immature, and 2. it doesn't auto-update containers defined in
Renovate is a tool that automates updating all of the dependencies in your repo. Normally, you'd use it for things like making sure your package.json packages are up to date; however, I am (ab)using it to auto-update my Docker "dependencies" instead.
This immediately ticks all of the boxes:
- Renovatebot "understands" semver, even in the non-standardized Docker world (seriously, it's magic).
- Thus, Renovatebot is able to bump versions to the next major/minor/patch versions.
- Then, when it "bumps" an image, it will create a PR with the current and the (proposed) updated version for any given image, and I can simply go look at Docker Hub to see what's IN an update before making the decision to merge it!
- Once it's merged, I git pull and run the deploy script from my local machine.
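When reviewing those PRs, the first question is usually "major, minor or patch?". Renovate already works this out itself (and can act on it via packageRules with matchUpdateTypes, e.g. to automerge patches), but the classification is simple enough to sketch in Python:

```python
import re

_SEMVER = re.compile(r"^v?(\d+)\.(\d+)\.(\d+)")


def update_type(current: str, proposed: str) -> str:
    """Classify an image bump as 'major', 'minor' or 'patch'.

    Only the leading X.Y.Z is compared, so variant suffixes like -alpine
    are ignored here.
    """
    a, b = _SEMVER.match(current), _SEMVER.match(proposed)
    if a is None or b is None:
        raise ValueError(f"not a semver-style tag: {current!r} -> {proposed!r}")
    old = tuple(map(int, a.groups()))
    new = tuple(map(int, b.groups()))
    if new <= old:
        raise ValueError("proposed version is not newer")
    if new[0] != old[0]:
        return "major"
    if new[1] != old[1]:
        return "minor"
    return "patch"
```

Anything that comes back as "major" is the PR worth reading the changelog for before merging.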
Kinda amazing how it actually just... works, tbh.
Anyway, this is how I host my stuff, and this will serve as the basis on which I will host (and already have hosted) a whole bunch of other stuff, including stuff for the great Apple migration!