From d72bfc5e2b55dfe645cdce3cd6adeb3b1efbeae4 Mon Sep 17 00:00:00 2001
From: Joe Ardent
Date: Sun, 31 Mar 2024 16:28:20 -0700
Subject: [PATCH] finish post

---
 content/sundries/hitman/index.md | 182 ++++++++++++++++++++++++++++++-
 1 file changed, 179 insertions(+), 3 deletions(-)

diff --git a/content/sundries/hitman/index.md b/content/sundries/hitman/index.md
index a9f7e63..e0ff640 100644
--- a/content/sundries/hitman/index.md
+++ b/content/sundries/hitman/index.md
@@ -2,15 +2,191 @@
 title = "Hitman: another fine essential sundry service from Nebcorp Heavy Industries and Sundries"
 slug = "hitman"
 date = "2024-03-31"
-draft = true
 [taxonomies]
 tags = ["software", "sundry", "proclamation", "90s", "hitman", "web"]
 +++
 # Hitman counts your hits, man.
+Recently, someone in a community I'm part of asked the following:
+> I was thinking about how we used to have website hit counters in the 2000s and I was wondering --
+> has anyone put a hit counter on your personal website?
-## The dream
+Some people had, it turns out, but many had not. Among the had-nots was me, and I decided to do
+something about it. The bottom line up front is that you can see it in action right now at the
+bottom of this very page, and if you want, you can check out the code
+[here](https://git.kittencollective.com/nebkor/hitman); it's called Hitman!
-## The reality
+## What's the problem?
+
+Back in the day[^web1.0], there was basically only one way to have a website: you had a Linux box
+running the Apache webserver with PHP enabled, and a MySQL database to hold state; this was your
+classic LAMP stack, obviously. If this was your website, adding a visible counter was trivial: you
+just used PHP to render the count on the server side after a quick SQL query. And because this was
+basically the only way to have a website, lots of "website operators" put hit counters on their
+sites, because why not?
+
+But this is the year 2024, and we do things differently these days. This blog, for example, is built
+with a "static site generator" called [Zola](https://www.getzola.org/), which means that there's no
+server-side rendering or any other kind of dynamic behavior from the backend. It's served by a
+small Linux VPS running the [Caddy](https://caddyserver.com/) webserver, which costs about five
+bucks a month. If I wanted to have a hit counter, I'd have to do something non-traditional.
+
+## What's the solution?
+
+For me, it turned out to be a sidecar microservice for counting and reporting the hits. As usual
+these days, my first instinct was to reach for [Axum](https://docs.rs/axum/latest/axum/), a framework
+for building servers in Rust, and to use [SQLite](https://sqlite.org/) for the database. Caddy
+proxies all requests for the hit-counting URL to Hitman, which listens only on localhost.
+
+### That sounds simple
+
+Ha ha, it does, doesn't it? And in the end, it actually kinda is. But there are a few nuances to
+consider.
+
+### Privacy
+
+As far as I'm concerned, the less I know the better, and I didn't see any reason for this to tell
+me more than I already knew; but I'd need to track the IP of the client making the request in order
+to de-duplicate views. Someone linked to [this
+post](https://herman.bearblog.dev/how-bear-does-analytics-with-css/) about how the author uses a
+notional CSS load to register a hit, and also how they hash the IP with the date to keep the counts
+down to one per IP per day. They're doing quite a bit more actual "analytics" than I'm interested
+in, but I liked the hashing idea. They mention scrubbing the hashes from their DB every night in
+order to pre-emptively satisfy an overzealous GDPR regulator[^logs], but I had a better idea: hash
+the IP and date together with a random secret that is never disclosed and is regenerated every time
+the server restarts. Once the server restarts, the old secret is gone, so the old hashes can no
+longer be matched against any IP.
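
A minimal sketch of that keyed-hash idea, in Rust since that's what Hitman is written in. It assumes the standard library's `RandomState` as the per-restart secret: each `RandomState::new()` is seeded with random keys, so a restarted process can't reproduce old hashes. `VisitorHasher` and `visitor_key` are hypothetical names, std's default keyed hasher is a stand-in for whatever hash function Hitman actually uses, and days are counted in UTC from a Unix timestamp. The secret is what makes this private: there are only about four billion IPv4 addresses, so an unkeyed hash of IP + date could be reversed by hashing every candidate.

```rust
use std::collections::hash_map::RandomState;
use std::hash::{BuildHasher, Hash, Hasher};
use std::net::IpAddr;

/// Seconds per day, used to bucket a Unix timestamp into a UTC calendar day.
const SECS_PER_DAY: u64 = 86_400;

/// Hypothetical holder for the per-process secret. `RandomState::new()` picks
/// random keys, so a fresh secret exists after every restart and is never
/// written anywhere. (std's default hasher is a stand-in, not Hitman's own.)
struct VisitorHasher {
    secret: RandomState,
}

impl VisitorHasher {
    fn new() -> Self {
        VisitorHasher {
            secret: RandomState::new(),
        }
    }

    /// Keyed hash of (IP, day). Only this value would be stored, never the IP.
    fn visitor_key(&self, ip: IpAddr, unix_secs: u64) -> u64 {
        let mut hasher = self.secret.build_hasher();
        ip.hash(&mut hasher);
        (unix_secs / SECS_PER_DAY).hash(&mut hasher);
        hasher.finish()
    }
}

fn main() {
    let hasher = VisitorHasher::new();
    let ip: IpAddr = "203.0.113.7".parse().expect("valid IP literal");
    let day_start: u64 = 1_711_843_200; // 2024-03-31T00:00:00Z

    let morning = hasher.visitor_key(ip, day_start + 9 * 3_600);
    let evening = hasher.visitor_key(ip, day_start + 21 * 3_600);
    let next_day = hasher.visitor_key(ip, day_start + SECS_PER_DAY + 9 * 3_600);

    // Same IP on the same day maps to the same key, so it is counted once.
    println!("same day, same key: {}", morning == evening);
    // A new day gives a new key (barring a 64-bit collision).
    println!("next day, new key: {}", morning != next_day);
}
```

Because the secret lives only in memory, "scrubbing" happens for free on every restart: a new `VisitorHasher` produces unrelated keys for the same IP and day.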
+
+I wound up [hashing with the date +
+hour](https://git.kittencollective.com/nebkor/hitman/src/commit/1617eae17448273114ca1b1d9277b3465986e9f1/src/main.rs#L79-L94),
+along with the page, IP, and the secret. This limits counting to one view per IP per page per hour,
+versus once per IP per day in the Bear Blog scheme.
+
+### Security?
+
+I spent some time on this, but ultimately realized that there's
+
+ - not much I can do, but
+ - not much an attacker can do, either.
+
+The server [rejects remote
+origins](https://git.kittencollective.com/nebkor/hitman/src/commit/1617eae17448273114ca1b1d9277b3465986e9f1/src/main.rs#L45-L48),
+but the `Origin` header can be trivially forged. On the other hand, the worst someone could do is
+add a bunch of junk to my DB, and I don't care about the data that much; this is all just for
+funsies, anyway!
+
+## The front end
+
+I mentioned that this blog is made using Zola, a static site generator. Zola has a built-in
+templating system, so the [following
+bit](https://git.kittencollective.com/nebkor/blog/commit/87afa418b239419f551459e9cc5e838f9fac7ed6)
+of HTML with inlined JavaScript is enough to register a hit and return the latest count:
+
+``` html
+
+
+
+```
+
+## Putting it all together
+
+OK, all the pieces are laid out; here's the actual setup on the backend.
+
+### Caddy
+
+The Caddy configuration has the following:
+
+```
+proclamations.nebcorp-hias.com {
+    handle /hit/* {
+        reverse_proxy localhost:5000
+    }
+    handle {
+
+    }
+}
+```
+
+This means that requests to, e.g., `https://proclamations.nebcorp-hias.com/hit/hitman` register a
+hit for this post and return the number of views so far.
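
Behind that proxy, each `/hit/<page>` request boils down to three steps: compute the visitor's bucket key (for Hitman, a keyed hash of page, IP, date, and hour), count it only if that key hasn't been seen for the page yet, and return the page's total. The sketch below is a hypothetical in-memory version of just the counting step; `HitCounter` and `record_hit` are made-up names, and the real service keeps its state in SQLite rather than in a `HashMap`.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical in-memory stand-in for Hitman's SQLite table.
#[derive(Default)]
struct HitCounter {
    /// Every (page, bucket key) pair that has already been counted.
    seen: HashSet<(String, u64)>,
    /// Running total of de-duplicated hits per page.
    totals: HashMap<String, u64>,
}

impl HitCounter {
    /// Record a hit for `page` from the visitor bucket `bucket_key`, then
    /// return the page's total. Repeats of the same pair are not re-counted.
    fn record_hit(&mut self, page: &str, bucket_key: u64) -> u64 {
        let total = self.totals.entry(page.to_string()).or_insert(0);
        // `insert` returns true only the first time this pair is seen.
        if self.seen.insert((page.to_string(), bucket_key)) {
            *total += 1;
        }
        *total
    }
}

fn main() {
    let mut counter = HitCounter::default();
    // Hypothetical bucket keys; in practice they'd come from the keyed hash.
    println!("{}", counter.record_hit("hitman", 42)); // first visit: 1
    println!("{}", counter.record_hit("hitman", 42)); // same bucket again: still 1
    println!("{}", counter.record_hit("hitman", 7)); // another visitor: 2
    println!("{}", counter.record_hit("other-post", 42)); // separate page: 1
}
```

In a database, the same effect could come from a `UNIQUE` constraint on the (page, key) pair plus `INSERT OR IGNORE`, followed by a `COUNT(*)` for the page; whether Hitman's schema does exactly that isn't shown here.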
+
+### systemd
+
+I created a system user for the service, `hitman`, with a home directory in `/var/lib/hitman`, and
+added the following systemd unit file as `/etc/systemd/system/hitman.service`:
+
+```
+[Unit]
+Description=Hitman
+After=network.target network-online.target
+Requires=network-online.target
+
+[Service]
+Type=exec
+User=hitman
+Group=hitman
+ExecStart=/var/lib/hitman/hitman -e /var/lib/hitman/.env
+Restart=on-failure
+TimeoutStopSec=5s
+LimitNOFILE=1048576
+LimitNPROC=512
+PrivateTmp=true
+ProtectSystem=full
+
+[Install]
+WantedBy=multi-user.target
+```
+
+Once the unit is enabled (`systemctl enable --now hitman.service`), this ensures that the hitman
+service starts at boot and is restarted if it crashes:
+
+```
+$ systemctl status hitman.service
+● hitman.service - Hitman
+     Loaded: loaded (/etc/systemd/system/hitman.service; enabled; preset: enabled)
+     Active: active (running) since Sun 2024-03-31 12:12:14 PDT; 4h 0min ago
+   Main PID: 46338 (hitman)
+      Tasks: 2 (limit: 1018)
+     Memory: 948.0K
+        CPU: 53ms
+     CGroup: /system.slice/hitman.service
+             └─46338 /var/lib/hitman/hitman -e /var/lib/hitman/.env
+```
+
+### Hitman
+
+Inside the `/var/lib/hitman` directory there's a `.env` file with the following content:
+
+```
+DATABASE_URL=sqlite:///${HOME}/.hitman.db
+DATABASE_FILE=${HOME}/.hitman.db
+LISTENING_ADDR=127.0.0.1
+LISTENING_PORT=5000
+HITMAN_ORIGIN=https://proclamations.nebcorp-hias.com
+```
+
+## Coda
+
+When I got this working, a friend said, "Drat, that means I need to follow through on my goal to
+write a little web-ring server." Something like two hours later, she had [a working
+webring](https://erikarow.land/notes/gleam-webring), and indeed, if you look at the bottom of this
+very page, you'll see the webring links; as she says, this Web 1.0 stuff is fun!
+
+---
+[^web1.0]: I think of the hit-counter era as the 90s, but that's because I'm older than the person
+    who asked the question.
+ +[^logs]: They don't mention scrubbing IPs from their logs, but they do mention having logs, so clearly + the job to scrub the hit DB of hashes is just privacy kabuki.