diff --git a/content/sundries/hitman/index.md b/content/sundries/hitman/index.md
index a9f7e63..e0ff640 100644
--- a/content/sundries/hitman/index.md
+++ b/content/sundries/hitman/index.md
@@ -2,15 +2,191 @@
 title = "Hitman: another fine essential sundry service from Nebcorp Heavy Industries and Sundries"
 slug = "hitman"
 date = "2024-03-31"
-draft = true
 [taxonomies]
 tags = ["software", "sundry", "proclamation", "90s", "hitman", "web"]
 +++
 # Hitman counts your hits, man.
+Recently, someone in a community I'm part of asked the following:
+> I was thinking about how we used to have website hit counters in the 2000s and I was wondering --
+> has anyone put a hit counter on your personal website?
-## The dream
+Some people had, it turns out, but many had not. Among the had-nots was me, and I decided to do
+something about it. The bottom line up front is that you can see it in action right now at the
+bottom of this very page, and if you want, check out the code
+[here](https://git.kittencollective.com/nebkor/hitman); it's called Hitman!
-## The reality
+## What's the problem?
+
+Back in the day[^web1.0], there was basically only one way to have a website: you have a Linux box,
+running the Apache webserver, with PHP enabled, and a MySQL database to hold state; this is your
+classic LAPM stack, obviously. If this is your website, adding a visible counter is trivial: you
+just use PHP to do server side rendering of the count after a quick SQL query. And because this was
+basically the only way to have a website, lots of "website operators" put hitcounters on their site,
+because why not?
+
+But this is the year 2024, and we do things differently these days. This blog, for example, is built
+with a "static site generator" called [Zola](https://www.getzola.org/), which means that there's no
+server side rendering, or any other kind of dynamic behavior from the backend. It's served by a
+small Linux VPS that's running the [Caddy](https://caddyserver.com/) webserver, and costs about five
+bucks a month to run. If I wanted to have a hitcounter, I'd have to do something non-traditional.
+
+## What's the solution?
+
+For me, it turned out to be a sidecar microservice for counting and reporting the hits. As usual
+these days, my first instinct is to reach for [Axum](https://docs.rs/axum/latest/axum/), a framework
+for building servers in Rust, and to use [SQLite](https://sqlite.org/) for a database. Caddy proxies
+all requests to the hit-counting URL to Hitman, which is listening only on localhost.
+
+### That sounds simple
+
+Ha ha, it does, doesn't it? And in the end, it actually kinda is. But there are a few nuances to
+consider.
+
+### Privacy
+
+The less I know the better, as far as I'm concerned, and I didn't see any reason to know more than I
+already did with this, but I'd need to track the IP of the client that was doing the request in order to
+de-duplicate views. Someone linked to [this
+post](https://herman.bearblog.dev/how-bear-does-analytics-with-css/) about how the author uses a
+notional CSS load to register a hit, and also how they hash the IP with the date to keep the counts
+down to one per IP per day. They're doing quite a bit more actual "analytics" than I'm interested
+in, but I liked the other idea. They mention scrubbing the hashes from their DB every night in order to
+pre-emptively satisfy an overzealous GDPR regulator[^logs], but I had a better idea, which was to hash the
+IP+date with a random number that is not disclosed, and is regenerated every time the server
+restarts.
+
+I wound up [hashing with the date +
+hour](https://git.kittencollective.com/nebkor/hitman/src/commit/1617eae17448273114ca1b1d9277b3465986e9f1/src/main.rs#L79-L94),
+along with the page, IP, and the secret. This buckets views to one per IP per page per hour, vs the
+once per day from the bearblog.
+
+### Security?
+
+I spent some time on this, but ultimately realized that there's
+
+ - not much I can do, but
+ - not much they can do, either.
+
+The server [rejects remote
+origins](https://git.kittencollective.com/nebkor/hitman/src/commit/1617eae17448273114ca1b1d9277b3465986e9f1/src/main.rs#L45-L48),
+but the `Origin` headers can be trivially forged. On the other hand, the worst someone could do is
+add a bunch of junk to my DB, and I don't care about the data that much; this is all just for
+funsies, anyway!
+
+## The front end
+
+I mentioned that this blog is made using Zola, a static site generator. Zola has a built-in
+templating system, so the [following
+bit](https://git.kittencollective.com/nebkor/blog/commit/87afa418b239419f551459e9cc5e838f9fac7ed6)
+of HTML with inlined JavaScript is enough to register a hit and return the latest count:
+
+``` html
+
+
+
+```
+
+## Putting it all together
+
+OK, all the pieces are laid out, but here's the actual setup on the backend:
+
+### Caddy
+
+The Caddy configuration has the following:
+
+```
+proclamations.nebcorp-hias.com {
+    handle /hit/* {
+        reverse_proxy localhost:5000
+    }
+    handle {
+