+++
title = "Hitman: another fine essential sundry service from Nebcorp Heavy Industries and Sundries"
slug = "hitman"
date = "2024-03-31"
updated = "2024-03-31"

[taxonomies]
tags = ["software", "sundry", "proclamation", "90s", "hitman", "web"]
+++

# Hitman counts your hits, man.

Recently, someone in a community I'm part of asked the following:

> I was thinking about how we used to have website hit counters in the 2000s and I was wondering --
> has anyone put a hit counter on your personal website?

Some people had, it turns out, but many had not. Among the had-nots was me, and I decided to do something about it. The bottom line up front is that you can see it in action right now at the bottom of this very page, and if you want, you can check out the code [here](https://git.kittencollective.com/nebkor/hitman); it's called Hitman!

## What's the problem?

Back in the day[^web1.0], there was basically only one way to have a website: a Linux box running the Apache webserver, with PHP enabled, and a MySQL database to hold state; this is your classic LAMP stack, obviously. If this was your website, adding a visible counter was trivial: you just used PHP to do server-side rendering of the count after a quick SQL query. And because this was basically the only way to have a website, lots of "website operators" put hit counters on their sites, because why not?

But this is the year 2024, and we do things differently these days. This blog, for example, is built with a "static site generator" called [Zola](https://www.getzola.org/), which means that there's no server-side rendering, or any other kind of dynamic behavior from the backend. It's served by a small Linux VPS that's running the [Caddy](https://caddyserver.com/) webserver, and costs about five bucks a month to run. If I wanted to have a hit counter, I'd have to do something non-traditional.

## What's the solution?

For me, it turned out to be a sidecar microservice for counting and reporting the hits.
As usual these days, my first instinct was to reach for [Axum](https://docs.rs/axum/latest/axum/), a framework for building servers in Rust, and to use [SQLite](https://sqlite.org/) for a database. Caddy proxies all requests to the hit-counting URL to Hitman, which is listening only on localhost.

### That sounds simple

Ha ha, it does, doesn't it? And in the end, it actually kinda is. But there are a few nuances to consider.

### Privacy

The less I know the better, as far as I'm concerned, and I didn't see any reason to know more than I already did, but I'd need to track the IP of the client making the request in order to de-duplicate views. Someone linked to [this post](https://herman.bearblog.dev/how-bear-does-analytics-with-css/) about how the author uses a notional CSS load to register a hit, and also how they hash the IP with the date to keep the counts down to one per IP per day. They're doing quite a bit more actual "analytics" than I'm interested in, but I liked the other idea. They mention scrubbing the hashes from their DB every night in order to pre-emptively satisfy an overzealous GDPR regulator[^logs], but I had a better idea: hash the IP+date with a random number that is not disclosed, and is regenerated every time the server restarts. I wound up [hashing with the date + hour](https://git.kittencollective.com/nebkor/hitman/src/commit/1617eae17448273114ca1b1d9277b3465986e9f1/src/main.rs#L79-L94), along with the page, IP, and the secret. This buckets views to one per IP per page per hour, vs. the once per day from the bearblog.

### Security?

I spent some time on this, but ultimately realized that there's

- not much I can do, but
- not much they can do, either.

The server [rejects remote origins](https://git.kittencollective.com/nebkor/hitman/src/commit/1617eae17448273114ca1b1d9277b3465986e9f1/src/main.rs#L45-L48), but the `Origin` headers can be trivially forged.
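The hourly bucketing described in the privacy section can be sketched in plain Rust. This is an illustrative stand-in, not Hitman's actual code: the function name, argument shapes, and the use of the standard library's `DefaultHasher` are all my assumptions.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Sketch: derive a deduplication key from an undisclosed per-run secret,
// the client IP, the page slug, and the date truncated to the hour. Two
// views of the same page from the same IP within the same hour yield the
// same key, so they count as one hit.
fn view_key(secret: u64, ip: &str, page: &str, date_hour: &str) -> u64 {
    let mut h = DefaultHasher::new();
    secret.hash(&mut h);
    ip.hash(&mut h);
    page.hash(&mut h);
    date_hour.hash(&mut h); // e.g. "2024-03-31T17"
    h.finish()
}
```

Because the secret is regenerated every time the server restarts, keys from before a restart can't be correlated with keys from after it, even with the database in hand.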
On the other hand, the worst someone could do is add a bunch of junk to my DB, and I don't care about the data that much; this is all just for funsies, anyway!

Still, after writing this out, I realized that someone could send a bunch of junk slugs and hence fill my disk from a single IP, so I [added a check against a set of allowed slugs](https://git.kittencollective.com/nebkor/hitman/commit/89a985e96098731e5e8691fd84776c1592b6184b) to guard against that. Beyond that, I'd need to start thinking about being robust against a targeted and relatively sophisticated distributed attack, and it's definitely not worth it.

## The front end

I mentioned that this blog is made using Zola, a static site generator. Zola has a built-in templating system, so the [following bit](https://git.kittencollective.com/nebkor/blog/commit/87afa418b239419f551459e9cc5e838f9fac7ed6) of HTML with inlined JavaScript is enough to register a hit and return the latest count:

```html
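<!-- A minimal sketch, not the actual snippet from the linked commit: it
     assumes Hitman answers a POST to /hit/{slug} with the current count as
     plain text. The element id, endpoint path, and Zola template variable
     are illustrative assumptions. -->
<span id="hit-count">?</span> hits
<script>
  // Register a hit for this page, then display the count the server returns.
  fetch("/hit/{{ page.slug }}", { method: "POST" })
    .then((resp) => resp.text())
    .then((count) => {
      document.getElementById("hit-count").textContent = count;
    });
</script>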
```

## Putting it all together

OK, all the pieces are laid out, but here's the actual setup on the backend:

### Caddy

The Caddy configuration has the following:

```
proclamations.nebcorp-hias.com {
	handle /hit/* {
		reverse_proxy localhost:5000
	}
	handle {