finish post

2024-03-31 16:28:20 -07:00 · 2024-03-31 16:28:20 -07:00 · d72bfc5e2b
commit d72bfc5e2b
parent e1f5ae35bd
1 changed files with 179 additions and 3 deletions
--- a/content/sundries/hitman/index.md
+++ b/content/sundries/hitman/index.md
@@ -2,15 +2,191 @@
 title = "Hitman: another fine essential sundry service from Nebcorp Heavy Industries and Sundries"
 slug = "hitman"
 date = "2024-03-31"
-draft = true
 [taxonomies]
 tags = ["software", "sundry", "proclamation", "90s", "hitman", "web"]
 +++

 # Hitman counts your hits, man.

+Recently, someone in a community I'm part of asked the following:

+> I was thinking about how we used to have website hit counters in the 2000s and I was wondering --
+> has anyone put a hit counter on your personal website?

-## The dream
+Some people had, it turns out, but many had not. Among the had-nots was me, and I decided to do
+something about it. The bottom line up front is that you can see it in action right now at the
+bottom of this very page, and if you want, check out the code
+[here](https://git.kittencollective.com/nebkor/hitman); it's called Hitman!

-## The reality
+## What's the problem?
+
+Back in the day[^web1.0], there was basically only one way to have a website: you have a Linux box,
+running the Apache webserver, with PHP enabled, and a MySQL database to hold state; this is your
+classic LAPM stack, obviously.  If this is your website, adding a visible counter is trivial: you
+just use PHP to do server side rendering of the count after a quick SQL query. And because this was
+basically the only way to have a website, lots of "website operators" put hitcounters on their site,
+because why not?
+
+But this is the year 2024, and we do things differently these days. This blog, for example, is built
+with a "static site generator" called [Zola](https://www.getzola.org/), which means that there's no
+server side rendering, or any other kind of dynamic behavior from the backend. It's served by a
+small Linux VPS that's running the [Caddy](https://caddyserver.com/) webserver, and costs about five
+bucks a month to run. If I wanted to have a hitcounter, I'd have to do something non-traditional.
+
+## What's the solution?
+
+For me, it turned out to be a sidecar microservice for counting and reporting the hits. As usual
+these days, my first instinct is to reach for [Axum](https://docs.rs/axum/latest/axum/), a framework
+for building servers in Rust, and to use [SQLite](https://sqlite.org/) for a database. Caddy proxies
+all requests to the hit-counting URL to Hitman, which is listening only on localhost.
+
+### That sounds simple
+
+Ha ha, it does, doesn't it? And in the end, it actually kinda is. But there are a few nuances to
+consider.
+
+### Privacy
+
+The less I know the better, as far as I'm concerned, and I didn't see any reason to learn more than I
+already did. Still, I'd need to track the IP of the client making each request in order to
+de-duplicate views. Someone linked to [this
+post](https://herman.bearblog.dev/how-bear-does-analytics-with-css/) about how the author uses a
+notional CSS load to register a hit, and also how they hash the IP with the date to keep the counts
+down to one per IP per day. They're doing quite a bit more actual "analytics" than I'm interested
+in, but I liked the hashing idea. They mention scrubbing the hashes from their DB every night in order to
+pre-emptively satisfy an overzealous GDPR regulator[^logs], but I had a better idea: hash the
+IP+date together with a secret random number that is never disclosed and is regenerated every time
+the server restarts, so the stored hashes can't be brute-forced back into IPs without it.
+
+I wound up [hashing with the date +
+hour](https://git.kittencollective.com/nebkor/hitman/src/commit/1617eae17448273114ca1b1d9277b3465986e9f1/src/main.rs#L79-L94),
+along with the page, IP, and the secret. This buckets views to one per IP per page per hour, vs the
+once per day from the bearblog.
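+
The bucketing described above can be sketched in a few lines of Rust. This is a minimal illustration, not Hitman's actual code (that lives at the link above): `HitDeduper`, `visit_key`, and `record` are hypothetical names, the standard library's randomly seeded `RandomState` stands in for the undisclosed per-restart secret, and an in-memory `HashSet` stands in for the database. The "date + hour" component is assumed here to be the Unix timestamp divided into UTC hour buckets.

```rust
use std::collections::hash_map::RandomState;
use std::collections::HashSet;
use std::hash::BuildHasher;
use std::net::IpAddr;

/// Deduplicates hits to one per IP per page per hour.
/// Hypothetical type for illustration; not Hitman's real implementation.
pub struct HitDeduper {
    // RandomState picks fresh random keys when it is created, so it plays the
    // role of the secret: it is never written anywhere and is gone after a restart.
    secret: RandomState,
    // Keys already counted (assumption: the real service keeps these in SQLite).
    seen: HashSet<u64>,
}

impl HitDeduper {
    pub fn new() -> Self {
        Self {
            secret: RandomState::new(),
            seen: HashSet::new(),
        }
    }

    /// Keyed hash of (page, IP, hour bucket). `unix_secs` is seconds since the epoch.
    pub fn visit_key(&self, page: &str, ip: IpAddr, unix_secs: u64) -> u64 {
        let hour_bucket = unix_secs / 3600;
        self.secret.hash_one((page, ip, hour_bucket))
    }

    /// Returns true if this (page, IP, hour) combination has not been seen before.
    pub fn record(&mut self, page: &str, ip: IpAddr, unix_secs: u64) -> bool {
        let key = self.visit_key(page, ip, unix_secs);
        self.seen.insert(key)
    }
}
```

Calling `record` twice for the same page and IP within one hour returns `true` and then `false`, so only the first call would bump the counter. `RandomState` uses SipHash, which is fine for a sketch; a real service might prefer to feed the same fields into a cryptographic hash.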
+
+### Security?
+
+I spent some time on this, but ultimately realized that there's
+
+ - not much I can do, but
+ - not much they can do, either.
+
+The server [rejects remote
+origins](https://git.kittencollective.com/nebkor/hitman/src/commit/1617eae17448273114ca1b1d9277b3465986e9f1/src/main.rs#L45-L48),
+but the `Origin` header can be trivially forged. On the other hand, the worst someone could do is
+add a bunch of junk to my DB, and I don't care about the data that much; this is all just for
+funsies, anyway!
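+
For a sense of how little an origin check involves, here is a hedged sketch in Rust. `origin_allowed` and its `allow_missing` flag are hypothetical and not taken from Hitman. The flag exists because browsers typically omit the `Origin` header on same-origin `GET` requests, so how to treat a missing header is a policy choice that the sketch leaves open.

```rust
/// Decides whether a request may register a hit, based on its `Origin` header.
/// Hypothetical helper for illustration; Hitman's real check may differ.
///
/// * `origin_header`: value of the request's `Origin` header, if present.
/// * `expected_origin`: the configured site origin (e.g. the HITMAN_ORIGIN setting).
/// * `allow_missing`: assumed policy knob for requests that carry no `Origin` at all.
pub fn origin_allowed(
    origin_header: Option<&str>,
    expected_origin: &str,
    allow_missing: bool,
) -> bool {
    match origin_header {
        // A present header must match the configured origin exactly.
        Some(origin) => origin == expected_origin,
        // No header: fall back to the configured policy.
        None => allow_missing,
    }
}
```

Because the comparison is against a client-supplied string, this only keeps out well-behaved browsers on other sites; anything that can set its own headers passes trivially.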
+
+## The front end
+
+I mentioned that this blog is made using Zola, a static site generator. Zola has a built-in
+templating system, so the [following
+bit](https://git.kittencollective.com/nebkor/blog/commit/87afa418b239419f551459e9cc5e838f9fac7ed6)
+of HTML with inlined JavaScript is enough to register a hit and return the latest count:
+
+``` html
+<div class=hias-footer>
+    <p>There have been <span id="hitman-count">no</span> views of this page.</p>
+</div>
+
+<script defer>
+    const hits = document.getElementById('hitman-count');
+    fetch("/hit/{{ page.slug }}").then((resp) => {
+        if (resp.ok) {
+            return resp.text();
+        } else {
+            return "I don't even know how many";
+        }
+    }).then((data) => {
+        hits.innerHTML = data;
+    });
+</script>
+```
+
+## Putting it all together
+
+OK, all the pieces are laid out, but here's the actual setup on the backend:
+
+### Caddy
+
+The Caddy configuration has the following:
+
+```
+proclamations.nebcorp-hias.com {
+    handle /hit/* {
+        reverse_proxy localhost:5000
+    }
+    handle {
+        <all the other routes on the site>
+    }
+}
+
+```
+
+This means that requests to, e.g., `https://proclamations.nebcorp-hias.com/hit/hitman` will register a
+hit for this post, and return the number of views so far.
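+
What happens behind that URL can be sketched roughly as follows. This is an assumption-laden illustration rather than Hitman's real handler: `slug_from_path` and `Counter` are hypothetical names, the counts live in a `HashMap` instead of SQLite, and the caller is assumed to have already decided whether the request counts as a new visit. Since Caddy's `handle` directive does not strip the matched prefix, the sketch expects the full `/hit/<slug>` path.

```rust
use std::collections::HashMap;

/// Extracts the page slug from a path like "/hit/hitman".
/// Returns None for other routes, an empty slug, or nested paths.
pub fn slug_from_path(path: &str) -> Option<&str> {
    let slug = path.strip_prefix("/hit/")?;
    if slug.is_empty() || slug.contains('/') {
        None
    } else {
        Some(slug)
    }
}

/// In-memory stand-in for the hit database (assumption: the real one is SQLite).
pub struct Counter {
    views: HashMap<String, u64>,
}

impl Counter {
    pub fn new() -> Self {
        Self { views: HashMap::new() }
    }

    /// Registers a hit if `is_new_visit` is true, and returns the current count
    /// for the page either way. Returns None if the path is not a hit route.
    pub fn hit(&mut self, path: &str, is_new_visit: bool) -> Option<u64> {
        let slug = slug_from_path(path)?;
        let count = self.views.entry(slug.to_string()).or_insert(0);
        if is_new_visit {
            *count += 1;
        }
        Some(*count)
    }
}
```

A repeat visit still gets the current total back, which matches the front end's expectation that every successful response body is a number to display.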
+
+### systemd
+
+I created a system user for the service, `hitman`, with a homedir in `/var/lib/hitman`, and added
+the following systemd unit file into `/etc/systemd/system/hitman.service`:
+
+```
+[Unit]
+Description=Hitman
+After=network.target network-online.target
+Requires=network-online.target
+
+[Service]
+Type=exec
+User=hitman
+Group=hitman
+ExecStart=/var/lib/hitman/hitman -e /var/lib/hitman/.env
+TimeoutStopSec=5s
+LimitNOFILE=1048576
+LimitNPROC=512
+PrivateTmp=true
+ProtectSystem=full
+
+[Install]
+WantedBy=multi-user.target
+```
+
+Once the unit is enabled, systemd starts the hitman service at boot (restarting it after a crash
+would additionally need a `Restart=` setting, which the unit above doesn't have):
+
+```
+$ systemctl status hitman.service
+● hitman.service - Hitman
+     Loaded: loaded (/etc/systemd/system/hitman.service; enabled; preset: enabled)
+     Active: active (running) since Sun 2024-03-31 12:12:14 PDT; 4h 0min ago
+   Main PID: 46338 (hitman)
+      Tasks: 2 (limit: 1018)
+     Memory: 948.0K
+        CPU: 53ms
+     CGroup: /system.slice/hitman.service
+             └─46338 /var/lib/hitman/hitman -e /var/lib/hitman/.env
+```
+
+### Hitman
+
+Inside the `/var/lib/hitman` directory there's a `.env` file with the following content:
+
+```
+DATABASE_URL=sqlite:///${HOME}/.hitman.db
+DATABASE_FILE=${HOME}/.hitman.db
+LISTENING_ADDR=127.0.0.1
+LISTENING_PORT=5000
+HITMAN_ORIGIN=https://proclamations.nebcorp-hias.com
+```
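+
Those settings are enough to work out where the service listens. The following Rust sketch shows one way such a file could be read; `parse_env` and `listen_addr` are hypothetical helpers, not Hitman's code (which may well rely on an existing dotenv crate), and the sketch does not expand `${HOME}`.

```rust
use std::collections::HashMap;
use std::net::{IpAddr, SocketAddr};

/// Parses simple KEY=VALUE lines, skipping blank lines and `#` comments.
/// Hypothetical helper; values are kept verbatim (no `${VAR}` expansion).
pub fn parse_env(text: &str) -> HashMap<String, String> {
    text.lines()
        .map(str::trim)
        .filter(|line| !line.is_empty() && !line.starts_with('#'))
        .filter_map(|line| line.split_once('='))
        .map(|(key, value)| (key.trim().to_string(), value.trim().to_string()))
        .collect()
}

/// Builds the listening socket address from LISTENING_ADDR and LISTENING_PORT.
/// Returns None if either setting is missing or malformed.
pub fn listen_addr(env: &HashMap<String, String>) -> Option<SocketAddr> {
    let ip: IpAddr = env.get("LISTENING_ADDR")?.parse().ok()?;
    let port: u16 = env.get("LISTENING_PORT")?.parse().ok()?;
    Some(SocketAddr::new(ip, port))
}
```

With the file shown above, `listen_addr` yields `127.0.0.1:5000`, the same address the Caddy `reverse_proxy` line points at.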
+
+# Coda
+
+When I got this working, a friend said, "Drat, that means I need to follow through on my goal to
+write a little web-ring server." Something like two hours later, she had [a working
+webring](https://erikarow.land/notes/gleam-webring), and indeed, if you look at the bottom of this
+very page, you'll see the webring links; as she says, this Web 1.0 stuff is fun!
+
+---
+[^web1.0]: I think of the hitcounter era as the 90s, but that's because I'm older than the person
+    who asked the question.
+
+[^logs]: They don't mention scrubbing IPs from their logs, but they do mention having logs, so clearly
+    the job to scrub the hit DB of hashes is just privacy kabuki.