+++
title = "Hitman: another fine essential sundry service from Nebcorp Heavy Industries and Sundries"
slug = "hitman"
date = "2024-03-31"
updated = "2024-03-31"

[taxonomies]
tags = ["software", "sundry", "proclamation", "90s", "hitman", "web"]
+++

# Hitman counts your hits, man.

Recently, someone in a community I'm part of asked the following:

> I was thinking about how we used to have website hit counters in the 2000s and I was wondering --
> has anyone put a hit counter on your personal website?

Some people had, it turns out, but many had not. Among the had-nots was me, and I decided to do
something about it. The bottom line up front is that you can see it in action right now at the
bottom of this very page; if you want, check out the code
[here](https://git.kittencollective.com/nebkor/hitman). It's called Hitman!

## What's the problem?

Back in the day[^web1.0], there was basically only one way to have a website: you had a Linux box
running the Apache webserver, with PHP enabled, and a MySQL database to hold state; this is your
classic LAMP stack, obviously. If this is your website, adding a visible counter is trivial: you
just use PHP to do server-side rendering of the count after a quick SQL query. And because this was
basically the only way to have a website, lots of "website operators" put hitcounters on their site,
because why not?

But this is the year 2024, and we do things differently these days. This blog, for example, is built
with a "static site generator" called [Zola](https://www.getzola.org/), which means that there's no
server-side rendering or any other kind of dynamic behavior from the backend. It's served by a
small Linux VPS that's running the [Caddy](https://caddyserver.com/) webserver, and costs about five
bucks a month to run. If I wanted to have a hitcounter, I'd have to do something non-traditional.

## What's the solution?

For me, it turned out to be a sidecar microservice for counting and reporting the hits. As usual
these days, my first instinct was to reach for [Axum](https://docs.rs/axum/latest/axum/), a framework
for building servers in Rust, and to use [SQLite](https://sqlite.org/) for a database. Caddy proxies
all requests to the hit-counting URL to Hitman, which is listening only on localhost.

### That sounds simple

Ha ha, it does, doesn't it? And in the end, it actually kinda is. But there are a few nuances to
consider.

### Privacy

The less I know the better, as far as I'm concerned, and I didn't see any reason to start knowing
more than I already did. But I'd need to track the IP of the requesting client in order to
de-duplicate views. Someone linked to [this
post](https://herman.bearblog.dev/how-bear-does-analytics-with-css/) about how the author uses a
notional CSS load to register a hit, and also how they hash the IP with the date to keep the counts
down to one per IP per day. They're doing quite a bit more actual "analytics" than I'm interested
in, but I liked that other idea. They mention scrubbing the hashes from their DB every night in
order to pre-emptively satisfy an overzealous GDPR regulator[^logs], but I had a better idea: hash
the IP+date with a random number that is never disclosed, and is regenerated every time the server
restarts.

I wound up [hashing with the date +
hour](https://git.kittencollective.com/nebkor/hitman/src/commit/1617eae17448273114ca1b1d9277b3465986e9f1/src/main.rs#L79-L94),
along with the page, IP, and the secret. This buckets views to one per IP per page per hour, vs the
once per day from the bearblog.
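
As a rough sketch of the idea -- the names and signature here are made up for illustration, and
Hitman's actual implementation is at the link above -- the de-duplication key can be built by
feeding the secret, the page slug, the IP, and an hour-truncated timestamp into a hasher:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Illustrative sketch only, not Hitman's real code. `date_hour` is a
/// timestamp truncated to the hour, e.g. "2024-03-31T12".
fn view_key(secret: u64, page: &str, ip: &str, date_hour: &str) -> u64 {
    let mut hasher = DefaultHasher::new();
    secret.hash(&mut hasher); // regenerated every time the server restarts
    page.hash(&mut hasher);
    ip.hash(&mut hasher);
    date_hour.hash(&mut hasher);
    hasher.finish()
}
```

Two hits from the same IP on the same page within the same hour produce the same key and count
once; after a restart, the secret changes, so old keys can never be tied back to an IP.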

### Security?

I spent some time on this, but ultimately realized that there's

- not much I can do, but
- not much they can do, either.

The server [rejects remote
origins](https://git.kittencollective.com/nebkor/hitman/src/commit/1617eae17448273114ca1b1d9277b3465986e9f1/src/main.rs#L45-L48),
but the `Origin` header can be trivially forged. On the other hand, the worst someone could do is
add a bunch of junk to my DB, and I don't care about the data that much; this is all just for
funsies, anyway!

Still, after writing this out, I realized that someone could send a bunch of junk slugs and hence
fill my disk from a single IP, so I [added a check against a set of allowed
slugs](https://git.kittencollective.com/nebkor/hitman/commit/89a985e96098731e5e8691fd84776c1592b6184b)
to guard against that. Beyond that, I'd need to start thinking about being robust against a targeted
and relatively sophisticated distributed attack, and it's definitely not worth it.
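
Sketched out (with made-up names; the real checks live in the commits linked above), the two cheap
guards amount to something like this:

```rust
use std::collections::HashSet;

/// Hypothetical sketch of Hitman's two guards, not its actual API.
fn should_count(allowed_slugs: &HashSet<String>, origin: Option<&str>, slug: &str) -> bool {
    // Reject cross-origin requests. Trivially forgeable, but it keeps
    // casual junk out.
    let origin_ok = origin == Some("https://proclamations.nebcorp-hias.com");
    // Only count slugs that actually exist on the site, so junk slugs
    // can't grow the database without bound.
    origin_ok && allowed_slugs.contains(slug)
}
```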

## The front end

I mentioned that this blog is made using Zola, a static site generator. Zola has a built-in
templating system, so the [following
bit](https://git.kittencollective.com/nebkor/blog/commit/87afa418b239419f551459e9cc5e838f9fac7ed6)
of HTML with inlined JavaScript is enough to register a hit and display the latest count:

``` html
<div class="hias-footer">
  <p>There have been <span id="hitman-count">no</span> views of this page.</p>
</div>

<script defer>
  const hits = document.getElementById('hitman-count');
  fetch("/hit/{{ page.slug }}").then((resp) => {
    if (resp.ok) {
      return resp.text();
    } else {
      return "I don't even know how many";
    }
  }).then((data) => {
    hits.innerHTML = data;
  });
</script>
```

## Putting it all together

OK, all the pieces are laid out, but here's the actual setup on the backend:

### Caddy

The Caddy configuration has the following:

```
proclamations.nebcorp-hias.com {
    handle /hit/* {
        reverse_proxy localhost:5000
    }
    handle {
        <all the other routes on the site>
    }
}
```

This means that requests to, e.g., `https://proclamations.nebcorp-hias.com/hit/hitman` will register
a hit for this post and return the number of views so far.

### systemd

I created a system user for the service, `hitman`, with a homedir in `/var/lib/hitman`, and added
the following systemd unit file at `/etc/systemd/system/hitman.service`:

```
[Unit]
Description=Hitman
After=network.target network-online.target
Requires=network-online.target

[Service]
Type=exec
User=hitman
Group=hitman
ExecStart=/var/lib/hitman/hitman -e /var/lib/hitman/.env
Restart=on-failure
TimeoutStopSec=5s
LimitNOFILE=1048576
LimitNPROC=512
PrivateTmp=true
ProtectSystem=full

[Install]
WantedBy=multi-user.target
```

This will ensure the hitman service is running after boot, and that it gets restarted if it crashes:

```
$ systemctl status hitman.service
● hitman.service - Hitman
     Loaded: loaded (/etc/systemd/system/hitman.service; enabled; preset: enabled)
     Active: active (running) since Sun 2024-03-31 12:12:14 PDT; 4h 0min ago
   Main PID: 46338 (hitman)
      Tasks: 2 (limit: 1018)
     Memory: 948.0K
        CPU: 53ms
     CGroup: /system.slice/hitman.service
             └─46338 /var/lib/hitman/hitman -e /var/lib/hitman/.env
```

### Hitman

Inside the `/var/lib/hitman` directory there's a `.env` file with the following content:

```
DATABASE_URL=sqlite:///${HOME}/.hitman.db
DATABASE_FILE=${HOME}/.hitman.db
LISTENING_ADDR=127.0.0.1
LISTENING_PORT=5000
HITMAN_ORIGIN=https://proclamations.nebcorp-hias.com
```
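
The `-e` flag in the unit file points Hitman at this file. As a hypothetical sketch of what such a
loader does -- assuming plain `KEY=VALUE` lines, and ignoring the `${HOME}` expansion that a real
dotenv library would handle -- parsing it might look like:

```rust
use std::collections::HashMap;

/// Illustrative sketch of parsing a simple KEY=VALUE .env body; Hitman's
/// real loader (and its variable expansion) may differ.
fn parse_env(body: &str) -> HashMap<String, String> {
    body.lines()
        .filter_map(|line| {
            let line = line.trim();
            // skip blanks and comments
            if line.is_empty() || line.starts_with('#') {
                return None;
            }
            let (key, value) = line.split_once('=')?;
            Some((key.trim().to_string(), value.trim().to_string()))
        })
        .collect()
}
```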

# Coda

When I got this working, a friend said, "Drat, that means I need to follow through on my goal to
write a little web-ring server." Something like two hours later, she had [a working
webring](https://erikarow.land/notes/gleam-webring), and indeed, if you look at the bottom of this
very page, you'll see the webring links; as she says, this Web 1.0 stuff is fun!

---

[^web1.0]: I think of the hitcounter era as the 90s, but that's because I'm older than the person
who asked the question.

[^logs]: They don't mention scrubbing IPs from their logs, but they do mention having logs, so clearly
the job to scrub the hit DB of hashes is just privacy kabuki.