finish post
parent
e1f5ae35bd
commit
d72bfc5e2b
|
@ -2,15 +2,191 @@
title = "Hitman: another fine essential sundry service from Nebcorp Heavy Industries and Sundries"
slug = "hitman"
date = "2024-03-31"
draft = true
[taxonomies]
tags = ["software", "sundry", "proclamation", "90s", "hitman", "web"]
+++

# Hitman counts your hits, man.

Recently, someone in a community I'm part of asked the following:

> I was thinking about how we used to have website hit counters in the 2000s and I was wondering --
> has anyone put a hit counter on your personal website?

## The dream

Some people had, it turns out, but many had not. Among the had-nots was me, and I decided to do
something about it. The bottom line up front is that you can see it in action right now at the
bottom of this very page, and if you want, check out the code
[here](https://git.kittencollective.com/nebkor/hitman); it's called Hitman!

## The reality
## What's the problem?

Back in the day[^web1.0], there was basically only one way to have a website: you have a Linux box,
running the Apache webserver, with PHP enabled, and a MySQL database to hold state; this is your
classic LAPM stack, obviously. If this is your website, adding a visible counter is trivial: you
just use PHP to do server side rendering of the count after a quick SQL query. And because this was
basically the only way to have a website, lots of "website operators" put hitcounters on their site,
because why not?

But this is the year 2024, and we do things differently these days. This blog, for example, is built
with a "static site generator" called [Zola](https://www.getzola.org/), which means that there's no
server side rendering, or any other kind of dynamic behavior from the backend. It's served by a
small Linux VPS that's running the [Caddy](https://caddyserver.com/) webserver, and costs about five
bucks a month to run. If I wanted to have a hitcounter, I'd have to do something non-traditional.

## What's the solution?

For me, it turned out to be a sidecar microservice for counting and reporting the hits. As usual
these days, my first instinct was to reach for [Axum](https://docs.rs/axum/latest/axum/), a framework
for building servers in Rust, and to use [SQLite](https://sqlite.org/) for a database. Caddy proxies
all requests for the hit-counting URL to Hitman, which listens only on localhost.

### That sounds simple

Ha ha, it does, doesn't it? And in the end, it actually kinda is. But there are a few nuances to
consider.

### Privacy

The less I know the better, as far as I'm concerned, and I didn't see any reason to know more than I
already did with this. But I'd need to track the IP of the client making the request in order to
de-duplicate views. Someone linked to [this
post](https://herman.bearblog.dev/how-bear-does-analytics-with-css/) about how the author uses a
notional CSS load to register a hit, and also how they hash the IP with the date to keep the counts
down to one per IP per day. They're doing quite a bit more actual "analytics" than I'm interested
in, but I liked the hashing idea. They mention scrubbing the hashes from their DB every night in
order to pre-emptively satisfy an overzealous GDPR regulator[^logs], but I had a better idea: hash
the IP+date together with a random number that is never disclosed and is regenerated every time the
server restarts.

I wound up [hashing with the date +
hour](https://git.kittencollective.com/nebkor/hitman/src/commit/1617eae17448273114ca1b1d9277b3465986e9f1/src/main.rs#L79-L94),
along with the page, IP, and the secret. This buckets views to one per IP per page per hour, vs the
once per day from the bearblog.
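
To make the bucketing concrete, here's a minimal sketch of the idea using only the Rust standard
library. It is not the code from the Hitman repo: the `ViewHasher` type and `view_key` function are
names I made up for illustration, and it assumes the standard library's randomly keyed `RandomState`
is good enough to stand in for the secret (it's a keyed SipHash, not a cryptographic hash).

```rust
use std::collections::hash_map::RandomState;
use std::hash::{BuildHasher, Hash, Hasher};
use std::net::IpAddr;

/// Hypothetical holder for the per-process secret. `RandomState::new()` is
/// randomly keyed, so the keys change on every restart and are never stored.
pub struct ViewHasher {
    secret: RandomState,
}

impl ViewHasher {
    pub fn new() -> Self {
        Self { secret: RandomState::new() }
    }

    /// Key for one view: page + IP + hour bucket, mixed with the secret.
    /// `unix_secs` is seconds since the epoch; dividing by 3600 gives the hour.
    pub fn view_key(&self, page: &str, ip: IpAddr, unix_secs: u64) -> u64 {
        let mut hasher = self.secret.build_hasher();
        page.hash(&mut hasher);
        ip.hash(&mut hasher);
        (unix_secs / 3600).hash(&mut hasher);
        hasher.finish()
    }
}
```

Two requests from the same IP for the same page within the same clock hour produce the same key, so
only the first one needs to be counted; after a restart the old keys can no longer be reproduced.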

### Security?

I spent some time on this, but ultimately realized that there's

- not much I can do, but
- not much they can do, either.

The server [rejects remote
origins](https://git.kittencollective.com/nebkor/hitman/src/commit/1617eae17448273114ca1b1d9277b3465986e9f1/src/main.rs#L45-L48),
but the `Origin` header can be trivially forged. On the other hand, the worst someone could do is
add a bunch of junk to my DB, and I don't care about the data that much; this is all just for
funsies, anyway!
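
For illustration, an origin check of that sort can be as small as the following sketch. The
`origin_allowed` function and its `allow_missing` flag are hypothetical, not what's in the linked
source; the flag is there because browsers may leave `Origin` off same-origin GET requests, so
whether a missing header counts as a rejection is a policy choice I'm only assuming here.

```rust
/// Hypothetical helper: accept a request only if its `Origin` header matches
/// the configured site origin. `allow_missing` decides what happens when the
/// client sent no `Origin` header at all.
pub fn origin_allowed(origin_header: Option<&str>, expected: &str, allow_missing: bool) -> bool {
    match origin_header {
        // Scheme and host are case-insensitive, so compare without regard to ASCII case.
        Some(origin) => origin.eq_ignore_ascii_case(expected),
        None => allow_missing,
    }
}
```

None of this stops a script from sending whatever `Origin` it likes; it only keeps other sites'
pages from bumping the counter through a visitor's browser.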

## The front end

I mentioned that this blog is made using Zola, a static site generator. Zola has a built-in
templating system, so the [following
bit](https://git.kittencollective.com/nebkor/blog/commit/87afa418b239419f551459e9cc5e838f9fac7ed6)
of HTML with inlined JavaScript is enough to register a hit and return the latest count:

``` html
<div class="hias-footer">
    <p>There have been <span id="hitman-count">no</span> views of this page.</p>
</div>

<script>
    const hits = document.getElementById('hitman-count');
    // Zola fills in the page slug when the site is built.
    fetch("/hit/{{ page.slug }}").then((resp) => {
        if (resp.ok) {
            return resp.text();
        } else {
            return "I don't even know how many";
        }
    }).then((data) => {
        // textContent, so the response is never parsed as HTML.
        hits.textContent = data;
    }).catch(() => {
        // Network failure: show the same fallback as a non-OK response.
        hits.textContent = "I don't even know how many";
    });
</script>
```

## Putting it all together

OK, all the pieces are laid out, but here's the actual setup on the backend:

### Caddy

The Caddy configuration has the following:

```
proclamations.nebcorp-hias.com {
    handle /hit/* {
        reverse_proxy localhost:5000
    }
    handle {
        <all the other routes on the site>
    }
}
```

This means that requests to, e.g., `https://proclamations.nebcorp-hias.com/hit/hitman` will register
a hit for this post and return the number of views so far.
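
The register-and-return step can be sketched roughly like this, with an in-memory `HitStore`
standing in for the SQLite database. All of the names here are made up for the example, and the
`view_key` argument is assumed to be some per-visitor, per-hour hash like the one described under
Privacy; the real service's schema and handler will differ.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical in-memory stand-in for the database: per-page totals plus
/// the set of view keys that have already been counted.
#[derive(Default)]
pub struct HitStore {
    counts: HashMap<String, u64>,
    seen: HashSet<u64>,
}

impl HitStore {
    /// Pull the slug out of a request path such as `/hit/hitman`.
    pub fn slug_from_path(path: &str) -> Option<&str> {
        let slug = path.strip_prefix("/hit/")?;
        if slug.is_empty() || slug.contains('/') {
            None
        } else {
            Some(slug)
        }
    }

    /// Count the view unless `view_key` was already seen, then return the page total.
    pub fn register(&mut self, slug: &str, view_key: u64) -> u64 {
        let count = self.counts.entry(slug.to_string()).or_insert(0);
        if self.seen.insert(view_key) {
            *count += 1;
        }
        *count
    }
}
```

A repeat request with the same key leaves the total alone but still gets the current count back,
which is all the page needs in order to display it.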

### systemd

I created a system user for the service, `hitman`, with a homedir in `/var/lib/hitman`, and saved
the following systemd unit file as `/etc/systemd/system/hitman.service`:

```
[Unit]
Description=Hitman
After=network.target network-online.target
Requires=network-online.target

[Service]
Type=exec
User=hitman
Group=hitman
ExecStart=/var/lib/hitman/hitman -e /var/lib/hitman/.env
TimeoutStopSec=5s
LimitNOFILE=1048576
LimitNPROC=512
PrivateTmp=true
ProtectSystem=full

[Install]
WantedBy=multi-user.target
```
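
Once the unit is enabled, this will ensure the hitman service is running after boot. For systemd to
also restart it after a crash, the `[Service]` section needs a `Restart=` setting such as
`Restart=on-failure`, which the unit above doesn't include. Here it is running: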

```
$ systemctl status hitman.service
● hitman.service - Hitman
     Loaded: loaded (/etc/systemd/system/hitman.service; enabled; preset: enabled)
     Active: active (running) since Sun 2024-03-31 12:12:14 PDT; 4h 0min ago
   Main PID: 46338 (hitman)
      Tasks: 2 (limit: 1018)
     Memory: 948.0K
        CPU: 53ms
     CGroup: /system.slice/hitman.service
             └─46338 /var/lib/hitman/hitman -e /var/lib/hitman/.env
```

### Hitman

Inside the `/var/lib/hitman` directory there's a `.env` file with the following content:

```
DATABASE_URL=sqlite:///${HOME}/.hitman.db
DATABASE_FILE=${HOME}/.hitman.db
LISTENING_ADDR=127.0.0.1
LISTENING_PORT=5000
HITMAN_ORIGIN=https://proclamations.nebcorp-hias.com
```
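
As a rough sketch of how a service might turn the two `LISTENING_*` values into something it can
bind to, here's a standard-library-only example. The function names are mine, and it assumes the
values have already been loaded into the process environment; how Hitman actually reads its `.env`
file (the `-e` flag above) isn't shown here.

```rust
use std::net::{IpAddr, SocketAddr};

/// Combine the `LISTENING_ADDR` and `LISTENING_PORT` values into a socket address.
pub fn listen_addr(addr: &str, port: &str) -> Result<SocketAddr, String> {
    let ip = addr
        .parse::<IpAddr>()
        .map_err(|e| format!("bad LISTENING_ADDR {addr:?}: {e}"))?;
    let port_num = port
        .parse::<u16>()
        .map_err(|e| format!("bad LISTENING_PORT {port:?}: {e}"))?;
    Ok(SocketAddr::new(ip, port_num))
}

/// Read both values from the process environment (assumed to be populated already).
pub fn listen_addr_from_env() -> Result<SocketAddr, String> {
    let addr = std::env::var("LISTENING_ADDR").map_err(|e| format!("LISTENING_ADDR: {e}"))?;
    let port = std::env::var("LISTENING_PORT").map_err(|e| format!("LISTENING_PORT: {e}"))?;
    listen_addr(&addr, &port)
}
```

With the values from the file above, `listen_addr("127.0.0.1", "5000")` gives the loopback address
that Caddy's `reverse_proxy localhost:5000` line points at.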

# Coda

When I got this working, a friend said, "Drat, that means I need to follow through on my goal to
write a little web-ring server." Something like two hours later, she had [a working
webring](https://erikarow.land/notes/gleam-webring), and indeed, if you look at the bottom of this
very page, you'll see the webring links; as she says, this Web 1.0 stuff is fun!

---
[^web1.0]: I think of the hitcounter era as the 90s, but that's because I'm older than the person
who asked the question.

[^logs]: They don't mention scrubbing IPs from their logs, but they do mention having logs, so clearly
the job to scrub the hit DB of hashes is just privacy kabuki.