# Hitman counts your hits, man.
This is a simple webpage hit/visit counter service. To run it in development, copy the provided
`env.example` file to `.env`. By default, Hitman looks for a database file called `.hitman.db` in
your home directory. You can let Hitman create it for you, or create it yourself with the
`sqlx db create` command (install the CLI by running `cargo install sqlx-cli`; see
https://crates.io/crates/sqlx-cli). If you go that route, don't forget to also run
`sqlx migrate run` to create the tables. This project uses SQLx's compile-time SQL checking macros,
so the tables must already exist in your database for those checks to work.
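Putting the setup steps above together, a development session might look like this (the database location and port come from the defaults described in this README):

``` shell
# One-time development setup
cp env.example .env

# Optional: create the database and tables yourself instead of
# letting Hitman create the DB file on first run
cargo install sqlx-cli   # provides the `sqlx` command
sqlx db create           # creates the database per DATABASE_URL in .env
sqlx migrate run         # creates the tables the SQL macros check against

cargo run                # start the service
```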
## How does it work?
You register a hit by sending a GET request to the `/hit/:page` endpoint, where `:page` is a unique
and persistent identifier for the page; on my blog, I use the Zola post slug as the id. This bit
of HTML + JS shows it in action:
``` html
<p>There have been <span id="allhits">no</span> views of this page</p>
<script defer>
  const hits = document.getElementById('allhits');
  fetch('http://localhost:5000/hit/index.html').then((resp) => {
    if (resp.ok) {
      return resp.text();
    } else {
      return "I don't even know how many";
    }
  }).then((data) => {
    hits.innerHTML = data;
  });
</script>
```
In this example, the `:page` is "index.html". The `/hit` endpoint registers the hit and then
returns the latest hit count.
The `index.html` file in this repo contains the code above; if you serve it with
`python3 -m http.server 3000 & cargo run`
and then visit http://localhost:3000, you should see one hit, assuming this is the first time
you're trying it out. Reloading won't increment the count until the hour changes and you visit
again, or you kill and restart Hitman.
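You can also exercise the endpoint directly from the command line (this assumes Hitman is running on its default port, 5000, as in the snippet above):

``` shell
# Register a hit and print the updated count
curl http://localhost:5000/hit/index.html
```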
If you see a log message like `rejecting invalid slug index.html`,
you'll need to add the allowed slugs into the `slugs` table:
``` sql
insert into slugs (slug) values ('index.html'), ('user');
```
See the note on security below.
### Privacy
The IP from the request is hashed with the date, hour of day, `:page`, and a random 64-bit number
that gets regenerated every time the service is restarted and is never disclosed. This does two
things:
1. it limits hit counts to one per hour per IP per page;
2. it prevents recovering an IP from a stored hash: without the undisclosed random number, an
attacker can't take a page and time and simply try all four billion possible IPv4 addresses to
find the matching hash.
There is no need to put up a tracking consent form because nothing is being tracked.
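A minimal sketch of this keying scheme is below. The function and parameter names are hypothetical, and the real service presumably uses a stronger hash than Rust's `DefaultHasher` (which is not cryptographic); the point is only to show how the per-restart secret and the hour feed into the key.

``` rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Hypothetical sketch: derive the deduplication key for a hit from the
// visitor IP, date, hour of day, page slug, and a per-restart secret.
fn hit_key(ip: &str, date: &str, hour: u8, page: &str, secret: u64) -> u64 {
    let mut h = DefaultHasher::new();
    ip.hash(&mut h);
    date.hash(&mut h);
    hour.hash(&mut h);
    page.hash(&mut h);
    secret.hash(&mut h); // never disclosed, regenerated on restart
    h.finish()
}

fn main() {
    let secret = 0x5eed; // in the service this is a random 64-bit number
    let a = hit_key("203.0.113.7", "2024-03-31", 17, "index.html", secret);
    let b = hit_key("203.0.113.7", "2024-03-31", 17, "index.html", secret);
    let c = hit_key("203.0.113.7", "2024-03-31", 18, "index.html", secret);
    assert_eq!(a, b); // same IP/hour/page: same key, counted once
    assert_ne!(a, c); // hour changed: new key, counted again
}
```

Because the secret changes on every restart, even the service operator can't correlate a stored key back to an IP after a restart.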
### Security?
Well, you need to configure a specific origin that is allowed to connect; this isn't really enough
on its own, though, since the `Origin` header is trivial to forge outside a browser. To mitigate
the potential for abuse, the code that registers a hit checks the requested `:page` against a set
of allowed slugs. Any time you add a new page to your site, you'll need to update the `slugs` table.