pique/_experiments/2024-03-02-database-benchmark/notes.md


# Overview

The goal of this experiment is to determine what whatabase to use for Pique.
Normally, we can just go with a tried-and-true option, like PostgreSQL. However,
a few factors are working against us here:

- The goal for pageloads is under 50ms for every page load (on the server side)
- PostgreSQL uses different storage for values over 2kB (compressed), which can
  lead to much slower reads if it goes to the disk.
- We'll be storing documents up to 1 MB each in the database. This is just text
  content and does *not* include resources like images.

This combination may make Postgres an unsuitable choice! It has proven slow in
the past when, at a job, someone had put large JSON blobs (multiple kB) into a
column, and queries were over 100ms when these columns were involved. I don't
know how much of that was a Postgres limitation and how much was the particular
schema and hardware we had, so I want to find out!


# Experiment design

I'm going to run a benchmark on three databases: Postgres, MariaDB, and SQLite.
Each will start by loading in a bunch of text documents into each database,
then we will do some random reads and measure the time of each. Memory and CPU
limits will be set on the non-embedded databases.

The text documents will be generated randomly at a size and count determined to
be a reasonable size Pique's data will probably reach after a few years. Our
experiment is not particularly valid if it only lasts a year.

To sample, we will pick random IDs from in the range of (0, count), since IDs
are assigned monotonically increasing for the databases we have chosen.


# Results