pique/_experiments/2024-03-02-database-benchmark/notes.md

# Overview
The goal of this experiment is to determine which database to use for Pique.
Normally, we can just go with a tried-and-true option, like PostgreSQL. However,
a few factors are working against us here:
- Server-side page load time should stay under 50 ms for every page.
- PostgreSQL moves values larger than about 2 kB into separate, compressed
out-of-line storage (TOAST), which can make reads much slower if they hit the
disk.
- We'll be storing documents up to 1 MB each in the database. This is just text
content and does *not* include resources like images.
This combination may make Postgres an unsuitable choice! At a previous job,
Postgres proved slow when someone put large JSON blobs (multiple kB) into a
column: queries involving those columns took over 100 ms. I don't know how much
of that was a Postgres limitation and how much was our particular schema and
hardware, so I want to find out!
# Experiment design
I'm going to benchmark three databases: Postgres, MariaDB, and SQLite. The
benchmark will first load a set of text documents into each database, then
perform random reads and measure the latency of each one. Memory and CPU
limits will be set on the non-embedded databases (Postgres and MariaDB).
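The load-then-read loop can be sketched as below. This is a minimal sketch
using SQLite (the one embedded candidate, so it runs with no server); the
`documents` table name, the document sizes, and the read count are all
assumptions for illustration, not the real benchmark parameters.

```python
import random
import sqlite3
import string
import time

# Stand-in database: in-memory SQLite. The real benchmark would point
# this at each of the three candidates in turn.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE documents (id INTEGER PRIMARY KEY, body TEXT)")

# Load phase: insert randomly generated text documents.
# Sizes here (1-10 kB) are placeholders, not the planned 1 MB maximum.
rng = random.Random(0)
count = 100
for _ in range(count):
    body = "".join(rng.choices(string.ascii_letters, k=rng.randint(1_000, 10_000)))
    conn.execute("INSERT INTO documents (body) VALUES (?)", (body,))
conn.commit()

# Read phase: time random reads by ID. IDs start at 1 and increase
# monotonically, so sampling uniformly from [1, count] covers the table.
timings = []
for _ in range(50):
    doc_id = rng.randint(1, count)
    start = time.perf_counter()
    row = conn.execute(
        "SELECT body FROM documents WHERE id = ?", (doc_id,)
    ).fetchone()
    timings.append(time.perf_counter() - start)
    assert row is not None

print(f"p50 read latency: {sorted(timings)[len(timings) // 2] * 1000:.3f} ms")
```

For Postgres and MariaDB the same loop would run over a network connection
through their respective drivers, which is exactly where the TOAST and
disk-read effects described above would show up.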
The text documents will be generated randomly, with size and count chosen to
approximate the volume Pique's data will probably reach after a few years. The
experiment is not particularly valid if it only models a year's worth of data.
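A document generator for this might look like the following sketch. The
function name and the log-uniform size distribution are assumptions; the notes
only say documents go up to 1 MB, not how sizes are distributed.

```python
import math
import random
import string


def generate_documents(count: int, max_size: int = 1_000_000, seed: int = 42):
    """Generate `count` random text documents, each between 1 kB and
    `max_size` characters.

    Sizes are drawn log-uniformly so that both small and large documents
    appear; this distribution is a guess, not measured Pique data.
    """
    rng = random.Random(seed)
    docs = []
    for _ in range(count):
        # Log-uniform draw between 10^3 (1 kB) and max_size.
        size = int(10 ** rng.uniform(3, math.log10(max_size)))
        docs.append("".join(rng.choices(string.ascii_lowercase + " ", k=size)))
    return docs
```

A fixed seed keeps the corpus identical across the three databases, so
latency differences come from the databases rather than the data.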
To sample, we will pick random IDs uniformly from the range (0, count), since
IDs are assigned monotonically increasing by all of the databases we have
chosen.
# Results