pique/_experiments/2024-03-02-database-benchmark/notes.md

# Overview
The goal of this experiment is to determine which database to use for Pique.
Normally, we can just go with a tried-and-true option, like PostgreSQL. However,
a few factors are working against us here:
- Server-side page load time should stay under 50 ms for every page.
- PostgreSQL moves values larger than about 2 kB into separate, compressed
out-of-line storage (TOAST), which can make reads much slower if they hit the
disk.
- We'll be storing documents up to 1 MB each in the database. This is just text
content and does *not* include resources like images.
This combination may make Postgres an unsuitable choice! At a previous job,
Postgres proved slow when someone put large JSON blobs (multiple kB) into a
column: queries involving those columns took over 100 ms. I don't know how much
of that was a Postgres limitation and how much was our particular schema and
hardware, so I want to find out!
# Experiment design
I'm going to benchmark three databases: Postgres, MariaDB, and SQLite. The
benchmark will first load a set of text documents into each database, then
perform random reads and measure the latency of each one. Memory and CPU
limits will be set on the non-embedded databases (Postgres and MariaDB).
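The load-then-read loop can be sketched as below. This is a minimal sketch
using SQLite (the one embedded candidate, so it runs with no server); the
`documents` table name, the document sizes, and the read count are all
assumptions for illustration, not the real benchmark parameters.

```python
import random
import sqlite3
import string
import time

# Stand-in database: in-memory SQLite. The real benchmark would point
# this at each of the three candidates in turn.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE documents (id INTEGER PRIMARY KEY, body TEXT)")

# Load phase: insert randomly generated text documents.
# Sizes here (1-10 kB) are placeholders, not the planned 1 MB maximum.
rng = random.Random(0)
count = 100
for _ in range(count):
    body = "".join(rng.choices(string.ascii_letters, k=rng.randint(1_000, 10_000)))
    conn.execute("INSERT INTO documents (body) VALUES (?)", (body,))
conn.commit()

# Read phase: time random reads by ID. IDs start at 1 and increase
# monotonically, so sampling uniformly from [1, count] covers the table.
timings = []
for _ in range(50):
    doc_id = rng.randint(1, count)
    start = time.perf_counter()
    row = conn.execute(
        "SELECT body FROM documents WHERE id = ?", (doc_id,)
    ).fetchone()
    timings.append(time.perf_counter() - start)
    assert row is not None

print(f"p50 read latency: {sorted(timings)[len(timings) // 2] * 1000:.3f} ms")
```

For Postgres and MariaDB the same loop would run over a network connection
through their respective drivers, which is exactly where the TOAST and
disk-read effects described above would show up.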
The text documents will be generated randomly, with size and count chosen to
approximate the volume Pique's data will probably reach after a few years. The
experiment is not particularly valid if it only models a year's worth of data.
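A document generator for this might look like the following sketch. The
function name and the log-uniform size distribution are assumptions; the notes
only say documents go up to 1 MB, not how sizes are distributed.

```python
import math
import random
import string


def generate_documents(count: int, max_size: int = 1_000_000, seed: int = 42):
    """Generate `count` random text documents, each between 1 kB and
    `max_size` characters.

    Sizes are drawn log-uniformly so that both small and large documents
    appear; this distribution is a guess, not measured Pique data.
    """
    rng = random.Random(seed)
    docs = []
    for _ in range(count):
        # Log-uniform draw between 10^3 (1 kB) and max_size.
        size = int(10 ** rng.uniform(3, math.log10(max_size)))
        docs.append("".join(rng.choices(string.ascii_lowercase + " ", k=size)))
    return docs
```

A fixed seed keeps the corpus identical across the three databases, so
latency differences come from the databases rather than the data.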
To sample, we will pick random IDs uniformly from the range (0, count), since
IDs are assigned monotonically increasing by all of the databases we have
chosen.
# Results