pique/_experiments/2024-03-02-database-benchmark/notes.md

Overview

The goal of this experiment is to determine which database to use for Pique. Normally, we could just go with a tried-and-true option like PostgreSQL. However, a few factors are working against us here:

  • The target is under 50 ms of server-side time for every page load.
  • PostgreSQL moves values larger than about 2 kB into separate, compressed storage (TOAST), which can make reads much slower when those values have to come off disk.
  • We'll be storing documents of up to 1 MB each in the database. This is just text content and does not include resources like images.

This combination may make Postgres an unsuitable choice! I've seen it prove slow before: at a previous job, someone put large JSON blobs (multiple kB) into a column, and queries touching those columns took over 100 ms. I don't know how much of that was a Postgres limitation and how much was our particular schema and hardware, so I want to find out!

Experiment design

I'm going to run a benchmark on three databases: Postgres, MariaDB, and SQLite. For each one, we'll start by loading a batch of text documents, then perform a series of random reads and measure the latency of each read. Memory and CPU limits will be set on the non-embedded databases.
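As a rough sketch of what the harness looks like per database (the single `documents` table with an integer primary key and a text body column is my assumption, not a settled schema), here is the SQLite variant using the standard-library driver:

```python
import sqlite3

def setup_sqlite(path="bench.db"):
    """Open the database and create the single table the benchmark uses."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS documents (id INTEGER PRIMARY KEY, body TEXT NOT NULL)"
    )
    return conn

def load_documents(conn, documents):
    """Bulk-insert (id, text) pairs, committing once at the end."""
    conn.executemany("INSERT INTO documents (id, body) VALUES (?, ?)", documents)
    conn.commit()

def fetch_document(conn, doc_id):
    """The single-row read whose latency we're measuring."""
    return conn.execute(
        "SELECT body FROM documents WHERE id = ?", (doc_id,)
    ).fetchone()
```

The Postgres and MariaDB versions would be the same three functions written against their own client libraries (e.g. psycopg and PyMySQL), with only the connection setup and placeholder syntax changing.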

The text documents will be generated randomly, at a size and count chosen to approximate where Pique's data will probably be after a few years of growth. The experiment isn't particularly useful if it only models the first year of data.
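For concreteness, the generator might look something like the sketch below; the size range, seed, and uniform size distribution are placeholders rather than numbers I've settled on.

```python
import random
import string

def generate_documents(count, min_bytes=1_000, max_bytes=1_000_000, seed=42):
    """Yield (doc_id, text) pairs with sizes spread up to the 1 MB cap."""
    rng = random.Random(seed)
    alphabet = string.ascii_letters + " \n"
    for doc_id in range(count):
        size = rng.randint(min_bytes, max_bytes)
        yield doc_id, "".join(rng.choices(alphabet, k=size))
```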

To sample, we'll pick random IDs uniformly from the range [0, count), since IDs are assigned as monotonically increasing integers in all of the databases we've chosen.
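The read loop is then just timing single-row fetches at those random IDs. In the sketch below, `fetch_document` stands in for whichever call the store under test exposes (e.g. the SQLite helper above with the connection bound in):

```python
import random
import time

def benchmark_reads(fetch_document, count, samples=10_000, seed=7):
    """Time `samples` random single-document reads over IDs in [0, count)."""
    rng = random.Random(seed)
    latencies = []
    for _ in range(samples):
        doc_id = rng.randrange(count)
        start = time.perf_counter()
        fetch_document(doc_id)
        latencies.append(time.perf_counter() - start)
    latencies.sort()
    return {
        "p50": latencies[len(latencies) // 2],
        "p99": latencies[int(len(latencies) * 0.99)],
        "max": latencies[-1],
    }
```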

Results