pique/_docs/decisions/0002-primary-database-choice.md
Nicole Tietz-Sokolskaya 77d4ebb371 Add ADR for our choice of SQLite as our primary database,
backed up by experiments demonstrating that SQLite will meet all of our
requirements.

This also introduces ADRs in the repo, and adds a README in preparation
making the repository public.
2024-03-16 11:12:46 -04:00

56 lines
1.8 KiB
Markdown

# 2. Primary database choice
Date: 2024-03-16
## Status
Accepted
## Context
Pique has to store data somewhere. We're going to use a database for this, and
have to choose which one to use.
Constraints:
- Should require minimal ops
- Should support storing large-ish rows (about 64 kB)
- Should support fast random reads (page loads will be p99 under 50ms, and the
DB allocation from this is a small fraction)
## Decision
We are going to use SQLite as our primary database and [SeaORM](https://github.com/SeaQL/sea-orm)
as the ORM. We will limit rows to 8 kB or smaller to have performance margin.
This decision was made using an [experiment](../../_experiments/2024-03-02-database-benchmark),
which found that:
- The ops burden for MariaDB would be unsuitably high, requiring work to get
it setup for our size of data and some work for performance tuning
- PostgreSQL cannot meet our performance requirements on larger documents
- SQLite can meet our performance requirements on up to 64 kB documents, and
possibly higher.
These experiments were done with memory constraints on both SQLite and Postgres,
with SQLite having about 10x faster random reads.
## Consequences
This has a few consequences for Pique.
First, it means that **we will be limited to single-node hosting** unless we
implement read replication using something like [litestream](https://litestream.io/).
This is acceptable given our focus on smaller organizations, and we can shard
the application if we need to.
Second, it means that **self-hosting is more feasible**. We can more easily offer
backup downloads from within the app itself, leveraging SQLite's features for
generating a backup, and we can have everything run inside one executable with
data stored on the disk. Not requiring a separate DB process makes the hosting
story simpler.