Add ADR for our choice of SQLite as our primary database,
backed up by experiments demonstrating that SQLite will meet all of our requirements. This also introduces ADRs in the repo, and adds a README in preparation for making the repository public.
parent 05812a521e · commit 77d4ebb371
29 changed files with 6549 additions and 1 deletion
.adr-dir (new file, 1 line)
@@ -0,0 +1 @@
_docs/decisions/
.gitignore (vendored, 4 changed lines)
@@ -1 +1,3 @@
-/target
+target/
+*.db
+*.xml
Cargo.lock (generated, new file, 2456 lines)
File diff suppressed because it is too large.
Cargo.toml
@@ -1,3 +1,4 @@
+workspace = { members = ["_experiments/2024-03-02-database-benchmark"] }
 [package]
 name = "pique"
 version = "0.1.0"
README.md (new file, 58 lines)
@@ -0,0 +1,58 @@

# Pique

Pique is project management software that is a delight to use!

This project is in very early stages, so here's what you need to know:

- It's being developed by [Nicole / ntietz](https://ntietz.com/) as a side project
- It's not production ready!
- It's **not open-source** and contributions are not welcome
- It will be free to use while it's in development, but will likely transition
  to paid plans pretty quickly. I hope to always offer some paid plan, but that
  is only if I can do it without burning my budget.

**If it's not open-source, why can you see this?** Simply because I (Nicole)
find it much better and easier to work in the open. The code is available
because there is utility in that. It has few drawbacks. If someone wants to
steal it, they can, but that's pretty illegal. Eventually it *might* wind up
open-source, or as a coop, or just as a solo dev project. I don't know, but
openness is a core value for me, so here we are.

If you want to use it, and there is not a plan available yet, just let me know.
My personal email is [me@ntietz.com](mailto:me@ntietz.com) and I can get you
set up.

## Workflow and setup

### Rust

This project uses Rust. Set up the toolchain on your local machine as per usual.
We use nightly, and installation and management using [rustup][rustup] is
recommended.
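
For example, with rustup already installed, pinning this repository to nightly
might look like the following (a sketch; adjust to your own setup):

```bash
# install the nightly toolchain and use it for this repository
rustup toolchain install nightly
rustup override set nightly
```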

### Docs

Decisions are recorded in ADRs[^adr] using a command-line tool to create and
manage them. You can install it with:

```bash
cargo install adrs
```

See the [adrs docs](https://crates.io/crates/adrs) for more information on
usage.
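
Day-to-day usage is roughly as follows (a sketch based on the adr-tools-style
workflow; check the adrs docs for the exact subcommands):

```bash
# create the next numbered ADR in the directory listed in .adr-dir
adrs new "Primary database choice"
```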

---

[^adr]: [Architecture Decision Records](https://adr.github.io/) are a
lightweight way of recording decisions made on a project.

[rustup]: https://rustup.rs/
_docs/decisions/0001-record-architecture-decisions.md (new file, 20 lines)
@@ -0,0 +1,20 @@
# 1. Record architecture decisions

Date: 2024-03-16

## Status

Accepted

## Context

We need to record the architectural decisions made on this project.

## Decision

We will use Architecture Decision Records, as [described by Michael Nygard](http://thinkrelevance.com/blog/2011/11/15/documenting-architecture-decisions).

## Consequences

See Michael Nygard's article, linked above. For a lightweight ADR toolset, see Nat Pryce's [adr-tools](https://github.com/npryce/adr-tools).
_docs/decisions/0002-primary-database-choice.md (new file, 56 lines)
@@ -0,0 +1,56 @@
# 2. Primary database choice

Date: 2024-03-16

## Status

Accepted

## Context

Pique has to store data somewhere. We're going to use a database for this, and
have to choose which one to use.

Constraints:

- Should require minimal ops
- Should support storing large-ish rows (about 64 kB)
- Should support fast random reads (page loads will be p99 under 50 ms, and the
  DB allocation from this is a small fraction)

## Decision

We are going to use SQLite as our primary database and [SeaORM](https://github.com/SeaQL/sea-orm)
as the ORM. We will limit rows to 8 kB or smaller to keep a performance margin.

This decision was made using an [experiment](../../_experiments/2024-03-02-database-benchmark),
which found that:

- The ops burden for MariaDB would be unsuitably high, requiring work to get
  it set up for our size of data and some work for performance tuning
- PostgreSQL cannot meet our performance requirements on larger documents
- SQLite can meet our performance requirements on documents up to 64 kB, and
  possibly higher

These experiments were done with memory constraints on both SQLite and Postgres,
with SQLite having about 10x faster random reads.
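
As a rough sketch of what this looks like in application code (the function
name and database path here are illustrative, not the real Pique modules; the
API calls mirror the ones used in the benchmark experiment):

```rust
use sea_orm::{ConnectOptions, Database, DatabaseConnection, DbErr};
use std::time::Duration;

/// Open the primary SQLite database through SeaORM.
async fn connect_primary_db() -> Result<DatabaseConnection, DbErr> {
    // `mode=rwc` creates the database file if it does not exist yet.
    let mut opts = ConnectOptions::new("sqlite:./pique.db?mode=rwc");
    opts.connect_timeout(Duration::from_secs(2));
    Database::connect(opts).await
}
```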

## Consequences

This has a few consequences for Pique.

First, it means that **we will be limited to single-node hosting** unless we
implement read replication using something like [litestream](https://litestream.io/).
This is acceptable given our focus on smaller organizations, and we can shard
the application if we need to.

Second, it means that **self-hosting is more feasible**. We can more easily offer
backup downloads from within the app itself, leveraging SQLite's features for
generating a backup, and we can have everything run inside one executable with
data stored on the disk. Not requiring a separate DB process makes the hosting
story simpler.
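
A minimal sketch of the in-app backup idea, assuming we drive it through the
same SeaORM connection (SQLite's `VACUUM INTO` writes a consistent copy of the
database to a new file; the function name and path handling are illustrative):

```rust
use sea_orm::{ConnectionTrait, DatabaseConnection, DbErr};

/// Write a consistent snapshot of the live SQLite database to `dest`,
/// which the app can then offer as a download.
async fn backup_to_file(db: &DatabaseConnection, dest: &str) -> Result<(), DbErr> {
    // VACUUM INTO requires that the destination file not exist yet.
    let sql = format!("VACUUM INTO '{}'", dest.replace('\'', "''"));
    db.execute_unprepared(&sql).await?;
    Ok(())
}
```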
_experiments/2024-03-02-database-benchmark/.env (new file, 2 lines)
@@ -0,0 +1,2 @@
DATABASE_URL=postgresql://postgres:password@localhost/postgres
#DATABASE_URL=sqlite:./database.db?mode=rwc
_experiments/2024-03-02-database-benchmark/.env-postgres (new file, 1 line)
@@ -0,0 +1 @@
DATABASE_URL=postgresql://postgres:password@localhost/postgres
_experiments/2024-03-02-database-benchmark/.env-sqlite (new file, 1 line)
@@ -0,0 +1 @@
DATABASE_URL=sqlite:./database.db?mode=rwc
_experiments/2024-03-02-database-benchmark/Cargo.lock (generated, new file, 3445 lines)
File diff suppressed because it is too large.
_experiments/2024-03-02-database-benchmark/Cargo.toml (new file, 28 lines)
@@ -0,0 +1,28 @@
[package]
name = "bench"
version = "0.1.0"
edition = "2021"

# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html

[dependencies]
anyhow = "1.0.80"
chrono = { version = "0.4.35", features = ["now"] }
criterion = { version = "0.5.1", features = ["async", "async_tokio", "async_futures", "async_std"] }
dotenvy = "0.15.7"
entity = { version = "0.1.0", path = "entity" }
env_logger = "0.11.3"
futures = "0.3.30"
log = "0.4.21"
migration = { version = "0.1.0", path = "migration" }
rand = "0.8.5"
sea-orm = { version = "0.12.14", features = ["sqlx-mysql", "sqlx-sqlite", "sqlx-postgres", "macros", "runtime-async-std-rustls"] }
serde = { version = "1.0.197", features = ["derive"] }
tokio = { version = "1.36.0", features = ["full", "rt"] }

[workspace]
members = [".", "entity", "migration"]

[[bench]]
name = "db"
harness = false
_experiments/2024-03-02-database-benchmark/Makefile (new file, 9 lines)
@@ -0,0 +1,9 @@

entity: FORCE
	sea-orm-cli generate entity -o entity/src/ -l

migrate: FORCE
	sea-orm-cli migrate up

FORCE:
_experiments/2024-03-02-database-benchmark/benches/db.rs (new file, 133 lines)
@@ -0,0 +1,133 @@
use bench::data::random_entities;
use criterion::async_executor::{AsyncExecutor, FuturesExecutor};
use criterion::{criterion_group, criterion_main, BenchmarkId, Criterion, Throughput};
use rand::distributions::{Distribution, Uniform};
use std::sync::Arc;
use std::time::Duration;

use entity::prelude::*;
use migration::Migrator;
use migration::MigratorTrait;
use sea_orm::ConnectOptions;
use sea_orm::Database;
use sea_orm::{prelude::*, Condition};

// Read five rows at random IDs in one query; this is the operation the benchmarks time.
async fn load_row(db: &DatabaseConnection, count: &i32) {
    let mut rng = rand::thread_rng();
    let ids: Vec<i32> = Uniform::new(0, *count)
        .sample_iter(&mut rng)
        .take(5)
        .collect();
    let _ = Page::find()
        .filter(Condition::all().add(entity::page::Column::Id.is_in(ids)))
        .all(db)
        .await
        .unwrap();

    //let _ = Page::find_by_id(id).one(db).await.unwrap().unwrap();
}

// Reset the schema and load `dcount` random documents of `dsize` bytes each.
async fn setup_db(
    db_url: &str,
    dsize: usize,
    dcount: usize,
) -> anyhow::Result<Arc<DatabaseConnection>> {
    let mut opts = ConnectOptions::new(db_url);
    opts.connect_timeout(Duration::from_secs(2));
    opts.max_connections(50);

    let db = Database::connect(opts).await?;
    Migrator::reset(&db).await?;
    Migrator::refresh(&db).await?;

    match db.get_database_backend() {
        sea_orm::DatabaseBackend::MySql => {
            let _ = db
                .execute(sea_orm::Statement::from_string(
                    db.get_database_backend(),
                    "SET GLOBAL max_allowed_packet=1073741824;",
                ))
                .await?;
        }
        sea_orm::DatabaseBackend::Postgres => {}
        sea_orm::DatabaseBackend::Sqlite => {}
    };

    let max_per_chunk = 32 * MB;
    let num_chunks = (dsize * dcount) / max_per_chunk;
    let pages_per_chunk = std::cmp::min(dcount / num_chunks, 5000);

    let pages = random_entities(dcount, dsize);
    for chunk in pages.chunks(pages_per_chunk) {
        let _ = Page::insert_many(chunk.to_vec()).exec(&db).await?;
    }

    Ok(Arc::new(db))
}

const SQLITE_URL: &str = "sqlite:./database.db?mode=rwc";
const POSTGRES_URL: &str = "postgresql://postgres:password@localhost/postgres";

static KB: usize = 1024;
static MB: usize = 1024 * KB;
static GB: usize = 1024 * MB;

fn load_from_sqlite(c: &mut Criterion) {
    let mut group = c.benchmark_group("sqlite");

    //for document_size in [KB, 8 * KB, 64 * KB, 512 * KB, 4 * MB, 32 * MB] {
    for document_size in [8 * KB, 64 * KB] {
        let document_count = 3 * GB / document_size;
        println!(
            "attempting {} documents of size {}",
            document_count, document_size
        );
        let db = FuturesExecutor
            .block_on(setup_db(SQLITE_URL, document_size, document_count))
            .unwrap();
        println!("db setup, about to abuse it");
        FuturesExecutor.block_on(async {
            let res = db
                .execute_unprepared("PRAGMA hard_heap_limit = 1073741824")
                .await
                .unwrap();
            println!("{:?}", res);
        });

        group.throughput(Throughput::Bytes(document_size as u64));
        group.bench_with_input(
            BenchmarkId::from_parameter(document_size),
            &(db, document_size, document_count as i32),
            |b, (db, _size, count)| {
                b.to_async(FuturesExecutor).iter(|| async {
                    load_row(&db, count).await;
                });
            },
        );
    }
    group.finish();
}

fn load_from_postgres(c: &mut Criterion) {
    let mut group = c.benchmark_group("postgres");

    //for document_size in [KB, 8 * KB, 64 * KB, 512 * KB, 4 * MB, 32 * MB] {
    for document_size in [8 * KB, 64 * KB] {
        let document_count = 3 * GB / document_size;
        let db = FuturesExecutor
            .block_on(setup_db(POSTGRES_URL, document_size, document_count))
            .unwrap();

        group.throughput(Throughput::Bytes(document_size as u64));
        group.bench_with_input(
            BenchmarkId::from_parameter(document_size),
            &(db, document_size, document_count as i32),
            |b, (db, _size, count)| {
                b.to_async(FuturesExecutor).iter(|| async {
                    load_row(db, count).await;
                });
            },
        );
    }
    group.finish();
}

criterion_group!(benches, load_from_postgres, load_from_sqlite,);
criterion_main!(benches);
_experiments/2024-03-02-database-benchmark/entity/Cargo.toml (new file, 15 lines)
@@ -0,0 +1,15 @@
[package]
name = "entity"
version = "0.1.0"
edition = "2021"

[lib]
name = "entity"
path = "src/lib.rs"

[dependencies.sea-orm]
version = "0.12.0"
features = [
    "runtime-tokio-rustls",
    "sqlx-sqlite",
]
@@ -0,0 +1,5 @@
//! `SeaORM` Entity. Generated by sea-orm-codegen 0.12.14

pub mod prelude;

pub mod page;
@@ -0,0 +1,5 @@
//! `SeaORM` Entity. Generated by sea-orm-codegen 0.12.14

pub mod prelude;

pub mod page;
@@ -0,0 +1,18 @@
//! `SeaORM` Entity. Generated by sea-orm-codegen 0.12.14

use sea_orm::entity::prelude::*;

#[derive(Clone, Debug, PartialEq, DeriveEntityModel, Eq)]
#[sea_orm(table_name = "page")]
pub struct Model {
    #[sea_orm(primary_key)]
    pub id: i32,
    pub external_id: i64,
    pub title: String,
    pub text: String,
}

#[derive(Copy, Clone, Debug, EnumIter, DeriveRelation)]
pub enum Relation {}

impl ActiveModelBehavior for ActiveModel {}
@@ -0,0 +1,3 @@
//! `SeaORM` Entity. Generated by sea-orm-codegen 0.12.14

pub use super::page::Entity as Page;
@@ -0,0 +1,24 @@
[package]
name = "migration"
version = "0.1.0"
edition = "2021"
publish = false

[lib]
name = "migration"
path = "src/lib.rs"

[dependencies]
async-std = { version = "1", features = ["attributes", "tokio1"] }

[dependencies.sea-orm-migration]
version = "0.12.0"
features = [
    # Enable at least one `ASYNC_RUNTIME` and `DATABASE_DRIVER` feature if you want to run migration via CLI.
    # View the list of supported features at https://www.sea-ql.org/SeaORM/docs/install-and-config/database-and-async-runtime.
    # e.g.
    "runtime-tokio-rustls", # `ASYNC_RUNTIME` feature
    "sqlx-sqlite",          # `DATABASE_DRIVER` feature
    "sqlx-postgres",        # `DATABASE_DRIVER` feature
    "sqlx-mysql",           # `DATABASE_DRIVER` feature
]
@@ -0,0 +1,41 @@
# Running Migrator CLI

- Generate a new migration file
    ```sh
    cargo run -- generate MIGRATION_NAME
    ```
- Apply all pending migrations
    ```sh
    cargo run
    ```
    ```sh
    cargo run -- up
    ```
- Apply first 10 pending migrations
    ```sh
    cargo run -- up -n 10
    ```
- Rollback last applied migrations
    ```sh
    cargo run -- down
    ```
- Rollback last 10 applied migrations
    ```sh
    cargo run -- down -n 10
    ```
- Drop all tables from the database, then reapply all migrations
    ```sh
    cargo run -- fresh
    ```
- Rollback all applied migrations, then reapply all migrations
    ```sh
    cargo run -- refresh
    ```
- Rollback all applied migrations
    ```sh
    cargo run -- reset
    ```
- Check the status of all migrations
    ```sh
    cargo run -- status
    ```
@@ -0,0 +1,12 @@
pub use sea_orm_migration::prelude::*;

mod m20240307_110706_create_tables;

pub struct Migrator;

#[async_trait::async_trait]
impl MigratorTrait for Migrator {
    fn migrations() -> Vec<Box<dyn MigrationTrait>> {
        vec![Box::new(m20240307_110706_create_tables::Migration)]
    }
}
@@ -0,0 +1,54 @@
use std::fmt;

use sea_orm_migration::prelude::*;

#[derive(DeriveMigrationName)]
pub struct Migration;

#[async_trait::async_trait]
impl MigrationTrait for Migration {
    async fn up(&self, manager: &SchemaManager) -> Result<(), DbErr> {
        manager
            .create_table(
                Table::create()
                    .table(Page::Table)
                    .if_not_exists()
                    .col(
                        ColumnDef::new(Page::Id)
                            .integer()
                            .not_null()
                            .auto_increment()
                            .primary_key(),
                    )
                    .col(ColumnDef::new(Page::ExternalId).big_integer().not_null())
                    .col(ColumnDef::new(Page::Title).string().not_null())
                    .col(ColumnDef::new(Page::Text).string().not_null())
                    //.col(ColumnDef::new(Page::Text).custom(LongText).not_null())
                    .to_owned(),
            )
            .await
    }

    async fn down(&self, manager: &SchemaManager) -> Result<(), DbErr> {
        manager
            .drop_table(Table::drop().table(Page::Table).to_owned())
            .await
    }
}

pub struct LongText;

impl Iden for LongText {
    fn unquoted(&self, s: &mut dyn fmt::Write) {
        s.write_str("LongText").unwrap();
    }
}

#[derive(DeriveIden)]
enum Page {
    Table,
    Id,
    ExternalId,
    Title,
    Text,
}
@@ -0,0 +1,6 @@
use sea_orm_migration::prelude::*;

#[async_std::main]
async fn main() {
    cli::run_cli(migration::Migrator).await;
}
_experiments/2024-03-02-database-benchmark/notes.md (new file, 38 lines)
@@ -0,0 +1,38 @@

# Overview

The goal of this experiment is to determine which database to use for Pique.
Normally, we can just go with a tried-and-true option, like PostgreSQL. However,
a few factors are working against us here:

- The goal is under 50 ms for every page load (on the server side)
- PostgreSQL uses different storage for values over 2 kB (compressed), which can
  lead to much slower reads if it goes to the disk
- We'll be storing documents up to 1 MB each in the database. This is just text
  content and does *not* include resources like images.

This combination may make Postgres an unsuitable choice! It has proven slow in
the past: at a job, someone had put large JSON blobs (multiple kB) into a
column, and queries were over 100 ms when those columns were involved. I don't
know how much of that was a Postgres limitation and how much was the particular
schema and hardware we had, so I want to find out!

# Experiment design

I'm going to run a benchmark on three databases: Postgres, MariaDB, and SQLite.
Each run will start by loading a bunch of text documents into the database,
then we will do some random reads and measure the time of each. Memory and CPU
limits will be set on the non-embedded databases.

The text documents will be generated randomly, at a size and count chosen to
approximate what Pique's data will probably reach after a few years. Our
experiment is not particularly valid if it only covers the first year.

To sample, we will pick random IDs in the range (0, count), since IDs
are assigned in monotonically increasing order by the databases we have chosen.
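
As a small sketch of that sampling step (mirroring the benchmark's use of the
rand crate's Uniform distribution; `sample_ids` is a hypothetical helper and
the batch size is arbitrary here):

```rust
use rand::distributions::{Distribution, Uniform};

/// Pick `n` random row IDs in [0, count) to read back in one query.
fn sample_ids(count: i32, n: usize) -> Vec<i32> {
    let mut rng = rand::thread_rng();
    Uniform::new(0, count).sample_iter(&mut rng).take(n).collect()
}
```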

# Results

_experiments/2024-03-02-database-benchmark/src/bin/main.rs (new file, 84 lines)
@@ -0,0 +1,84 @@
use bench::data::random_entities;
use env_logger::{Builder, Env};
use log::info;
use migration::{Migrator, MigratorTrait};
use rand::prelude::*;
use sea_orm::prelude::*;
use sea_orm::sea_query::{Func, SimpleExpr};
use sea_orm::ConnectOptions;
use sea_orm::{Database, QuerySelect};

use entity::prelude::*;

async fn run() -> Result<(), anyhow::Error> {
    dotenvy::dotenv()?;

    let db_url = std::env::var("DATABASE_URL")?;

    info!("starting db");

    let opts = ConnectOptions::new(db_url);
    let db = Database::connect(opts).await?;

    Migrator::refresh(&db).await?;
    let db = &db;

    info!("connected to db");

    info!("starting data load");
    let pages = random_entities(1000, 1_000_000);
    info!("finished data load");

    info!("starting db insert");
    for chunk in pages.chunks(5000) {
        let _ = Page::insert_many(chunk.to_vec()).exec(db).await?;
    }
    info!("finished db insert");

    let length_expr: SimpleExpr = Func::char_length(Expr::col((
        entity::page::Entity,
        entity::page::Column::Text,
    )))
    .into();

    info!("fetching big row count");
    let mut large_row_ids: Vec<i32> = entity::page::Entity::find()
        .filter(length_expr.binary(migration::BinOper::GreaterThan, Expr::val(8 * 1024)))
        .column(entity::page::Column::Id)
        .into_tuple()
        .all(db)
        .await?;
    info!("counted {} big rows", large_row_ids.len());

    let num_rows = Page::find().count(db).await?;
    info!("inserted {} rows", num_rows);

    let mut rng = thread_rng();
    large_row_ids.shuffle(&mut rng);

    info!("starting");
    let mut bytes_read = 0;
    for id in large_row_ids.iter().take(1000) {
        let row = Page::find_by_id(*id).one(db).await?.unwrap();
        bytes_read += row.text.len();
    }
    println!("read {} bytes", bytes_read);
    info!("done");

    Ok(())
}

#[tokio::main]
async fn main() -> Result<(), anyhow::Error> {
    init_logger();

    run().await?;

    Ok(())
}

fn init_logger() {
    let env = Env::default().filter_or("BENCH_LOG_LEVEL", "info,sqlx=error");

    Builder::from_env(env).format_timestamp_millis().init();
}
_experiments/2024-03-02-database-benchmark/src/data.rs (new file, 26 lines)
@@ -0,0 +1,26 @@
use rand::{
    distributions::{Alphanumeric, DistString},
    thread_rng,
};
use sea_orm::ActiveValue;

pub fn random_entities(count: usize, text_length: usize) -> Vec<entity::page::ActiveModel> {
    let mut pages = vec![];

    let mut rng = thread_rng();

    for idx in 0..count {
        let _id = idx as i32;
        let title = "dummy_title";
        let text = Alphanumeric.sample_string(&mut rng, text_length);

        pages.push(entity::page::ActiveModel {
            external_id: ActiveValue::Set(1),
            title: ActiveValue::Set(title.to_owned()),
            text: ActiveValue::Set(text),
            ..Default::default()
        });
    }

    pages
}
_experiments/2024-03-02-database-benchmark/src/lib.rs (new file, 1 line)
@@ -0,0 +1 @@
pub mod data;
_experiments/2024-03-02-database-benchmark/start-pg.sh (new executable file, 3 lines)
@@ -0,0 +1,3 @@
#!/bin/bash

podman run --name postgres --network=host --cpus 2 --memory 1g -e POSTGRES_PASSWORD=password -d postgres