411 lines
22 KiB
Markdown
411 lines
22 KiB
Markdown
+++
|
|
title = "A One-Part Serialized Mystery, Part 2: The Benchmarks"
|
|
slug = "one-part-serialized-mystery-part-2"
|
|
date = "2023-07-15"
|
|
updated = "2023-07-21"
|
|
[taxonomies]
|
|
tags = ["software", "rnd", "proclamation", "upscm", "rust", "sqlite"]
|
|
+++
|
|
|
|
# A one-part serial mystery post-hoc prequel
|
|
|
|
I [wrote recently](/rnd/one-part-serialized-mystery) about switching the types of the primary keys in
|
|
the database for an [in-progress web app](https://gitlab.com/nebkor/ww) I'm building. At that time,
|
|
I'd not yet done any benchmarking, but had reason to believe that using [sortable primary
|
|
keys](https://github.com/ulid/spec) would yield some possibly-significant gains in performance, in
|
|
both time and space. I'd also read accounts of regret that databases had not used ULIDs (instead of
|
|
[UUIDs](https://en.wikipedia.org/wiki/Universally_unique_identifier#Version_4_(random))) from the
|
|
get-go, so I decided it couldn't hurt to switch to them before I had any actual data in my DB.
|
|
|
|
And that was correct: it didn't hurt performance, but it also didn't help much either. I've spent a
|
|
bunch of time now doing comparative benchmarks between ULIDs and UUIDs, and as I explain below, the
|
|
anticipated space savings did not materialize, and the initial speed-ups I got were merely
|
|
augmenting what was already more than fast enough into slightly more fasterer than that. Of course,
|
|
of course, and as always, the real treasure was the friends we made along the way etc., etc. So come
|
|
along on a brief journey of discovery!
|
|
|
|
# Bottom Line Up Front
|
|
|
|
ULIDs have a slight edge over UUIDv4s when used as primary keys, but the best primany keys are
|
|
simple integers if you can get away with it. With my final DB schema and import/benchmarking code,
|
|
there was no difference in terms of time taken or space used when using ULIDs vs UUIDs as primary
|
|
keys.
|
|
|
|
However, with my initial database layout and import code, ULIDs resulted in about 5% less space and
|
|
took only about 2/3rds as much time as when using UUIDs (5.7 vs 9.8 seconds). The same space and
|
|
time results held whether or not [`without rowid`](https://www.sqlite.org/withoutrowid.html) was
|
|
specified on table creation, which was counter to expectation, though I now understand why; I'll
|
|
explain at the end.
|
|
|
|
# It's a setup
|
|
|
|
My benchmark is pretty simple: starting from an empty database, do the following things:
|
|
|
|
1. insert 10,000 randomly chosen movies (title and year of release, from between 1965 and 2023) into
|
|
the database
|
|
1. create 1,000 random users[^random-users]
|
|
1. for each user, randomly select around 100 movies from the 10,000 available and put them on their list of
|
|
things to watch
|
|
|
|
Only that last part is significant, and is where I got my [timing
|
|
information](https://gitlab.com/nebkor/ww/-/blob/897fd993ceaf9c77433d44f8d68009eb466ac3aa/src/bin/import_users.rs#L47-58)
|
|
from.
|
|
|
|
The table that keeps track of what users want to watch was defined[^not-final-form] like this:
|
|
|
|
``` sql
|
|
create table if not exists witch_watch (
|
|
id blob not null primary key,
|
|
witch blob not null, -- "user"
|
|
watch blob not null, -- "thing to watch"
|
|
[...]
|
|
foreign key (witch) references witches (id) on delete cascade on update no action,
|
|
foreign key (watch) references watches (id) on delete cascade on update no action
|
|
);
|
|
[...]
|
|
create index if not exists ww_witch_dex on witch_watch (witch);
|
|
create index if not exists ww_watch_dex on witch_watch (watch);
|
|
```
|
|
|
|
The kind of queries I'm trying to optimize with those indices is "what movies does a certain user
|
|
want to watch?" and "what users want to watch a certain movie?". The IDs are 16-byte blobs; an
|
|
entire row in the table is less than 100 bytes.
|
|
|
|
## A digression on SQLite and performance
|
|
|
|
I've mentioned once or twice before that I'm using [SQLite](https://www.sqlite.org/index.html) for
|
|
this project. Any time I need a database, my first reach is for SQLite:
|
|
|
|
* the database is a single file, along with a couple temp files that live alongside it, simplifying
|
|
management
|
|
* there's no network involved between the client and the database; a connection to the database is
|
|
a pointer to an object that lives in the same process as the host program; this means that read
|
|
queries return data back in just a [few
|
|
*microseconds*](https://www.youtube.com/watch?v=qPfAQY_RahA)
|
|
* it scales vertically extremely well; it can handle database sizes of many terabytes
|
|
* it's one of the most widely-installed pieces of software in the world; there's at least one
|
|
sqlite database on every smartphone, and there's a robust ecosystem of [useful
|
|
extensions](https://litestream.io/) and other bits of complimentary code freely available
|
|
|
|
And, it's extremely performant. When using the [WAL journal mode](https://www.sqlite.org/wal.html)
|
|
and the [recommended durability setting](https://www.sqlite.org/pragma.html#pragma_synchronous) for
|
|
WAL mode, along with all other production-appropriate settings, I got almost 20,000 *writes* per
|
|
second[^nothing is that slow]. There were multiple concurrent writers, and each write was a
|
|
transaction that inserted about 100 rows at a time. I had [retry
|
|
logic](https://gitlab.com/nebkor/ww/-/blob/4c44aa12b081c777c82192755ac85d1fe0f5bdca/src/bin/import_users.rs#L143-145)
|
|
in case a transaction failed due to the DB being locked by another writer, but that never happened:
|
|
each write was just too fast.
|
|
|
|
# Over-indexing on sortability
|
|
|
|
The reason I had hoped that ULIDs would help with keeping the sizes of the indexes down was the
|
|
possibility of using [clustered
|
|
indexes](https://www.sqlite.org/withoutrowid.html#benefits_of_without_rowid_tables). To paraphrase
|
|
that link:
|
|
|
|
> In an ordinary SQLite table, the PRIMARY KEY is really just a UNIQUE index. The key used to look
|
|
> up records on disk is the rowid. [...]any other kind of PRIMARY KEYs, including "INT PRIMARY KEY"
|
|
> are just unique indexes in an ordinary rowid table.
|
|
>
|
|
> ...
|
|
>
|
|
> Consider querying this table to find the number of occurrences of the word "xsync".:
|
|
> SELECT cnt FROM wordcount WHERE word='xsync';
|
|
>
|
|
> This query first has to search the index B-Tree looking for any entry that contains the matching
|
|
> value for "word". When an entry is found in the index, the rowid is extracted and used to search
|
|
> the main table. Then the "cnt" value is read out of the main table and returned. Hence, two
|
|
> separate binary searches are required to fulfill the request.
|
|
>
|
|
> A WITHOUT ROWID table uses a different data design for the equivalent table. [in those tables],
|
|
> there is only a single B-Tree... Because there is only a single B-Tree, the text of the "word"
|
|
> column is only stored once in the database. Furthermore, querying the "cnt" value for a specific
|
|
> "word" only involves a single binary search into the main B-Tree, since the "cnt" value can be
|
|
> retrieved directly from the record found by that first search and without the need to do a second
|
|
> binary search on the rowid.
|
|
>
|
|
> Thus, in some cases, a WITHOUT ROWID table can use about half the amount of disk space and can
|
|
> operate nearly twice as fast. Of course, in a real-world schema, there will typically be secondary
|
|
> indices and/or UNIQUE constraints, and the situation is more complicated. But even then, there can
|
|
> often be space and performance advantages to using WITHOUT ROWID on tables that have non-integer
|
|
> or composite PRIMARY KEYs.
|
|
|
|
<div class="caption">sorry what was that about secondary indices i didn't quite catch that</div>
|
|
|
|
HALF the disk space, *and* TWICE as fast?? Yes, sign me up, please!
|
|
|
|
## Sorry, the best I can do is all the disk space
|
|
|
|
There are some [guidelines](https://www.sqlite.org/withoutrowid.html#when_to_use_without_rowid)
|
|
about when to use `without rowid`:
|
|
|
|
> The WITHOUT ROWID optimization is likely to be helpful for tables that have non-integer or
|
|
> composite (multi-column) PRIMARY KEYs and that do not store large strings or BLOBs.
|
|
>
|
|
> [...]
|
|
>
|
|
> WITHOUT ROWID tables work best when individual rows are not too large. A good rule-of-thumb is
|
|
> that the average size of a single row in a WITHOUT ROWID table should be less than about 1/20th
|
|
> the size of a database page. That means that rows should not contain more than ... about 200 bytes
|
|
> each for 4KiB page size.
|
|
|
|
As I mentioned, each row in that table was less than 100 bytes, so comfortably within the given
|
|
heuristic. In order to test this out, all I had to do was change the table creation statement to:
|
|
|
|
``` sql
|
|
create table if not exists witch_watch (
|
|
id blob not null primary key,
|
|
witch blob not null, -- "user"
|
|
watch blob not null, -- "thing to watch"
|
|
[...]
|
|
foreign key (witch) references witches (id) on delete cascade on update no action,
|
|
foreign key (watch) references watches (id) on delete cascade on update no action
|
|
) without rowid;
|
|
```
|
|
|
|
So I did.
|
|
|
|
Imagine my surprise when it took nearly 20% longer to run, and the total size on disk was nearly 5%
|
|
larger. Using random UUIDs was even slower, so there's still a relative speed win for ULIDs, but it
|
|
was still an overall loss to go without the rowid. Maybe it was time to think outside the box?
|
|
|
|
## Schema husbandry
|
|
|
|
I had several goals with this whole benchmarking endeavor. One, of course, was to get performance
|
|
data on ULIDs vs. UUIDs, at the very least so that I could write about it when I publicly had said I
|
|
would. But another, and actually-more-important goal, was to optimize the design of my database and
|
|
software, especially as it came to size on disk (my most-potentially-scarce computing resource;
|
|
network and CPU are not problems until you get *very* large, and you would have long ago
|
|
bottlenecked on storage if you weren't careful).
|
|
|
|
So it was Cool and Fine to take advantage of the new capabilities that ULIDs offered if those new
|
|
capabilities resulted in better resource use. Every table in my original, UUID-based schema had had
|
|
a `created_at` column, stored as a 64-bit signed offset from the [UNIX
|
|
epoch](https://en.wikipedia.org/wiki/Unix_time). Because ULIDs encode their creation time, I could
|
|
remove that column from every table that used ULIDs as their primary key. [Doing
|
|
so](https://gitlab.com/nebkor/ww/-/commit/5782651aa691125f11a80e241f14c681dda7a7c1) dropped the
|
|
overall DB size by 5-10% compared to UUID-based tables with a `created_at` column. This advantage
|
|
was unique to ULIDs as opposed to UUIDv4s, and so using the latter with a schema that excludude a
|
|
"created at" column was giving an unrealistic edge to UUIDs, but for my benchmarks, I was interested
|
|
in isolating their effect on index sizes, so it was OK.
|
|
|
|
I also realized that for the `watch_quests` table, no explicit ID needed to be added; there were
|
|
already two `UNIQUE` constraints for each row, that would together uniquely identify that row: the
|
|
ID of the user that wanted to watch something, and the ID of the thing they wanted to watch. Primary
|
|
keys don't need to be a single column; when two or more columns in a table are used as a primary
|
|
key, it's called a "composite key". You may recall from the "when should you use `without rowid`"
|
|
section that composite keys were one such situation where it may be beneficial. Surely this would
|
|
help!
|
|
|
|
``` sql
|
|
create table if not exists witch_watch (
|
|
witch blob not null,
|
|
watch blob not null,
|
|
[...]
|
|
primary key (witch, watch)
|
|
) without rowid;
|
|
```
|
|
<div class="caption">"witch" and "watch" are still foreign keys</div>
|
|
|
|
And, it did, a little. I also took a more critical eye to that table as a whole, and realized I
|
|
could [tidy up the
|
|
DB](https://gitlab.com/nebkor/ww/-/commit/0e016552ab6c66d5fdd82704b6277bd857c94188?view=parallel#f1043d50a0244c34e4d056fe96659145d03b549b_34_34)
|
|
a little more, and remove one more redundant field; this helped a little bit, too.
|
|
|
|
But overall, things were still looking like ULIDs had no real inherent advantage over UUIDs in the
|
|
context of clustered indexes, given the schema I was using, when it came to disk space. For sure,
|
|
ULIDs continued to enjoy an advantage in insertion speed, but as I tightened up my code for
|
|
inserting these values for this benchmark, the marginal advantage there kept shrinking. Ultimately,
|
|
this advantage completely shrank as I made the schema and code more optimal, but that's getting
|
|
slightly ahead of things. I had to this point achieved almost the final form, but one more change
|
|
had to be made.
|
|
|
|
# At last, I've reached my final form
|
|
|
|
In the course of writing this post, I had a minor epiphany, which is that the reason for the
|
|
regressed performance when using `without rowid` was that the secondary indices needed to point to
|
|
the entries in the table, using the primary key of the table as the target. So when there was a ULID
|
|
or UUID primary key, the indexes looked like, eg, this:
|
|
|
|
``` text
|
|
16-byte blob -> 16-byte blob
|
|
```
|
|
<div class="caption">left side is a user id or watch id, and right side is the id of a row in the
|
|
quests table</div>
|
|
|
|
But, in the case that there was a `rowid` primary key in the `watch_quest` table, the index entries for,
|
|
eg, `user` to "watch quest" would look like:
|
|
|
|
``` text
|
|
16-byte blob -> 8-byte number (rowid)
|
|
```
|
|
|
|
The astute among you may note that 8 is only half of 16, and if you recall that there are two
|
|
secondary indexes that look like that, the total number of secondary index bytes is 64 in the
|
|
`without rowid` case, and only 48 in the case that there is a rowid.
|
|
|
|
There's also a bit of cautious wisdom about performance implications of the implementation that
|
|
backs the `without rowid` tables:
|
|
|
|
> WITHOUT ROWID tables are implemented using ordinary B-Trees with content stored on both leaves and
|
|
> intermediate nodes. Storing content in intermediate nodes causes each intermediate node entry to
|
|
> take up more space on the page and thus reduces the fan-out, increasing the search cost.
|
|
|
|
the fan-out when using `without rowid` was about 20% lower than when using the rowids, and it seems
|
|
like this was slowing things down.
|
|
|
|
Thinking on it some more, there's really no real reason to give this table a distinct and robust
|
|
identity for its rows; the real identity is carried by its combination of `(user, watch)` columns,
|
|
but even then, the value of distinct identity for these rows is low. If that's the case, which it
|
|
is, then why give it an explicit primary key at all? The program and the users don't need to worry
|
|
about the primary key for that table. It would also eliminate an entire index (an
|
|
automatically-generated "primary key to rowid" index), resulting in the ultimate space savings.
|
|
|
|
So, [that's what I
|
|
did](https://gitlab.com/nebkor/ww/-/commit/2c7990ff09106fa2a9ec30974bbc377b44082082):
|
|
|
|
``` sql
|
|
-- table of what people want to watch
|
|
create table if not exists watch_quests (
|
|
user blob not null,
|
|
watch blob not null,
|
|
priority int, -- 1-5 how much do you want to watch it
|
|
public boolean not null default true,
|
|
watched boolean not null default false,
|
|
when_watched int,
|
|
created_at int not null default (unixepoch()),
|
|
last_updated int not null default (unixepoch()),
|
|
foreign key (user) references users (id) on delete cascade on update no action,
|
|
foreign key (watch) references watches (id) on delete cascade on update no action
|
|
);
|
|
|
|
create index if not exists quests_user_dex on watch_quests (user);
|
|
create index if not exists quests_watch_dex on watch_quests (watch);
|
|
```
|
|
|
|
There's the full and final schema.
|
|
|
|
In the default benchmark, with 1,000 users each saving about 100 things to watch, that schema change
|
|
dropped the total size on disk about 25% (from 17 megabytes to 13), and the percentage of the total
|
|
database consumed by the indexes of the `watch_quests` table went from 51% to 43% (that means the
|
|
indexes went from being about 8.6MB to 5.6MB, 35% less than when using a composite primary key).
|
|
|
|
``` text
|
|
*** Indices of table WATCH_QUESTS *********************************************
|
|
|
|
Percentage of total database...................... 43.3%
|
|
Average fanout.................................... 106.00
|
|
```
|
|
|
|
It also dropped the total time to insert the 100k records from >6 seconds to just 5; I ran the
|
|
benchmark multiple times and got the same results, then tried running it with 2,000 users saving 200
|
|
movies (4x the previous benchmark), and the results held uncannily:
|
|
|
|
``` text
|
|
$ cargo run --release --bin import_users -- -d ~/movies.db -u 2000 -m 200
|
|
[...]
|
|
Added 398119 quests in 20.818506 seconds
|
|
```
|
|
<div class="caption">20k writes/second, baby</div>
|
|
|
|
Just for kicks, I tried it with UUID-based IDs, and the time and space characteristics were finally
|
|
completely indistinguishable. This pleased me; real-world perf with ULIDs would be better than with
|
|
UUIDs with a production schema that included `created_at` columns, and UUIDs would obligate columns
|
|
like that if you wanted to keep track of, you know, when things were created. Ironically, by moving
|
|
to implicit integer rowid primary keys for the `watch_quests` table, I had to make sure that there
|
|
was a `created_at` column for it. Still a win, though!
|
|
|
|
### *UPDATE (2023-07-21)!*
|
|
|
|
Something I realized with the "final" schema is that you could have duplicate rows, since the only
|
|
unique field was the `rowid`. I didn't want this. So, rather than create a `unique index on
|
|
watch_quests (user, watch)`, I [just
|
|
added](https://gitlab.com/nebkor/ww/-/commit/c685dc1a6b08d9ff6bafc72582acb539651a350c) a `primary
|
|
key (user, watch)`.
|
|
|
|
If that looks familiar, good eye! Doing this brings the disk usage back up to 17MB in the baseline
|
|
benchmark, but the insert rate is still the same. In the grand scheme of things, this is still not a
|
|
lot of data, so I'll take it anyway.
|
|
|
|
|
|
## Next steps with IDs
|
|
|
|
This project is supposed to be more than just a testbed for learning about databases and web
|
|
frameworks and sortable unique identifiers; it's supposed to be an actual thing that my wife and I
|
|
can use for ourselves and with our friends. I even made a snazzy logo!
|
|
|
|
![what to watch][logo]
|
|
|
|
The gods, it seems, have other plans.
|
|
|
|
Namely, it bothers me that ID generation is not done inside the database itself. Aside from being a
|
|
generally bad idea, this lead to at least one frustrating debug session where I was inserting one ID
|
|
but reporting back another. SQLite doesn't have native support for this, but it does have good
|
|
native support for [loading shared libraries as plugins](https://www.sqlite.org/loadext.html) in
|
|
order to add functionality to it, and so my next step is to write one of those, and remove the ID
|
|
generation logic from the application.
|
|
|
|
Doing so would also allow me to address an underlying error in the way the application generates
|
|
them. The [ULID spec](https://github.com/ulid/spec) contains the following note about IDs generated
|
|
within the same millisecond:
|
|
|
|
> When generating a ULID within the same millisecond, we can provide some guarantees regarding sort
|
|
> order. Namely, if the same millisecond is detected, the random component is incremented by 1 bit
|
|
> in the least significant bit position (with carrying).
|
|
|
|
I don't do that[^sequential ids], because doing so requires a single ID factory, and I don't want to
|
|
have to thread that through the web app backend code. On the other hand, I *do* want to have a
|
|
single ID factory inside the database, which an extension plugin would provide.
|
|
|
|
Then I'll get back to the web app.
|
|
|
|
# Thanks and goodbye
|
|
|
|
OK, well, here we are, at the end of yet another three-thousand-word meeting that could have been an
|
|
email; sorry about that, and thanks for sticking with it until the end! As usual, it was hard to not
|
|
just keep adding more commentary and footnotes and explication, and I give myself a 'C+' there, at
|
|
best. At least there are only four footnotes.
|
|
|
|
Still, I read and watched a lot of different things in the course of doing this work. Obviously the
|
|
SQLite project was critical, and every time I need to consult their documentation, I appreciate it
|
|
more (aside from the software itself, of course!). Towards the end of the this work, right as I was
|
|
starting to write this post, I discovered this [series of
|
|
videos](https://www.youtube.com/playlist?list=PLWENznQwkAoxww-cDEfIJ-uuPDfFwbeiJ) about SQLite, from
|
|
[Mycelial](https://github.com/mycelial), who are "a maker of local-first software development
|
|
libraries". I'm a huge fan of [local-first software](https://www.inkandswitch.com/local-first/), and
|
|
one of the reasons I initially chose SQLite was for its suitability for that paradigm. Thank you,
|
|
SQLite and Mycelial!
|
|
|
|
Good bye :)
|
|
|
|
|
|
----
|
|
|
|
[^random-users]: I did the classic "open `/usr/share/dict/words` and randomly select a couple things
|
|
to stick together" method of username generation, which results in gems like
|
|
"Hershey_motivations84" and "italicizes_creaminesss54". This is old-skool generative content.
|
|
|
|
[^not-final-form]: The original schema was defined some time ago, and it took me a while to get to
|
|
the point where I was actually writing code that used it. In the course of doing the benchmarks,
|
|
and even in the course of writing this post, I've made changes in response to things I learned
|
|
from the benchmarks and to things I realized by thinking more about it and reading more docs.
|
|
|
|
[^nothing is that slow]: At one of my previous jobs, there was a rather important internal service,
|
|
written in Python and talking to a PostgreSQL backend, that would basically completely fall over
|
|
if more than 100 or so requests per second were made to it. Its introduction to
|
|
mission-criticality had pre-dated my time there, and when it had first been deployed, the
|
|
demands upon it had been more modest. But it was now a problem, and I and a teammate put aside
|
|
some time to pluck some low-hanging fruit. A colleague on a peer team, who was that team's tech
|
|
lead and truly a beast of a programmer, said that he thought that the reason it could handle
|
|
only 100 requests/second was that "Python is slow." This shocked me; Python is not that
|
|
slow. PostgreSQL is not that slow. Nothing is that slow, especially in an enterprise environment
|
|
where you're just slinging data around via API; if it's that slow, you're doing it wrong. What
|
|
he said haunts me to this very day. Anyway, we tweaked the slowest query in the API callchain a
|
|
smidge and sped it up by a few factors; we left a ton of perf on the floor there still, but
|
|
c'est la vie.
|
|
|
|
[^sequential ids]: At one point, I was worried that because all the entries in my benchmark were
|
|
being created at close to 20 per millisecond, that the resulting IDs would be essentially
|
|
random, so I forced the IDs to be sequential. This wound up being a red herring.
|
|
|
|
[logo]: ./what2watch_logo.png "what to watch logo; an eyeball filled with static, and with a red iris, looking down at you"
|