+++
title = "A One-Part Serialized Mystery, Part 2: The Benchmarks"
slug = "one-part-serialized-mystery-part-2"
date = "2023-07-09"
[taxonomies]
tags = ["software", "rnd", "proclamation", "upscm", "rust", "macros"]
+++

# A one-part serial mystery post-hoc prequel

I [wrote recently](/rnd/one-part-serialized-mystery) about switching the types of the primary keys in
the database for an [in-progress web app](https://gitlab.com/nebkor/ww) I'm building. At that time,
I'd not yet done any benchmarking, but had reason to believe that using [sortable primary
keys](https://github.com/ulid/spec) would yield some possibly-significant gains in performance, in
both time and space. I'd also read accounts of regret that databases had not used ULIDs (instead of
[UUIDs](https://en.wikipedia.org/wiki/Universally_unique_identifier#Version_4_(random))) from the
get-go, so I decided it couldn't hurt to switch to them before I had any actual data in my DB.

And that was correct: it didn't hurt performance, but it didn't help much either. I've spent a
bunch of time now doing comparative benchmarks between ULIDs and UUIDs, and as I explain below, the
anticipated space savings did not materialize, and the speed-up merely takes what was already more
than fast enough and makes it slightly faster than that. Of course, of course, and as always, the
real treasure was the friends we made along the way, etc., etc. So come along on a brief journey of
discovery!

# Bottom Line Up Front

With sqlite and my final table schema, the size and speed differences between ULIDs and UUIDs are
negligible.

However, with my initial database layout and import code, ULIDs resulted in about 5% less space and
took only about 2/3rds as much time as when using UUIDs (5.7 vs. 9.8 seconds). The same space and
time results held whether or not [`without rowid`](https://www.sqlite.org/withoutrowid.html) was
specified on table creation, which was counter to expectation, though I now understand why; I'll
explain at the end.

# It's a setup

My benchmark is pretty simple: starting from an empty database, do the following things:

1. insert 10,000 randomly chosen movies (title and year of release, between 1965 and 2023) into
   the database
1. create 1,000 random users[^random-users]
1. for each user, randomly select around 100 movies from the 10,000 available and put them on their
   list of things to watch

Only that last step is significant, and is where I got my timing information from.

The table that keeps track of what users want to watch was defined[^not-final-form] like this:

``` sql
create table if not exists witch_watch (
  id blob not null primary key,
  witch blob not null, -- "user"
  watch blob not null, -- "thing to watch"
  [...]
  foreign key (witch) references witches (id) on delete cascade on update no action,
  foreign key (watch) references watches (id) on delete cascade on update no action
);

[...]

create index if not exists ww_witch_dex on witch_watch (witch);
create index if not exists ww_watch_dex on witch_watch (watch);
```

The kinds of queries I'm trying to optimize with those indices are "what movies does a certain user
want to watch?" and "what users want to watch a certain movie?". The IDs are 16-byte blobs; an
entire row in the table is less than 100 bytes.
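For concreteness, here's a scaled-down sketch of that benchmark loop using Python's built-in
`sqlite3` (the real importer is Rust; the table and index names match the post, the counts are
reduced, and `fake_ulid` is an illustrative stand-in for a real ULID generator):

``` python
import os
import random
import sqlite3
import time

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table witch_watch (
  id    blob not null primary key,
  witch blob not null, -- "user"
  watch blob not null  -- "thing to watch"
);
create index ww_witch_dex on witch_watch (witch);
create index ww_watch_dex on witch_watch (watch);
""")

def fake_ulid() -> bytes:
    # 48-bit millisecond timestamp + 80 random bits, per the ULID spec
    ms = int(time.time() * 1000)
    return ms.to_bytes(6, "big") + os.urandom(10)

users = [os.urandom(16) for _ in range(100)]     # scaled down from 1,000
movies = [os.urandom(16) for _ in range(1000)]   # scaled down from 10,000

start = time.monotonic()
for user in users:
    picks = random.sample(movies, 100)  # ~100 movies per user
    with conn:  # one transaction per user, like the real importer
        conn.executemany(
            "insert into witch_watch (id, witch, watch) values (?, ?, ?)",
            [(fake_ulid(), user, movie) for movie in picks],
        )
elapsed = time.monotonic() - start
print(f"inserted {100 * len(users)} quests in {elapsed:.3f}s")
```
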

## A digression on SQLite and performance

I've mentioned once or twice before that I'm using [SQLite](https://www.sqlite.org/index.html) for
this project. Any time I need a database, my first reach is for SQLite:

* the database is a single file, along with a couple of temp files that live alongside it,
  simplifying management
* there's no network involved between the client and the database; a connection to the database is
  a pointer to an object that lives in the same process as the host program, which means read
  queries return data in just a [few
  *microseconds*](https://www.youtube.com/watch?v=qPfAQY_RahA)
* it scales vertically extremely well; it can handle database sizes of many terabytes
* it's one of the most widely-installed pieces of software in the world; there's at least one
  sqlite database on every smartphone, and there's a robust ecosystem of [useful
  extensions](https://litestream.io/) and other bits of complementary code freely available

And it's extremely performant. When using the [WAL journal mode](https://www.sqlite.org/wal.html)
and the [recommended durability setting](https://www.sqlite.org/pragma.html#pragma_synchronous) for
WAL mode, along with all other production-appropriate settings, I got almost 20,000 *writes* per
second[^nothing-is-that-slow]. There were multiple concurrent writers, and each write was a
transaction that inserted about 100 rows at a time. I had [retry
logic](https://gitlab.com/nebkor/ww/-/blob/4c44aa12b081c777c82192755ac85d1fe0f5bdca/src/bin/import_users.rs#L143-145)
in case a transaction failed due to the DB being locked by another writer, but that never happened:
each write was just too fast.
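A minimal sketch of those settings, again via Python's built-in `sqlite3` (the project itself is
Rust, and "all other production-appropriate settings" is reduced here to the two pragmas named
above plus foreign-key enforcement; the temp-file path is illustrative):

``` python
import os
import sqlite3
import tempfile

# WAL mode requires a file-backed database, so use a throwaway file
path = os.path.join(tempfile.mkdtemp(), "movies.db")
conn = sqlite3.connect(path)

# the journal_mode pragma returns the mode actually in effect
mode = conn.execute("pragma journal_mode = wal;").fetchone()[0]
conn.execute("pragma synchronous = normal;")  # recommended durability for WAL
conn.execute("pragma foreign_keys = on;")     # enforce the schema's references
print(mode)  # "wal"
```
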

# Over-indexing on sortability

The reason I had hoped that ULIDs would help with keeping the sizes of the indexes down was the
possibility of using [clustered
indexes](https://www.sqlite.org/withoutrowid.html#benefits_of_without_rowid_tables). To paraphrase
that link:

> In an ordinary SQLite table, the PRIMARY KEY is really just a UNIQUE index. The key used to look
> up records on disk is the rowid. [...] any other kind of PRIMARY KEY, including "INT PRIMARY KEY",
> is just a unique index in an ordinary rowid table.
>
> ...
>
> Consider querying this table to find the number of occurrences of the word "xsync":
>
>     SELECT cnt FROM wordcount WHERE word='xsync';
>
> This query first has to search the index B-Tree looking for any entry that contains the matching
> value for "word". When an entry is found in the index, the rowid is extracted and used to search
> the main table. Then the "cnt" value is read out of the main table and returned. Hence, two
> separate binary searches are required to fulfill the request.
>
> A WITHOUT ROWID table uses a different data design for the equivalent table. [In those tables],
> there is only a single B-Tree... Because there is only a single B-Tree, the text of the "word"
> column is only stored once in the database. Furthermore, querying the "cnt" value for a specific
> "word" only involves a single binary search into the main B-Tree, since the "cnt" value can be
> retrieved directly from the record found by that first search and without the need to do a second
> binary search on the rowid.
>
> Thus, in some cases, a WITHOUT ROWID table can use about half the amount of disk space and can
> operate nearly twice as fast. Of course, in a real-world schema, there will typically be secondary
> indices and/or UNIQUE constraints, and the situation is more complicated. But even then, there can
> often be space and performance advantages to using WITHOUT ROWID on tables that have non-integer
> or composite PRIMARY KEYs.

<div class="caption">sorry what was that about secondary indices i didn't quite catch that</div>

HALF the disk space *and* TWICE as fast?? Yes, sign me up, please!
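You can watch sqlite make exactly the choice the quote describes with `EXPLAIN QUERY PLAN`: a
primary-key lookup in an ordinary table goes through the automatic unique index, while the same
lookup in a `without rowid` table searches the single B-Tree directly. A small sketch (the table
names are made up for illustration):

``` python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table wc_rowid   (word text primary key, cnt int);
create table wc_norowid (word text primary key, cnt int) without rowid;
""")

plans = {}
for table in ("wc_rowid", "wc_norowid"):
    rows = conn.execute(
        f"explain query plan select cnt from {table} where word = 'xsync'"
    ).fetchall()
    # the last column of each EXPLAIN QUERY PLAN row is the human-readable detail
    plans[table] = " ".join(row[-1] for row in rows)
    print(table, "->", plans[table])
```

The rowid table reports a search via its `sqlite_autoindex` unique index (two B-Trees involved),
while the `without rowid` table reports a search using the primary key directly.
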

## Sorry, the best I can do is all the disk space

There are some [guidelines](https://www.sqlite.org/withoutrowid.html#when_to_use_without_rowid)
about when to use `without rowid`:

> The WITHOUT ROWID optimization is likely to be helpful for tables that have non-integer or
> composite (multi-column) PRIMARY KEYs and that do not store large strings or BLOBs.
>
> [...]
>
> WITHOUT ROWID tables work best when individual rows are not too large. A good rule-of-thumb is
> that the average size of a single row in a WITHOUT ROWID table should be less than about 1/20th
> the size of a database page. That means that rows should not contain more than ... about 200 bytes
> each for 4KiB page size.

As I mentioned, each row in that table was less than 100 bytes, so comfortably within the given
heuristic. In order to test this out, all I had to do was change the table creation statement to:

``` sql
create table if not exists witch_watch (
  id blob not null primary key,
  witch blob not null, -- "user"
  watch blob not null, -- "thing to watch"
  [...]
  foreign key (witch) references witches (id) on delete cascade on update no action,
  foreign key (watch) references watches (id) on delete cascade on update no action
) without rowid;
```

So I did.

Imagine my surprise when it took nearly 20% longer to run, and the total size on disk was nearly 5%
larger. Using random UUIDs was even slower, so there was still a relative speed win for ULIDs, but
going without the rowid was still an overall loss. Maybe it was time to think outside the box?

## Schema pruning

I had several goals with this whole benchmarking endeavor. One, of course, was to get data on ULIDs
vs. UUIDs in terms of performance, at the very least so that I could write about it when I publicly
said I would. But another, actually-more-important goal was to optimize the design of my database
and software, especially as it came to size on disk (my most-potentially-scarce computing resource;
network and CPU are not problems until you get *very* large, and you would long ago have
bottlenecked on disk size if you weren't careful).

So it was Cool and Fine to take advantage of the new capabilities that ULIDs offered if those new
capabilities resulted in better resource use. Every table in my original, UUID-based schema had had
a `created_at` column, stored as a 64-bit signed offset from the [UNIX
epoch](https://en.wikipedia.org/wiki/Unix_time). Because ULIDs encode their creation time, I could
remove that column from every table that used ULIDs as their primary key. Doing so dropped the
overall DB size by 5-10% compared to UUID-based tables with a `created_at` column.
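To see why `created_at` becomes redundant: a ULID's first 48 bits are a big-endian millisecond UNIX
timestamp, so the creation time can be recovered from the key itself. A sketch, assuming the
16-byte binary form stored in the blob columns (`make_ulid` and `ulid_created_at` are illustrative
helpers, not from the project):

``` python
import os
import time
from datetime import datetime, timezone

def make_ulid(ms=None) -> bytes:
    # 48-bit millisecond timestamp + 80 random bits, per the ULID spec
    if ms is None:
        ms = int(time.time() * 1000)
    return ms.to_bytes(6, "big") + os.urandom(10)

def ulid_created_at(ulid: bytes) -> datetime:
    # the creation time rides along in the first 6 bytes of the key
    ms = int.from_bytes(ulid[:6], "big")
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc)

u = make_ulid(1_688_860_800_000)  # a fixed timestamp for demonstration
print(ulid_created_at(u).isoformat())  # 2023-07-09T00:00:00+00:00
```
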

But I also realized that for the `watch_quests` table, no explicit `id` column was needed at all: I
could drop it and let SQLite's implicit rowid serve as the primary key.

# At last, I've reached my final form

In the course of writing this post, I had a minor epiphany, which is that the reason for the
regressed performance when using `without rowid` was that the secondary indices needed to point to
the entries in the table, using the primary key of the table as the target. So when there was a ULID
or UUID primary key, the index entries looked like, e.g., this:

``` text
16-byte blob -> 16-byte blob
```

<div class="caption">left side is, e.g., a user id, and right side is the id of a row in the quests table</div>
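Back-of-the-envelope arithmetic for why that hurts (payload sizes only, ignoring page and record
overhead; the 8-byte figure is the worst case for a rowid varint):

``` python
# each secondary index entry stores the indexed blob plus the table's key:
ENTRY_BLOB_PK = 16 + 16   # indexed column + ULID/UUID primary key
ENTRY_ROWID_PK = 16 + 8   # indexed column + worst-case 8-byte rowid varint

savings = 1 - ENTRY_ROWID_PK / ENTRY_BLOB_PK
print(f"per-entry payload savings in each index: {savings:.0%}")  # 25%
```

With two secondary indices accounting for a large share of the file, that per-entry difference adds
up fast.
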

With the implicit rowid as the primary key, each index entry instead maps the indexed blob to a
small integer rowid. Here's the `sqlite3_analyzer` report using implicit rowid with ULIDs:

``` text
*** Indices of table WATCH_QUESTS *********************************************

Percentage of total database...................... 43.3%
Number of entries................................. 199296
Average fanout.................................... 106.00
```

``` text
$ cargo run --release --bin import_users -- -d ~/movies.db -u 2000 -m 200
[...]
Added 398119 quests in 20.818506 seconds
```

<div class="caption">20k writes/second, baby</div>

The total size on disk is 75% of the previous size (13M vs. 17M).

----

[^random-users]: I did the classic "open `/usr/share/dict/words` and randomly select a couple
    things to stick together" method of username generation, which results in gems like
    "Hershey_motivations84" and "italicizes_creaminesss54". This is old-skool generative AI.

[^not-final-form]: The original schema was defined some time ago, and it took me a while to get to
    the point where I was actually writing code that used it. In the course of doing the benchmarks,
    and even in the course of writing this post, I've made changes in response to things I learned
    from the benchmarks and to things I realized by thinking more about it and reading more docs.

[^nothing-is-that-slow]: At an old job, we had a Python service that fell down at around 100
    requests per second. Nothing SQLite does is that slow.