Merge branch 'ulid_benchmarks'

2023-07-15 13:54:26 -07:00 · 2023-07-15 13:54:26 -07:00 · b8b81131a9
commit b8b81131a9
parent 568352597b 0e2220cfc1
2 changed files with 398 additions and 0 deletions
--- a/content/rnd/ulid_benchmarks/index.md
+++ b/content/rnd/ulid_benchmarks/index.md
@ -0,0 +1,398 @@
+++
+title = "A One-Part Serialized Mystery, Part 2: The Benchmarks"
+slug = "one-part-serialized-mystery-part-2"
+date = "2023-07-15"
+[taxonomies]
+tags = ["software", "rnd", "proclamation", "upscm", "rust", "sqlite"]
+++
+
+# A one-part serial mystery post-hoc prequel
+
+I [wrote recently](/rnd/one-part-serialized-mystery) about switching the types of the primary keys in
+the database for an [in-progress web app](https://gitlab.com/nebkor/ww) I'm building. At that time,
+I'd not yet done any benchmarking, but had reason to believe that using [sortable primary
+keys](https://github.com/ulid/spec) would yield some possibly-significant gains in performance, in
+both time and space. I'd also read accounts of regret that databases had not used ULIDs (instead of
+[UUIDs](https://en.wikipedia.org/wiki/Universally_unique_identifier#Version_4_(random))) from the
+get-go, so I decided it couldn't hurt to switch to them before I had any actual data in my DB.
+
+And that was correct: it didn't hurt performance, but it also didn't help much either. I've spent a
+bunch of time now doing comparative benchmarks between ULIDs and UUIDs, and as I explain below, the
+anticipated space savings did not materialize, and the initial speed-ups I got were merely
+augmenting what was already more than fast enough into slightly more fasterer than that. Of course,
+of course, and as always, the real treasure was the friends we made along the way etc., etc. So come
+along on a brief journey of discovery!
+
+# Bottom Line Up Front
+
+ULIDs have a slight edge over UUIDv4s when used as primary keys, but the best primany keys are
+simple integers if you can get away with it. With my final DB schema and import/benchmarking code,
+there was no difference in terms of time taken or space used when using ULIDs vs UUIDs as primary
+keys.
+
+However, with my initial database layout and import code, ULIDs resulted in about 5% less space and
+took only about 2/3rds as much time as when using UUIDs (5.7 vs 9.8 seconds). The same space and
+time results held whether or not [`without rowid`](https://www.sqlite.org/withoutrowid.html) was
+specified on table creation, which was counter to expectation, though I now understand why; I'll
+explain at the end.
+
+# It's a setup
+
+My benchmark is pretty simple: starting from an empty database, do the following things:
+
+ 1. insert 10,000 randomly chosen movies (title and year of release, from between 1965 and 2023) into
+   the database
+ 1. create 1,000 random users[^random-users]
+ 1. for each user, randomly select around 100 movies from the 10,000 available and put them on their list of
+   things to watch
+
+Only that last part is significant, and is where I got my [timing
+information](https://gitlab.com/nebkor/ww/-/blob/897fd993ceaf9c77433d44f8d68009eb466ac3aa/src/bin/import_users.rs#L47-58)
+from.
+
+The table that keeps track of what users want to watch was defined[^not-final-form] like this:
+
+``` sql
+create table if not exists witch_watch (
+  id blob not null primary key,
+  witch blob not null, -- "user"
+  watch blob not null, -- "thing to watch"
+  [...]
+  foreign key (witch) references witches (id) on delete cascade on update no action,
+  foreign key (watch) references watches (id) on delete cascade on update no action
+);
+[...]
+create index if not exists ww_witch_dex on witch_watch (witch);
+create index if not exists ww_watch_dex on witch_watch (watch);
+```
+
+The kind of queries I'm trying to optimize with those indices is "what movies does a certain user
+want to watch?" and "what users want to watch a certain movie?". The IDs are 16-byte blobs; an
+entire row in the table is less than 100 bytes.
+
+## A digression on SQLite and performance
+
+I've mentioned once or twice before that I'm using [SQLite](https://www.sqlite.org/index.html) for
+this project. Any time I need a database, my first reach is for SQLite:
+
+ * the database is a single file, along with a couple temp files that live alongside it, simplifying
+   management
+ * there's no network involved between the client and the database; a connection to the database is
+   a pointer to an object that lives in the same process as the host program; this means that read
+   queries return data back in just a [few
+   *microseconds*](https://www.youtube.com/watch?v=qPfAQY_RahA)
+ * it scales vertically extremely well; it can handle database sizes of many terabytes
+ * it's one of the most widely-installed pieces of software in the world; there's at least one
+   sqlite database on every smartphone, and there's a robust ecosystem of [useful
+   extensions](https://litestream.io/) and other bits of complimentary code freely available
+
+And, it's extremely performant. When using the [WAL journal mode](https://www.sqlite.org/wal.html)
+and the [recommended durability setting](https://www.sqlite.org/pragma.html#pragma_synchronous) for
+WAL mode, along with all other production-appropriate settings, I got almost 20,000 *writes* per
+second[^nothing is that slow]. There were multiple concurrent writers, and each write was a
+transaction that inserted about 100 rows at a time. I had [retry
+logic](https://gitlab.com/nebkor/ww/-/blob/4c44aa12b081c777c82192755ac85d1fe0f5bdca/src/bin/import_users.rs#L143-145)
+in case a transaction failed due to the DB being locked by another writer, but that never happened:
+each write was just too fast.
+
+# Over-indexing on sortability
+
+The reason I had hoped that ULIDs would help with keeping the sizes of the indexes down was the
+possibility of using [clustered
+indexes](https://www.sqlite.org/withoutrowid.html#benefits_of_without_rowid_tables). To paraphrase
+that link:
+
+> In an ordinary SQLite table, the PRIMARY KEY is really just a UNIQUE index. The key used to look
+> up records on disk is the rowid. [...]any other kind of PRIMARY KEYs, including "INT PRIMARY KEY"
+> are just unique indexes in an ordinary rowid table.
+>
+> ...
+>
+> Consider querying this table to find the number of occurrences of the word "xsync".:
+>    SELECT cnt FROM wordcount WHERE word='xsync';
+>
+> This query first has to search the index B-Tree looking for any entry that contains the matching
+> value for "word". When an entry is found in the index, the rowid is extracted and used to search
+> the main table. Then the "cnt" value is read out of the main table and returned. Hence, two
+> separate binary searches are required to fulfill the request.
+>
+> A WITHOUT ROWID table uses a different data design for the equivalent table. [in those tables],
+> there is only a single B-Tree...  Because there is only a single B-Tree, the text of the "word"
+> column is only stored once in the database. Furthermore, querying the "cnt" value for a specific
+> "word" only involves a single binary search into the main B-Tree, since the "cnt" value can be
+> retrieved directly from the record found by that first search and without the need to do a second
+> binary search on the rowid.
+>
+> Thus, in some cases, a WITHOUT ROWID table can use about half the amount of disk space and can
+> operate nearly twice as fast. Of course, in a real-world schema, there will typically be secondary
+> indices and/or UNIQUE constraints, and the situation is more complicated. But even then, there can
+> often be space and performance advantages to using WITHOUT ROWID on tables that have non-integer
+> or composite PRIMARY KEYs.
+
+<div class="caption">sorry what was that about secondary indices i didn't quite catch that</div>
+
+HALF the disk space, *and* TWICE as fast?? Yes, sign me up, please!
+
+## Sorry, the best I can do is all the disk space
+
+There are some [guidelines](https://www.sqlite.org/withoutrowid.html#when_to_use_without_rowid)
+about when to use `without rowid`:
+
+> The WITHOUT ROWID optimization is likely to be helpful for tables that have non-integer or
+> composite (multi-column) PRIMARY KEYs and that do not store large strings or BLOBs.
+>
+> [...]
+>
+> WITHOUT ROWID tables work best when individual rows are not too large. A good rule-of-thumb is
+> that the average size of a single row in a WITHOUT ROWID table should be less than about 1/20th
+> the size of a database page. That means that rows should not contain more than ... about 200 bytes
+> each for 4KiB page size.
+
+As I mentioned, each row in that table was less than 100 bytes, so comfortably within the given
+heuristic. In order to test this out, all I had to do was change the table creation statement to:
+
+``` sql
+create table if not exists witch_watch (
+  id blob not null primary key,
+  witch blob not null, -- "user"
+  watch blob not null, -- "thing to watch"
+  [...]
+  foreign key (witch) references witches (id) on delete cascade on update no action,
+  foreign key (watch) references watches (id) on delete cascade on update no action
+) without rowid;
+```
+
+So I did.
+
+Imagine my surprise when it took nearly 20% longer to run, and the total size on disk was nearly 5%
+larger. Using random UUIDs was even slower, so there's still a relative speed win for ULIDs, but it
+was still an overall loss to go without the rowid. Maybe it was time to think outside the box?
+
+## Schema husbandry
+
+I had several goals with this whole benchmarking endeavor. One, of course, was to get performance
+data on ULIDs vs. UUIDs, at the very least so that I could write about it when I publicly had said I
+would. But another, and actually-more-important goal, was to optimize the design of my database and
+software, especially as it came to size on disk (my most-potentially-scarce computing resource;
+network and CPU are not problems until you get *very* large, and you would have long ago
+bottlenecked on storage if you weren't careful).
+
+So it was Cool and Fine to take advantage of the new capabilities that ULIDs offered if those new
+capabilities resulted in better resource use. Every table in my original, UUID-based schema had had
+a `created_at` column, stored as a 64-bit signed offset from the [UNIX
+epoch](https://en.wikipedia.org/wiki/Unix_time). Because ULIDs encode their creation time, I could
+remove that column from every table that used ULIDs as their primary key. [Doing
+so](https://gitlab.com/nebkor/ww/-/commit/5782651aa691125f11a80e241f14c681dda7a7c1) dropped the
+overall DB size by 5-10% compared to UUID-based tables with a `created_at` column. This advantage
+was unique to ULIDs as opposed to UUIDv4s, and so using the latter with a schema that excludude a
+"created at" column was giving an unrealistic edge to UUIDs, but for my benchmarks, I was interested
+in isolating their effect on index sizes, so it was OK.
+
+I also realized that for the `watch_quests` table, no explicit ID needed to be added; there were
+already two `UNIQUE` constraints for each row, that would together uniquely identify that row: the
+ID of the user that wanted to watch something, and the ID of the thing they wanted to watch. Primary
+keys don't need to be a single column; when two or more columns in a table are used as a primary
+key, it's called a "composite key". You may recall from the "when should you use `without rowid`"
+section that composite keys were one such situation where it may be beneficial. Surely this would
+help!
+
+``` sql
+create table if not exists witch_watch (
+  witch blob not null,
+  watch blob not null,
+  [...]
+  primary key (witch, watch)
+) without rowid;
+```
+<div class="caption">"witch" and "watch" are still foreign keys</div>
+
+And, it did, a little. I also took a more critical eye to that table as a whole, and realized I
+could [tidy up the
+DB](https://gitlab.com/nebkor/ww/-/commit/0e016552ab6c66d5fdd82704b6277bd857c94188?view=parallel#f1043d50a0244c34e4d056fe96659145d03b549b_34_34)
+a little more, and remove one more redundant field; this helped a little bit, too.
+
+But overall, things were still looking like ULIDs had no real inherent advantage over UUIDs in the
+context of clustered indexes, given the schema I was using, when it came to disk space. For sure,
+ULIDs continued to enjoy an advantage in insertion speed, but as I tightened up my code for
+inserting these values for this benchmark, the marginal advantage there kept shrinking. Ultimately,
+this advantage completely shrank as I made the schema and code more optimal, but that's getting
+slightly ahead of things. I had to this point achieved almost the final form, but one more change
+had to be made.
+
+# At last, I've reached my final form
+
+In the course of writing this post, I had a minor epiphany, which is that the reason for the
+regressed performance when using `without rowid` was that the secondary indices needed to point to
+the entries in the table, using the primary key of the table as the target. So when there was a ULID
+or UUID primary key, the indexes looked like, eg, this:
+
+``` text
+16-byte blob -> 16-byte blob
+```
+<div class="caption">left side is a user id or watch id, and right side is the id of a row in the
+quests table</div>
+
+But, in the case that there was a `rowid` primary key in the `watch_quest` table, the index entries for,
+eg, `user` to "watch quest" would look like:
+
+``` text
+16-byte blob -> 8-byte number (rowid)
+```
+
+The astute among you may note that 8 is only half of 16, and if you recall that there are two
+secondary indexes that look like that, the total number of secondary index bytes is 64 in the
+`without rowid` case, and only 48 in the case that there is a rowid.
+
+There's also a bit of cautious wisdom about performance implications of the implementation that
+backs the `without rowid` tables:
+
+> WITHOUT ROWID tables are implemented using ordinary B-Trees with content stored on both leaves and
+> intermediate nodes. Storing content in intermediate nodes causes each intermediate node entry to
+> take up more space on the page and thus reduces the fan-out, increasing the search cost.
+
+the fan-out when using `without rowid` was about 20% lower than when using the rowids, and it seems
+like this was slowing things down.
+
+Thinking on it some more, there's really no real reason to give this table a distinct and robust
+identity for its rows; the real identity is carried by its combination of `(user, watch)` columns,
+but even then, the value of distinct identity for these rows is low. If that's the case, which it
+is, then why give it an explicit primary key at all? The program and the users don't need to worry
+about the primary key for that table. It would also eliminate an entire index (an
+automatically-generated "primary key to rowid" index), resulting in the ultimate space savings.
+
+So, [that's what I
+did](https://gitlab.com/nebkor/ww/-/commit/2c7990ff09106fa2a9ec30974bbc377b44082082):
+
+``` sql
+-- table of what people want to watch
+create table if not exists watch_quests (
+  user blob not null,
+  watch blob not null,
+  priority int, -- 1-5 how much do you want to watch it
+  public boolean not null default true,
+  watched boolean not null default false,
+  when_watched int,
+  created_at int not null default (unixepoch()),
+  last_updated int not null default (unixepoch()),
+  foreign key (user) references users (id) on delete cascade on update no action,
+  foreign key (watch) references watches (id) on delete cascade on update no action
+);
+
+create index if not exists quests_user_dex on watch_quests (user);
+create index if not exists quests_watch_dex on watch_quests (watch);
+```
+
+There's the full and final schema.
+
+In the default benchmark, with 1,000 users each saving about 100 things to watch, that schema change
+dropped the total size on disk about 25% (from 17 megabytes to 13), and the percentage of the total
+database consumed by the indexes of the `watch_quests` table went from 51% to 43% (that means the
+indexes went from being about 8.6MB to 5.6MB, 35% less than when using a composite primary key).
+
+
+``` text
+*** Indices of table WATCH_QUESTS *********************************************
+
+Percentage of total database......................  43.3%
+Average fanout.................................... 106.00
+```
+
+It also dropped the total time to insert the 100k records from &gt;6 seconds to just 5; I ran the
+benchmark multiple times and got the same results, then tried running it with 2,000 users saving 200
+movies (4x the previous benchmark), and the results held uncannily:
+
+``` text
+$ cargo run --release --bin import_users -- -d ~/movies.db -u 2000 -m 200
+[...]
+Added 398119 quests in 20.818506 seconds
+```
+<div class="caption">20k writes/second, baby</div>
+
+Just for kicks, I tried it with UUID-based IDs, and the time and space characteristics were finally
+completely indistinguishable. This pleased me; real-world perf with ULIDs would be better than with
+UUIDs with a production schema that included `created_at` columns, and UUIDs would obligate columns
+like that if you wanted to keep track of, you know, when things were created. Ironically, by moving
+to implicit integer rowid primary keys for the `watch_quests` table, I had to make sure that there
+was a `created_at` column for it. Still a win, though!
+
+## Next steps with IDs
+
+This project is supposed to be more than just a testbed for learning about databases and web
+frameworks and sortable unique identifiers; it's supposed to be an actual thing that my wife and I
+can use for ourselves and with our friends. I even made a snazzy logo!
+
+![what to watch][logo]
+
+The gods, it seems, have other plans.
+
+Namely, it bothers me that ID generation is not done inside the database itself. Aside from being a
+generally bad idea, this lead to at least one frustrating debug session where I was inserting one ID
+but reporting back another. SQLite doesn't have native support for this, but it does have good
+native support for [loading shared libraries as plugins](https://www.sqlite.org/loadext.html) in
+order to add functionality to it, and so my next step is to write one of those, and remove the ID
+generation logic from the application.
+
+Doing so would also allow me to address an underlying error in the way the application generates
+them. The [ULID spec](https://github.com/ulid/spec) contains the following note about IDs generated
+within the same millisecond:
+
+> When generating a ULID within the same millisecond, we can provide some guarantees regarding sort
+> order. Namely, if the same millisecond is detected, the random component is incremented by 1 bit
+> in the least significant bit position (with carrying).
+
+I don't do that[^sequential ids], because doing so requires a single ID factory, and I don't want to
+have to thread that through the web app backend code. On the other hand, I *do* want to have a
+single ID factory inside the database, which an extension plugin would provide.
+
+Then I'll get back to the web app.
+
+# Thanks and goodbye
+
+OK, well, here we are, at the end of yet another three-thousand-word meeting that could have been an
+email; sorry about that, and thanks for sticking with it until the end! As usual, it was hard to not
+just keep adding more commentary and footnotes and explication, and I give myself a 'C+' there, at
+best.  At least there are only four footnotes.
+
+Still, I read and watched a lot of different things in the course of doing this work. Obviously the
+SQLite project was critical, and every time I need to consult their documentation, I appreciate it
+more (aside from the software itself, of course!). Towards the end of the this work, right as I was
+starting to write this post, I discovered this [series of
+videos](https://www.youtube.com/playlist?list=PLWENznQwkAoxww-cDEfIJ-uuPDfFwbeiJ) about SQLite, from
+[Mycelial](https://github.com/mycelial), who are "a maker of local-first software development
+libraries". I'm a huge fan of [local-first software](https://www.inkandswitch.com/local-first/), and
+one of the reasons I initially chose SQLite was for its suitability for that paradigm. Thank you,
+SQLite and Mycelial!
+
+Good bye :&#41;
+
+
+----
+
+[^random-users]: I did the classic "open `/usr/share/dict/words` and randomly select a couple things
+    to stick together" method of username generation, which results in gems like
+    "Hershey_motivations84" and "italicizes_creaminesss54". This is old-skool generative content.
+
+[^not-final-form]: The original schema was defined some time ago, and it took me a while to get to
+    the point where I was actually writing code that used it. In the course of doing the benchmarks,
+    and even in the course of writing this post, I've made changes in response to things I learned
+    from the benchmarks and to things I realized by thinking more about it and reading more docs.
+
+[^nothing is that slow]: At one of my previous jobs, there was a rather important internal service,
+    written in Python and talking to a PostgreSQL backend, that would basically completely fall over
+    if more than 100 or so requests per second were made to it. Its introduction to
+    mission-criticality had pre-dated my time there, and when it had first been deployed, the
+    demands upon it had been more modest. But it was now a problem, and I and a teammate put aside
+    some time to pluck some low-hanging fruit. A colleague on a peer team, who was that team's tech
+    lead and truly a beast of a programmer, said that he thought that the reason it could handle
+    only 100 requests/second was that "Python is slow." This shocked me; Python is not that
+    slow. PostgreSQL is not that slow. Nothing is that slow, especially in an enterprise environment
+    where you're just slinging data around via API; if it's that slow, you're doing it wrong. What
+    he said haunts me to this very day. Anyway, we tweaked the slowest query in the API callchain a
+    smidge and sped it up by a few factors; we left a ton of perf on the floor there still, but
+    c'est la vie.
+
+[^sequential ids]: At one point, I was worried that because all the entries in my benchmark were
+    being created at close to 20 per millisecond, that the resulting IDs would be essentially
+    random, so I forced the IDs to be sequential. This wound up being a red herring.
+
+[logo]: ./what2watch_logo.png "what to watch logo; an eyeball filled with static, and with a red iris, looking down at you"
--- a/content/rnd/ulid_benchmarks/what2watch_logo.png
+++ b/content/rnd/ulid_benchmarks/what2watch_logo.png