diff --git a/content/rnd/ulid_benchmarks/index.md b/content/rnd/ulid_benchmarks/index.md
new file mode 100644
index 0000000..3b93bda
--- /dev/null
+++ b/content/rnd/ulid_benchmarks/index.md
@@ -0,0 +1,398 @@
+++
title = "A One-Part Serialized Mystery, Part 2: The Benchmarks"
slug = "one-part-serialized-mystery-part-2"
date = "2023-07-15"
[taxonomies]
tags = ["software", "rnd", "proclamation", "upscm", "rust", "sqlite"]
+++

# A one-part serial mystery post-hoc prequel

I [wrote recently](/rnd/one-part-serialized-mystery) about switching the types of the primary keys in the database for an [in-progress web app](https://gitlab.com/nebkor/ww) I'm building. At that time, I'd not yet done any benchmarking, but had reason to believe that using [sortable primary keys](https://github.com/ulid/spec) would yield some possibly-significant gains in performance, in both time and space. I'd also read accounts of regret that databases had not used ULIDs (instead of [UUIDs](https://en.wikipedia.org/wiki/Universally_unique_identifier#Version_4_(random))) from the get-go, so I decided it couldn't hurt to switch to them before I had any actual data in my DB.

And that was correct: it didn't hurt performance, but it didn't help much either. I've spent a bunch of time now doing comparative benchmarks between ULIDs and UUIDs, and as I explain below, the anticipated space savings did not materialize, and the initial speed-ups I got merely turned what was already more than fast enough into something slightly more fasterer than that. Of course, of course, and as always, the real treasure was the friends we made along the way, etc., etc. So come along on a brief journey of discovery!

# Bottom Line Up Front

ULIDs have a slight edge over UUIDv4s when used as primary keys, but the best primary keys are simple integers if you can get away with it. With my final DB schema and import/benchmarking code, there was no difference in terms of time taken or space used when using ULIDs vs UUIDs as primary keys.

However, with my initial database layout and import code, ULIDs resulted in about 5% less space and took only about 2/3rds as much time as when using UUIDs (5.7 vs 9.8 seconds). The same space and time results held whether or not [`without rowid`](https://www.sqlite.org/withoutrowid.html) was specified on table creation, which was counter to expectation, though I now understand why; I'll explain at the end.

# It's a setup

My benchmark is pretty simple: starting from an empty database, do the following things:

 1. insert 10,000 randomly chosen movies (title and year of release, from between 1965 and 2023) into the database
 1. create 1,000 random users[^random-users]
 1. for each user, randomly select around 100 movies from the 10,000 available and put them on their list of things to watch

Only that last part is significant, and it's where I got my [timing information](https://gitlab.com/nebkor/ww/-/blob/897fd993ceaf9c77433d44f8d68009eb466ac3aa/src/bin/import_users.rs#L47-58) from.

The table that keeps track of what users want to watch was defined[^not-final-form] like this:

``` sql
create table if not exists witch_watch (
    id blob not null primary key,
    witch blob not null, -- "user"
    watch blob not null, -- "thing to watch"
    [...]
    foreign key (witch) references witches (id) on delete cascade on update no action,
    foreign key (watch) references watches (id) on delete cascade on update no action
);
[...]
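
-- these two indices serve the two lookups described just below:
-- everything a given user wants to watch, and every user who wants to watch a given thing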
create index if not exists ww_witch_dex on witch_watch (witch);
create index if not exists ww_watch_dex on witch_watch (watch);
```

The kinds of queries I'm trying to optimize with those indices are "what movies does a certain user want to watch?" and "what users want to watch a certain movie?". The IDs are 16-byte blobs; an entire row in the table is less than 100 bytes.

## A digression on SQLite and performance

I've mentioned once or twice before that I'm using [SQLite](https://www.sqlite.org/index.html) for this project. Any time I need a database, my first reach is for SQLite:

 * the database is a single file, along with a couple temp files that live alongside it, simplifying management
 * there's no network involved between the client and the database; a connection to the database is a pointer to an object that lives in the same process as the host program; this means that read queries return data back in just a [few *microseconds*](https://www.youtube.com/watch?v=qPfAQY_RahA)
 * it scales vertically extremely well; it can handle database sizes of many terabytes
 * it's one of the most widely-installed pieces of software in the world; there's at least one SQLite database on every smartphone, and there's a robust ecosystem of [useful extensions](https://litestream.io/) and other bits of complementary code freely available

And, it's extremely performant. When using the [WAL journal mode](https://www.sqlite.org/wal.html) and the [recommended durability setting](https://www.sqlite.org/pragma.html#pragma_synchronous) for WAL mode, along with all other production-appropriate settings, I got almost 20,000 *writes* per second[^nothing is that slow]. There were multiple concurrent writers, and each write was a transaction that inserted about 100 rows at a time. I had [retry logic](https://gitlab.com/nebkor/ww/-/blob/4c44aa12b081c777c82192755ac85d1fe0f5bdca/src/bin/import_users.rs#L143-145) in case a transaction failed due to the DB being locked by another writer, but that never happened: each write was just too fast.

# Over-indexing on sortability

The reason I had hoped that ULIDs would help with keeping the sizes of the indexes down was the possibility of using [clustered indexes](https://www.sqlite.org/withoutrowid.html#benefits_of_without_rowid_tables). To paraphrase that link:

> In an ordinary SQLite table, the PRIMARY KEY is really just a UNIQUE index. The key used to look up records on disk is the rowid. [...] any other kind of PRIMARY KEYs, including "INT PRIMARY KEY" are just unique indexes in an ordinary rowid table.
>
> ...
>
> Consider querying this table to find the number of occurrences of the word "xsync":
> SELECT cnt FROM wordcount WHERE word='xsync';
>
> This query first has to search the index B-Tree looking for any entry that contains the matching value for "word". When an entry is found in the index, the rowid is extracted and used to search the main table. Then the "cnt" value is read out of the main table and returned. Hence, two separate binary searches are required to fulfill the request.
>
> A WITHOUT ROWID table uses a different data design for the equivalent table. [in those tables], there is only a single B-Tree... Because there is only a single B-Tree, the text of the "word" column is only stored once in the database.
> Furthermore, querying the "cnt" value for a specific "word" only involves a single binary search into the main B-Tree, since the "cnt" value can be retrieved directly from the record found by that first search and without the need to do a second binary search on the rowid.
>
> Thus, in some cases, a WITHOUT ROWID table can use about half the amount of disk space and can operate nearly twice as fast. Of course, in a real-world schema, there will typically be secondary indices and/or UNIQUE constraints, and the situation is more complicated. But even then, there can often be space and performance advantages to using WITHOUT ROWID on tables that have non-integer or composite PRIMARY KEYs.

sorry what was that about secondary indices i didn't quite catch that
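
To make those two layouts concrete, here's a rough sketch of the docs' `wordcount` example as actual DDL; the `wordcount` name and its `word`/`cnt` columns come from the quoted page, and I've split it into two hypothetically-named tables so the ordinary and clustered flavors can sit side by side:

``` sql
-- names made up for this sketch
-- ordinary rowid table: "word" is really a separate unique index that maps to a hidden rowid,
-- so the lookup below walks the index B-tree, grabs the rowid, then walks the table B-tree
create table if not exists wordcount_ordinary (
    word text primary key,
    cnt int
);

-- clustered layout: the whole row lives in the primary-key B-tree, so the same lookup
-- is a single binary search
create table if not exists wordcount_clustered (
    word text primary key,
    cnt int
) without rowid;

select cnt from wordcount_ordinary where word = 'xsync';
select cnt from wordcount_clustered where word = 'xsync';
```
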
HALF the disk space, *and* TWICE as fast?? Yes, sign me up, please!

## Sorry, the best I can do is all the disk space

There are some [guidelines](https://www.sqlite.org/withoutrowid.html#when_to_use_without_rowid) about when to use `without rowid`:

> The WITHOUT ROWID optimization is likely to be helpful for tables that have non-integer or composite (multi-column) PRIMARY KEYs and that do not store large strings or BLOBs.
>
> [...]
>
> WITHOUT ROWID tables work best when individual rows are not too large. A good rule-of-thumb is that the average size of a single row in a WITHOUT ROWID table should be less than about 1/20th the size of a database page. That means that rows should not contain more than ... about 200 bytes each for 4KiB page size.

As I mentioned, each row in that table was less than 100 bytes, so comfortably within the given heuristic. In order to test this out, all I had to do was change the table creation statement to:

``` sql
create table if not exists witch_watch (
    id blob not null primary key,
    witch blob not null, -- "user"
    watch blob not null, -- "thing to watch"
    [...]
    foreign key (witch) references witches (id) on delete cascade on update no action,
    foreign key (watch) references watches (id) on delete cascade on update no action
) without rowid;
```

So I did.

Imagine my surprise when it took nearly 20% longer to run, and the total size on disk was nearly 5% larger. Using random UUIDs was even slower, so there's still a relative speed win for ULIDs, but it was still an overall loss to go without the rowid. Maybe it was time to think outside the box?

## Schema husbandry

I had several goals with this whole benchmarking endeavor. One, of course, was to get performance data on ULIDs vs. UUIDs, at the very least so that I could write about it when I had publicly said I would. But another, and actually more important, goal was to optimize the design of my database and software, especially when it came to size on disk (the computing resource most likely to become scarce; network and CPU are not problems until you get *very* large, and by then you'd have long since bottlenecked on storage if you weren't careful).

So it was Cool and Fine to take advantage of the new capabilities that ULIDs offered if those new capabilities resulted in better resource use. Every table in my original, UUID-based schema had had a `created_at` column, stored as a 64-bit signed offset from the [UNIX epoch](https://en.wikipedia.org/wiki/Unix_time). Because ULIDs encode their creation time, I could remove that column from every table that used ULIDs as their primary key. [Doing so](https://gitlab.com/nebkor/ww/-/commit/5782651aa691125f11a80e241f14c681dda7a7c1) dropped the overall DB size by 5-10% compared to UUID-based tables with a `created_at` column. This advantage was unique to ULIDs as opposed to UUIDv4s, and so using the latter with a schema that excluded a "created at" column was giving an unrealistic edge to UUIDs, but for my benchmarks, I was interested in isolating their effect on index sizes, so it was OK.

I also realized that for the `watch_quests` table, no explicit ID needed to be added; there were already two `UNIQUE` constraints for each row that would together uniquely identify that row: the ID of the user that wanted to watch something, and the ID of the thing they wanted to watch.
Primary keys don't need to be a single column; when two or more columns in a table are used as a primary key, it's called a "composite key". You may recall from the "when should you use `without rowid`" section that composite keys are one of the situations where it can be beneficial. Surely this would help!

``` sql
create table if not exists witch_watch (
    witch blob not null,
    watch blob not null,
    [...]
    primary key (witch, watch)
) without rowid;
```

"witch" and "watch" are still foreign keys
And the composite key did help, a little. I also took a more critical eye to that table as a whole, and realized I could [tidy up the DB](https://gitlab.com/nebkor/ww/-/commit/0e016552ab6c66d5fdd82704b6277bd857c94188?view=parallel#f1043d50a0244c34e4d056fe96659145d03b549b_34_34) a little more, and remove one more redundant field; this helped a little bit, too.

But overall, things were still looking like ULIDs had no real inherent advantage over UUIDs in the context of clustered indexes, given the schema I was using, when it came to disk space. For sure, ULIDs continued to enjoy an advantage in insertion speed, but as I tightened up my code for inserting these values for this benchmark, the marginal advantage there kept shrinking. Ultimately, that advantage shrank to nothing as I made the schema and code more optimal, but that's getting slightly ahead of things. At this point I had almost reached the final form, but one more change had to be made.

# At last, I've reached my final form

In the course of writing this post, I had a minor epiphany, which is that the reason for the regressed performance when using `without rowid` was that the secondary indices needed to point to the entries in the table, using the primary key of the table as the target. So when there was a ULID or UUID primary key, the indexes looked like, e.g., this:

``` text
16-byte blob -> 16-byte blob
```

left side is a user id or watch id, and right side is the id of a row in the quests table

But, in the case that there was a `rowid` primary key in the `watch_quests` table, the index entries for, e.g., `user` to "watch quest" would look like:

``` text
16-byte blob -> 8-byte number (rowid)
```

The astute among you may note that 8 is only half of 16, and if you recall that there are two secondary indexes that look like that, the total number of secondary index bytes per row is 64 in the `without rowid` case, and only 48 in the case that there is a rowid.

There's also a bit of cautious wisdom about the performance implications of the implementation that backs the `without rowid` tables:

> WITHOUT ROWID tables are implemented using ordinary B-Trees with content stored on both leaves and intermediate nodes. Storing content in intermediate nodes causes each intermediate node entry to take up more space on the page and thus reduces the fan-out, increasing the search cost.

The fan-out when using `without rowid` was about 20% lower than when using the rowids, and it seems like this was slowing things down.

Thinking on it some more, there's no real reason to give this table a distinct and robust identity for its rows; the real identity is carried by its combination of `(user, watch)` columns, but even then, the value of a distinct identity for these rows is low. If that's the case, which it is, then why give it an explicit primary key at all? The program and the users don't need to worry about the primary key for that table. Dropping it would also eliminate an entire index (the automatically-generated "primary key to rowid" index), resulting in the ultimate space savings.

So, [that's what I did](https://gitlab.com/nebkor/ww/-/commit/2c7990ff09106fa2a9ec30974bbc377b44082082):

``` sql
-- table of what people want to watch
create table if not exists watch_quests (
    user blob not null,
    watch blob not null,
    priority int, -- 1-5 how much do you want to watch it
    public boolean not null default true,
    watched boolean not null default false,
    when_watched int,
    created_at int not null default (unixepoch()),
    last_updated int not null default (unixepoch()),
    foreign key (user) references users (id) on delete cascade on update no action,
    foreign key (watch) references watches (id) on delete cascade on update no action
);

create index if not exists quests_user_dex on watch_quests (user);
create index if not exists quests_watch_dex on watch_quests (watch);
```

There's the full and final schema.

In the default benchmark, with 1,000 users each saving about 100 things to watch, that schema change dropped the total size on disk about 25% (from 17 megabytes to 13), and the percentage of the total database consumed by the indexes of the `watch_quests` table went from 51% to 43% (that means the indexes went from being about 8.6MB to 5.6MB, 35% less than when using a composite primary key).

``` text
*** Indices of table WATCH_QUESTS *********************************************

Percentage of total database...................... 43.3%
Average fanout.................................... 106.00
```

It also dropped the total time to insert the 100k records from >6 seconds to just 5; I ran the benchmark multiple times and got the same results, then tried running it with 2,000 users saving 200 movies (4x the previous benchmark), and the results held uncannily well:

``` text
$ cargo run --release --bin import_users -- -d ~/movies.db -u 2000 -m 200
[...]
Added 398119 quests in 20.818506 seconds
```

20k writes/second, baby
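
A measurement aside: the stats block above is the kind of thing `sqlite3_analyzer` prints, but if you'd rather poke at sizes from inside SQL, the `dbstat` virtual table gives a rough per-table, per-index equivalent, assuming your SQLite build was compiled with `SQLITE_ENABLE_DBSTAT_VTAB`:

``` sql
-- approximate on-disk size of every table and index B-tree, largest first
select name, count(*) as pages, sum(pgsize) as bytes
from dbstat
group by name
order by bytes desc;
```

Summing the two `quests_*_dex` rows and dividing by the total gives roughly the same percentage the analyzer reports.
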
Just for kicks, I tried it with UUID-based IDs, and the time and space characteristics were finally completely indistinguishable. This pleased me; real-world perf with ULIDs would be better than with UUIDs in a production schema that included `created_at` columns, and UUIDs would obligate columns like that if you wanted to keep track of, you know, when things were created. Ironically, by moving to implicit integer rowid primary keys for the `watch_quests` table, I had to make sure that there was a `created_at` column for it. Still a win, though!

## Next steps with IDs

This project is supposed to be more than just a testbed for learning about databases and web frameworks and sortable unique identifiers; it's supposed to be an actual thing that my wife and I can use for ourselves and with our friends. I even made a snazzy logo!

![what to watch][logo]

The gods, it seems, have other plans.

Namely, it bothers me that ID generation is not done inside the database itself. Aside from being a generally bad idea, this led to at least one frustrating debug session where I was inserting one ID but reporting back another. SQLite doesn't have native support for this, but it does have good native support for [loading shared libraries as plugins](https://www.sqlite.org/loadext.html) in order to add functionality to it, and so my next step is to write one of those, and remove the ID generation logic from the application.

Doing so would also allow me to address an underlying error in the way the application generates IDs. The [ULID spec](https://github.com/ulid/spec) contains the following note about IDs generated within the same millisecond:

> When generating a ULID within the same millisecond, we can provide some guarantees regarding sort order. Namely, if the same millisecond is detected, the random component is incremented by 1 bit in the least significant bit position (with carrying).

I don't do that[^sequential ids], because doing so requires a single ID factory, and I don't want to have to thread that through the web app backend code. On the other hand, I *do* want to have a single ID factory inside the database, which an extension plugin would provide.

Then I'll get back to the web app.

# Thanks and goodbye

OK, well, here we are, at the end of yet another three-thousand-word meeting that could have been an email; sorry about that, and thanks for sticking with it until the end! As usual, it was hard to not just keep adding more commentary and footnotes and explication, and I give myself a 'C+' there, at best. At least there are only four footnotes.

Still, I read and watched a lot of different things in the course of doing this work. Obviously the SQLite project was critical, and every time I need to consult their documentation, I appreciate it more (aside from the software itself, of course!). Towards the end of this work, right as I was starting to write this post, I discovered this [series of videos](https://www.youtube.com/playlist?list=PLWENznQwkAoxww-cDEfIJ-uuPDfFwbeiJ) about SQLite, from [Mycelial](https://github.com/mycelial), who are "a maker of local-first software development libraries". I'm a huge fan of [local-first software](https://www.inkandswitch.com/local-first/), and one of the reasons I initially chose SQLite was for its suitability for that paradigm. Thank you, SQLite and Mycelial!

Good bye :)

----

[^random-users]: I did the classic "open `/usr/share/dict/words` and randomly select a couple things to stick together" method of username generation, which results in gems like "Hershey_motivations84" and "italicizes_creaminesss54". This is old-skool generative content.

[^not-final-form]: The original schema was defined some time ago, and it took me a while to get to the point where I was actually writing code that used it. In the course of doing the benchmarks, and even in the course of writing this post, I've made changes in response to things I learned from the benchmarks and to things I realized by thinking more about it and reading more docs.

[^nothing is that slow]: At one of my previous jobs, there was a rather important internal service, written in Python and talking to a PostgreSQL backend, that would basically completely fall over if more than 100 or so requests per second were made to it. Its introduction to mission-criticality had pre-dated my time there, and when it had first been deployed, the demands upon it had been more modest. But it was now a problem, and a teammate and I put aside some time to pluck some low-hanging fruit. A colleague on a peer team, who was that team's tech lead and truly a beast of a programmer, said that he thought that the reason it could handle only 100 requests/second was that "Python is slow." This shocked me; Python is not that slow. PostgreSQL is not that slow. Nothing is that slow, especially in an enterprise environment where you're just slinging data around via API; if it's that slow, you're doing it wrong. What he said haunts me to this very day. Anyway, we tweaked the slowest query in the API callchain a smidge and sped it up by a few factors; we left a ton of perf on the floor there still, but c'est la vie.

[^sequential ids]: At one point, I was worried that, because all the entries in my benchmark were being created at close to 20 per millisecond, the resulting IDs would be essentially random, so I forced the IDs to be sequential. This wound up being a red herring.

[logo]: ./what2watch_logo.png "what to watch logo; an eyeball filled with static, and with a red iris, looking down at you"

diff --git a/content/rnd/ulid_benchmarks/what2watch_logo.png b/content/rnd/ulid_benchmarks/what2watch_logo.png
new file mode 100644
index 0000000..f697339
Binary files /dev/null and b/content/rnd/ulid_benchmarks/what2watch_logo.png differ