checkpoint

2023-06-26 17:51:42 -07:00 · 2023-06-26 17:51:42 -07:00 · 2aeec60470
commit 2aeec60470
parent ccdb28c7fe
1 changed files with 77 additions and 11 deletions
--- a/content/rnd/a_serialized_mystery/index.md
+++ b/content/rnd/a_serialized_mystery/index.md
@@ -17,8 +17,9 @@ one](https://github.com/ulid/spec), for an in-progress [database-backed
 web-app](https://gitlab.com/nebkor/ww). The [initial
 work](https://gitlab.com/nebkor/ww/-/commit/be96100237da56313a583be6da3dc27a4371e29d#f69082f7433f159d627269b207abdaf2ad52b24c)
 didn't take very long, but debugging the [serialization and
-deserialization](https://en.wikipedia.org/wiki/Serialization) of the new IDs took another day and a half. So come with me on
+deserialization](https://en.wikipedia.org/wiki/Serialization) of the new IDs took another day and a
-an exciting voyage of discovery, and [once again, learn from my
+half, and in the end, the alleged mystery of why it wasn't working was a red herring due to my own
 stupidity. So come with me on an exciting voyage of discovery, and [once again, learn from my
 folly](@/sundries/a-thoroughly-digital-artifact/index.md)!
 # Keys, primarily
@@ -63,21 +64,77 @@ representation and efficiency!
 ## Indexes?
 And at first, that's what I did. The [external library](https://docs.rs/sqlx/latest/sqlx/) I'm using
 to interface with my database automatically writes UUIDs as a sequence of sixteen bytes, if you
 specified the type in the database[^sqlite-dataclasses] as "[blob](https://www.sqlite.org/datatype3.html)", which [I
 did](https://gitlab.com/nebkor/ww/-/commit/65a32f1f20df6c572580d796e1044bce807fd3b6#f1043d50a0244c34e4d056fe96659145d03b549b_0_5).
 But then I saw a [blog post](https://shopify.engineering/building-resilient-payment-systems) where
 the following tidbit was mentioned:
 > We prefer using an Universally Unique Lexicographically Sortable Identifier (ULID) for these
 > idempotency keys instead of a random version 4 UUID. ULIDs contain a 48-bit timestamp followed by
 > 80 bits of random data. The timestamp allows ULIDs to be sorted, which works much better with the
 > b-tree data structure databases use for indexing. In one high-throughput system at Shopify we’ve
 > seen a 50 percent decrease in INSERT statement duration by switching from UUIDv4 to ULID for
 > idempotency keys.
 Whoa, that sounds great! But [this youtube
 video](https://www.youtube.com/watch?v=f53-Iw_5ucA&t=590s) tempered my expectations a bit, by
 describing the implementation-dependent reasons for that dramatic
 improvement. Still, switching from UUIDs to ULIDs couldn't *hurt*[^no-stinkin-benches], right? Plus,
 by encoding the time of creation (at least to the nearest millisecond), I could remove a "created
 at" field from every table that used them as primary keys. Which, in my case, would be all of them,
 and I'm worried less about the speed of inserts than I am about keeping total on-disk size down
 anyway.
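To make the "created at" idea concrete, here is a minimal sketch of reading the creation time back out of an ID. It assumes the ID is stored as sixteen big-endian bytes with the 48-bit millisecond timestamp in the most significant bits, as the ULID spec lays it out. The function names are hypothetical and are not taken from the `ulid` crate or from the project; only the Rust standard library is used.

```rust
use std::time::{Duration, SystemTime, UNIX_EPOCH};

/// Hypothetical helper: pull the 48-bit millisecond timestamp out of a
/// ULID stored as sixteen big-endian bytes.
pub fn ulid_timestamp_ms(bytes: &[u8; 16]) -> u64 {
    let value = u128::from_be_bytes(*bytes);
    // The top 48 bits are the timestamp; the low 80 bits are random data.
    (value >> 80) as u64
}

/// Hypothetical helper: the same timestamp expressed as a `SystemTime`.
pub fn ulid_created_at(bytes: &[u8; 16]) -> SystemTime {
    UNIX_EPOCH + Duration::from_millis(ulid_timestamp_ms(bytes))
}
```

With something like this, a query that needs a creation time could derive it from the primary key instead of reading a separate column.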
 Plus, I was familiar with the idea of using sortable IDs, from
 [KSUIDs](https://github.com/segmentio/ksuid). It's an attractive concept to me, and I'd considered
 using KSUIDs from the get-go, but discarded that for two main reasons:
 - they're **FOUR WHOLE BYTES!!!** larger than UUIDs
 - I'd have to manually implement serialization/deserialization for them anyway, since SQLx didn't
   have native support for them
 In reality, neither of those is a real show-stopper; 20 vs. 16 bytes is probably not that
 significant, and I'd have to do the manual serialization stuff anyway.
 I was ready to do this thing.
 # Serial problems
 "Deserialization" is the act of converting a static, non-native representation of some kind of
 datatype into a dynamic, native computer programming object, so that you can do the right computer
 programming stuff to it. It can be as simple as when a program reads in a string of digit characters
 and parses it into a real number, but of course the ceiling on complexity is limitless.
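The simple end of that spectrum can be sketched in a few lines of Rust. This example uses an unsigned integer rather than a real number to keep it short, and the function name is made up for illustration:

```rust
use std::num::ParseIntError;

/// Deserialize a string of ASCII digits into a native integer.
/// Anything that is not a valid number comes back as an error
/// instead of a value.
pub fn parse_count(text: &str) -> Result<u64, ParseIntError> {
    text.trim().parse::<u64>()
}
```

The static representation is the text; the native object is the `u64` you can actually do arithmetic on.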
 In my case, it was about getting those sixteen bytes out of the database and turning them into
 ULIDs. Technically, I could have let Rust [handle that for me](https://serde.rs/derive.html) by
 automatically deriving that functionality. There were a couple snags with that course, though:
 - the default serialized representation of a ULID in the library I was using to provide them [is as
 26-character strings](https://docs.rs/ulid/latest/ulid/serde/index.html)
 - you could tell it to serialize as a [128-bit
 number](https://docs.rs/ulid/latest/ulid/serde/ulid_as_u128/index.html), but that only kicked the
 problem one step down the road since SQLite can only handle up to 64-bit numbers, as previously
 discussed, so I'd still have to manually do something for them
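As a rough sketch of what "manually doing something" could look like, assuming the ID is treated as a plain 128-bit number and stored as a sixteen-byte blob: the helper names below are hypothetical, and this is not the SQLx or `ulid` API, just standard-library byte conversion.

```rust
use std::convert::TryInto;

/// Hypothetical helper: turn a 128-bit ID into the sixteen bytes that
/// would be written to a blob column. Big-endian keeps the timestamp
/// bits first, so comparing the bytes left to right orders IDs the same
/// way as comparing the numbers.
pub fn id_to_blob(id: u128) -> [u8; 16] {
    id.to_be_bytes()
}

/// Hypothetical helper: turn bytes read from the database back into an
/// ID. Returns `None` if the blob is not exactly sixteen bytes long.
pub fn blob_to_id(blob: &[u8]) -> Option<u128> {
    let bytes: [u8; 16] = blob.try_into().ok()?;
    Some(u128::from_be_bytes(bytes))
}
```

Note that the byte order matters here: with little-endian bytes, the blob for 1 would sort after the blob for 256, which would throw away the sortability that made ULIDs attractive in the first place.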
 This meant going all-in on fully custom serialization and deserialization, something I'd never done
 before, but how hard could it be? (actually not that hard!)
 ## Great coders steal
 steal the uuid serde impls from sqlx
-Imagine, if you will, that you're a computer programmer. One common trait among such creatures is a
+## A puzzling failure
 desire to be "efficient".
- - programmers like efficiency
+# When in trouble, be sure to change many things at once
- - databases have primary keys and keep indices
+
- - uuids are useful but wasteful (note: NO BENCHMARKS!)
+## Death to the littlendians, obviously
 - ulids seem cool
 - endianness
 - profit
 # First steps
 ## A puzzling failure
 ----
@@ -88,4 +145,13 @@ reserved for version metadata.
 database I'm using, SQLite, only supports up to 64-bit primitive values, but it does support
 arbitrary-length sequences of bytes called "blobs".
 [^sqlite-dataclasses]: I'm using [SQLite](https://www.sqlite.org/index.html) for reasons that I
 plan to dive into in a different post, but "blob" is specific to SQLite. In general, you'll probably
 want to take advantage of implementation-specific features of whatever database you're using, which
 means that your table definitions won't be fully portable to a different database. This is fine and
 good, actually!
 [^no-stinkin-benches]: You may wonder: have I benchmarked this system with UUIDs vs. ULIDs? Ha ha,
 you must have never met a programmer before! So, no, obviously not. But that's coming in a follow-up.
 [thats_a_database]: ./thats_a_database.png "that's a database"