checkpoint

2023-06-26 17:51:42 -07:00 · 2023-06-26 17:51:42 -07:00 · 2aeec60470
commit 2aeec60470
parent ccdb28c7fe
1 changed files with 77 additions and 11 deletions
--- a/content/rnd/a_serialized_mystery/index.md
+++ b/content/rnd/a_serialized_mystery/index.md
@@ -17,8 +17,9 @@ one](https://github.com/ulid/spec), for an in-progress [database-backed
 web-app](https://gitlab.com/nebkor/ww). The [initial
 work](https://gitlab.com/nebkor/ww/-/commit/be96100237da56313a583be6da3dc27a4371e29d#f69082f7433f159d627269b207abdaf2ad52b24c)
 didn't take very long, but debugging the [serialization and
-deserialization](https://en.wikipedia.org/wiki/Serialization) of the new IDs took another day and a half. So come with me on
-an exciting voyage of discovery, and [once again, learn from my
+deserialization](https://en.wikipedia.org/wiki/Serialization) of the new IDs took another day and a
+half, and in the end, the alleged mystery of why it wasn't working was a red herring due to my own
+stupidity. So come with me on an exciting voyage of discovery, and [once again, learn from my
 folly](@/sundries/a-thoroughly-digital-artifact/index.md)!

 # Keys, primarily
@@ -63,21 +64,77 @@ representation and efficiency!

 ## Indexes?

+And at first, that's what I did. The [external library](https://docs.rs/sqlx/latest/sqlx/) I'm using
+to interface with my database automatically writes UUIDs as a sequence of sixteen bytes, if you
+specify the type in the database[^sqlite-dataclasses] as "[blob](https://www.sqlite.org/datatype3.html)", which [I
+did](https://gitlab.com/nebkor/ww/-/commit/65a32f1f20df6c572580d796e1044bce807fd3b6#f1043d50a0244c34e4d056fe96659145d03b549b_0_5).
+
+But then I saw a [blog post](https://shopify.engineering/building-resilient-payment-systems) where
+the following tidbit was mentioned:
+
+> We prefer using an Universally Unique Lexicographically Sortable Identifier (ULID) for these
+> idempotency keys instead of a random version 4 UUID. ULIDs contain a 48-bit timestamp followed by
+> 80 bits of random data. The timestamp allows ULIDs to be sorted, which works much better with the
+> b-tree data structure databases use for indexing. In one high-throughput system at Shopify we’ve
+> seen a 50 percent decrease in INSERT statement duration by switching from UUIDv4 to ULID for
+> idempotency keys.
+
+Whoa, that sounds great! But [this YouTube
+video](https://www.youtube.com/watch?v=f53-Iw_5ucA&t=590s) tempered my expectations a bit, by
+describing the implementation-dependent reasons for that dramatic
+improvement. Still, switching from UUIDs to ULIDs couldn't *hurt*[^no-stinkin-benches], right? Plus,
+by encoding the time of creation (at least to the nearest millisecond), I could remove a "created
+at" field from every table that used them as primary keys. Which, in my case, would be all of them,
+and I'm worried less about the speed of inserts than I am about keeping total on-disk size down
+anyway.
+
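As a rough illustration of the layout described in the quote above (a sketch only, not code from this commit and not the API of any ULID library), a ULID can be treated as a single 128-bit integer with the 48-bit millisecond timestamp in the high bits and 80 random bits below it. The function names `pack_ulid` and `timestamp_ms` are hypothetical. Because the timestamp occupies the most significant bits, plain integer comparison orders IDs by creation time, and the creation time can be read back out instead of being stored in a separate column.

```rust
/// Pack a 48-bit millisecond timestamp and 80 bits of randomness into one
/// 128-bit value, timestamp first. Hypothetical helper for illustration.
fn pack_ulid(timestamp_ms: u64, random: u128) -> u128 {
    const TIMESTAMP_MASK: u128 = (1 << 48) - 1;
    const RANDOM_MASK: u128 = (1 << 80) - 1;
    ((timestamp_ms as u128 & TIMESTAMP_MASK) << 80) | (random & RANDOM_MASK)
}

/// Recover the creation time (milliseconds since the Unix epoch) from the
/// high 48 bits.
fn timestamp_ms(ulid: u128) -> u64 {
    (ulid >> 80) as u64
}

fn main() {
    // Two IDs created one millisecond apart; the earlier one deliberately
    // gets the larger random part.
    let earlier = pack_ulid(1_687_827_102_000, 0xFFFF);
    let later = pack_ulid(1_687_827_102_001, 0x0001);

    // The timestamp dominates the comparison, so the IDs sort by creation time.
    assert!(earlier < later);

    // The "created at" value is embedded in the ID itself.
    assert_eq!(timestamp_ms(later), 1_687_827_102_001);
    println!("created at {} ms since the Unix epoch", timestamp_ms(later));
}
```

Two IDs minted within the same millisecond fall back to comparing their random bits, so ordering is only guaranteed down to millisecond resolution.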
+Plus, I was familiar with the idea of using sortable IDs, from
+[KSUIDs](https://github.com/segmentio/ksuid). It's an attractive concept to me, and I'd considered
+using KSUIDs from the get-go, but discarded that for two main reasons:
+
+ - they're **FOUR WHOLE BYTES!!!** larger than UUIDs
+ - I'd have to manually implement serialization/deserialization for them anyway, since SQLx didn't
+   have native support for them
+
+In reality, neither of those is a real show-stopper; 20 vs. 16 bytes is probably not that
+significant, and I'd have to do the manual serialization stuff anyway.
+
+I was ready to do this thing.
+
+# Serial problems
+
+"Deserialization" is the act of converting a static, non-native representation of some kind of
+datatype into a dynamic, native computer programming object, so that you can do the right computer
+programming stuff to it. It can be as simple as when a program reads in a string of digit characters
+and parses it into a real number, but of course the ceiling on complexity is limitless.
+
+In my case, it was about getting those sixteen bytes out of the database and turning them into
+ULIDs. Technically, I could have let Rust [handle that for me](https://serde.rs/derive.html) by
+automatically deriving that functionality. There were a couple snags with that course, though:
+
+ - the default serialized representation of a ULID in the library I was using to provide them [is as
+26-character strings](https://docs.rs/ulid/latest/ulid/serde/index.html)
+ - you could tell it to serialize as a [128-bit
+number](https://docs.rs/ulid/latest/ulid/serde/ulid_as_u128/index.html), but that only kicked the
+problem one step down the road since SQLite can only handle up to 64-bit numbers, as previously
+discussed, so I'd still have to manually do something for them
+
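To make the second snag concrete, here is a minimal sketch, assuming the ID is held as a plain `u128`, of the manual step that remains: turning the number into a sixteen-byte sequence suitable for a blob column, and back again. This is not the post's actual implementation and does not use the SQLx or `ulid` crate APIs; `to_blob` and `from_blob` are hypothetical names, and the choice of big-endian byte order is an assumption made so that byte-wise comparison agrees with numeric order.

```rust
/// Encode a 128-bit ID as sixteen big-endian bytes (most significant first).
fn to_blob(id: u128) -> [u8; 16] {
    id.to_be_bytes()
}

/// Decode sixteen big-endian bytes back into a 128-bit ID, rejecting any
/// slice of the wrong length.
fn from_blob(blob: &[u8]) -> Result<u128, String> {
    if blob.len() != 16 {
        return Err(format!("expected 16 bytes, got {}", blob.len()));
    }
    let mut bytes = [0u8; 16];
    bytes.copy_from_slice(blob);
    Ok(u128::from_be_bytes(bytes))
}

fn main() {
    let id: u128 = (1_687_827_102_001u128 << 80) | 0xABCD;

    // Round trip: bytes out, same number back in.
    assert_eq!(from_blob(&to_blob(id)), Ok(id));

    // Big-endian bytes compare in the same order as the numbers they encode...
    assert!(to_blob(255) < to_blob(256));
    // ...while little-endian bytes do not.
    assert!(255u128.to_le_bytes() > 256u128.to_le_bytes());

    // Anything that is not exactly sixteen bytes is an error.
    assert!(from_blob(&[0u8; 15]).is_err());
}
```

The byte order matters for a sortable ID: if the stored bytes compare in a different order from the numbers they encode, the timestamp prefix no longer determines how the stored keys sort.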
+This meant going all-in on fully custom serialization and deserialization, something I'd never done
+before, but how hard could it be? (Actually, not that hard!)
+
+## Great coders steal
+
+steal the uuid serde impls from sqlx


-Imagine, if you will, that you're a computer programmer. One common trait among such creatures is a
-desire to be "efficient".
+## A puzzling failure

- - programmers like efficiency
- - databases have primary keys and keep indices
- - uuids are useful but wasteful (note: NO BENCHMARKS!)
- - ulids seem cool
+# When in trouble, be sure to change many things at once
+
+## Death to the littlendians, obviously
 - endianness
 - profit

-# First steps
-
-## A puzzling failure

 ----

@@ -88,4 +145,13 @@ reserved for version metadata.
 database I'm using, SQLite, only supports up to 64-bit primitive values, but it does support
 arbitrary-length sequences of bytes called "blobs".

+[^sqlite-dataclasses]: I'm using [SQLite](https://www.sqlite.org/index.html) for reasons that I
+plan to dive into in a different post, but "blob" is specific to SQLite. In general, you'll probably
+want to take advantage of implementation-specific features of whatever database you're using, which
+means that your table definitions won't be fully portable to a different database. This is fine and
+good, actually!
+
+[^no-stinkin-benches]: You may wonder: have I benchmarked this system with UUIDs vs. ULIDs? Ha ha,
+you must have never met a programmer before! So, no, obviously not. But that's coming in a follow-up.
+
 [thats_a_database]: ./thats_a_database.png "that's a database"