+++
title = "A One-Part Serialized Mystery"
slug = "one-part-serialized-mystery"
date = "2023-06-22"
updated = "2023-06-22"
[taxonomies]
tags = ["software", "rnd", "proclamation", "upscm"]
[extra]
toc = false
+++

# *Mise en Scene*

I recently spent a couple days moving from [one type of universally unique
identifier](https://commons.apache.org/sandbox/commons-id/uuid.html) to a [different
one](https://github.com/ulid/spec), for an in-progress [database-backed
web-app](https://gitlab.com/nebkor/ww). The [initial
work](https://gitlab.com/nebkor/ww/-/commit/be96100237da56313a583be6da3dc27a4371e29d#f69082f7433f159d627269b207abdaf2ad52b24c)
didn't take very long, but debugging the [serialization and
deserialization](https://en.wikipedia.org/wiki/Serialization) of the new IDs took another day and a half. So come with me on
an exciting voyage of discovery, and [once again, learn from my
folly](@/sundries/a-thoroughly-digital-artifact/index.md)!

# Keys, primarily

Most large distributed programs that people interact with daily via HTTP are, in essence, a fancy
facade for some kind of database. Facebook? Database. Gmail? Database.

![that's a database][thats_a_database]
<center><span class="caption">wikipedia? that's a database.</span></center>

In most databases, each entry ("row") has a field that acts as a [primary
key](https://en.wikipedia.org/wiki/Primary_key), used to uniquely identify that row inside the table
it's in. Since databases typically contain multiple tables, and primary keys have to be unique only
within their own table, you could just use a simple integer that's automatically incremented every
time you add a new record. In fact, in many databases, if you create a table without specifying a
primary key, they will [automatically and implicitly use a
mechanism](https://www.sqlite.org/lang_createtable.html#rowid) like that.

This is often totally fine! If you only ever have one copy of the database, and never have to worry
about inserting rows from a different instance of the database, then you can just use those simple
values and move on your merry way.

However, if you ever think you might want to have multiple instances of your database running, and
want to make sure they're eventually consistent with each other, then you might want to use a
fancier identifier for your primary keys, so that two instances can't independently hand out the
same key to different rows.
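
Here is a minimal Rust sketch (std only) that makes the collision concrete. In it, two independent copies of a table each hand out auto-incremented integer keys, the way an implicit rowid would. The `Table` type and the `colliding_keys` helper are hypothetical stand-ins, not part of any real database API.

```rust
// A hypothetical in-memory stand-in for a table whose primary key is an
// auto-incrementing integer, like SQLite's implicit rowid.
struct Table {
    next_id: u64,
    rows: Vec<(u64, String)>,
}

impl Table {
    fn new() -> Self {
        Table { next_id: 1, rows: Vec::new() }
    }

    // Hand out the current counter value as the key, then bump the counter.
    fn insert(&mut self, value: &str) -> u64 {
        let id = self.next_id;
        self.next_id += 1;
        self.rows.push((id, value.to_string()));
        id
    }
}

// Return every key in `a` that also appears in `b`; merging the two tables
// naively would put two different rows under each of these keys.
fn colliding_keys(a: &Table, b: &Table) -> Vec<u64> {
    a.rows
        .iter()
        .map(|(id, _)| *id)
        .filter(|id| b.rows.iter().any(|(other, _)| other == id))
        .collect()
}

fn main() {
    let mut east = Table::new();
    let mut west = Table::new();
    east.insert("alice"); // key 1
    east.insert("bob"); // key 2
    west.insert("carol"); // also key 1!
    println!("{:?}", colliding_keys(&east, &west)); // prints [1]
}
```

Each replica's counter is perfectly consistent on its own. The trouble only shows up when rows from both have to live in the same table.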

## UUIDs

One very common type for these is called a
[UUIDv4](https://datatracker.ietf.org/doc/html/rfc4122#page-14). These are 128-bit random
numbers[^uuidv4_random], and when turned into a string, usually look something like
`1c20104f-e04f-409e-9ad3-94455e5f4fea`; this is called the "hyphenated" form, for fairly obvious
reasons. Although sometimes they're stored in a DB in that form directly, that's using 36 bytes to
store 16 bytes' worth of data, which is more than twice as many bytes as necessary. And if you're
a programmer, this sort of conspicuous waste is unconscionable.
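
The 36-versus-16 arithmetic is easier to see in a small std-only Rust sketch that turns 16 bytes into a hyphenated v4 UUID string. The sketch assumes the bytes already came from a good random source; a fixed array stands in for one here. It sets the four version bits and two variant bits that RFC 4122 reserves, which are the six non-random bits mentioned in the footnote. `uuid_v4_hyphenated` is a hypothetical name, not a function from any UUID library.

```rust
// Turn 16 bytes (assumed to come from a good random source) into a
// version-4 UUID in the "hyphenated" text form.
fn uuid_v4_hyphenated(mut bytes: [u8; 16]) -> String {
    // The high nibble of byte 6 holds the version: 0100 means "version 4".
    bytes[6] = (bytes[6] & 0x0f) | 0x40;
    // The top two bits of byte 8 hold the variant: 10 means RFC 4122.
    bytes[8] = (bytes[8] & 0x3f) | 0x80;

    // Two hex digits per byte: 32 characters for 16 bytes...
    let hex: String = bytes.iter().map(|b| format!("{:02x}", b)).collect();
    // ...plus four dashes in an 8-4-4-4-12 layout makes 36.
    format!(
        "{}-{}-{}-{}-{}",
        &hex[0..8],
        &hex[8..12],
        &hex[12..16],
        &hex[16..20],
        &hex[20..32]
    )
}

fn main() {
    // Fixed bytes stand in for randomness so the output is reproducible.
    let id = uuid_v4_hyphenated([0xffu8; 16]);
    println!("{}", id); // ffffffff-ffff-4fff-bfff-ffffffffffff
    println!("{} bytes of text for 16 bytes of data", id.len());
}
```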

You can cut that to 32 bytes by just dropping the dashes, but then that's still twice as many bytes
as the actual data requires. If you never have to actually display the ID inside the database, then
the simplest thing to do is just store it as a blob of 16 bytes[^blob-of-bytes]. Finally, optimal
representation and efficiency!
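
Going the other way, from the text form down to the 16-byte blob, might look something like the minimal Rust sketch below. It assumes the input is either the hyphenated form or the dash-free 32-digit form. `uuid_to_bytes` is a hypothetical helper; a real project would more likely lean on a UUID library's own parser.

```rust
// Parse a hyphenated (or plain 32-digit) UUID string into the 16 raw bytes
// you could store as a blob. Returns None on malformed input.
fn uuid_to_bytes(s: &str) -> Option<[u8; 16]> {
    // Dropping the dashes takes us from 36 characters to 32...
    let hex: String = s.chars().filter(|c| *c != '-').collect();
    // Checking for ASCII hex digits first also keeps the slicing below safe.
    if hex.len() != 32 || !hex.chars().all(|c| c.is_ascii_hexdigit()) {
        return None;
    }
    // ...and decoding each pair of hex digits halves it again, to 16 bytes.
    let mut out = [0u8; 16];
    for (i, byte) in out.iter_mut().enumerate() {
        *byte = u8::from_str_radix(&hex[2 * i..2 * i + 2], 16).ok()?;
    }
    Some(out)
}

fn main() {
    let text = "1c20104f-e04f-409e-9ad3-94455e5f4fea";
    let raw = uuid_to_bytes(text).expect("valid UUID");
    println!("{} text bytes -> {} raw bytes", text.len(), raw.len());
}
```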

## Indexes?

Imagine, if you will, that you're a computer programmer. One common trait among such creatures is a
desire to be "efficient".

- programmers like efficiency
- databases have primary keys and keep indices
- uuids are useful but wasteful (note: NO BENCHMARKS!)
- ulids seem cool
- endianness
- profit
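
The "ulids seem cool" and "endianness" bullets can be illustrated with a std-only Rust sketch. It follows the layout described in the ULID spec linked above: a 48-bit millisecond timestamp followed by 80 random bits, written most-significant byte first and rendered as 26 Crockford base32 characters. It also shows why byte order matters when a 128-bit ID is stored as a blob that the database compares byte by byte; SQLite, for one, compares blobs with `memcmp()`. The function names are hypothetical, and this is not the post's actual code.

```rust
// Crockford's base32 alphabet, as used by the ULID spec (no I, L, O, or U).
const CROCKFORD: &[u8; 32] = b"0123456789ABCDEFGHJKMNPQRSTVWXYZ";

// Pack a ULID into a u128: a 48-bit millisecond timestamp in the high bits,
// followed by 80 bits of randomness. Anything that doesn't fit is masked off.
fn ulid_from_parts(millis: u64, random: u128) -> u128 {
    let ts = (millis as u128) & ((1u128 << 48) - 1);
    let rnd = random & ((1u128 << 80) - 1);
    (ts << 80) | rnd
}

// Render the 26-character text form: 5 bits per character, most significant
// first, so the leading character only carries the top 3 bits.
fn to_crockford(id: u128) -> String {
    (0..26u32)
        .rev()
        .map(|i| CROCKFORD[((id >> (5 * i)) & 0x1f) as usize] as char)
        .collect()
}

fn main() {
    // Two IDs minted 255 ms apart, with the same (zero) random part.
    let earlier = ulid_from_parts(1, 0);
    let later = ulid_from_parts(256, 0);
    assert!(earlier < later);

    // Most-significant-byte-first (big-endian) bytes compare the same way
    // the numbers do...
    assert!(earlier.to_be_bytes() < later.to_be_bytes());
    // ...but little-endian bytes don't: a byte-by-byte comparison (as with
    // memcmp) now puts the earlier ID after the later one.
    assert!(earlier.to_le_bytes() > later.to_le_bytes());

    // The text form sorts by time too, since the alphabet is in ASCII order.
    assert!(to_crockford(earlier) < to_crockford(later));
    println!("{} < {}", to_crockford(earlier), to_crockford(later));
}
```

In this sketch, only the big-endian layout keeps bytewise order and time order in agreement.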

# First steps

## A puzzling failure

----

[^uuidv4_random]: Technically, most v4 UUIDs have only 122 random bits, as six out of 128 are
reserved for version and variant metadata.

[thats_a_database]: ./thats_a_database.png "that's a database"