talkin' 'bout uuids

Joe Ardent 2023-06-24 16:51:51 -07:00
parent d5b04a7f9d
commit 5a0e317f01
2 changed files with 65 additions and 4 deletions

@@ -1,17 +1,71 @@
+++ +++
title = "A Serialized Mystery, in One Part" title = "A One-Part Serialized Mystery"
slug = "serialized-mystery-one-part" slug = "one-part-serialized-mystery"
date = "2023-06-22" date = "2023-06-22"
updated = "2023-06-22" updated = "2023-06-22"
[taxonomies] [taxonomies]
tags = ["software", "rnd", "proclamation", "upcsm"] tags = ["software", "rnd", "proclamation", "upscm"]
[extra] [extra]
toc = false toc = false
+++ +++
# *Mise en Scene* # *Mise en Scene*
I recently spent a couple days moving from [one type of universally unique
identifier](https://commons.apache.org/sandbox/commons-id/uuid.html) to a [different
one](https://github.com/ulid/spec), for an in-progress [database-backed
web-app](https://gitlab.com/nebkor/ww). The [initial
work](https://gitlab.com/nebkor/ww/-/commit/be96100237da56313a583be6da3dc27a4371e29d#f69082f7433f159d627269b207abdaf2ad52b24c)
didn't take very long, but debugging the [serialization and
deserialization](https://en.wikipedia.org/wiki/Serialization) of the new IDs took another day and a half. So come with me on
an exciting voyage of discovery, and [once again, learn from my
folly](@/sundries/a-thoroughly-digital-artifact/index.md)!
# Keys, primarily
Most large distributed programs that people interact with daily via HTTP are, in essence, a fancy
facade for some kind of database. Facebook? Database. Gmail? Database.
![that's a database][thats_a_database]
<center><span class="caption">wikipedia? that's a database.</span></center>
In most databases, each entry ("row") has a field that acts as a [primary
key](https://en.wikipedia.org/wiki/Primary_key), used to uniquely identify that row inside the table
it's in. Since databases typically contain multiple tables, and primary keys have to be unique only
within their own table, you could just use a simple integer that's automatically incremented every
time you add a new record, and in many databases, if you create a table without specifying a primary
key, they will [automatically and implicitly use a
mechanism](https://www.sqlite.org/lang_createtable.html#rowid) like that.
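
To make that concrete, here's a minimal sketch using Python's built-in `sqlite3` module. The table and its contents are made up for illustration; the point is only that a table declared without any primary key still gets an integer key per row, which SQLite exposes as the hidden `rowid` column.

```python
import sqlite3

# In-memory database; the table deliberately declares no primary key.
# (The table name and rows are hypothetical, purely for illustration.)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE watches (title TEXT)")
conn.executemany(
    "INSERT INTO watches (title) VALUES (?)",
    [("Alien",), ("Aliens",), ("Alien 3",)],
)

# SQLite still assigns each row an integer key, visible as `rowid`.
rows = conn.execute("SELECT rowid, title FROM watches ORDER BY rowid").fetchall()
print(rows)  # [(1, 'Alien'), (2, 'Aliens'), (3, 'Alien 3')]
```
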
This is often totally fine! If you only ever have one copy of the database, and never have to worry
about inserting rows from a different instance of the database, then you can just use those simple
values and move on your merry way.
However, if you ever think you might want to have multiple instances of your database running, and
want to make sure they're eventually consistent with each other, then you might want to use a
fancier identifier for your primary keys, so that keys generated on different instances don't collide.
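
Here's a rough sketch of the kind of collision in question, again with Python's `sqlite3`. The `users` table and the `new_instance` helper are hypothetical; two in-memory databases stand in for two independent instances, each handing out auto-incremented integer keys.

```python
import sqlite3

def new_instance() -> sqlite3.Connection:
    """Create an independent in-memory database with an auto-incrementing key."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    return conn

# Two instances that know nothing about each other.
a, b = new_instance(), new_instance()
a.execute("INSERT INTO users (name) VALUES ('alice')")
b.execute("INSERT INTO users (name) VALUES ('bob')")

# Both instances handed out the same key, 1, to different rows.
row_a = a.execute("SELECT id, name FROM users").fetchone()
row_b = b.execute("SELECT id, name FROM users").fetchone()

# Copying b's row into a, key and all, violates a's primary-key constraint.
try:
    a.execute("INSERT INTO users (id, name) VALUES (?, ?)", row_b)
    collided = False
except sqlite3.IntegrityError:
    collided = True

print(row_a, row_b, collided)  # (1, 'alice') (1, 'bob') True
```
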
## UUIDs
One very common type for these is called a
[UUIDv4](https://datatracker.ietf.org/doc/html/rfc4122#page-14). These are 128-bit random
numbers[^uuidv4_random], and when turned into a string, usually look something like
`1c20104f-e04f-409e-9ad3-94455e5f4fea`; this is called the "hyphenated" form, for fairly obvious
reasons. Although sometimes they're stored in a DB in that form directly, that's using 36 bytes to
store 16 bytes' worth of data, which is more than twice as many bytes as necessary. And if you're
a programmer, this sort of conspicuous waste is unconscionable.
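
If you want to poke at one yourself, Python's standard `uuid` module is enough for a quick sketch. It parses the example ID from above, checks the length of its hyphenated form and its version number, and generates a fresh random one for comparison.

```python
import uuid

# The example ID from the text, parsed from its hyphenated form.
u = uuid.UUID("1c20104f-e04f-409e-9ad3-94455e5f4fea")

hyphenated = str(u)
print(len(hyphenated))  # 36
print(u.version)        # 4

# A freshly generated v4 ID has the same shape; its value differs on every run.
fresh = uuid.uuid4()
print(fresh)
```
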
You can cut that to 32 bytes by just dropping the dashes, but then that's still twice as many bytes
as the actual data requires. If you never have to actually display the ID inside the database, then
the simplest thing to do is just store it as a blob of 16 bytes[^blob-of-bytes]. Finally, optimal
representation and efficiency!
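
As a sketch of what that looks like in practice, the snippet below compares the three representations and round-trips the 16-byte form through a `BLOB` column. It uses Python's `uuid` and `sqlite3` modules; the `users` table is hypothetical, and this is not necessarily how any particular app stores its IDs.

```python
import sqlite3
import uuid

u = uuid.uuid4()

# Hyphenated string, bare hex string, and raw bytes of the same ID.
print(len(str(u)), len(u.hex), len(u.bytes))  # 36 32 16

conn = sqlite3.connect(":memory:")
# Hypothetical table whose primary key is the raw 16-byte value.
conn.execute("CREATE TABLE users (id BLOB PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO users (id, name) VALUES (?, ?)", (u.bytes, "alice"))

# Reading the blob back gives the same 16 bytes, and so the same ID.
(stored,) = conn.execute("SELECT id FROM users").fetchone()
restored = uuid.UUID(bytes=stored)
print(restored == u)  # True
```

The tradeoff is that a blob isn't human-readable, so anything that displays the ID has to convert it back to a string first.
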
## Indexes?
Imagine, if you will, that you're a computer programmer. One common trait among such creatures is a
desire to be "efficient".
- programmers like efficiency - programmers like efficiency
- databases have primary keys and keep indices - databases have primary keys and keep indices
@@ -23,3 +77,10 @@ Imagine, if you will, that you're a computer programmer.
# First steps # First steps
## A puzzling failure ## A puzzling failure
----
[^uuidv4_random]: Technically, most v4 UUIDs have only 122 random bits, as six out of 128 are
reserved for version and variant metadata.
[thats_a_database]: ./thats_a_database.png "that's a database"

Binary file not shown.
