talkin' 'bout uuids
parent d5b04a7f9d
commit 5a0e317f01
2 changed files with 65 additions and 4 deletions
|
@ -1,17 +1,71 @@
|
|||
+++
|
||||
title = "A Serialized Mystery, in One Part"
|
||||
slug = "serialized-mystery-one-part"
|
||||
title = "A One-Part Serialized Mystery"
|
||||
slug = "one-part-serialized-mystery"
|
||||
date = "2023-06-22"
|
||||
updated = "2023-06-22"
|
||||
[taxonomies]
|
||||
tags = ["software", "rnd", "proclamation", "upcsm"]
|
||||
tags = ["software", "rnd", "proclamation", "upscm"]
|
||||
[extra]
|
||||
toc = false
|
||||
+++
|
||||
|
||||
# *Mise en Scene*
|
||||
|
||||
Imagine, if you will, that you're a computer programmer.
|
||||
I recently spent a couple days moving from [one type of universally unique
identifier](https://commons.apache.org/sandbox/commons-id/uuid.html) to a [different
one](https://github.com/ulid/spec), for an in-progress [database-backed
web-app](https://gitlab.com/nebkor/ww). The [initial
work](https://gitlab.com/nebkor/ww/-/commit/be96100237da56313a583be6da3dc27a4371e29d#f69082f7433f159d627269b207abdaf2ad52b24c)
didn't take very long, but debugging the [serialization and
deserialization](https://en.wikipedia.org/wiki/Serialization) of the new IDs took another day and a half. So come with me on
an exciting voyage of discovery, and [once again, learn from my
folly](@/sundries/a-thoroughly-digital-artifact/index.md)!
|
||||
|
||||
# Keys, primarily
|
||||
|
||||
Most large distributed programs that people interact with daily via HTTP are, in essence, a fancy
facade for some kind of database. Facebook? Database. Gmail? Database.
|
||||
|
||||
![that's a database][thats_a_database]
|
||||
<center><span class="caption">wikipedia? that's a database.</span></center>
|
||||
|
||||
In most databases, each entry ("row") has a field that acts as a [primary
key](https://en.wikipedia.org/wiki/Primary_key), used to uniquely identify that row inside the table
it's in. Since databases typically contain multiple tables, and primary keys have to be unique only
within their own table, you could just use a simple integer that's automatically incremented every
time you add a new record. In many databases, if you create a table without specifying a primary
key, they will [automatically and implicitly use a
mechanism](https://www.sqlite.org/lang_createtable.html#rowid) like that.
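
To make that concrete, here's a minimal sketch using Python's built-in `sqlite3` module. The `notes`
table and its `body` column are made up for illustration; the point is only that SQLite hands out an
incrementing integer `rowid` even though no primary key was declared.

```python
import sqlite3

# In-memory database; the table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE notes (body TEXT)")  # note: no primary key declared

for body in ("first", "second", "third"):
    conn.execute("INSERT INTO notes (body) VALUES (?)", (body,))

# SQLite quietly assigned an integer rowid to every row.
rows = conn.execute("SELECT rowid, body FROM notes ORDER BY rowid").fetchall()
print(rows)  # [(1, 'first'), (2, 'second'), (3, 'third')]
```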
|
||||
|
||||
This is often totally fine! If you only ever have one copy of the database, and never have to worry
about inserting rows from a different instance of the database, then you can just use those simple
values and go on your merry way.
|
||||
|
||||
However, if you ever think you might want to have multiple instances of your database running, and
want to make sure they're eventually consistent with each other, then you might want to use a
fancier identifier for your primary keys, to avoid collisions between them.
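
Here's a rough sketch of that collision problem, again with Python's `sqlite3` and a made-up `users`
table. Two independent instances each hand out `1` as their first integer key, so copying a row from
one into the other fails; with a random 16-byte key generated on each side (via `secrets.token_bytes`,
chosen here purely for illustration), the same copy goes through.

```python
import secrets
import sqlite3

def make_db(key_type):
    """Create an in-memory database with a hypothetical users table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(f"CREATE TABLE users (id {key_type} PRIMARY KEY, name TEXT)")
    return conn

# Two independent instances using auto-incremented integer keys.
a, b = make_db("INTEGER"), make_db("INTEGER")
a.execute("INSERT INTO users (name) VALUES ('alice')")
b.execute("INSERT INTO users (name) VALUES ('bob')")
id_a = a.execute("SELECT id FROM users").fetchone()[0]
id_b = b.execute("SELECT id FROM users").fetchone()[0]
integer_keys_collide = id_a == id_b  # both instances handed out 1

# Copying bob's row into instance a violates the primary key constraint.
try:
    a.execute("INSERT INTO users (id, name) VALUES (?, ?)", (id_b, "bob"))
    merge_failed = False
except sqlite3.IntegrityError:
    merge_failed = True

# Same setup, but each instance generates its own random 16-byte key.
c, d = make_db("BLOB"), make_db("BLOB")
c.execute("INSERT INTO users (id, name) VALUES (?, ?)", (secrets.token_bytes(16), "alice"))
d.execute("INSERT INTO users (id, name) VALUES (?, ?)", (secrets.token_bytes(16), "bob"))

# Copying d's rows into c works, because the keys differ.
for row in d.execute("SELECT id, name FROM users").fetchall():
    c.execute("INSERT INTO users (id, name) VALUES (?, ?)", row)
merged_count = c.execute("SELECT COUNT(*) FROM users").fetchone()[0]
```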
|
||||
|
||||
## UUIDs
|
||||
|
||||
One very common type for these is called a
[UUIDv4](https://datatracker.ietf.org/doc/html/rfc4122#page-14). These are 128-bit random
numbers[^uuidv4_random], and when turned into a string, usually look something like
`1c20104f-e04f-409e-9ad3-94455e5f4fea`; this is called the "hyphenated" form, for fairly obvious
reasons. Although sometimes they're stored in a DB in that form directly, that's using 36 bytes to
store 16 bytes' worth of data, which is more than twice as many bytes as necessary. And if you're
a programmer, this sort of conspicuous waste is unconscionable.
|
||||
|
||||
You can cut that to 32 bytes by just dropping the dashes, but then that's still twice as many bytes
as the actual data requires. If you never have to actually display the ID inside the database, then
the simplest thing to do is just store it as a blob of 16 bytes[^blob-of-bytes]. Finally, optimal
representation and efficiency!
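
If you'd like to check those sizes yourself, here's a small sketch using the `uuid` module from
Python's standard library (picked only because it's handy, not because the web-app uses it). It
produces the same ID in all three forms and confirms the 16 raw bytes are enough to get it back.

```python
import uuid

u = uuid.uuid4()

hyphenated = str(u)  # e.g. '1c20104f-e04f-409e-9ad3-94455e5f4fea'
simple = u.hex       # same hex digits, dashes dropped
raw = u.bytes        # the actual 128 bits

print(len(hyphenated), len(simple), len(raw))  # 36 32 16

# Nothing is lost by storing only the raw bytes.
assert uuid.UUID(bytes=raw) == u
```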
|
||||
|
||||
## Indexes?
|
||||
|
||||
|
||||
|
||||
Imagine, if you will, that you're a computer programmer. One common trait among such creatures is a
desire to be "efficient".
|
||||
|
||||
- programmers like efficiency
|
||||
- databases have primary keys and keep indices
|
||||
|
@ -23,3 +77,10 @@ Imagine, if you will, that you're a computer programmer.
|
|||
# First steps
|
||||
|
||||
## A puzzling failure
|
||||
|
||||
----
|
||||
|
||||
[^uuidv4_random]: Technically, most v4 UUIDs have only 122 random bits, as six out of 128 are
reserved for version and variant metadata.
|
||||
|
||||
[thats_a_database]: ./thats_a_database.png "that's a database"
|
||||
|
|
BIN
content/rnd/a_serialized_mystery/thats_a_database.png
Normal file
Binary file not shown.
After Width: | Height: | Size: 253 KiB |