talkin' 'bout uuids
This commit is contained in:
parent
d5b04a7f9d
commit
5a0e317f01
2 changed files with 65 additions and 4 deletions
|
@ -1,17 +1,71 @@
|
||||||
+++
|
+++
|
||||||
title = "A Serialized Mystery, in One Part"
|
title = "A One-Part Serialized Mystery"
|
||||||
slug = "serialized-mystery-one-part"
|
slug = "one-part-serialized-mystery"
|
||||||
date = "2023-06-22"
|
date = "2023-06-22"
|
||||||
updated = "2023-06-22"
|
updated = "2023-06-22"
|
||||||
[taxonomies]
|
[taxonomies]
|
||||||
tags = ["software", "rnd", "proclamation", "upcsm"]
|
tags = ["software", "rnd", "proclamation", "upscm"]
|
||||||
[extra]
|
[extra]
|
||||||
toc = false
|
toc = false
|
||||||
+++
|
+++
|
||||||
|
|
||||||
# *Mise en Scene*
|
# *Mise en Scene*
|
||||||
|
|
||||||
Imagine, if you will, that you're a computer programmer.
|
I recently spent a couple days moving from [one type of universally unique
|
||||||
|
identifier](https://commons.apache.org/sandbox/commons-id/uuid.html) to a [different
|
||||||
|
one](https://github.com/ulid/spec), for an in-progress [database-backed
|
||||||
|
web-app](https://gitlab.com/nebkor/ww). The [initial
|
||||||
|
work](https://gitlab.com/nebkor/ww/-/commit/be96100237da56313a583be6da3dc27a4371e29d#f69082f7433f159d627269b207abdaf2ad52b24c)
|
||||||
|
didn't take very long, but debugging the [serialization and
|
||||||
|
deserialization](https://en.wikipedia.org/wiki/Serialization) of the new IDs took another day and a half. So come with me on
|
||||||
|
an exciting voyage of discovery, and [once again, learn from my
|
||||||
|
folly](@/sundries/a-thoroughly-digital-artifact/index.md)!
|
||||||
|
|
||||||
|
# Keys, primarily
|
||||||
|
|
||||||
|
Most large distributed programs that people interact with daily via HTTP are, in essence, a fancy
|
||||||
|
facade for some kind of database. Facebook? Database. Gmail? Database.
|
||||||
|
|
||||||
|
![that's a database][thats_a_database]
|
||||||
|
<center><span class="caption">wikipedia? that's a database.</span></center>
|
||||||
|
|
||||||
|
In most databases, each entry ("row") has a field that acts as a [primary
|
||||||
|
key](https://en.wikipedia.org/wiki/Primary_key), used to uniquely identify that row inside the table
|
||||||
|
it's in. Since databases typically contain multiple tables, and primary keys have to be unique only
|
||||||
|
within their own table, you could just use a simple integer that's automatically incremented every
|
||||||
|
time you add a new record, and in many databases, if you create a table without specifying a primary
|
||||||
|
key, they will [automatically and implicitly use a
|
||||||
|
mechanism](https://www.sqlite.org/lang_createtable.html#rowid) like that.
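
To make that implicit mechanism concrete, here's a minimal sketch using Python's built-in `sqlite3` module. The `notes` table and the `implicit_rowids` helper are made up for illustration and are not part of the app discussed here. The table declares no primary key at all, yet SQLite still hands each row an incrementing `rowid`.

```python
import sqlite3

def implicit_rowids(bodies):
    """Insert each body into a table with no declared primary key and
    return the (rowid, body) pairs that SQLite assigned on its own."""
    db = sqlite3.connect(":memory:")
    # Hypothetical table: note that no PRIMARY KEY is specified.
    db.execute("CREATE TABLE notes (body TEXT)")
    db.executemany("INSERT INTO notes (body) VALUES (?)", [(b,) for b in bodies])
    return db.execute("SELECT rowid, body FROM notes ORDER BY rowid").fetchall()

print(implicit_rowids(["first", "second", "third"]))
# prints [(1, 'first'), (2, 'second'), (3, 'third')]
```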
|
||||||
|
|
||||||
|
This is often totally fine! If you only ever have one copy of the database, and never have to worry
|
||||||
|
about inserting rows from a different instance of the database, then you can just use those simple
|
||||||
|
values and move on your merry way.
|
||||||
|
|
||||||
|
However, if you ever think you might want to have multiple instances of your database running, and
|
||||||
|
want to make sure they're eventually consistent with each other, then you might want to use a
|
||||||
|
fancier identifier for your primary keys, to avoid collisions between them.
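
Here's a rough sketch of that collision problem, again with Python's `sqlite3` and `uuid` modules. The `users` table and the `make_db`, `merge`, and `try_merge` helpers are hypothetical, and the "merge" is a deliberately naive row copy rather than any real replication scheme. Two independent instances that each count up from 1 clash when their rows are combined, while randomly generated 128-bit keys merge without complaint.

```python
import sqlite3
import uuid

def make_db(names, use_uuid):
    """Create an independent in-memory database holding one row per name."""
    db = sqlite3.connect(":memory:")
    key_type = "BLOB" if use_uuid else "INTEGER"
    db.execute(f"CREATE TABLE users (id {key_type} PRIMARY KEY, name TEXT)")
    for name in names:
        if use_uuid:
            # Random 128-bit key, generated without consulting the database.
            db.execute("INSERT INTO users (id, name) VALUES (?, ?)",
                       (uuid.uuid4().bytes, name))
        else:
            # Let the database pick the next integer key.
            db.execute("INSERT INTO users (name) VALUES (?)", (name,))
    return db

def merge(src, dst):
    """Naively copy every row from src into dst, keeping the original keys."""
    rows = src.execute("SELECT id, name FROM users").fetchall()
    dst.executemany("INSERT INTO users (id, name) VALUES (?, ?)", rows)

def try_merge(use_uuid):
    """Return the merged row count, or "collision" if two keys clash."""
    a = make_db(["alice"], use_uuid)
    b = make_db(["bob"], use_uuid)
    try:
        merge(b, a)
    except sqlite3.IntegrityError:
        return "collision"
    return a.execute("SELECT count(*) FROM users").fetchone()[0]

print(try_merge(use_uuid=False))  # both instances assigned id 1
print(try_merge(use_uuid=True))   # random keys do not clash in practice
```

With integer keys, both instances give their first row the ID 1, so the copy fails; with random keys, a clash is possible in principle but vanishingly unlikely.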
|
||||||
|
|
||||||
|
## UUIDs
|
||||||
|
|
||||||
|
One very common type for these is called a
|
||||||
|
[UUIDv4](https://datatracker.ietf.org/doc/html/rfc4122#page-14). These are 128-bit random
|
||||||
|
numbers[^uuidv4_random], and when turned into a string, usually look something like
|
||||||
|
`1c20104f-e04f-409e-9ad3-94455e5f4fea`; this is called the "hyphenated" form, for fairly obvious
|
||||||
|
reasons. Although sometimes they're stored in a DB in that form directly, that's using 36 bytes to
|
||||||
|
store 16 bytes' worth of data, which is more than twice as many bytes as necessary. And if you're
|
||||||
|
a programmer, this sort of conspicuous waste is unconscionable.
|
||||||
|
|
||||||
|
You can cut that to 32 bytes by just dropping the dashes, but then that's still twice as many bytes
|
||||||
|
as the actual data requires. If you never have to actually display the ID inside the database, then
|
||||||
|
the simplest thing to do is just store it as a blob of 16 bytes[^blob-of-bytes]. Finally, optimal
|
||||||
|
representation and efficiency!
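
For a quick sanity check on those sizes, here's a small sketch with Python's standard `uuid` module, using the example ID from above. The variable names are just for illustration. It shows the same value in its 36-character hyphenated form, its 32-character dashless form, and its raw 16-byte form, and confirms that the raw bytes round-trip back to the same ID.

```python
import uuid

example = uuid.UUID("1c20104f-e04f-409e-9ad3-94455e5f4fea")

hyphenated = str(example)  # text with dashes
simple = example.hex       # same hex digits, dashes dropped
raw = example.bytes        # the underlying 128-bit value

print(len(hyphenated), len(simple), len(raw))  # prints 36 32 16

# The 16 raw bytes are enough to rebuild the identical ID.
assert uuid.UUID(bytes=raw) == example

# The version number lives inside those same 128 bits.
print(example.version)  # prints 4
```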
|
||||||
|
|
||||||
|
## Indexes?
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
Imagine, if you will, that you're a computer programmer. One common trait among such creatures is a
|
||||||
|
desire to be "efficient".
|
||||||
|
|
||||||
- programmers like efficiency
|
- programmers like efficiency
|
||||||
- databases have primary keys and keep indices
|
- databases have primary keys and keep indices
|
||||||
|
@ -23,3 +77,10 @@ Imagine, if you will, that you're a computer programmer.
|
||||||
# First steps
|
# First steps
|
||||||
|
|
||||||
## A puzzling failure
|
## A puzzling failure
|
||||||
|
|
||||||
|
----
|
||||||
|
|
||||||
|
[^uuidv4_random]: Technically, most v4 UUIDs have only 122 random bits, as six out of 128 are
|
||||||
|
reserved for version and variant metadata.
|
||||||
|
|
||||||
|
[thats_a_database]: ./thats_a_database.png "that's a database"
|
||||||
|
|
BIN
content/rnd/a_serialized_mystery/thats_a_database.png
Normal file
Binary file not shown.
After Width: | Height: | Size: 253 KiB |