+++
title = "A One-Part Serialized Mystery"
slug = "one-part-serialized-mystery"
date = "2023-06-29"
updated = "2023-07-29"
[taxonomies]
tags = ["software", "rnd", "proclamation", "upscm", "rust", "ulid", "sqlite"]
+++

# *Mise en Scene*

I recently spent a couple days moving from [one type of universally unique
identifier](https://commons.apache.org/sandbox/commons-id/uuid.html) to a [different
one](https://github.com/ulid/spec), for an in-progress [database-backed
web-app](https://gitlab.com/nebkor/ww). The [initial
work](https://gitlab.com/nebkor/ww/-/commit/be96100237da56313a583be6da3dc27a4371e29d#f69082f7433f159d627269b207abdaf2ad52b24c)
didn't take very long, but debugging the [serialization and
deserialization](https://en.wikipedia.org/wiki/Serialization) of the new IDs took another day and a
half, and in the end, the alleged mystery of why it wasn't working was a red herring due to my own
stupidity. So come with me on an exciting voyage of discovery, and [once again, learn from my
folly](@/sundries/a-thoroughly-digital-artifact/index.md)!

# Keys, primarily

Most large distributed programs that people interact with daily via HTTP are, in essence, a fancy
facade for some kind of database. Facebook? That's a database. Gmail? That's a database.

![that's a database][thats_a_database]
<div class="caption">wikipedia? that's a database.</div>

In most databases, each entry ("row") has a field that acts as a [primary
key](https://en.wikipedia.org/wiki/Primary_key), used to uniquely identify that row inside the table
it's in. Since databases typically contain multiple tables, and primary keys have to be unique only
within their own table, you could just use a simple integer that's automatically incremented every
time you add a new record, and in many databases, if you create a table without specifying a primary
key, they will [automatically and implicitly use a
mechanism](https://www.sqlite.org/lang_createtable.html#rowid) like that. You may also recognize the
idea of "serial numbers", which is what these sorts of IDs are.

This is often totally fine! If you only ever have one copy of the database, and never have to worry
about inserting rows from a different instance of the database, then you can just use those simple
values and move on your merry way.

However, if you ever think you might want to have multiple instances of your database running, and
want to make sure they're eventually consistent with each other, then you might want to use a
fancier identifier for your primary keys, to avoid collisions between them.

## UUIDs

A popular type for fancy keys is the
[v4 UUID](https://datatracker.ietf.org/doc/html/rfc4122#page-14). These are 128-bit random
numbers[^uuidv4_random], and when turned into a string, usually look something like
`1c20104f-e04f-409e-9ad3-94455e5f4fea`; this is called the "hyphenated" form, for fairly obvious
reasons. Although sometimes they're stored in a DB in that form directly, that's using 36 bytes to
store 16 bytes' worth of data, which is more than twice as many bytes as necessary. And if you're
a programmer, this sort of conspicuous waste is unconscionable.

You can cut that to 32 bytes by just dropping the dashes, but then that's still twice as many bytes
as the actual data requires. If you never have to actually display the ID inside the database, then
the simplest thing to do is just store it as a blob of 16 bytes[^blob-of-bytes]. Finally, optimal
representation and efficiency!

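
To make the 36 vs. 32 vs. 16 arithmetic concrete, here's a minimal std-only sketch. The function names `hyphenated` and `storage_sizes` are made up for illustration and aren't from any UUID library; the ID is treated as a plain `u128`.

``` rust
/// Render a 128-bit ID in the 8-4-4-4-12 "hyphenated" hex form.
fn hyphenated(id: u128) -> String {
    // 32 lowercase hex digits, zero-padded on the left
    let hex = format!("{:032x}", id);
    format!(
        "{}-{}-{}-{}-{}",
        &hex[0..8],
        &hex[8..12],
        &hex[12..16],
        &hex[16..20],
        &hex[20..32]
    )
}

/// Byte counts for the three storage options:
/// (hyphenated text, bare hex text, raw blob).
fn storage_sizes(id: u128) -> (usize, usize, usize) {
    let with_dashes = hyphenated(id);
    let bare_hex = with_dashes.replace('-', "");
    let blob = id.to_be_bytes();
    (with_dashes.len(), bare_hex.len(), blob.len())
}
```

Feeding it the example ID from above as a `u128` gives back the same hyphenated string, and the three sizes come out as 36, 32, and 16 bytes.
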
## Indexes?

And at first, that's what I did. The [external library](https://docs.rs/sqlx/latest/sqlx/) I'm using
to interface with my database automatically writes UUIDs as a sequence of sixteen bytes, if you
specified the type in the database[^sqlite-dataclasses] as "[blob](https://www.sqlite.org/datatype3.html)", which [I
did](https://gitlab.com/nebkor/ww/-/commit/65a32f1f20df6c572580d796e1044bce807fd3b6#f1043d50a0244c34e4d056fe96659145d03b549b_0_5).

But then I saw a [blog post](https://shopify.engineering/building-resilient-payment-systems) where
the following tidbit was mentioned:

> We prefer using an Universally Unique Lexicographically Sortable Identifier (ULID) for these
> idempotency keys instead of a random version 4 UUID. ULIDs contain a 48-bit timestamp followed by
> 80 bits of random data. The timestamp allows ULIDs to be sorted, which works much better with the
> b-tree data structure databases use for indexing. In one high-throughput system at Shopify we’ve
> seen a 50 percent decrease in INSERT statement duration by switching from UUIDv4 to ULID for
> idempotency keys.

Whoa, that sounds great! But [this YouTube
video](https://www.youtube.com/watch?v=f53-Iw_5ucA&t=590s) tempered my expectations a bit, by
describing the implementation-dependent reasons for that dramatic
improvement. Still, switching from UUIDs to ULIDs couldn't *hurt*[^no-stinkin-benches], right? Plus,
by encoding the time of creation (at least to the nearest millisecond), I could remove a "created
at" field from every table that used them as primary keys. Which, in my case, would be all of them,
and I'm worried less about the speed of inserts than I am about keeping total on-disk size down
anyway.

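
Here's a rough sketch of the layout the quote describes, and of how a "created at" value falls out of it. It assumes the ID is held as a `u128` with the 48-bit millisecond timestamp in the top bits and the 80 random bits below; `make_id` and `created_at_ms` are hypothetical helper names, not functions from the `ulid` crate, and the caller is expected to bring their own randomness.

``` rust
/// Build a ULID-shaped value: 48 bits of millisecond timestamp
/// sitting on top of 80 bits of caller-supplied randomness.
fn make_id(timestamp_ms: u64, random: u128) -> u128 {
    // keep only the low 48 bits of the timestamp
    let ts = (timestamp_ms as u128) & ((1u128 << 48) - 1);
    // keep only the low 80 bits of the random value
    let rand = random & ((1u128 << 80) - 1);
    (ts << 80) | rand
}

/// Recover the creation time (milliseconds since the Unix epoch)
/// from the top 48 bits of the ID.
fn created_at_ms(id: u128) -> u64 {
    (id >> 80) as u64
}
```

Because the timestamp occupies the most significant bits, an ID made at a later millisecond is always numerically larger than one made earlier, no matter what the random bits are; that's the whole "sortable" trick.
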
I was actually already familiar with the idea of using time-based sortable IDs, from
[KSUIDs](https://github.com/segmentio/ksuid). It's an attractive concept to me, and I'd considered
using them from the get-go, but discarded that for two main reasons:

- they're **FOUR WHOLE BYTES!!!** larger than UUIDs
- I'd have to manually implement serialization/deserialization, since SQLx doesn't
  have native support for them

In reality, neither of those is a real show-stopper; 20 vs. 16 bytes is probably not that
significant, and I'd have to do the manual serialization stuff for anything besides a
number that fits in eight bytes or a normal UUID. Still, four bytes is four bytes, and all other things
being equal, I'd rather go for the trimmer, 64-bit-aligned value.

Finally, I'd recently finished adding some ability to actually interact with data in a
meaningful way, and to add new records to the database, which meant that it was now or never for
standardizing on a type for the primary keys. I was ready to do this thing.

# Serial problems

"Deserialization" is the act of converting a static, non-native representation of some kind of
datatype into a dynamic, native computer programming object, so that you can do the right computer
programming stuff to it. It can be as simple as when a program reads in a string of digit characters
and parses it into a real number, but of course the ceiling on complexity is limitless.

In my case, it was about getting those sixteen bytes out of the database and turning them into
ULIDs. Technically, I could have let Rust [handle that for me](https://serde.rs/derive.html) by
automatically deriving that functionality. There were a couple snags with that course, though:

- the default serialized representation of a ULID in the library I was using to provide them [is as
  26-character strings](https://docs.rs/ulid/latest/ulid/serde/index.html), and I wanted to use only
  16 bytes in the database
- you could tell it to serialize as a [128-bit
  number](https://docs.rs/ulid/latest/ulid/serde/ulid_as_u128/index.html), but that merely kicked the
  problem one step down the road since SQLite can only handle up to 64-bit numbers, as previously
  discussed, so I'd still have to manually do something for them

This meant going all-in on fully custom serialization and deserialization, something I'd never done
before, but how hard could it be? (spoiler: actually not that hard!)

## Great coders steal

Something I appreciate about the [Rust programming language](https://www.rust-lang.org/) is that
because of the way the compiler works[^rust-generics], the full source code almost always has to be
available to you, the end-user coder. The culture around it is also very biased toward open source,
and so all the extremely useful libraries are just sitting there, ready to be studied and copied. So
the first thing I did was take a look at how [SQLx handled
UUIDs](https://github.com/launchbadge/sqlx/blob/main/sqlx-sqlite/src/types/uuid.rs):

``` rust
impl Type<Sqlite> for Uuid {
    fn type_info() -> SqliteTypeInfo {
        SqliteTypeInfo(DataType::Blob)
    }

    fn compatible(ty: &SqliteTypeInfo) -> bool {
        matches!(ty.0, DataType::Blob | DataType::Text)
    }
}

impl<'q> Encode<'q, Sqlite> for Uuid {
    fn encode_by_ref(&self, args: &mut Vec<SqliteArgumentValue<'q>>) -> IsNull {
        args.push(SqliteArgumentValue::Blob(Cow::Owned(
            self.as_bytes().to_vec(),
        )));

        IsNull::No
    }
}

impl Decode<'_, Sqlite> for Uuid {
    fn decode(value: SqliteValueRef<'_>) -> Result<Self, BoxDynError> {
        // construct a Uuid from the returned bytes
        Uuid::from_slice(value.blob()).map_err(Into::into)
    }
}
```

There's not a ton going on there, as you can see. To "encode", it just gets the bytes out of the
UUID, and to "decode", it just gets the bytes out of the database. I couldn't use that exactly as
done by the SQLx authors, as they were using datatypes that were private to their crate, but it was
close enough; here's mine:

``` rust
impl sqlx::Type<sqlx::Sqlite> for DbId {
    fn type_info() -> <sqlx::Sqlite as sqlx::Database>::TypeInfo {
        <&[u8] as sqlx::Type<sqlx::Sqlite>>::type_info()
    }
}

impl<'q> Encode<'q, Sqlite> for DbId {
    fn encode_by_ref(&self, args: &mut Vec<SqliteArgumentValue<'q>>) -> IsNull {
        args.push(SqliteArgumentValue::Blob(Cow::Owned(self.bytes().to_vec())));
        IsNull::No
    }
}

impl Decode<'_, Sqlite> for DbId {
    fn decode(value: SqliteValueRef<'_>) -> Result<Self, sqlx::error::BoxDynError> {
        let bytes = <&[u8] as Decode<Sqlite>>::decode(value)?;
        let bytes: [u8; 16] = bytes.try_into().unwrap_or_default();
        Ok(u128::from_ne_bytes(bytes).into())
    }
}
```

(In order to implement the required methods from SQLx, I had to wrap the ULID in a new, custom type,
which I called `DbId`, to comply with the [orphan rules](https://github.com/Ixrec/rust-orphan-rules).)

That's only half the story, though. If all I had to worry about was getting data in and out of the
database, that would be fine, but because I'm building a web app, I need to be able to include my
new ID type in messages sent over a network or as part of a web page, and for that, it needed to
implement some functionality from a different library, called [Serde](https://serde.rs/). My
original implementation for *deserializing* looked like this:

``` rust
struct DbIdVisitor;

impl<'de> Visitor<'de> for DbIdVisitor {
    type Value = DbId;

    // make a DbId from a slice of bytes
    fn visit_bytes<E>(self, v: &[u8]) -> Result<Self::Value, E>
    where
        E: serde::de::Error,
    {
        ...
    }

    // make a DbId from a Vec of bytes
    fn visit_byte_buf<E>(self, v: Vec<u8>) -> Result<Self::Value, E>
    where
        E: serde::de::Error,
    {
        ...
    }

    // you get the picture
    fn visit_string() ...
    fn visit_u128() ...
    fn visit_i128() ...
}
```

In my mind, the only important pieces were the `visit_bytes()` and `visit_byte_buf()` methods,
which worked basically the same as the `decode()` function for SQLx. I mean, as far as I could tell,
the only time something would be encountering a serialized `DbId` would be in the form of raw bytes
from the database; no one else would be trying to serialize one as something else that I didn't
anticipate, right?

RIGHT???

(wrong)

## A puzzling failure

As soon as my code compiled, I ran my tests. Everything passed... except for one that tested
logging in.

This was very strange. All the other tests were passing, and basically every operation requires
getting one of these IDs into or out of the database. But at this point, it was late, and I set it
down until the next day.

# When in doubt, change many things at once

The next day I sat back down to get back to work, and in the course of examining what was going on,
realized that I'd missed something crucial: these things were supposed to be *sortable*. But the way
I was inserting them meant that they weren't, because of endianness.

## More like shmexicographic, amirite

"ULID" stands for "Universally Unique Lexicographically Sortable
Identifier"[^uulsid]. "[Lexicographic order](https://en.wikipedia.org/wiki/Lexicographic_order)"
basically means, "like alphabetical, but for anything with a defined total order". Numbers have a
defined total order; bigger numbers always go after smaller.

But sometimes numbers get sorted out of order, if they're not treated as numbers. Like say you had a
directory with twelve files in it, called "1.txt" up through "12.txt". If you were to ask to see
them listed out in lexicographic order, it would go like:

``` text
$ ls
10.txt
11.txt
12.txt
1.txt
2.txt
3.txt
4.txt
5.txt
6.txt
7.txt
8.txt
9.txt
```

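This is because '10' is "less than" '2' (and, under the locale-aware collation my `ls` was
presumably using, '0' is "less than" '.', which is why "10.txt" is before "1.txt"; a strict
byte-by-byte comparison would put '.' first, and "1.txt" would lead the list). The solution, as all
data-entering people know, is to pad the number with leading '0's:
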
``` text
$ ls
01.txt
02.txt
03.txt
04.txt
05.txt
06.txt
07.txt
08.txt
09.txt
10.txt
11.txt
12.txt
```

Now the names are lexicographically sorted in the right numerical order[^confusing-yes].

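
If you'd rather poke at this in code than in a directory listing, here's a small sketch. The helper names are invented for the example. One caveat: Rust compares strings byte by byte, so unlike the `ls` output above it puts "1.txt" ahead of "10.txt"; either way, "10.txt" still lands before "2.txt", which is the actual problem.

``` rust
/// Sort names with Rust's default string ordering (plain byte-wise comparison).
fn sorted(mut names: Vec<String>) -> Vec<String> {
    names.sort();
    names
}

/// "1.txt" through "12.txt", with no padding.
fn unpadded_names() -> Vec<String> {
    (1..=12).map(|n| format!("{n}.txt")).collect()
}

/// "01.txt" through "12.txt", zero-padded to two digits.
fn padded_names() -> Vec<String> {
    (1..=12).map(|n| format!("{n:02}.txt")).collect()
}
```

Sorting the output of `unpadded_names()` scrambles the numeric order, while sorting `padded_names()` leaves it exactly as generated, because with equal-width numbers the lexicographic and numeric orders agree.
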
So, now that we're all expert lexicographicographers, we understand that our IDs are just
supposed to naturally sort themselves in the correct order, based on when they were created; IDs
created later should sort after IDs created earlier.

The implementation for my ULIDs only guaranteed this property for the string form of them, but I was
not storing them in string form. Fundamentally, the ULID was a simple [128-bit primitive
integer](https://doc.rust-lang.org/std/primitive.u128.html), capable of holding values between 0 and
340,282,366,920,938,463,463,374,607,431,768,211,455.

But there's a problem: I was storing the ID in the database as a sequence of 16 bytes. I was asking
for those bytes in "native endian", which in my case, meant "little endian". If you're not familiar
with endianness, there are two varieties: big, and little. "Big" makes the most sense for a lot of
people; if you see a number like "512", it's big-endian; the end is the part that's left-most, and
"big" means that it is the most-significant-digit. This is the same as what westerners think of as
"normal" numbers. In the number "512", the "most significant digit" is `5`, which corresponds to
`500`, which is added to the next-most-significant digit, `1`, corresponding to `10`, which is added
to the next-most-significant digit (which is also the least-significant digit), `2`,
which is just `2`, giving us the full number `512`.

If we put the least-significant-digit first, we'd write the number `512` as "215"; the order when
written out would be reversed. This means that the lexicographic sort of `512, 521` would have "125"
come before "215", which is backwards.

Little-endianness is like that. If a multibyte numeric value is on a little-endian system, the
least-significant bytes will come first, and a lexicographic sorting of those bytes would be
non-numeric.

The solution, though, is simple: just write them out in big-endian order! This was literally a
one-line change in the code, to switch from `to_ne_bytes()` ("ne" for "native endian") to
`to_be_bytes()`. I confirmed that the bytes written into the database were in the correct
lexicographic order:

``` sql
sqlite> select hex(id), username from users order by id asc;
018903CDDCAAB0C6872A4509F396D388|first_user
018903D0E591525EA42202FF461AA5FA|second_user
```

Note the first six characters are the same, for these two users created some time apart[^ulid-timestamps].

Boom. "Sorted".

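
The difference between the two byte orders is easy to see with a pair of small numbers. This is only a sketch, with made-up function names; it leans on the fact that Rust compares byte arrays lexicographically, which is (as far as I understand it) the same way SQLite compares blobs.

``` rust
use std::cmp::Ordering;

/// Compare two IDs by their little-endian byte sequences,
/// the way a byte-by-byte blob comparison would see them.
fn cmp_as_le_blobs(a: u128, b: u128) -> Ordering {
    a.to_le_bytes().cmp(&b.to_le_bytes())
}

/// Compare two IDs by their big-endian byte sequences.
fn cmp_as_be_blobs(a: u128, b: u128) -> Ordering {
    a.to_be_bytes().cmp(&b.to_be_bytes())
}
```

With 255 and 256, the big-endian comparison says `Less`, matching the numbers, but the little-endian one says `Greater`: 255 starts with the byte `0xFF`, while 256 starts with `0x00`, because its only non-zero byte comes second.
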
## The actual problem

Except that the logins were still broken; it wasn't just the test. What was even stranger is that
with advanced debugging techniques[^advanced-debugging], I confirmed that the login *was*
working. By which I mean, when the user submitted a login request, the function that handled the
request was:

- correctly confirming password match
- retrieving the user from the database

The second thing was required for the first. It was even creating a session in the session table:

``` sql
sqlite> select * from async_sessions;
..|..|{"id":"ZY...","expiry":"...","data":{"_user_id":"[1,137,3,205,220,170,176,198,135,42,69,9,243,150,211,136]","_auth_id":"\"oM..."}}
```

I noticed that the ID was present in the session entry, but as what looked like an array of decimal
values. The less not-astute among you may have noticed that the session table seemed to be using
JSON to store information. This wasn't my code, but it was easy enough to find the
[culprit](https://github.com/http-rs/async-session/blob/d28cef30c7da38f52639b3d60fc8cf4489c92830/src/session.rs#L214):

``` rust
pub fn insert(&mut self, key: &str, value: impl Serialize) -> Result<(), serde_json::Error> {
    self.insert_raw(key, serde_json::to_string(&value)?);
    Ok(())
}
```

This was in the [external library](https://docs.rs/async-session/latest/async_session/) I was using
to provide cookie-based sessions for my web app, and was transitively invoked when I called the
`login()` method in my own code. Someone else was serializing my IDs, in a way I hadn't anticipated!

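
That array of decimal values in `_user_id` is the same sixteen bytes as the hex dump of `first_user`'s ID, just spelled differently. Here's a std-only sketch that checks this by hand; `parse_byte_array` and `to_hex` are hypothetical helpers written for this example, and have nothing to do with how Serde or async-session do their parsing.

``` rust
use std::convert::TryFrom;

/// Parse a JSON-style array of sixteen decimal byte values, e.g. "[1,137,...]".
/// Returns None if the text isn't exactly sixteen values in 0..=255.
fn parse_byte_array(text: &str) -> Option<[u8; 16]> {
    let inner = text.trim().strip_prefix('[')?.strip_suffix(']')?;
    let bytes = inner
        .split(',')
        .map(|part| part.trim().parse::<u8>().ok())
        .collect::<Option<Vec<u8>>>()?;
    // fails (and yields None) unless there are exactly sixteen bytes
    <[u8; 16]>::try_from(bytes).ok()
}

/// Render bytes the way SQLite's hex() does: two uppercase hex digits per byte.
fn to_hex(bytes: &[u8; 16]) -> String {
    bytes.iter().map(|b| format!("{b:02X}")).collect()
}
```

Running the `_user_id` text from the session row through `parse_byte_array()` and then `to_hex()` gives `018903CDDCAAB0C6872A4509F396D388`, so the session really was holding the ID as a sequence of numbers rather than as a blob of bytes.
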
The way that Serde decides what code to call is based on its [data
model](https://serde.rs/data-model.html). And wouldn't you know it, the following words are right
there, hiding in plain sight, as they had been all along:

> When deserializing a data structure from some format, the Deserialize implementation for the data
> structure is responsible for mapping the data structure into the Serde data model by passing to
> the Deserializer a Visitor implementation that can receive the various types of the data model...
>
> [...]
>
> * seq
> - A variably sized heterogeneous sequence of values, for example Vec<T> or HashSet<T>. ...
>
> [...]
>
> The flexibility around mapping into the Serde data model is profound and powerful. When
> implementing Serialize and Deserialize, be aware of the broader context of your type that may make
> the most instinctive mapping not the best choice.

Well, when you put it that way, I can't help but understand: I needed to implement a `visit_seq()`
method in my deserialization code.

![fine, fine, I see the light][see_the_light]
<div class = "caption">fine, fine, i see the light</div>

You can see that
[here](https://gitlab.com/nebkor/ww/-/blob/656e6dceedf0d86e2805e000c9821e931958a920/src/db_id.rs#L194-216)
if you'd like, but I'll actually come back to it in a second. The important part was that my logins
were working again; time to party!

## Wait, why *isn't* it broken?

I'd just spent the day banging my head against this problem, and so when everything worked again, I
committed and pushed the change and signed off. But something was still bothering me, and the next
day, I dove back into it.

All my serialization code was calling a method called
[`bytes()`](https://gitlab.com/nebkor/ww/-/blob/656e6dceedf0d86e2805e000c9821e931958a920/src/db_id.rs#L18),
which simply called another method that would return an array of 16 bytes, in big-endian order, so
it could go into the database and be sortable, as discussed.

But all[^actually_not_all] my *deserialization* code was constructing the IDs as [though the bytes
were
*little*-endian](https://gitlab.com/nebkor/ww/-/blob/656e6dceedf0d86e2805e000c9821e931958a920/src/db_id.rs#L212). Which
led me to ask:

what the fuck?

Like, everything was *working*. Why did I need to construct from a different byte order? I felt like
I was losing my mind, so I reached out to the [Recurse Center](https://www.recurse.com) community
and presented my case.

Basically, I showed that bytes were written correctly, resident in the DB in big-endian form, but
then were "backwards" coming out and "had to be" cast using little-endian constructors
("`from_ne_bytes()`").

What had actually happened is that as long as there was agreement about what order to use for reconstructing the
ID from the bytes, it didn't matter if it was big or little-endian, it just had to be the same on
both the
[SQLx](https://gitlab.com/nebkor/ww/-/commit/84d70336d39293294fd47b4cf115c70091552c11#ce34dd57be10530addc52a3273548f2b8d3b8a9b_106_105)
side and on the
[Serde](https://gitlab.com/nebkor/ww/-/commit/84d70336d39293294fd47b4cf115c70091552c11#ce34dd57be10530addc52a3273548f2b8d3b8a9b_210_209)
side. This is also irrespective of the order they were written out in, but again, the two sides must
agree on the convention used. Inside the Serde method, I had added some debug printing of the bytes
it was getting, and they were in little-endian order. What I had not realized is that that was
because they were first passing through the SQLx method which reversed them.

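
Here's a stripped-down sketch of that "agreement" idea, using plain `u128` values in place of `DbId`. None of these functions are from the repository; `decode_mismatched` just stands in for any reader that assumes little-endian, which is what `from_ne_bytes()` amounts to on a little-endian machine.

``` rust
/// Write an ID out as big-endian bytes, the order chosen for the database.
fn encode(id: u128) -> [u8; 16] {
    id.to_be_bytes()
}

/// A reader that matches the writer's byte order.
fn decode_matching(bytes: [u8; 16]) -> u128 {
    u128::from_be_bytes(bytes)
}

/// A reader that assumes little-endian instead.
fn decode_mismatched(bytes: [u8; 16]) -> u128 {
    u128::from_le_bytes(bytes)
}
```

`decode_matching(encode(id))` gives back `id`, while `decode_mismatched(encode(id))` gives back `id.swap_bytes()`: a different number, but always the *same* different number. Two readers that share the wrong convention will agree with each other, which is how everything can appear to work while still being backwards.
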
Mmmmm, delicious, delicious red herring.

Two people were especially helpful, Julia Evans and Nicole Tietz-Sokolskaya; Julia grabbed a copy of
my database file and poked it with Python, and could not replicate the behavior I was seeing, and
Nicole did the same but with a little Rust program she wrote. Huge thanks to both of them (but not
just them) for the extended [rubber ducking](https://en.wikipedia.org/wiki/Rubber_duck_debugging)!
And apologies for the initial gas-lighting; Julia was quite patient and diplomatic when pushing back
against "the bytes are coming out of the db backwards".

# Lessons learned

Welp, here we are, the end of the line; I hope this has been informative, or barring that, at least
entertaining. Or the other way around, I'm not that fussy!

Obviously, the biggest mistake was to futz with being clever about endianness before understanding
why the login code was now failing. Had I gotten it working correctly first, I would have been able to
figure out the requirement for agreement on convention between the two different serialization
systems much sooner, and I would not have wasted my own and others' time on misunderstanding.

On the other hand, it's hard to see these things on the first try, especially when you're on your
own, and are on your first fumbling steps in a new domain or ecosystem; for me, that was getting
into the nitty-gritty with Serde, and for that matter, dealing directly with serialization-specific
issues. Collaboration is a great technique for navigating these situations, and I definitely need to
focus a bit more on enabling that[^solo-yolo-dev].

In the course of debugging this issue, I tried to get more insight via
[testing](https://gitlab.com/nebkor/ww/-/commit/656e6dceedf0d86e2805e000c9821e931958a920#ce34dd57be10530addc52a3273548f2b8d3b8a9b_143_251),
and though that helped a little, it was not nearly enough; the problem was that I misunderstood how
something worked, not that I had mistakenly implemented something I was comfortable with. Tests
aren't a substitute for understanding!

And of course, I'm now much more confident and comfortable with Serde; reading the Serde code for
other things, like [UUIDs](https://github.com/uuid-rs/uuid/blob/main/src/external/serde_support.rs),
is no longer an exercise in eye-glaze-control. Maybe this has helped you with that too?

----

[^uuidv4_random]: Technically, most v4 UUIDs have only 122 random bits, as six out of 128 are
reserved for version information.

[^blob-of-bytes]: Some databases have direct support for 128-bit primitive values (numbers). The
database I'm using, SQLite, only supports up to 64-bit primitive values, but it does support
arbitrary-length sequences of bytes called "blobs".

[^sqlite-dataclasses]: I'm using [SQLite](https://www.sqlite.org/index.html) for reasons that I plan
to dive into in a different post, but "blob" is specific to it. In general, you'll probably want
to take advantage of implementation-specific features of whatever database you're using, which
means that your table definitions won't be fully portable to a different database. This is fine
and good, actually!

[^no-stinkin-benches]: You may wonder: have I benchmarked this system with UUIDs vs. ULIDs? Ha ha,
you must have never met a programmer before! No, of course not. But, that's coming in a
follow-up.

[^rust-generics]: If the code you're using has
[generics](https://doc.rust-lang.org/book/ch10-01-syntax.html) in it, then the compiler needs to
generate specialized versions of that generic code based on how you use it; this is called
"[monomorphization](https://doc.rust-lang.org/book/ch10-01-syntax.html#performance-of-code-using-generics)",
and it requires the original generic source to work. That's also true in C++, which is why most
templated code is [header-only](https://isocpp.org/wiki/faq/templates#templates-defn-vs-decl),
but Rust doesn't have header files.

[^uulsid]: I guess the extra 'U' and 'S' are invisible.

[^confusing-yes]: Is this confusing? Yes, 100%, it is not just you. Don't get discouraged.

[^ulid-timestamps]: The 6 most-significant bytes make up the timestamp in a ULID, which in the hex
dump form pasted there would be the first twelve characters, since each byte is two hex
digits.

[^advanced-debugging]: "adding `dbg!()` statements in the code"

[^actually_not_all]: Upon further review, I discovered that the only methods that were constructing
with little-endian order were the SQLx `decode()` method, and the Serde `visit_seq()` method,
which were also the only ones that were being called at all. The
[`visit_bytes()`](https://gitlab.com/nebkor/ww/-/blob/656e6dceedf0d86e2805e000c9821e931958a920/src/db_id.rs#L152)
and `visit_byte_buf()` methods, that I had thought were so important, were correctly treating
the bytes as big-endian, but were simply never actually used. I fixed that [in the next
commit](https://gitlab.com/nebkor/ww/-/commit/84d70336d39293294fd47b4cf115c70091552c11#ce34dd57be10530addc52a3273548f2b8d3b8a9b).

[^solo-yolo-dev]: I've described my current practices as "solo-yolo", which has its plusses and
minuses, as you may imagine.

[thats_a_database]: ./thats_a_database.png "simpsons that's-a-paddling guy"

[see_the_light]: ./seen_the_light.png "jake blues seeing the light"