diff --git a/content/rnd/a_serialized_mystery/index.md b/content/rnd/a_serialized_mystery/index.md index 168db89..bf3cce0 100644 --- a/content/rnd/a_serialized_mystery/index.md +++ b/content/rnd/a_serialized_mystery/index.md @@ -28,7 +28,7 @@ Most large distributed programs that people interact with daily via HTTP are, in facade for some kind of database. Facebook? That's a database. Gmail? That's a database. ![that's a database][thats_a_database] -
wikipedia? that's a database.
+
wikipedia? that's a database.
In most databases, each entry ("row") has a field that acts as a [primary key](https://en.wikipedia.org/wiki/Primary_key), used to uniquely identify that row inside the table @@ -333,17 +333,104 @@ non-numeric. The solution, though, is simple: just write them out in big-endian order! This was literally a one-line change in the code, to switch from `to_ne_bytes()` ("ne" for "native endian") to -`to_be_bytes()`. +`to_be_bytes()`. I confirmed that the bytes written into were being written in the correct +lexicographic order: -Boom. Sorted. +``` sql +sqlite> select hex(id), username from users order by id asc; +018903CDDCAAB0C6872A4509F396D388|first_user +018903D0E591525EA42202FF461AA5FA|second_user +``` + +Note the first six characters are the same, for these two users created some time apart[^ulid-timestamps]. + +Boom. "Sorted". ## The actual problem -there was no visitor for seqs, which is what json byte arrays are, and what async_sessions was doing. +Except that the logins were still broken; it wasn't just the test. What was even stranger is that +with advanced debugging techniques[^advanced-debugging], I confirmed that the login *was* +working. By which I mean, when the user submitted a login request, the function that handled the +request was: -## Wait, why isn't it broken? + - correctly confirming password match + - retrieving the user from the database -oh, it's not +The second thing was required for the first. It was even creating a session in the session table: + +``` sql +sqlite> select * from async_sessions; +..|..|{"id":"ZY...","expiry":"...","data":{"_user_id":"[1,137,3,205,220,170,176,198,135,42,69,9,243,150,211,136]","_auth_id":"\"oM..."}} +``` + +I noticed that the ID was present in the session entry, but as what looked like an array of decimal +values. The less not-astute among you may have noticed that the session table seemed to be using +JSON to store information. This wasn't my code, but it was easy enough to find the +[culprit](https://github.com/http-rs/async-session/blob/d28cef30c7da38f52639b3d60fc8cf4489c92830/src/session.rs#L214): + +``` rust + pub fn insert(&mut self, key: &str, value: impl Serialize) -> Result<(), serde_json::Error> { + self.insert_raw(key, serde_json::to_string(&value)?); + Ok(()) + } +``` + +This was in the [external library](https://docs.rs/async-session/latest/async_session/) I was using +to provide cookie-based sessions for my web app, and was transitively invoked when I called the +`login()` method in my own code. Someone else was serializing my IDs, in a way I hadn't anticipated! + +The way that Serde decides what code to call is based on its [data +model](https://serde.rs/data-model.html). And wouldn't you know it, the following words are right +there, hiding in plain sight, as they had been all along: + +> When deserializing a data structure from some format, the Deserialize implementation for the data +> structure is responsible for mapping the data structure into the Serde data model by passing to +> the Deserializer a Visitor implementation that can receive the various types of the data model... +> +> [...] +> +> * seq +> - A variably sized heterogeneous sequence of values, for example Vec or HashSet. ... +> +> [...] +> +> The flexibility around mapping into the Serde data model is profound and powerful. When +> implementing Serialize and Deserialize, be aware of the broader context of your type that may make +> the most instinctive mapping not the best choice. + +Well, when you put it that way, I can't help but understand: I needed to implement a `visit_seq()` +method in my deserialization code. + +![fine, fine, I see the light][see_the_light] +
fine, fine, i see the light
+ +You can see that +[here](https://gitlab.com/nebkor/ww/-/blob/656e6dceedf0d86e2805e000c9821e931958a920/src/db_id.rs#L194-216) +if you'd like, but I'll actually come back to it in a second. The important part was that my logins +were working again; time to party! + +## Wait, why *isn't* it broken? + +I'd just spent the day banging my head against this problem, and so when everything worked again, I +committed and pushed the change and signed off. But something was still bothering me, and the next +day, I dove back into it. + + +All my serialization code was calling a method called +[`bytes()`](https://gitlab.com/nebkor/ww/-/blob/656e6dceedf0d86e2805e000c9821e931958a920/src/db_id.rs#L18), +which simply called `another method that would return an array of 16 bytes, in big-endian order, so +it could go into the database and be sortable, as discussed. + +But all[^actually_not_all] my *deserialization* code was constructing the IDs as [though the bytes +were +*little*-endian](https://gitlab.com/nebkor/ww/-/blob/656e6dceedf0d86e2805e000c9821e931958a920/src/db_id.rs#L212). Which +lead me to ask: + +what the fuck? + +Like, everything was *working*. Why did I need to construct from a different byte order? I felt like +I was losing my mind, so I reached out to the [Recurse Center](https://www.recurse.com) community +and presented my case. # Lessons learned @@ -382,5 +469,21 @@ automated tests aren't enough [^confusing-yes]: Is this confusing? Yes, 100%, it is not just you. Don't get discouraged. +[^ulid-timestamps]: The 7 most-significant bytes make up the timestamp in a ULID, which in the hex + dump form pasted there would be the first fourteen characters, since each byte is two hex + digits. + +[^advanced-debugging]: "adding `dbg!()` statements in the code" + +[^actually_not_all]: Upon further review, I discovered that the only methods that were constructing + with little-endian order were the SQLx `decode()` method, and the Serde `visit_seq()` method, + which were also the only ones that were being called at all. The + [`visit_bytes()`](https://gitlab.com/nebkor/ww/-/blob/656e6dceedf0d86e2805e000c9821e931958a920/src/db_id.rs#L152) + and `visit_byte_buf()` methods, that I had thought were so important, were correctly treating + the bytes as big-endian, but were simply never actually used. I fixed [in the next + commit](https://gitlab.com/nebkor/ww/-/commit/84d70336d39293294fd47b4cf115c70091552c11#ce34dd57be10530addc52a3273548f2b8d3b8a9b) + [thats_a_database]: ./thats_a_database.png "that's a database" + +[see_the_light]: ./seen_the_light.png diff --git a/content/rnd/a_serialized_mystery/seen_the_light.png b/content/rnd/a_serialized_mystery/seen_the_light.png new file mode 100644 index 0000000..96a5702 Binary files /dev/null and b/content/rnd/a_serialized_mystery/seen_the_light.png differ