checkpoint; recurse

This commit is contained in:
Joe Ardent 2023-06-28 16:51:56 -07:00
parent 64d5597802
commit a30f2188ca
2 changed files with 109 additions and 6 deletions

View file

@ -28,7 +28,7 @@ Most large distributed programs that people interact with daily via HTTP are, in
facade for some kind of database. Facebook? That's a database. Gmail? That's a database.
![that's a database][thats_a_database]
<center><span class="caption">wikipedia? that's a database.</span></center>
<div class="caption">wikipedia? that's a database.</div>
In most databases, each entry ("row") has a field that acts as a [primary
key](https://en.wikipedia.org/wiki/Primary_key), used to uniquely identify that row inside the table
@ -333,17 +333,104 @@ non-numeric.
The solution, though, is simple: just write them out in big-endian order! This was literally a
one-line change in the code, to switch from `to_ne_bytes()` ("ne" for "native endian") to
`to_be_bytes()`.
`to_be_bytes()`. I confirmed that the bytes written into were being written in the correct
lexicographic order:
Boom. Sorted.
``` sql
sqlite> select hex(id), username from users order by id asc;
018903CDDCAAB0C6872A4509F396D388|first_user
018903D0E591525EA42202FF461AA5FA|second_user
```
Note the first six characters are the same, for these two users created some time apart[^ulid-timestamps].
Boom. "Sorted".
## The actual problem
there was no visitor for seqs, which is what json byte arrays are, and what async_sessions was doing.
Except that the logins were still broken; it wasn't just the test. What was even stranger is that
with advanced debugging techniques[^advanced-debugging], I confirmed that the login *was*
working. By which I mean, when the user submitted a login request, the function that handled the
request was:
## Wait, why isn't it broken?
- correctly confirming password match
- retrieving the user from the database
oh, it's not
The second thing was required for the first. It was even creating a session in the session table:
``` sql
sqlite> select * from async_sessions;
..|..|{"id":"ZY...","expiry":"...","data":{"_user_id":"[1,137,3,205,220,170,176,198,135,42,69,9,243,150,211,136]","_auth_id":"\"oM..."}}
```
I noticed that the ID was present in the session entry, but as what looked like an array of decimal
values. The less not-astute among you may have noticed that the session table seemed to be using
JSON to store information. This wasn't my code, but it was easy enough to find the
[culprit](https://github.com/http-rs/async-session/blob/d28cef30c7da38f52639b3d60fc8cf4489c92830/src/session.rs#L214):
``` rust
pub fn insert(&mut self, key: &str, value: impl Serialize) -> Result<(), serde_json::Error> {
self.insert_raw(key, serde_json::to_string(&value)?);
Ok(())
}
```
This was in the [external library](https://docs.rs/async-session/latest/async_session/) I was using
to provide cookie-based sessions for my web app, and was transitively invoked when I called the
`login()` method in my own code. Someone else was serializing my IDs, in a way I hadn't anticipated!
The way that Serde decides what code to call is based on its [data
model](https://serde.rs/data-model.html). And wouldn't you know it, the following words are right
there, hiding in plain sight, as they had been all along:
> When deserializing a data structure from some format, the Deserialize implementation for the data
> structure is responsible for mapping the data structure into the Serde data model by passing to
> the Deserializer a Visitor implementation that can receive the various types of the data model...
>
> [...]
>
> * seq
> - A variably sized heterogeneous sequence of values, for example Vec<T> or HashSet<T>. ...
>
> [...]
>
> The flexibility around mapping into the Serde data model is profound and powerful. When
> implementing Serialize and Deserialize, be aware of the broader context of your type that may make
> the most instinctive mapping not the best choice.
Well, when you put it that way, I can't help but understand: I needed to implement a `visit_seq()`
method in my deserialization code.
![fine, fine, I see the light][see_the_light]
<div class = "caption">fine, fine, i see the light</div>
You can see that
[here](https://gitlab.com/nebkor/ww/-/blob/656e6dceedf0d86e2805e000c9821e931958a920/src/db_id.rs#L194-216)
if you'd like, but I'll actually come back to it in a second. The important part was that my logins
were working again; time to party!
## Wait, why *isn't* it broken?
I'd just spent the day banging my head against this problem, and so when everything worked again, I
committed and pushed the change and signed off. But something was still bothering me, and the next
day, I dove back into it.
All my serialization code was calling a method called
[`bytes()`](https://gitlab.com/nebkor/ww/-/blob/656e6dceedf0d86e2805e000c9821e931958a920/src/db_id.rs#L18),
which simply called `another method that would return an array of 16 bytes, in big-endian order, so
it could go into the database and be sortable, as discussed.
But all[^actually_not_all] my *deserialization* code was constructing the IDs as [though the bytes
were
*little*-endian](https://gitlab.com/nebkor/ww/-/blob/656e6dceedf0d86e2805e000c9821e931958a920/src/db_id.rs#L212). Which
lead me to ask:
what the fuck?
Like, everything was *working*. Why did I need to construct from a different byte order? I felt like
I was losing my mind, so I reached out to the [Recurse Center](https://www.recurse.com) community
and presented my case.
# Lessons learned
@ -382,5 +469,21 @@ automated tests aren't enough
[^confusing-yes]: Is this confusing? Yes, 100%, it is not just you. Don't get discouraged.
[^ulid-timestamps]: The 7 most-significant bytes make up the timestamp in a ULID, which in the hex
dump form pasted there would be the first fourteen characters, since each byte is two hex
digits.
[^advanced-debugging]: "adding `dbg!()` statements in the code"
[^actually_not_all]: Upon further review, I discovered that the only methods that were constructing
with little-endian order were the SQLx `decode()` method, and the Serde `visit_seq()` method,
which were also the only ones that were being called at all. The
[`visit_bytes()`](https://gitlab.com/nebkor/ww/-/blob/656e6dceedf0d86e2805e000c9821e931958a920/src/db_id.rs#L152)
and `visit_byte_buf()` methods, that I had thought were so important, were correctly treating
the bytes as big-endian, but were simply never actually used. I fixed [in the next
commit](https://gitlab.com/nebkor/ww/-/commit/84d70336d39293294fd47b4cf115c70091552c11#ce34dd57be10530addc52a3273548f2b8d3b8a9b)
[thats_a_database]: ./thats_a_database.png "that's a database"
[see_the_light]: ./seen_the_light.png

Binary file not shown.

After

Width:  |  Height:  |  Size: 330 KiB