checkpoint; recurse
parent
64d5597802
commit
a30f2188ca
2 changed files with 109 additions and 6 deletions
|
@ -28,7 +28,7 @@ Most large distributed programs that people interact with daily via HTTP are, in
|
||||||
facade for some kind of database. Facebook? That's a database. Gmail? That's a database.
|
facade for some kind of database. Facebook? That's a database. Gmail? That's a database.
|
||||||
|
|
||||||
![that's a database][thats_a_database]
|
![that's a database][thats_a_database]
|
||||||
<center><span class="caption">wikipedia? that's a database.</span></center>
|
<div class="caption">wikipedia? that's a database.</div>
|
||||||
|
|
||||||
In most databases, each entry ("row") has a field that acts as a [primary
|
In most databases, each entry ("row") has a field that acts as a [primary
|
||||||
key](https://en.wikipedia.org/wiki/Primary_key), used to uniquely identify that row inside the table
|
key](https://en.wikipedia.org/wiki/Primary_key), used to uniquely identify that row inside the table
|
||||||
|
@ -333,17 +333,104 @@ non-numeric.
|
||||||
|
|
||||||
The solution, though, is simple: just write them out in big-endian order! This was literally a
|
The solution, though, is simple: just write them out in big-endian order! This was literally a
|
||||||
one-line change in the code, to switch from `to_ne_bytes()` ("ne" for "native endian") to
|
one-line change in the code, to switch from `to_ne_bytes()` ("ne" for "native endian") to
|
||||||
`to_be_bytes()`.
|
`to_be_bytes()`. I confirmed that the bytes were being written into the database in the correct
|
||||||
|
lexicographic order:
|
||||||
|
|
||||||
Boom. Sorted.
|
``` sql
|
||||||
|
sqlite> select hex(id), username from users order by id asc;
|
||||||
|
018903CDDCAAB0C6872A4509F396D388|first_user
|
||||||
|
018903D0E591525EA42202FF461AA5FA|second_user
|
||||||
|
```
|
||||||
|
|
||||||
|
Note the first six characters are the same, for these two users created some time apart[^ulid-timestamps].
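To make the sorting claim concrete, here's a minimal sketch using only the standard library. The two real IDs are copied from the query output above; the values `1` and `256` and the helper name `sort_key` are made up for illustration and are not from the actual code.

```rust
/// Hypothetical helper: the big-endian byte form of an ID, as it would be stored.
fn sort_key(id: u128) -> [u8; 16] {
    id.to_be_bytes()
}

fn main() {
    // The two IDs from the query output above.
    let first_user: u128 = 0x018903CDDCAAB0C6872A4509F396D388;
    let second_user: u128 = 0x018903D0E591525EA42202FF461AA5FA;

    // Byte arrays compare lexicographically, so big-endian keys sort
    // in the same order as the numbers they encode.
    assert!(first_user < second_user);
    assert!(sort_key(first_user) < sort_key(second_user));

    // Made-up counterexample: little-endian bytes do not preserve numeric order.
    let (small, big) = (1u128, 256u128);
    assert!(small.to_le_bytes() > big.to_le_bytes());
    assert!(sort_key(small) < sort_key(big));

    println!("big-endian keys sort like the numbers they encode");
}
```

The little-endian case is the one that bites: the least-significant byte comes first, so a bytewise comparison looks at the wrong end of the number.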
|
||||||
|
|
||||||
|
Boom. "Sorted".
|
||||||
|
|
||||||
## The actual problem
|
## The actual problem
|
||||||
|
|
||||||
there was no visitor for seqs, which is what json byte arrays are, and what async_sessions was doing.
|
Except that the logins were still broken; it wasn't just the test. What was even stranger is that
|
||||||
|
with advanced debugging techniques[^advanced-debugging], I confirmed that the login *was*
|
||||||
|
working. By which I mean, when the user submitted a login request, the function that handled the
|
||||||
|
request was:
|
||||||
|
|
||||||
## Wait, why isn't it broken?
|
- correctly confirming password match
|
||||||
|
- retrieving the user from the database
|
||||||
|
|
||||||
oh, it's not
|
The second thing was required for the first. It was even creating a session in the session table:
|
||||||
|
|
||||||
|
``` sql
|
||||||
|
sqlite> select * from async_sessions;
|
||||||
|
..|..|{"id":"ZY...","expiry":"...","data":{"_user_id":"[1,137,3,205,220,170,176,198,135,42,69,9,243,150,211,136]","_auth_id":"\"oM..."}}
|
||||||
|
```
|
||||||
|
|
||||||
|
I noticed that the ID was present in the session entry, but as what looked like an array of decimal
|
||||||
|
values. The less not-astute among you may have noticed that the session table seemed to be using
|
||||||
|
JSON to store information. This wasn't my code, but it was easy enough to find the
|
||||||
|
[culprit](https://github.com/http-rs/async-session/blob/d28cef30c7da38f52639b3d60fc8cf4489c92830/src/session.rs#L214):
|
||||||
|
|
||||||
|
``` rust
|
||||||
|
pub fn insert(&mut self, key: &str, value: impl Serialize) -> Result<(), serde_json::Error> {
|
||||||
|
self.insert_raw(key, serde_json::to_string(&value)?);
|
||||||
|
Ok(())
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
This was in the [external library](https://docs.rs/async-session/latest/async_session/) I was using
|
||||||
|
to provide cookie-based sessions for my web app, and was transitively invoked when I called the
|
||||||
|
`login()` method in my own code. Someone else was serializing my IDs, in a way I hadn't anticipated!
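As a sanity check, the decimal array in that session row is the same sixteen bytes as the hex ID from earlier, just spelled differently. The sketch below uses only the standard library; `parse_byte_list` and `to_hex` are hypothetical helpers written for this illustration. It is a hand-rolled parser, not Serde and not how the session library reads its data.

```rust
/// Hypothetical helper: parse a string like "[1,137,3]" into exactly 16 bytes.
/// Returns None if the brackets are missing, a value is not a u8,
/// or the count is not 16.
fn parse_byte_list(s: &str) -> Option<[u8; 16]> {
    let inner = s.trim().strip_prefix('[')?.strip_suffix(']')?;
    let mut out = [0u8; 16];
    let mut count = 0;
    for part in inner.split(',') {
        if count == 16 {
            return None; // too many elements
        }
        out[count] = part.trim().parse::<u8>().ok()?;
        count += 1;
    }
    if count == 16 { Some(out) } else { None }
}

/// Hypothetical helper: uppercase hex, two digits per byte, like SQLite's hex().
fn to_hex(bytes: &[u8]) -> String {
    bytes.iter().map(|b| format!("{:02X}", b)).collect()
}

fn main() {
    // The `_user_id` value from the session row above.
    let from_session = "[1,137,3,205,220,170,176,198,135,42,69,9,243,150,211,136]";
    let bytes = parse_byte_list(from_session).expect("16 decimal bytes");
    // prints 018903CDDCAAB0C6872A4509F396D388
    println!("{}", to_hex(&bytes));
}
```

That matches `first_user`'s ID in the `users` table, so the bytes themselves survived the trip into JSON; only their representation changed, from a blob to a sequence of numbers.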
|
||||||
|
|
||||||
|
The way that Serde decides what code to call is based on its [data
|
||||||
|
model](https://serde.rs/data-model.html). And wouldn't you know it, the following words are right
|
||||||
|
there, hiding in plain sight, as they had been all along:
|
||||||
|
|
||||||
|
> When deserializing a data structure from some format, the Deserialize implementation for the data
|
||||||
|
> structure is responsible for mapping the data structure into the Serde data model by passing to
|
||||||
|
> the Deserializer a Visitor implementation that can receive the various types of the data model...
|
||||||
|
>
|
||||||
|
> [...]
|
||||||
|
>
|
||||||
|
> * seq
|
||||||
|
> - A variably sized heterogeneous sequence of values, for example Vec<T> or HashSet<T>. ...
|
||||||
|
>
|
||||||
|
> [...]
|
||||||
|
>
|
||||||
|
> The flexibility around mapping into the Serde data model is profound and powerful. When
|
||||||
|
> implementing Serialize and Deserialize, be aware of the broader context of your type that may make
|
||||||
|
> the most instinctive mapping not the best choice.
|
||||||
|
|
||||||
|
Well, when you put it that way, I can't help but understand: I needed to implement a `visit_seq()`
|
||||||
|
method in my deserialization code.
|
||||||
|
|
||||||
|
![fine, fine, I see the light][see_the_light]
|
||||||
|
<div class="caption">fine, fine, i see the light</div>
|
||||||
|
|
||||||
|
You can see that
|
||||||
|
[here](https://gitlab.com/nebkor/ww/-/blob/656e6dceedf0d86e2805e000c9821e931958a920/src/db_id.rs#L194-216)
|
||||||
|
if you'd like, but I'll actually come back to it in a second. The important part was that my logins
|
||||||
|
were working again; time to party!
|
||||||
|
|
||||||
|
## Wait, why *isn't* it broken?
|
||||||
|
|
||||||
|
I'd just spent the day banging my head against this problem, and so when everything worked again, I
|
||||||
|
committed and pushed the change and signed off. But something was still bothering me, and the next
|
||||||
|
day, I dove back into it.
|
||||||
|
|
||||||
|
|
||||||
|
All my serialization code was calling a method called
|
||||||
|
[`bytes()`](https://gitlab.com/nebkor/ww/-/blob/656e6dceedf0d86e2805e000c9821e931958a920/src/db_id.rs#L18),
|
||||||
|
which simply called another method that would return an array of 16 bytes, in big-endian order, so
|
||||||
|
it could go into the database and be sortable, as discussed.
|
||||||
|
|
||||||
|
But all[^actually_not_all] my *deserialization* code was constructing the IDs as [though the bytes
|
||||||
|
were
|
||||||
|
*little*-endian](https://gitlab.com/nebkor/ww/-/blob/656e6dceedf0d86e2805e000c9821e931958a920/src/db_id.rs#L212). Which
|
||||||
|
led me to ask:
|
||||||
|
|
||||||
|
what the fuck?
|
||||||
|
|
||||||
|
Like, everything was *working*. Why did I need to construct from a different byte order? I felt like
|
||||||
|
I was losing my mind, so I reached out to the [Recurse Center](https://www.recurse.com) community
|
||||||
|
and presented my case.
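For a sense of why this looked so wrong, here's a minimal standard-library sketch of the mismatch in isolation. The function names `store` and `read_back_le` are made up for this example, and it says nothing about why the real application kept working; it only shows what writing big-endian and reading little-endian does to a bare `u128`.

```rust
/// Hypothetical stand-in for the serialization side: big-endian bytes.
fn store(id: u128) -> [u8; 16] {
    id.to_be_bytes()
}

/// Hypothetical stand-in for a deserializer that assumes little-endian input.
fn read_back_le(stored: [u8; 16]) -> u128 {
    u128::from_le_bytes(stored)
}

fn main() {
    // first_user's ID from the earlier query output.
    let id: u128 = 0x018903CDDCAAB0C6872A4509F396D388;
    let stored = store(id);

    // Reading with the matching byte order round-trips.
    assert_eq!(u128::from_be_bytes(stored), id);

    // Reading with the opposite byte order yields the byte-reversed value.
    let mismatched = read_back_le(stored);
    assert_eq!(mismatched, id.swap_bytes());
    assert_ne!(mismatched, id);

    println!("wrote {:032X}", id);
    println!("read  {:032X}", mismatched);
}
```

On plain integers, then, mixing the two byte orders does not round-trip, which is what made the working logins so baffling.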
|
||||||
|
|
||||||
# Lessons learned
|
# Lessons learned
|
||||||
|
|
||||||
|
@ -382,5 +469,21 @@ automated tests aren't enough
|
||||||
|
|
||||||
[^confusing-yes]: Is this confusing? Yes, 100%, it is not just you. Don't get discouraged.
|
[^confusing-yes]: Is this confusing? Yes, 100%, it is not just you. Don't get discouraged.
|
||||||
|
|
||||||
|
[^ulid-timestamps]: The 6 most-significant bytes make up the timestamp in a ULID, which in the hex
|
||||||
|
dump form pasted there would be the first twelve characters, since each byte is two hex
|
||||||
|
digits.
|
||||||
|
|
||||||
|
[^advanced-debugging]: "adding `dbg!()` statements in the code"
|
||||||
|
|
||||||
|
[^actually_not_all]: Upon further review, I discovered that the only methods that were constructing
|
||||||
|
with little-endian order were the SQLx `decode()` method, and the Serde `visit_seq()` method,
|
||||||
|
which were also the only ones that were being called at all. The
|
||||||
|
[`visit_bytes()`](https://gitlab.com/nebkor/ww/-/blob/656e6dceedf0d86e2805e000c9821e931958a920/src/db_id.rs#L152)
|
||||||
|
and `visit_byte_buf()` methods, that I had thought were so important, were correctly treating
|
||||||
|
the bytes as big-endian, but were simply never actually used. I fixed that [in the next
|
||||||
|
commit](https://gitlab.com/nebkor/ww/-/commit/84d70336d39293294fd47b4cf115c70091552c11#ce34dd57be10530addc52a3273548f2b8d3b8a9b).
|
||||||
|
|
||||||
|
|
||||||
[thats_a_database]: ./thats_a_database.png "that's a database"
|
[thats_a_database]: ./thats_a_database.png "that's a database"
|
||||||
|
|
||||||
|
[see_the_light]: ./seen_the_light.png
|
||||||
|
|
BIN
content/rnd/a_serialized_mystery/seen_the_light.png
Normal file
Binary file not shown.
After Width: | Height: | Size: 330 KiB |