tweakitty-tweak

Joe Ardent 2023-07-30 10:11:25 -07:00
parent f7aa8840bd
commit 5f03effbf3
1 changed files with 42 additions and 38 deletions

@@ -7,10 +7,10 @@ tags = ["software", "sundry", "proclamation", "sqlite", "rust", "ulid", "julid"]
+++ +++
# Presenting Julids # Presenting Julids
Nebcorp Heavy Industries and Sundries, long a world leader in sundries, is proud to present the Nebcorp Heavy Industries and Sundries, long the world leader in sundries, is proud to announce the
official globally unique sortable identifier type for all Nebcorp HIAS', and all Nebcorp companies' public launch of the official identifier type for all Nebcorp companies' assets and database
database entities, [Julids](https://gitlab.com/nebkor/julid). Julids are globally unique sortable entries, [Julids](https://gitlab.com/nebkor/julid). Julids are globally unique sortable identifiers,
identifiers, backwards-compatible with [ULIDs](https://github.com/ulid/spec). backwards-compatible with [ULIDs](https://github.com/ulid/spec), but better.
Inside your Rust program, simply add `julid-rs` to your project's `Cargo.toml` file, and use it Inside your Rust program, simply add `julid-rs` to your project's `Cargo.toml` file, and use it
like: like:
@@ -27,8 +27,8 @@ fn main() {
Such a program would output something like: Such a program would output something like:
``` text ``` text
[main.rs:2] id.created_at() = 2023-07-29T20:21:50.009Z [main.rs:5] id.created_at() = 2023-07-29T20:21:50.009Z
[main.rs:2] id.as_string() = "01H6HN10SS00020YT344XMGA3C" [main.rs:5] id.as_string() = "01H6HN10SS00020YT344XMGA3C"
``` ```
However, it can also be built as a [loadable extension](https://www.sqlite.org/loadext.html) for However, it can also be built as a [loadable extension](https://www.sqlite.org/loadext.html) for
@@ -69,11 +69,11 @@ they do:
* IDs created within the same millisecond are still meant to sort in their order of creation * IDs created within the same millisecond are still meant to sort in their order of creation
Julids and ULIDs have different ways to implement that last piece. If you look at the layout of bits Julids and ULIDs have different ways to implement that last piece. If you look at the layout of bits
in a ULID, they look like this: in a ULID, you see:
![ULID bit structure](./ulid.svg) ![ULID bit structure](./ulid.svg)
According to the ULID spec, for ULIDs created in the same millisecond, the least-significant bit According to the ULID spec, for ULIDs created within the same millisecond, the least-significant bit
should be incremented for each new ID. Since that portion of the ULID is random, that means you may should be incremented for each new ID. Since that portion of the ULID is random, that means you may
not be able to increment it without spilling into the timestamp portion. Likewise, it's easy to not be able to increment it without spilling into the timestamp portion. Likewise, it's easy to
guess a new possibly-valid ULID simply by incrementing an already-known one. And finally, this means guess a new possibly-valid ULID simply by incrementing an already-known one. And finally, this means
@@ -86,18 +86,18 @@ To address these shortcomings, Julids (Joe's ULIDs) have the following structure
As with ULIDs, the 48 most-significant bits encode the time of creation. Unlike ULIDs, the next 16 As with ULIDs, the 48 most-significant bits encode the time of creation. Unlike ULIDs, the next 16
most-significant bits are not random: they're a monotonic counter for IDs created within the same most-significant bits are not random: they're a monotonic counter for IDs created within the same
millisecond[^monotonic]. Since it's only 16 bits, it will saturate after 65,536 intra-millisecond creations, millisecond[^monotonic]. Since it's only 16 bits, it will saturate after 65,536
after which, IDs in that same millisecond will not have an intrinsic total order (the random bits intra-millisecond creations, after which, IDs in that same millisecond will not have an intrinsic
will still be different, so you shouldn't have collisions). My PC, which is no slouch, can only total order (the random bits will still be different, so you shouldn't have collisions). My PC,
generate about 20,000 per millisecond, so hopefully this is not an issue! Because the random bits which is no slouch, can only generate about 20,000 per millisecond, so hopefully this is not an
are always fresh, it's not possible to easily guess a valid Julid if you already have a different issue! Because the random bits are always fresh, it's not possible to easily guess a valid Julid if
valid one. you already have one.
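
To make the layout concrete, here is a minimal sketch of packing and unpacking those fields in a single 128-bit integer. It is not the `julid-rs` implementation: `pack_julid` and the accessor functions are hypothetical names made up for illustration, the random part is passed in rather than generated, and the sketch assumes the random bits are the remaining 64 bits of a 128-bit ID.

``` rust
// Illustrative only: these helpers are hypothetical and are NOT the julid-rs API.
// Assumed layout, from the description above: a 48-bit millisecond timestamp,
// then a 16-bit counter, then 64 random bits (128 bits in total).

const COUNTER_BITS: u32 = 16;
const RANDOM_BITS: u32 = 64;

/// Pack the three fields into one 128-bit value, most-significant field first.
fn pack_julid(timestamp_ms: u64, counter: u16, random: u64) -> u128 {
    // Keep only the low 48 bits of the timestamp.
    let ts = (timestamp_ms & 0xFFFF_FFFF_FFFF) as u128;
    (ts << (COUNTER_BITS + RANDOM_BITS)) | ((counter as u128) << RANDOM_BITS) | random as u128
}

/// The top 48 bits: milliseconds since the epoch.
fn julid_timestamp_ms(id: u128) -> u64 {
    (id >> (COUNTER_BITS + RANDOM_BITS)) as u64
}

/// The next 16 bits: the intra-millisecond counter.
fn julid_counter(id: u128) -> u16 {
    // Casting to u16 truncates, which keeps exactly the low 16 bits.
    (id >> RANDOM_BITS) as u16
}

/// The low 64 bits: the random part.
fn julid_random(id: u128) -> u64 {
    id as u64
}

fn main() {
    let now = 1_690_662_110_009_u64; // an arbitrary millisecond timestamp

    // Two IDs from the same millisecond; the first has the larger random part.
    let first = pack_julid(now, 0, u64::MAX);
    let second = pack_julid(now, 1, 7);

    // The counter sits above the random bits, so creation order is preserved.
    assert!(first < second);

    println!(
        "ts={} counter={} random={}",
        julid_timestamp_ms(second),
        julid_counter(second),
        julid_random(second)
    );
}
```

Because the counter occupies more-significant bits than the random part, comparing two such integers orders IDs first by time, then by counter, regardless of what the random bits are.
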
# How to use # How to use
As mentioned, the Julid crate can be used in two different ways: as a regular Rust library, declared As noted, the Julid crate can be used in two different ways: as a regular Rust library, declared
in your Rust project's `Cargo.toml` file (say, by running `cargo add julid-rs`), and used as also in your Rust project's `Cargo.toml` file (say, by running `cargo add julid-rs`), and used as shown
shown above. There's a rudimentary above. There's a rudimentary
[benchmark](https://gitlab.com/nebkor/julid/-/blob/main/examples/benchmark.rs) example in the repo, [benchmark](https://gitlab.com/nebkor/julid/-/blob/main/examples/benchmark.rs) example in the repo,
which I'll talk more about below. But the primary use case for me was as a loadable SQLite which I'll talk more about below. But the primary use case for me was as a loadable SQLite
extension, as I [previously extension, as I [previously
@@ -158,8 +158,8 @@ create table if not exists watches (
``` ```
and then [some and then [some
code](https://gitlab.com/nebkor/ww/-/blob/main/src/import_utils.rs?ref_type=heads#L92-126) that code](https://gitlab.com/nebkor/ww/-/blob/cc14c30fcfbd6cdaecd85d0ba629154d098b4be9/src/import_utils.rs#L92-126)
inserted rows into that table like that inserted rows into that table like
``` sql ``` sql
insert into watches (kind, title, length, release_date, added_by) values (?,?,?,?,?) insert into watches (kind, title, length, release_date, added_by) values (?,?,?,?,?)
@@ -217,16 +217,10 @@ inside your Rust application, *especially* if you're also loading it as an exten
your application. You'll get a long and confusing runtime panic due to there being multiple your application. You'll get a long and confusing runtime panic due to there being multiple
entrypoints defined with the same name. entrypoints defined with the same name.
## Safety
There is one `unsafe fn` in this project, `sqlite_julid_init()`, and it is only built for the
`plugin` feature. The reason for it is that it's interacting with foreign code (SQLite itself) via
the C interface, which is inherently unsafe. If you are not building the plugin, there is no
`unsafe` code.
# Why Julids? # Why Julids?
The astute may note that this is the third time I've written recently about globally unique sortable The astute may have noticed that this is the third time I've written about globally unique
IDs ([here is part one](/rnd/one-part-serialized-mystery), and [part two is sortable IDs ([here is part one](/rnd/one-part-serialized-mystery), and [part two is
here](/rnd/one-part-serialized-mystery-part-2)). What's, uh... what's up with that? here](/rnd/one-part-serialized-mystery-part-2)). What's, uh... what's up with that?
![marge just thinks they're neat][marge ids] ![marge just thinks they're neat][marge ids]
@@ -244,16 +238,24 @@ Like Marge says, I just think they're neat! I'm not the only one; here are just
the lower 64 bits for that, instead of UUIDv7's 62) the lower 64 bits for that, instead of UUIDv7's 62)
* [Snowflake ID](https://en.wikipedia.org/wiki/Snowflake_ID), developed by Twitter in 2010; these * [Snowflake ID](https://en.wikipedia.org/wiki/Snowflake_ID), developed by Twitter in 2010; these
are 63-bit identifiers (so they fit in a signed 64-bit number), where the top 41 bits are a are 63-bit identifiers (so they fit in a signed 64-bit number), where the top 41 bits are a
millisecond timestamp, the next 10 bits are a machine identifier[^twitter machine count], and the last 12 bits are for an millisecond timestamp, the next 10 bits are a machine identifier[^twitter machine count], and the
intra-millisecond sequence counter (what Julid calls a "monotonic counter") last 12 bits are for an intra-millisecond sequence counter (what Julid calls a "monotonic
counter"); unlike all the other IDs discussed, there are no random bits
and I'm sure the list can go on. and I'm sure the list can go on.
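
For comparison with the Julid layout, the Snowflake field widths listed above (41, 10, and 12 bits) can be sketched the same way. This is only an illustration under those stated widths: `pack_snowflake` and `unpack_snowflake` are hypothetical helpers, not part of any real Snowflake library, and the timestamp is taken as an opaque millisecond count.

``` rust
// Illustrative only: hypothetical helpers, not any real Snowflake library's API.
// Layout as described in the list above: 41-bit millisecond timestamp,
// 10-bit machine identifier, 12-bit sequence counter (63 bits in total).

const TIMESTAMP_BITS: u32 = 41;
const MACHINE_BITS: u32 = 10;
const SEQUENCE_BITS: u32 = 12;

/// A mask with the low `bits` bits set.
fn mask(bits: u32) -> u64 {
    (1u64 << bits) - 1
}

/// Pack the three fields into a signed 64-bit integer.
fn pack_snowflake(timestamp_ms: u64, machine: u16, sequence: u16) -> i64 {
    // Out-of-range inputs are silently truncated to their field width.
    let ts = timestamp_ms & mask(TIMESTAMP_BITS);
    let machine = u64::from(machine) & mask(MACHINE_BITS);
    let sequence = u64::from(sequence) & mask(SEQUENCE_BITS);
    // 41 + 10 + 12 = 63 bits, so the sign bit of the i64 is never set.
    ((ts << (MACHINE_BITS + SEQUENCE_BITS)) | (machine << SEQUENCE_BITS) | sequence) as i64
}

/// Split an ID back into (timestamp, machine, sequence).
fn unpack_snowflake(id: i64) -> (u64, u16, u16) {
    let raw = id as u64;
    let ts = raw >> (MACHINE_BITS + SEQUENCE_BITS);
    let machine = ((raw >> SEQUENCE_BITS) & mask(MACHINE_BITS)) as u16;
    let sequence = (raw & mask(SEQUENCE_BITS)) as u16;
    (ts, machine, sequence)
}

fn main() {
    let id = pack_snowflake(123_456_789, 3, 17);
    assert!(id >= 0);
    assert_eq!(unpack_snowflake(id), (123_456_789, 3, 17));
    println!("{id}");
}
```
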
As for what I wanted them for, I wanted to use them in my Rust and SQLite-based [web I wanted to use them in my SQLite-backed [web app](https://gitlab.com/nebkor/ww), in order to fix
app](https://gitlab.com/nebkor/ww), in order to fix some deficiencies in ULIDs, as discussed. Now I some deficiencies in ULIDs and the way I was using them, as [I said
have no unshaved yaks to distract me from getting back to that. before](/rnd/one-part-serialized-mystery-part-2/#next-steps-with-ids):
So, is this the last I'll time I'll be writing at length about these things? It's hard to say for > [...] it bothers me that ID generation is not done inside the database itself. Aside from being
> a generally bad idea, this lead to at least one frustrating debug session where I was inserting
> one ID but reporting back another. SQLite doesn't have native support for this, but it does have
> good native support for loading shared libraries as plugins in order to add functionality to it,
> and so my next step is to write one of those, and remove the ID generation logic from the
> application.
So, is this the last time I'll be writing at length about these things? It's hard to say for
sure, but signs point to "yes". I hope you've found them at least a little interesting! sure, but signs point to "yes". I hope you've found them at least a little interesting!
# Thanks # Thanks
@@ -267,12 +269,14 @@ hours. Thank you, authors of those crates! Feel free to steal from this project!
---- ----
[^monotonic]: At least, they will still have a total order if they're all generated within the same [^monotonic]: At least, they will still have a total order if they're all generated within the same
process in the same way; the crate and extension use an atomic u64 to ensure that IDs generated process in the same way; the code uses a [64-bit atomic
within the same millisecond have incremented counters, but that atomic counter is not global, so integer](https://gitlab.com/nebkor/julid/-/blob/2484d5156bde82a91dcc106410ed56ee0a5c1e07/src/julid.rs#L11-12)
calling `Julid::new()` in Rust and `select julid_new()` in SQLite will not be aware of each to ensure that IDs generated within the same millisecond have incremented counters, but that
other's counters. atomic counter is not global; calling `Julid::new()` in Rust and `select julid_new()` in SQLite
will not be aware of each other's counters. I just make sure to only generate them inside the
DB.
[^my computer]: According to the output of `lscpu`, my computer is an "AMD Ryzen 9 3900X 12-Core [^my computer]: According to the output of `lscpu`, my computer has an "AMD Ryzen 9 3900X 12-Core
Processor", running between 2.2 and 4.6 GHz. It's no slouch! Processor", running between 2.2 and 4.6 GHz. It's no slouch!
[^twitter machine count]: There are only ten bits for the machine ID, which means there are only [^twitter machine count]: There are only ten bits for the machine ID, which means there are only