tweakitty-tweak
parent
f7aa8840bd
commit
5f03effbf3
1 changed files with 42 additions and 38 deletions
|
@@ -7,10 +7,10 @@ tags = ["software", "sundry", "proclamation", "sqlite", "rust", "ulid", "julid"]
|
||||||
+++
|
+++
|
||||||
|
|
||||||
# Presenting Julids
|
# Presenting Julids
|
||||||
Nebcorp Heavy Industries and Sundries, long a world leader in sundries, is proud to present the
|
Nebcorp Heavy Industries and Sundries, long the world leader in sundries, is proud to announce the
|
||||||
official globally unique sortable identifier type for all Nebcorp HIAS', and all Nebcorp companies'
|
public launch of the official identifier type for all Nebcorp companies' assets and database
|
||||||
database entities, [Julids](https://gitlab.com/nebkor/julid). Julids are globally unique sortable
|
entries, [Julids](https://gitlab.com/nebkor/julid). Julids are globally unique sortable identifiers,
|
||||||
identifiers, backwards-compatible with [ULIDs](https://github.com/ulid/spec).
|
backwards-compatible with [ULIDs](https://github.com/ulid/spec), but better.
|
||||||
|
|
||||||
Inside your Rust program, simply add `julid-rs` to your project's `Cargo.toml` file, and use it
|
Inside your Rust program, simply add `julid-rs` to your project's `Cargo.toml` file, and use it
|
||||||
like:
|
like:
|
||||||
|
@@ -27,8 +27,8 @@ fn main() {
|
||||||
Such a program would output something like:
|
Such a program would output something like:
|
||||||
|
|
||||||
``` text
|
``` text
|
||||||
[main.rs:2] id.created_at() = 2023-07-29T20:21:50.009Z
|
[main.rs:5] id.created_at() = 2023-07-29T20:21:50.009Z
|
||||||
[main.rs:2] id.as_string() = "01H6HN10SS00020YT344XMGA3C"
|
[main.rs:5] id.as_string() = "01H6HN10SS00020YT344XMGA3C"
|
||||||
```
|
```
|
||||||
|
|
||||||
However, it can also be built as a [loadable extension](https://www.sqlite.org/loadext.html) for
|
However, it can also be built as a [loadable extension](https://www.sqlite.org/loadext.html) for
|
||||||
|
@@ -69,11 +69,11 @@ they do:
|
||||||
* IDs created within the same millisecond are still meant to sort in their order of creation
|
* IDs created within the same millisecond are still meant to sort in their order of creation
|
||||||
|
|
||||||
Julids and ULIDs have different ways to implement that last piece. If you look at the layout of bits
|
Julids and ULIDs have different ways to implement that last piece. If you look at the layout of bits
|
||||||
in a ULID, they look like this:
|
in a ULID, you see:
|
||||||
|
|
||||||
![ULID bit structure](./ulid.svg)
|
![ULID bit structure](./ulid.svg)
|
||||||
|
|
||||||
According to the ULID spec, for ULIDs created in the same millisecond, the least-significant bit
|
According to the ULID spec, for ULIDs created within the same millisecond, the least-significant bit
|
||||||
should be incremented for each new ID. Since that portion of the ULID is random, that means you may
|
should be incremented for each new ID. Since that portion of the ULID is random, that means you may
|
||||||
not be able to increment it without spilling into the timestamp portion. Likewise, it's easy to
|
not be able to increment it without spilling into the timestamp portion. Likewise, it's easy to
|
||||||
guess a new possibly-valid ULID simply by incrementing an already-known one. And finally, this means
|
guess a new possibly-valid ULID simply by incrementing an already-known one. And finally, this means
|
||||||
|
@@ -86,18 +86,18 @@ To address these shortcomings, Julids (Joe's ULIDs) have the following structure
|
||||||
|
|
||||||
As with ULIDs, the 48 most-significant bits encode the time of creation. Unlike ULIDs, the next 16
|
As with ULIDs, the 48 most-significant bits encode the time of creation. Unlike ULIDs, the next 16
|
||||||
most-significant bits are not random: they're a monotonic counter for IDs created within the same
|
most-significant bits are not random: they're a monotonic counter for IDs created within the same
|
||||||
millisecond[^monotonic]. Since it's only 16 bits, it will saturate after 65,536 IDs intra-millisecond creations,
|
millisecond[^monotonic]. Since it's only 16 bits, it will saturate after 65,536
|
||||||
after which, IDs in that same millisecond will not have an intrinsic total order (the random bits
|
intra-millisecond creations, after which, IDs in that same millisecond will not have an intrinsic
|
||||||
will still be different, so you shouldn't have collisions). My PC, which is no slouch, can only
|
total order (the random bits will still be different, so you shouldn't have collisions). My PC,
|
||||||
generate about 20,000 per millisecond, so hopefully this is not an issue! Because the random bits
|
which is no slouch, can only generate about 20,000 per millisecond, so hopefully this is not an
|
||||||
are always fresh, it's not possible to easily guess a valid Julid if you already have a different
|
issue! Because the random bits are always fresh, it's not possible to easily guess a valid Julid if
|
||||||
valid one.
|
you already have one.
|
||||||
|
|
||||||
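As a rough illustration of the layout described above (48 timestamp bits, then a 16-bit counter, then 64 random bits), here is a minimal sketch of packing and unpacking those fields in a `u128`. The names `pack` and `unpack` are hypothetical and are not taken from the `julid-rs` API; the crate's real implementation may differ.

```rust
// Sketch only: `pack` and `unpack` are hypothetical helpers, not julid-rs functions.
const COUNTER_BITS: u32 = 16;
const RANDOM_BITS: u32 = 64;

/// Pack a millisecond timestamp, an intra-millisecond counter, and random bits
/// into a single 128-bit value, most-significant field first.
fn pack(timestamp_ms: u64, counter: u16, random: u64) -> u128 {
    // Only the low 48 bits of the timestamp fit in the layout.
    let ts = (timestamp_ms & 0xFFFF_FFFF_FFFF) as u128;
    (ts << (COUNTER_BITS + RANDOM_BITS)) | ((counter as u128) << RANDOM_BITS) | (random as u128)
}

/// Split a 128-bit value back into (timestamp, counter, random bits).
fn unpack(id: u128) -> (u64, u16, u64) {
    let ts = (id >> (COUNTER_BITS + RANDOM_BITS)) as u64;
    let counter = (id >> RANDOM_BITS) as u16; // truncating cast keeps the low 16 bits
    let random = id as u64; // truncating cast keeps the low 64 bits
    (ts, counter, random)
}
```

Because the counter sits above the random bits, two values with the same timestamp compare by counter first, so fresh random bits never disturb the creation order within a millisecond.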
# How to use
|
# How to use
|
||||||
|
|
||||||
As mentioned, the Julid crate can be used in two different ways: as a regular Rust library, declared
|
As noted, the Julid crate can be used in two different ways: as a regular Rust library, declared
|
||||||
in your Rust project's `Cargo.toml` file (say, by running `cargo add julid-rs`), and used as also
|
in your Rust project's `Cargo.toml` file (say, by running `cargo add julid-rs`), and used as shown
|
||||||
shown above. There's a rudimentary
|
above. There's a rudimentary
|
||||||
[benchmark](https://gitlab.com/nebkor/julid/-/blob/main/examples/benchmark.rs) example in the repo,
|
[benchmark](https://gitlab.com/nebkor/julid/-/blob/main/examples/benchmark.rs) example in the repo,
|
||||||
which I'll talk more about below. But the primary use case for me was as a loadable SQLite
|
which I'll talk more about below. But the primary use case for me was as a loadable SQLite
|
||||||
extension, as I [previously
|
extension, as I [previously
|
||||||
|
@@ -158,8 +158,8 @@ create table if not exists watches (
|
||||||
```
|
```
|
||||||
|
|
||||||
and then [some
|
and then [some
|
||||||
code](https://gitlab.com/nebkor/ww/-/blob/main/src/import_utils.rs?ref_type=heads#L92-126) that
|
code](https://gitlab.com/nebkor/ww/-/blob/cc14c30fcfbd6cdaecd85d0ba629154d098b4be9/src/import_utils.rs#L92-126)
|
||||||
inserted rows into that table like
|
that inserted rows into that table like
|
||||||
|
|
||||||
``` sql
|
``` sql
|
||||||
insert into watches (kind, title, length, release_date, added_by) values (?,?,?,?,?)
|
insert into watches (kind, title, length, release_date, added_by) values (?,?,?,?,?)
|
||||||
|
@@ -217,16 +217,10 @@ inside your Rust application, *especially* if you're also loading it as an exten
|
||||||
your application. You'll get a long and confusing runtime panic due to there being multiple
|
your application. You'll get a long and confusing runtime panic due to there being multiple
|
||||||
entrypoints defined with the same name.
|
entrypoints defined with the same name.
|
||||||
|
|
||||||
## Safety
|
|
||||||
There is one `unsafe fn` in this project, `sqlite_julid_init()`, and it is only built for the
|
|
||||||
`plugin` feature. The reason for it is that it's interacting with foreign code (SQLite itself) via
|
|
||||||
the C interface, which is inherently unsafe. If you are not building the plugin, there is no
|
|
||||||
`unsafe` code.
|
|
||||||
|
|
||||||
# Why Julids?
|
# Why Julids?
|
||||||
|
|
||||||
The astute may note that this is the third time I've written recently about globally unique sortable
|
The astute may have noticed that this is the third time I've written about globally unique
|
||||||
IDs ([here is part one](/rnd/one-part-serialized-mystery), and [part two is
|
sortable IDs ([here is part one](/rnd/one-part-serialized-mystery), and [part two is
|
||||||
here](/rnd/one-part-serialized-mystery-part-2)). What's, uh... what's up with that?
|
here](/rnd/one-part-serialized-mystery-part-2)). What's, uh... what's up with that?
|
||||||
|
|
||||||
![marge just thinks they're neat][marge ids]
|
![marge just thinks they're neat][marge ids]
|
||||||
|
@@ -244,16 +238,24 @@ Like Marge says, I just think they're neat! I'm not the only one; here are just
|
||||||
the lower 64 bits for that, instead of UUIDv7's 62)
|
the lower 64 bits for that, instead of UUIDv7's 62)
|
||||||
* [Snowflake ID](https://en.wikipedia.org/wiki/Snowflake_ID), developed by Twitter in 2010; these
|
* [Snowflake ID](https://en.wikipedia.org/wiki/Snowflake_ID), developed by Twitter in 2010; these
|
||||||
are 63-bit identifiers (so they fit in a signed 64-bit number), where the top 41 bits are a
|
are 63-bit identifiers (so they fit in a signed 64-bit number), where the top 41 bits are a
|
||||||
millisecond timestamp, the next 10 bits are a machine identifier[^twitter machine count], and the last 12 bits are for an
|
millisecond timestamp, the next 10 bits are a machine identifier[^twitter machine count], and the
|
||||||
intra-millisecond sequence counter (what Julid calls a "monotonic counter")
|
last 12 bits are for an intra-millisecond sequence counter (what Julid calls a "monotonic
|
||||||
|
counter"); unlike all the other IDs discussed, there are no random bits
|
||||||
|
|
||||||
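For comparison, the Snowflake layout in the list above (41 timestamp bits, 10 machine bits, 12 sequence bits) can be sketched the same way. The function name `snowflake` is hypothetical, and the timestamp is assumed to be milliseconds since whatever epoch the deployment picks; this is not Twitter's code.

```rust
// Sketch only: `snowflake` is a hypothetical helper illustrating the bit layout.
/// Build a 63-bit Snowflake-style ID that fits in a non-negative i64.
fn snowflake(timestamp_ms: u64, machine: u16, sequence: u16) -> i64 {
    let ts = timestamp_ms & ((1 << 41) - 1); // keep the low 41 bits
    let machine = (machine as u64) & 0x3FF; // keep the low 10 bits
    let sequence = (sequence as u64) & 0xFFF; // keep the low 12 bits
    // 41 + 10 + 12 = 63 bits, so the sign bit is always zero.
    ((ts << 22) | (machine << 12) | sequence) as i64
}
```

Every bit here is determined by the time, the machine, and the sequence counter, which is the "no random bits" property noted in the list.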
and I'm sure the list can go on.
|
and I'm sure the list can go on.
|
||||||
|
|
||||||
As for what I wanted them for, I wanted to use them in my Rust and SQLite-based [web
|
I wanted to use them in my SQLite-backed [web app](https://gitlab.com/nebkor/ww), in order to fix
|
||||||
app](https://gitlab.com/nebkor/ww), in order to fix some deficiencies in ULIDs, as discussed. Now I
|
some deficiencies in ULIDs and the way I was using them, as [I said
|
||||||
have no unshaved yaks to distract me from getting back to that.
|
before](/rnd/one-part-serialized-mystery-part-2/#next-steps-with-ids):
|
||||||
|
|
||||||
So, is this the last I'll time I'll be writing at length about these things? It's hard to say for
|
> [...] it bothers me that ID generation is not done inside the database itself. Aside from being
|
||||||
|
> a generally bad idea, this led to at least one frustrating debug session where I was inserting
|
||||||
|
> one ID but reporting back another. SQLite doesn't have native support for this, but it does have
|
||||||
|
> good native support for loading shared libraries as plugins in order to add functionality to it,
|
||||||
|
> and so my next step is to write one of those, and remove the ID generation logic from the
|
||||||
|
> application.
|
||||||
|
|
||||||
|
So, is this the last time I'll be writing at length about these things? It's hard to say for
|
||||||
sure, but signs point to "yes". I hope you've found them at least a little interesting!
|
sure, but signs point to "yes". I hope you've found them at least a little interesting!
|
||||||
|
|
||||||
# Thanks
|
# Thanks
|
||||||
|
@@ -267,12 +269,14 @@ hours. Thank you, authors of those crates! Feel free to steal from this project!
|
||||||
----
|
----
|
||||||
|
|
||||||
[^monotonic]: At least, they will still have a total order if they're all generated within the same
|
[^monotonic]: At least, they will still have a total order if they're all generated within the same
|
||||||
process in the same way; the crate and extension use an atomic u64 to ensure that IDs generated
|
process in the same way; the code uses a [64-bit atomic
|
||||||
within the same millisecond have incremented counters, but that atomic counter is not global, so
|
integer](https://gitlab.com/nebkor/julid/-/blob/2484d5156bde82a91dcc106410ed56ee0a5c1e07/src/julid.rs#L11-12)
|
||||||
calling `Julid::new()` in Rust and `select julid_new()` in SQLite will not be aware of each
|
to ensure that IDs generated within the same millisecond have incremented counters, but that
|
||||||
others' counters.
|
atomic counter is not global; calling `Julid::new()` in Rust and `select julid_new()` in SQLite
|
||||||
|
will not be aware of each other's counters. I just make sure to only generate them inside the
|
||||||
|
DB.
|
||||||
|
|
||||||
[^my computer]: According to the output of `lscpu`, my computer is an "AMD Ryzen 9 3900X 12-Core
|
[^my computer]: According to the output of `lscpu`, my computer has an "AMD Ryzen 9 3900X 12-Core
|
||||||
Processor", running between 2.2 and 4.6 GHz. It's no slouch!
|
Processor", running between 2.2 and 4.6 GHz. It's no slouch!
|
||||||
|
|
||||||
[^twitter machine count]: There are only ten bits for the machine ID, which means there are only
|
[^twitter machine count]: There are only ten bits for the machine ID, which means there are only
|
||||||
|
|