maybe ready to post
This commit is contained in:
parent
62e0ccaaa8
commit
f7aa8840bd
6 changed files with 289 additions and 4 deletions
|
@ -2,9 +2,9 @@
|
||||||
title = "A One-Part Serialized Mystery"
|
title = "A One-Part Serialized Mystery"
|
||||||
slug = "one-part-serialized-mystery"
|
slug = "one-part-serialized-mystery"
|
||||||
date = "2023-06-29"
|
date = "2023-06-29"
|
||||||
updated = "2023-06-29"
|
updated = "2023-07-29"
|
||||||
[taxonomies]
|
[taxonomies]
|
||||||
tags = ["software", "rnd", "proclamation", "upscm", "rust"]
|
tags = ["software", "rnd", "proclamation", "upscm", "rust", "ulid", "sqlite"]
|
||||||
+++
|
+++
|
||||||
|
|
||||||
# *Mise en Scene*
|
# *Mise en Scene*
|
||||||
|
|
|
@ -2,9 +2,9 @@
|
||||||
title = "A One-Part Serialized Mystery, Part 2: The Benchmarks"
|
title = "A One-Part Serialized Mystery, Part 2: The Benchmarks"
|
||||||
slug = "one-part-serialized-mystery-part-2"
|
slug = "one-part-serialized-mystery-part-2"
|
||||||
date = "2023-07-15"
|
date = "2023-07-15"
|
||||||
updated = "2023-07-21"
|
updated = "2023-07-29"
|
||||||
[taxonomies]
|
[taxonomies]
|
||||||
tags = ["software", "rnd", "proclamation", "upscm", "rust", "sqlite"]
|
tags = ["software", "rnd", "proclamation", "upscm", "rust", "sqlite", "ulid"]
|
||||||
+++
|
+++
|
||||||
|
|
||||||
# A one-part serial mystery post-hoc prequel
|
# A one-part serial mystery post-hoc prequel
|
||||||
|
|
283
content/sundries/presenting-julids/index.md
Normal file
|
@ -0,0 +1,283 @@
|
||||||
|
+++
|
||||||
|
title = "Presenting Julids: another fine sundry, by Nebcorp Heavy Industries and Sundries"
|
||||||
|
slug = "presenting-julids"
|
||||||
|
date = "2023-07-29"
|
||||||
|
[taxonomies]
|
||||||
|
tags = ["software", "sundry", "proclamation", "sqlite", "rust", "ulid", "julid"]
|
||||||
|
+++
|
||||||
|
|
||||||
|
# Presenting Julids
|
||||||
|
Nebcorp Heavy Industries and Sundries, long a world leader in sundries, is proud to present the
|
||||||
|
official globally unique sortable identifier type for all Nebcorp HIAS', and all Nebcorp companies'
|
||||||
|
database entities, [Julids](https://gitlab.com/nebkor/julid). Julids are globally unique sortable
|
||||||
|
identifiers, backwards-compatible with [ULIDs](https://github.com/ulid/spec).
|
||||||
|
|
||||||
|
Inside your Rust program, simply add `julid-rs` to your project's `Cargo.toml` file, and use it
|
||||||
|
like:
|
||||||
|
|
||||||
|
``` rust
|
||||||
|
use julid::Julid;
|
||||||
|
|
||||||
|
fn main() {
|
||||||
|
let id = Julid::new();
|
||||||
|
dbg!(id.created_at(), id.as_string());
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
Such a program would output something like:
|
||||||
|
|
||||||
|
``` text
|
||||||
|
[main.rs:2] id.created_at() = 2023-07-29T20:21:50.009Z
|
||||||
|
[main.rs:2] id.as_string() = "01H6HN10SS00020YT344XMGA3C"
|
||||||
|
```
|
||||||
|
|
||||||
|
However, it can also be built as a [loadable extension](https://www.sqlite.org/loadext.html) for
|
||||||
|
SQLite, adding database functions for creating and querying Julids:
|
||||||
|
|
||||||
|
``` text
|
||||||
|
$ sqlite3
|
||||||
|
SQLite version 3.40.1 2022-12-28 14:03:47
|
||||||
|
Enter ".help" for usage hints.
|
||||||
|
Connected to a transient in-memory database.
|
||||||
|
Use ".open FILENAME" to reopen on a persistent database.
|
||||||
|
sqlite> .load ./libjulid
|
||||||
|
sqlite> select hex(julid_new());
|
||||||
|
018998768ACF000060B31DB175E0C5F9
|
||||||
|
sqlite> select julid_string(julid_new());
|
||||||
|
01H6C7D9CT00009TF3EXXJHX4Y
|
||||||
|
sqlite> select julid_seconds(julid_new());
|
||||||
|
1690480066.208
|
||||||
|
sqlite> select datetime(julid_timestamp(julid_new()), 'auto');
|
||||||
|
2023-07-27 17:47:50
|
||||||
|
sqlite> select julid_counter(julid_new());
|
||||||
|
0
|
||||||
|
```
|
||||||
|
|
||||||
|
## Julids vs ULIDs
|
||||||
|
|
||||||
|
Julids are a drop-in replacement for ULIDs; all Julids are valid ULIDs, but not all ULIDs are valid Julids.
|
||||||
|
|
||||||
|
Given their compatibility relationship, Julids and ULIDs must have quite a bit in common, and indeed
|
||||||
|
they do:
|
||||||
|
|
||||||
|
* they are 128 bits long
|
||||||
|
* they are lexicographically sortable
|
||||||
|
* they encode their creation time as the number of milliseconds since the [UNIX
|
||||||
|
epoch](https://en.wikipedia.org/wiki/Unix_time)
|
||||||
|
* their string representation is a 26-character [base-32
|
||||||
|
Crockford](https://en.wikipedia.org/wiki/Base32) encoding of their big-endian bytes
|
||||||
|
* IDs created within the same millisecond are still meant to sort in their order of creation
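
To make the string form concrete, here is a minimal, std-only sketch of a 26-character Crockford base-32 encoding of a 128-bit value. The `encode` function and `ALPHABET` constant are names made up for this illustration; this is not the `julid-rs` or `ulid-rs` implementation. Because the alphabet is in ascending ASCII order, comparing the strings gives the same order as comparing the underlying numbers.

``` rust
/// Crockford base-32 alphabet: digits, then uppercase letters without I, L, O, U.
const ALPHABET: &[u8; 32] = b"0123456789ABCDEFGHJKMNPQRSTVWXYZ";

/// Encode a 128-bit ID as 26 characters, most-significant 5-bit group first.
/// 26 * 5 = 130 bits, so the first character only carries the top 3 bits.
fn encode(id: u128) -> String {
    let mut out = [0u8; 26];
    let mut rest = id;
    // Fill from the right: each step peels off the lowest 5 bits.
    for slot in out.iter_mut().rev() {
        *slot = ALPHABET[(rest & 0x1f) as usize];
        rest >>= 5;
    }
    String::from_utf8(out.to_vec()).expect("alphabet is ASCII")
}

fn main() {
    // Hypothetical pair of IDs: same 48-bit timestamp, counter 0 and counter 1.
    let earlier: u128 = 0x018998768ACF_0000_60B31DB175E0C5F9;
    let later: u128 = earlier + (1u128 << 64);
    println!("{}", encode(earlier));
    println!("{}", encode(later));
    // Numeric order and string order agree.
    assert!(encode(earlier) < encode(later));
}
```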
|
||||||
|
|
||||||
|
Julids and ULIDs have different ways to implement that last piece. If you look at the layout of bits
|
||||||
|
in a ULID, they look like this:
|
||||||
|
|
||||||
|
![ULID bit structure](./ulid.svg)
|
||||||
|
|
||||||
|
According to the ULID spec, for ULIDs created in the same millisecond, the least-significant bit
|
||||||
|
should be incremented for each new ID. Since that portion of the ULID is random, that means you may
|
||||||
|
not be able to increment it without spilling into the timestamp portion. Likewise, it's easy to
|
||||||
|
guess a new possibly-valid ULID simply by incrementing an already-known one. And finally, this means
|
||||||
|
that sorting will need to read all the way to the end of the ULID for IDs created in the same
|
||||||
|
millisecond.
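
A small sketch of the first two problems, treating a ULID as a plain `u128` with the 80 random bits at the bottom. The names `RANDOM_MASK` and `next_in_same_ms` are invented for this example and do not come from any ULID library.

``` rust
/// Mask covering the 80 low-order (random) bits of a ULID.
const RANDOM_MASK: u128 = (1u128 << 80) - 1;

/// Same-millisecond successor in the style described above: add one to the
/// random portion. Returns `None` when that portion is already all ones,
/// because the carry would otherwise spill into the 48-bit timestamp.
fn next_in_same_ms(ulid: u128) -> Option<u128> {
    if (ulid & RANDOM_MASK) == RANDOM_MASK {
        None
    } else {
        Some(ulid + 1)
    }
}

fn main() {
    // A made-up ULID value: timestamp 7, small random part.
    let known: u128 = (7u128 << 80) | 41;
    // Anyone holding `known` can guess its same-millisecond neighbor.
    println!("{:?}", next_in_same_ms(known));
    // With an unlucky, all-ones random part there is no room left to increment.
    let unlucky: u128 = (7u128 << 80) | RANDOM_MASK;
    println!("{:?}", next_in_same_ms(unlucky));
}
```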
|
||||||
|
|
||||||
|
To address these shortcomings, Julids (Joe's ULIDs) have the following structure:
|
||||||
|
|
||||||
|
![Julid bit structure](./julid.svg)
|
||||||
|
|
||||||
|
As with ULIDs, the 48 most-significant bits encode the time of creation. Unlike ULIDs, the next 16
|
||||||
|
most-significant bits are not random: they're a monotonic counter for IDs created within the same
|
||||||
|
millisecond[^monotonic]. Since it's only 16 bits, it will saturate after 65,536 intra-millisecond creations,
|
||||||
|
after which, IDs in that same millisecond will not have an intrinsic total order (the random bits
|
||||||
|
will still be different, so you shouldn't have collisions). My PC, which is no slouch, can only
|
||||||
|
generate about 20,000 per millisecond, so hopefully this is not an issue! Because the random bits
|
||||||
|
are always fresh, it's not possible to easily guess a valid Julid if you already have a different
|
||||||
|
valid one.
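
The layout and counter behavior described here can be sketched with nothing but the standard library. This is an illustration under assumptions, not the crate's code: `Generator`, `advance`, and `counter` are hypothetical names, and the clock reading and random bits are passed in as arguments so the example stays deterministic.

``` rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical generator state: the last issued 64-bit prefix, laid out as
/// (48-bit millisecond timestamp << 16) | 16-bit counter.
struct Generator {
    last: AtomicU64,
}

/// Next prefix given the previous one and the current time in milliseconds.
fn advance(prev: u64, now_ms: u64) -> u64 {
    if (prev >> 16) != now_ms {
        now_ms << 16 // new millisecond: counter restarts at zero
    } else if (prev & 0xffff) == 0xffff {
        prev // saturated: the counter stays at 65,535
    } else {
        prev + 1 // same millisecond: bump the counter
    }
}

impl Generator {
    fn new() -> Self {
        // u64::MAX is a sentinel that matches no realistic timestamp.
        Generator { last: AtomicU64::new(u64::MAX) }
    }

    /// `now_ms` (assumed to fit in 48 bits) and `random` are passed in; a real
    /// generator would read a clock and a random number generator instead.
    fn next(&self, now_ms: u64, random: u64) -> u128 {
        let prev = self
            .last
            .fetch_update(Ordering::SeqCst, Ordering::SeqCst, |p| Some(advance(p, now_ms)))
            .expect("closure always returns Some");
        let prefix = advance(prev, now_ms);
        (u128::from(prefix) << 64) | u128::from(random)
    }
}

/// The 16-bit counter sits just below the 48-bit timestamp.
fn counter(id: u128) -> u16 {
    (id >> 64) as u16
}

fn main() {
    let ids = Generator::new();
    let a = ids.next(1_690_480_052_943, u64::MAX);
    let b = ids.next(1_690_480_052_943, 0);
    // Same millisecond: the counter, not the random bits, decides the order.
    assert!(a < b);
    println!("{} then {}", counter(a), counter(b));
}
```

Keeping the timestamp and counter together in one atomic word means a single compare-and-swap loop covers both the "new millisecond" and "same millisecond" cases.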
|
||||||
|
|
||||||
|
# How to use
|
||||||
|
|
||||||
|
As mentioned, the Julid crate can be used in two different ways: as a regular Rust library, declared
|
||||||
|
in your Rust project's `Cargo.toml` file (say, by running `cargo add julid-rs`), and used as
|
||||||
|
shown above. There's a rudimentary
|
||||||
|
[benchmark](https://gitlab.com/nebkor/julid/-/blob/main/examples/benchmark.rs) example in the repo,
|
||||||
|
which I'll talk more about below. But the primary use case for me was as a loadable SQLite
|
||||||
|
extension, as I [previously
|
||||||
|
wrote](/rnd/one-part-serialized-mystery-part-2/#next-steps-with-ids). Both are covered in the
|
||||||
|
[documentation](https://docs.rs/julid-rs/latest/julid/), but let's go over them here, starting with
|
||||||
|
the extension.
|
||||||
|
|
||||||
|
## Inside SQLite as a loadable extension
|
||||||
|
|
||||||
|
The extension, when loaded into SQLite, provides the following functions:
|
||||||
|
|
||||||
|
* `julid_new()`: create a new Julid and return it as a 16-byte
|
||||||
|
[blob](https://www.sqlite.org/datatype3.html#storage_classes_and_datatypes)
|
||||||
|
* `julid_seconds(julid)`: get the number of seconds (as a 64-bit float) since the UNIX epoch that this
|
||||||
|
julid was created
|
||||||
|
* `julid_counter(julid)`: show the value of this julid's monotonic counter
|
||||||
|
* `julid_sortable(julid)`: return the 64-bit concatenation of the timestamp and counter
|
||||||
|
* `julid_string(julid)`: show the [base-32 Crockford](https://en.wikipedia.org/wiki/Base32)
|
||||||
|
encoding of this julid; the raw bytes won't be valid UTF-8, so use this or the built-in `hex()`
|
||||||
|
function to `select` a human-readable representation
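
For intuition about what these functions pull out of the 16-byte blob, here is a rough Rust equivalent that assumes only the layout described earlier (48-bit timestamp, 16-bit counter, 64 random bits, big-endian). The helper names `fields`, `seconds`, and `sortable` are made up for this sketch and are not the extension's source.

``` rust
/// Split a 16-byte big-endian blob into (milliseconds, counter, random bits).
fn fields(blob: [u8; 16]) -> (u64, u16, u64) {
    let id = u128::from_be_bytes(blob);
    ((id >> 80) as u64, (id >> 64) as u16, id as u64)
}

/// Seconds since the UNIX epoch as a float, in the spirit of `julid_seconds`.
fn seconds(ms: u64) -> f64 {
    ms as f64 / 1000.0
}

/// Timestamp and counter together: the top 64 bits, as `julid_sortable` is described.
fn sortable(blob: [u8; 16]) -> u64 {
    (u128::from_be_bytes(blob) >> 64) as u64
}

fn main() {
    // The value whose hex() output appears in the SQLite session near the top.
    let blob = 0x018998768ACF_0000_60B31DB175E0C5F9_u128.to_be_bytes();
    let (ms, counter, random) = fields(blob);
    println!("seconds: {:.3}", seconds(ms));
    println!("counter: {counter}");
    println!("random:  {random:016X}");
    println!("sortable: {:016X}", sortable(blob));
}
```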
|
||||||
|
|
||||||
|
### Building and loading
|
||||||
|
|
||||||
|
If you want to use it as a SQLite extension:
|
||||||
|
|
||||||
|
* clone the [repo](https://gitlab.com/nebkor/julid)
|
||||||
|
* build it with `cargo build --features plugin` (this builds the SQLite extension)
|
||||||
|
* copy the resulting `libjulid.[so|dylib|whatevs]` to some place where you can...
|
||||||
|
* load it into SQLite with `.load /path/to/libjulid` as shown at the top
|
||||||
|
* party
|
||||||
|
|
||||||
|
If you, like me, wish to use Julids as primary keys, just create your table like:
|
||||||
|
|
||||||
|
``` sql
|
||||||
|
create table users (
|
||||||
|
id blob not null primary key default (julid_new()),
|
||||||
|
...
|
||||||
|
);
|
||||||
|
```
|
||||||
|
|
||||||
|
and you've got a first-class ticket straight to Julid City, baby!
|
||||||
|
|
||||||
|
For a table created like:
|
||||||
|
|
||||||
|
``` sql
|
||||||
|
-- table of things to watch
|
||||||
|
create table if not exists watches (
|
||||||
|
id blob not null primary key default (julid_new()),
|
||||||
|
kind int not null, -- enum for movie or tv show or whatev
|
||||||
|
title text not null,
|
||||||
|
metadata_url text, -- possible url for imdb or other metadata-esque site to show the user
|
||||||
|
length int,
|
||||||
|
release_date int,
|
||||||
|
added_by blob not null, -- ID of the user that added it
|
||||||
|
last_updated int not null default (unixepoch()),
|
||||||
|
foreign key (added_by) references users (id)
|
||||||
|
);
|
||||||
|
```
|
||||||
|
|
||||||
|
and then [some
|
||||||
|
code](https://gitlab.com/nebkor/ww/-/blob/main/src/import_utils.rs?ref_type=heads#L92-126) that
|
||||||
|
inserted rows into that table like
|
||||||
|
|
||||||
|
``` sql
|
||||||
|
insert into watches (kind, title, length, release_date, added_by) values (?,?,?,?,?)
|
||||||
|
```
|
||||||
|
|
||||||
|
where the wildcards get bound in a loop with unique values and the Julid `id` field is
|
||||||
|
generated by the extension for each row, I get over 100,000 insertions/second.
|
||||||
|
|
||||||
|
## Inside a Rust program
|
||||||
|
|
||||||
|
Of course, you can also use it outside of a database; the `Julid` type is publicly exported. There's
|
||||||
|
a simple benchmark in the examples folder of the repo, the important parts of which look like:
|
||||||
|
|
||||||
|
``` rust
|
||||||
|
use std::time::Instant;

use julid::Julid;
|
||||||
|
|
||||||
|
fn main() {
|
||||||
|
[....]
|
||||||
|
let start = Instant::now();
|
||||||
|
for _ in 0..num {
|
||||||
|
v.push(Julid::new());
|
||||||
|
}
|
||||||
|
let end = Instant::now();
|
||||||
|
let dur = (end - start).as_micros();
|
||||||
|
|
||||||
|
for id in v.iter() {
|
||||||
|
eprintln!(
|
||||||
|
"{id}: created_at {}; counter: {}; sortable: {}",
|
||||||
|
id.created_at(),
|
||||||
|
id.counter(),
|
||||||
|
id.sortable()
|
||||||
|
);
|
||||||
|
}
|
||||||
|
println!("{num} Julids generated in {dur}us");
}
|
||||||
|
```
|
||||||
|
|
||||||
|
If you were to run it on a computer like mine[^my computer], you might see something like this:
|
||||||
|
|
||||||
|
``` text
|
||||||
|
$ cargo run --example=benchmark --release -- -n 30000 2> /dev/null
|
||||||
|
30000 Julids generated in 1240us
|
||||||
|
```
|
||||||
|
|
||||||
|
That's about 24,000 IDs/millisecond; 24 *MILLION* per second!
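
The arithmetic behind that figure, as a tiny sketch (`ids_per_ms` is a name invented here): 30,000 IDs in 1,240 microseconds is 30,000 × 1,000 / 1,240 IDs per millisecond, which is comfortably below the 65,536 per millisecond that the 16-bit counter can distinguish.

``` rust
/// IDs per millisecond, given a count and an elapsed time in microseconds.
fn ids_per_ms(count: u64, micros: u64) -> f64 {
    count as f64 * 1000.0 / micros as f64
}

fn main() {
    let per_ms = ids_per_ms(30_000, 1_240);
    println!(
        "{per_ms:.0} per millisecond, {:.1} million per second",
        per_ms / 1000.0
    );
    // The counter has 16 bits, so this rate does not saturate it.
    assert!(per_ms < 65_536.0);
}
```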
|
||||||
|
|
||||||
|
The default optional Cargo features include implementations of traits for getting Julids into and
|
||||||
|
out of SQLite via [SQLx](https://github.com/launchbadge/sqlx), and for generally
|
||||||
|
serializing/deserializing with [Serde](https://serde.rs/), via the `sqlx` and `serde` features,
|
||||||
|
respectively. One final default optional feature, `chrono`, uses the Chrono crate to return the
|
||||||
|
timestamp as a [`DateTime`](https://docs.rs/chrono/latest/chrono/struct.DateTime.html) by adding a
|
||||||
|
`created_at(&self)` method to `Julid`.
|
||||||
|
|
||||||
|
Something to note: don't enable the `plugin` feature in your Cargo.toml if you're using this crate
|
||||||
|
inside your Rust application, *especially* if you're also loading it as an extension in SQLite in
|
||||||
|
your application. You'll get a long and confusing runtime panic due to there being multiple
|
||||||
|
entrypoints defined with the same name.
|
||||||
|
|
||||||
|
## Safety
|
||||||
|
There is one `unsafe fn` in this project, `sqlite_julid_init()`, and it is only built for the
|
||||||
|
`plugin` feature. The reason for it is that it's interacting with foreign code (SQLite itself) via
|
||||||
|
the C interface, which is inherently unsafe. If you are not building the plugin, there is no
|
||||||
|
`unsafe` code.
|
||||||
|
|
||||||
|
# Why Julids?
|
||||||
|
|
||||||
|
The astute may note that this is the third time I've written recently about globally unique sortable
|
||||||
|
IDs ([here is part one](/rnd/one-part-serialized-mystery), and [part two is
|
||||||
|
here](/rnd/one-part-serialized-mystery-part-2)). What's, uh... what's up with that?
|
||||||
|
|
||||||
|
![marge just thinks they're neat][marge ids]
|
||||||
|
<div class = "caption">we both just think they're neat</div>
|
||||||
|
|
||||||
|
Like Marge says, I just think they're neat! I'm not the only one; here are just some alternatives:
|
||||||
|
|
||||||
|
* Segment's [KSUID](https://segment.com/blog/a-brief-history-of-the-uuid/), released in 2017. This
|
||||||
|
was possibly my first exposure to this idea. They're 32 bits larger than UUIDs or ULIDs, but
|
||||||
|
otherwise very similar to ULIDs (and hence Julids)
|
||||||
|
* [ULIDs](https://github.com/ulid/spec), as previously discussed at length
|
||||||
|
* [UUIDv7](https://www.ietf.org/archive/id/draft-peabody-dispatch-new-uuid-format-01.html#name-uuidv7-layout-and-bit-order);
|
||||||
|
these are *very* similar to Julids; the primary difference is that the lower 62 bits are left up
|
||||||
|
to the implementation, rather than always containing pseudorandom bits as in Julids (which use
|
||||||
|
the lower 64 bits for that, instead of UUIDv7's 62)
|
||||||
|
* [Snowflake ID](https://en.wikipedia.org/wiki/Snowflake_ID), developed by Twitter in 2010; these
|
||||||
|
are 63-bit identifiers (so they fit in a signed 64-bit number), where the top 41 bits are a
|
||||||
|
millisecond timestamp, the next 10 bits are a machine identifier[^twitter machine count], and the last 12 bits are for an
|
||||||
|
intra-millisecond sequence counter (what Julid calls a "monotonic counter")
|
||||||
|
|
||||||
|
and I'm sure the list can go on.
|
||||||
|
|
||||||
|
As for what I wanted them for, I wanted to use them in my Rust and SQLite-based [web
|
||||||
|
app](https://gitlab.com/nebkor/ww), in order to fix some deficiencies in ULIDs, as discussed. Now I
|
||||||
|
have no unshaved yaks to distract me from getting back to that.
|
||||||
|
|
||||||
|
So, is this the last time I'll be writing at length about these things? It's hard to say for
|
||||||
|
sure, but signs point to "yes". I hope you've found them at least a little interesting!
|
||||||
|
|
||||||
|
# Thanks
|
||||||
|
|
||||||
|
This crate wouldn't have been possible without a lot of inspiration (and a little shameless
|
||||||
|
stealing) from the [ulid-rs](https://github.com/dylanhart/ulid-rs) crate. For the loadable
|
||||||
|
extension, the [sqlite-loadable-rs](https://github.com/asg017/sqlite-loadable-rs) crate made it
|
||||||
|
*extremely* easy to write; what I thought would take a couple days instead took a couple
|
||||||
|
hours. Thank you, authors of those crates! Feel free to steal from this project!
|
||||||
|
|
||||||
|
----
|
||||||
|
|
||||||
|
[^monotonic]: At least, they will still have a total order if they're all generated within the same
|
||||||
|
process in the same way; the crate and extension use an atomic u64 to ensure that IDs generated
|
||||||
|
within the same millisecond have incremented counters, but that atomic counter is not global, so
|
||||||
|
calling `Julid::new()` in Rust and `select julid_new()` in SQLite will not be aware of each
|
||||||
|
others' counters.
|
||||||
|
|
||||||
|
[^my computer]: According to the output of `lscpu`, my computer is an "AMD Ryzen 9 3900X 12-Core
|
||||||
|
Processor", running between 2.2 and 4.6 GHz. It's no slouch!
|
||||||
|
|
||||||
|
[^twitter machine count]: There are only ten bits for the machine ID, which means there are only
|
||||||
|
1,024 possible machine IDs; did Twitter only have a thousand machines in production? Maybe only
|
||||||
|
a thousand at a time, so you could use the timestamp to look up what machine any given 10-bit ID
|
||||||
|
referred to?
|
||||||
|
|
||||||
|
[marge ids]: ./marge_thinks_theyre_neat.png "marge simpson holding a potato labeled 'globally unique sortable identifiers'"
|
1
content/sundries/presenting-julids/julid.svg
Normal file
File diff suppressed because one or more lines are too long
After Width: | Height: | Size: 15 KiB |
BIN
content/sundries/presenting-julids/marge_thinks_theyre_neat.png
Normal file
Binary file not shown.
After Width: | Height: | Size: 264 KiB |
1
content/sundries/presenting-julids/ulid.svg
Normal file
File diff suppressed because one or more lines are too long
After Width: | Height: | Size: 15 KiB |