# Bottom line up front
Julids are globally unique, sortable identifiers that are backwards-compatible with
[ULIDs](https://github.com/ulid/spec). This crate provides a Rust Julid datatype, as well as a
loadable extension for SQLite for creating and querying them:

``` text
$ sqlite3
SQLite version 3.40.1 2022-12-28 14:03:47
Enter ".help" for usage hints.
Connected to a transient in-memory database.
Use ".open FILENAME" to reopen on a persistent database.
sqlite> .load ./libjulid
sqlite> select hex(julid_new());
018998768ACF000060B31DB175E0C5F9
sqlite> select julid_string(julid_new());
01H6C7D9CT00009TF3EXXJHX4Y
sqlite> select julid_seconds(julid_new());
1690480066.208
sqlite> select datetime(julid_timestamp(julid_new()), 'auto');
2023-07-27 17:47:50
sqlite> select julid_counter(julid_new());
0
sqlite> select julid_string();
01HM4WJ7T90001P8SN9898FBTN
```

Crates.io: <https://crates.io/crates/julid-rs>

Docs.rs: <https://docs.rs/julid-rs/latest/julid/>

Blog post: <https://proclamations.nebcorp-hias.com/sundries/presenting-julids/>

## A slightly deeper look

Julids are a drop-in replacement for ULIDs: all Julids are valid ULIDs, but not all ULIDs are valid Julids.

Given their compatibility relationship, Julids and ULIDs must have quite a bit in common, and indeed
they do:

* they are 128 bits long
* they are lexicographically sortable
* they encode their creation time as the number of milliseconds since the [UNIX
  epoch](https://en.wikipedia.org/wiki/Unix_time) in their top 48 bits
* their string representation is a 26-character [base-32
  Crockford](https://en.wikipedia.org/wiki/Base32) encoding of their big-endian bytes
* IDs created within the same millisecond are still meant to sort in their order of creation
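
To make the string representation concrete, here is a minimal, std-only sketch of Crockford base-32 encoding and decoding for a 128-bit ID. It is not the crate's implementation; `encode` and `decode` are names made up for illustration, and the decoder is simplified (it accepts lowercase, but does not handle the `I`/`L`/`O` aliases that Crockford's scheme allows).

```rust
/// Crockford base-32 alphabet: digits and uppercase letters, minus I, L, O, and U.
const ALPHABET: &[u8; 32] = b"0123456789ABCDEFGHJKMNPQRSTVWXYZ";

/// Encode a 128-bit ID as 26 characters, most-significant digit first.
fn encode(id: u128) -> String {
    let mut out = String::with_capacity(26);
    for i in (0..26).rev() {
        // Each character carries 5 bits; the first one only ever holds the top 3.
        let digit = ((id >> (5 * i)) & 0x1f) as usize;
        out.push(ALPHABET[digit] as char);
    }
    out
}

/// Decode a 26-character string back into a 128-bit ID.
fn decode(s: &str) -> Option<u128> {
    if s.len() != 26 {
        return None;
    }
    let mut id: u128 = 0;
    for (i, c) in s.bytes().enumerate() {
        let upper = c.to_ascii_uppercase();
        let digit = ALPHABET.iter().position(|&a| a == upper)? as u128;
        // 26 * 5 = 130 bits, so the first character may only use its low 3 bits.
        if i == 0 && digit > 7 {
            return None;
        }
        id = (id << 5) | digit;
    }
    Some(id)
}

fn main() {
    // ID taken from the `julid-gen -d` example in this README.
    let s = "01H9DYT48E0012VX89PYX4HDKP";
    let id = decode(s).expect("valid Crockford base-32");
    // The top 48 bits are milliseconds since the UNIX epoch.
    let millis = (id >> 80) as u64;
    println!("{s} -> {id:032X}, created at {millis} ms");
    assert_eq!(encode(id), s);
}
```

Because the encoding is fixed-width and the alphabet is in ascending ASCII order, comparing two encoded strings gives the same result as comparing the underlying 128-bit values.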

Julids and ULIDs have different ways to implement that last piece. If you look at the layout of bits
in a ULID, you see:

![ULID bit structure](./ulid.svg)

According to the ULID spec, for ULIDs created in the same millisecond, the least-significant bit
should be incremented for each new ID. Since that portion of the ULID is random, you may not be able
to increment it without spilling into the timestamp portion. It also makes it easy to guess a new
possibly-valid ULID simply by incrementing an already-known one. And finally, it means that sorting
will need to read all the way to the end of the ULID for IDs created in the same millisecond.
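
As a rough illustration of those three problems, here is a sketch of the increment-by-one scheme on a plain `u128`. The function name `next_in_same_millisecond` is hypothetical and this is not code from any ULID library; it only assumes the layout described above (48 bits of timestamp, then 80 random bits).

```rust
/// Mask for the low 80 bits, the random portion of a ULID.
const RANDOM_MASK: u128 = (1 << 80) - 1;

/// Produce the next ULID within the same millisecond by adding one to the
/// random portion. Returns `None` if those 80 bits are already all ones,
/// since carrying any further would alter the timestamp.
fn next_in_same_millisecond(prev: u128) -> Option<u128> {
    if (prev & RANDOM_MASK) == RANDOM_MASK {
        None
    } else {
        Some(prev + 1)
    }
}

fn main() {
    // Any known ID will do; this one is the hex blob from the top of this README.
    let known: u128 = 0x018998768ACF_0000_60B31DB175E0C5F9;

    // Guessing the "next" ID takes nothing more than adding one.
    let guess = next_in_same_millisecond(known).expect("random bits not exhausted");

    // The two IDs share their first 15 bytes, so a byte-wise comparison has to
    // read to the very end to order them.
    assert_eq!(&known.to_be_bytes()[..15], &guess.to_be_bytes()[..15]);
    println!("{known:032X}\n{guess:032X}");
}
```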

To address these shortcomings, Julids (Joe's ULIDs) have the following structure:

![Julid bit structure](./julid.svg)

As with ULIDs, the 48 most-significant bits encode the time of creation. Unlike ULIDs, the next 16
most-significant bits are not random; they're a monotonic counter for IDs created within the same
millisecond. Since it's only 16 bits, it will saturate after 65,536 intra-millisecond creations,
after which IDs in that same millisecond will not have an intrinsic total order (the random bits
will still be different, so you shouldn't have collisions). My PC, which is no slouch, can only
generate about 20,000 per millisecond, so hopefully this is not an issue! Because the random bits
are always fresh, it's not possible to easily guess a valid Julid if you already know one.
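
The layout can be sketched with a few shifts and masks. This is an illustration under the assumptions stated in the paragraph above, not the crate's actual code: the function names are made up, the random bits are passed in rather than generated, and a clock that runs backwards is not handled.

```rust
/// Pack the three fields into a 128-bit value: 48 bits of timestamp,
/// 16 bits of counter, 64 bits of randomness.
fn pack(millis: u64, counter: u16, random: u64) -> u128 {
    let millis = (millis as u128) & 0xFFFF_FFFF_FFFF; // keep only the low 48 bits
    (millis << 80) | ((counter as u128) << 64) | (random as u128)
}

fn timestamp(id: u128) -> u64 {
    (id >> 80) as u64
}

fn counter(id: u128) -> u16 {
    (id >> 64) as u16
}

fn random(id: u128) -> u64 {
    id as u64
}

/// Hypothetical generator step: bump the counter if the previous ID came from
/// the same millisecond, otherwise start again at zero. The counter saturates
/// at its maximum value instead of wrapping around.
fn next(prev: u128, now_millis: u64, fresh_random: u64) -> u128 {
    let count = if timestamp(prev) == now_millis {
        counter(prev).saturating_add(1)
    } else {
        0
    };
    pack(now_millis, count, fresh_random)
}

fn main() {
    // Blob taken from the `hex(julid_new())` example at the top of this README.
    let id: u128 = 0x018998768ACF_0000_60B31DB175E0C5F9;
    println!(
        "timestamp: {} ms, counter: {}, random: {:#018x}",
        timestamp(id),
        counter(id),
        random(id)
    );

    // A second ID in the same millisecond bumps the counter, so it sorts after
    // the first no matter what the fresh random bits are.
    let later = next(id, timestamp(id), 0);
    assert!(later > id);
    assert_eq!(counter(later), 1);
}
```

Once the counter is saturated, two IDs from the same millisecond share their top 64 bits, and their relative order is decided by the random bits alone.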

# How to use

The Julid crate can be used in two different ways. The first is as a regular Rust library, declared
in your Rust project's `Cargo.toml` file (say, by running `cargo add julid-rs`); there's a
rudimentary [benchmark](https://gitlab.com/nebkor/julid/-/blob/main/examples/benchmark.rs) example
in the repo that shows off most of the Rust API. The second, and the primary use case for me, is as
a loadable SQLite extension. Both are covered in the
[documentation](https://docs.rs/julid-rs/latest/julid/), but let's go over them here, starting with
the extension.

## Inside SQLite as a loadable extension

The extension, when loaded into SQLite, provides the following functions:

* `julid_new()`: create a new Julid and return it as a 16-byte
  [blob](https://www.sqlite.org/datatype3.html#storage_classes_and_datatypes)
* `julid_string()`: create a new Julid and return it as a 26-character [base-32
  Crockford](https://en.wikipedia.org/wiki/Base32)-encoded string
* `julid_seconds(julid)`: get the number of seconds (as a 64-bit float) since the UNIX epoch that
  this julid was created (convenient for passing to the builtin `datetime()` function)
* `julid_counter(julid)`: show the value of this julid's monotonic counter
* `julid_sortable(julid)`: return the 64-bit concatenation of the timestamp and counter
* `julid_string(julid)`: show the [base-32 Crockford](https://en.wikipedia.org/wiki/Base32)
  encoding of this julid; the raw bytes of Julids won't be valid UTF-8, so use this or the built-in
  `hex()` function to `select` a human-readable representation
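
To show roughly what `julid_seconds()` and `julid_sortable()` are described as returning, here is a small sketch of the same arithmetic on a plain `u128`. It is an assumption-laden illustration based only on the descriptions above, not the extension's source; the function names `seconds` and `sortable` and the two sample IDs are made up.

```rust
/// Seconds since the UNIX epoch as a float, from the 48-bit millisecond timestamp.
fn seconds(id: u128) -> f64 {
    let millis = (id >> 80) as u64;
    millis as f64 / 1000.0
}

/// The top 64 bits of the ID: the timestamp followed by the counter.
fn sortable(id: u128) -> u64 {
    (id >> 64) as u64
}

fn main() {
    // Two made-up IDs from the same millisecond. The first has counter 0 and
    // large random bits; the second has counter 1 and small random bits.
    let millis: u128 = 1_690_480_066_208;
    let first: u128 = (millis << 80) | 0xFFFF_FFFF;
    let second: u128 = (millis << 80) | (1_u128 << 64) | 0x1;

    println!("seconds:  {}", seconds(first));
    println!("sortable: {} then {}", sortable(first), sortable(second));

    // The sortable value alone is enough to order IDs by creation, as long as
    // the counter has not saturated.
    assert!(sortable(first) < sortable(second));
}
```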

### Building and loading

If you want to use it as a SQLite extension:

* clone the [repo](https://gitlab.com/nebkor/julid)
* build it with `cargo build --features plugin` (this builds the SQLite extension)
* copy the resulting `libjulid.[so|dylib|whatevs]` to some place where you can...
* load it into SQLite with `.load /path/to/libjulid` as shown at the top
* party

If you, like me, wish to use Julids as primary keys, just create your table like:

``` sql
create table users (
    id blob not null primary key default (julid_new()),
    ...
);
```

and you've got a first-class ticket straight to Julid City, baby!

For a table created like:

``` sql
-- table of things to watch
create table if not exists watches (
    id blob not null primary key default (julid_new()),
    kind int not null, -- enum for movie or tv show or whatev
    title text not null, -- this has a secondary index
    length int,
    release_date int,
    added_by blob not null,
    last_updated int not null default (unixepoch()),
    foreign key (added_by) references users (id)
);
```

and then [some
code](https://gitlab.com/nebkor/ww/-/blob/cc14c30fcfbd6cdaecd85d0ba629154d098b4be9/src/import_utils.rs#L92-126)
that inserted rows into that table like

``` sql
insert into watches (kind, title, length, release_date, added_by) values (?,?,?,?,?)
```

where the wildcards get bound in a loop with unique values and the Julid `id` field is
generated by the extension for each row, I get over 100,000 insertions/second when using a
file-backed DB in WAL mode and `NORMAL` durability settings.

### Safety
There is one `unsafe fn` in this project, `sqlite_julid_init()`, and it is only built for the
`plugin` feature. The reason for it is that it's interacting with foreign code (SQLite itself) via
the C interface, which is inherently unsafe. If you are not building the plugin, there is no
`unsafe` code.

## Inside a Rust program

Of course, you can also use it outside of a database; the `Julid` type is publicly exported. There's
a simple benchmark in the examples folder of the repo, the important parts of which look like:

``` rust
use std::time::Instant;

use julid::Julid;

fn main() {
    /* snip some stuff: argument parsing that sets `num`, and creating the Vec `v` */

    // time only the generation of the IDs
    let start = Instant::now();
    for _ in 0..num {
        v.push(Julid::new());
    }
    let end = Instant::now();
    let dur = (end - start).as_micros();

    // print each ID and its components to stderr
    for id in v.iter() {
        eprintln!(
            "{id}: created_at {}; counter: {}; sortable: {}",
            id.created_at(),
            id.counter(),
            id.sortable()
        );
    }
    println!("{num} Julids generated in {dur}us");
}
```

If you were to run it on a computer like mine (AMD Ryzen 9 3900X, 12-core, 2.2-4.6 GHz), you might
see something like this:

``` text
$ cargo run --example=benchmark --release -- -n 30000 2> /dev/null
30000 Julids generated in 1240us
```

That's about 24,000 IDs/millisecond; 24 *MILLION* per second!

The default optional Cargo features include implementations of traits for getting Julids into and
out of SQLite with [SQLx](https://github.com/launchbadge/sqlx), and for generally
serializing/deserializing with [Serde](https://serde.rs/), via the `sqlx` and `serde` features,
respectively.

Something to note: don't enable the `plugin` feature in your `Cargo.toml` if you're using this crate
inside your Rust application, especially if you're *also* loading it as an extension in SQLite in
your application. You'll get a long and confusing runtime panic due to there being multiple
entrypoints defined with the same name.

## On the command line

An even simpler program than the benchmark, called `julid-gen`, is available to install via cargo:

`cargo install julid-rs --no-default-features`

And then using it is as simple as:

``` text
$ julid-gen -h
Generate and print Julids

Usage: julid-gen [OPTIONS] [NUM]

Arguments:
  [NUM]  Number of Julids to generate [default: 1]

Options:
  -d, --decode <INPUT>  Print the components of the given Julid
  -a, --answer          The answer to the meaning of Julid
  -h, --help            Print help
  -V, --version         Print version

$ julid-gen
01H9DYRVDX0001X0RE5Y7XFGBC

$ julid-gen 3
01H9DYT48E000EK2EH7P67N8GG
01H9DYT48E000ZBKXVZ91HEZX4
01H9DYT48E0012VX89PYX4HDKP

$ julid-gen -d 01H9DYT48E0012VX89PYX4HDKP
Created at: 2023-09-03 16:42:57.678 UTC
Monotonic counter: 2
Entropy: 3311563785709860470
```

# Thanks

This project wouldn't have happened without a lot of inspiration (and a little shameless stealing)
from the [ulid-rs](https://github.com/dylanhart/ulid-rs) crate. For the loadable extension, the
[sqlite-loadable-rs](https://github.com/asg017/sqlite-loadable-rs) crate made it *extremely* easy to
write; what I thought would take a couple days instead took a couple hours. Thank you, authors of
those crates! Feel free to steal code from me any time!