julid-rs/README.md

245 lines
9.4 KiB
Markdown

# Bottom line up front
Julids are globally unique, sortable identifiers, that are backwards-compatible with
[ULIDs](https://github.com/ulid/spec). This crate provides a Rust Julid datatype, as well as a
loadable extension for SQLite for creating and querying them:
``` text
$ sqlite3
SQLite version 3.40.1 2022-12-28 14:03:47
Enter ".help" for usage hints.
Connected to a transient in-memory database.
Use ".open FILENAME" to reopen on a persistent database.
sqlite> .load ./libjulid
sqlite> select hex(julid_new());
018998768ACF000060B31DB175E0C5F9
sqlite> select julid_string(julid_new());
01H6C7D9CT00009TF3EXXJHX4Y
sqlite> select julid_seconds(julid_new());
1690480066.208
sqlite> select datetime(julid_timestamp(julid_new()), 'auto');
2023-07-27 17:47:50
sqlite> select julid_counter(julid_new());
0
sqlite> select julid_string();
01HM4WJ7T90001P8SN9898FBTN
```
Crates.io: <https://crates.io/crates/julid-rs>
Docs.rs: <https://docs.rs/julid-rs/latest/julid/>
Blog post: <https://proclamations.nebcorp-hias.com/sundries/presenting-julids/>
## A slightly deeper look
Julids are a drop-in replacement for ULIDs: all Julids are valid ULIDs, but not all ULIDs are valid Julids.
Given their compatibility relationship, Julids and ULIDs must have quite a bit in common, and indeed
they do:
* they are 128-bits long
* they are lexicographically sortable
* they encode their creation time as the number of milliseconds since the [UNIX
epoch](https://en.wikipedia.org/wiki/Unix_time) in their top 48 bits
* their string representation is a 26-character [base-32
Crockford](https://en.wikipedia.org/wiki/Base32) encoding of their big-endian bytes
* IDs created within the same millisecond are still meant to sort in their order of creation
Julids and ULIDs have different ways to implement that last piece. If you look at the layout of bits
in a ULID, you see:
![ULID bit structure](./ulid.svg)
According to the ULID spec, for ULIDs created in the same millisecond, the least-significant bit
should be incremented for each new ID. Since that portion of the ULID is random, that means you may
not be able to increment it without spilling into the timestamp portion. Likewise, it's easy to
guess a new possibly-valid ULID simply by incrementing an already-known one. And finally, this means
that sorting will need to read all the way to the end of the ULID for IDs created in the same
millisecond.
To address these shortcomings, Julids (Joe's ULIDs) have the following structure:
![Julid bit structure](./julid.svg)
As with ULIDs, the 48 most-significant bits encode the time of creation. Unlike ULIDs, the next 16
most-significant bits are not random, they're a monotonic counter for IDs created within the same
millisecond. Since it's only 16 bits, it will saturate after 65,536 IDs intra-millisecond creations,
after which, IDs in that same millisecond will not have an intrinsic total order (the random bits
will still be different, so you shouldn't have collisions). My PC, which is no slouch, can only
generate about 20,000 per millisecond, so hopefully this is not an issue! Because the random bits
are always fresh, it's not possible to easily guess a valid Julid if you already know one.
# How to use
The Julid crate can be used in two different ways: as a regular Rust library, declared in your Rust
project's `Cargo.toml` file (say, by running `cargo add julid-rs`), and used as shown above. There's
a rudimentary [benchmark](https://gitlab.com/nebkor/julid/-/blob/main/examples/benchmark.rs) example
in the repo that shows off most of the Rust API. But the primary use case for me was as a loadable
SQLite extension. Both are covered in the [documentation](https://docs.rs/julid-rs/latest/julid/),
but let's go over them here, starting with the extension.
## Inside SQLite as a loadable extension
The extension, when loaded into SQLite, provides the following functions:
* `julid_new()`: create a new Julid and return it as a 16-byte
[blob](https://www.sqlite.org/datatype3.html#storage_classes_and_datatypes)
* `julid_string()`: create a new Julid and return it as a 26-character [base-32
Crockford](https://en.wikipedia.org/wiki/Base32)-encoded string
* `julid_seconds(julid)`: get the number seconds (as a 64-bit float) since the UNIX epoch that this
julid was created (convenient for passing to the builtin `datetime()` function)
* `julid_counter(julid)`: show the value of this julid's monotonic counter
* `julid_sortable(julid)`: return the 64-bit concatenation of the timestamp and counter
* `julid_string(julid)`: show the [base-32 Crockford](https://en.wikipedia.org/wiki/Base32)
encoding of this julid; the raw bytes of Julids won't be valid UTF-8, so use this or the built-in
`hex()` function to `select` a human-readable representation
### Building and loading
If you want to use it as a SQLite extension:
* clone the [repo](https://gitlab.com/nebkor/julid)
* build it with `cargo build --features plugin` (this builds the SQLite extension)
* copy the resulting `libjulid.[so|dylib|whatevs]` to some place where you can...
* load it into SQLite with `.load /path/to/libjulid` as shown at the top
* party
If you, like me, wish to use Julids as primary keys, just create your table like:
``` sql
create table users (
id blob not null primary key default (julid_new()),
...
);
```
and you've got a first-class ticket straight to Julid City, baby!
For a table created like:
``` sql
-- table of things to watch
create table if not exists watches (
id blob not null primary key default (julid_new()),
kind int not null, -- enum for movie or tv show or whatev
title text not null, -- this has a secondary index
length int,
release_date int,
added_by blob not null,
last_updated int not null default (unixepoch()),
foreign key (added_by) references users (id)
);
```
and then [some
code](https://gitlab.com/nebkor/ww/-/blob/cc14c30fcfbd6cdaecd85d0ba629154d098b4be9/src/import_utils.rs#L92-126)
that inserted rows into that table like
``` sql
insert into watches (kind, title, length, release_date, added_by) values (?,?,?,?,?)
```
where the wildcards get bound in a loop with unique values and the Julid `id` field is
generated by the extension for each row, I get over 100,000 insertions/second when using a
file-backed DB in WAL mode and `NORMAL` durability settings.
### Safety
There is one `unsafe fn` in this project, `sqlite_julid_init()`, and it is only built for the
`plugin` feature. The reason for it is that it's interacting with foreign code (SQLite itself) via
the C interface, which is inherently unsafe. If you are not building the plugin, there is no
`unsafe` code.
## Inside a Rust program
Of course, you can also use it outside of a database; the `Julid` type is publicly exported. There's
a simple benchmark in the examples folder of the repo, the important parts of which look like:
``` rust
use julid::Julid;
fn main() {
/* snip some stuff */
let start = Instant::now();
for _ in 0..num {
v.push(Julid::new());
}
let end = Instant::now();
let dur = (end - start).as_micros();
for id in v.iter() {
eprintln!(
"{id}: created_at {}; counter: {}; sortable: {}",
id.created_at(),
id.counter(),
id.sortable()
);
}
println!("{num} Julids generated in {dur}us");
```
If you were to run it on a computer like mine (AMD Ryzen 9 3900X, 12-core, 2.2-4.6 GHz), you might
see something like this:
``` text
$ cargo run --example=benchmark --release -- -n 30000 2> /dev/null
30000 Julids generated in 1240us
```
That's about 24,000 IDs/millisecond; 24 *MILLION* per second!
The default optional Cargo features include implementations of traits for getting Julids into and
out of SQLite with [SQLx](https://github.com/launchbadge/sqlx), and for generally
serializing/deserializing with [Serde](https://serde.rs/), via the `sqlx` and `serde` features,
respectively.
Something to note: don't enable the `plugin` feature in your Cargo.toml if you're using this crate
inside your Rust application, especially if you're *also* loading it as an extension in SQLite in
your application. You'll get a long and confusing runtime panic due to there being multiple
entrypoints defined with the same name.
## On the command line
An even simpler program than the benchmark called `julid-gen` is available to install via cargo:
`cargo install julid-rs --no-default-features`
And then using it is as simple as,
``` text
$ julid-gen -h
Generate and print Julids
Usage: julid-gen [OPTIONS] [NUM]
Arguments:
[NUM] Number of Julids to generate [default: 1]
Options:
-d, --decode <INPUT> Print the components of the given Julid
-a, --answer The answer to the meaning of Julid
-h, --help Print help
-V, --version Print version
$ julid-gen
01H9DYRVDX0001X0RE5Y7XFGBC
$ julid-gen 3
01H9DYT48E000EK2EH7P67N8GG
01H9DYT48E000ZBKXVZ91HEZX4
01H9DYT48E0012VX89PYX4HDKP
$ julid-gen -d 01H9DYT48E0012VX89PYX4HDKP
Created at: 2023-09-03 16:42:57.678 UTC
Monotonic counter: 2
Entropy: 3311563785709860470
```
# Thanks
This project wouldn't have happened without a lot of inspiration (and a little shameless stealing)
from the [ulid-rs](https://github.com/dylanhart/ulid-rs) crate. For the loadable extension, the
[sqlite-loadable-rs](https://github.com/asg017/sqlite-loadable-rs) crate made it *extremely* easy to
write; what I thought would take a couple days instead took a couple hours. Thank you, authors of
those crates! Feel free to steal code from me any time!