+++
title = "Presenting Julids: another fine sundry, by Nebcorp Heavy Industries and Sundries"
slug = "presenting-julids"
date = "2023-07-29"
[taxonomies]
tags = ["software", "sundry", "proclamation", "sqlite", "rust", "ulid", "julid"]
+++
Presenting Julids
Nebcorp Heavy Industries and Sundries, long a world leader in sundries, is proud to present the official globally unique sortable identifier type for all Nebcorp HIAS', and all Nebcorp companies' database entities, Julids. Julids are globally unique sortable identifiers, backwards-compatible with ULIDs.
Inside your Rust program, simply add julid-rs to your project's Cargo.toml file, and use it like:
use julid::Julid;
fn main() {
    let id = Julid::new();
    dbg!(id.created_at(), id.as_string());
}
Such a program would output something like:
[main.rs:2] id.created_at() = 2023-07-29T20:21:50.009Z
[main.rs:2] id.as_string() = "01H6HN10SS00020YT344XMGA3C"
However, it can also be built as a loadable extension for SQLite, adding database functions for creating and querying Julids:
$ sqlite3
SQLite version 3.40.1 2022-12-28 14:03:47
Enter ".help" for usage hints.
Connected to a transient in-memory database.
Use ".open FILENAME" to reopen on a persistent database.
sqlite> .load ./libjulid
sqlite> select hex(julid_new());
018998768ACF000060B31DB175E0C5F9
sqlite> select julid_string(julid_new());
01H6C7D9CT00009TF3EXXJHX4Y
sqlite> select julid_seconds(julid_new());
1690480066.208
sqlite> select datetime(julid_timestamp(julid_new()), 'auto');
2023-07-27 17:47:50
sqlite> select julid_counter(julid_new());
0
Julids vs ULIDs
Julids are a drop-in replacement for ULIDs; all Julids are valid ULIDs, but not all ULIDs are valid Julids.
Given their compatibility relationship, Julids and ULIDs must have quite a bit in common, and indeed they do:
- they are 128 bits long
- they are lexicographically sortable
- they encode their creation time as the number of milliseconds since the UNIX epoch
- their string representation is a 26-character base-32 Crockford encoding of their big-endian bytes
- IDs created within the same millisecond are still meant to sort in their order of creation
Julids and ULIDs have different ways to implement that last piece. If you look at the layout of bits in a ULID, they look like this: a 48-bit millisecond timestamp in the most-significant bits, followed by 80 random bits.
According to the ULID spec, for ULIDs created in the same millisecond, the least-significant bit should be incremented for each new ID. Since that portion of the ULID is random, that means you may not be able to increment it without spilling into the timestamp portion. Likewise, it's easy to guess a new possibly-valid ULID simply by incrementing an already-known one. And finally, this means that sorting will need to read all the way to the end of the ULID for IDs created in the same millisecond.
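To make the overflow and guessability points concrete, here is a minimal sketch of the spec's same-millisecond increment, treating a ULID as a plain u128. The name `ulid_increment` is made up for illustration and is not taken from any ULID library.

```rust
/// Mask covering the low 80 bits of a ULID, i.e. its random portion.
const RANDOM_MASK: u128 = (1u128 << 80) - 1;

/// Hypothetical helper: the next ULID within the same millisecond, or `None`
/// if the random portion is already all ones and cannot be incremented
/// without carrying into the 48-bit timestamp.
fn ulid_increment(ulid: u128) -> Option<u128> {
    if (ulid & RANDOM_MASK) == RANDOM_MASK {
        None
    } else {
        Some(ulid + 1)
    }
}

fn main() {
    let ts: u128 = 0x0189_9876_8ACF; // a 48-bit millisecond timestamp
    let ordinary = (ts << 80) | 0x1234;
    let unlucky = (ts << 80) | RANDOM_MASK;

    // The successor of a known ULID is trivially guessable.
    assert_eq!(ulid_increment(ordinary), Some(ordinary + 1));
    // An all-ones random portion leaves no room to increment.
    assert_eq!(ulid_increment(unlucky), None);
    println!("{:032X} -> {:032X}", ordinary, ordinary + 1);
}
```

Note that the two IDs printed differ only in their final bits, which is why a sort has to compare all the way to the end.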
To address these shortcomings, Julids (Joe's ULIDs) have the following structure:
As with ULIDs, the 48 most-significant bits encode the time of creation. Unlike ULIDs, the next 16 most-significant bits are not random: they're a monotonic counter for IDs created within the same millisecond[1]. The remaining 64 bits are random. Since the counter is only 16 bits, it will saturate after 65,536 intra-millisecond creations, after which IDs in that same millisecond will not have an intrinsic total order (the random bits will still be different, so you shouldn't have collisions). My PC, which is no slouch, can only generate about 20,000 per millisecond, so hopefully this is not an issue! Because the random bits are always fresh, it's not possible to easily guess a valid Julid if you already have a different valid one.
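Here is a rough sketch of that scheme using only the standard library. It is not the crate's actual code: the names `advance` and `next_id` are invented, holding the counter at its maximum once it saturates is an assumption about the behavior, and `RandomState` is just a stand-in for a real random number generator.

```rust
use std::collections::hash_map::RandomState;
use std::hash::{BuildHasher, Hasher};
use std::sync::atomic::{AtomicU64, Ordering};
use std::time::{SystemTime, UNIX_EPOCH};

const COUNTER_MAX: u64 = 0xFFFF; // 16-bit counter
const TS_MASK: u64 = (1 << 48) - 1; // 48-bit timestamp

/// Top 64 bits of the last ID issued by this process: (timestamp << 16) | counter.
static LAST: AtomicU64 = AtomicU64::new(0);

/// Hypothetical pure step: given the previous top 64 bits and the current
/// time in milliseconds, compute the top 64 bits of the next ID.
fn advance(last: u64, now_ms: u64) -> u64 {
    let fresh = (now_ms & TS_MASK) << 16;
    if fresh > last {
        fresh // a new millisecond: the counter starts over at 0
    } else if (last & COUNTER_MAX) < COUNTER_MAX {
        last + 1 // same millisecond: bump the counter
    } else {
        last // saturated (assumed behavior): only the random bits differ
    }
}

/// Stand-in for a real RNG: every `RandomState` is seeded differently.
fn random_u64() -> u64 {
    RandomState::new().build_hasher().finish()
}

/// Hypothetical generator: timestamp, then counter, then 64 fresh random bits.
fn next_id() -> u128 {
    let now_ms = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .expect("system clock is before 1970")
        .as_millis() as u64;
    // Atomically replace LAST with advance(LAST, now_ms); we get the old value back.
    let prev = LAST
        .fetch_update(Ordering::SeqCst, Ordering::SeqCst, |last| {
            Some(advance(last, now_ms))
        })
        .expect("the closure always returns Some");
    let sortable = advance(prev, now_ms);
    ((sortable as u128) << 64) | random_u64() as u128
}

fn main() {
    let a = next_id();
    let b = next_id();
    // Timestamp plus counter strictly increases until the counter saturates.
    assert!((a >> 64) < (b >> 64));
    println!("{a:032X}\n{b:032X}");
}
```

Because the counter lives in the top 64 bits alongside the timestamp, ordering two IDs from the same millisecond never needs to look at the random half.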
How to use
As mentioned, the Julid crate can be used in two different ways. The first is as a regular Rust library, declared in your Rust project's Cargo.toml file (say, by running cargo add julid-rs), and used as shown above. There's a rudimentary benchmark example in the repo, which I'll talk more about below. But the primary use case for me was as a loadable SQLite extension, as I previously wrote. Both are covered in the documentation, but let's go over them here, starting with the extension.
Inside SQLite as a loadable extension
The extension, when loaded into SQLite, provides the following functions:
- julid_new(): create a new Julid and return it as a 16-byte blob
- julid_seconds(julid): get the number of seconds (as a 64-bit float) since the UNIX epoch that this julid was created
- julid_counter(julid): show the value of this julid's monotonic counter
- julid_sortable(julid): return the 64-bit concatenation of the timestamp and counter
- julid_string(julid): show the base-32 Crockford encoding of this julid; the raw bytes won't be valid UTF-8, so use this or the built-in hex() function to select a human-readable representation
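To give a feel for what these functions compute, here is a sketch in Rust that works on a Julid held as a u128 (the big-endian reading of the 16-byte blob). The function names mirror the SQL ones, `timestamp_ms` is a made-up helper, and none of this is the extension's actual code.

```rust
/// Crockford's base-32 alphabet: no I, L, O, or U.
const CROCKFORD: &[u8; 32] = b"0123456789ABCDEFGHJKMNPQRSTVWXYZ";

/// Hypothetical helper: milliseconds since the UNIX epoch, the top 48 bits.
fn timestamp_ms(id: u128) -> u64 {
    (id >> 80) as u64
}

/// Seconds since the UNIX epoch as a 64-bit float.
fn julid_seconds(id: u128) -> f64 {
    timestamp_ms(id) as f64 / 1000.0
}

/// The 16-bit monotonic counter that follows the timestamp.
fn julid_counter(id: u128) -> u16 {
    (id >> 64) as u16
}

/// Timestamp and counter together: the top 64 bits.
fn julid_sortable(id: u128) -> u64 {
    (id >> 64) as u64
}

/// 26-character Crockford base-32 string, 5 bits per character. 26 * 5 is
/// 130 bits, so the first character only ever carries the top 3 bits.
fn julid_string(id: u128) -> String {
    (0..26)
        .rev()
        .map(|i| CROCKFORD[((id >> (5 * i)) & 0x1F) as usize] as char)
        .collect()
}

fn main() {
    // The blob printed by `select hex(julid_new());` in the session at the top.
    let id: u128 = 0x0189_9876_8ACF_0000_60B3_1DB1_75E0_C5F9;
    println!("seconds:  {}", julid_seconds(id));
    println!("counter:  {}", julid_counter(id));
    println!("sortable: {:#018X}", julid_sortable(id));
    println!("string:   {}", julid_string(id));
}
```

Since the string is just the big-endian bits in a sorted alphabet, comparing two strings lexicographically gives the same answer as comparing the underlying 128-bit numbers.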
Building and loading
If you want to use it as a SQLite extension:
- clone the repo
- build it with cargo build --features plugin (this builds the SQLite extension)
- copy the resulting libjulid.[so|dylib|whatevs] to some place where you can...
- load it into SQLite with .load /path/to/libjulid as shown at the top
- party
If you, like me, wish to use Julids as primary keys, just create your table like:
create table users (
  id blob not null primary key default (julid_new()),
  ...
);
and you've got a first-class ticket straight to Julid City, baby!
For a table created like:
-- table of things to watch
create table if not exists watches (
  id blob not null primary key default (julid_new()),
  kind int not null, -- enum for movie or tv show or whatev
  title text not null,
  metadata_url text, -- possible url for imdb or other metadata-esque site to show the user
  length int,
  release_date int,
  added_by blob not null, -- ID of the user that added it
  last_updated int not null default (unixepoch()),
  foreign key (added_by) references users (id)
);
and then some code that inserted rows into that table like
insert into watches (kind, title, length, release_date, added_by) values (?,?,?,?,?)
where the placeholders get bound in a loop with unique values and the Julid id field is generated by the extension for each row, I get over 100,000 insertions/second.
Inside a Rust program
Of course, you can also use it outside of a database; the Julid type is publicly exported. There's a simple benchmark in the examples folder of the repo, the important parts of which look like:
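use std::time::Instant;

use julid::Julid;

fn main() {
    // [....] (setup elided in the original; it defines `num` and the vector `v`)

    // Time only the generation loop.
    let start = Instant::now();
    for _ in 0..num {
        v.push(Julid::new());
    }
    let end = Instant::now();
    let dur = (end - start).as_micros();

    // Per-ID details go to stderr so they can be discarded with 2> /dev/null.
    for id in v.iter() {
        eprintln!(
            "{id}: created_at {}; counter: {}; sortable: {}",
            id.created_at(),
            id.counter(),
            id.sortable()
        );
    }
    println!("{num} Julids generated in {dur}us");
}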
If you were to run it on a computer like mine[2], you might see something like this:
$ cargo run --example=benchmark --release -- -n 30000 2> /dev/null
30000 Julids generated in 1240us
That's about 24,000 IDs/millisecond; 24 MILLION per second!
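For the curious, the arithmetic behind that figure, and how it compares with the 65,536 values the 16-bit counter can hold, can be sketched like so (the helper name `ids_per_ms` is made up):

```rust
/// Hypothetical helper: IDs generated per millisecond, given a count and an
/// elapsed time in microseconds.
fn ids_per_ms(count: u64, elapsed_us: u64) -> f64 {
    count as f64 / (elapsed_us as f64 / 1000.0)
}

fn main() {
    // Numbers from the benchmark run above: 30,000 IDs in 1,240 microseconds.
    let rate = ids_per_ms(30_000, 1_240);
    let counter_capacity = (1u64 << 16) as f64; // values in a 16-bit counter

    println!("{rate:.0} IDs/ms, {:.1} million IDs/s", rate * 1000.0 / 1e6);
    println!("counter headroom: {:.1}x", counter_capacity / rate);
}
```

At that rate a single thread on this machine uses well under half of the counter's range in any one millisecond.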
The default optional Cargo features include implementations of traits for getting Julids into and out of SQLite via SQLx, and for generally serializing/deserializing with Serde, via the sqlx and serde features, respectively. One final default optional feature, chrono, uses the Chrono crate to return the timestamp as a DateTime by adding a created_at(&self) method to Julid.
Something to note: don't enable the plugin feature in your Cargo.toml if you're using this crate inside your Rust application, especially if you're also loading it as an extension in SQLite in your application. You'll get a long and confusing runtime panic due to there being multiple entrypoints defined with the same name.
Safety
There is one unsafe fn in this project, sqlite_julid_init(), and it is only built for the plugin feature. The reason for it is that it's interacting with foreign code (SQLite itself) via the C interface, which is inherently unsafe. If you are not building the plugin, there is no unsafe code.
Why Julids?
The astute may note that this is the third time I've written recently about globally unique sortable IDs (here is part one, and part two is here). What's, uh... what's up with that?
Like Marge says, I just think they're neat! I'm not the only one; here are just some alternatives:
- Segment's KSUID, released in 2017. This was possibly my first exposure to this idea. At 160 bits, they're 32 bits larger than UUIDs or ULIDs, but otherwise very similar to ULIDs (and hence Julids)
- ULIDs, as previously discussed at length
- UUIDv7; these are very similar to Julids; the primary difference is that the lower 62 bits are left up to the implementation, rather than always containing pseudorandom bits as in Julids (which use the lower 64 bits for that, instead of UUIDv7's 62)
- Snowflake ID, developed by Twitter in 2010; these are 63-bit identifiers (so they fit in a signed 64-bit number), where the top 41 bits are a millisecond timestamp, the next 10 bits are a machine identifier[3], and the last 12 bits are for an intra-millisecond sequence counter (what Julid calls a "monotonic counter")
and I'm sure the list can go on.
As for what I wanted them for, I wanted to use them in my Rust and SQLite-based web app, in order to fix some deficiencies in ULIDs, as discussed. Now I have no unshaved yaks to distract me from getting back to that.
So, is this the last time I'll be writing at length about these things? It's hard to say for sure, but signs point to "yes". I hope you've found them at least a little interesting!
Thanks
This crate wouldn't have been possible without a lot of inspiration (and a little shameless stealing) from the ulid-rs crate. For the loadable extension, the sqlite-loadable-rs crate made it extremely easy to write; what I thought would take a couple days instead took a couple hours. Thank you, authors of those crates! Feel free to steal from this project!
- At least, they will still have a total order if they're all generated within the same process in the same way; the crate and extension use an atomic u64 to ensure that IDs generated within the same millisecond have incremented counters, but that atomic counter is not global, so calling Julid::new() in Rust and select julid_new() in SQLite will not be aware of each other's counters. ↩︎
- According to the output of lscpu, my computer is an "AMD Ryzen 9 3900X 12-Core Processor", running between 2.2 and 4.6 GHz. It's no slouch! ↩︎
- There are only ten bits for the machine ID, which means there are only 1,024 possible machine IDs; did Twitter only have a thousand machines in production? Maybe only a thousand at a time, so you could use the timestamp to look up what machine any given 10-bit ID referred to? ↩︎