diff --git a/content/rnd/a_serialized_mystery/index.md b/content/rnd/a_serialized_mystery/index.md index 29c35ba..3589301 100644 --- a/content/rnd/a_serialized_mystery/index.md +++ b/content/rnd/a_serialized_mystery/index.md @@ -2,9 +2,9 @@ title = "A One-Part Serialized Mystery" slug = "one-part-serialized-mystery" date = "2023-06-29" -updated = "2023-06-29" +updated = "2023-07-29" [taxonomies] -tags = ["software", "rnd", "proclamation", "upscm", "rust"] +tags = ["software", "rnd", "proclamation", "upscm", "rust", "ulid", "sqlite"] +++ # *Mise en Scene* diff --git a/content/rnd/ulid_benchmarks/index.md b/content/rnd/ulid_benchmarks/index.md index 3edf0b0..d8e9f68 100644 --- a/content/rnd/ulid_benchmarks/index.md +++ b/content/rnd/ulid_benchmarks/index.md @@ -2,9 +2,9 @@ title = "A One-Part Serialized Mystery, Part 2: The Benchmarks" slug = "one-part-serialized-mystery-part-2" date = "2023-07-15" -updated = "2023-07-21" +updated = "2023-07-29" [taxonomies] -tags = ["software", "rnd", "proclamation", "upscm", "rust", "sqlite"] +tags = ["software", "rnd", "proclamation", "upscm", "rust", "sqlite", "ulid"] +++ # A one-part serial mystery post-hoc prequel diff --git a/content/sundries/presenting-julids/index.md b/content/sundries/presenting-julids/index.md new file mode 100644 index 0000000..8aebda1 --- /dev/null +++ b/content/sundries/presenting-julids/index.md @@ -0,0 +1,283 @@ ++++ +title = "Presenting Julids: another fine sundry, by Nebcorp Heavy Industries and Sundries" +slug = "presenting-julids" +date = "2023-07-29" +[taxonomies] +tags = ["software", "sundry", "proclamation", "sqlite", "rust", "ulid", "julid"] ++++ + +# Presenting Julids +Nebcorp Heavy Industries and Sundries, long a world leader in sundries, is proud to present the +official globally unique sortable identifier type for all Nebcorp HIAS', and all Nebcorp companies' +database entities, [Julids](https://gitlab.com/nebkor/julid). 
Julids are globally unique sortable +identifiers, backwards-compatible with [ULIDs](https://github.com/ulid/spec). + +Inside your Rust program, simply add `julid-rs` to your project's `Cargo.toml` file, and use it +like: + +``` rust +use julid::Julid; + +fn main() { + let id = Julid::new(); + dbg!(id.created_at(), id.as_string()); +} +``` + +Such a program would output something like: + +``` text +[main.rs:2] id.created_at() = 2023-07-29T20:21:50.009Z +[main.rs:2] id.as_string() = "01H6HN10SS00020YT344XMGA3C" +``` + +However, it can also be built as a [loadable extension](https://www.sqlite.org/loadext.html) for +SQLite, adding database functions for creating and querying Julids: + +``` text +$ sqlite3 +SQLite version 3.40.1 2022-12-28 14:03:47 +Enter ".help" for usage hints. +Connected to a transient in-memory database. +Use ".open FILENAME" to reopen on a persistent database. +sqlite> .load ./libjulid +sqlite> select hex(julid_new()); +018998768ACF000060B31DB175E0C5F9 +sqlite> select julid_string(julid_new()); +01H6C7D9CT00009TF3EXXJHX4Y +sqlite> select julid_seconds(julid_new()); +1690480066.208 +sqlite> select datetime(julid_timestamp(julid_new()), 'auto'); +2023-07-27 17:47:50 +sqlite> select julid_counter(julid_new()); +0 +``` + +## Julids vs ULIDs + +Julids are a drop-in replacement for ULIDs; all Julids are valid ULIDs, but not all ULIDs are valid Julids. 
+ +Given their compatibility relationship, Julids and ULIDs must have quite a bit in common, and indeed +they do: + + * they are 128 bits long + * they are lexicographically sortable + * they encode their creation time as the number of milliseconds since the [UNIX + epoch](https://en.wikipedia.org/wiki/Unix_time) + * their string representation is a 26-character [base-32 + Crockford](https://en.wikipedia.org/wiki/Base32) encoding of their big-endian bytes + * IDs created within the same millisecond are still meant to sort in their order of creation + +Julids and ULIDs have different ways to implement that last piece. If you look at the layout of bits +in a ULID, it looks like this: + +![ULID bit structure](./ulid.svg) + +According to the ULID spec, for ULIDs created in the same millisecond, the least-significant bit +should be incremented for each new ID. Since that portion of the ULID is random, that means you may +not be able to increment it without spilling into the timestamp portion. Likewise, it's easy to +guess a new possibly-valid ULID simply by incrementing an already-known one. And finally, this means +that sorting will need to read all the way to the end of the ULID for IDs created in the same +millisecond. + +To address these shortcomings, Julids (Joe's ULIDs) have the following structure: + +![Julid bit structure](./julid.svg) + +As with ULIDs, the 48 most-significant bits encode the time of creation. Unlike ULIDs, the next 16 +most-significant bits are not random: they're a monotonic counter for IDs created within the same +millisecond[^monotonic]. Since it's only 16 bits, it will saturate after 65,536 intra-millisecond creations, +after which, IDs in that same millisecond will not have an intrinsic total order (the random bits +will still be different, so you shouldn't have collisions). My PC, which is no slouch, can only +generate about 20,000 per millisecond, so hopefully this is not an issue! 
Because the random bits +are always fresh, it's not possible to easily guess a valid Julid if you already have a different +valid one. + +# How to use + +As mentioned, the Julid crate can be used in two different ways: as a regular Rust library, declared +in your Rust project's `Cargo.toml` file (say, by running `cargo add julid-rs`), and used as also +shown above. There's a rudimentary +[benchmark](https://gitlab.com/nebkor/julid/-/blob/main/examples/benchmark.rs) example in the repo, +which I'll talk more about below. But the primary use case for me was as a loadable SQLite +extension, as I [previously +wrote](/rnd/one-part-serialized-mystery-part-2/#next-steps-with-ids). Both are covered in the +[documentation](https://docs.rs/julid-rs/latest/julid/), but let's go over them here, starting with +the extension. + +## Inside SQLite as a loadable extension + +The extension, when loaded into SQLite, provides the following functions: + + * `julid_new()`: create a new Julid and return it as a 16-byte + [blob](https://www.sqlite.org/datatype3.html#storage_classes_and_datatypes) + * `julid_seconds(julid)`: get the number seconds (as a 64-bit float) since the UNIX epoch that this + julid was created + * `julid_counter(julid)`: show the value of this julid's monotonic counter + * `julid_sortable(julid)`: return the 64-bit concatenation of the timestamp and counter + * `julid_string(julid)`: show the [base-32 Crockford](https://en.wikipedia.org/wiki/Base32) + encoding of this julid; the raw bytes won't be valid UTF-8, so use this or the built-in `hex()` + function to `select` a human-readable representation + +### Building and loading + +If you want to use it as a SQLite extension: + + * clone the [repo](https://gitlab.com/nebkor/julid) + * build it with `cargo build --features plugin` (this builds the SQLite extension) + * copy the resulting `libjulid.[so|dylib|whatevs]` to some place where you can... 
+ * load it into SQLite with `.load /path/to/libjulid` as shown at the top + * party + +If you, like me, wish to use Julids as primary keys, just create your table like: + +``` sql +create table users ( + id blob not null primary key default (julid_new()), + ... +); +``` + +and you've got a first-class ticket straight to Julid City, baby! + +For a table created like: + +``` sql +-- table of things to watch +create table if not exists watches ( + id blob not null primary key default (julid_new()), + kind int not null, -- enum for movie or tv show or whatev + title text not null, + metadata_url text, -- possible url for imdb or other metadata-esque site to show the user + length int, + release_date int, + added_by blob not null, -- ID of the user that added it + last_updated int not null default (unixepoch()), + foreign key (added_by) references users (id) +); +``` + +and then [some +code](https://gitlab.com/nebkor/ww/-/blob/main/src/import_utils.rs?ref_type=heads#L92-126) that +inserted rows into that table like + +``` sql +insert into watches (kind, title, length, release_date, added_by) values (?,?,?,?,?) +``` + +where the wildcards get bound in a loop with unique values and the Julid `id` field is +generated by the extension for each row, I get over 100,000 insertions/second. + +## Inside a Rust program + +Of course, you can also use it outside of a database; the `Julid` type is publicly exported. There's +a simple benchmark in the examples folder of the repo, the important parts of which look like: + +``` rust +use julid::Julid; + +fn main() { + [....] 
+ let start = Instant::now(); + for _ in 0..num { + v.push(Julid::new()); + } + let end = Instant::now(); + let dur = (end - start).as_micros(); + + for id in v.iter() { + eprintln!( + "{id}: created_at {}; counter: {}; sortable: {}", + id.created_at(), + id.counter(), + id.sortable() + ); + } + println!("{num} Julids generated in {dur}us"); +``` + +If you were to run it on a computer like mine[^my computer], you might see something like this: + +``` text +$ cargo run --example=benchmark --release -- -n 30000 2> /dev/null +30000 Julids generated in 1240us +``` + +That's about 24,000 IDs/millisecond; 24 *MILLION* per second! + +The default optional Cargo features include implementations of traits for getting Julids into and +out of SQLite via [SQLx](https://github.com/launchbadge/sqlx), and for generally +serializing/deserializing with [Serde](https://serde.rs/), via the `sqlx` and `serde` features, +respectively. One final default optional feature, `chrono`, uses the Chrono crate to return the +timestamp as a [`DateTime`](https://docs.rs/chrono/latest/chrono/struct.DateTime.html) by adding a +`created_at(&self)` method to `Julid`. + +Something to note: don't enable the `plugin` feature in your Cargo.toml if you're using this crate +inside your Rust application, *especially* if you're also loading it as an extension in SQLite in +your application. You'll get a long and confusing runtime panic due to there being multiple +entrypoints defined with the same name. + +## Safety +There is one `unsafe fn` in this project, `sqlite_julid_init()`, and it is only built for the +`plugin` feature. The reason for it is that it's interacting with foreign code (SQLite itself) via +the C interface, which is inherently unsafe. If you are not building the plugin, there is no +`unsafe` code. + +# Why Julids? 
+ +The astute may note that this is the third time I've written recently about globally unique sortable +IDs ([here is part one](/rnd/one-part-serialized-mystery), and [part two is +here](/rnd/one-part-serialized-mystery-part-2)). What's, uh... what's up with that? + +![marge just thinks they're neat][marge ids] +
we both just think they're neat
+ +Like Marge says, I just think they're neat! I'm not the only one; here are just some alternatives: + + * Segment's [KSUID](https://segment.com/blog/a-brief-history-of-the-uuid/), released in 2017. This + was possibly my first exposure to this idea. They're 32 bits larger than UUIDs or ULIDs, but + otherwise very similar to ULIDs (and hence Julids) + * [ULIDs](https://github.com/ulid/spec), as previously discussed at length + * [UUIDv7](https://www.ietf.org/archive/id/draft-peabody-dispatch-new-uuid-format-01.html#name-uuidv7-layout-and-bit-order); + these are *very* similar to Julids; the primary difference is that the lower 62 bits are left up + to the implementation, rather than always containing pseudorandom bits as in Julids (which use + the lower 64 bits for that, instead of UUIDv7's 62) + * [Snowflake ID](https://en.wikipedia.org/wiki/Snowflake_ID), developed by Twitter in 2010; these + are 63-bit identifiers (so they fit in a signed 64-bit number), where the top 41 bits are a + millisecond timestamp, the next 10 bits are a machine identifier[^twitter machine count], and the last 12 bits are for an + intra-millisecond sequence counter (what Julid calls a "monotonic counter") + +and I'm sure the list can go on. + +As for what I wanted them for, I wanted to use them in my Rust and SQLite-based [web +app](https://gitlab.com/nebkor/ww), in order to fix some deficiencies in ULIDs, as discussed. Now I +have no unshaved yaks to distract me from getting back to that. + +So, is this the last time I'll be writing at length about these things? It's hard to say for +sure, but signs point to "yes". I hope you've found them at least a little interesting! + +# Thanks + +This crate wouldn't have been possible without a lot of inspiration (and a little shameless +stealing) from the [ulid-rs](https://github.com/dylanhart/ulid-rs) crate. 
For the loadable +extension, the [sqlite-loadable-rs](https://github.com/asg017/sqlite-loadable-rs) crate made it +*extremely* easy to write; what I thought would take a couple days instead took a couple +hours. Thank you, authors of those crates! Feel free to steal from this project! + +---- + +[^monotonic]: At least, they will still have a total order if they're all generated within the same + process in the same way; the crate and extension use an atomic u64 to ensure that IDs generated + within the same millisecond have incremented counters, but that atomic counter is not global, so + calling `Julid::new()` in Rust and `select julid_new()` in SQLite will not be aware of each + others' counters. + +[^my computer]: According to the output of `lscpu`, my computer is an "AMD Ryzen 9 3900X 12-Core + Processor", running between 2.2 and 4.6 GHz. It's no slouch! + +[^twitter machine count]: There are only ten bits for the machine ID, which means there are only + 1,024 possible machine IDs; did twitter only have a thousand machines in production? Maybe only + a thousand at a time, so you could use the timestamp to look up what machine any given 10-bit ID + referred to? 
+ +[marge ids]: ./marge_thinks_theyre_neat.png "marge simpson holding a potato labeled 'globally unique sortable identifiers'" diff --git a/content/sundries/presenting-julids/julid.svg b/content/sundries/presenting-julids/julid.svg new file mode 100644 index 0000000..4e83243 --- /dev/null +++ b/content/sundries/presenting-julids/julid.svg @@ -0,0 +1 @@ +647980127monotonic countertimestamp063random diff --git a/content/sundries/presenting-julids/marge_thinks_theyre_neat.png b/content/sundries/presenting-julids/marge_thinks_theyre_neat.png new file mode 100644 index 0000000..74d84d0 Binary files /dev/null and b/content/sundries/presenting-julids/marge_thinks_theyre_neat.png differ diff --git a/content/sundries/presenting-julids/ulid.svg b/content/sundries/presenting-julids/ulid.svg new file mode 100644 index 0000000..0e17806 --- /dev/null +++ b/content/sundries/presenting-julids/ulid.svg @@ -0,0 +1 @@ +647980127randomtimestamp063random