ready to publish

Joe Ardent 2023-07-30 13:11:29 -07:00
parent ca1c3ea2d9
commit 51ff082cbf


@ -1,5 +1,5 @@
+++
title = "Presenting Julids, another fine sundry by Nebcorp Heavy Industries and Sundries"
title = "Presenting Julids, another fine sundry from Nebcorp Heavy Industries and Sundries"
slug = "presenting-julids"
date = "2023-07-31"
[taxonomies]
@ -53,9 +53,12 @@ sqlite> select julid_counter(julid_new());
0
```
Intrigued? Confused? Disgusted? Enraged?? Well, read on!
## Julids vs ULIDs
Julids are a drop-in replacement for ULIDs; all Julids are valid ULIDs, but not all ULIDs are valid Julids.
Julids are a drop-in replacement for ULIDs: all Julids are valid ULIDs, but not all ULIDs are valid
Julids.
Given their compatibility relationship, Julids and ULIDs must have quite a bit in common, and indeed
they do:
@ -63,7 +66,7 @@ they do:
* they are 128 bits long
* they are lexicographically sortable
* they encode their creation time as the number of milliseconds since the [UNIX
epoch](https://en.wikipedia.org/wiki/Unix_time)
epoch](https://en.wikipedia.org/wiki/Unix_time) in their top 48 bits
* their string representation is a 26-character [base-32
Crockford](https://en.wikipedia.org/wiki/Base32) encoding of their big-endian bytes
* IDs created within the same millisecond are still meant to sort in their order of creation
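To make the string representation concrete, here is a minimal sketch of encoding a 128-bit ID as 26 Crockford base-32 characters. The `encode` function and `ALPHABET` constant are hypothetical names for illustration, not the API of the julid or ulid crates. Since 26 characters hold 130 bits, the first character only ever carries the top three bits of the ID.

```rust
// Crockford's base-32 alphabet: digits, then uppercase letters without I, L, O, U.
const ALPHABET: &[u8; 32] = b"0123456789ABCDEFGHJKMNPQRSTVWXYZ";

/// Hypothetical helper: encode a 128-bit ID, most-significant bits first.
fn encode(id: u128) -> String {
    let mut out = [b'0'; 26];
    let mut rest = id;
    // Fill from the last character backwards, five bits at a time.
    for slot in out.iter_mut().rev() {
        *slot = ALPHABET[(rest & 0x1f) as usize];
        rest >>= 5;
    }
    String::from_utf8(out.to_vec()).expect("alphabet is ASCII")
}

fn main() {
    // Two arbitrary values; the larger integer yields the later string.
    let earlier = encode(1u128 << 80);
    let later = encode((1u128 << 80) + 1);
    println!("{earlier}\n{later}");
    assert!(earlier < later);
}
```

Because the alphabet is in ascending ASCII order and every string has the same length, comparing two encoded strings gives the same result as comparing the underlying 128-bit integers, which is where the lexicographic sortability comes from.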
@ -85,8 +88,8 @@ To address these shortcomings, Julids (Joe's ULIDs) have the following structure
![Julid bit structure](./julid.svg)
As with ULIDs, the 48 most-significant bits encode the time of creation. Unlike ULIDs, the next 16
most-significant bits are not random: they're a monotonic counter for IDs created within the same
millisecond[^monotonic]. Since it's only 16 bits, it will saturate after 65,536
most-significant bits are not random[^counter idea]: they're a monotonic counter for IDs created
within the same millisecond[^monotonic]. Since it's only 16 bits, it will saturate after 65,536
intra-millisecond creations, after which IDs in that same millisecond will not have an intrinsic
total order (the random bits will still be different, so you shouldn't have collisions). My PC,
which is no slouch, can only generate about 20,000 per millisecond, so hopefully this is not an
@ -95,12 +98,11 @@ you already have one.
# How to use
As noted, the Julid crate can be used in two different ways: as a regular Rust library, declared
in your Rust project's `Cargo.toml` file (say, by running `cargo add julid-rs`), and used as shown
above. There's a rudimentary
[benchmark](https://gitlab.com/nebkor/julid/-/blob/main/examples/benchmark.rs) example in the repo,
which I'll talk more about below. But the primary use case for me was as a loadable SQLite
extension, as I [previously
The Julid crate can be used in two different ways: as a regular Rust library, declared in your Rust
project's `Cargo.toml` file (say, by running `cargo add julid-rs`), and used as shown above. There's
a rudimentary [benchmark](https://gitlab.com/nebkor/julid/-/blob/main/examples/benchmark.rs) example
in the repo, which I'll talk more about below. But the primary use case for me was as a loadable
SQLite extension, as I [previously
wrote](/rnd/one-part-serialized-mystery-part-2/#next-steps-with-ids). Both are covered in the
[documentation](https://docs.rs/julid-rs/latest/julid/), but let's go over them here, starting with
the extension.
@ -116,8 +118,8 @@ The extension, when loaded into SQLite, provides the following functions:
* `julid_counter(julid)`: show the value of this julid's monotonic counter
* `julid_sortable(julid)`: return the 64-bit concatenation of the timestamp and counter
* `julid_string(julid)`: show the [base-32 Crockford](https://en.wikipedia.org/wiki/Base32)
encoding of this julid; the raw bytes won't be valid UTF-8, so use this or the built-in `hex()`
function to `select` a human-readable representation
encoding of this julid; the raw bytes of Julids won't be valid UTF-8, so use this or the built-in
`hex()` function to `select` a human-readable representation
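As a rough illustration of what the timestamp, counter, and sortable values correspond to, here is a sketch that pulls those fields out of a 128-bit integer using the layout described earlier (48 timestamp bits, 16 counter bits, 64 random bits). The function names are hypothetical and are not the crate's actual implementation.

```rust
// Field widths from the Julid layout: 48-bit timestamp, 16-bit counter, 64 random bits.
const COUNTER_BITS: u32 = 16;
const RANDOM_BITS: u32 = 64;

/// Hypothetical helper: milliseconds since the UNIX epoch, from the top 48 bits.
fn timestamp_ms(id: u128) -> u64 {
    (id >> (COUNTER_BITS + RANDOM_BITS)) as u64
}

/// Hypothetical helper: the 16 bits just below the timestamp.
fn counter(id: u128) -> u16 {
    // The cast to u16 keeps only the low 16 bits of the shifted value.
    (id >> RANDOM_BITS) as u16
}

/// Hypothetical helper: the top 64 bits, i.e. timestamp and counter concatenated.
fn sortable(id: u128) -> u64 {
    (id >> RANDOM_BITS) as u64
}

/// Hypothetical helper: build an ID from its parts (timestamp truncated to 48 bits).
fn assemble(ts_ms: u64, counter: u16, random: u64) -> u128 {
    let ts = (ts_ms & 0xFFFF_FFFF_FFFF) as u128;
    (ts << (COUNTER_BITS + RANDOM_BITS)) | ((counter as u128) << RANDOM_BITS) | random as u128
}

fn main() {
    let id = assemble(1_690_000_000_000, 3, 0xDEAD_BEEF);
    println!("timestamp: {} ms", timestamp_ms(id));
    println!("counter:   {}", counter(id));
    println!("sortable:  {}", sortable(id));
}
```

Since the counter sits above the random bits, two IDs with the same timestamp compare by counter first, regardless of what the random bits hold.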
### Building and loading
@ -148,10 +150,9 @@ create table if not exists watches (
id blob not null primary key default (julid_new()),
kind int not null, -- enum for movie or tv show or whatev
title text not null,
metadata_url text, -- possible url for imdb or other metadata-esque site to show the user
length int,
release_date int,
added_by blob not null, -- ID of the user that added it
added_by blob not null,
last_updated int not null default (unixepoch()),
foreign key (added_by) references users (id)
);
@ -177,7 +178,8 @@ a simple benchmark in the examples folder of the repo, the important parts of wh
use julid::Julid;
fn main() {
[....]
/* snip some stuff */
let start = Instant::now();
for _ in 0..num {
v.push(Julid::new());
@ -213,14 +215,14 @@ timestamp as a [`DateTime`](https://docs.rs/chrono/latest/chrono/struct.DateTime
`created_at(&self)` method to `Julid`.
Something to note: don't enable the `plugin` feature in your Cargo.toml if you're using this crate
inside your Rust application, *especially* if you're also loading it as an extension in SQLite in
inside your Rust application, especially if you're *also* loading it as an extension in SQLite in
your application. You'll get a long and confusing runtime panic due to there being multiple
entrypoints defined with the same name.
# Why Julids?
The astute may have noticed that this is the third time I've written about globally unique
sortable IDs ([here is part one](/rnd/one-part-serialized-mystery), and [part two is
The astute may have noticed that this is the third time I've written about globally unique sortable
IDs ([here is part one](/rnd/one-part-serialized-mystery), and [part two is
here](/rnd/one-part-serialized-mystery-part-2)). What's, uh... what's up with that?
![marge just thinks they're neat][marge ids]
@ -255,17 +257,17 @@ before](/rnd/one-part-serialized-mystery-part-2/#next-steps-with-ids):
> and so my next step is to write one of those, and remove the ID generation logic from the
> application.
Now that I've accomplished all I've set out to, is this the last time I'll be writing at
length about these things? It's hard to say for sure, but signs point to "yes". I hope you've found
them at least a little interesting!
Now that I've accomplished all that I've set out to do, is this the last time I'll be writing at
length about these things? It's hard to say for sure, but signs point to "yes". I hope you've found
them at least a little interesting!
# Thanks
This crate wouldn't have been possible without a lot of inspiration (and a little shameless
stealing) from the [ulid-rs](https://github.com/dylanhart/ulid-rs) crate. For the loadable
extension, the [sqlite-loadable-rs](https://github.com/asg017/sqlite-loadable-rs) crate made it
*extremely* easy to write; what I thought would take a couple days instead took a couple
hours. Thank you, authors of those crates! Feel free to steal from this project!
This project wouldn't have happened without a lot of inspiration (and a little shameless stealing)
from the [ulid-rs](https://github.com/dylanhart/ulid-rs) crate. For the loadable extension, the
[sqlite-loadable-rs](https://github.com/asg017/sqlite-loadable-rs) crate made it *extremely* easy to
write; what I thought would take a couple days instead took a couple hours. Thank you, authors of
those crates! Feel free to steal code from me any time!
----
@ -276,6 +278,12 @@ hours. Thank you, authors of those crates! Feel free to steal from this project!
[name](https://gitlab.com/nebkor/julid/-/blob/2484d5156bde82a91dcc106410ed56ee0a5c1e07/Cargo.toml#L24)
is just "julid"; that's how you refer to it in a `use` statement in your Rust program.
[^counter idea]: Sticking the counter bits after the timestamp bits was stolen from
<https://github.com/ahawker/ulid/issues/306#issuecomment-451850395>, though they use only 15 bits
for the counter, due to each character in the string encoding representing five bits, and using
three whole characters for the counter. That gives them one more random bit than Julids, and
lowers the number of available unique intra-millisecond IDs in the same process to 32,768.
[^monotonic]: At least, they will still have a total order if they're all generated within the same
process in the same way; the code uses a [64-bit atomic
integer](https://gitlab.com/nebkor/julid/-/blob/2484d5156bde82a91dcc106410ed56ee0a5c1e07/src/julid.rs#L11-12)