julid-rs/README.md
2023-07-25 17:55:06 -07:00

Globally unique sortable identifiers for SQLite!

quick take

$ sqlite3
SQLite version 3.40.1 2022-12-28 14:03:47
Enter ".help" for usage hints.
Connected to a transient in-memory database.
Use ".open FILENAME" to reopen on a persistent database.
sqlite> .load ./libjulid
sqlite> select hex(julid_new());
01898F1332B90000D6F0F0FE1066A6BF
sqlite> select julid_string(julid_new());
01H67GV14N000BBHJD6FARVB7Q
sqlite> select datetime(julid_timestamp(julid_new()) / 1000, 'auto'); -- sqlite wants seconds, not milliseconds
2023-07-25 21:58:56
sqlite> select julid_counter(julid_new());
0

a little more in depth

Julids are ULID-backwards-compatible identifiers (that is, all Julids are valid ULIDs, but not all ULIDs are valid Julids) with the following properties:

  • they are 128 bits long
  • they are lexicographically sortable
  • they encode their creation time as the number of milliseconds since the UNIX epoch
  • IDs created within the same millisecond will still sort in their order of creation, due to the presence of a 16-bit monotonic counter, placed immediately after the creation time bits

It's that last bit that makes them distinctive. ULIDs have the following big-endian bit structure:

ULID bit structure

According to the ULID spec, for ULIDs created in the same millisecond, the random portion should be incremented by one in its least-significant bit for each new one. Since that portion starts out random, you may not be able to increment it without overflowing into the timestamp portion. Likewise, it's easy to guess a new, possibly valid ULID simply by incrementing an already-known one. And finally, this means that sorting will need to read all the way to the end of the ULID for IDs created in the same millisecond.
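To make the first two problems concrete, here is a minimal Rust sketch of that same-millisecond rule, treating a ULID as a plain `u128` with 48 timestamp bits above 80 random bits. The names `RANDOM_MASK` and `ulid_increment` are made up for illustration and don't come from any ULID library.

```rust
/// Mask covering the low 80 bits of a ULID (the random portion).
const RANDOM_MASK: u128 = (1u128 << 80) - 1;

/// Hypothetical helper: bump the random portion by one, as the spec asks for
/// IDs created in the same millisecond. Returns `None` when the random portion
/// is already all ones, because adding one would carry into the timestamp bits.
fn ulid_increment(id: u128) -> Option<u128> {
    if (id & RANDOM_MASK) == RANDOM_MASK {
        None
    } else {
        Some(id + 1)
    }
}

fn main() {
    let ts: u128 = 0x01898F1332B9; // some 48-bit millisecond timestamp
    let known = (ts << 80) | 0x1234;

    // Anyone who has seen `known` can guess its same-millisecond successor.
    assert_eq!(ulid_increment(known), Some(known + 1));

    // An unlucky all-ones random portion leaves no room to increment.
    let unlucky = (ts << 80) | RANDOM_MASK;
    assert_eq!(ulid_increment(unlucky), None);
}
```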

To address these shortcomings, Julids (Joe's ULIDs) have the following big-endian bit structure:

Julid bit structure

As with ULIDs, the 48 most-significant bits encode the time of creation. Unlike ULIDs, the next 16 bits are not random; they're a monotonic counter for IDs created within the same millisecond. Since it's only 16 bits, it will saturate after 65,536 intra-millisecond ID creations, after which IDs in that same millisecond will not have an intrinsic total order (the random bits will still be different, so you shouldn't have collisions). My PC, which is no slouch, can only generate about 20,000 per millisecond, so hopefully this is not an issue! Because the random bits are always fresh, it is not possible to guess a valid Julid if you already know one.
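The layout and the counter behavior can be sketched in plain Rust as below. This is only an illustration under stated assumptions, not the crate's actual implementation: `pack`, `timestamp`, `counter`, and `Generator` are hypothetical names, the caller passes in the clock reading and the random bits so the example stays deterministic and dependency-free, and a clock that runs backwards is not handled.

```rust
/// Pack the three fields into one 128-bit value, most-significant first:
/// 48 bits of timestamp, 16 bits of counter, 64 bits of randomness.
fn pack(timestamp_ms: u64, counter: u16, random: u64) -> u128 {
    (((timestamp_ms as u128) & 0xFFFF_FFFF_FFFF) << 80)
        | ((counter as u128) << 64)
        | (random as u128)
}

/// Top 48 bits: milliseconds since the UNIX epoch.
fn timestamp(id: u128) -> u64 {
    (id >> 80) as u64
}

/// Next 16 bits: the same-millisecond counter.
fn counter(id: u128) -> u16 {
    (id >> 64) as u16
}

/// Hypothetical generator state: the last millisecond seen and its counter.
struct Generator {
    last_ms: Option<u64>,
    counter: u16,
}

impl Generator {
    fn new() -> Self {
        Generator { last_ms: None, counter: 0 }
    }

    fn generate(&mut self, now_ms: u64, random: u64) -> u128 {
        if self.last_ms == Some(now_ms) {
            // Same millisecond: bump the counter, pinning it at its maximum.
            self.counter = self.counter.saturating_add(1);
        } else {
            // New millisecond: start counting from zero again.
            self.last_ms = Some(now_ms);
            self.counter = 0;
        }
        pack(now_ms, self.counter, random)
    }
}

fn main() {
    // The blob from the "quick take" transcript splits cleanly into fields.
    let sample: u128 = 0x01898F1332B90000D6F0F0FE1066A6BF;
    assert_eq!(timestamp(sample), 0x01898F1332B9);
    assert_eq!(counter(sample), 0);

    // IDs from the same millisecond sort in creation order even when the
    // earlier one has larger random bits, because the counter sits above them.
    let mut generator = Generator::new();
    let a = generator.generate(1_000, u64::MAX);
    let b = generator.generate(1_000, 0);
    let c = generator.generate(1_001, 42);
    assert!(a < b && b < c);
    assert_eq!((counter(a), counter(b), counter(c)), (0, 1, 0));
}
```

Putting the counter directly after the timestamp means a comparison is settled within the first 64 bits unless the counter has saturated.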

functions overview

This extension provides the following functions:

  • julid_new(): create a new Julid and return it as a blob
  • julid_timestamp(julid): get the number of milliseconds since the UNIX epoch at which this julid was created
  • julid_counter(julid): show the value of this julid's monotonic counter
  • julid_sortable(julid): return the 64-bit concatenation of the timestamp and counter
  • julid_string(julid): show the base-32 Crockford encoding of this julid
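For a feel of what the last two functions compute, here is a rough Rust sketch that works on a bare `u128`. It assumes the usual ULID text form (26 characters of Crockford base-32, most-significant group first); `crockford` and `sortable` are illustrative names, and the extension's own code may well differ.

```rust
/// Crockford's base-32 alphabet: digits and uppercase letters, minus I, L, O and U.
const ALPHABET: &[u8; 32] = b"0123456789ABCDEFGHJKMNPQRSTVWXYZ";

/// Encode a 128-bit ID as 26 characters, most-significant 5-bit group first.
/// 26 * 5 = 130 bits, so the first character only carries the top 3 bits.
fn crockford(id: u128) -> String {
    (0..26u32)
        .rev()
        .map(|i| ALPHABET[((id >> (i * 5)) & 0x1F) as usize] as char)
        .collect()
}

/// The timestamp and counter together are simply the upper 64 bits.
fn sortable(id: u128) -> u64 {
    (id >> 64) as u64
}

fn main() {
    // The blob from the "quick take" transcript.
    let id: u128 = 0x01898F1332B90000D6F0F0FE1066A6BF;
    assert_eq!(sortable(id), 0x01898F1332B90000);
    println!("{}", crockford(id));

    // The two extremes of the 128-bit range.
    assert_eq!(crockford(0), "0".repeat(26));
    assert_eq!(crockford(u128::MAX), format!("7{}", "Z".repeat(25)));
}
```

Because the alphabet is in ascending ASCII order and the most-significant group comes first, the string form sorts the same way as the underlying 128-bit value.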

how to use

  • clone the repo
  • build it with cargo build
  • copy the resulting libjulid.[so|dylib|whatevs] to some place where you can...
  • load it into SQLite with .load /path/to/libjulid as shown at the top
  • party

If you, like me, wish to use Julids as primary keys, just create your table like:

create table users (
  id blob not null primary key default (julid_new()),
  ...
);

and you've got a first-class ticket straight to Julid City, baby!

using it as a library in a Rust application

Of course, you can also use it outside of a database; the Julid type is publicly exported, and you can do like such as:

use julid::Julid;

fn main() {
  let id = Julid::new();
  dbg!(id.timestamp(), id.counter(), id.sortable(), id.as_string());
}

after adding it to your project's dependencies, like cargo add julid-rs.