Compare commits
3 commits
aeb93c29af
...
c0985ee336
| Author | SHA1 | Date |
|---|---|---|
| | c0985ee336 | |
| | e518080e12 | |
| | c3499eadf1 | |
@ -2,7 +2,7 @@
|
|||
title = "A One-Part Serialized Mystery"
|
||||
slug = "one-part-serialized-mystery"
|
||||
date = "2023-06-29"
|
||||
updated = "2023-07-29"
|
||||
updated = "2025-07-21"
|
||||
[taxonomies]
|
||||
tags = ["software", "rnd", "proclamation", "upscm", "rust", "ulid", "sqlite"]
|
||||
+++
|
||||
|
@ -12,8 +12,8 @@ tags = ["software", "rnd", "proclamation", "upscm", "rust", "ulid", "sqlite"]
|
|||
I recently spent a couple days moving from [one type of universally unique
|
||||
identifier](https://commons.apache.org/sandbox/commons-id/uuid.html) to a [different
|
||||
one](https://github.com/ulid/spec), for an in-progress [database-backed
|
||||
web-app](https://gitlab.com/nebkor/ww). The [initial
|
||||
work](https://gitlab.com/nebkor/ww/-/commit/be96100237da56313a583be6da3dc27a4371e29d#f69082f7433f159d627269b207abdaf2ad52b24c)
|
||||
web-app](https://git.kittencollective.com/nebkor/what2watch). The [initial
|
||||
work](https://git.kittencollective.com/nebkor/what2watch/src/commit/be96100237da56313a583be6da3dc27a4371e29d/src/ids.rs)
|
||||
didn't take very long, but debugging the [serialization and
|
||||
deserialization](https://en.wikipedia.org/wiki/Serialization) of the new IDs took another day and a
|
||||
half, and in the end, the alleged mystery of why it wasn't working was a red herring due to my own
|
||||
|
@ -65,7 +65,7 @@ representation and efficiency!
|
|||
And at first, that's what I did. The [external library](https://docs.rs/sqlx/latest/sqlx/) I'm using
|
||||
to interface with my database automatically writes UUIDs as a sequence of sixteen bytes, if you
|
||||
specified the type in the database[^sqlite-dataclasses] as "[blob](https://www.sqlite.org/datatype3.html)", which [I
|
||||
did](https://gitlab.com/nebkor/ww/-/commit/65a32f1f20df6c572580d796e1044bce807fd3b6#f1043d50a0244c34e4d056fe96659145d03b549b_0_5).
|
||||
did](https://git.kittencollective.com/nebkor/what2watch/src/commit/65a32f1f20df6c572580d796e1044bce807fd3b6/migrations/20230426221940_init.up.sql).
|
||||
|
||||
But then I saw a [blog post](https://shopify.engineering/building-resilient-payment-systems) where
|
||||
the following tidbit was mentioned:
|
||||
|
@ -403,7 +403,7 @@ method in my deserialization code.
|
|||
<div class = "caption">fine, fine, i see the light</div>
|
||||
|
||||
You can see that
|
||||
[here](https://gitlab.com/nebkor/ww/-/blob/656e6dceedf0d86e2805e000c9821e931958a920/src/db_id.rs#L194-216)
|
||||
[here](https://git.kittencollective.com/nebkor/what2watch/src/commit/656e6dceedf0d86e2805e000c9821e931958a920/src/db_id.rs#L194-L215)
|
||||
if you'd like, but I'll actually come back to it in a second. The important part was that my logins
|
||||
were working again; time to party!
|
||||
|
||||
|
@ -415,13 +415,13 @@ day, I dove back into it.
|
|||
|
||||
|
||||
All my serialization code was calling a method called
|
||||
[`bytes()`](https://gitlab.com/nebkor/ww/-/blob/656e6dceedf0d86e2805e000c9821e931958a920/src/db_id.rs#L18),
|
||||
[`bytes()`](https://git.kittencollective.com/nebkor/what2watch/src/commit/656e6dceedf0d86e2805e000c9821e931958a920/src/db_id.rs#L18),
|
||||
which simply called another method that would return an array of 16 bytes, in big-endian order, so
|
||||
it could go into the database and be sortable, as discussed.
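
As a rough illustration of why big-endian bytes matter for sorting, here is a minimal Python sketch. It is not the project's Rust code; `id_to_bytes` and `id_from_bytes` are hypothetical helpers, and plain 128-bit integers stand in for the real IDs.

```python
import random

def id_to_bytes(value, byteorder="big"):
    """Serialize a 128-bit integer ID into exactly 16 bytes."""
    return value.to_bytes(16, byteorder)

def id_from_bytes(raw, byteorder="big"):
    """Rebuild the integer ID; byteorder must match the one used to write."""
    return int.from_bytes(raw, byteorder)

rng = random.Random(42)
ids = [rng.getrandbits(128) for _ in range(1000)]

# Big-endian blobs compare bytewise in the same order as the integers do.
big_sorted = sorted(ids, key=lambda v: id_to_bytes(v, "big"))
# Little-endian blobs put the least significant byte first, so they do not.
little_sorted = sorted(ids, key=lambda v: id_to_bytes(v, "little"))

sample = ids[0]
same_order_round_trip = id_from_bytes(id_to_bytes(sample, "big"), "big")
mixed_order_round_trip = id_from_bytes(id_to_bytes(sample, "big"), "little")
```

Reading the bytes back only recovers the ID when the reader uses the same byte order as the writer; which order that is matters for sorting, not for round-tripping.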
|
||||
|
||||
But all[^actually_not_all] my *deserialization* code was constructing the IDs as [though the bytes
|
||||
were
|
||||
*little*-endian](https://gitlab.com/nebkor/ww/-/blob/656e6dceedf0d86e2805e000c9821e931958a920/src/db_id.rs#L212). Which
|
||||
*little*-endian](https://git.kittencollective.com/nebkor/what2watch/src/commit/656e6dceedf0d86e2805e000c9821e931958a920/src/db_id.rs#L212). Which
|
||||
led me to ask:
|
||||
|
||||
what the fuck?
|
||||
|
@ -437,9 +437,9 @@ then were "backwards" coming out and "had to be" cast using little-endian constr
|
|||
What had actually happened is that as long as there was agreement about what order to use for reconstructing the
|
||||
ID from the bytes, it didn't matter if it was big or little-endian, it just had to be the same on
|
||||
both the
|
||||
[SQLx](https://gitlab.com/nebkor/ww/-/commit/84d70336d39293294fd47b4cf115c70091552c11#ce34dd57be10530addc52a3273548f2b8d3b8a9b_106_105)
|
||||
[SQLx](https://git.kittencollective.com/nebkor/what2watch/src/commit/4211ead59edc008e65aca2ed69e9f87de26e37b2/src/db_id.rs#L101-L107)
|
||||
side and on the
|
||||
[Serde](https://gitlab.com/nebkor/ww/-/commit/84d70336d39293294fd47b4cf115c70091552c11#ce34dd57be10530addc52a3273548f2b8d3b8a9b_210_209)
|
||||
[Serde](https://git.kittencollective.com/nebkor/what2watch/src/commit/4211ead59edc008e65aca2ed69e9f87de26e37b2/src/db_id.rs#L209)
|
||||
side. This is also irrespective of the order they were written out in, but again, the two sides must
|
||||
agree on the convention used. Inside the Serde method, I had added some debug printing of the bytes
|
||||
it was getting, and they were in little-endian order. What I had not realized is that that was
|
||||
|
@ -472,7 +472,7 @@ issues. Collaboration is a great technique for navigating these situations, and
|
|||
focus a bit more on enabling that[^solo-yolo-dev].
|
||||
|
||||
In the course of debugging this issue, I tried to get more insight via
|
||||
[testing](https://gitlab.com/nebkor/ww/-/commit/656e6dceedf0d86e2805e000c9821e931958a920#ce34dd57be10530addc52a3273548f2b8d3b8a9b_143_251),
|
||||
[testing](https://git.kittencollective.com/nebkor/what2watch/src/commit/656e6dceedf0d86e2805e000c9821e931958a920/src/db_id.rs#L231-L251)
|
||||
and though that helped a little, it was not nearly enough; the problem was that I misunderstood how
|
||||
something worked, not that I had mistakenly implemented something I was comfortable with. Tests
|
||||
aren't a substitute for understanding!
|
||||
|
@ -521,10 +521,10 @@ is no longer an exercise in eye-glaze-control. Maybe this has helped you with th
|
|||
[^actually_not_all]: Upon further review, I discovered that the only methods that were constructing
|
||||
with little-endian order were the SQLx `decode()` method, and the Serde `visit_seq()` method,
|
||||
which were also the only ones that were being called at all. The
|
||||
[`visit_bytes()`](https://gitlab.com/nebkor/ww/-/blob/656e6dceedf0d86e2805e000c9821e931958a920/src/db_id.rs#L152)
|
||||
[`visit_bytes()`](https://git.kittencollective.com/nebkor/what2watch/src/commit/656e6dceedf0d86e2805e000c9821e931958a920/src/db_id.rs#L152)
|
||||
and `visit_byte_buf()` methods, that I had thought were so important, were correctly treating
|
||||
the bytes as big-endian, but were simply never actually used. I fixed that [in the next
|
||||
commit](https://gitlab.com/nebkor/ww/-/commit/84d70336d39293294fd47b4cf115c70091552c11#ce34dd57be10530addc52a3273548f2b8d3b8a9b)
|
||||
commit](https://git.kittencollective.com/nebkor/what2watch/commit/84d70336d39293294fd47b4cf115c70091552c11#diff-ce34dd57be10530addc52a3273548f2b8d3b8a9b)
|
||||
|
||||
[^solo-yolo-dev]: I've described my current practices as "solo-yolo", which has its plusses and
|
||||
minuses, as you may imagine.
|
||||
|
|
BIN
content/rnd/random_points/2d_point_length.png
Normal file
After Width: | Height: | Size: 25 KiB |
BIN
content/rnd/random_points/3d_point_length.png
Normal file
After Width: | Height: | Size: 65 KiB |
BIN
content/rnd/random_points/3d_point_normalized.png
Normal file
After Width: | Height: | Size: 75 KiB |
BIN
content/rnd/random_points/8unit_cube.png
Normal file
After Width: | Height: | Size: 37 KiB |
BIN
content/rnd/random_points/Standard_deviation_diagram.svg.png
Normal file
After Width: | Height: | Size: 23 KiB |
BIN
content/rnd/random_points/cbrt_function.png
Normal file
After Width: | Height: | Size: 11 KiB |
BIN
content/rnd/random_points/chapter7-correctly_scaled_point.png
Normal file
After Width: | Height: | Size: 276 KiB |
BIN
content/rnd/random_points/chapter7-fast_cube_root.png
Normal file
After Width: | Height: | Size: 276 KiB |
BIN
content/rnd/random_points/chapter7-incorrectly_scaled_point.png
Normal file
After Width: | Height: | Size: 257 KiB |
BIN
content/rnd/random_points/chapter7-normed_cube_point.png
Normal file
After Width: | Height: | Size: 260 KiB |
BIN
content/rnd/random_points/chapter7.png
Normal file
After Width: | Height: | Size: 276 KiB |
513
content/rnd/random_points/index.md
Normal file
|
@ -0,0 +1,513 @@
|
|||
+++
|
||||
title = "Right and wrong ways to pick random points inside a sphere"
|
||||
slug = "not-rand-but-rand-y"
|
||||
date = "2018-07-22"
|
||||
updated = "2018-11-29"
|
||||
[taxonomies]
|
||||
tags = [
|
||||
"fun",
|
||||
"rust",
|
||||
"raytracing",
|
||||
"statistics",
|
||||
"profiling",
|
||||
"random",
|
||||
"proclamation"
|
||||
]
|
||||
+++
|
||||
|
||||
## Last things first
|
||||
|
||||
I'm going to cut to the chase: I could not find a way to choose
|
||||
uniformly distributed random points inside the volume of a unit sphere that was
|
||||
faster than picking one in the 8-unit cube, testing to see if it was inside the
|
||||
unit-sphere, and trying again if it wasn't -- but I did come real close. But
|
||||
that's getting ahead of myself. Before we come to the stunning conclusion, there
|
||||
are many rabbit holes and digressions awaiting.
|
||||
|
||||
## First things first
|
||||
|
||||
A couple months ago, I started working my way through a nice little book called
|
||||
*[Ray Tracing in One
|
||||
Weekend](https://in1weekend.blogspot.com/2016/01/ray-tracing-in-one-weekend.html)*,
|
||||
which all the cool kids are doing, it seems. Which makes sense, because making
|
||||
pretty pictures is fun.
|
||||
|
||||
Anyway, around midway through, you get to the point where you're rendering some
|
||||
spheres that are supposed to act as [diffuse
|
||||
reflectors](https://en.wikipedia.org/wiki/Diffuse_reflection), like a subtly
|
||||
fuzzy mirror[^1]. This involves performing a task that is, in slightly technical
|
||||
words, randomly choosing a point from a uniformly distributed set of points inside
|
||||
the volume of the unit sphere (the sphere with a radius of 1). The book
|
||||
describes a method of doing so that involves picking a point with three uniformly
|
||||
distributed coordinates inside the 8-unit cube (whose corners go from *(-1, -1, -1)* to *(+1, +1,
|
||||
+1)*),
|
||||
|
||||

|
||||
|
||||
testing to see if the point so described has a length less than 1.0,
|
||||
returning it if it does, tossing it and trying again if it doesn't. In
|
||||
pseudo-Python code, it looks like this:
|
||||
|
||||
``` python
while True:
    point = Point(random(-1, 1), random(-1, 1), random(-1, 1))
    if length(point) < 1:
        return point # victory!
    else:
        pass # do nothing, pick a new random point the next roll through
             # the loop
```
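
For anyone who wants to run the idea, here is a minimal sketch of the same rejection loop in real Python. The function name and the use of the standard `random` module are my own choices for illustration, not the book's or the repository's code.

```python
import math
import random

def random_in_unit_sphere(rng):
    """Rejection sampling: draw from the 8-unit cube until the point is inside the sphere."""
    while True:
        x, y, z = (rng.uniform(-1.0, 1.0) for _ in range(3))
        # Comparing the squared length to 1 avoids taking a square root.
        if x * x + y * y + z * z < 1.0:
            return (x, y, z)

rng = random.Random(1)
points = [random_in_unit_sphere(rng) for _ in range(10_000)]
longest = max(math.sqrt(x * x + y * y + z * z) for x, y, z in points)
```

Every returned point has a length below 1, so `longest` never reaches 1.0.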
|
||||
|
||||
This seems like it might be wasteful, because you don't know how many times
|
||||
you'll have to try generating a random point before you luck out by making a
|
||||
valid one. But, you can come up with a pretty confident guess based on the
|
||||
relative volumes. We know that the 8-unit cube is 8 cubic units in volume, since
|
||||
its sides are 2 units long (from -1 to +1). Also, the name is a dead
|
||||
giveaway. It's obviously larger than the unit sphere, because the latter is
|
||||
contained within the former, shown pinkly below:
|
||||
|
||||

|
||||
The sphere touches the cube at the center of each cubic face.
|
||||
|
||||
In three dimensions, the volume of a sphere is ```4/3
|
||||
π r^3```, or "four-thirds times Pi times the radius cubed". By definition, the
|
||||
unit sphere's radius is one, leaving us with 4/3 π, or about 4.1888. So if you
|
||||
had a four-dimensional dart you were throwing at an 8-unit three-dimensional
|
||||
cubic dartboard with an equal chance of hitting any particular single point inside
|
||||
it, your chance of hitting one inside the unit sphere is (4.1888)/8 or about
|
||||
52%. Meaning that on average, you have to try just a smidge less than twice
|
||||
before you get lucky. That also means that we're basically generating six random
|
||||
numbers each time.
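
The arithmetic above can be checked in a few lines; this is just a sketch of the calculation, with variable names of my own choosing.

```python
import math

cube_volume = 2.0 ** 3                   # sides run from -1 to +1
sphere_volume = 4.0 / 3.0 * math.pi      # unit sphere, r = 1

p_accept = sphere_volume / cube_volume   # chance a cube point is inside the sphere
expected_tries = 1.0 / p_accept          # mean of a geometric distribution
expected_uniform_draws = 3 * expected_tries  # three coordinates per try

print(f"{p_accept:.4f} {expected_tries:.2f} {expected_uniform_draws:.2f}")
```

The acceptance chance comes out a little over 52%, which is where "a smidge less than twice" and "basically six random numbers" come from.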
|
||||
|
||||
I wanted to know if it were possible to randomly select a valid point *by
|
||||
construction*, meaning, at the end of the process, I'd know that the point was
|
||||
chosen from the uniformly-distributed set of random points inside the unit
|
||||
sphere and could be used without having to check it. If you've been using it to
|
||||
implement the raytracer from the book and you've done it right, at the end of
|
||||
Chapter 7 you get an image that looks like this:
|
||||
|
||||

|
||||
Technically, this is an image of a small ball sitting on top of a larger ball
|
||||
that's made of the same material. They're both matte gray, and are just
|
||||
reflecting the image of the sky from the background.
|
||||
|
||||
## The first wrong way
|
||||
|
||||
The first obvious but wrong way is to choose a random point *(x, y, z)*, where x, y,
|
||||
and z are uniformly distributed random numbers, *normalize* its length to 1 in
|
||||
order to place it on the surface of the unit sphere, then *scale* it by some
|
||||
random factor *r* that is less than 1 (but more than 0). In pseudo-Python code,
|
||||
|
||||
``` python
point = Point(random(-1, 1),
              random(-1, 1),
              random(-1, 1))

point = point / length(point) # normalize length to 1

point = point * random(0, 1) # randomly scale it to the interior of the unit sphere

return point
```
|
||||
|
||||
There are actually two things wrong with this, but first, a brief digression
|
||||
into some math-ish stuff.
|
||||
|
||||
### Length of a point?
|
||||
|
||||
Do you remember how to calculate the length of a right triangle's hypotenuse with Pythagoras'
|
||||
Theorem? You know, *a-squared plus b-squared equals c-squared*?
|
||||
|
||||

|
||||
|
||||
What if we replaced *a* with *x* and interpreted it as a coordinate on the
|
||||
horizontal axis, with an origin at 0? Similarly, replace *b* with *y*, and think
|
||||
of it as a 0-origined coordinate on a vertical axis. If you now have not a
|
||||
triangle but instead a two-dimensional point at the [Cartesian
|
||||
coordinate](https://www.mathsisfun.com/data/cartesian-coordinates.html) *(x, y)*, what is
|
||||
*c*? It's now the *length* of the point.
|
||||
|
||||

|
||||
The length here is the square root of 25 plus 16, or about 6.4.
|
||||
|
||||
That's all well and good, you may say, but what about a point in THREE
|
||||
dimensions?? It turns out the same trick works, you just do
|
||||
|
||||

|
||||
|
||||

|
||||
|
||||
So if we have a point *(3, 1, 2)*, its length is the square root of 3^2 + 1^2 +
|
||||
2^2 which is the square root of fourteen, which is about 3.74. What if we wanted
|
||||
to make it so that it was pointing in the same direction from the origin, but
|
||||
the length was 1? That's called *normalizing*, and is done by dividing each
|
||||
coordinate by the length. Below, the point P'[^2] has the same direction as point P,
|
||||
but the length of P' is 1.
|
||||
|
||||

|
||||
|
||||
Now, we could have chosen to multiply each component coordinate by a number
|
||||
other than the reciprocal of the length of P to make P'. No matter what number
|
||||
we chose (as long as it was greater than zero), it would still point in the same
|
||||
direction as P, but its length would be something else. This operation, keeping
|
||||
the direction but changing the length by multiplying each coordinate by some
|
||||
number, is called *scaling*. Normalizing is just scaling by the reciprocal of
|
||||
the length.
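
Here is a small sketch of length, scaling, and normalizing, using plain tuples as points. The helper names are hypothetical and only meant to mirror the definitions above.

```python
import math

def length(p):
    """Distance from the origin: square root of the sum of squared coordinates."""
    return math.sqrt(sum(c * c for c in p))

def scale(p, factor):
    """Multiply every coordinate by the same number; direction is kept for factor > 0."""
    return tuple(c * factor for c in p)

def normalize(p):
    """Scale by the reciprocal of the length, giving a point of length 1."""
    return scale(p, 1.0 / length(p))

p = (3.0, 1.0, 2.0)
p_prime = normalize(p)

print(length((5.0, 4.0)))  # the 2D example: sqrt(25 + 16)
print(length(p))           # sqrt(14)
print(length(p_prime))     # 1, up to floating-point rounding
```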
|
||||
|
||||
## OK, back to being wrong about constructing random points
|
||||
|
||||
So, the first obvious way is to pick a random, uniformly distributed point
|
||||
inside the 8-unit cube, change its length so that it lies on the surface of the
|
||||
unit sphere, then scale it by some random amount so that it's somewhere in the
|
||||
interior of the sphere. Here's what it looks like when you do that:
|
||||
|
||||

|
||||
|
||||
Hmm.... close... Let's see our reference, correct image again:
|
||||
|
||||

|
||||
|
||||
The reference image has a much softer shadow under the smaller ball, meaning the
|
||||
edge is less distinctly defined, than the image using the first wrong technique
|
||||
for generating random points. The experimental shadow also seems to be smaller
|
||||
than the reference one, but it's hard to tell for sure given the diffuse way the
|
||||
correct shadow dissipates. Still, something is obviously wrong.[^3]
|
||||
|
||||
## The first obviously wrong thing about the first obvious wrong way
|
||||
|
||||
Let's look again at the unit sphere inside the 8-unit cube:
|
||||
|
||||

|
||||
|
||||
Taking a moment to become unblinded by the brilliance of the diagram, we can see
|
||||
that if we're taking a truly random point inside the cube, and if the point
|
||||
happens to not be inside the unit sphere (which, as you know, happens about half
|
||||
the time), it's a lot more likely that the invalid point is in the direction of
|
||||
one of the eight corners, than near the center of the cubic face, because
|
||||
there's a lot more non-sphere space there. So what would it mean, on average,
|
||||
for us to not throw those points away, but instead normalize them so that they
|
||||
have a length of 1 and are on the surface of the unit sphere? You'd have eight
|
||||
"hot spots" on the sphere's surface where points would be more likely to be
|
||||
found, and your surface distribution would not be uniform.
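
One way to see the hot spots numerically is sketched below. It counts how many normalized cube points land within a small cone around each corner direction versus each face-center direction. The 15-degree cone, the sample count, and all the names are arbitrary choices of mine for illustration.

```python
import math
import random
from itertools import product

def normalized_cube_point(rng):
    """The wrong way: a uniform point in the cube, pushed onto the sphere's surface."""
    while True:
        p = [rng.uniform(-1.0, 1.0) for _ in range(3)]
        n = math.sqrt(sum(c * c for c in p))
        if n > 0.0:  # guard against the vanishingly unlikely zero vector
            return [c / n for c in p]

# Unit vectors toward the 8 cube corners and the 6 face centers.
corners = [[s / math.sqrt(3.0) for s in signs] for signs in product((-1, 1), repeat=3)]
faces = [[1, 0, 0], [-1, 0, 0], [0, 1, 0], [0, -1, 0], [0, 0, 1], [0, 0, -1]]

def hits_per_direction(points, directions, half_angle_deg=15.0):
    """Average number of points inside a small cone around each direction."""
    cos_limit = math.cos(math.radians(half_angle_deg))
    hits = sum(
        1
        for p in points
        for d in directions
        if p[0] * d[0] + p[1] * d[1] + p[2] * d[2] >= cos_limit
    )
    return hits / len(directions)

rng = random.Random(7)
points = [normalized_cube_point(rng) for _ in range(50_000)]
corner_density = hits_per_direction(points, corners)
face_density = hits_per_direction(points, faces)
```

With a truly uniform surface distribution the two averages would be about equal; here the corner cones collect noticeably more points than the face cones.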
|
||||
|
||||
### You keep saying "uniformly distributed"; what does that mean?
|
||||
|
||||
I've been meaning to take a brief aside to talk about what I mean by "uniformly
|
||||
distributed", or, "random". In everyday speech, people usually take "random" to
|
||||
mean, "each distinct event in a set of related events has an equal probability
|
||||
of happening." Examples are rolls of fair dice or flipping a fair coin; ones and
|
||||
sixes and fives and fours and twos and threes should each come up 1/6 the time,
|
||||
heads and tails should be 50/50. When this happens, you say that the values are
|
||||
*uniformly distributed*, or they have a *uniform distribution*.
|
||||
|
||||
Just because we usually want to talk about random events that happen with a
|
||||
uniform distribution doesn't mean that non-uniform distributions are
|
||||
unimportant. One of the most important and useful distributions is called the
|
||||
[Standard, or Normal,
|
||||
Distribution](https://en.wikipedia.org/wiki/Normal_distribution). Another name
|
||||
is the "Gaussian Distribution", but really, it's Normal when it's a Gaussian
|
||||
distribution whose expected value is 0 with a [standard
|
||||
deviation](https://en.wikipedia.org/wiki/Standard_deviation) of 1. It looks like
|
||||
this:
|
||||
|
||||

|
||||
*taken from the wikipedia page on "standard deviation"*
|
||||
|
||||
A good way of thinking about distributions is, if you pretend that they're like
|
||||
a one-dimensional (numberline) dartboard, and you throw a bunch of darts at
|
||||
them, then the distribution is the chance that the dart lands within the given
|
||||
interval. When it's uniform, the line is a straight horizontal line at some
|
||||
height; if the height were 1, the odds are 100%, and if the line were at 0, the
|
||||
odds are 0%, but the take-away is that each point on the numberline-dartboard
|
||||
would be equally likely to be hit by a dart in a uniform distribution. If your
|
||||
dart throws were normally distributed, you'd be hitting near the middle about
|
||||
40% of the time; hit within the interval -1 to +1 about 68% of the time, and
|
||||
within -3 to +3 like 99% of the time. There's technically no theoretical limit
|
||||
to the magnitude of a normally distributed random value. Your dart could land at
|
||||
like -1,000,000, but the odds are really, really, *really* low.
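
Those percentages can be computed exactly from the error function in Python's standard library. This is only a sketch; the guess about where "about 40%" comes from is my own assumption, not something stated in the post.

```python
import math

def prob_within(k):
    """P(-k < X < k) for a standard normal X (expected value 0, standard deviation 1)."""
    return math.erf(k / math.sqrt(2.0))

within_half = prob_within(0.5)   # roughly 38%
within_1 = prob_within(1.0)      # roughly 68%
within_3 = prob_within(3.0)      # roughly 99.7%

# Height of the bell curve at its peak. This is a probability *density*,
# not a probability, but it is also close to 0.4.
peak_height = 1.0 / math.sqrt(2.0 * math.pi)
```

Either the chance of landing within half a standard deviation of the middle or the peak height of the curve could plausibly be read as "near the middle about 40%".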
|
||||
|
||||
### The Normal Distribution to the rescue!
|
||||
|
||||
Early on in my search for methods for choosing a random point inside a sphere,
|
||||
I'd seen a fancy Javascript math font formula on like math.stackexchange.com[^4]
|
||||
that looked something like this,
|
||||
|
||||

|
||||
*note: this formula doesn't make total sense so don't worry too hard about the
|
||||
details; it's meant to reflect the fact of my own confusion about it making it
|
||||
hard to remember it accurately*
|
||||
|
||||
but it was too soon in my search, so I'd not become familiar with the
|
||||
distinction between differently distributed sets of random values, and all I
|
||||
took from it immediately was that they were scaling a normalized point. Later,
|
||||
after grasping that it was incorrect to just normalize points whose coordinates were
|
||||
uniformly distributed, I read somewhere else[^5] something like, "Three
|
||||
independently selected Gaussian variables, when normalized, will be uniformly
|
||||
distributed on the unit sphere,"[^6] and it clicked what those ".. = N"s meant: they
|
||||
were random values sampled from the Normal Distribution. I quickly updated my
|
||||
code to something like the following:
|
||||
|
||||
``` python
point = Point(random(0, 1, gaussian=True), # first arg is expected val,
                                           # second is standard deviation
              random(0, 1, gaussian=True),
              random(0, 1, gaussian=True))

point = point / length(point) # normalize length to 1

point = point * random(0, 1) # randomly scale it to the interior of the unit sphere

return point
```
|
||||
|
||||
As you can see, it's basically exactly the same as the previous code, except our
|
||||
initial *x*, *y*, and *z* component coordinates are random values with a
|
||||
Gaussian distribution, instead of a uniform distribution. I eagerly re-ran the
|
||||
render, and it produced the following image:
|
||||
|
||||

|
||||
|
||||
Which.. looks... a LOT like the previous one, which was wrong. If anything, the
|
||||
shadow is darker and sharper than the previous wrong one, so it's even MORE
|
||||
wrong now. Which brings us to
|
||||
|
||||
## The second wrong thing about all previous wrong things
|
||||
|
||||
Those of you who are, unlike me, **not** infested with brain worms may have been
|
||||
wondering to yourselves, *What the heck was up with the "1/3" next to the "U" in
|
||||
that awesomely hand-written formula from just a second ago? Shouldn't there be a
|
||||
term in the code that reflects whatever that might mean?* You would be right to
|
||||
wonder that. That means, "take the cube root of a uniform random number".
|
||||
|
||||
Because of the brain worms, though, I did not at first make that
|
||||
connection. However, luckily for me, at the time I was trying to figure all this
|
||||
out, I was hanging out with my friends [Allen](http://www.allenhemberger.com/)
|
||||
and [Mike](https://twitter.com/mfrederickson), two incredibly creative, smart,
|
||||
and curious people who had started to be taken in by this problem I was
|
||||
having. Both of them are highly skilled and talented professional visual effects
|
||||
artists, and have good eyes for seeing when images are wrong, so this was right
|
||||
up their alley.
|
||||
|
||||
Mike suggested taking a look directly at the random points being generated, so I
|
||||
changed my program to also print out each randomly generated point in a format
|
||||
that Mike could import into
|
||||
[Houdini](https://www.sidefx.com/products/houdini-fx/) in order to visualize
|
||||
them. When he did, it was super obvious that the points were incorrectly distributed,
|
||||
because they were mostly clustered very close to the center of the sphere,
|
||||
instead of being evenly spread out throughout the volume of it.
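
The clustering shows up in the numbers even without a visualization. In this sketch, which is my own illustration rather than the exported Houdini data, only the scaling factor is simulated, since the direction does not affect distance from the center.

```python
import random

rng = random.Random(3)
n = 100_000

# With the uniform scaling factor, the distance from the center is just random(0, 1).
radii = [rng.random() for _ in range(n)]

inner_fraction = sum(r < 0.5 for r in radii) / n  # share of points within radius 0.5
inner_volume_share = 0.5 ** 3                     # share of the sphere's volume within radius 0.5

print(inner_fraction, inner_volume_share)
```

About half of the points end up in the inner ball of radius 0.5, even though that ball holds only one eighth of the volume.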
|
||||
|
||||
Incredibly, even this wasn't enough for me to connect the dots. I wound up going
|
||||
on a convoluted journey through trying to understand how to generate the right
|
||||
probability distribution to compensate for the fact that as distance from the
|
||||
center increases, the amount of available space drastically increases. Increases
|
||||
as the cube of the radius. If only there were an inverse operation for cubing
|
||||
something...
|
||||
|
||||
I don't know what it finally was, but the whole, "volume goes up as radius
|
||||
cubed" and "U to the 1/3 power" and probably Mike saying, "What about the
|
||||
cube root?" gestalt made me realize: I needed to take the cube root of the
|
||||
uniformly distributed random scaling factor! I changed the code to do that, and
|
||||
handed Mike a new set of data to visualize. He put together this spiffy
|
||||
animation showing the effect of taking the cube root of the random number and
|
||||
then scaling the points with that, instead of just scaling them by a uniform
|
||||
random number (ignore the captions embedded in the image and just notice how the
|
||||
distribution changes):
|
||||
|
||||

|
||||
|
||||
You can see pretty clearly how the points are clustered near the origin, where
|
||||
there are a lot fewer possible places to be, vs. when using the cube root. Taking the
|
||||
cube root of the uniformly distributed random number between 0 and 1 "pushes"
|
||||
it away from the origin; check out a graph of the cube root of *x* between 0 and
|
||||
1:
|
||||
|
||||

|
||||
|
||||
This shows that if you randomly got, say, 0.2, it would get pushed to near
|
||||
0.6. But if you got something close to 1, it won't go past 1. In other words,
|
||||
this changes the distribution from uniform to a Power Law distribution; thanks
|
||||
to [this
|
||||
commenter](https://lobste.rs/s/bkjqjc/right_wrong_ways_pick_random_points#c_lieynp)
|
||||
on lobste.rs!
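
Putting the pieces together, a minimal Python sketch of the constructive method might look like the following. The reasoning for the cube root: the fraction of the sphere's volume within radius *r* is *r* cubed, so setting that equal to a uniform value *U* and solving gives *r* equal to the cube root of *U*. The function name and the zero-length guard are my own additions, not the repository's code.

```python
import math
import random

def random_in_unit_sphere(rng):
    """Gaussian direction plus cube-root radius; no testing against the sphere needed."""
    while True:
        x, y, z = (rng.gauss(0.0, 1.0) for _ in range(3))
        norm = math.sqrt(x * x + y * y + z * z)
        if norm > 0.0:  # a zero-length draw is vanishingly unlikely; retry if it happens
            break
    radius = rng.random() ** (1.0 / 3.0)  # cube root of a uniform value in [0, 1)
    s = radius / norm                     # normalize and scale in one step
    return (x * s, y * s, z * s)

rng = random.Random(11)
n = 100_000
points = [random_in_unit_sphere(rng) for _ in range(n)]
lengths = [math.sqrt(x * x + y * y + z * z) for x, y, z in points]
inner_fraction = sum(d < 0.5 for d in lengths) / n
mean_x = sum(p[0] for p in points) / n
```

With the cube root in place, roughly one eighth of the points fall within radius 0.5, matching that region's share of the volume.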
|
||||
|
||||
Anyway, the proof, of course, is in the pudding, which is weird, because how did
|
||||
it get there. Anyway, when I ran the render with the updated code, it produced
|
||||
the following pretty picture, which we all agreed looked right:
|
||||
|
||||

|
||||
|
||||
Time to celebrate!
|
||||
|
||||
## Not so fast
|
||||
|
||||
Although this new method of randomly choosing points within the unit sphere with
|
||||
a uniform distribution seemed to be correct, I noticed that my renders were a
|
||||
little slower. By which I mean, about 50% slower, so like 13 seconds instead
|
||||
of 9. This wasn't yet unacceptably slow in any absolute sense, but one of the
|
||||
reasons I wanted to try to do this by construction was to do it less
|
||||
wastefully. In general with programming, that means, "faster". Taking 150% of
|
||||
the time was not less wasteful. But where was the extra work being done?
|
||||
|
||||
I knew that generating a Gaussian random number would be more work than a
|
||||
uniformly distributed one, but didn't think it would be THAT much more. It turns
|
||||
out that it takes almost twice as long to make a Gaussian one as a uniform one for
|
||||
my system[^7]. With the textbook method of generating and testing to see if you should
|
||||
keep it, you'll have to generate almost six uniformly distributed random
|
||||
numbers. With our constructive method, we have to generate three Gaussian
|
||||
values, plus one uniform one, for an effective cost of seven uniform random
|
||||
number generations. Hmm, so, not really a win there, random-number-generation-wise.
|
||||
|
||||
The only other expensive thing we're doing is taking a cube root[^8], and
|
||||
indeed, almost half the time of running the new code is spent performing that
|
||||
operation! I suspected this might be the case; I figured that even the square
|
||||
root would have direct support by the hardware of the CPU, but the cube root
|
||||
would have to be done "in software", which means, "slowly". But I know that
|
||||
there are some tricks you can pull if you're willing to trade accuracy for
|
||||
speed.
|
||||
|
||||
A [quick and dirty implementation of "fast cube
|
||||
root"](http://www.hackersdelight.org/hdcodetxt/acbrt.c.txt) later, and I got the
|
||||
following image:
|
||||
|
||||

|
||||
|
||||
which looked pretty good! And after benchmarking, it was... ***ALMOST*** as fast
|
||||
as the original method from the book. Still more wasteful, and technically less
|
||||
accurate (the name of the page I got the algorithm from is "approximate cube
|
||||
root", and the function is most inaccurate in the region near zero, sometimes
|
||||
wrong by over 300%).
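
To give a flavor of the accuracy-for-speed trade, here is a sketch of a related bit trick. To be clear, this is not the Hacker's Delight routine linked above and not the code from my repository. It assumes 32-bit IEEE floats, divides the raw bit pattern by three, adds an un-tuned bias constant, and then polishes the guess with one Newton step. In Python it is of course not fast; it only shows the shape of the idea.

```python
import struct

# Un-tuned exponent bias correction: (127 - 127/3) * 2**23. Real implementations
# usually tweak this constant to shrink the error of the initial guess.
MAGIC = int((127.0 - 127.0 / 3.0) * 2 ** 23)

def float_to_bits(x):
    """Raw 32-bit pattern of x stored as an IEEE single-precision float."""
    return struct.unpack("<I", struct.pack("<f", x))[0]

def bits_to_float(i):
    """Inverse of float_to_bits."""
    return struct.unpack("<f", struct.pack("<I", i))[0]

def approx_cbrt(x):
    """Approximate cube root of a positive number in float32 range."""
    # Dividing the bit pattern by 3 roughly divides the exponent (the log2) by 3.
    guess = bits_to_float(float_to_bits(x) // 3 + MAGIC)
    # One Newton-Raphson step for y**3 = x.
    return (2.0 * guess + x / (guess * guess)) / 3.0

samples = [i / 1000.0 for i in range(1, 1001)]
worst_relative_error = max(
    abs(approx_cbrt(x) - x ** (1.0 / 3.0)) / x ** (1.0 / 3.0) for x in samples
)
```

Over the sampled range from 0.001 to 1, this sketch stays within about one percent of the true cube root; it makes no attempt to behave sensibly at or below zero.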
|
||||
|
||||
## Lessons learned
|
||||
|
||||
This really has been a long and twisted voyage of distractions for me. For as
|
||||
long as this post is, it's really only scratching the surface on most of the
|
||||
material, and omitting quite a bit of other stuff completely. Things I've left
|
||||
out completely:
|
||||
|
||||
* Going into detail on the fast cube root; I had a bunch of graphs of things
|
||||
like the absolute and percentage error values for it vs. the native "correct"
|
||||
cube root, and will probably attempt to improve my implementation in the
|
||||
future to work better for values close to zero and write that up.
|
||||
|
||||
* Going into detail on how to generate random Gaussian numbers. The most common
|
||||
way that I see referenced is called the [Box-Muller
|
||||
Transform](https://en.wikipedia.org/wiki/Box%E2%80%93Muller_transform), and it
|
||||
requires first generating two uniform random values then doing some
|
||||
trigonometry with them, which would be a very expensive operation for a
|
||||
computer to do. But there are [clever ways](https://arxiv.org/abs/1403.6870)
|
||||
to generate them that are almost as fast as generating uniformly distributed
|
||||
ones; [the random number library I'm
|
||||
using](https://docs.rs/rand/0.5.4/rand/distributions/struct.StandardNormal.html)
|
||||
is pretty clever in that way. There's still room for improvement, though!
|
||||
|
||||
* Details on using ```cargo bench``` to benchmark my algorithms, which is
|
||||
intrinsic to [Rust](https://www.rust-lang.org/en-US/), the hip new-ish
|
||||
language from Mozilla, the web browser people. This would be of interest to
|
||||
programming nerds like me.
|
||||
|
||||
* You can generalize spheres to more than three dimensions; for example, in 4D,
|
||||
it's called a [glome](https://en.wikipedia.org/wiki/3-sphere). As the number
|
||||
of dimensions increases, it becomes less and less likely that a randomly
|
||||
selected point from the enclosing cube will be inside the sphere, so the
|
||||
"rejection sampling" method of the text becomes more and more slow compared to
|
||||
the various constructive techniques.
|
||||
|
||||
* Yet MORE ways to do this task, and how I might be able to get a constructive
|
||||
technique to actually be faster, and hot-rodding software in general.
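
The point above about higher dimensions can be made concrete with the standard formula for the volume of a unit ball in *n* dimensions, π^(n/2) / Γ(n/2 + 1). This is a sketch with names of my own choosing.

```python
import math

def acceptance_probability(dims):
    """Chance that a uniform point in the enclosing cube lies inside the unit ball."""
    ball_volume = math.pi ** (dims / 2.0) / math.gamma(dims / 2.0 + 1.0)
    cube_volume = 2.0 ** dims  # sides run from -1 to +1 in every dimension
    return ball_volume / cube_volume

rates = {d: acceptance_probability(d) for d in (2, 3, 4, 10)}
for d, rate in rates.items():
    print(d, f"{rate:.4f}", f"expected tries: {1.0 / rate:.1f}")
```

In three dimensions this is the familiar 52%; by ten dimensions only about a quarter of a percent of cube points land inside, so rejection sampling needs roughly 400 tries per point.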
|
||||
|
||||
Some of these will come up in the near future, in more blog posts. Some will
|
||||
just have to be taken up by someone else, or never at all (or already
|
||||
exhaustively covered by smarter people than I, of course). Most of all, though,
|
||||
I've been surprised and delighted by how rich and deep this problem is, and I
|
||||
hope some of that has come through here.
|
||||
|
||||
I also want to give huge thanks to Allen and Mike, who were both extremely
|
||||
supportive of cracking this nut in person and over email. They said figuring
|
||||
this out should lead to the mother of all blog posts, so I hope this isn't a
|
||||
disappointment. No pressure all around!
|
||||
|
||||
## da code
|
||||
|
||||
Not that it's of that much interest to most, but of course, my code is freely
|
||||
available. See https://github.com/nebkor/weekend-raytracer/ for the repository,
|
||||
and take a look at
|
||||
https://github.com/nebkor/weekend-raytracer/blob/random_points_blog_post/src/lib.rs
|
||||
to see the implementations of the different techniques, right and wrong, from
|
||||
this post.
|
||||
|
||||
Eventually, I'll be parallelizing the code, and possibly attempting to use the
|
||||
GPU. But before then, I need to finish the last couple chapters of this book,
|
||||
and maybe chase fewer rabbits until I've done that :)
|
||||
|
||||
-------------------------------------------------------------------------------
|
||||
|
||||
[^1]: A more common name for this is that it's a Lambertian surface or
|
||||
material. See https://en.wikipedia.org/wiki/Lambertian_reflectance
|
||||
|
||||
[^2]: The little single-quote mark after *P* in *P'* is pronounced "prime", so
|
||||
the whole thing is pronounced "pee prime". "Prime" is meant to indicate
|
||||
that the thing so annotated is derived directly from some original thing; in
|
||||
our case, *P*. I don't know why the word "prime" is chosen to mean that;
|
||||
something like "secondary" would make more semantic sense, but here we are,
|
||||
stuck in Hell.
|
||||
|
||||
[^3]: This may sound weird, but one of the most fun things about writing
|
||||
software that makes images (or, as I and most others like to say, "pretty
|
||||
pictures") is that when you have a bug, it shows up in a way that you might
|
||||
not realize is wrong, or it produces an unexpected but pleasing image,
|
||||
etc. It's fun to look at the picture, try to reason about what must have
|
||||
happened to make it incorrect, then try to fix and re-render to see if you
|
||||
were right.
|
||||
|
||||
[^4]: If you want to feel like an ignorant moron, try googling for stuff like
|
||||
"random point on sphere" and "probability distribution". I promise you, it
|
||||
made about as much sense to me as to you, assuming it makes very little
|
||||
sense to you on first glance. Look at this shit from
|
||||
http://mathworld.wolfram.com/SpherePointPicking.html
|
||||
|
||||

|
||||
|
||||
I mean come ON with that. I'm trying real hard here to explain everything so
|
||||
that even if you already know it, you'll still be entertained, and if you
|
||||
don't, you won't run away screaming because it's all incomprehensible jargon.
|
||||
|
||||
[^5]: OK, looks like it's the Mathworld.wolfram link from above, because right
|
||||
below that quaternion nonsense there's
|
||||
|
||||

|
||||
|
||||
which is pretty much exactly what I remember reading.
|
||||
|
||||
[^6]: Getting into the reason WHY three Gaussian random values will be uniformly
|
||||
distributed on the surface of the unit sphere when normalized is a rabbit
|
||||
hole I did not go deeply down, but I think it would be another whole thing
|
||||
if I did. The magic phrase is "rotational invariance", if you'd like to look
|
||||
into it.
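
For the curious, here's a rough, self-contained sketch of the "three Gaussians, then normalize" trick. It is not the code from my repo: to keep it dependency-free, it assumes a toy xorshift generator and the Box-Muller transform in place of a real random-number crate, and the names `XorShift64` and `random_on_sphere` are invented for this example.

```rust
/// Toy xorshift64 generator; a stand-in for a real RNG crate, not for serious use.
/// The seed must be non-zero.
struct XorShift64(u64);

impl XorShift64 {
    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }

    /// Uniform value in (0, 1]; zero is excluded so that `ln` below is finite.
    fn next_f64(&mut self) -> f64 {
        ((self.next_u64() >> 11) as f64 + 1.0) / (1u64 << 53) as f64
    }

    /// One standard Gaussian value via the Box-Muller transform.
    fn next_gaussian(&mut self) -> f64 {
        let (u1, u2) = (self.next_f64(), self.next_f64());
        (-2.0 * u1.ln()).sqrt() * (2.0 * std::f64::consts::PI * u2).cos()
    }
}

/// Random point on the surface of the unit sphere: draw three Gaussian
/// values, then divide by the length of the resulting vector.
fn random_on_sphere(rng: &mut XorShift64) -> [f64; 3] {
    loop {
        let p = [rng.next_gaussian(), rng.next_gaussian(), rng.next_gaussian()];
        let len = (p[0] * p[0] + p[1] * p[1] + p[2] * p[2]).sqrt();
        // Guard against the (vanishingly unlikely) all-zero vector.
        if len > 0.0 {
            return [p[0] / len, p[1] / len, p[2] / len];
        }
    }
}

fn main() {
    let mut rng = XorShift64(0x9E37_79B9_7F4A_7C15);
    for _ in 0..5 {
        let [x, y, z] = random_on_sphere(&mut rng);
        println!("({x:+.4}, {y:+.4}, {z:+.4})");
    }
}
```

The normalization step is where the square root comes in; everything before it is just making Gaussian values out of uniform ones.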
|
||||
|
||||
[^7]: How do I know how long it takes on my system to produce a Gaussian random
|
||||
value vs. a uniform one? I [wrote a tiny program that generates random
|
||||
values, either Gaussian or
|
||||
uniform](https://github.com/nebkor/misfit_toys/tree/master/gauss_the_answer),
|
||||
and timed how long it took to generate a few billion values of each
|
||||
kind. On my system, that program will take about five seconds to
|
||||
generate and sum a billion Gaussian values, while doing the same for uniform
|
||||
values takes less than three. Also, it turns out that the "fast" Gaussian
|
||||
value generator technique is, like the fast way to pick a random point in
|
||||
the sphere, to generate a random value, check to see if it's a valid
|
||||
Gaussian one, and guess again. It seems that it takes a couple rounds of
|
||||
that on average no matter what.
|
||||
|
||||
[^8]: The act of normalizing the point requires a square root, but most
|
||||
processors have pretty fast ways of calculating those, and as with the cube
|
||||
root, there are hacks to trade accuracy for speed if desired. But my
|
||||
processor, a 64-bit Intel CPU, has direct support for taking a square root,
|
||||
so any hack to trade accuracy for speed is likely to simply be worse in both
|
||||
respects.
|
BIN
content/rnd/random_points/mathworld_normed_point.png
Normal file
After Width: | Height: | Size: 15 KiB |
BIN
content/rnd/random_points/pythagoras_length.png
Normal file
After Width: | Height: | Size: 18 KiB |
BIN
content/rnd/random_points/right_triangle.png
Normal file
After Width: | Height: | Size: 21 KiB |
BIN
content/rnd/random_points/simple_quaternion.png
Normal file
After Width: | Height: | Size: 29 KiB |
BIN
content/rnd/random_points/sphere_points3.gif
Normal file
After Width: | Height: | Size: 12 MiB |
BIN
content/rnd/random_points/too_soon_normals.png
Normal file
After Width: | Height: | Size: 38 KiB |
BIN
content/rnd/random_points/unit_sphere.png
Normal file
After Width: | Height: | Size: 39 KiB |
|
@ -2,7 +2,7 @@
|
|||
title = "A One-Part Serialized Mystery, Part 2: The Benchmarks"
|
||||
slug = "one-part-serialized-mystery-part-2"
|
||||
date = "2023-07-15"
|
||||
updated = "2023-07-29"
|
||||
updated = "2025-07-21"
|
||||
[taxonomies]
|
||||
tags = ["software", "rnd", "proclamation", "upscm", "rust", "sqlite", "ulid"]
|
||||
+++
|
||||
|
@ -10,7 +10,7 @@ tags = ["software", "rnd", "proclamation", "upscm", "rust", "sqlite", "ulid"]
|
|||
# A one-part serial mystery post-hoc prequel
|
||||
|
||||
I [wrote recently](/rnd/one-part-serialized-mystery) about switching the types of the primary keys in
|
||||
the database for an [in-progress web app](https://gitlab.com/nebkor/ww) I'm building. At that time,
|
||||
the database for an [in-progress web app](https://git.kittencollective.com/nebkor/what2watch) I'm building. At that time,
|
||||
I'd not yet done any benchmarking, but had reason to believe that using [sortable primary
|
||||
keys](https://github.com/ulid/spec) would yield some possibly-significant gains in performance, in
|
||||
both time and space. I'd also read accounts of regret that databases had not used ULIDs (instead of
|
||||
|
@ -47,9 +47,7 @@ My benchmark is pretty simple: starting from an empty database, do the following
|
|||
1. for each user, randomly select around 100 movies from the 10,000 available and put them on their list of
|
||||
things to watch
|
||||
|
||||
Only that last part is significant, and is where I got my [timing
|
||||
information](https://gitlab.com/nebkor/ww/-/blob/897fd993ceaf9c77433d44f8d68009eb466ac3aa/src/bin/import_users.rs#L47-58)
|
||||
from.
|
||||
Only that last part is significant; the first two steps are basically instantaneous.
|
||||
|
||||
The table that keeps track of what users want to watch was defined[^not-final-form] like this:
|
||||
|
||||
|
@ -92,7 +90,7 @@ and the [recommended durability setting](https://www.sqlite.org/pragma.html#prag
|
|||
WAL mode, along with all other production-appropriate settings, I got almost 20,000 *writes* per
|
||||
second[^nothing is that slow]. There were multiple concurrent writers, and each write was a
|
||||
transaction that inserted about 100 rows at a time. I had [retry
|
||||
logic](https://gitlab.com/nebkor/ww/-/blob/4c44aa12b081c777c82192755ac85d1fe0f5bdca/src/bin/import_users.rs#L143-145)
|
||||
logic](https://git.kittencollective.com/nebkor/what2watch/src/commit/4c44aa12b081c777c82192755ac85d1fe0f5bdca/src/bin/import_users.rs#L134-L148)
|
||||
in case a transaction failed due to the DB being locked by another writer, but that never happened:
|
||||
each write was just too fast.
|
||||
|
||||
|
@ -183,7 +181,7 @@ capabilities resulted in better resource use. Every table in my original, UUID-b
|
|||
a `created_at` column, stored as a 64-bit signed offset from the [UNIX
|
||||
epoch](https://en.wikipedia.org/wiki/Unix_time). Because ULIDs encode their creation time, I could
|
||||
remove that column from every table that used ULIDs as their primary key. [Doing
|
||||
so](https://gitlab.com/nebkor/ww/-/commit/5782651aa691125f11a80e241f14c681dda7a7c1) dropped the
|
||||
so](https://git.kittencollective.com/nebkor/what2watch/commit/5782651aa691125f11a80e241f14c681dda7a7c1) dropped the
|
||||
overall DB size by 5-10% compared to UUID-based tables with a `created_at` column. This advantage
|
||||
was unique to ULIDs as opposed to UUIDv4s, and so using the latter with a schema that excluded a
|
||||
"created at" column was giving an unrealistic edge to UUIDs, but for my benchmarks, I was interested
|
||||
|
@ -209,7 +207,7 @@ create table if not exists witch_watch (
|
|||
|
||||
And, it did, a little. I also took a more critical eye to that table as a whole, and realized I
|
||||
could [tidy up the
|
||||
DB](https://gitlab.com/nebkor/ww/-/commit/0e016552ab6c66d5fdd82704b6277bd857c94188?view=parallel#f1043d50a0244c34e4d056fe96659145d03b549b_34_34)
|
||||
DB](https://git.kittencollective.com/nebkor/what2watch/commit/0e016552ab6c66d5fdd82704b6277bd857c94188?view=parallel#diff-f1043d50a0244c34e4d056fe96659145d03b549b)
|
||||
a little more, and remove one more redundant field; this helped a little bit, too.
|
||||
|
||||
But overall, things were still looking like ULIDs had no real inherent advantage over UUIDs in the
|
||||
|
@ -262,7 +260,7 @@ about the primary key for that table. It would also eliminate an entire index (a
|
|||
automatically-generated "primary key to rowid" index), resulting in the ultimate space savings.
|
||||
|
||||
So, [that's what I
|
||||
did](https://gitlab.com/nebkor/ww/-/commit/2c7990ff09106fa2a9ec30974bbc377b44082082):
|
||||
did](https://git.kittencollective.com/nebkor/what2watch/commit/2c7990ff09106fa2a9ec30974bbc377b44082082):
|
||||
|
||||
``` sql
|
||||
-- table of what people want to watch
|
||||
|
@ -320,7 +318,7 @@ was a `created_at` column for it. Still a win, though!
|
|||
Something I realized with the "final" schema is that you could have duplicate rows, since the only
|
||||
unique field was the `rowid`. I didn't want this. So, rather than create a `unique index on
|
||||
watch_quests (user, watch)`, I [just
|
||||
added](https://gitlab.com/nebkor/ww/-/commit/c685dc1a6b08d9ff6bafc72582acb539651a350c) a `primary
|
||||
added](https://git.kittencollective.com/nebkor/what2watch/commit/c685dc1a6b08d9ff6bafc72582acb539651a350c) a `primary
|
||||
key (user, watch)`.
|
||||
|
||||
If that looks familiar, good eye! Doing this brings the disk usage back up to 17MB in the baseline
|
||||
|
|
53
content/sundries/no-corporate-friendly/index.md
Normal file
|
@ -0,0 +1,53 @@
|
|||
+++
|
||||
date = "2017-01-17"
|
||||
updated = "2025-07-21"
|
||||
slug = "say-no-to-corporate-friendly-licenses"
|
||||
title = "Say no to corporate-friendly licenses"
|
||||
[taxonomies]
|
||||
tags = [
|
||||
"lucasfilm",
|
||||
"alembic",
|
||||
"gpl",
|
||||
"philosophy",
|
||||
"anti-capitalism",
|
||||
"old blog",
|
||||
"proclamation"
|
||||
]
|
||||
+++
|
||||
|
||||
*In which I commit to anti-capitalism*
|
||||
|
||||
I want to start this with a little story about working at Industrial Light & Magic. For those not in the know, ILM is the visual effects studio that George Lucas started in order to make his indie space-adventure movie, _Star Wars_, a tale of a man who became absolutely corrupted by his power and was, thankfully, eventually destroyed. When the movies came out, they absolutely captured the imagination of nearly an entire generation, at which point, George Lucas held onto that imagination as a cultural hostage[^1] [^2], and demanded payment from anyone else who foolishly tried to play with the storyblocks Lucas had made. This is all to say that the Empire that Lucas eventually built was very, very much into copyright maximalism.
|
||||
|
||||
That was all way before my time there, of course; I was there for only a few years, from 2008
|
||||
to 2011. But while I was there, I was lucky enough to be one of the main developers of an [Open
|
||||
Source project for efficiently sharing animated geometric data in an application-agnostic way,
|
||||
called Alembic](https://www.alembic.io/). The inside story of how Alembic came to be is a fairly
|
||||
sordid one, but that's for another time; suffice to say it was an unlikely thing that we did, but we
|
||||
pushed on and wound up changing the entire industry (update from 2025: Alembic [won an
|
||||
Oscar](https://www.oscars.org/sci-tech/ceremonies/2024) in 2024).
|
||||
|
||||
Even though ILM had some [prior experience with open-sourcing some of their software](http://www.openexr.com/), we faced an uphill battle internally. Like I said, Lucas' Empire was far more comfortable in the role of cultural vampire, and their legal team had far more experience in taking rather than giving. Their head of counsel, an IP lawyer named Jennifer Siebley, was particularly inimical to the GPL, so much as to have what I would term an allergy to it. For example, we were not allowed to include the source for zlib in our project, because the zlib project bundled some third-party code that was licensed under the GPL that linked against zlib. To the ILM legal team, any code that was distributed along with GPL'd code was tainted, even though zlib itself was not GPL'd. This is not a proper interpretation of the GPL, but as my manager said at the time, "It's her job to say no."[^3]
|
||||
|
||||
*ANYWAY*, at the time, this left a fairly bad taste in my mouth regarding the GPL. It was viral! It made it impossible for businesses to adopt your code! Indeed, our BSD-licensed project crucially relied on corporate network effects; the more studios adopted Alembic-based workflows, the better we and everyone else would be, because it enabled studios to share assets in a way that was far easier than before.[^4] Had it been GPL'd, its studio adoption would have been far more anemic, and it probably would have turned into yet another geometry format that died on the vine.
|
||||
|
||||
Which brings us to the present. I've lately been thinking about all that time and this stuff, and I've come to the following conclusion: fuck all capitalist vampires. We live in a world where [a group of eight people have control of more economic means than the poorest four billion people](https://www.theguardian.com/global-development/2017/jan/16/worlds-eight-richest-people-have-same-wealth-as-poorest-50), and the power disparity between our corporate masters and regular humans is unimaginably vast. There's very little that you or I can do, but we do have one ace up our sleeves: we write software, and software increases our leverage. So don't give that leverage to the leviathans trying to commoditize you.
|
||||
|
||||
Now, I realize that we are living in a material world, and we are a material girl, so I'm not relying solely on philosophical appeals. Consider the case that you wish to profit from the fruits of your keyboard. If you release your software as GPL, you are then free at any time to release new versions that are closed-source, while all your competitors would have to grovel before you in order to procure a commercial license (or you could give your friends licenses for free; the point is, that's your call). This is a pretty compelling commercial case for the GPL, and it's what, eg, Sleepycat (BerkeleyDB) or Trolltech (Qt) did before they were bought by Oracle and Nokia, respectively.
|
||||
|
||||
For me, though, a big part of my motivation to commit to using the GPL is so that people like Jennifer Siebley will tell their corporate overlords like ILM[^5] to not use my software. I'm a little vindictive like that.
|
||||
|
||||
So here's my plea to you, fellow software developers. When you write software for free, make it awesome, and make it GPL'd. Make it so good that others must choose between using the best shit, or further filling the corporate coffers of those who wish to enslave us.
|
||||
|
||||
[^1]: This is the cultural equivalent of [giving new mothers free infant formula until they stop lactating, and then charging them for it going forward.](http://www.businessinsider.com/nestles-infant-formula-scandal-2012-6?op=1)
|
||||
|
||||
[^2]: In case anyone was confused about who had the right to control how his story was told, try to find a legal copy of the original Star Wars trilogy that's not all crapped up with Old Man Lucas smell.
|
||||
|
||||
[^3]: I pointed out that it's not, in fact, her job to say no; it's her job to use her training and intellect to correctly divine what the law and license allows, but I lost that battle, as you might imagine. It was frustrating to be in the thrall of a lawyer who didn't understand the difference between patenting and copyrighting, but there you go; that was pretty much the Lucasfilm corporate culture in a nutshell.
|
||||
|
||||
[^4]: The business case for sharing the code, of course, was to enable studios to outsource work to cheaper studios in places like Macedonia. Yay capitalism!
|
||||
|
||||
[^5]: Lucasfilm, ILM's parent company, is now owned by Disney, who also own Pixar and Marvel.
|
||||
|
||||
---
|
||||
By the way, I couldn't figure out an elegant way to fit this into the main body of this post, but I hope it's super fucking obvious that George Lucas' wealth is the Dark Side of the Force, and he is Vader. Which is double-tragic, because George purposely made Vader a tragic figure, and who wants to become a living tragedy, especially when you spelled it out so plainly yourself what it means to be evil?
|
|
@ -2,6 +2,7 @@
|
|||
title = "Presenting Julids, another fine sundry from Nebcorp Heavy Industries and Sundries"
|
||||
slug = "presenting-julids"
|
||||
date = "2023-07-31"
|
||||
updated = "2025-07-21"
|
||||
[taxonomies]
|
||||
tags = ["software", "sundry", "proclamation", "sqlite", "rust", "ulid", "julid"]
|
||||
+++
|
||||
|
@ -9,7 +10,7 @@ tags = ["software", "sundry", "proclamation", "sqlite", "rust", "ulid", "julid"]
|
|||
# Presenting Julids
|
||||
Nebcorp Heavy Industries and Sundries, long the world leader in sundries, is proud to announce the
|
||||
public launch of the official identifier type for all Nebcorp companies' assets and database
|
||||
entries, [Julids](https://gitlab.com/nebkor/julid). Julids are globally unique sortable identifiers,
|
||||
entries, [Julids](https://git.kittencollective.com/nebkor/julid-rs). Julids are globally unique sortable identifiers,
|
||||
backwards-compatible with [ULIDs](https://github.com/ulid/spec), *but better*.
|
||||
|
||||
Inside your Rust program, simply add `julid-rs`[^julid-package] to your project's `Cargo.toml` file, and use it
|
||||
|
@ -100,7 +101,7 @@ you already have one.
|
|||
|
||||
The Julid crate can be used in two different ways: as a regular Rust library, declared in your Rust
|
||||
project's `Cargo.toml` file (say, by running `cargo add julid-rs`), and used as shown above. There's
|
||||
a rudimentary [benchmark](https://gitlab.com/nebkor/julid/-/blob/main/examples/benchmark.rs) example
|
||||
a rudimentary [benchmark](https://git.kittencollective.com/nebkor/julid-rs/src/branch/main/benches/simple.rs) example
|
||||
in the repo, which I'll talk more about below. But the primary use case for me was as a loadable
|
||||
SQLite extension, as I [previously
|
||||
wrote](/rnd/one-part-serialized-mystery-part-2/#next-steps-with-ids). Both are covered in the
|
||||
|
@ -125,7 +126,7 @@ The extension, when loaded into SQLite, provides the following functions:
|
|||
|
||||
If you want to use it as a SQLite extension:
|
||||
|
||||
* clone the [repo](https://gitlab.com/nebkor/julid)
|
||||
* clone the [repo](https://git.kittencollective.com/nebkor/julid-rs)
|
||||
* build it with `cargo build --features plugin` (this builds the SQLite extension)
|
||||
* copy the resulting `libjulid.[so|dylib|whatevs]` to some place where you can...
|
||||
* load it into SQLite with `.load /path/to/libjulid` as shown at the top
|
||||
|
@ -159,7 +160,7 @@ create table if not exists watches (
|
|||
```
|
||||
|
||||
and then [some
|
||||
code](https://gitlab.com/nebkor/ww/-/blob/cc14c30fcfbd6cdaecd85d0ba629154d098b4be9/src/import_utils.rs#L92-126)
|
||||
code](https://git.kittencollective.com/nebkor/what2watch/src/commit/72ca947cf6092e7d9719e0780ab37e3f498b99b0/src/import_utils.rs#L15-L27)
|
||||
that inserted rows into that table like
|
||||
|
||||
``` sql
|
||||
|
@ -237,7 +238,8 @@ Like Marge, I just think they're neat! We're not the only ones; here are just so
|
|||
* [UUIDv7](https://www.ietf.org/archive/id/draft-peabody-dispatch-new-uuid-format-01.html#name-uuidv7-layout-and-bit-order);
|
||||
these are *very* similar to Julids; the primary difference is that the lower 62 bits are left up
|
||||
to the implementation, rather than always containing pseudorandom bits as in Julids (which use
|
||||
the lower 64 bits for that, instead of UUIDv7's 62)
|
||||
the lower 64 bits for that, instead of UUIDv7's 62) -- UPDATE! Julids are now able to
|
||||
[interconvert with UUIDv7s](https://git.kittencollective.com/nebkor/julid-rs/src/commit/e333ea52637c9fe4db60cfec3603c7d60e70ecab/src/uuid.rs).
|
||||
* [Snowflake ID](https://en.wikipedia.org/wiki/Snowflake_ID), developed by Twitter in 2010; these
|
||||
are 63-bit identifiers (so they fit in a signed 64-bit number), where the top 41 bits are a
|
||||
millisecond timestamp, the next 10 bits are a machine identifier[^twitter machine count], and the
|
||||
|
@ -246,7 +248,7 @@ Like Marge, I just think they're neat! We're not the only ones; here are just so
|
|||
|
||||
and I'm sure the list can go on.
|
||||
|
||||
I wanted to use them in my SQLite-backed [web app](https://gitlab.com/nebkor/ww), in order to fix
|
||||
I wanted to use them in my SQLite-backed [web app](https://git.kittencollective.com/nebkor/what2watch), in order to fix
|
||||
some deficiencies in ULIDs and the way I was using them, as [I said
|
||||
before](/rnd/one-part-serialized-mystery-part-2/#next-steps-with-ids):
|
||||
|
||||
|
@ -272,10 +274,9 @@ those crates! Feel free to steal code from me any time!
|
|||
----
|
||||
|
||||
[^julid-package]: The Rust crate *package's*
|
||||
[name](https://gitlab.com/nebkor/julid/-/blob/2484d5156bde82a91dcc106410ed56ee0a5c1e07/Cargo.toml#L2)
|
||||
[name](https://git.kittencollective.com/nebkor/julid-rs/src/commit/e333ea52637c9fe4db60cfec3603c7d60e70ecab/Cargo.toml#L2)
|
||||
is "julid-rs"; that's the name you add to your `Cargo.toml` file, that's how it's listed on
|
||||
[crates.io](https://crates.io/crates/julid-rs), etc. The crate's *library*
|
||||
[name](https://gitlab.com/nebkor/julid/-/blob/2484d5156bde82a91dcc106410ed56ee0a5c1e07/Cargo.toml#L24)
|
||||
[crates.io](https://crates.io/crates/julid-rs), etc. The crate's *library* [name](https://git.kittencollective.com/nebkor/julid-rs/src/commit/e333ea52637c9fe4db60cfec3603c7d60e70ecab/Cargo.toml#L32)
|
||||
is just "julid"; that's how you refer to it in a `use` statement in your Rust program.
|
||||
|
||||
[^httm]: Remember in *Hot Tub Time Machine*, where Rob Corddry's character, "Lou", decides to stay in
|
||||
|
@ -291,7 +292,7 @@ those crates! Feel free to steal code from me any time!
|
|||
|
||||
[^monotonic]: At least, they will still have a total order if they're all generated within the same
|
||||
process in the same way; the code uses a [64-bit atomic
|
||||
integer](https://gitlab.com/nebkor/julid/-/blob/2484d5156bde82a91dcc106410ed56ee0a5c1e07/src/julid.rs#L11-12)
|
||||
integer](https://git.kittencollective.com/nebkor/julid-rs/src/commit/2484d5156bde82a91dcc106410ed56ee0a5c1e07/src/julid.rs#L11-12)
|
||||
to ensure that IDs generated within the same millisecond have incremented counters, but that
|
||||
atomic counter is not global; calling `Julid::new()` in Rust and `select julid_new()` in SQLite
|
||||
would be as though they were generated on different machines. I just make sure to only generate
|
||||
|
|
|
@ -2,7 +2,7 @@
|
|||
title = "Shit-code and Other Performance Arts"
|
||||
slug = "shit-code-and-performance-art"
|
||||
date = "2023-02-08"
|
||||
updated = "2023-02-09"
|
||||
updated = "2025-07-21"
|
||||
[taxonomies]
|
||||
tags = ["software", "art", "sundry", "proclamation", "chaos"]
|
||||
[extra]
|
||||
|
@ -42,7 +42,7 @@ and using the font used by the alien in *Predator*
|
|||
![get to the choppah][katabastird_predator]
|
||||
|
||||
But by far its greatest feature is an undocumented option, `-A`, that will play an [airhorn
|
||||
salvo](https://gitlab.com/nebkor/katabastird/-/blob/4ccc2e4738df3f9d3af520e2d3875200534f4f6f/resources/airhorn_alarm.mp3)
|
||||
salvo](https://git.kittencollective.com/nebkor/katabastird/src/commit/4ccc2e4738df3f9d3af520e2d3875200534f4f6f/resources/airhorn_alarm.mp3)
|
||||
when it's done. This option is visible in the program's help text, but it's not described.
|
||||
|
||||
Truly honestly, this is not a great program. Once it's launched, it only understands two keyboard
|
||||
|
@ -66,12 +66,12 @@ what's done is done.
|
|||
## *randical, a commandline program for generating random values*
|
||||
|
||||
Some time ago, I was [trying to work out some ways to pick random points in a
|
||||
sphere](https://blog.joeardent.net/2018/07/right-and-wrong-ways-to-pick-random-points-inside-a-sphere/),
|
||||
and during that exploration, I found myself wanting to just be able to generate random values
|
||||
outside of any program in particular. So, I wrapped a primitive interface around [the random value
|
||||
generation library](https://docs.rs/rand/0.8.0/rand/index.html) I was using. I wound up using it
|
||||
selfishly and in a limited fashion for that project, but afterward, decided to expand it a bit and
|
||||
release it, as my first [real Rust crate](https://crates.io/crates/randical).
|
||||
sphere](/rnd/not-rand-but-rand-y), and during that exploration, I found myself wanting to just be
|
||||
able to generate random values outside of any program in particular. So, I wrapped a primitive
|
||||
interface around [the random value generation library](https://docs.rs/rand/0.8.0/rand/index.html) I
|
||||
was using. I wound up using it selfishly and in a limited fashion for that project, but afterward,
|
||||
decided to expand it a bit and release it, as my first [real Rust
|
||||
crate](https://crates.io/crates/randical).
|
||||
|
||||
I'll reproduce the help text here, since it's fairly comprehensive:
|
||||
|
||||
|
@ -103,7 +103,7 @@ OPTIONS:
|
|||
with millisecond precision.
|
||||
```
|
||||
|
||||
The [README](https://github.com/nebkor/randical/blob/main/README.md) contains some examples of using
|
||||
The [README](https://git.kittencollective.com/nebkor/randical#readme) contains some examples of using
|
||||
it to do various things, like simulate a fair coin toss, or an *unfair* coin toss, or "a *Sliding
|
||||
Doors*-style garden of forking paths alternate timeline for Ferris Bueller's presence or absence on
|
||||
that fateful day."
|
||||
|
@ -241,7 +241,7 @@ requirements about semver[^smegver].
|
|||
## goldver
|
||||
|
||||
When I version software for public consumption, I tend to use a scheme I call
|
||||
"[goldver](https://gitlab.com/nebkor/katabastird/-/blob/main/VERSIONING.md)", short for "Golden
|
||||
"[goldver](https://git.kittencollective.com/nebkor/katabastird/src/branch/main/VERSIONING.md)", short for "Golden
|
||||
Versioning". It works like this:
|
||||
|
||||
> When projects are versioned with goldver, the first version is "1". Note that it is not "1.0", or,
|
||||
|
@ -264,7 +264,7 @@ software. It was Windows 95 and then Windows 2000; obviously there was a lot of
|
|||
about arguing about whether or not this is a "patch release" or a "minor release" or a "major
|
||||
change". There are no downstream dependents who need to make sure they don't accidentally upgrade to
|
||||
the latest release. If someone wants to update it, they know what they're getting into, and they do
|
||||
it in an inherently manual way.
|
||||
it in an inherently manual way.
|
||||
|
||||
## chaos license
|
||||
|
||||
|
@ -275,7 +275,7 @@ license are unclear, refer to the [Fuck Around and Find Out
|
|||
License](https://git.sr.ht/~boringcactus/fafol/tree/master/LICENSE-v0.2.md).
|
||||
|
||||
This is about as
|
||||
[business-hostile](https://blog.joeardent.net/2017/01/say-no-to-corporate-friendly-licenses/) as I
|
||||
[business-hostile](/sundries/say-no-to-corporate-friendly-licenses) as I
|
||||
can imagine, far worse even than the strong copyleft licenses that terrified the lawyers at ILM when
|
||||
I was there. It oozes uncertainty and risk; you'd have to be deranged to seriously engage with
|
||||
it. But if you're just a person? Dive right in, it doesn't really matter!
|
||||
|
|