64bit File node IDs

2020-10-19

This is a very technical update on the Boomla Platform, aimed at curious developers.

EDIT: originally, I went overboard and rolled out 128bit File node IDs. As it may have other performance implications and may in fact not be required, I decided to do a slow ramp-up instead and only use 64bit IDs for now. If that turns out to be not enough, we can still increase it later.

Old File node IDs

Until today, Boomla has been using 32bit incremental File node IDs, for example i102FD. This is just a special formatting of the number 66301. It starts with an i to avoid ambiguity, followed by the number expressed in hexadecimal form because it is somewhat shorter.
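
The formatting described above can be sketched in a few lines of Python. This is only an illustration of the i-plus-hexadecimal convention; the function names format_fnode_id and parse_fnode_id are made up for this example and are not Boomla APIs.

```python
def format_fnode_id(n: int) -> str:
    """Format a file node number as 'i' followed by uppercase hex digits."""
    if n < 0:
        raise ValueError("file node numbers are non-negative")
    return "i" + format(n, "X")


def parse_fnode_id(s: str) -> int:
    """Parse an 'i'-prefixed hex string back into the number it encodes."""
    if len(s) < 2 or not s.startswith("i"):
        raise ValueError(f"not a file node ID: {s!r}")
    return int(s[1:], 16)


print(format_fnode_id(66301))    # i102FD
print(parse_fnode_id("i102FD"))  # 66301
```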

I intentionally said File node ID not File ID, because the two are different. File node IDs identify file nodes within volumes while File IDs identify files in filesystems. A filesystem may contain nested volumes so a File ID may contain one or more nested File node IDs.

An example (old) file ID could be f102FD.158A9. It starts with an f, followed by file node IDs in hex format, separated by dots.
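
As a rough sketch, a File ID can be thought of as a path of file node numbers, one per nested volume. The helpers below (format_file_id and parse_file_id, both hypothetical names) only mirror the textual format shown above; how Boomla represents File IDs internally is not covered here.

```python
def format_file_id(fnode_numbers: list[int]) -> str:
    """Join file node numbers as 'f' + hex parts separated by dots."""
    if not fnode_numbers:
        raise ValueError("a File ID needs at least one file node number")
    return "f" + ".".join(format(n, "X") for n in fnode_numbers)


def parse_file_id(s: str) -> list[int]:
    """Split an 'f'-prefixed File ID into its file node numbers."""
    if len(s) < 2 or not s.startswith("f"):
        raise ValueError(f"not a File ID: {s!r}")
    return [int(part, 16) for part in s[1:].split(".")]


print(format_file_id([0x102FD, 0x158A9]))  # f102FD.158A9
print(parse_file_id("f102FD.158A9") == [0x102FD, 0x158A9])  # True
```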

Merging branches

This worked super well but Boomla has evolved to a point where it became necessary to merge branches. The problem is, with incremental file node IDs, merging is almost guaranteed to fail.

The reason is that File IDs are super important in Boomla, so they must be tracked throughout the version history. (For example, they are used for automatically redirecting visitors after a page was renamed.) But with incremental file node IDs, one will use the same IDs on every branch.

Let's look at an example. Say you have the website example.com with the biggest file node ID of i20000. You want to work on a new feature so you create the branch beta.example.com. You keep working on both branches and create new files on each. The problem is, the first new file will get the ID i20001 on both branches! Because of this, merging will fail.
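
The scenario above can be reproduced with a toy model. The Branch class below is purely hypothetical and assumes the simplest possible allocator (biggest existing ID plus one); it is not how Boomla is implemented, it only shows why two branches end up handing out the same ID.

```python
class Branch:
    """Toy model of a branch that allocates incremental file node IDs."""

    def __init__(self, ids):
        self.ids = set(ids)

    def fork(self):
        # A new branch starts with a copy of the existing IDs.
        return Branch(self.ids)

    def create_file(self):
        # Incremental allocation: the next ID is the biggest one plus one.
        new_id = max(self.ids) + 1
        self.ids.add(new_id)
        return new_id


main = Branch({0x20000})  # example.com, biggest ID is i20000
beta = main.fork()        # beta.example.com

id_on_main = main.create_file()
id_on_beta = beta.create_file()

# Two different files, created independently, share one ID.
print(f"i{id_on_main:X}", f"i{id_on_beta:X}")  # i20001 i20001
```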

We need to guarantee that file nodes will get unique IDs on each branch, thereby avoiding merge conflicts.

Random File node IDs

To avoid this, we are going to use random file node IDs. Unfortunately, 32bit file node IDs become too tight at this point. A 32bit ID allows ~4 billion distinct values. However, by the birthday paradox, conflicts between random IDs become likely once their count approaches the square root of that, which is only 65536 files. That's clearly too tight, even considering that this applies to a single volume only, so an entire website with a tree of volumes could hold far more than that.

Because of this, the size of File node IDs has been expanded to 64 bits. That gives an insanely large number of possible IDs: 18.446.744.073.709.551.616. Even its square root is 4.294.967.296. This means conflicts only become likely once the number of new files on separate branches reaches into the billions. (Note: that's not a limit on the number of files but the number of new files on feature branches!) Thereby we have practically eliminated the chance of any file node ID conflicts.
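
To put rough numbers on the square root rule of thumb, here is a small sketch using the standard birthday-bound approximation p = 1 - exp(-n(n-1) / (2N)), where n is the number of random IDs and N the size of the ID space. The function name collision_probability is made up for this example, and the model assumes IDs are drawn uniformly at random.

```python
import math


def collision_probability(n_ids: int, bits: int) -> float:
    """Approximate chance that n_ids uniformly random IDs contain a duplicate."""
    space = 2 ** bits
    exponent = n_ids * (n_ids - 1) / (2 * space)
    # -expm1(-x) equals 1 - exp(-x) but stays accurate for very small x.
    return -math.expm1(-exponent)


cases = [
    (32, 65_536),   # 32bit IDs at the square root of the ID space
    (64, 65_536),   # the same number of files with 64bit IDs
    (64, 2 ** 32),  # 64bit IDs at the square root of the ID space
]
for bits, n in cases:
    print(f"{bits}bit, {n} IDs: {collision_probability(n, bits):.3g}")
```

At the square root of the ID space the approximation gives a conflict chance of a bit under 40%, while the same 65536 files with 64bit IDs land far below one in a billion.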

An example File node ID (FnodeId) now looks like this:

iF78B0BDEC6305840

The equivalent File ID looks like this (note the f at the start):

fF78B0BDEC6305840

And an example nested File ID now looks like this:

fF78B0BDEC6305840.EC64D53884F1744D
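
For the curious, generating IDs of this shape could look roughly like the sketch below. Everything in it is an assumption made for illustration: the function names are invented, Python's secrets module stands in for whatever random source Boomla actually uses, and the hex digits are left unpadded like the old format (whether Boomla pads short values is not specified here).

```python
import secrets


def new_fnode_number() -> int:
    """Draw a random 64bit number (assumed random source: secrets)."""
    return secrets.randbits(64)


def format_fnode_id(n: int) -> str:
    """File node ID: 'i' followed by uppercase hex digits."""
    return "i" + format(n, "X")


def format_file_id(fnode_numbers: list[int]) -> str:
    """File ID: 'f' followed by dot-separated hex parts, one per volume level."""
    return "f" + ".".join(format(n, "X") for n in fnode_numbers)


outer = new_fnode_number()
inner = new_fnode_number()

print(format_fnode_id(outer))         # random, shaped like the examples above
print(format_file_id([outer]))        # same digits, but starting with f
print(format_file_id([outer, inner])) # nested File ID
```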

Rolling out

We wanted to keep all existing file node IDs, so the roll-out did not require any migration. New file nodes get these large, random IDs while existing file nodes keep their old ones.

Merging is not publicly available yet but will be some time in the future.


Cheers,

you can follow me on Twitter