When I moved away from Synology as a NAS solution, I migrated to a self-built NixOS setup, mostly because of the extra flexibility I required. I switched to ZFS as my primary storage filesystem, but due to potential issues with encryption I opted out of native ZFS encryption and proceeded with ZFS-on-LUKS. There are downsides to this, especially around remote replication, but I value the data on my NAS and I don't want to end up in a spot where I have to do a full restore (especially if my primary backup relies on ZFS itself).

With NixOS, the LUKS setup is rather straightforward: you just add all the devices to boot.initrd.luks.devices. While I could have used Secure Boot and a TPM to unlock the drives, or unlocked them over SSH, having a keyfile on a USB drive seemed more straightforward. The important part is to remember to add boot.initrd.kernelModules = [ "usb_storage" ] so that the USB drive is present early on.
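
For reference, a minimal sketch of that part of the configuration; the device paths here are placeholders, and fallbackToPassword is just a safety net for when the USB stick isn't plugged in:

boot.initrd.kernelModules = [ "usb_storage" ];
boot.initrd.luks.devices."luks-root" = {
  device = "/dev/disk/by-id/ata-root-part2";  # placeholder rootfs partition
  keyFile = "/dev/disk/by-id/usb-123-0:0";    # raw USB device read as the keyfile
  keyFileSize = 4096;                         # only the first 4096 bytes are used
  fallbackToPassword = true;                  # prompt for a passphrase if the key is absent
};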

Due to unforeseen hardware issues (a faulty SATA cable), I ended up in a spot where one of the HDDs would be missing at boot. That's not a problem for ZFS itself, of course: it can bring up a degraded pool and reintroduce the drive into it later. With this LUKS setup, however, the system won't actually proceed to boot if a drive is missing. A suggestion I found online was to use crypttabExtraOpts = [ "nofail" ] for my storage drives, so that failing to unlock them wouldn't block the stage 1 boot. This, however, introduced a peculiar race condition: with nofail, systemd wouldn't even bother waiting for the drives that weren't used by the rootfs pool, so chances were that only some of them would be unlocked, or even none at all. This, in turn, would leave the ZFS pool missing in stage 2.
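
For completeness, this is roughly what that suggestion looks like per device (the names match the crypttab further down; if I recall correctly, crypttabExtraOpts only applies with the systemd-based stage 1):

boot.initrd.luks.devices."luks-nas-1" = {
  device = "/dev/disk/by-id/ata-aaa-part1";
  keyFile = "/dev/disk/by-id/usb-123-0:0";
  keyFileSize = 4096;
  crypttabExtraOpts = [ "nofail" ];  # don't block stage 1 if this drive is absent
};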

While the rest of the system was resilient to this kind of failure thanks to the mount options x-systemd.before=k3s.service and x-systemd.required-by=k3s.service, it still meant I'd have to reboot a few times to get past the pesky race condition. Moreover, the boots were annoyingly slow, because ZFS tries to bring up the pool for a while before giving up.
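
Those options live on the filesystem entries; a sketch of one such mount, with a hypothetical pool and dataset name:

fileSystems."/var/lib/rancher/k3s" = {
  device = "tank/k3s";  # hypothetical ZFS dataset
  fsType = "zfs";
  options = [
    "x-systemd.before=k3s.service"       # mount before k3s starts
    "x-systemd.required-by=k3s.service"  # if the mount fails, k3s won't start
  ];
};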

I studied my options, and, apparently, the best way to handle this is via the good old /etc/crypttab, which, oddly, doesn't have a dedicated NixOS option. Not a problem: you can always craft one yourself:

environment.etc.crypttab = {
  enable = true;
  text = ''
    # <target name>   <source device>               <key file>                  <options>
    luks-nas-1        /dev/disk/by-id/ata-aaa-part1 /dev/disk/by-id/usb-123-0:0 keyfile-size=4096
    luks-nas-2        /dev/disk/by-id/ata-bbb-part1 /dev/disk/by-id/usb-123-0:0 keyfile-size=4096
    luks-nas-3        /dev/disk/by-id/ata-ccc-part1 /dev/disk/by-id/usb-123-0:0 keyfile-size=4096
    luks-nas-4        /dev/disk/by-id/ata-ddd-part1 /dev/disk/by-id/usb-123-0:0 keyfile-size=4096
  '';
};

With the crypttab present, boot.initrd.luks.devices now has a single entry: the rootfs disk. Everything else is handled by systemd during stage 2, which means the service dependencies are propagated fully: daemons depend on mounts, mounts depend on ZFS pools, and ZFS pools depend on LUKS devices. Any failure in this chain now results in the least severe degradation possible; at worst, a dependent daemon doesn't start.