How I Do Encrypted, Mirrored ZFS Root on Linux

Update: No more keyfiles!

I am done with Solaris. A quick look through this blog should be enough to see how much I like Solaris. So when I say “I’m done,” I want to be perfectly clear as to why: Oracle. As long as Oracle continues to keep Solaris’ development and code under wraps, I cannot feel comfortable using it or advocating for it, and that includes at work, where we are paying customers. I stuck with it up until now, waiting for a better alternative to come about, and now that ZFS is stable on Linux, I’m out.

I’ve returned to my first love, Gentoo. Well, more specifically, I’ve landed in Funtoo, a variant of Gentoo. I learned almost everything I know about Linux on Gentoo, and being back in its ecosystem feels like coming back home. Funtoo, in particular, addresses a lot of the annoyances that made me leave Gentoo in the first place by offering more stable packages of core software like the kernel and GCC. Its Git-based Portage tree is also a very nice addition. But it was Funtoo’s ZFS documentation and community that really got my attention.

The Funtoo ZFS install guide is a nice starting point, but my requirements were a bit beyond the scope of the document. I wanted:

  • redundancy handled by ZFS (that is, not by another layer like md),
  • encryption using a passphrase, not a keyfile,
  • and to be prompted for the passphrase once, not for each encrypted device.

My solution is depicted below:

[Diagram: ZFS Root Layout]

A small block device is encrypted using a passphrase. The randomly initialized contents of that device are in turn used as a keyfile for unlocking the devices that make up the mirrored ZFS rpool. Not pictured is Dracut, which builds the initramfs that assembles the md RAID devices, unlocks the encrypted devices, and mounts the ZFS root at boot time.
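
In text form, the layering works out roughly like this (device names match the partitioning in the steps below):

/dev/sda1 ─┐
           ├─ /dev/md0 ── ext2, mounted at /boot
/dev/sdb1 ─┘

/dev/sda3 ─┐
           ├─ /dev/md1 ── LUKS (passphrase) ── /dev/mapper/keyfile (random data)
/dev/sdb3 ─┘

/dev/sda4 ── LUKS (keyed by /dev/mapper/keyfile) ── rpool-crypt0 ─┐
                                                                  ├─ rpool (ZFS mirror)
/dev/sdb4 ── LUKS (keyed by /dev/mapper/keyfile) ── rpool-crypt1 ─┘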

Here is a rough guide for doing it yourself:

  1. Partition the disks.
    Without going into all the commands, use gdisk to make your first disk look something like this:

    # gdisk -l /dev/sda
    ...
    Number  Start (sector)    End (sector)  Size       Code  Name
       1            2048         1026047   500.0 MiB   FD00  Linux RAID
       2         1026048         1091583   32.0 MiB    EF02  BIOS boot partition
       3         1091584         1099775   4.0 MiB     FD00  Linux RAID
       4         1099776       781422734   372.1 GiB   8300  Linux filesystem
    

    Then copy the partition table to the second disk:

    # sgdisk --backup=/tmp/table /dev/sda
    # sgdisk --load-backup=/tmp/table /dev/sdb
    # sgdisk --randomize-guids /dev/sdb
    

    If your system uses EFI rather than BIOS, you won’t need a BIOS boot partition, so adjust your partition numbers accordingly.

  2. Create the md RAID devices for /boot and the keyfile.
    # mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda1 /dev/sdb1
    # mdadm --create /dev/md1 --level=1 --raid-devices=2 /dev/sda3 /dev/sdb3
    
  3. Set up the crypt devices.
    First, the keyfile:

    # cryptsetup -c aes-xts-plain64 luksFormat /dev/md1
    # cryptsetup luksOpen /dev/md1 keyfile
    # dd if=/dev/urandom of=/dev/mapper/keyfile
    

    Then, the ZFS vdevs:

    # cryptsetup -c aes-xts-plain64 luksFormat /dev/sda4 /dev/mapper/keyfile
    # cryptsetup -c aes-xts-plain64 luksFormat /dev/sdb4 /dev/mapper/keyfile
    # cryptsetup -d /dev/mapper/keyfile luksOpen /dev/sda4 rpool-crypt0
    # cryptsetup -d /dev/mapper/keyfile luksOpen /dev/sdb4 rpool-crypt1
    
  4. Format and mount everything up.
    # zpool create -O compression=on -m none -R /mnt/funtoo rpool mirror rpool-crypt0 rpool-crypt1
    # zfs create rpool/ROOT
    # zfs create -o mountpoint=/ rpool/ROOT/funtoo
    # zpool set bootfs=rpool/ROOT/funtoo rpool
    # zfs create -o mountpoint=/home rpool/home
    # zfs create -o volblocksize=4K -V 2G rpool/swap
    # mkswap -f /dev/zvol/rpool/swap
    # mkfs.ext2 /dev/md0
    # mkdir /mnt/funtoo/boot && mount /dev/md0 /mnt/funtoo/boot
    

    Now you can chroot and install Funtoo as you normally would.
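
    The exact commands depend on your install media, but after unpacking a stage3 tarball into /mnt/funtoo, the chroot prep is the usual routine, something like this (a sketch; adjust paths and the tarball name to suit):

    # cd /mnt/funtoo
    # tar xpf /path/to/stage3-latest.tar.xz
    # mount -t proc none proc
    # mount --rbind /dev dev
    # cp /etc/resolv.conf etc/
    # chroot . /bin/bash
    # env-update && source /etc/profile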

When it comes time to finish the installation and set up the Dracut initramfs, there are a number of things that need to be in place. First, the ZFS package must be installed with its Dracut module. The current ebuild strips it out for some reason; I have a bug report open to fix that.

Second, /etc/mdadm.conf must be populated so that Dracut knows how to reassemble the md RAID devices. That can be done with the command mdadm --detail --scan > /etc/mdadm.conf.
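
With the layout above, the resulting file is just a couple of ARRAY lines, something like this (the UUIDs and host name are placeholders, and the exact fields depend on your mdadm version and metadata format):

ARRAY /dev/md0 metadata=1.2 name=funtoo:0 UUID=aaaaaaaa:bbbbbbbb:cccccccc:dddddddd
ARRAY /dev/md1 metadata=1.2 name=funtoo:1 UUID=eeeeeeee:ffffffff:00000000:11111111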

Third, /etc/crypttab must be created so that Dracut knows how to unlock the encrypted devices:

keyfile /dev/md1 none luks
rpool-crypt0 /dev/sda4 /dev/mapper/keyfile luks
rpool-crypt1 /dev/sdb4 /dev/mapper/keyfile luks

Finally, you must tell Dracut about the encrypted devices required for boot. Create a file, /etc/dracut.conf.d/devices.conf, containing:

add_device="/dev/md1 /dev/sda4 /dev/sdb4"

Once all that is done, you can build the initramfs using the command dracut --hostonly. To tell Dracut to use the ZFS root, add the kernel boot parameter root=zfs. The actual filesystem it chooses to mount is determined from the zpool’s bootfs property, which was set above.
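
Concretely, building the image looks something like this (the output file name is just a convention):

# dracut --hostonly /boot/initramfs-$(uname -r).img $(uname -r)

and the boot loader entry, however you manage it, ends up with kernel lines along the lines of:

linux /vmlinuz-<version> root=zfs
initrd /initramfs-<version>.img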

And that’s it!

I actually go a little further and have a set of Puppet modules that do the whole thing for me; thanks to Puppet, I can practically install a whole system from scratch with one command.

I also have a script that runs after boot to close the keyfile device. You’ve got to protect that thing.

# cat /etc/local.d/keyfile.start
#!/bin/sh
/sbin/cryptsetup luksClose keyfile

I think one criticism that could be leveled against this setup is that all the data on the ZFS pool gets encrypted and decrypted twice, because the redundancy sits at a layer above the crypt layer. A way around that would be to set it up the other way around: encrypt an md RAID device and build the ZFS pool on top of that. Unfortunately, that comes at the cost of ZFS's self-healing capabilities. Until encryption support comes to ZFS directly, that's the trade-off we have to make. In practice, though, the double encryption in this setup doesn't have a noticeable performance impact.
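
For comparison, the other layering would look something like this (just a sketch, not what I run; /dev/md2 here is a hypothetical third RAID device spanning sda4 and sdb4):

# mdadm --create /dev/md2 --level=1 --raid-devices=2 /dev/sda4 /dev/sdb4
# cryptsetup -c aes-xts-plain64 luksFormat /dev/md2
# cryptsetup luksOpen /dev/md2 rpool-crypt
# zpool create -O compression=on -m none rpool rpool-crypt

Each block is only encrypted once, but ZFS now sees a single device, so a failed checksum can't be repaired from the other half of the mirror.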

UPDATE

I should mention that I've learned Dracut is much smarter than I would have guessed: it lets you enter a passphrase once and tries it against all of the encrypted devices. This eliminates the need for the keyfile in my case, so I've updated all of my systems to simply use the same passphrase on every encrypted device. I have found it to be a simpler and more reliable setup.
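
With everything keyed by the same passphrase, the keyfile device and md1 disappear entirely, and /etc/crypttab shrinks to something like:

rpool-crypt0 /dev/sda4 none luks
rpool-crypt1 /dev/sdb4 none luks

with add_device="/dev/sda4 /dev/sdb4" in the Dracut devices.conf to match.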

Mounting Encrypted ZFS Datasets at Boot

When ZFS encryption was released in Solaris 11 Express, I went out and bought four 2 TB drives and moved all of my data to a fresh, fully-encrypted zpool. I don’t keep a lot of sensitive data, but it brings me peace of mind to know that, in the event of theft or worse, my data is secure.

I chose to protect the data keys using a passphrase as opposed to using a raw key on disk. In my opinion, the only safe key is one that’s inside your head (though the US v. Fricosu case has me reevaluating that). The downside is that Solaris will ignore passphrase-encrypted datasets at boot.
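
For reference, encryption has to be enabled when the pool (or dataset) is created; a passphrase-protected pool is created with something like this (disk names are placeholders):

# zpool create -O encryption=on -O keysource=passphrase,prompt nest mirror c0t0d0 c0t1d0 mirror c0t2d0 c0t3d0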

The thing is, I run several services that depend on the data stored in my encrypted ZFS datasets. When Solaris doesn’t mount those filesystems at boot, those services fail to start or come up in very weird states that I must recover from manually. I would rather pause the boot process to wait for me to supply the passphrase so those services come up properly. Fortunately this is possible with SMF!

All of the services I am concerned about depend, in one way or another, on the svc:/system/filesystem/local:default service, which is responsible for mounting all of the filesystems. That service, in turn, depends on the single-user milestone. So I just need to inject, between the single-user milestone and system/filesystem/local, a service of my own that fails when the keys aren't loaded. That failure pauses the boot process until it is cleared.

I wrote a simple manifest that expresses the dependencies between single-user and system/filesystem/local:

<?xml version="1.0"?>
<!DOCTYPE service_bundle SYSTEM "/usr/share/lib/xml/dtd/service_bundle.dtd.1">

<service_bundle type='manifest' name='nest'>

<service
   name='system/filesystem/nest'
   type='service'
   version='1'>

    <create_default_instance enabled='true' />
    <single_instance />

    <dependency
       name='single-user'
       grouping='require_all'
       restart_on='none'
       type='service'>
        <service_fmri value='svc:/milestone/single-user' />
    </dependency>

    <dependent
       name='nest-local'
       grouping='require_all'
       restart_on='none'>
        <service_fmri value='svc:/system/filesystem/local' />
    </dependent>

    <exec_method
       type='method'
       name='start'
       exec='/lib/svc/method/nest start'
       timeout_seconds='60' />

    <exec_method
       type='method'
       name='stop'
       exec=':true'
       timeout_seconds='60' />

    <property_group name='startd' type='framework'>
        <propval name='duration' type='astring' value='transient' />
    </property_group>

    <stability value='Unstable' />

    <template>
        <common_name>
            <loctext xml:lang='C'>Load key for 'nest' zpool</loctext>
        </common_name>
    </template>
</service>

</service_bundle>

and a script at /lib/svc/method/nest that gets called by SMF:

#!/sbin/sh

. /lib/svc/share/smf_include.sh

case "$1" in
    'start')
        if [ "$(zfs get -H -o value keystatus nest)" != "available" ]; then
            echo "Run '/usr/sbin/zfs key -lr nest && /usr/sbin/svcadm clear $SMF_FMRI'" | smf_console
            exit $SMF_EXIT_ERR_FATAL
        fi
        ;;

    *)
        echo "Usage: $0 start"
        exit $SMF_EXIT_ERR_CONFIG
        ;;
esac

exit $SMF_EXIT_OK

The script checks whether the keys are available, and if not, prints a helpful hint to the console. The whole thing looks something like this at boot:

SunOS Release 5.11 Version 11.0 64-bit
Copyright (c) 1983, 2011, Oracle and/or its affiliates. All rights reserved.
Hostname: falcon

Run '/usr/sbin/zfs key -lr nest && /usr/sbin/svcadm clear svc:/system/filesystem/nest:default'
May 30 14:31:06 svc.startd[11]: svc:/system/filesystem/nest:default: Method "/lib/svc/method/nest start" failed with exit status 95.
May 30 14:31:06 svc.startd[11]: system/filesystem/nest:default failed fatally: transitioned to maintenance (see 'svcs -xv' for details)

falcon console login: jlee
Password: 
falcon% sudo -s
falcon# /usr/sbin/zfs key -lr nest && /usr/sbin/svcadm clear svc:/system/filesystem/nest:default
Enter passphrase for 'nest': 
falcon#

When I get to the console shell, I can just copy and paste the command printed by the script. Once the service failure is cleared, SMF continues the boot process normally and all of my other services come up exactly as I’d expect.

No, it’s not very pretty, but given how infrequently I reboot, I’d rather put up with a little manual intervention during the boot process than have to clean up after services that come up without the correct dependencies. And with my new homemade LOM, it’s not too much trouble to run commands at the console, even remotely.