Mounting Encrypted ZFS Datasets at Boot

When ZFS encryption was released in Solaris 11 Express, I went out and bought four 2 TB drives and moved all of my data to a fresh, fully-encrypted zpool. I don’t keep a lot of sensitive data, but it brings me peace of mind to know that, in the event of theft or worse, my data is secure.

I chose to protect the data keys using a passphrase as opposed to using a raw key on disk. In my opinion, the only safe key is one that’s inside your head (though the US v. Fricosu case has me reevaluating that). The downside is that Solaris will ignore passphrase-encrypted datasets at boot.

The thing is, I run several services that depend on the data stored in my encrypted ZFS datasets. When Solaris doesn’t mount those filesystems at boot, those services fail to start or come up in very weird states that I must recover from manually. I would rather pause the boot process to wait for me to supply the passphrase so those services come up properly. Fortunately this is possible with SMF!

All of the services I am concerned about depend on, in one way or another, the svc:/system/filesystem/local:default service, which is responsible for mounting all of the filesystems. That service, in turn, depends on the single-user milestone. So I just need to inject my own service between the single-user milestone and the system/filesystem/local service that fails when it doesn’t have the keys. That failure will pause the boot process until it is cleared.

I wrote a simple manifest that expresses the dependencies between single-user and system/filesystem/local:

<?xml version="1.0"?>
<!DOCTYPE service_bundle SYSTEM "/usr/share/lib/xml/dtd/service_bundle.dtd.1">

<service_bundle type='manifest' name='nest'>

<service
   name='system/filesystem/nest'
   type='service'
   version='1'>

    <create_default_instance enabled='true' />
    <single_instance />

    <dependency
       name='single-user'
       grouping='require_all'
       restart_on='none'
       type='service'>
        <service_fmri value='svc:/milestone/single-user' />
    </dependency>

    <dependent
       name='nest-local'
       grouping='require_all'
       restart_on='none'>
        <service_fmri value='svc:/system/filesystem/local' />
    </dependent>

    <exec_method
       type='method'
       name='start'
       exec='/lib/svc/method/nest start'
       timeout_seconds='60' />

    <exec_method
       type='method'
       name='stop'
       exec=':true'
       timeout_seconds='60' />

    <property_group name='startd' type='framework'>
        <propval name='duration' type='astring' value='transient' />
    </property_group>

    <stability value='Unstable' />

    <template>
        <common_name>
            <loctext xml:lang='C'>Load key for 'nest' zpool</loctext>
        </common_name>
    </template>
</service>

</service_bundle>

and a script at /lib/svc/method/nest that gets called by SMF:

#!/sbin/sh

. /lib/svc/share/smf_include.sh

case "$1" in
    'start')
        if [ $(zfs get -H -o value keystatus nest) != "available" ]; then
            echo "Run '/usr/sbin/zfs key -lr nest && /usr/sbin/svcadm clear $SMF_FMRI'" | smf_console
            exit $SMF_EXIT_ERR_FATAL
        fi
        ;;

    *)
        echo "Usage: $0 start"
        exit $SMF_EXIT_ERR_CONFIG
        ;;
esac

exit $SMF_EXIT_OK

The script checks whether the keys are available, and if not, prints a helpful hint to the console. The whole thing looks something like this at boot:

SunOS Release 5.11 Version 11.0 64-bit
Copyright (c) 1983, 2011, Oracle and/or its affiliates. All rights reserved.
Hostname: falcon

Run '/usr/sbin/zfs key -lr nest && /usr/sbin/svcadm clear svc:/system/filesystem/nest:default'
May 30 14:31:06 svc.startd[11]: svc:/system/filesystem/nest:default: Method "/lib/svc/method/nest start" failed with exit status 95.
May 30 14:31:06 svc.startd[11]: system/filesystem/nest:default failed fatally: transitioned to maintenance (see 'svcs -xv' for details)

falcon console login: jlee
Password: 
falcon% sudo -s
falcon# /usr/sbin/zfs key -lr nest && /usr/sbin/svcadm clear svc:/system/filesystem/nest:default
Enter passphrase for 'nest': 
falcon#

When I get to the console shell, I can just copy and paste the command printed by the script. Once the service failure is cleared, SMF continues the boot process normally and all of my other services come up exactly as I’d expect.

No, it’s not very pretty, but I’d rather have a little bit of manual intervention during the boot process for as infrequently as I do it, than to have to clean up after services that come up without the correct dependencies. And with my new homemade LOM, it’s not too much trouble to run commands at the console, even remotely.

My Homemade LOM

In one of the final classes of my CS master’s program, Embedded Computing, we were required to complete a semester project of our choosing involving embedded systems. Like in previous semester projects, I wanted to do something that I would actually be able to use after the class ended.

This time around I chose to build a lights-out manager for my Sun Ultra 24 server (which this blog is hosted on). With a LOM I can control the system’s power and access its serial console so I will be able to perform OS updates remotely, among other things.

Since I already owned one and I didn’t want to spend a lot of money, I chose to develop the project on top of the Arduino platform. I like the size of the Arduino, and the availability of different shields to minimize soldering. It’s also able to be powered by USB, which is perfect because the Ultra 24 has an internal USB port that always supplies power.

To make things a little more challenging for myself (because the Arduino is pretty easy on its own), I chose to implement a hardware UART to communicate with the Ultra 24’s serial port. Specifically, I chose to use the Maxim MAX3110E SPI UART and RS-232 transceiver. Great little chip.

For communication with the outside world, I bought an Ethernet shield from Freetronics. It’s compatible with the official Arduino Ethernet shield, but includes a fix to allow the network module to work with other SPI devices (such as my UART) on the same bus. I started to implement the network UI using Telnet, but after realizing I would have to translate the serial console data from VT100 to NVT, I switched to Rlogin, which is like Telnet, but assumes like-to-like terminal types.

Lastly, for controling the system’s power, I figured out how to tap into the Ultra 24’s power LED and switch. Using the LED, I can check whether the system is on or off, and using the switch circuit and a transistor, I can power the system on and off. I managed to do this without affecting the operation of the front panel buttons/LEDs.

I’ll spare you all of the implementation details (if you’re interested, you can read my report). Suffice it to say, the thing works as well as I could have imagined. Here is a screenshot of me using the serial console on my workstation:

From my research, the serial and power motherboard headers are the same on most modern Intel systems, so this LOM should work on more than just an Ultra 24. If you want to build one of your own, my code is available on GitHub and the hardware schematic is in the report I linked to above.