Mounting Encrypted ZFS Datasets at Boot

When ZFS encryption was released in Solaris 11 Express, I went out and bought four 2 TB drives and moved all of my data to a fresh, fully-encrypted zpool. I don’t keep a lot of sensitive data, but it brings me peace of mind to know that, in the event of theft or worse, my data is secure.

I chose to protect the data keys using a passphrase as opposed to using a raw key on disk. In my opinion, the only safe key is one that’s inside your head (though the US v. Fricosu case has me reevaluating that). The downside is that Solaris will ignore passphrase-encrypted datasets at boot.

The thing is, I run several services that depend on the data stored in my encrypted ZFS datasets. When Solaris doesn’t mount those filesystems at boot, those services fail to start or come up in very weird states that I must recover from manually. I would rather pause the boot process to wait for me to supply the passphrase so those services come up properly. Fortunately this is possible with SMF!

All of the services I am concerned about depend, in one way or another, on the svc:/system/filesystem/local:default service, which is responsible for mounting all of the filesystems. That service, in turn, depends on the single-user milestone. So I just need to inject my own service, one that fails when the keys are not loaded, between the single-user milestone and system/filesystem/local. That failure pauses the boot process until it is cleared.
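
Before adding anything, it's easy to confirm that chain with svcs itself; -d lists what a service depends on and -D lists what depends on it (standard SMF commands, shown here just for illustration):

# svcs -d svc:/system/filesystem/local:default
# svcs -D svc:/milestone/single-user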

I wrote a simple manifest that expresses the dependencies between single-user and system/filesystem/local:

<?xml version="1.0"?>
<!DOCTYPE service_bundle SYSTEM "/usr/share/lib/xml/dtd/service_bundle.dtd.1">

<service_bundle type='manifest' name='nest'>

<service
   name='system/filesystem/nest'
   type='service'
   version='1'>

    <create_default_instance enabled='true' />
    <single_instance />

    <dependency
       name='single-user'
       grouping='require_all'
       restart_on='none'
       type='service'>
        <service_fmri value='svc:/milestone/single-user' />
    </dependency>

    <dependent
       name='nest-local'
       grouping='require_all'
       restart_on='none'>
        <service_fmri value='svc:/system/filesystem/local' />
    </dependent>

    <exec_method
       type='method'
       name='start'
       exec='/lib/svc/method/nest start'
       timeout_seconds='60' />

    <exec_method
       type='method'
       name='stop'
       exec=':true'
       timeout_seconds='60' />

    <property_group name='startd' type='framework'>
        <propval name='duration' type='astring' value='transient' />
    </property_group>

    <stability value='Unstable' />

    <template>
        <common_name>
            <loctext xml:lang='C'>Load key for 'nest' zpool</loctext>
        </common_name>
    </template>
</service>

</service_bundle>

and a script at /lib/svc/method/nest that gets called by SMF:

#!/sbin/sh

. /lib/svc/share/smf_include.sh

case "$1" in
    'start')
        # Fail (and thereby pause boot) if the wrapping key for 'nest' is not loaded.
        if [ "$(zfs get -H -o value keystatus nest)" != "available" ]; then
            echo "Run '/usr/sbin/zfs key -lr nest && /usr/sbin/svcadm clear $SMF_FMRI'" | smf_console
            exit $SMF_EXIT_ERR_FATAL
        fi
        ;;

    *)
        echo "Usage: $0 start"
        exit $SMF_EXIT_ERR_CONFIG
        ;;
esac

exit $SMF_EXIT_OK

The script checks whether the keys are available, and if not, prints a helpful hint to the console. The whole thing looks something like this at boot:

SunOS Release 5.11 Version 11.0 64-bit
Copyright (c) 1983, 2011, Oracle and/or its affiliates. All rights reserved.
Hostname: falcon

Run '/usr/sbin/zfs key -lr nest && /usr/sbin/svcadm clear svc:/system/filesystem/nest:default'
May 30 14:31:06 svc.startd[11]: svc:/system/filesystem/nest:default: Method "/lib/svc/method/nest start" failed with exit status 95.
May 30 14:31:06 svc.startd[11]: system/filesystem/nest:default failed fatally: transitioned to maintenance (see 'svcs -xv' for details)

falcon console login: jlee
Password: 
falcon% sudo -s
falcon# /usr/sbin/zfs key -lr nest && /usr/sbin/svcadm clear svc:/system/filesystem/nest:default
Enter passphrase for 'nest': 
falcon#

When I get to the console shell, I can just copy and paste the command printed by the script. Once the service failure is cleared, SMF continues the boot process normally and all of my other services come up exactly as I’d expect.
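
For reference, wiring this up is just a matter of making the method script executable and importing the manifest; assuming the manifest was saved under /var/svc/manifest (the exact path here is my choice, nothing more), something like:

# chmod +x /lib/svc/method/nest
# svccfg import /var/svc/manifest/system/filesystem/nest.xml
# svcs system/filesystem/nest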

No, it’s not very pretty, but given how infrequently I reboot, I’d rather put up with a little manual intervention during the boot process than have to clean up after services that come up without their dependencies satisfied. And with my new homemade LOM, it’s not too much trouble to run commands at the console, even remotely.

The Solaris 11 Experience So Far

I have a system (a zone on which this blog is hosted) that has been running the same installation of Solaris since 11/11/2009, starting with OpenSolaris 2009.06. In the time since, it has seen every public build of OpenSolaris, then OpenIndiana, and finally Solaris 11 Express. Now, exactly two years later, I’ve updated it to Solaris 11 11/11, and I’d like to share my experience so far.

The update itself did not go smoothly. I was sitting at Solaris 11 Express SRU 8 and thought, like every update I’ve done in the past, that I could just run pkg image-update. Silly me: when I did, and then rebooted, the kernel panicked. No big deal, that’s what boot environments are for. I reverted to the previous boot environment and found some helpful documentation that told me to do exactly what I had just done. It turns out that there is no way to update to SRU 13 using the support repositories, because they already contain the Solaris 11 11/11 packages and pkg tries to pull some of them in. And there is no way to update just pkg, because the ips-consolidation prevents it, and trying to update the ips-consolidation pulls in the entire consolidation, which breaks everything just the same. In short, Oracle bungled it. The only way to update to SRU 13 that I could see was to download the SRU 13 repository ISO from My Oracle Support and set up a local repository. Once I was on SRU 13, I could continue with the update to the 11/11 release. But there were more surprises in store for me.

First, it looks like pkg decided to start enforcing consistent attributes on files shared by multiple packages. Fine, I can understand that. As a result, I had to remove a lot of my custom packages (mostly from spec-files-extra) which I’ll have to rebuild. Second, pkg decided it doesn’t like the opensolaris.org packages anymore so I had to uninstall OpenOffice.org. Also fair enough.

Happily, after that, the updates got applied successfully and the system rebooted into the 11/11 release. Next came the zone updates. When I did the normal zoneadm -z foo detach && zoneadm -z foo attach -u deal, I was told I had to convert my zones to a new ZFS structure which more closely matches the global zone. The script /usr/lib/brand/shared/dsconvert actually worked flawlessly and the updated zones came up fine.
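
Spelled out, that detach/attach dance is just the following (the zone name is a placeholder):

# zoneadm -z myzone detach
# zoneadm -z myzone attach -u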

Unfortunately I couldn’t SSH into my zones because my DNS server didn’t know where they were. It seems that with the updated networking framework, DHCP doesn’t request a hostname anymore. (/etc/default/dhcpagent still says inet <hostname> can be put in /etc/hostname.<if> to request the hostname.) I found that you can create an addr object that requests a hostname with ipadm create-addr -T dhcp -h <hostname> <addrobj>, but NWAM pretty much won’t let you create or modify anything with ipadm, and there were no options for requesting hostnames with nwamcfg. As a result, I had to disable NWAM (netadm enable -p ncp DefaultFixed) and then I could set up the interface with ipadm. Why doesn’t Solaris request hostnames by default? Not very “cloud-like” if you ask me.
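
For anyone stuck in the same spot, the workaround boiled down to roughly this (interface and hostname are examples, and it does mean giving up NWAM entirely):

# netadm enable -p ncp DefaultFixed
# ipadm create-ip net0                       # if the interface doesn't already exist
# ipadm create-addr -T dhcp -h myhostname net0/v4dhcp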

I have to say, I’m impressed by the way global zones and non-global zones are linked in the new release. Zone updates were an obvious shortcoming of previous releases. We’ll see how well it works when Solaris 11 Update 1 comes out.

What else…I lost my ability to pfexec to root. Oracle removed the “Primary Administrator” profile for security reasons so I had to install sudo. Not a big deal, I just wish they had said something a little louder about it.

Also, whatever update to pkg happened, it wiped out my repositories under /var/pkg. I had to restore them from a snapshot. Bad Oracle!

I’m also a little confused about some of the changes to the way networking settings are stored. For example, when I first booted the global zone, I found that my NFSv4 domain name was reset by NWAM. I set it to what it should be with sharectl set -p nfsmapid_domain=thestaticvoid.com nfs, but is that going to be overwritten again by NWAM? Also, the name resolver settings are now stored in the svc:/network/dns/client:default service, and according to the documentation, DHCP will set the service properties properly, but I have yet to see this work.
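
For what it’s worth, the resolver settings the service holds can at least be inspected directly with the usual SMF tools (on my install they live under the service’s config property group), which makes it easier to see what NWAM or DHCP last wrote there:

# svcprop -p config svc:/network/dns/client:default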

And the last problem I’ll mention is that the update removed my virtual consoles. I had to install the virtual-console package to restore them.

Overall, I’m happy that I was at least able to update to the latest release. Oracle could have cut off any update path from OpenSolaris. However, the update should have been a lot smoother. It doesn’t speak well of future updates when I can’t even update from one supported release (SRU 8) to another. I also wish Oracle were more open about upcoming changes (as in, having more preview releases or, dare I say it, opening development the way OpenSolaris was). Even to me, a long time pre-Solaris 11 user, the changes to zones and networking are huge in this release, and I would rather have not been so surprised by them.

Wireless 802.1X Support in Solaris

The George Washington University (where I work and study) has recently implemented 802.1X to secure its wireless networks. 802.1X defines support for EAP over Ethernet (including wireless) and the WPA standards define several modes of EAP that can be used.

Solaris (I’m referring to version 11, OpenSolaris, OpenIndiana, and Illumos) supports WPA through “wpad”, a modified early version of wpa_supplicant. However, the port seems to make a point of stripping out all EAP support.

So when my Network Security instructor said we had to do a term project of our choosing relating to network security, I decided I’d try to get 802.1X working in Solaris. To do this, I decided I could either add the EAP bits back into wpad, or add the Solaris-specific bits to the latest version of wpa_supplicant. wpad is based on very old code. It’s not even clear which version of wpa_supplicant it is based on, and there is no record of the massive amount of changes they made. It would be too hard for me to figure out where to plug EAP back in, and who knows how many bugs and security vulnerabilities were fixed upstream that we’d be missing out on.

Fortunately, wpa_supplicant is very modular, and reasonably well documented. I was able to graft the older Solaris code onto the newer interfaces. The result of my work is currently maintained in my own branch at GitHub. It’s not perfect, but it works (and I’ll explain how). Solaris has a very limited public API for wireless support and my goal was to get wpa_supplicant working without having to modify any system libraries or the kernel. I struggled to figure out some idiosyncrasies such as:

  • Events (association, disassociation, etc.) are only sent to wpa_supplicant when WPA is enabled in the driver.
  • Full scan results are only available when WPA is disabled in the driver.
  • Scan results don’t provide nearly as much information as their Linux counterparts do, such as access point capabilities, signal strength, noise levels, etc. I was very worried I wouldn’t be able to fill out the scan results structure fully and wpa_supplicant would refuse to work without complete information.

Here is how you can get 802.1X support working on your Solaris laptop:

  1. Install the wpa_supplicant package from my package repository:
    # pkg set-publisher -p http://pkg.thestaticvoid.com/
    # pkg install wpa_supplicant
    
  2. Add the configuration for your protected wireless networks to /etc/wpa_supplicant.conf. Here is mine:

    ctrl_interface=/var/run/wpa_supplicant
    ctrl_interface_group=0
    ap_scan=0

    network={
        ssid="prey"
        key_mgmt=WPA-PSK
        psk="<network key>"
    }

    network={
        ssid="GW1X"
        key_mgmt=WPA-EAP
        eap=TTLS
        identity="jameslee"
        anonymous_identity="anonymous"
        password="<personal password>"
        phase2="auth=PAP"
    }

    The most important thing here is ap_scan=0. This tells wpa_supplicant not to do any scanning or association of its own. Those tasks will be handled by dladm and NWAM.

  3. Back up /usr/lib/inet/wpad and replace it with this script (the swap itself is sketched just after this list):

    #!/bin/sh

    # Pull the interface name out of the "-i <interface>" argument that
    # dladm/NWAM passes to wpad, then hand off to the real wpa_supplicant,
    # backgrounded, using the Solaris driver wrapper.
    interface=`echo $@ | /usr/bin/sed 's/.*-i *\([a-z0-9]*\).*/\1/'`
    exec /usr/sbin/wpa_supplicant -Dsolaris -i$interface -c/etc/wpa_supplicant.conf -s &
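
The swap itself is nothing fancy; with the script saved somewhere handy (the file name here is my own), it's roughly:

# mv /usr/lib/inet/wpad /usr/lib/inet/wpad.orig
# cp wpad-wrapper.sh /usr/lib/inet/wpad
# chmod 755 /usr/lib/inet/wpad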

Now connect to a wireless network with NWAM or dladm. When prompted for a network key, enter anything; it won’t be used. The actual keys will be looked up in /etc/wpa_supplicant.conf. Here is an example of me connecting to my 802.1X-secured network using dladm:

# dladm connect-wifi -e GW1X -s wpa -k nwam-GW1X iwh0
# dladm show-wifi
LINK       STATUS            ESSID               SEC    STRENGTH   MODE   SPEED
iwh0       connected         GW1X                wpa    excellent  g      54Mb

“-k nwam-GW1X” refers to a dummy key set up by NWAM; dladm will complain if it’s not supplied a key.

That should be it!

Future Directions

Obviously, the integration of wpa_supplicant and NWAM/dladm leaves a lot to be desired. If there is sufficient interest, I will start looking into how to modify the dladm security framework in Illumos to include EAP-related configuration (keys, certificates, identities; it’s all much more complicated than the single pre-shared key that dladm supports now). My hope, though, is that Oracle is already working on this. Do you hear that, Oracle?

Automounting NFSv4 over SSH

For the past couple of years, I’ve used SSHFS to access my fileserver remotely (mostly from work). It’s always been pretty slow and it isn’t very stable on Solaris, so I’ve switched to NFSv4 over SSH. My biggest hangup with NFS was how to secure it over the internet: its Kerberos support is complete overkill for my needs, and I never really wanted to deal with the complications of scripting the setup of an SSH tunnel, either. It all seemed so fragile.

Then I discovered autossh which does all the work of setting up and maintaining the tunnel for me. I coupled that with an executable autofs map to automatically start the tunnel just before trying to mount a share, like:

#!/bin/bash

export AUTOSSH_PIDFILE=/var/run/falcon-tunnel.pid
export AUTOSSH_GATETIME=0
export AUTOSSH_DEBUG=1

# If autossh is already running, signal it to re-establish the tunnel;
# otherwise start it, forwarding local port 2050 to NFS (2049) on the server.
if [ -f "$AUTOSSH_PIDFILE" ]; then
    kill -HUP $(cat "$AUTOSSH_PIDFILE")
else
    autossh -f -M 0 -o ServerAliveInterval=5 -NL 2050:localhost:2049 jlee@falcon
fi

# Print the autofs map entry for the requested key ($1), pointing at the tunnel.
echo "-fstype=nfs4,port=2050 localhost:/nest/$1"

Using an executable autofs map allows me to avoid reconciling the differences between service managers like SMF and Upstart, offering a consistent way to start the tunnel exactly when it’s needed on both Solaris and Linux. When you ‘cd’ into a directory managed by autofs, autossh is started or woken up, then the share is mounted over the tunnel. If there is a network interruption or change (from wired to wireless, for example), ssh will disconnect after 15 seconds of inactivity and autossh will restart it. NFS is smart enough to resume its operation when the tunnel is reestablished.
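
For completeness, hooking an executable map into autofs just means making the script executable and pointing the master map at it; a minimal sketch with names of my own choosing (the master map is /etc/auto_master on Solaris, /etc/auto.master on most Linux distributions):

# chmod +x /etc/auto_nest
# echo "/nest   /etc/auto_nest" >> /etc/auto_master
# automount -v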

autossh has built-in support for heartbeat monitoring, but I’ve found SSH’s built-in ServerAliveInterval feature to be more reliable.
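
The 15 seconds mentioned above is just ServerAliveInterval=5 multiplied by SSH’s default ServerAliveCountMax of 3. The same settings can live in ~/.ssh/config instead of on the autossh command line; ExitOnForwardFailure is an extra I’d add so a failed port forward also counts as a dead tunnel:

Host falcon
    ServerAliveInterval 5
    ServerAliveCountMax 3
    ExitOnForwardFailure yes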

With this setup I have very simple, robust, and secure remote access to my fileserver.

Persistent Search Domains With NWAM and DHCP

What I Want

I want to be able to refer to systems on both my home and work networks by their hostnames rather than their fully-qualified domain names, so, ‘prey’ instead of ‘prey.thestaticvoid.com’ and ‘acad2’ instead of ‘acad2.es.gwu.edu’.

[Image: NWAM Settings]

The Problem

I would typically set my home and work domains as the search setting in /etc/resolv.conf. Unfortunately, either NWAM or the Solaris DHCP client (I haven’t determined which) overwrites resolv.conf on every new connection. DHCP on Linux does the same thing, but there I can configure it by editing dhclient.conf (or whatever is being used these days; it’s been a while, and I think I just set my domains in the NetworkManager GUI and forgot about it).

The Solaris DHCP client configuration is not nearly as flexible, and neither is NWAM, which gives you the choice of populating resolv.conf either with information supplied by the DHCP server or with settings you provide yourself, but not a mix of both. I do like having the nameservers set by the DHCP server, so supplying a fully manual configuration is not an option.

What I Tried

The first thing I tried was setting the LOCALDOMAIN environmental variable in /etc/profile. From the resolv.conf man page:

You can override the search keyword of the system
resolv.conf file on a per-process basis by setting the
environment variable LOCALDOMAIN to a space-separated list
of search domains.

I thought, great, a way to manage domain search settings without worrying about what’s doing what to resolv.conf. It didn’t work as advertised:

% LOCALDOMAIN=thestaticvoid.com ping prey
ping: unknown host prey
% s touch /etc/resolv.conf
% LOCALDOMAIN=thestaticvoid.com ping prey
prey is alive
% LOCALDOMAIN=thestaticvoid.com ping prey
ping: unknown host prey

Next, I considered adding an NWAM Network Modifier to set my search string in resolv.conf after a new connection is established. This worked reasonably well, but it didn’t handle the case where you switch from one network to another, for example, from wireless to wired. The only NWAM events that can trigger a script when the network connection changes fire before DHCP messes up resolv.conf.

Finally, in the course of my testing, I discovered that the svc:/network/dns/client service was restarting with every network connection change. I looked into its manifest and saw that it was designed to wait for changes to resolv.conf:

<!--
 Wait for potential DHCP modification of resolv.conf.
-->
<dependency
   name='net'
   grouping='require_all'
   restart_on='none'
   type='service'>
    <service_fmri value='svc:/network/service' />
</dependency>

So I could write another service that depends on dns/client and restarts whenever dns/client does, and I would have the last word about what goes into my configuration file!

My Solution

I wrote a service, svc:/network/dns/resolv-conf, with the following manifest:

<?xml version="1.0"?>
<!DOCTYPE service_bundle SYSTEM "/usr/share/lib/xml/dtd/service_bundle.dtd.1">

<service_bundle type="manifest" name="dns-resolv-conf">
    <service name="network/dns/resolv-conf"
        type="service"
        version="1">
        <create_default_instance enabled="false" />
        <single_instance />

        <dependency name="dns-client"
            grouping="require_all"
            restart_on="restart"
            type="service">
            <service_fmri value="svc:/network/dns/client" />
        </dependency>

        <dependent name="resolv-conf"
            grouping="optional_all"
            restart_on="restart">
            <service_fmri value="svc:/milestone/name-services" />
        </dependent>

        <exec_method type="method"
            name="start"
            exec="/lib/svc/method/dns-resolv-conf start"
            timeout_seconds="60" />

        <exec_method type="method"
            name="stop"
            exec="/lib/svc/method/dns-resolv-conf stop"
            timeout_seconds="60" />

        <property_group name="options" type="application">
            <propval name="search" type="astring" value="" />
        </property_group>

        <property_group name="startd" type="framework">
            <propval name="duration" type="astring" value="transient" />
        </property_group>

        <stability value="Unstable" />

        <template>
            <common_name>
                <loctext xml:lang="C">resolv.conf Settings</loctext>
            </common_name>
            <documentation>
                <manpage title="resolv.conf" section="4"
                    manpath="/usr/share/man" />
            </documentation>
        </template>
    </service>
</service_bundle>

which calls the script, /lib/svc/method/dns-resolv-conf containing:

#!/sbin/sh

. /lib/svc/share/smf_include.sh

search=$(svcprop -p options/search $SMF_FMRI)

case "$1" in
    "start")
        # Don't do anything if search option not provided.
        [ "$search" == '""' ] && exit $SMF_EXIT_OK

        # Reverse the lines because we either want to:
        #   add the search line after the *last* domain line or
        #   add it to the very top of the file if there is no domain line
        tac /etc/resolv.conf | grep -v "^search" | gawk '
            /^domain/ {
                if (!isset) {
                    print "search", $2, search
                    isset=1
                }
            }

            END {
                if (!isset) {
                    print "search", search
                }
            }

            1
        ' search="$search" | tac > /etc/resolv.conf.new && mv -f /etc/resolv.conf.new /etc/resolv.conf
        ;;

    "stop")
        # Just get rid of any search lines, I guess.
        grep -v "^search" /etc/resolv.conf > /etc/resolv.conf.new && mv -f /etc/resolv.conf.new /etc/resolv.conf
        ;;

    *)
        echo "Usage: $0 { start | stop }"
        exit $SMF_EXIT_ERR_CONFIG
esac

exit $SMF_EXIT_OK
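
Installing it follows the same pattern as any other homemade SMF service (the manifest path is simply where I’d put it):

# chmod +x /lib/svc/method/dns-resolv-conf
# svccfg import /var/svc/manifest/network/dns/resolv-conf.xml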

So now I can set my search options like:

% svccfg -s resolv-conf setprop 'options/search="thestaticvoid.com es.gwu.edu"'
% svcadm refresh resolv-conf
% svcadm enable resolv-conf
% cat /etc/resolv.conf
domain  iss.gwu.edu
search iss.gwu.edu thestaticvoid.com es.gwu.edu
nameserver  161.253.152.50
nameserver  128.164.141.12

Problem solved! Or at least worked around in the least hacky way I could!

CrashPlan

I have a little storage array that I store my life on. Music, movies, photographs, projects, school work—I’d be devastated if I lost any of it. And yet, I don’t have any sort of backup for it. Last year I evaluated various online backup services but concluded that my 5 Mbps (~600 KB/s) upload bandwidth was just too slow to feasibly backup all of my data. Now I have a 25 Mbps (~3 MB/s) symmetric connection, so last week when I got a promotional email from CrashPlan announcing their new version and prices, I decided to give it another try.

CrashPlan is, as far as I know, the only online backup solution that officially supports Solaris, and it’s not half-assed either. The software is delivered as a standard SVR4 package which installs to /opt/sfw/crashplan and includes an SMF manifest. Normally I’d never trust consumer-oriented proprietary software like this, but their Solaris support instills confidence in me. I can only hope that they continue to maintain it, despite the uncertainty surrounding Solaris’s future.

Like I said, installation was a breeze. Looking back at my shell history, it was as easy as:

# cd /tmp
# wget http://download.crashplan.com/installs/solaris/install/CrashPlan/CrashPlan_3.0_Solaris.tar.gz
# tar -xvzf CrashPlan_3.0_Solaris.tar.gz
# pkgadd -d . CrashPlan
# svccfg import /opt/sfw/crashplan/bin/crashplan.xml
# svcadm enable crashplan
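
A quick sanity check that the service actually came online (plain SMF, nothing CrashPlan-specific):

# svcs -l crashplan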

From there the GUI can be launched as a regular user by running /opt/sfw/crashplan/bin/CrashPlanDesktop. The user interface is clean and simple. On the first run, it walks you through setting up an account. New users get a 30-day free trial to CrashPlan+, which includes unlimited online backups. I’m still on my trial, but as long as it continues to work for me, I expect I’ll purchase a subscription for $5/month.

First thing I did after registering was to go into the security settings and change the archive encryption key type to use a private password. This encrypts the key which encrypts my data with a separate password, so even if someone hijacks my CrashPlan account, they will not be able to restore any of my files. The other advanced option, supplying your own private data key, I would argue is less secure, since the key is stored in the clear on the local system and cannot be changed without invalidating all of your backups. Security is very important to me, so I am happy to see that they give control over these settings to the user, though I wish the backup agent were open-source to enable more public scrutiny. At the very least, I’d like for CrashPlan to provide more details about their encryption methods, similar to SpiderOak.

Next I directed the software to backup my storage array mounted at /nest to CrashPlan Central and off it went. I’m currently seeing speeds around 6 Mbps (750 KB/s) which is slightly disappointing on my fast connection, but not unacceptable. They claim that they do not cap or throttle connections, though from what I’ve read, speed is largely dependent on which of CrashPlan’s many datacenters you are provisioned to. They’ve been experiencing much higher volume than normal with last week’s release of CrashPlan 3, so I hope to see increased speed when that activity subsides.

I do like that the backup actually takes place in the background, so the GUI is only ever necessary for changing settings and performing restores. I tested a restore and saw much better speeds around 16 Mbps (2 MB/s), though still not even close to saturating my internet connection.

My backup should hopefully be done by the new year and then it’ll just be a matter of performing small nightly incrementals.

DTrace to the Rescue Again!

I am really beginning to understand just how powerful and important this tool is. I ran into another problem at work where I was seeing errors, but the details weren’t very helpful. I had recently made an NFS-mounted filesystem read-only, and some errors started to pop up in the system log:

# dmesg
...
Mar 30 12:09:00 ragno nfs: [ID 808668 kern.notice] NFS write error on host nasfb: Read-only file system.
Mar 30 12:09:00 ragno nfs: [ID 702911 kern.notice] (file handle: 1d010000 1000000 e10d1300 a5122c4b 1d010000 1000000 15000000 5c179c4a)
...

I wanted to know which file was being written to; the file handle by itself isn’t very useful to me. I tried using lsof, like so:

# lsof -N -z ragno /zones/ragno/root/home
lsof: WARNING: can't stat() 5 zone file systems; using dev= options
COMMAND     PID ZONE      USER   FD   TYPE        DEVICE             SIZE/OFF       NODE NAME
httpd      3065 ragno webservd  cwd   VDIR     256,65718                    0          0 /zones/ragno (ragno)
httpd      3065 ragno webservd  rtd   VDIR     256,65718                    0          0 /zones/ragno (ragno)
httpd      3065 ragno webservd  txt   VREG     256,65718                    0          0 /zones/ragno (ragno)
httpd      3065 ragno webservd  txt   VREG     256,65718                    0          0 /zones/ragno (ragno)
...

but it didn’t give me any useful information either. Maybe I’m not using it correctly?

Then I found the function in the kernel that is responsible for outputting the NFS write errors that appear in the system log:

...
void
nfs_write_error(vnode_t *vp, int error, cred_t *cr)
{
...

This function gets passed a vnode_t pointer which has tons of data on the current I/O operation, including the file name. DTrace enabled me to access this data dynamically:

# dtrace -n 'fbt:nfs:nfs*_write_error:entry /zonename == "ragno"/ {vp = (vnode_t*) arg0; printf("%s", stringof(vp->v_path));}'
dtrace: description 'fbt:nfs:nfs*_write_error:entry ' matched 2 probes
CPU     ID                    FUNCTION:NAME
  0  56891            nfs_write_error:entry /zones/ragno/root/home/rmserver/Logs/rmaccess.log
  0  56891            nfs_write_error:entry /zones/ragno/root/home/rmserver/Logs/rmaccess.log
  0  56891            nfs_write_error:entry /zones/ragno/root/home/rmserver/Logs/rmaccess.log
  0  56891            nfs_write_error:entry /zones/ragno/root/home/rmserver/Logs/rmaccess.log
  0  56891            nfs_write_error:entry /zones/ragno/root/home/rmserver/Logs/rmaccess.log

And there’s the culprit. Now I know which file needs to be relocated to local read/write storage. Neato!
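
Had there been more than one offender, a small variation on the same probe would tally the errors by file name using an aggregation; a sketch along the same lines, not something I needed here:

# dtrace -n 'fbt:nfs:nfs*_write_error:entry /zonename == "ragno"/ { @writes[stringof(((vnode_t *)arg0)->v_path)] = count(); }'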

Watching Network Connections with DTrace

DTrace is one of those magical tools that I sort of know how to use but have rarely had the chance to. Usually, firing up truss is sufficient to figure out what’s happening in a process.

But recently, a team at GW was setting up a web application and having some problems. It would hang whenever it was accessed. The application is highly modular and accesses several other servers for resources, so I suspected there must have been some sort of network problem, probably a blocked firewall port somewhere. Unfortunately, the application produces no logs (!) so it was pretty much guesswork.

I started snoop to watch for traffic generated by the application. I knew from watching the production application that there should be communication between two servers over port 9300, but I wasn’t seeing anything. I thought of using truss to determine if the application was at least trying to connect, but being a web application, the process was way too short-lived to attach to. Even if I was able to attach to it, all I might have learned was that the application was calling the ‘connect’ system call and possibly getting a return value.

DTrace, on the other hand, doesn’t have to attach to a particular process. It just sits in the kernel and dynamically observes what your scripts tell it to. It also has the ability to look into data structures so you can actually see what values are passed to functions and system calls. I used this ability to watch every call to the connect system call.

#!/usr/sbin/dtrace -qs

syscall::connect:entry
{
        /* Copy the sockaddr that the process passed to connect() into our buffer. */
        socks = (struct sockaddr*) copyin(arg1, arg2);

        /* sa_data holds the port in big-endian order, then the IPv4 address.
           Cast through uint8_t so bytes above 127 don't sign-extend. */
        hport = (uint_t) (uint8_t) socks->sa_data[0];
        lport = (uint_t) (uint8_t) socks->sa_data[1];
        hport <<= 8;
        port = hport + lport;

        printf("%s: %d.%d.%d.%d:%d\n", execname,
            (uint8_t) socks->sa_data[2], (uint8_t) socks->sa_data[3],
            (uint8_t) socks->sa_data[4], (uint8_t) socks->sa_data[5], port);
}

This script copies arg2 bytes from the address pointed to by arg1 into kernel space and uses the data to determine the destination port (stored in big-endian order) and address.

When run, the script immediately revealed that the web application was trying to connect to itself rather than a database server, a simple configuration mistake made difficult to diagnose due to poor logging facilities in the application.

In the future, there will be a network provider for DTrace which will simplify the job of extracting data from network calls. It should then be possible to rewrite the script to simply:

#!/usr/sbin/dtrace -qs

tcp:::connect-request
{
        printf("%s: %s:%d\n", execname, args[2]->ip_daddr, args[4]->tcp_dport);
}

Hopefully I’ll have more opportunities to use DTrace in the future.