Fun With vpnc

I recently got a new laptop at work and I decided to put OpenSolaris on it. This meant I had to setup vpnc in order to access the server networks and wireless here. I installed my vpnc package, copied the profile from my Ubuntu workstation, and started it up. It connected, but no packets flowed. I didn’t have time to investigate, so I decided to work on it some more at home.

The strange thing is that it connected from home with the very same profile and everything worked fine. I immediately suspected something was wrong with the routing tables, like maybe some of the routes installed by vpnc-script were conflicting with the routes necessary to talk to the VPN concentrator. I endlessly compared the routing tables between work and home and my working Ubuntu workstation, removing routes, adding routes, and manually constructing the routing table until I was positive it could not be that.

Everything I pinged worked. I could ping the concentrator. I could ping the gateway. I could ping the tunnel device. I could ping the physical interface—or so I thought.

As I was preparing to write a message to the vpnc-devel mailing list requesting help, I did some pings to post the output in the email. I ran

$ ping <concentrator ip>
<concentrator ip> is alive

which looked good, but I wanted the full ping output, so I ran

$ ping -s <concentrator ip>
PING <concentrator ip>: 56 data bytes
^C
----<concentrator ip> PING Statistics----
4 packets transmitted, 1 packets received, 75% packet loss
round-trip (ms)  min/avg/max/stddev = 9223372036854776.000/0.000/0.000/-NaN

For some reason, only the first ping was getting through. The rest were getting hung up somewhere. The really strange thing was that I saw the same behavior on the local physical interface:

$ ifconfig bge0
bge0: flags=1004843 mtu 1500 index 3
        inet 161.253.143.151 netmask ffffff00 broadcast 161.253.143.255
$ ping -s 161.253.143.151
PING 161.253.143.151: 56 data bytes
^C
----161.253.143.151 PING Statistics----
5 packets transmitted, 1 packets received, 80% packet loss
round-trip (ms)  min/avg/max/stddev = 9223372036854776.000/0.000/0.000/-NaN

I have never seen a situation where you couldn’t even ping a local physical interface! I checked and double checked that IPFilter wasn’t running. Finally I started a packet capture of the physical interface to see what was happening to my pings:

# snoop -d bge0 icmp
Using device bge0 (promiscuous mode)
161.253.143.151 -> <concentrator ip> ICMP Destination unreachable (Bad protocol 50)
161.253.143.151 -> <concentrator ip> ICMP Destination unreachable (Bad protocol 50)
161.253.143.151 -> <concentrator ip> ICMP Destination unreachable (Bad protocol 50)
^C

That’s when by chance I saw messages being sent to the VPN concentrator saying “bad protocol 50.” IP protocol 50 represents “ESP”, commonly used for IPsec. Apparently Solaris eats these packets. Haven’t figured out why.

I remembered seeing something in the vpnc manpage about ESP packets:

--natt-mode <natt/none/force-natt/cisco-udp>

      Which NAT-Traversal Method to use:
      o    natt -- NAT-T as defined in RFC3947
      o    none -- disable use of any NAT-T method
      o    force-natt -- always use NAT-T encapsulation  even
           without presence of a NAT device (useful if the OS
           captures all ESP traffic)
      o    cisco-udp -- Cisco proprietary UDP  encapsulation,
           commonly over Port 10000

I enabled force-natt mode, which encapsulates the ESP packet in a UDP packet, normally to get past NAT, and it started working! In retrospect, I should have been able to figure that out much easier. First, it pretty much says it on the vpnc homepage: “Solaris (7 works, 9 only with –natt-mode forced).” I didn’t even notice that. Second, I should have realized that I was behind a NAT at home and not at work, so they would be using a different NAT-traversal mode by default. Oh well, it was a good diagnostic exercise, hence the post to share the experience.

In other vpnc related news, I’ve ported Kazuyoshi’s patch to the open_tun and solaris_close_tun functions of OpenVPN to the tun_open and tun_close functions of vpnc. His sets up the tunnel interface a little bit differently and adds TAP support. It solves the random problems vpnc had with bringing up the tunnel interface such as:

# ifconfig tun0
tun0: flags=10010008d0<POINTOPOINT,RUNNING,NOARP,MULTICAST,IPv4,FIXEDMTU> mtu 1412 index 8
        inet 128.164.xxx.yy --> 128.164.xxx.yy netmask ffffffff 
        ether f:ea:1:ff:ff:ff
# ifconfig tun0 up
ifconfig: setifflags: SIOCSLIFFLAGS: tun0: no such interface
# dmesg | grep tun0
Jul 23 14:56:05 swan ip: [ID 728316 kern.error] tun0: DL_BIND_REQ failed: DL_OUTSTATE

The changes are in the latest vpnc package available from my package repository.

DTrace to the Rescue Again!

I am really beginning to understand just how powerful and important this tool is. I ran into another problem at work where I was seeing errors, but the details weren’t very helpful. I recently made a NFS mounted filesystem read-only, and some errors started to pop up in the system log:

# dmesg
...
Mar 30 12:09:00 ragno nfs: [ID 808668 kern.notice] NFS write error on host nasfb: Read-only file system.
Mar 30 12:09:00 ragno nfs: [ID 702911 kern.notice] (file handle: 1d010000 1000000 e10d1300 a5122c4b 1d010000 1000000 15000000 5c179c4a)
...

I wanted to know which file is being written to. The file handle isn’t very useful for me. I tried using lsof like:

# lsof -N -z ragno /zones/ragno/root/home
lsof: WARNING: can't stat() 5 zone file systems; using dev= options
COMMAND     PID ZONE      USER   FD   TYPE        DEVICE             SIZE/OFF       NODE NAME
httpd      3065 ragno webservd  cwd   VDIR     256,65718                    0          0 /zones/ragno (ragno)
httpd      3065 ragno webservd  rtd   VDIR     256,65718                    0          0 /zones/ragno (ragno)
httpd      3065 ragno webservd  txt   VREG     256,65718                    0          0 /zones/ragno (ragno)
httpd      3065 ragno webservd  txt   VREG     256,65718                    0          0 /zones/ragno (ragno)
...

but it didn’t give me any useful information either. Maybe I’m not using it correctly?

Then I found the function in the kernel which is responsible outputting the NFS write errors that appear in the system log:

...
void
nfs_write_error(vnode_t *vp, int error, cred_t *cr)
{
...

This function gets passed a vnode_t pointer which has tons of data on the current I/O operation, including the file name. DTrace enabled me to access this data dynamically:

# dtrace -n 'fbt:nfs:nfs*_write_error:entry /zonename == "ragno"/ {vp = (vnode_t*) arg0; printf("%s", stringof(vp->v_path));}'
dtrace: description 'fbt:nfs:nfs*_write_error:entry ' matched 2 probes
CPU     ID                    FUNCTION:NAME
  0  56891            nfs_write_error:entry /zones/ragno/root/home/rmserver/Logs/rmaccess.log
  0  56891            nfs_write_error:entry /zones/ragno/root/home/rmserver/Logs/rmaccess.log
  0  56891            nfs_write_error:entry /zones/ragno/root/home/rmserver/Logs/rmaccess.log
  0  56891            nfs_write_error:entry /zones/ragno/root/home/rmserver/Logs/rmaccess.log
  0  56891            nfs_write_error:entry /zones/ragno/root/home/rmserver/Logs/rmaccess.log

And there’s the culprit. Now I know which file needs to be relocated to local read/write storage. Neato!

VPNC for OpenSolaris

I’ve compiled VPNC and the requisite TUN/TAP driver for OpenSolaris so that I can access my work network from home. Kazuyoshi’s driver adds TAP functionality to the original TUN driver which hasn’t been updated in nine years. It’s a real testament to the stability of the OpenSolaris kernel ABI that the module still compiles, loads, and works properly.

All of the software can be installed from my repository onto build 111 or higher:

$ pfexec pkg set-publisher -O http://pkg.thestaticvoid.com/ thestaticvoid
$ pfexec pkg install vpnc

The tun driver should load automatically and create /dev/tun. Now create a VPN profile configuration in /etc/vpnc/. The configuration contains a lot of private information so I’m not going to share mine here, but /etc/vpnc/default.conf is a good start.

One thing I do like to do is make sure only certain subnets are tunneled through the VPN. That way connecting to the VPN doesn’t interrupt any connections that are already established (for example, AIM). To do that I create a script /etc/vpnc/gwu-networks-script containing

#!/bin/sh

# Only tunnel GWU networks through VPN
CISCO_SPLIT_INC=2
CISCO_SPLIT_INC_0_ADDR=161.253.0.0
CISCO_SPLIT_INC_0_MASK=255.255.0.0
CISCO_SPLIT_INC_0_MASKLEN=16
CISCO_SPLIT_INC_0_PROTOCOL=0
CISCO_SPLIT_INC_0_SPORT=0
CISCO_SPLIT_INC_0_DPORT=0
CISCO_SPLIT_INC_1_ADDR=128.164.0.0
CISCO_SPLIT_INC_1_MASK=255.255.0.0
CISCO_SPLIT_INC_1_MASKLEN=16
CISCO_SPLIT_INC_1_PROTOCOL=0
CISCO_SPLIT_INC_1_SPORT=0
CISCO_SPLIT_INC_1_DPORT=0

. /etc/vpnc/vpnc-script

then add Script /etc/vpnc/gwu-networks-script to the end of my VPN profile configuration.

Connecting to the VPN you should see messages like:

$ pfexec vpnc gwu
Enter password for jameslee@<no>: 
which: no ip in (/sbin:/usr/sbin:/usr/gnu/bin:/usr/bin:/usr/sbin:/sbin)
which: no ip in (/sbin:/usr/sbin:/usr/gnu/bin:/usr/bin:/usr/sbin:/sbin)
add net 128.164.<no>: gateway 128.164.<no>
add host 128.164.<no>: gateway 161.253.<no>
add net 161.253.0.0: gateway 128.164.<no>
add net 128.164.0.0: gateway 128.164.<no>
add net 128.164.<no>: gateway 128.164.<no>
add net 128.164.<no>: gateway 128.164.<no>
VPNC started in background (pid: 594)...

The vpnc-script will modify your /etc/resolv.conf and routing tables so be sure to run vpnc-disconnect when you are done with the connection to restore the original configuration.

Thanks to the good folks at OpenConnect for a well-maintained vpnc-script which works on Solaris. Spec files for these packages are available from my GitHub repository if you want to roll your own.

Start Virtual NICs on OpenSolaris Boot

One of the more frustrating things I deal with on OpenSolaris is that every time I reboot, I have to manually bring up each virtual network interface in order to start all of my zones. There is a bug report for this problem that says a fix will be integrated into b132, which is just a few weeks away, but in the mean time, I’ve whipped up an SMF service to handle this for me. Create a file vnic.xml:

<?xml version="1.0"?>
<!DOCTYPE service_bundle SYSTEM "/usr/share/lib/xml/dtd/service_bundle.dtd.1">

<service_bundle type='manifest' name='vnic'>

<service
    name='network/vnic'
    type='service'
    version='1'>

    <dependency
        name='network_service'
        grouping='require_all'
        restart_on='none'
        type='service'>
        <service_fmri value='svc:/network/service' />
    </dependency>

    <dependent
        name='network_vnic'
        grouping='optional_all'
        restart_on='none'>
        <service_fmri value='svc:/system/zones' />
    </dependent>

    <exec_method
        type='method'
        name='start'
        exec='/usr/sbin/dladm up-vnic ${SMF_FMRI/*:/}'
        timeout_seconds='60' />

    <exec_method
        type='method'
        name='stop'
        exec=':true'
        timeout_seconds='60' />

    <property_group name='startd' type='framework'>
        <propval name='duration' type='astring' value='transient' />
    </property_group>

    <stability value='Unstable' />

    <template>
        <common_name>
            <loctext xml:lang='C'>
            Virtual Network Interface
            </loctext>
        </common_name>
        <documentation>
            <manpage title='dladm' section='1M'
                manpath='/usr/share/man' />
        </documentation>
    </template>
</service>

</service_bundle>

This service should run sometime after the network is started but before the zones are started. Load it in with svccfg -v import vnic.xml and create an instance of the service for each of the VNICs that you want to start. For example, if you want to start vnic0 on boot:

# svccfg -s vnic add vnic0
# svcadm refresh vnic0
# svcadm enable vnic0

UPDATE: Build 132 is out an this functionality has been integrated as the svc:/network/datalink-management:default service. The services that were added above can be removed by running svccfg delete vnic.

Watching Network Connections with DTrace

DTrace is one of those magical tools that I sort of know how to use but never had the chance to. Usually, firing up truss is sufficient to figure out what’s happening in a process.

But recently, a team at GW was setting up a web application and having some problems. It would hang whenever it was accessed. The application is highly modular and accesses several other servers for resources, so I suspected there must have been some sort of network problem, probably a blocked firewall port somewhere. Unfortunately, the application produces no logs (!) so it was pretty much guesswork.

I started snoop to watch for traffic generated by the application. I knew from watching the production application that there should be communication between two servers over port 9300, but I wasn’t seeing anything. I thought of using truss to determine if the application was at least trying to connect, but being a web application, the process was way too short-lived to attach to. Even if I was able to attach to it, all I might have learned was that the application was calling the ‘connect’ system call and possibly getting a return value.

DTrace, on the other hand, doesn’t have to attach to a particular process. It just sits in the kernel and dynamically observes what your scripts tell it to. It also has the ability to look into data structures so you can actually see what values are passed to functions and system calls. I used this ability to watch every call to the connect system call.

#!/usr/sbin/dtrace -qs

syscall::connect:entry
{
        socks = (struct sockaddr*) copyin(arg1, arg2);
        hport = (uint_t) socks->sa_data[0];
        lport = (uint_t) socks->sa_data[1];
        hport <<= 8;
        port = hport + lport;

        printf("%s: %d.%d.%d.%d:%d\n", execname, socks->sa_data[2], socks->sa_data[3], socks->sa_data[4], socks->sa_data[5], port);
}

This script copies arg2 bytes from the address pointed to by arg1 into kernel space and uses the data to determine the port (big endian order) and destination address.

When run, the script immediately revealed that the web application was trying to connect to itself rather than a database server, a simple configuration mistake made difficult to diagnose due to poor logging facilities in the application.

In the future, there will be a network provider for DTrace which will simplify the job of extracting data from network calls. It should then be possible to rewrite the script to simply:

#!/usr/sbin/dtrace -qs

tcp:::connect-request
{
        printf("%s: %s:%d\n", execname, args[2]->ip_daddr, args[4]->tcp_dport);
}

Hopefully I’ll have more opportunities to use DTrace in the future.

Replacing Apache with Yaws

After some time running with the Apache worker model, I noticed it was not much better on memory than prefork. It would spawn new threads as load increased, but didn’t give up the resources when load decreased. I know it does this for performance, but I am very tight on memory. And I know Apache is very tunable, so I could probably change that behavior, but tuning is boring. Playing with new stuff is fun! This was the perfect opportunity for me to look at lightweight web servers.

I toyed around with lighttpd and read the documentation for nginx. Both seem to do what I need them to do, but I was more interested in the equally fast and light Yaws. I’ve been really into Erlang lately (more on that in another post), so this was a great opportunity to see how a real Erlang application works.

Installation

Naturally, Yaws isn’t included in OpenSolaris yet, but Erlang is! So it was fairly easy to whip together a spec file and SMF manifest for the program. Then it was as easy as:

$ pkgtool build-only --download --interactive yaws.spec
$ pfexec pkg install yaws

I’ve set the configuration to go to /etc/yaws/yaws.conf and the service can be started with svcadm enable yaws. When I’m completely satisfied with the package, I’ll upload it to SourceJuicer.

Configuration

I run two virtual hosts: thestaticvoid.com and iriverter.thestaticvoid.com. Those two domains also have “.org” counterparts which I want to redirect to the “.com” domain. The yaws.conf syntax to handle this case is very simple:

pick_first_virthost_on_nomatch = true

<server localhost>
        port = 80
        listen = 0.0.0.0
        <redirect>
                / = thestaticvoid.com
        </redirect>
</server>

<server thestaticvoid.com>
        port = 80
        listen = 0.0.0.0
        docroot = /docs/thestaticvoid.com
        dir_listings = true
</server>

<server iriverter.thestaticvoid.com>
        port = 80
        listen = 0.0.0.0
        docroot = /docs/iriverter.thestaticvoid.com
        dir_listings = true
</server>

<server iriverter.thestaticvoid.org>
        port = 80
        listen = 0.0.0.0
        <redirect>
                / = iriverter.thestaticvoid.com
        </redirect>
</server>

Should be pretty self-explanatory, but the nice thing is the pick_first_virthost_on_nomatch directive combined with the localhost block so that if anyone gets to this site by any other address, they’ll be redirected to the canonical thestaticvoid.com. I did actually run into a bug with the redirection putting extra slashes in the URL, but squashed that bug pretty quickly with a bit of help from the Yaws mailing list. That whole problem is summarized in this thread.

PHP

Yaws handles PHP execution as a special case. All you have to do is add a couple lines to the configuration above:

php_exe_path = /usr/php/bin/php-cgi

<server thestaticvoid.com>
        ...
        allowed_scripts = php
        ...
</server>

Reload the server (svcadm refresh yaws) and PHP, well, won’t work just yet. This actually took me an hour or two to figure out. OpenSolaris’ PHP is compiled with the --enable-force-cgi-redirect option which means PHP will refuse to execute unless it was invoked by an Apache Action directive. Fortunately, you can disable this security measure by setting cgi.force_redirect = 0 in your /etc/php/5.2/php.ini.

Trac

Trac needs a little work to get running in Yaws. It’s requires an environmental variable, TRAC_ENV, set to tell it where to find the project database. The easiest way to do that is to copy /usr/share/trac/cgi-bin/trac.cgi to the document root, modify it to set the environmental variable, and enable CGI scripts in Yaws by setting allowed_scripts = cgi.

But I decided to set up an appmod, so that trac.cgi could be left where it was, unmodified. Appmods are Erlang modules which get run by Yaws whenever the configured URL is requested. Here’s the one I wrote for Trac:

-module(trac).

-export([out/1]).

-define(APPMOD, "/trac").
-define(TRAC_ENV, "/trac/iriverter").
-define(SCRIPT, "/usr/share/trac/cgi-bin/trac.cgi").

-include_lib("yaws/include/yaws_api.hrl").

out(Arg) ->
        Pathinfo = Arg#arg.pathinfo,
        Env = [{"SCRIPT_NAME", ?APPMOD}, {"TRAC_ENV", ?TRAC_ENV}],
        yaws_cgi:call_cgi(Arg, undefined, ?SCRIPT, Pathinfo, Env).

All appmods must define the out/1 function which takes an arg record which contains information about the current request. At the end of the function, the Yaws API is used to execute the CGI script with the extra environmental variables. This is compiled (erlc -o /var/yaws/ebin -I/usr/lib trac.erl) and enabled in the Yaws configuration by adding appmods = </trac, trac> to a server section. Then whenever someone requests /trac/foo/bar, Trac runs properly!

I also set up URL rewriting so that instead of requesting something like http://i.tsv.c/trac/wiki, all you see is http://i.tsv.c/wiki. This involves another Erlang module, a rewrite module. It looks like:

-module(rewrite_trac).

-export([arg_rewrite/1]).

-include_lib("yaws/include/yaws_api.hrl").

arg_rewrite(Arg) ->
        Req = Arg#arg.req,
        {abs_path, Path} = Req#http_request.path,
        try yaws_api:url_decode_q_split(Path) of
                {DecPath, _Query} ->
                        case DecPath == "/" orelse not filelib:is_file(Arg#arg.docroot ++ DecPath) of
                                true ->
                                        Arg#arg{req = Req#http_request{path = {abs_path, "/trac" ++ Path}}};
                                false ->
                                        Arg
                        end
        catch
                exit:_ ->
                        Arg
        end.

This module runs as soon as a request comes into the server and allows you to modify many variables before the request is handled. This is a simple one which says: if the request is “/” or the requested file doesn’t exist, append the request to the Trac appmod, otherwise pass it through unaltered. It’s enabled in the Yaws server by adding arg_rewrite_mod = rewrite_trac to yaws.conf. The Trac appmod must also be modified to make sure SCRIPT_NAME is now / so the application generates links without containing /trac.

WordPress

WordPress works perfectly once PHP is enabled in Yaws. WordPress permalinks, however, do not. A little background: WordPress normally relies on Apache’s mod_rewrite to execute index.php when it gets an request like /post/2009/07/27/suexec-on-opensolaris/. mod_rewrite sets up the environmental variables such that WordPress is able to detect how it was called and can process the page accordingly.

Without mod_rewrite, the best it can do is rely on requests like /index.php/post/2009/07/27/suexec-on-opensolaris/ and use the PATH_INFO variable, which is the text after index.php. I think that looks ugly, having index.php in every URL. You would think that simply rewriting the URL, just as was done with Trac, would solve the problem, but WordPress is too smart, and always sends you to a canonical URL which it thinks must include index.php.

After more experimentation than I care to explain, I discovered that if I set the REQUEST_URI variable to the original request (the one not including index.php), WordPress was happy. This was a tricky exercise in trying to set an environmental variable from the rewrite module. But, as we saw with the Trac example, environmental variables can be set from appmods. And I found that data can be passed from the rewrite module to the appmod through the arg record! Here’s my solution:

-module(rewrite_blog).

-export([arg_rewrite/1]).

-include_lib("yaws/include/yaws_api.hrl").

arg_rewrite(Arg) ->
        Req = Arg#arg.req,
        {abs_path, Path} = Req#http_request.path,
        try yaws_api:url_decode_q_split(Path) of
                {DecPath, _Query} ->
                        case DecPath == "/" orelse not filelib:is_file(Arg#arg.docroot ++ DecPath) of
                                true ->
                                        case string:str(Path, "/wsvn") == 1 of
                                                true ->
                                                        {ok, NewPath, _RepCount} = regexp:sub(Path, "^/wsvn(\.php)?", "/wsvn.php"),
                                                        Arg#arg{req = Req#http_request{path = {abs_path, NewPath}}};
                                                false ->
                                                        Arg#arg{opaque = [{"REQUEST_URI", Path}],
                                                                req = Req#http_request{path = {abs_path, "/blog" ++ Path}}}
                                        end;
                                false ->
                                        Arg
                        end
        catch
                exit:_ ->
                        Arg
        end.

Nevermind the extra WebSVN rewriting code. Notice I set the opaque component of the arg. Then in the appmod:

-module(blog).

-export([out/1]).

-define(SCRIPT, "/docs/thestaticvoid.com/wordpress/index.php").

-include_lib("yaws/include/yaws.hrl").
-include_lib("yaws/include/yaws_api.hrl").

out(Arg) ->
        Pathinfo = Arg#arg.pathinfo,
        Env = Arg#arg.opaque,
        {ok, GC, _Groups} = yaws_api:getconf(),
        yaws_cgi:call_cgi(Arg, GC#gconf.phpexe, ?SCRIPT, Pathinfo, Env).

I pull the data from the arg record and pass it to the call_cgi/5 function. Also of note here is the special way to invoke the PHP CGI. The location of the php-cgi executable is pulled from the Yaws configuration and passed as the second argument to call_cgi/5 so Yaws knows what to do with the files. You can surely imagine this as a way to execute things other than PHP which do not have a #! at the top. Or emulating suEXEC with a custom wrapper 🙂

Overall, this probably seems like a lot of work to get things working that are trivial in other servers, but I’m finding that the appmods are really powerful, and the Yaws code itself is very easy to understand and modify. Plus you get the legendary performance and fault-tolerance of Erlang.

suEXEC on OpenSolaris

One nice thing about having all dynamic content being generated by CGI is that you can use suEXEC to run the scripts as a different user. This is primarily used for systems where you have multiple untrusted users who run sites in one HTTP server. Then no one can interfere with anyone else. It can also be used simply for separating the application from the server.

I’m the only user on my server so I don’t necessarily have any of these security concerns, but I have enabled suEXEC for convenience. For example, WordPress will allow you to modify the stylesheets from the admin interface as long as it can write to them. With suEXEC, the admin interface can run as my Unix user, so I can edit the files from both the web interface and the command line without having wide-open permissions or switching to root.

Same applies for Trac where I can manage the project with the web interface or trac-admin on the command line. The same effect could pretty much be obtained by using Unix groups properly:

# groupadd wordpress
# usermod -G wordpress webservd
# usermod -G wordpress jlee  # my username
# chgrp -R wordpress /docs/thestaticvoid.com  # virtualhost document root
# chmod -R g+ws /docs/thestaticvoid.com  # make directory writable and always owned by
                                           the wordpress group

Then umask 002 would have to be set in Apache’s and my profile so any files that get created can be written to by the other users in the group. That’s all well and good, but it seems like a bit of work and I don’t like the idea of messing with the default umask.

On to suEXEC. First, let’s show the current user that PHP executes as. Create a file test.php containing <?php echo exec("id"); ?>. Accessing the script from your web browser should show something like uid=80(webservd) gid=80(webservd).

Next, in OpenSolaris, the suexec binary must be enabled:

# cd /usr/apache2/2.2/bin/  # go one directory further into the amd64 dir
                              if you're running 64-bit
# mv suexec.disabled suexec
# chown root:webservd suexec
# chmod 4750 suexec
# ./suexec -V
 -D AP_DOC_ROOT="/var/apache2/2.2/htdocs"
 -D AP_GID_MIN=100
 -D AP_HTTPD_USER="webservd"
 -D AP_LOG_EXEC="/var/apache2/2.2/logs/suexec_log"
 -D AP_SAFE_PATH="/usr/local/bin:/usr/bin:/bin"
 -D AP_UID_MIN=100
 -D AP_USERDIR_SUFFIX="public_html"

These variables were set at compile time and cannot be changed. They ensure that certain conditions must be met in order to use the binary. That’s very important because it’s setuid root. The first thing I had to do was move everything from my old document root to the one specified above in AP_DOC_ROOT. Then you can add SuexecUserGroup jlee jlee (with whatever username and group you want the scripts to run as) to your <VirtualHost> section of the Apache configuration. At this point if you try to execute test.php you’ll probably see one of a couple errors in the suEXEC log (/var/apache2/2.2/logs/suexec_log):

  • [2009-07-27 11:08:02]: uid: (1000/jlee) gid: (1000/jlee) cmd: php-cgi
    [2009-07-27 11:08:02]: command not in docroot (/usr/php/bin/php-cgi)

    In this case, php-cgi is going to have to be moved to the document root:

    $ cp /usr/php/bin/php-cgi /var/apache2/2.2/htdocs/
    $ pfexec vi /etc/apache2/2.2/conf.d/php-cgi.conf  # modify the ScriptAlias appropriately
    $ svcadm restart http
  • [2009-07-27 11:11:07]: uid: (1000/jlee) gid: (1000/jlee) cmd: php-cgi
    [2009-07-27 11:11:07]: target uid/gid (1000/1000) mismatch with directory (0/2) or program (0/0)

    Make sure everything that suexec is to execute is owned by the same user and group as specified in the SuexecUserGroup line of your Apache configuration.

Now, running test.php should give the correct results: uid=1000(jlee) gid=1000(jlee). Done!

As a side note, I lose all frame of reference while I write so I can’t remember if I’m writing this for you or me, explaining what I’ve done or what you should do. Sorry 🙂

Reducing Memory Footprint of Apache Services

An interesting thing happened when I set up this blog. It first manifested itself as a heap of junk mail in my inbox. Then no mail at all. I had run out of memory. WordPress requires me to run MySQL and that extra 12M pushed me over the 256M cap in my OpenSolaris 2009.06 zone. As a result SpamAssassin could not spawn, and ultimately Postfix died. So I sought out to try to reduce my memory footprint.

Let’s take a look at where things were when I got started:

$ prstat -s rss -Z 1 1 | cat
   PID USERNAME  SIZE   RSS STATE  PRI NICE      TIME  CPU PROCESS/NLWP       
 13488 webservd  183M   92M sleep   59    0   0:00:28 0.0% trac.fcgi/1
 13479 webservd   59M   41M sleep   59    0   0:00:14 0.0% trac.fcgi/1
 13489 webservd   59M   41M sleep   59    0   0:00:14 0.0% trac.fcgi/1
  4463 mysql      64M   12M sleep   59    0   0:02:39 0.0% mysqld/10
 19296 root       13M 8444K sleep   59    0   0:00:25 0.0% svc.configd/16
 19619 named      11M 5824K sleep   59    0   0:03:51 0.0% named/7
 13473 root       64M 4352K sleep   59    0   0:00:00 0.0% httpd/1
 19358 root       12M 3688K sleep   59    0   0:00:54 0.0% nscd/31
 19294 root       12M 3180K sleep   59    0 244:37:22 0.0% svc.startd/13
 13476 webservd   64M 2940K sleep   59    0   0:00:00 0.0% httpd/1
 13486 webservd   64M 2924K sleep   59    0   0:00:00 0.0% httpd/1
 13745 root     6248K 2832K cpu1    59    0   0:00:00 0.0% prstat/1
 13721 root     5940K 2368K sleep   39    0   0:00:00 0.0% bash/1
 13485 webservd   64M 2252K sleep   59    0   0:00:00 0.0% httpd/1
 13482 webservd   64M 2168K sleep   59    0   0:00:00 0.0% httpd/1
ZONEID    NPROC  SWAP   RSS MEMORY      TIME  CPU ZONE                        
    39       60  494M  246M    96% 244:47:13 0.1% case                        
Total: 60 processes, 149 lwps, load averages: 0.61, 0.62, 0.52

First thing I noticed is the 174M that Trac was taking up. I was running it as a FastCGI service for speed. The problem with that is it remains resident even when it’s not processing any requests, which is most of the time. One option I tried was setting DefaultMaxClassProcessCount 1 in my /etc/apache2/2.2/conf.d/fcgid.conf file. This effectively limits Trac to only one process at a time, which greatly reduces the memory utilization, but means it can only service one request at a time. That’s not an option.

Fortunately, my zone seems to have good, fast processors and disks, so I can put up with running it as standard CGI service. Easy enough to make the switch, just move some things around in my Apache configuration:

ScriptAlias /trac /usr/share/trac/cgi-bin/trac.cgi
#ScriptAlias /trac /usr/share/trac/cgi-bin/trac.fcgi
#DefaultInitEnv TRAC_ENV "/trac/iriverter"

<Location "/trac">
    SetEnv TRAC_ENV "/trac/iriverter"
    Order allow,deny
    Allow from all
</Location>

So things are looking much better, but I’m still not happy with it:

$ prstat -s rss -Z 1 1 | cat
   PID USERNAME  SIZE   RSS STATE  PRI NICE      TIME  CPU PROCESS/NLWP       
 15362 webservd   74M   31M sleep   59    0   0:00:00 0.0% httpd/1
 15388 webservd   69M   30M sleep   59    0   0:00:00 0.0% httpd/1
 15366 webservd   66M   22M sleep   59    0   0:00:00 0.0% httpd/1
...
ZONEID    NPROC  SWAP   RSS MEMORY      TIME  CPU ZONE                        
    39       58  254M  113M    44% 244:46:20 0.2% case 

Now Apache is being a hog, and that’s only a few of the httpd processes. By default on Unix, Apache uses the prefork MPM which serves each request from its own process. It likes to keep around a handful of children for performance, so it doesn’t have to swawn a new one each time. The problem is if your request involves PHP, each httpd process will load its own instance of the PHP module and it doesn’t let it go when it’s finished. I get this. It’s all for performance. My initial reaction was: wouldn’t be nice if Apache was threaded so requests can all share the same PHP code. That’s when I was introduced to the worker MPM. It serves requests from threads so it’s efficient, but also has a couple of children for fault tolerance. This is easy to switch to in OpenSolaris:

# svcadm disable http
# svccfg -s http:apache22 setprop httpd/server_type=worker
# svcadm refresh http
# svcadm enable http

I also copied /etc/apache2/2.2/samples-conf.d/mpm.conf into /etc/apache2/2.2/conf.d/ which includes some sane defaults like only spawning two servers to start with. This was good:

$ prstat -s rss -Z 1 1 | cat
...
ZONEID    NPROC  SWAP   RSS MEMORY      TIME  CPU ZONE                        
    39       50  125M   75M    29% 244:46:23 0.3% case

75M makes me feel safe, like I could take the occasional spam bomb. What I forgot to mention is that mod_php isn’t supported with the worker MPM since any of its extensions might not be thread-safe. This is okay, because PHP can be run as a CGI program which has the additional benefit of being memory efficient (at the cost of speed) since it’s only loaded when it’s executed. All I had to do was create a file /etc/apache2/2.2/conf.d/php-cgi.conf containing:

<IfModule worker.c>
    ScriptAlias /php-cgi /usr/php/bin/php-cgi

    <Location "/php-cgi">
        Order allow,deny
        Allow from all
    </Location>
   
    Action php-cgi /php-cgi
    AddHandler php-cgi .php
    DirectoryIndex index.php
</IfModule>

I’ll be the first to admit, running Trac and WordPress as CGI have made them noticeably slower, but I’d rather them run slower for as much action that they get and know that my mail will get to me. If you’re faced with similar resource constraints, you may want to consider these changes. There may be other ways I can tweak Apache, such as unloading unused modules, but I’m not ready to face that yet.