Replacing Apache with Yaws

After some time running with the Apache worker model, I noticed it was not much better on memory than prefork. It would spawn new threads as load increased, but didn’t give up the resources when load decreased. I know it does this for performance, but I am very tight on memory. And I know Apache is very tunable, so I could probably change that behavior, but tuning is boring. Playing with new stuff is fun! This was the perfect opportunity for me to look at lightweight web servers.

I toyed around with lighttpd and read the documentation for nginx. Both seem to do what I need them to do, but I was more interested in the equally fast and light Yaws. I’ve been really into Erlang lately (more on that in another post), so this was a great opportunity to see how a real Erlang application works.

Installation

Naturally, Yaws isn’t included in OpenSolaris yet, but Erlang is! So it was fairly easy to whip together a spec file and SMF manifest for the program. Then it was as easy as:

$ pkgtool build-only --download --interactive yaws.spec
$ pfexec pkg install yaws

I’ve set the configuration to go to /etc/yaws/yaws.conf and the service can be started with svcadm enable yaws. When I’m completely satisfied with the package, I’ll upload it to SourceJuicer.

Configuration

I run two virtual hosts: thestaticvoid.com and iriverter.thestaticvoid.com. Those two domains also have “.org” counterparts which I want to redirect to the “.com” domain. The yaws.conf syntax to handle this case is very simple:

pick_first_virthost_on_nomatch = true

<server localhost>
        port = 80
        listen = 0.0.0.0
        <redirect>
                / = thestaticvoid.com
        </redirect>
</server>

<server thestaticvoid.com>
        port = 80
        listen = 0.0.0.0
        docroot = /docs/thestaticvoid.com
        dir_listings = true
</server>

<server iriverter.thestaticvoid.com>
        port = 80
        listen = 0.0.0.0
        docroot = /docs/iriverter.thestaticvoid.com
        dir_listings = true
</server>

<server iriverter.thestaticvoid.org>
        port = 80
        listen = 0.0.0.0
        <redirect>
                / = iriverter.thestaticvoid.com
        </redirect>
</server>

Should be pretty self-explanatory, but the nice thing is the pick_first_virthost_on_nomatch directive combined with the localhost block so that if anyone gets to this site by any other address, they’ll be redirected to the canonical thestaticvoid.com. I did actually run into a bug with the redirection putting extra slashes in the URL, but squashed that bug pretty quickly with a bit of help from the Yaws mailing list. That whole problem is summarized in this thread.

PHP

Yaws handles PHP execution as a special case. All you have to do is add a couple lines to the configuration above:

php_exe_path = /usr/php/bin/php-cgi

<server thestaticvoid.com>
        ...
        allowed_scripts = php
        ...
</server>

Reload the server (svcadm refresh yaws) and PHP, well, won’t work just yet. This actually took me an hour or two to figure out. OpenSolaris’ PHP is compiled with the --enable-force-cgi-redirect option which means PHP will refuse to execute unless it was invoked by an Apache Action directive. Fortunately, you can disable this security measure by setting cgi.force_redirect = 0 in your /etc/php/5.2/php.ini.

Trac

Trac needs a little work to get running in Yaws. It’s requires an environmental variable, TRAC_ENV, set to tell it where to find the project database. The easiest way to do that is to copy /usr/share/trac/cgi-bin/trac.cgi to the document root, modify it to set the environmental variable, and enable CGI scripts in Yaws by setting allowed_scripts = cgi.

But I decided to set up an appmod, so that trac.cgi could be left where it was, unmodified. Appmods are Erlang modules which get run by Yaws whenever the configured URL is requested. Here’s the one I wrote for Trac:

-module(trac).

-export([out/1]).

-define(APPMOD, "/trac").
-define(TRAC_ENV, "/trac/iriverter").
-define(SCRIPT, "/usr/share/trac/cgi-bin/trac.cgi").

-include_lib("yaws/include/yaws_api.hrl").

out(Arg) ->
        Pathinfo = Arg#arg.pathinfo,
        Env = [{"SCRIPT_NAME", ?APPMOD}, {"TRAC_ENV", ?TRAC_ENV}],
        yaws_cgi:call_cgi(Arg, undefined, ?SCRIPT, Pathinfo, Env).

All appmods must define the out/1 function which takes an arg record which contains information about the current request. At the end of the function, the Yaws API is used to execute the CGI script with the extra environmental variables. This is compiled (erlc -o /var/yaws/ebin -I/usr/lib trac.erl) and enabled in the Yaws configuration by adding appmods = </trac, trac> to a server section. Then whenever someone requests /trac/foo/bar, Trac runs properly!

I also set up URL rewriting so that instead of requesting something like http://i.tsv.c/trac/wiki, all you see is http://i.tsv.c/wiki. This involves another Erlang module, a rewrite module. It looks like:

-module(rewrite_trac).

-export([arg_rewrite/1]).

-include_lib("yaws/include/yaws_api.hrl").

arg_rewrite(Arg) ->
        Req = Arg#arg.req,
        {abs_path, Path} = Req#http_request.path,
        try yaws_api:url_decode_q_split(Path) of
                {DecPath, _Query} ->
                        case DecPath == "/" orelse not filelib:is_file(Arg#arg.docroot ++ DecPath) of
                                true ->
                                        Arg#arg{req = Req#http_request{path = {abs_path, "/trac" ++ Path}}};
                                false ->
                                        Arg
                        end
        catch
                exit:_ ->
                        Arg
        end.

This module runs as soon as a request comes into the server and allows you to modify many variables before the request is handled. This is a simple one which says: if the request is “/” or the requested file doesn’t exist, append the request to the Trac appmod, otherwise pass it through unaltered. It’s enabled in the Yaws server by adding arg_rewrite_mod = rewrite_trac to yaws.conf. The Trac appmod must also be modified to make sure SCRIPT_NAME is now / so the application generates links without containing /trac.

WordPress

WordPress works perfectly once PHP is enabled in Yaws. WordPress permalinks, however, do not. A little background: WordPress normally relies on Apache’s mod_rewrite to execute index.php when it gets an request like /post/2009/07/27/suexec-on-opensolaris/. mod_rewrite sets up the environmental variables such that WordPress is able to detect how it was called and can process the page accordingly.

Without mod_rewrite, the best it can do is rely on requests like /index.php/post/2009/07/27/suexec-on-opensolaris/ and use the PATH_INFO variable, which is the text after index.php. I think that looks ugly, having index.php in every URL. You would think that simply rewriting the URL, just as was done with Trac, would solve the problem, but WordPress is too smart, and always sends you to a canonical URL which it thinks must include index.php.

After more experimentation than I care to explain, I discovered that if I set the REQUEST_URI variable to the original request (the one not including index.php), WordPress was happy. This was a tricky exercise in trying to set an environmental variable from the rewrite module. But, as we saw with the Trac example, environmental variables can be set from appmods. And I found that data can be passed from the rewrite module to the appmod through the arg record! Here’s my solution:

-module(rewrite_blog).

-export([arg_rewrite/1]).

-include_lib("yaws/include/yaws_api.hrl").

arg_rewrite(Arg) ->
        Req = Arg#arg.req,
        {abs_path, Path} = Req#http_request.path,
        try yaws_api:url_decode_q_split(Path) of
                {DecPath, _Query} ->
                        case DecPath == "/" orelse not filelib:is_file(Arg#arg.docroot ++ DecPath) of
                                true ->
                                        case string:str(Path, "/wsvn") == 1 of
                                                true ->
                                                        {ok, NewPath, _RepCount} = regexp:sub(Path, "^/wsvn(\.php)?", "/wsvn.php"),
                                                        Arg#arg{req = Req#http_request{path = {abs_path, NewPath}}};
                                                false ->
                                                        Arg#arg{opaque = [{"REQUEST_URI", Path}],
                                                                req = Req#http_request{path = {abs_path, "/blog" ++ Path}}}
                                        end;
                                false ->
                                        Arg
                        end
        catch
                exit:_ ->
                        Arg
        end.

Nevermind the extra WebSVN rewriting code. Notice I set the opaque component of the arg. Then in the appmod:

-module(blog).

-export([out/1]).

-define(SCRIPT, "/docs/thestaticvoid.com/wordpress/index.php").

-include_lib("yaws/include/yaws.hrl").
-include_lib("yaws/include/yaws_api.hrl").

out(Arg) ->
        Pathinfo = Arg#arg.pathinfo,
        Env = Arg#arg.opaque,
        {ok, GC, _Groups} = yaws_api:getconf(),
        yaws_cgi:call_cgi(Arg, GC#gconf.phpexe, ?SCRIPT, Pathinfo, Env).

I pull the data from the arg record and pass it to the call_cgi/5 function. Also of note here is the special way to invoke the PHP CGI. The location of the php-cgi executable is pulled from the Yaws configuration and passed as the second argument to call_cgi/5 so Yaws knows what to do with the files. You can surely imagine this as a way to execute things other than PHP which do not have a #! at the top. Or emulating suEXEC with a custom wrapper 🙂

Overall, this probably seems like a lot of work to get things working that are trivial in other servers, but I’m finding that the appmods are really powerful, and the Yaws code itself is very easy to understand and modify. Plus you get the legendary performance and fault-tolerance of Erlang.

suEXEC on OpenSolaris

One nice thing about having all dynamic content being generated by CGI is that you can use suEXEC to run the scripts as a different user. This is primarily used for systems where you have multiple untrusted users who run sites in one HTTP server. Then no one can interfere with anyone else. It can also be used simply for separating the application from the server.

I’m the only user on my server so I don’t necessarily have any of these security concerns, but I have enabled suEXEC for convenience. For example, WordPress will allow you to modify the stylesheets from the admin interface as long as it can write to them. With suEXEC, the admin interface can run as my Unix user, so I can edit the files from both the web interface and the command line without having wide-open permissions or switching to root.

Same applies for Trac where I can manage the project with the web interface or trac-admin on the command line. The same effect could pretty much be obtained by using Unix groups properly:

# groupadd wordpress
# usermod -G wordpress webservd
# usermod -G wordpress jlee  # my username
# chgrp -R wordpress /docs/thestaticvoid.com  # virtualhost document root
# chmod -R g+ws /docs/thestaticvoid.com  # make directory writable and always owned by
                                           the wordpress group

Then umask 002 would have to be set in Apache’s and my profile so any files that get created can be written to by the other users in the group. That’s all well and good, but it seems like a bit of work and I don’t like the idea of messing with the default umask.

On to suEXEC. First, let’s show the current user that PHP executes as. Create a file test.php containing <?php echo exec("id"); ?>. Accessing the script from your web browser should show something like uid=80(webservd) gid=80(webservd).

Next, in OpenSolaris, the suexec binary must be enabled:

# cd /usr/apache2/2.2/bin/  # go one directory further into the amd64 dir
                              if you're running 64-bit
# mv suexec.disabled suexec
# chown root:webservd suexec
# chmod 4750 suexec
# ./suexec -V
 -D AP_DOC_ROOT="/var/apache2/2.2/htdocs"
 -D AP_GID_MIN=100
 -D AP_HTTPD_USER="webservd"
 -D AP_LOG_EXEC="/var/apache2/2.2/logs/suexec_log"
 -D AP_SAFE_PATH="/usr/local/bin:/usr/bin:/bin"
 -D AP_UID_MIN=100
 -D AP_USERDIR_SUFFIX="public_html"

These variables were set at compile time and cannot be changed. They ensure that certain conditions must be met in order to use the binary. That’s very important because it’s setuid root. The first thing I had to do was move everything from my old document root to the one specified above in AP_DOC_ROOT. Then you can add SuexecUserGroup jlee jlee (with whatever username and group you want the scripts to run as) to your <VirtualHost> section of the Apache configuration. At this point if you try to execute test.php you’ll probably see one of a couple errors in the suEXEC log (/var/apache2/2.2/logs/suexec_log):

  • [2009-07-27 11:08:02]: uid: (1000/jlee) gid: (1000/jlee) cmd: php-cgi
    [2009-07-27 11:08:02]: command not in docroot (/usr/php/bin/php-cgi)

    In this case, php-cgi is going to have to be moved to the document root:

    $ cp /usr/php/bin/php-cgi /var/apache2/2.2/htdocs/
    $ pfexec vi /etc/apache2/2.2/conf.d/php-cgi.conf  # modify the ScriptAlias appropriately
    $ svcadm restart http
  • [2009-07-27 11:11:07]: uid: (1000/jlee) gid: (1000/jlee) cmd: php-cgi
    [2009-07-27 11:11:07]: target uid/gid (1000/1000) mismatch with directory (0/2) or program (0/0)

    Make sure everything that suexec is to execute is owned by the same user and group as specified in the SuexecUserGroup line of your Apache configuration.

Now, running test.php should give the correct results: uid=1000(jlee) gid=1000(jlee). Done!

As a side note, I lose all frame of reference while I write so I can’t remember if I’m writing this for you or me, explaining what I’ve done or what you should do. Sorry 🙂

Reducing Memory Footprint of Apache Services

An interesting thing happened when I set up this blog. It first manifested itself as a heap of junk mail in my inbox. Then no mail at all. I had run out of memory. WordPress requires me to run MySQL and that extra 12M pushed me over the 256M cap in my OpenSolaris 2009.06 zone. As a result SpamAssassin could not spawn, and ultimately Postfix died. So I sought out to try to reduce my memory footprint.

Let’s take a look at where things were when I got started:

$ prstat -s rss -Z 1 1 | cat
   PID USERNAME  SIZE   RSS STATE  PRI NICE      TIME  CPU PROCESS/NLWP       
 13488 webservd  183M   92M sleep   59    0   0:00:28 0.0% trac.fcgi/1
 13479 webservd   59M   41M sleep   59    0   0:00:14 0.0% trac.fcgi/1
 13489 webservd   59M   41M sleep   59    0   0:00:14 0.0% trac.fcgi/1
  4463 mysql      64M   12M sleep   59    0   0:02:39 0.0% mysqld/10
 19296 root       13M 8444K sleep   59    0   0:00:25 0.0% svc.configd/16
 19619 named      11M 5824K sleep   59    0   0:03:51 0.0% named/7
 13473 root       64M 4352K sleep   59    0   0:00:00 0.0% httpd/1
 19358 root       12M 3688K sleep   59    0   0:00:54 0.0% nscd/31
 19294 root       12M 3180K sleep   59    0 244:37:22 0.0% svc.startd/13
 13476 webservd   64M 2940K sleep   59    0   0:00:00 0.0% httpd/1
 13486 webservd   64M 2924K sleep   59    0   0:00:00 0.0% httpd/1
 13745 root     6248K 2832K cpu1    59    0   0:00:00 0.0% prstat/1
 13721 root     5940K 2368K sleep   39    0   0:00:00 0.0% bash/1
 13485 webservd   64M 2252K sleep   59    0   0:00:00 0.0% httpd/1
 13482 webservd   64M 2168K sleep   59    0   0:00:00 0.0% httpd/1
ZONEID    NPROC  SWAP   RSS MEMORY      TIME  CPU ZONE                        
    39       60  494M  246M    96% 244:47:13 0.1% case                        
Total: 60 processes, 149 lwps, load averages: 0.61, 0.62, 0.52

First thing I noticed is the 174M that Trac was taking up. I was running it as a FastCGI service for speed. The problem with that is it remains resident even when it’s not processing any requests, which is most of the time. One option I tried was setting DefaultMaxClassProcessCount 1 in my /etc/apache2/2.2/conf.d/fcgid.conf file. This effectively limits Trac to only one process at a time, which greatly reduces the memory utilization, but means it can only service one request at a time. That’s not an option.

Fortunately, my zone seems to have good, fast processors and disks, so I can put up with running it as standard CGI service. Easy enough to make the switch, just move some things around in my Apache configuration:

ScriptAlias /trac /usr/share/trac/cgi-bin/trac.cgi
#ScriptAlias /trac /usr/share/trac/cgi-bin/trac.fcgi
#DefaultInitEnv TRAC_ENV "/trac/iriverter"

<Location "/trac">
    SetEnv TRAC_ENV "/trac/iriverter"
    Order allow,deny
    Allow from all
</Location>

So things are looking much better, but I’m still not happy with it:

$ prstat -s rss -Z 1 1 | cat
   PID USERNAME  SIZE   RSS STATE  PRI NICE      TIME  CPU PROCESS/NLWP       
 15362 webservd   74M   31M sleep   59    0   0:00:00 0.0% httpd/1
 15388 webservd   69M   30M sleep   59    0   0:00:00 0.0% httpd/1
 15366 webservd   66M   22M sleep   59    0   0:00:00 0.0% httpd/1
...
ZONEID    NPROC  SWAP   RSS MEMORY      TIME  CPU ZONE                        
    39       58  254M  113M    44% 244:46:20 0.2% case 

Now Apache is being a hog, and that’s only a few of the httpd processes. By default on Unix, Apache uses the prefork MPM which serves each request from its own process. It likes to keep around a handful of children for performance, so it doesn’t have to swawn a new one each time. The problem is if your request involves PHP, each httpd process will load its own instance of the PHP module and it doesn’t let it go when it’s finished. I get this. It’s all for performance. My initial reaction was: wouldn’t be nice if Apache was threaded so requests can all share the same PHP code. That’s when I was introduced to the worker MPM. It serves requests from threads so it’s efficient, but also has a couple of children for fault tolerance. This is easy to switch to in OpenSolaris:

# svcadm disable http
# svccfg -s http:apache22 setprop httpd/server_type=worker
# svcadm refresh http
# svcadm enable http

I also copied /etc/apache2/2.2/samples-conf.d/mpm.conf into /etc/apache2/2.2/conf.d/ which includes some sane defaults like only spawning two servers to start with. This was good:

$ prstat -s rss -Z 1 1 | cat
...
ZONEID    NPROC  SWAP   RSS MEMORY      TIME  CPU ZONE                        
    39       50  125M   75M    29% 244:46:23 0.3% case

75M makes me feel safe, like I could take the occasional spam bomb. What I forgot to mention is that mod_php isn’t supported with the worker MPM since any of its extensions might not be thread-safe. This is okay, because PHP can be run as a CGI program which has the additional benefit of being memory efficient (at the cost of speed) since it’s only loaded when it’s executed. All I had to do was create a file /etc/apache2/2.2/conf.d/php-cgi.conf containing:

<IfModule worker.c>
    ScriptAlias /php-cgi /usr/php/bin/php-cgi

    <Location "/php-cgi">
        Order allow,deny
        Allow from all
    </Location>
   
    Action php-cgi /php-cgi
    AddHandler php-cgi .php
    DirectoryIndex index.php
</IfModule>

I’ll be the first to admit, running Trac and WordPress as CGI have made them noticeably slower, but I’d rather them run slower for as much action that they get and know that my mail will get to me. If you’re faced with similar resource constraints, you may want to consider these changes. There may be other ways I can tweak Apache, such as unloading unused modules, but I’m not ready to face that yet.