Replacing Apache with Yaws

After some time running with the Apache worker model, I noticed it was not much better on memory than prefork. It would spawn new threads as load increased, but didn’t give up the resources when load decreased. I know it does this for performance, but I am very tight on memory. And I know Apache is very tunable, so I could probably change that behavior, but tuning is boring. Playing with new stuff is fun! This was the perfect opportunity for me to look at lightweight web servers.

I toyed around with lighttpd and read the documentation for nginx. Both seem to do what I need them to do, but I was more interested in the equally fast and light Yaws. I’ve been really into Erlang lately (more on that in another post), so this was a great opportunity to see how a real Erlang application works.

Installation

Naturally, Yaws isn’t included in OpenSolaris yet, but Erlang is! So it was fairly easy to whip together a spec file and SMF manifest for the program. Then it was as easy as:

$ pkgtool build-only --download --interactive yaws.spec
$ pfexec pkg install yaws

I’ve set the configuration to go to /etc/yaws/yaws.conf and the service can be started with svcadm enable yaws. When I’m completely satisfied with the package, I’ll upload it to SourceJuicer.

Configuration

I run two virtual hosts: thestaticvoid.com and iriverter.thestaticvoid.com. Those two domains also have “.org” counterparts which I want to redirect to the “.com” domain. The yaws.conf syntax to handle this case is very simple:

pick_first_virthost_on_nomatch = true

<server localhost>
        port = 80
        listen = 0.0.0.0
        <redirect>
                / = thestaticvoid.com
        </redirect>
</server>

<server thestaticvoid.com>
        port = 80
        listen = 0.0.0.0
        docroot = /docs/thestaticvoid.com
        dir_listings = true
</server>

<server iriverter.thestaticvoid.com>
        port = 80
        listen = 0.0.0.0
        docroot = /docs/iriverter.thestaticvoid.com
        dir_listings = true
</server>

<server iriverter.thestaticvoid.org>
        port = 80
        listen = 0.0.0.0
        <redirect>
                / = iriverter.thestaticvoid.com
        </redirect>
</server>

Should be pretty self-explanatory, but the nice thing is the pick_first_virthost_on_nomatch directive combined with the localhost block so that if anyone gets to this site by any other address, they’ll be redirected to the canonical thestaticvoid.com. I did actually run into a bug with the redirection putting extra slashes in the URL, but squashed that bug pretty quickly with a bit of help from the Yaws mailing list. That whole problem is summarized in this thread.

PHP

Yaws handles PHP execution as a special case. All you have to do is add a couple lines to the configuration above:

php_exe_path = /usr/php/bin/php-cgi

<server thestaticvoid.com>
        ...
        allowed_scripts = php
        ...
</server>

Reload the server (svcadm refresh yaws) and PHP, well, won’t work just yet. This actually took me an hour or two to figure out. OpenSolaris’ PHP is compiled with the --enable-force-cgi-redirect option which means PHP will refuse to execute unless it was invoked by an Apache Action directive. Fortunately, you can disable this security measure by setting cgi.force_redirect = 0 in your /etc/php/5.2/php.ini.

Trac

Trac needs a little work to get running in Yaws. It requires an environmental variable, TRAC_ENV, to tell it where to find the project database. The easiest way to do that is to copy /usr/share/trac/cgi-bin/trac.cgi to the document root, modify it to set the environmental variable, and enable CGI scripts in Yaws by setting allowed_scripts = cgi.

But I decided to set up an appmod, so that trac.cgi could be left where it was, unmodified. Appmods are Erlang modules which get run by Yaws whenever the configured URL is requested. Here’s the one I wrote for Trac:

-module(trac).

-export([out/1]).

-define(APPMOD, "/trac").
-define(TRAC_ENV, "/trac/iriverter").
-define(SCRIPT, "/usr/share/trac/cgi-bin/trac.cgi").

-include_lib("yaws/include/yaws_api.hrl").

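out(Arg) ->
        %% Whatever follows /trac in the URL becomes the CGI PATH_INFO
        Pathinfo = Arg#arg.pathinfo,
        %% Extra variables added to the CGI environment
        Env = [{"SCRIPT_NAME", ?APPMOD}, {"TRAC_ENV", ?TRAC_ENV}],
        %% undefined: no separate interpreter, trac.cgi is executed directly
        yaws_cgi:call_cgi(Arg, undefined, ?SCRIPT, Pathinfo, Env).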

All appmods must define the out/1 function which takes an arg record which contains information about the current request. At the end of the function, the Yaws API is used to execute the CGI script with the extra environmental variables. This is compiled (erlc -o /var/yaws/ebin -I/usr/lib trac.erl) and enabled in the Yaws configuration by adding appmods = </trac, trac> to a server section. Then whenever someone requests /trac/foo/bar, Trac runs properly!
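
A minimal sketch of how that fits into yaws.conf, assuming the appmod belongs to the iriverter.thestaticvoid.com server and that /var/yaws/ebin has to be added to the code path with ebin_dir (neither detail is spelled out above):

```
# assumption: make the compiled trac.beam visible to Yaws
ebin_dir = /var/yaws/ebin

<server iriverter.thestaticvoid.com>
        port = 80
        listen = 0.0.0.0
        docroot = /docs/iriverter.thestaticvoid.com
        dir_listings = true
        # requests under /trac are handed to trac:out/1
        appmods = </trac, trac>
</server>
```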

I also set up URL rewriting so that instead of requesting something like http://i.tsv.c/trac/wiki, all you see is http://i.tsv.c/wiki. This involves another Erlang module, a rewrite module. It looks like:

-module(rewrite_trac).

-export([arg_rewrite/1]).

-include_lib("yaws/include/yaws_api.hrl").

arg_rewrite(Arg) ->
        Req = Arg#arg.req,
        {abs_path, Path} = Req#http_request.path,
        try yaws_api:url_decode_q_split(Path) of
                {DecPath, _Query} ->
                        case DecPath == "/" orelse not filelib:is_file(Arg#arg.docroot ++ DecPath) of
                                true ->
                                        Arg#arg{req = Req#http_request{path = {abs_path, "/trac" ++ Path}}};
                                false ->
                                        Arg
                        end
        catch
                exit:_ ->
                        Arg
        end.

This module runs as soon as a request comes into the server and allows you to modify many variables before the request is handled. This is a simple one which says: if the request is “/” or the requested file doesn’t exist, prefix the request path with /trac so that it is handled by the Trac appmod; otherwise pass it through unaltered. It’s enabled in the Yaws server by adding arg_rewrite_mod = rewrite_trac to yaws.conf. The Trac appmod must also be modified so that SCRIPT_NAME is now /, which makes the application generate links that do not contain /trac.
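
The decision rule is easier to see with the Yaws records stripped away. The following is only a sketch: the module name rewrite_sketch and the function rewrite_path/3 are made up for illustration, the query string handling is left out, and the file check is passed in as a fun so the logic can be tried in the shell without a real document root.

```erlang
-module(rewrite_sketch).
-export([rewrite_path/3]).

%% rewrite_path(Path, DocRoot, Exists) -> NewPath
%%   Path    : decoded request path, e.g. "/wiki"
%%   DocRoot : document root of the virtual host
%%   Exists  : fun((string()) -> boolean()), a stand-in for filelib:is_file/1
rewrite_path("/", _DocRoot, _Exists) ->
    %% The site root always goes to Trac.
    "/trac/";
rewrite_path(Path, DocRoot, Exists) ->
    case Exists(DocRoot ++ Path) of
        true  -> Path;              % something real on disk: leave it alone
        false -> "/trac" ++ Path    % anything else: hand it to the appmod
    end.
```

In a real rewrite module the third argument would be fun filelib:is_file/1, as in the code above.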

WordPress

WordPress works perfectly once PHP is enabled in Yaws. WordPress permalinks, however, do not. A little background: WordPress normally relies on Apache’s mod_rewrite to execute index.php when it gets a request like /post/2009/07/27/suexec-on-opensolaris/. mod_rewrite sets up the environmental variables such that WordPress is able to detect how it was called and can process the page accordingly.

Without mod_rewrite, the best it can do is rely on requests like /index.php/post/2009/07/27/suexec-on-opensolaris/ and use the PATH_INFO variable, which is the text after index.php. I think that looks ugly, having index.php in every URL. You would think that simply rewriting the URL, just as was done with Trac, would solve the problem, but WordPress is too smart, and always sends you to a canonical URL which it thinks must include index.php.

After more experimentation than I care to explain, I discovered that if I set the REQUEST_URI variable to the original request (the one not including index.php), WordPress was happy. This was a tricky exercise in trying to set an environmental variable from the rewrite module. But, as we saw with the Trac example, environmental variables can be set from appmods. And I found that data can be passed from the rewrite module to the appmod through the arg record! Here’s my solution:

-module(rewrite_blog).

-export([arg_rewrite/1]).

-include_lib("yaws/include/yaws_api.hrl").

arg_rewrite(Arg) ->
        Req = Arg#arg.req,
        {abs_path, Path} = Req#http_request.path,
        try yaws_api:url_decode_q_split(Path) of
                {DecPath, _Query} ->
                        case DecPath == "/" orelse not filelib:is_file(Arg#arg.docroot ++ DecPath) of
                                true ->
                                        case string:str(Path, "/wsvn") == 1 of
                                                true ->
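                                                        %% re replaces the old regexp module (removed from newer Erlang releases); "\\." matches a literal dot
                                                        NewPath = re:replace(Path, "^/wsvn(\\.php)?", "/wsvn.php", [{return, list}]),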
                                                        Arg#arg{req = Req#http_request{path = {abs_path, NewPath}}};
                                                false ->
                                                        Arg#arg{opaque = [{"REQUEST_URI", Path}],
                                                                req = Req#http_request{path = {abs_path, "/blog" ++ Path}}}
                                        end;
                                false ->
                                        Arg
                        end
        catch
                exit:_ ->
                        Arg
        end.

Never mind the extra WebSVN rewriting code. Notice that I set the opaque component of the arg. Then in the appmod:

-module(blog).

-export([out/1]).

-define(SCRIPT, "/docs/thestaticvoid.com/wordpress/index.php").

-include_lib("yaws/include/yaws.hrl").
-include_lib("yaws/include/yaws_api.hrl").

out(Arg) ->
        Pathinfo = Arg#arg.pathinfo,
        Env = Arg#arg.opaque,
        {ok, GC, _Groups} = yaws_api:getconf(),
        yaws_cgi:call_cgi(Arg, GC#gconf.phpexe, ?SCRIPT, Pathinfo, Env).

I pull the data from the arg record and pass it to the call_cgi/5 function. Also of note here is the special way to invoke the PHP CGI. The location of the php-cgi executable is pulled from the Yaws configuration and passed as the second argument to call_cgi/5 so Yaws knows what to do with the files. You can surely imagine this as a way to execute things other than PHP which do not have a #! at the top. Or emulating suEXEC with a custom wrapper 🙂
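
Wiring this up should mirror the Trac setup. A sketch, assuming the appmod is mounted at /blog (the prefix the rewrite module adds) in the thestaticvoid.com server; the exact combination of directives is a guess rather than something stated above:

```
<server thestaticvoid.com>
        port = 80
        listen = 0.0.0.0
        docroot = /docs/thestaticvoid.com
        dir_listings = true
        allowed_scripts = php
        # rewrite_blog:arg_rewrite/1 sees every request first
        arg_rewrite_mod = rewrite_blog
        # rewritten /blog/... requests are handed to blog:out/1
        appmods = </blog, blog>
</server>
```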

Overall, this probably seems like a lot of work to get things working that are trivial in other servers, but I’m finding that the appmods are really powerful, and the Yaws code itself is very easy to understand and modify. Plus you get the legendary performance and fault-tolerance of Erlang.