Adventures in HPC: RDMA and Erlang

I recently attended the SC13 conference where one of my goals was to learn about InfiniBand. I attended a full day tutorial session on the subject, which did a good job of introducing most of the concepts, but didn’t really delve as deep as I had hoped. That’s not really the fault of the class; InfiniBand, and the larger subject of remote direct memory access (RDMA), is incredibly complex. I wanted to learn more.

Now, I’ve been an Erlang enthusiast for a few years, and I’ve always wondered why it doesn’t have a larger following in the HPC community. I’ll grant you that Erlang doesn’t have the best reputation for performance, but in terms of concurrency, distribution, and fault tolerance, it is unmatched. And areas where performance is critical can be offloaded to other languages or, better yet, to GPGPUs and MICs with OpenCL.

But compared to its competition, there are areas where Erlang is lacking, for example, in distributed message passing, where it still uses TCP/IP. So in an effort to learn more about RDMA and in hopes of making Erlang a little more attractive to the HPC community, I set out to write an RDMA distribution driver for Erlang.

RDMA is a surprisingly tough nut to crack for its maturity. Documentation is scarce. Examples are even more so. Compared to TCP/IP, there is a lot more micro-management: you have to set up the connection; you have to decide how to allocate memory, queues, and buffers; you have to control how to send and receive; and you have to do your own flow control, among other complications. But for all that, you get the possibility of moving data between systems without invoking the kernel, and that promises significant performance gains over TCP/IP.

In addition, it would almost seem like RDMA was made for Erlang. RDMA is highly asynchronous and event-driven, which is a nearly perfect match for Erlang’s asynchronous message passing model. Once I got my head around some Erlang port driver idiosyncrasies, things sort of fell into place, and here is the result:

[Figure: RDMA Ping Pong]

pong. I’ve never been happier to see such a silly word.

Of course, the driver works for more than just pinging. It works for all distributed Erlang messages. In theory, you can drop it into any Erlang application and it should just work.

The question is: how well does it work? Is it any better than the default TCP/IP distribution driver? For that, I devised a simple benchmark.

[Figure: RDMA Benchmark Diagram]

For each node in a given set, the program spawns a hundred processes that sit in a tight loop performing RPCs. The completed RPCs are counted, and the totals can be compared between different network implementations.
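
To make the shape of that benchmark concrete, here is a minimal Erlang sketch. It is not the program I actually ran: the module name rpc_bench, the run/2 function, the fixed time window, and the choice of erlang:node/0 as the remote call are all assumptions made for illustration.

```erlang
-module(rpc_bench).
-export([run/2]).

%% Hypothetical sketch: spawn 100 workers per node, let them issue RPCs
%% for Millis milliseconds, then return the total number completed.
run(Nodes, Millis) ->
    Parent = self(),
    Workers = [spawn_link(fun() -> worker(Parent, Node, 0) end)
               || Node <- Nodes, _ <- lists:seq(1, 100)],
    timer:sleep(Millis),
    [W ! stop || W <- Workers],
    %% Collect one count message from every worker and add them up.
    lists:sum([receive {count, W, N} -> N end || W <- Workers]).

%% Tight loop: check for a stop message without blocking, otherwise
%% perform one small RPC and count it. A failed RPC crashes the worker
%% with a badmatch (assumption: failures should abort the run).
worker(Parent, Node, Count) ->
    receive
        stop -> Parent ! {count, self(), Count}
    after 0 ->
        Node = rpc:call(Node, erlang, node, []),
        worker(Parent, Node, Count + 1)
    end.
```

Calling rpc_bench:run(nodes(), 10000) on a connected cluster would then return the number of RPCs completed in ten seconds, which is the figure to compare between distribution drivers.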

The program was tested on four nodes of a cluster, each with:

  • 2 x Intel Xeon X5560 Quad Core @ 2.80 GHz
  • 48 GB memory
  • Mellanox ConnectX QDR PCI Gen2 Channel Adapter
  • Red Hat Enterprise Linux 5.9 64-bit
  • Erlang/OTP R16B03
  • Elixir 0.12.0
  • OFED 1.5.4

The results are summarized as follows:

[Figure: RDMA Benchmark]

The RDMA implementation offers around a 50% increase in messaging performance over the default TCP/IP driver in this test. I believe this is primarily explained by the reduction in context switching. Where the TCP implementation has to issue a system call for every send and receive operation, requiring a context switch to the kernel, the RDMA implementation only calls into the kernel to be notified of incoming packets. And if packets are coming in fast enough, as they are in this test, then the driver can process many packets per context switch. The RDMA driver stays completely in user-space for send operations.

You may be wondering why the TCP driver performed about the same over the Ethernet and InfiniBand interfaces. These RPC operations involve very small messages, on the order of tens of bytes being passed back and forth, so this test really highlights the overhead of the network stacks, which is what I intended. I would imagine increasing the message size would make the InfiniBand interfaces take off, but I’ll leave that for a future test. Indeed, there are many more benchmarks I should perform.

Also, for now I’m avoiding the obvious comparison between Erlang and MPI. MPI libraries tend to have very mature, sophisticated RDMA implementations that I know I can’t compete against yet. I’d rather focus on improving the driver. I’ve started a to-do list. Feel free to pitch in and send me some pull requests on GitHub!

One last thing: Thank you The Geek in the Corner for your basic RDMA examples, and thank you Erlang/OTP community and Ericsson for your awesome documentation. As for my goal of wanting to learn about InfiniBand, I’d say goal accomplished.

Using Nitrogen as a Library Under Yaws

Motivation

I’ve been working on a project off and on for the past year which uses the Spring Framework extensively. I love Spring for how easy it makes web development, from wiring up various persistence and validation libraries, to dependency injection, and brainless security and model-view-controller functionality. However, as the project has grown, I’ve become more and more frustrated with one aspect of Spring and Java web development in general: performance and resource usage. It’s so bad, I’ve pretty much stopped working on it altogether. Between Eclipse and Tomcat, you’ve already spent over 2 GB of memory, and every time you make a source code change, Tomcat has to reload the application which takes up to 30 seconds on my system, if it doesn’t crash first. This doesn’t suit my development style of making and testing lots of small, incremental changes.

So rather than buy a whole new computer, I’ve started to look for a new lightweight web framework to convert the project to. I really like Erlang and have wanted to write something big in it for a while, so when I found the Nitrogen Web Framework, I thought this might be my opportunity to do so. Erlang is designed for performance and fault-tolerance and has a great standard library in OTP, including a distributed database, mnesia, which should eliminate my need for an object-relational mapper (it stores Erlang terms directly) and enable me to make my application highly available in the future without much fuss. Nitrogen has the added benefit of simplifying some of the fancy things I wanted to do with AJAX but found too difficult with Spring MVC.

The thing I don’t like about Nitrogen is that it is designed to deliver a complete, stand-alone application with a built-in web server of your choosing and a copy of the entire Erlang runtime. This seems to be The Erlang/OTP Way of doing things, but it seems very foreign to me. I already have Erlang installed system-wide and a web server, Yaws, that I have a lot of time invested in. I’d rather use Nitrogen as a library in my application under Yaws just like I was using Spring as a library in my application under Tomcat.

Procedures

I start my new project with Rebar:

$ mkdir test && cd test
$ wget https://bitbucket.org/basho/rebar/downloads/rebar && chmod +x rebar
$ ./rebar create-app appid=test
==> test (create-app)
Writing src/test.app.src
Writing src/test_app.erl
Writing src/test_sup.erl
$ mkdir static include templates  # These directories will be used later

Now I define my project’s dependencies in rebar.config in the same directory:

{deps, [
    {nitrogen_core, "2.1.*", {git, "git://github.com/nitrogen/nitrogen_core.git", "HEAD"}},
    {nprocreg, "0.2.*", {git, "git://github.com/nitrogen/nprocreg.git", "HEAD"}},
    {simple_bridge, "1.2.*", {git, "git://github.com/nitrogen/simple_bridge.git", "HEAD"}},
    {sync, "0.1.*", {git, "git://github.com/rklophaus/sync.git", "HEAD"}}
]}.

These dependencies are taken from Nitrogen’s rebar.config. Next I write a Makefile to simplify common tasks:

default: compile static/nitrogen

get-deps:
        ./rebar get-deps

include/basedir.hrl:
        echo '-define(BASEDIR, "$(PWD)").' > include/basedir.hrl

static/nitrogen:
        ln -sf ../deps/nitrogen_core/www static/nitrogen

compile: include/basedir.hrl get-deps
        ./rebar compile

clean:
        -rm -f static/nitrogen include/basedir.hrl
        ./rebar delete-deps
        ./rebar clean

distclean: clean
        -rm -rf deps ebin

I expect I’ll be tweaking this Makefile some more in the future, but it demonstrates the absolute minimum to compile the application. When I run make, four things happen the first time:

  1. BASEDIR is defined as the current directory in include/basedir.hrl. We’ll use this later.
  2. All of the Nitrogen dependencies are pulled from Git to the deps directory.
  3. All of the code is compiled.
  4. The static content from Nitrogen (mostly Javascript files) is symlinked into our static content directory.

Next I prepare the code for running under Yaws. First I create the Nitrogen appmod in src/test_yaws.erl:

-module(test_yaws).
-export ([out/1]).

out(Arg) ->
    RequestBridge = simple_bridge:make_request(yaws_request_bridge, Arg),
    ResponseBridge = simple_bridge:make_response(yaws_response_bridge, Arg),
    nitrogen:init_request(RequestBridge, ResponseBridge),
    nitrogen:run().

This is taken from the Nitrogen repository. I also modify the init/1 function in src/test_sup.erl to start the nprocreg application, similar to how it is done in Nitrogen proper:

init([]) ->
    application:start(nprocreg),
    {ok, { {one_for_one, 5, 10}, []} }.

Lastly, I add a function to src/test_app.erl which can be used by Yaws to start the application:

-export([start/0]).

start() ->
    application:start(test).

One other thing that I do before loading the application up in Yaws is create a sample page, src/index.erl. This is downloaded from Nitrogen:

-module (index).
-compile(export_all).
-include_lib("nitrogen_core/include/wf.hrl").
-include("basedir.hrl").

main() -> #template { file=?BASEDIR ++ "/templates/bare.html" }.

title() -> "Welcome to Nitrogen".

body() ->
    #container_12 { body=[
        #grid_8 { alpha=true, prefix=2, suffix=2, omega=true, body=inner_body() }
    ]}.

inner_body() ->
    [
        #h1 { text="Welcome to Nitrogen" },
        #p{},
        "
If you can see this page, then your Nitrogen server is up and
running. Click the button below to test postbacks.
"
,
        #p{},
        #button { id=button, text="Click me!", postback=click },
        #p{},
        "
Run <b>./bin/dev help</b> to see some useful developer commands.
"

    ].

event(click) ->
    wf:replace(button, #panel {
        body="You clicked the button!",
        actions=#effect { effect=highlight }
    }).

I make sure to include basedir.hrl (generated by the Makefile, remember?) and modify the template path to start with ?BASEDIR. Since where Yaws is running is out of our control, we must reference files by absolute pathnames. Speaking of templates, I downloaded mine from the Nitrogen repository. Obviously, it can be modified however you want or you could create one from scratch.

Before we continue, I recompile everything by typing make.

Now the fun begins: wiring it all up in Yaws. I use my package for OpenSolaris which puts the configuration file in /etc/yaws/yaws.conf. I add the following to it:

ebin_dir = /docs/test/deps/nitrogen_core/ebin
ebin_dir = /docs/test/deps/nprocreg/ebin
ebin_dir = /docs/test/deps/simple_bridge/ebin
ebin_dir = /docs/test/deps/sync/ebin
ebin_dir = /docs/test/ebin

runmod = test_app

<server test.thestaticvoid.com>
    port = 80
    listen = 0.0.0.0
    docroot = /docs/test/static
    appmods = </, test_yaws>
</server>

Obviously, your paths will probably be different. The point is to tell Yaws where all of the compiled code is, tell it to start your application (where the business logic will be contained), and tell it to use the Nitrogen appmod. Restart Yaws and it should all be working!

Now for some cool stuff. If you run the svc:/network/http:yaws service from my package, or you start Yaws like yaws --run_erl svc, you can run yaws --to_erl svc (easiest to do with root privileges) and get access to Yaws’s Erlang console. From here you can hot-reload code. For example, modify the title in index.erl and recompile by running make. In the Erlang console, you can run l(index). and it will pick up your changes. But there is something even cooler. From the Erlang console, type sync:go(). and now whenever you make a change to a loaded module’s source code, it will automatically be recompiled and loaded, almost instantly! It looks something like:

# yaws --to_erl svc
Attaching to /var//run/yaws/pipe/svc/erlang.pipe.1 (^D to exit)

1> sync:go().
Starting Sync (Automatic Code Reloader)
ok
2> 
=INFO REPORT==== 17-Feb-2011::15:03:10 ===
/docs/test/src/index.erl:0: Recompiled. (Reason: Source modified.)

=INFO REPORT==== 17-Feb-2011::15:04:20 ===
/docs/test/src/index.erl:11: Error: syntax error before: body

=INFO REPORT==== 17-Feb-2011::15:04:26 ===
/docs/test/src/index.erl:0: Fixed!

2> sync:stop().

=INFO REPORT==== 17-Feb-2011::15:07:17 ===
    application: sync
    exited: stopped
    type: temporary
ok
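
For the curious, the manual l(index). reload mentioned above boils down to two calls into the code server. Here is a minimal sketch of the same idea; the module name reload_sketch and the function reload/1 are hypothetical, only code:purge/1 and code:load_file/1 are standard.

```erlang
-module(reload_sketch).
-export([reload/1]).

%% Roughly what the shell's l/1 does: drop any old version of the
%% module, then load the current .beam file from the code path.
%% Returns {module, Module} on success or {error, Reason} on failure.
reload(Module) ->
    code:purge(Module),
    code:load_file(Module).
```

Sync automates exactly this step, plus the recompilation, whenever it notices a source file has changed.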

One gotcha that may or may not apply to you is that Yaws needs permission to write to your application’s ebin directory if you want to save the automatically compiled code. In my case, Yaws runs as a different user than I develop as, a practice that I would highly recommend. So I use a ZFS ACL to allow the web server user read and write access:

$ /usr/bin/chmod -R A+user:webservd:rw:f:allow /docs/test/ebin
$ /usr/bin/ls -dv /docs/test/ebin
drwxr-xr-x+  2 jlee     staff          8 Feb 17 15:04 /docs/test/ebin
     0:user:webservd:read_data/write_data:file_inherit:allow
     1:owner@::deny
     2:owner@:list_directory/read_data/add_file/write_data/add_subdirectory
         /append_data/write_xattr/execute/write_attributes/write_acl
         /write_owner:allow
     3:group@:add_file/write_data/add_subdirectory/append_data:deny
     4:group@:list_directory/read_data/execute:allow
     5:everyone@:add_file/write_data/add_subdirectory/append_data/write_xattr
         /write_attributes/write_acl/write_owner:deny
     6:everyone@:list_directory/read_data/read_xattr/execute/read_attributes
         /read_acl/synchronize:allow

ACLs are pretty scary to some people, but I love ’em 🙂

Other Thoughts

You would not be able to run multiple Nitrogen projects on separate virtual hosts using this scheme. Nitrogen maps request paths to module names (for example, requesting “/admin/login” would load a module admin_login) and module names must be unique in Erlang. I think it would be possible to work around this using a Yaws rewrite module, though I haven’t tested it. I imagine if one virtual host maps “/admin/login” to “/foo/admin/login” and another maps it to “/bar/admin/login”, then Nitrogen would search for foo_admin_login and bar_admin_login, respectively, eliminating the conflicting namespace problem.
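
To illustrate the naming problem, here is a simplified sketch of the path-to-module mapping described above. It is not Nitrogen’s actual route handler, which does more than this; the module route_sketch and both functions are hypothetical names.

```erlang
-module(route_sketch).
-export([path_to_module/1, prefixed_module/2]).

%% Simplified illustration: "/admin/login" maps to the atom admin_login.
%% (Creating atoms from request paths is only acceptable in a sketch;
%% the atom table is never garbage collected.)
path_to_module(Path) ->
    list_to_atom(string:join(string:tokens(Path, "/"), "_")).

%% What a per-virtual-host rewrite would achieve: with prefix "foo",
%% "/admin/login" is looked up as foo_admin_login instead.
prefixed_module(Prefix, Path) ->
    path_to_module("/" ++ Prefix ++ Path).
```

With this, prefixed_module("foo", "/admin/login") and prefixed_module("bar", "/admin/login") yield two distinct module names, which is the effect I would hope to get from the untested rewrite approach.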

Now that I’ve gone through all the trouble of setting up Nitrogen the way I like, I should start converting my application over. Hopefully I’ll like it. It would be a shame to have done all this work for naught. I’m sure there will be posts to follow.

Replacing Apache with Yaws

After some time running with the Apache worker model, I noticed it was not much better on memory than prefork. It would spawn new threads as load increased, but didn’t give up the resources when load decreased. I know it does this for performance, but I am very tight on memory. And I know Apache is very tunable, so I could probably change that behavior, but tuning is boring. Playing with new stuff is fun! This was the perfect opportunity for me to look at lightweight web servers.

I toyed around with lighttpd and read the documentation for nginx. Both seem to do what I need them to do, but I was more interested in the equally fast and light Yaws. I’ve been really into Erlang lately (more on that in another post), so this was a great opportunity to see how a real Erlang application works.

Installation

Naturally, Yaws isn’t included in OpenSolaris yet, but Erlang is! So it was fairly easy to whip together a spec file and SMF manifest for the program. Then it was as easy as:

$ pkgtool build-only --download --interactive yaws.spec
$ pfexec pkg install yaws

I’ve set the configuration to go to /etc/yaws/yaws.conf and the service can be started with svcadm enable yaws. When I’m completely satisfied with the package, I’ll upload it to SourceJuicer.

Configuration

I run two virtual hosts: thestaticvoid.com and iriverter.thestaticvoid.com. Those two domains also have “.org” counterparts which I want to redirect to the “.com” domain. The yaws.conf syntax to handle this case is very simple:

pick_first_virthost_on_nomatch = true

<server localhost>
        port = 80
        listen = 0.0.0.0
        <redirect>
                / = thestaticvoid.com
        </redirect>
</server>

<server thestaticvoid.com>
        port = 80
        listen = 0.0.0.0
        docroot = /docs/thestaticvoid.com
        dir_listings = true
</server>

<server iriverter.thestaticvoid.com>
        port = 80
        listen = 0.0.0.0
        docroot = /docs/iriverter.thestaticvoid.com
        dir_listings = true
</server>

<server iriverter.thestaticvoid.org>
        port = 80
        listen = 0.0.0.0
        <redirect>
                / = iriverter.thestaticvoid.com
        </redirect>
</server>

Should be pretty self-explanatory, but the nice thing is the pick_first_virthost_on_nomatch directive combined with the localhost block so that if anyone gets to this site by any other address, they’ll be redirected to the canonical thestaticvoid.com. I did actually run into a bug with the redirection putting extra slashes in the URL, but squashed that bug pretty quickly with a bit of help from the Yaws mailing list. That whole problem is summarized in this thread.

PHP

Yaws handles PHP execution as a special case. All you have to do is add a couple lines to the configuration above:

php_exe_path = /usr/php/bin/php-cgi

<server thestaticvoid.com>
        ...
        allowed_scripts = php
        ...
</server>

Reload the server (svcadm refresh yaws) and PHP, well, won’t work just yet. This actually took me an hour or two to figure out. OpenSolaris’ PHP is compiled with the --enable-force-cgi-redirect option which means PHP will refuse to execute unless it was invoked by an Apache Action directive. Fortunately, you can disable this security measure by setting cgi.force_redirect = 0 in your /etc/php/5.2/php.ini.

Trac

Trac needs a little work to get running in Yaws. It requires an environment variable, TRAC_ENV, to tell it where to find the project database. The easiest way to set that is to copy /usr/share/trac/cgi-bin/trac.cgi to the document root, modify it to set the environment variable, and enable CGI scripts in Yaws by setting allowed_scripts = cgi.

But I decided to set up an appmod, so that trac.cgi could be left where it was, unmodified. Appmods are Erlang modules which get run by Yaws whenever the configured URL is requested. Here’s the one I wrote for Trac:

-module(trac).

-export([out/1]).

-define(APPMOD, "/trac").
-define(TRAC_ENV, "/trac/iriverter").
-define(SCRIPT, "/usr/share/trac/cgi-bin/trac.cgi").

-include_lib("yaws/include/yaws_api.hrl").

out(Arg) ->
        Pathinfo = Arg#arg.pathinfo,
        Env = [{"SCRIPT_NAME", ?APPMOD}, {"TRAC_ENV", ?TRAC_ENV}],
        yaws_cgi:call_cgi(Arg, undefined, ?SCRIPT, Pathinfo, Env).

All appmods must define the out/1 function which takes an arg record which contains information about the current request. At the end of the function, the Yaws API is used to execute the CGI script with the extra environmental variables. This is compiled (erlc -o /var/yaws/ebin -I/usr/lib trac.erl) and enabled in the Yaws configuration by adding appmods = </trac, trac> to a server section. Then whenever someone requests /trac/foo/bar, Trac runs properly!

I also set up URL rewriting so that instead of requesting something like http://i.tsv.c/trac/wiki, all you see is http://i.tsv.c/wiki. This involves another Erlang module, a rewrite module. It looks like:

-module(rewrite_trac).

-export([arg_rewrite/1]).

-include_lib("yaws/include/yaws_api.hrl").

arg_rewrite(Arg) ->
        Req = Arg#arg.req,
        {abs_path, Path} = Req#http_request.path,
        try yaws_api:url_decode_q_split(Path) of
                {DecPath, _Query} ->
                        case DecPath == "/" orelse not filelib:is_file(Arg#arg.docroot ++ DecPath) of
                                true ->
                                        Arg#arg{req = Req#http_request{path = {abs_path, "/trac" ++ Path}}};
                                false ->
                                        Arg
                        end
        catch
                exit:_ ->
                        Arg
        end.

This module runs as soon as a request comes into the server and allows you to modify many variables before the request is handled. This is a simple one which says: if the request is “/” or the requested file doesn’t exist, prepend /trac to the request path so that it is routed to the Trac appmod; otherwise pass it through unaltered. It’s enabled in the Yaws server by adding arg_rewrite_mod = rewrite_trac to yaws.conf. The Trac appmod must also be modified so that SCRIPT_NAME is now /, which makes the application generate links that do not contain /trac.
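
The SCRIPT_NAME change amounts to editing one define in the Trac appmod. The sketch below isolates just the environment list so it can stand on its own; the module name trac_env_sketch and the function cgi_env/0 are hypothetical, and in the real appmod this list is built inline in out/1.

```erlang
-module(trac_env_sketch).
-export([cgi_env/0]).

%% With the rewrite module in place, Trac appears at the site root,
%% so SCRIPT_NAME becomes "/" instead of "/trac".
-define(APPMOD, "/").
-define(TRAC_ENV, "/trac/iriverter").

%% Extra CGI environment variables handed to the Trac CGI script.
cgi_env() ->
    [{"SCRIPT_NAME", ?APPMOD}, {"TRAC_ENV", ?TRAC_ENV}].
```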

WordPress

WordPress works perfectly once PHP is enabled in Yaws. WordPress permalinks, however, do not. A little background: WordPress normally relies on Apache’s mod_rewrite to execute index.php when it gets a request like /post/2009/07/27/suexec-on-opensolaris/. mod_rewrite sets up the environment variables such that WordPress is able to detect how it was called and can process the page accordingly.

Without mod_rewrite, the best it can do is rely on requests like /index.php/post/2009/07/27/suexec-on-opensolaris/ and use the PATH_INFO variable, which is the text after index.php. I think that looks ugly, having index.php in every URL. You would think that simply rewriting the URL, just as was done with Trac, would solve the problem, but WordPress is too smart, and always sends you to a canonical URL which it thinks must include index.php.

After more experimentation than I care to explain, I discovered that if I set the REQUEST_URI variable to the original request (the one not including index.php), WordPress was happy. This was a tricky exercise in trying to set an environmental variable from the rewrite module. But, as we saw with the Trac example, environmental variables can be set from appmods. And I found that data can be passed from the rewrite module to the appmod through the arg record! Here’s my solution:

-module(rewrite_blog).

-export([arg_rewrite/1]).

-include_lib("yaws/include/yaws_api.hrl").

arg_rewrite(Arg) ->
        Req = Arg#arg.req,
        {abs_path, Path} = Req#http_request.path,
        try yaws_api:url_decode_q_split(Path) of
                {DecPath, _Query} ->
                        case DecPath == "/" orelse not filelib:is_file(Arg#arg.docroot ++ DecPath) of
                                true ->
                                        case string:str(Path, "/wsvn") == 1 of
                                                true ->
                                                        %% re replaces the old regexp module (removed from OTP); "\\." matches a literal dot
                                                        NewPath = re:replace(Path, "^/wsvn(\\.php)?", "/wsvn.php", [{return, list}]),
                                                        Arg#arg{req = Req#http_request{path = {abs_path, NewPath}}};
                                                false ->
                                                        Arg#arg{opaque = [{"REQUEST_URI", Path}],
                                                                req = Req#http_request{path = {abs_path, "/blog" ++ Path}}}
                                        end;
                                false ->
                                        Arg
                        end
        catch
                exit:_ ->
                        Arg
        end.

Never mind the extra WebSVN rewriting code. Notice that I set the opaque component of the arg. Then in the appmod:

-module(blog).

-export([out/1]).

-define(SCRIPT, "/docs/thestaticvoid.com/wordpress/index.php").

-include_lib("yaws/include/yaws.hrl").
-include_lib("yaws/include/yaws_api.hrl").

out(Arg) ->
        Pathinfo = Arg#arg.pathinfo,
        Env = Arg#arg.opaque,
        {ok, GC, _Groups} = yaws_api:getconf(),
        yaws_cgi:call_cgi(Arg, GC#gconf.phpexe, ?SCRIPT, Pathinfo, Env).

I pull the data from the arg record and pass it to the call_cgi/5 function. Also of note here is the special way to invoke the PHP CGI. The location of the php-cgi executable is pulled from the Yaws configuration and passed as the second argument to call_cgi/5 so Yaws knows what to do with the files. You can surely imagine this as a way to execute things other than PHP which do not have a #! at the top. Or emulating suEXEC with a custom wrapper 🙂

Overall, this probably seems like a lot of work to get things working that are trivial in other servers, but I’m finding that the appmods are really powerful, and the Yaws code itself is very easy to understand and modify. Plus you get the legendary performance and fault-tolerance of Erlang.