Benchmarking Duplog

In my last post I introduced Duplog, a small tool that forms the glue between rsyslog, RabbitMQ, Redis, and Splunk to enable a highly available, redundant syslog service that deduplicates messages along the way. For Duplog to serve the needs of a production enterprise, it will need to perform well. I fully expect the deduplication process to take a toll, and in order to find out how much, I devised a simple benchmark.

Duplog Benchmark Diagram

On one side, fake syslog generators pipe pairs of duplicate messages into the system as fast as they can. On the other, a process reads the messages out as fast as it can. Both sides report the rate of message ingestion and extraction.

The Details

  • 3 KVM virtual machines on the same host each with the same specs:
    • 1 x Intel Core i5-2400 @ 3.10 GHz
    • 1 GB memory
    • Virtio paravirtualized NIC
  • OS: Ubuntu Server 12.04.1 LTS 64-bit
  • RabbitMQ: 2.7.1
  • Redis: 2.2.12
  • Java: OpenJDK 6b27

The Results

(My testing wasn’t very scientific so take all of this with a grain of salt.)

Duplog Benchmark

The initial results were a little underwhelming, pushing about 750 messages per second through the system. I originally expected hashing or the communication with the Redis server to be the major bottleneck, but each of those processes was sitting comfortably on the CPU at about 50% and 20% usage, respectively. It turned out that the RabbitMQ message brokers were the source of the slow performance.

I began trying many different settings for RabbitMQ, starting by disabling the disk-backed queue, which made little difference. In fact, the developers have basically said as much: “In the case of a persistent message in a durable queue, yes, it will also go to disk, but that’s done in an asynchronous manner and is buffered heavily.”

So then I changed the prefetch setting. Rather than fetching and acknowledging one message at a time, going back and forth over the network for each, the message consumers can buffer a configurable number of messages. It is possible to calculate the optimum prefetch count, but without detailed analytics handy, I just picked a prefetch size of 100.
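For the curious, here is roughly what that looks like with the RabbitMQ Java client; the host and queue name are placeholders, not Duplog’s actual configuration:

```java
import com.rabbitmq.client.*;

public class PrefetchSketch {
    public static void main(String[] args) throws Exception {
        ConnectionFactory factory = new ConnectionFactory();
        factory.setHost("rabbit1"); // placeholder host
        Connection conn = factory.newConnection();
        Channel channel = conn.createChannel();

        // Let the broker push up to 100 unacknowledged messages at a time
        // instead of making one network round trip per message.
        channel.basicQos(100);

        QueueingConsumer consumer = new QueueingConsumer(channel);
        channel.basicConsume("syslog", false, consumer); // manual acks
    }
}
```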

That setting made a huge difference, as you can see in the histogram. With less time spent talking to RabbitMQ, consumers were free to spend more time calculating hashes, and that left the RabbitMQ message brokers freer to consume messages from rsyslog.

Another suggestion on the internet was to batch message acknowledgements. That added another modest gain in performance.
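Concretely, basicAck takes a multiple flag; when it is set, a single acknowledgement covers every delivery up to and including that tag. A consume loop in the style of the sketch above might batch like this (the batch size of 50 matches the figure in the next paragraph):

```java
import com.rabbitmq.client.*;

public class BatchAckSketch {
    // Plugs into the channel and consumer set up in the previous sketch.
    static void consumeWithBatchedAcks(Channel channel, QueueingConsumer consumer)
            throws Exception {
        long count = 0;
        while (true) {
            QueueingConsumer.Delivery delivery = consumer.nextDelivery();
            // ... hash and deduplicate the message body here ...
            if (++count % 50 == 0) {
                // One basicAck with multiple = true confirms the whole batch.
                channel.basicAck(delivery.getEnvelope().getDeliveryTag(), true);
            }
        }
    }
}
```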

Finally, I tried an unlimited prefetch count. It is clear that caching as many messages as possible does improve performance, but it comes at the cost of fairness and adaptability. Luckily, neither of those characteristics is important for this application, so I’ve left it that way, along with re-enabling queue durability, whose performance hit, I think, is a fair trade-off for message persistence. I also reconfigured acknowledgements to fire every second rather than every 50 messages. Not only does that guarantee that successfully processed messages will get acknowledged sooner or later, it also spaces out ACKs even more under normal operation, which boosted performance yet again to around 6,000 messages per second.
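Putting those pieces together, a sketch of that final consumer configuration might look like the following; the names are again placeholders, and a real implementation would need to synchronize access to the shared channel more carefully than shown here:

```java
import com.rabbitmq.client.*;
import java.util.concurrent.*;
import java.util.concurrent.atomic.AtomicLong;

public class TimedAckSketch {
    public static void main(String[] args) throws Exception {
        ConnectionFactory factory = new ConnectionFactory();
        factory.setHost("rabbit1"); // placeholder host
        Connection conn = factory.newConnection();
        final Channel channel = conn.createChannel();
        channel.basicQos(0); // prefetch count 0 = unlimited

        QueueingConsumer consumer = new QueueingConsumer(channel);
        channel.basicConsume("syslog", false, consumer); // manual acks

        final AtomicLong lastTag = new AtomicLong(-1);

        // Once per second, acknowledge everything processed so far.
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        timer.scheduleAtFixedRate(new Runnable() {
            public void run() {
                long tag = lastTag.getAndSet(-1);
                if (tag >= 0) {
                    try {
                        channel.basicAck(tag, true); // multiple = true
                    } catch (java.io.IOException e) {
                        e.printStackTrace();
                    }
                }
            }
        }, 1, 1, TimeUnit.SECONDS);

        while (true) {
            QueueingConsumer.Delivery delivery = consumer.nextDelivery();
            // ... hash and deduplicate the message here ...
            lastTag.set(delivery.getEnvelope().getDeliveryTag());
        }
    }
}
```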

So is 6,000 messages per second any good? Well, if you just throw a bunch of UDP datagrams at rsyslog (installed on the same servers as above), it turns out that it can take in about 25,000 messages per second without doing any processing. It is definitely reasonable to expect that the additional overhead of queueing and hashing in Duplog will have a significant impact. It is also important to note that these numbers are sustained while RabbitMQ is being written to and read from simultaneously. If you slow down or stop message production, Duplog is able to burst closer to 10,000 messages per second. The queueing component makes the whole system fairly tolerant of sudden spikes in logging.

For another perspective, if each syslog message averages 128 bytes (a reasonable, if small, estimate), then 6,000 messages per second works out to about 66 GB per day. For comparison, I’ve calculated that all of the enterprise Unix systems in my group at UMD produce only 3 GB per day.
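(Back-of-the-envelope: 6,000 messages per second × 128 bytes × 86,400 seconds per day ≈ 66 GB per day.)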

So as it stands now I think that Duplog performs well enough to work in many environments. I do expect to see better numbers on better hardware. I also think that there is plenty of optimization that can still be done. And in the worst case, I see no reason why this couldn’t be trivially scaled out: divide messages among multiple RabbitMQ queues to better take advantage of SMP. This testing definitely leaves me feeling optimistic.

Overengineering Syslog: Redundancy, High Availability, Deduplication, and Splunk

I am working on a new Splunk deployment at work, and as part of that project, I have to build a centralized syslog server. The server will collect logs from all of our systems and a forwarder will pass them along to Splunk to be indexed. That alone would be easy enough, but I think that logs are too important to leave to just one syslog server. Sending copies of the log data to two destinations may allow you to sustain outages in half of the log infrastructure while still getting up-to-the-minute logs in Splunk. I think duplicating log messages at the source is a fundamental aspect of a highly available, redundant syslog service when using the traditional UDP protocol.

That said, you don’t want to have Splunk index all of that data twice. That’ll cost you in licenses. But you also don’t want to just pick a copy of the logs to index—how would you know if the copy you pick is true and complete? Maybe the other copy is more complete. Or maybe both copies are incomplete in some way (for example, if routers were dropping some of those unreliable syslog datagrams). I think the best you can do is to take both copies of the log data, merge them together somehow, remove the duplicate messages, and hope that, between the two copies, you’re getting the complete picture.

I initially rejected the idea of syslog deduplication thinking it to be too complicated and fragile, but the more I looked into it, the more possible it seemed. When I came across Beetle, a highly available, deduplicating message queue, I knew it would be doable.

Beetle Architecture

Beetle itself wouldn’t work for what I had in mind (it will deduplicate redundant messages from single sources; I want to deduplicate messages across streams from multiple sources), but I could take its component pieces and build my own system. I started hacking on some code a couple of days ago to get messages from rsyslog to RabbitMQ and then from RabbitMQ to some other process which could handle deduplication. It quickly turned into a working prototype that I’ve been calling Duplog. Duplog looks like this:

Duplog Architecture

At its core, Duplog sits and reads messages out of redundant RabbitMQ queues, hashes them, and uses two constant-time Redis operations to deduplicate them. RabbitMQ makes the whole process fairly fault tolerant and was a great discovery for me (I can imagine many potential use cases for it besides this). Redis is a very flexible key-value store that I’ve configured to act as a least-recently-used cache. I can throw hashes at it all day and let it worry about expiring them.

One important design consideration for me was the ability to maintain duplicate messages within a single stream. Imagine you have a high-traffic web server. That server may be logging many identical HTTP requests at the same time. Those duplicates are important to capture in Splunk for reporting. My deduplication algorithm maintains them.
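The source is the authority on the exact operations, but a parity-based counter is one scheme that fits that description. Here is a minimal sketch using Jedis; the 60-second TTL is an arbitrary stand-in for the LRU expiry mentioned above, and the real Duplog code may differ:

```java
import java.security.MessageDigest;
import redis.clients.jedis.Jedis;

public class DedupSketch {
    // Every message arrives once per redundant stream, so deliver on odd
    // occurrence counts and suppress on even ones. A duplicate within a
    // single stream shows up twice per stream (four counts in total) and
    // is therefore delivered twice, preserving intra-stream duplicates.
    static boolean shouldDeliver(Jedis redis, byte[] message) throws Exception {
        String key = toHex(MessageDigest.getInstance("SHA-1").digest(message));
        long seen = redis.incr(key); // constant-time counter
        redis.expire(key, 60);       // constant-time TTL refresh (arbitrary)
        return seen % 2 == 1;        // 1st, 3rd, ... copies pass through
    }

    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) sb.append(String.format("%02x", b));
        return sb.toString();
    }
}
```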

Looking at the architecture again, you will see that almost everything is redundant. You can take down almost any piece and still maintain seamless operation without dealing with failovers. The one exception is Redis. While it does have some high-availability capability, it relies on failover, which I don’t like. Instead, I’ve written Duplog to degrade gracefully: when it can’t connect to Redis, it will allow duplicate messages to pass through. A few duplicate messages aren’t the end of the world.
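In code, that fail-open behavior can be as simple as wrapping the Redis check from the sketch above:

```java
import redis.clients.jedis.Jedis;
import redis.clients.jedis.exceptions.JedisConnectionException;

public class FailOpenSketch {
    // When Redis is unreachable, let the message through: a few
    // duplicates beat dropped or delayed logs.
    static boolean shouldDeliverSafely(Jedis redis, byte[] message) throws Exception {
        try {
            return DedupSketch.shouldDeliver(redis, message);
        } catch (JedisConnectionException e) {
            return true; // degrade gracefully
        }
    }
}
```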

Feel free to play around with the code, but know that it is definitely a prototype. For now I’ve been working with the RabbitMQ and Redis default settings, but there is probably a lot that should be tuned, particularly timeouts, to make the whole system more solid. I also intend to do some benchmarking of the whole stack (that’ll probably be the next post), but initial tests on very modest hardware indicate that it will handle thousands of messages per second without breaking a sweat.


SuperSync

I wrote in the about me blurb on this blog that I like writing little programs for myself. One of the programs I’m most proud of is called SuperSync.

Back in college when I started developing an interest in music, I got in the habit of only acquiring losslessly encoded files. FLACs mostly. It wasn’t long before my collection outgrew what I could store on my iPod. So I hacked together a little script which I called “Sync” to encode my music files to something smaller, like Ogg Vorbis. I wrote it in Java because that’s what I knew best at the time, and for the most part, it just worked. It kept a flat database of files and timestamps to know what to sync to the iPod without reencoding everything every time.

But unfortunately, as my music collection grew, there were times, like when MusicBrainz pushed a minor update to all of my files, when Sync would think that everything needed to be resynced. It got to a point where some syncs would take a week, doing one file at a time.

It got me thinking: I have 10 CPU cores in my house. If I could get them all working together on the problem, I could get those long syncs down to a day or two. And thus SuperSync was born.

Still written in Java, SuperSync adds a distributed client/server architecture and a nice GUI on top of Sync. The program takes the same flat database, and when it sees a new or updated file in the source directory, it copies it to the destination directory. If the file is a FLAC, it broadcasts a conversion request to the network, and any server with a free CPU can respond. The server reads the file from my network share and sends the encoded file back to the client, where it gets written out to the destination. The whole setup relies on having a consistent global namespace for the source collection; in my case, all of my systems can access my fileserver mounted at /nest in the same way. I can’t imagine many people have such a setup, so I don’t think a formal SuperSync release would be worthwhile.
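I haven’t published a protocol specification, but the discovery step amounts to something like this hypothetical sketch; the port number and message format here are made up for illustration:

```java
import java.net.*;
import java.nio.charset.Charset;

public class DiscoverySketch {
    public static void main(String[] args) throws Exception {
        // Hypothetical request: ask the LAN who can encode this file.
        byte[] req = "CONVERT /nest/music/song.flac".getBytes(Charset.forName("UTF-8"));
        DatagramSocket socket = new DatagramSocket();
        try {
            socket.setBroadcast(true);
            socket.send(new DatagramPacket(req, req.length,
                    InetAddress.getByName("255.255.255.255"), 9923)); // made-up port

            // The first server with a free CPU answers; time out if none does.
            socket.setSoTimeout(2000);
            byte[] buf = new byte[256];
            DatagramPacket reply = new DatagramPacket(buf, buf.length);
            socket.receive(reply);
            System.out.println("Encoder available at " + reply.getAddress());
        } finally {
            socket.close();
        }
    }
}
```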

In any case, the process looks something like this in action:

At the end of the sync, the program can optionally read a song log from Rockbox and scrobble it to Last.fm.

I’m also really proud of the way SuperSync is written. I spent a lot of time upfront to define clean interfaces using good object-oriented style. Feel free to check out the source code if you’re into that sort of thing. Just ask me first if you want to use any of it.

Now the times are changing, and with Subsonic allowing me to stream music to my phone, I haven’t had to sync my music as much recently. But my iPod still has its purposes, so I’m glad I have SuperSync to let me take my whole collection with me.

Retirement: Defined Benefit or Defined Contribution?

Specifically, should employees of University System of Maryland institutions participate in the State Retirement and Pension System (SRPS) or the Optional Retirement Program (ORP)? That was one of the questions I had to answer for myself as I prepare to start a new job at the University of Maryland. The SRPS is a defined-benefit pension plan and the ORP is a defined-contribution 401(a)-like plan. The default is the SRPS, and it seems like they do everything they can to steer you to it (I suspect because your contributions are how the State funds current retirees), but is it a better deal?

First, some details: the SRPS requires employees to contribute 7% of their salary to the plan. In return, after 10 years of service, you can retire at age 65 and receive a monthly allowance following this formula:

\frac{0.015\:\times\:\textrm{salary}\:\times\:\textrm{years of service}}{12}
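For example, ten years of service at a final salary of $80,000 would pay 0.015 × 80,000 × 10 / 12 = $1,000 per month.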

By contrast, the ORP is simple: the University will contribute a flat 7.25% of your salary to your choice of Fidelity or TIAA-CREF, and you can invest it however you want. The money is immediately vested. Additionally, you can take the 7% that you would have had to contribute to the SRPS and invest it on your own in a supplemental retirement plan or IRA. That’s a total of 14.25% of your salary going towards your retirement every year…comfortably within the 10-15% that experts recommend.

For me, considering it doesn’t vest until 10 years of service, the SRPS was right out. But as an experiment, I wanted to know which would be the better option if I worked for Maryland for 10 years. (The SRPS does allow you to withdraw your contributions compounded annually at 5% interest if you terminate employment before 10 years, but then you wouldn’t get the benefit of the State’s contribution to the plan, and you can almost certainly do better than 5% annually in the long run by investing in a mix of stocks and bonds.)

I plugged all of my numbers into the SRPS formula and calculated an estimated withdrawal rate for the ORP, supposing a realistic inflation-adjusted growth rate. The results were clear: the ORP could provide me with about twice as much money during retirement. With results like those, I was curious whether the SRPS would be a good deal for anyone, and under what circumstances.

So I whipped up a little program to do the calculations for me. It has sliders for each of the input variables so the results can easily be compared for a wide variety of circumstances. The program works with your current salary and inflation-adjusted rates of return to give you a picture of what sort of spending power in today’s dollars you would have during retirement.

Consider a 35-year-old who makes $80,000 per year. If they expect to work for Maryland for 10 years, think they can earn around 5% in the market after adjusting for inflation (that’s a nominal annual return of about 8% if you assume an average of 3% inflation), intend to retire at 65, and expect to need income for 30 years in retirement, the ORP just barely comes out on top.
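To make the arithmetic concrete, here is a back-of-the-envelope sketch of that scenario. It is not the actual comparison tool: it holds salary flat in today’s dollars and compares only the employer-funded benefit, leaving the 7% you could invest on your own out of both sides.

```java
public class RetirementSketch {
    public static void main(String[] args) {
        double salary = 80000;             // current salary, held flat in real terms
        int years = 10;                    // years of service
        int age = 35, retireAt = 65, drawYears = 30;
        double r = 0.05;                   // assumed real (inflation-adjusted) return

        // SRPS: 1.5% of salary per year of service, paid annually.
        double pension = 0.015 * salary * years;

        // ORP: the State's 7.25% invested each year, then left to grow until 65.
        double balance = 0;
        for (int i = 0; i < years; i++)
            balance = balance * (1 + r) + 0.0725 * salary;
        balance *= Math.pow(1 + r, retireAt - (age + years));

        // Level withdrawal that exhausts the balance over drawYears.
        double withdrawal = balance * r / (1 - Math.pow(1 + r, -drawYears));

        System.out.printf("SRPS pension:   $%,.0f/yr%n", pension);
        System.out.printf("ORP withdrawal: $%,.0f/yr%n", withdrawal);
    }
}
```

Under those assumptions it prints roughly $12,000 a year from the pension against about $12,600 from the ORP, the “just barely” margin described above.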

Indeed, that seems to be the turning point. Any older and you won’t have enough time to let those returns compound, and if you are any more risk-averse, then you won’t be able to generate the returns needed to outpace the pension system. In those cases, the SRPS would be a better choice for you, but only if you are in it for the long haul. You’d be wasting valuable investing time if you join the SRPS and leave before your contributions vest. Otherwise, read a book or two on investing, and do it yourself with the ORP.

But don’t take my word for it; do the math or put your numbers into my program and see what the better choice would be for you:

Run the Maryland Retirement Comparison Tool
Requires Java 5 or higher on Windows, Mac OS X, or Linux

Of course, I make no guarantees that my program is accurate, but you can get the source code and check it out for yourself. Also consider investment risks and other factors such as plan benefits carefully.