Architecting for Scaling

Our backend is written in Ruby, and we can handle over 5,000 messages per second.  Per messaging server.

Twitter, everyone’s poster child for poor scaling, is written in Ruby on Rails, and because they’ve had such a terrible time keeping their system alive, many people have concluded that Ruby and/or Rails is at fault. That conclusion is wrong.

Real scaling issues come from the architecture, not the language or implementation, assuming a sane language and implementation. Of course there are always the crazies who write some horribly inefficient monstrosity in BASIC that’s too slow by a factor of ten, but by and large, the same problem solved in different languages will perform within the same order of magnitude.

A friend of mine from the WWU CS department actually performed some tests on this exact issue. Benson Kalahar created a simple test with basic arithmetic operations and data I/O in a variety of languages, and tested them on inputs of varying sizes.

As you can see, different languages have different levels of overhead, but in all of them run time tends toward linear growth as the input increases. As would be expected, C is the fastest, a Bash script has very little overhead but is computationally slower, interpreted languages are slower than compiled languages, etc. Ruby does show some nasty tendencies as inputs get very large. I’m going to guess this is the GC doing its thing. We’ll see what happens with Ruby 1.9 and YARV, but even if YARV doesn’t live up to the hype, it’s not a blocker.

The important thing to take away from this graph is that every language tends toward linear growth, and that large-input regime is the one we care about when building an application that can scale to the sizes we want. No matter how lightning-fast our language might be, eventually it’s going to be too slow to run on a single machine alone.

The real solution to handling lots of load is to build something capable of scaling out, rather than scaling up. Scaling up, which means adding resources to a single node, will always have an upper limit, at which point you simply cannot add more RAM, there are no faster processors available, or you’re maxing out your gigabit ethernet link.

Scaling out, on the other hand, if architected well, has no upper bound. If the system as a whole can handle it, it’s simply a matter of adding another node, potentially adjusting some configuration, and letting it rip.
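To make "adding another node" concrete, here is a minimal sketch of round-robin dispatch across a pool of messaging nodes. The class name, method names, and node names are all hypothetical; this is not our actual load balancer or queue code, and it assumes every node sustains the same rate and the network is not yet the bottleneck.

```ruby
# Minimal round-robin dispatcher. The class name, node names, and methods
# are hypothetical; they are not taken from the real messaging backend.
class RoundRobinPool
  def initialize(nodes)
    @nodes = nodes.dup
    @next = 0
  end

  # Scaling out: register one more node; nothing else changes.
  def add_node(node)
    @nodes << node
  end

  # Return the node that should handle the next message.
  def pick
    node = @nodes[@next % @nodes.size]
    @next += 1
    node
  end

  # Aggregate throughput, assuming each node sustains the same rate
  # and the network is not yet saturated.
  def capacity(per_node_rate)
    @nodes.size * per_node_rate
  end
end

pool = RoundRobinPool.new(["msg1"])
pool.add_node("msg2")
assignments = 4.times.map { pool.pick }
puts assignments.inspect
puts pool.capacity(5_000)
```

The point of the sketch is that the dispatch logic never needs to know how many nodes exist in advance, so growing the pool is a configuration change rather than a code change.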

This is where Twitter went wrong, and this is where our messaging queue gets it right. 5,000 messages per second might sound like a lot right now, but we’ve seen spikes with 10,000 messages in the queue, and at 5,000 per second that backlog means 2 seconds of latency, which is just on the edge of acceptability to us. The solution? Add another server, put it behind the load balancer, add it to the in-queue balancing config, and suddenly we can handle 10,000 messages per second. Capacity grows linearly with each node until we start to saturate the network.
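The arithmetic behind those numbers is simple enough to write down. This is a rough sketch, assuming every node drains the queue at the same steady rate and no new messages arrive while the backlog is being worked off; the function name is made up for illustration.

```ruby
# Seconds needed to drain a backlog, assuming every node processes
# messages at the same steady rate and no new messages arrive meanwhile.
def drain_seconds(backlog, nodes, per_node_rate)
  backlog.to_f / (nodes * per_node_rate)
end

one_node  = drain_seconds(10_000, 1, 5_000)  # 2.0
two_nodes = drain_seconds(10_000, 2, 5_000)  # 1.0
puts "1 node: #{one_node}s, 2 nodes: #{two_nodes}s"
```

With one node the 10,000-message spike takes 2 seconds to clear; with two nodes it takes 1 second.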

Obviously if we had written the backend in C, we might be able to pull 10,000 messages per second per server, or even more. Unfortunately, that comes with a significant cost tradeoff. Hardware is cheap: we can add another messaging node for well under $1000. Developer time, on the other hand, isn’t as cheap. That $1000 is less than a week’s worth of a developer’s time. Instead, we opted to spend less developer time, have a more flexible final product, and be able to scale that product out very easily. We can have an extra compute or messaging node online within 24 hours, including hardware acquisition time.
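The tradeoff can be put in rough numbers. The sketch below uses the figures from this post (5,000 messages per second per Ruby node, a possible 10,000 for a C rewrite, and $1000 as an upper bound on node cost); the 20,000 messages per second target and the function names are purely hypothetical.

```ruby
NODE_COST = 1_000 # dollars; upper bound, since a node costs "well under $1000"

# Number of nodes needed to sustain a target rate, rounding up.
def nodes_needed(target_rate, per_node_rate)
  (target_rate.to_f / per_node_rate).ceil
end

# Hardware cost in dollars for that many nodes.
def hardware_cost(target_rate, per_node_rate)
  nodes_needed(target_rate, per_node_rate) * NODE_COST
end

target    = 20_000                          # hypothetical target, messages per second
ruby_cost = hardware_cost(target, 5_000)    # 4 nodes
c_cost    = hardware_cost(target, 10_000)   # 2 nodes, if C really doubled throughput
puts "Extra hardware for Ruby: $#{ruby_cost - c_cost}"
```

At that hypothetical target, the Ruby version costs at most $2000 more in hardware, which by the estimate above is less than two weeks of developer time.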

I’m looking forward to seeing what our new architecture can do. It’s loads faster than the existing messaging queue, which has allowed us to send out up to 5 million messages a month since January, with lots of transient load. I’m excited for launch on the 15th. It’s going to be a great new product!