So I head up to the High Sierras for a week, switched off from the grid, and on the way back stop at a Chinese restaurant for lunch. My fortune cookie says “You will be pleasantly surprised” or some such thing. I kid you not. It turns out blitz.io has won Best CouchDB App at CouchConf, and we are super excited about this. While I don’t know exactly what criteria the panel used, I sure do know that blitz.io pushes CouchDB to its limits in many ways. Beyond just map/reduce, we use lots of other cool CouchDB features in production, making blitz.io the first multi-tenant load-testing solution targeted at app developers.
First the award
Thanks to Soo Hwan for the picture:
I wrote about the blitz.io architecture in a prior blog post, but here’s the quick overview.
blitz.io uses regional CouchDB clusters as the backbone, with the scale engines using continuous _changes feeds to wait for things to do.
Obviously, CouchDB without map/reduce is somewhat of an oxymoron. For blitz.io’s needs, though, we have 6 _design documents and a total of around 30 views. In the current state of our application, we don’t really use reduce that much, except for things like the scoreboard and statistics. Even there we use the built-in reduce functions (like _sum and _count) for performance.
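As a rough illustration (the document fields and view names here are made up, not our actual schema), a scoreboard view backed by the built-in _sum reduce might look like this:

```python
import json

# A hypothetical _design document for a scoreboard view. The built-in
# "_sum" reduce runs natively inside CouchDB, which is much faster than
# an equivalent hand-written JavaScript reduce function.
scoreboard_design_doc = {
    "_id": "_design/scoreboard",
    "views": {
        "hits_by_user": {
            # Emit one row per completed job, keyed by user.
            "map": (
                "function(doc) {"
                "  if (doc.type === 'job' && doc.status === 'complete') {"
                "    emit(doc.user, doc.hits);"
                "  }"
                "}"
            ),
            # Built-in reduce: sums the emitted hit counts per key.
            "reduce": "_sum",
        }
    },
}

print(json.dumps(scoreboard_design_doc, indent=2))
```

Querying the view with `group=true` then gives one total per user without any application-side aggregation.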
We have regional CouchDB clusters distributed across the California and Virginia EC2 regions with multi-master replication between the two. This is partly for redundancy and partly to minimize geo-latencies when running load tests. It also allows us to distribute the writes, with eventual consistency keeping the clusters in sync. For a while we were running 2 clusters of CouchDB 1.0.2 and another 2 clusters of CouchDB 1.1 with 4-way replication. This was our strategy for migrating without bringing the app down: once we were happy with the performance of 1.1, we simply switched to it as the primary cluster and turned off the 1.0.2 instances.
The _replicator database, new in 1.1, takes a huge pain out of operations since the replication configuration persists across reboots and restarts. All CouchDB instances are kept up with upstart, and the _replicator database makes it easy for us to fire and forget.
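For the curious, setting this up amounts to writing a document into the _replicator database. A minimal sketch (the hostnames and database name here are hypothetical):

```python
import json

# A hypothetical continuous replication document. PUT this into the
# _replicator database and CouchDB 1.1+ keeps the replication running,
# restarting it automatically after reboots — fire and forget.
replication_doc = {
    "_id": "california-to-virginia",
    "source": "https://couch-ca.example.com/blitz",
    "target": "https://couch-va.example.com/blitz",
    "continuous": True,
}

print(json.dumps(replication_doc, indent=2))
```

A mirror-image document on the other cluster gives you the multi-master setup described above.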
We have multiple scale engines deployed on EC2 across all of the supported regions globally, and each of these engines uses a filtered _changes feed to be notified only of jobs that need to be run in its region. We call this job affinity. Using the update_seq, each engine continuously waits for jobs that are relevant to it and only gets woken up when something needs to run. The scale engines have zero local storage and rely entirely on CouchDB to store their checkpoints and status. In this sense, what we have is almost like an IRC channel where we can spin up scale engines on the fly: they simply register with CouchDB and join the fun.
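In case you haven’t played with filtered _changes, here’s a sketch of how job affinity can work (the field names and filter are illustrative, not our production code): a filter function lives in a design document, and each engine subscribes to the continuous feed with that filter plus its own region.

```python
# A hypothetical design document holding a region filter. CouchDB calls
# the filter for each changed document and only forwards the ones that
# return true, so each engine sees only its own region's pending jobs.
job_filter_design_doc = {
    "_id": "_design/jobs",
    "filters": {
        "by_region": (
            "function(doc, req) {"
            "  return doc.type === 'job' &&"
            "         doc.status === 'pending' &&"
            "         doc.region === req.query.region;"
            "}"
        ),
    },
}

def changes_url(base, db, region, since):
    # Build the continuous _changes request a scale engine would keep
    # open; `since` is the update_seq it last processed.
    return (f"{base}/{db}/_changes?feed=continuous"
            f"&filter=jobs/by_region&region={region}&since={since}")

print(changes_url("http://localhost:5984", "blitz", "virginia", 42))
```

Because the engine passes its last-seen update_seq as `since`, it picks up right where it left off after a restart, with no local state at all.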
Using conflicts to claim ownership
This one’s pretty cool. When a job is posted, all scale engines in a given region wake up and try to acquire the job. They do this by attempting to modify the document, using CouchDB’s conflict mechanism to decide ownership: only one engine’s update wins, and the rest get a conflict and back off. This gives us self-selection in that we never know which particular scale engine in a region will end up processing the job. The scale engine that’s least busy will automatically end up running the load tests.
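Here’s a minimal in-memory simulation of the idea (not our actual code). In real life each engine PUTs the job document with the _rev it last read, and CouchDB returns 409 Conflict to every writer except the first:

```python
# Simulate CouchDB's MVCC (optimistic concurrency): a write only
# succeeds if the revision supplied matches the current one, so when
# several engines race to claim the same job, exactly one wins.

class FakeCouch:
    def __init__(self):
        self.docs = {}  # doc_id -> (rev, doc)

    def put(self, doc_id, doc, rev=None):
        current = self.docs.get(doc_id)
        if current is not None and current[0] != rev:
            return {"error": "conflict"}  # someone else got there first
        new_rev = 1 if current is None else current[0] + 1
        self.docs[doc_id] = (new_rev, doc)
        return {"ok": True, "rev": new_rev}

couch = FakeCouch()
couch.put("job-1", {"status": "pending"})  # job posted at rev 1

# Two engines both read rev 1 and race to claim the job.
a = couch.put("job-1", {"status": "claimed", "owner": "engine-a"}, rev=1)
b = couch.put("job-1", {"status": "claimed", "owner": "engine-b"}, rev=1)

print(a)  # {'ok': True, 'rev': 2}
print(b)  # {'error': 'conflict'}
```

The loser simply goes back to waiting on the _changes feed, which is what makes the self-selection work without any central scheduler.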
So yeah, we are super happy with CouchDB, and it is ideal for what we set out to accomplish. Which is:
Making load and performance testing a fun sport for app developers!
And blitz.io is all about continuous integration, so we’ve been busy getting multi-language API clients out. You can now easily bring load and performance testing into continuous integration with Ruby, node.js, Python, Java and Perl! So check it out and see how easy it is to scale out your app.