blitz.io: How we use Heroku, AWS and CouchDB

I put together this prezi a few weeks ago as a quick way to describe the internals of blitz.io. This post expands on it and goes into further detail. We launched blitz.io a few months ago to bring load and performance testing to developers as part of continuous integration. Last week we released API clients in several languages to make this possible. Most cloud-based load testing tools are heavyweight, geared towards experts, and cost significant time and money to use. With the rise of PaaS, it’s imperative that this type of testing is easy, affordable for developers and really part of dev and test, not a one-off expensive event that happens once a year.

Heroku

This was somewhat of a no-brainer, since we already had a few apps deployed on Heroku. The web tier of blitz.io is a Sinatra/Ruby app, and we consciously decided to break it up into multiple sub-apps with varying numbers of dynos and workers. So far we have blitz.io, docs.blitz.io and secure.blitz.io, with more apps coming soon.

We make extensive use of background workers that handle expiration of paid plans, sending emails, pre-authorizing apps as part of the Heroku add-on and so on. All these workers are simply rake tasks, and we test them with RSpec using mocks and stubs. Even the requirement documents are in most cases pending RSpec examples, allowing the developers to focus on the expected behavior. We average around 5 to 10 git pushes a day as we are constantly tweaking the code and making improvements.

CouchDB

CouchDB is the backbone of blitz.io. After some experimentation with various hosted CouchDB offerings, we ended up running our own cluster across the Virginia and California regions of AWS, with multi-master replication between the two regions. We use m1.large instances and are currently running 1.1.0, though for a while we ran both 1.0.2 and 1.1.0 together with 4-way replication during the upgrade process.
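
As a rough illustration of what multi-master replication between two nodes involves, the sketch below builds one continuous replication document per direction, in the shape accepted by the _replicator database that arrived in CouchDB 1.1. The host names, database name and document ids are made up for the example; this is a sketch under those assumptions, not the production configuration.

```ruby
require "json"

# Hypothetical node URLs and database name, for illustration only.
NODES = {
  "virginia"   => "http://couch-va.example.com:5984",
  "california" => "http://couch-ca.example.com:5984"
}
DB = "blitz"

# Build one continuous replication document per direction, so that
# writes accepted by either node eventually reach the other.
def replication_docs(nodes, db)
  nodes.keys.permutation(2).map do |from, to|
    {
      "_id"        => "#{from}-to-#{to}",
      "source"     => "#{nodes[from]}/#{db}",
      "target"     => "#{nodes[to]}/#{db}",
      "continuous" => true
    }
  end
end

puts JSON.pretty_generate(replication_docs(NODES, DB))
```

With two nodes this yields two documents. Adding more nodes to the hash grows the set to one document for every ordered pair.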

I’ve written multiple blog posts about the reasons behind choosing CouchDB as our database, and I’m definitely excited that we won the best-app award at CouchConf! We use the filtered _changes feed, multi-master replication, lots of design views, the conflict mechanism and obviously map/reduce during the actual load tests.

The multi-region deployment is unique because it minimizes the geo latencies to the scale engines described below. It also gives us redundancy and distributed writes, though at this point in the life of blitz.io we don’t shard.

AWS Identity and Access Management

While I like the Heroku security model, there have been breaches in the past where you could access the config variables and source code for apps that didn’t belong to you. To ensure the security and privacy of our users, we use AWS IAM policies with restricted access to our AWS account from within Heroku. For admin and auto-scaling purposes, the IAM policies restrict the code to launching EC2 instances only, with no access to the rest of our AWS account.
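
To make the idea concrete, here is a minimal sketch of what a launch-only policy document could look like, built in Ruby so it can be printed as JSON. The policy contents are an assumption based on the description above, not a copy of the real policy, and the allowed? helper is a toy check for exact action names rather than a model of how IAM evaluates policies.

```ruby
require "json"

# A hypothetical least-privilege policy: the credentials stored in the
# Heroku config may launch EC2 instances and nothing else.
LAUNCH_ONLY_POLICY = {
  "Version"   => "2012-10-17",
  "Statement" => [
    {
      "Effect"   => "Allow",
      "Action"   => ["ec2:RunInstances"],
      "Resource" => "*"
    }
  ]
}

# IAM denies anything that is not explicitly allowed. This toy helper
# mirrors that default for a plain list of exact action names.
def allowed?(policy, action)
  policy["Statement"].any? do |stmt|
    stmt["Effect"] == "Allow" && stmt["Action"].include?(action)
  end
end

puts JSON.pretty_generate(LAUNCH_ONLY_POLICY)
puts allowed?(LAUNCH_ONLY_POLICY, "ec2:RunInstances")  # true
puts allowed?(LAUNCH_ONLY_POLICY, "s3:GetObject")      # false
```

The point of the design is that leaked credentials can do very little: anything not listed in an Allow statement is refused.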

Multi-region scale engines

We run multiple m1.small instances across all five AWS regions for our scale engines. Written in evented C++, these scale engines can generate over 50,000 concurrent HTTP requests on a single EC2 instance! This is truly the first multi-tenant load testing solution on the cloud, where we don’t make you deploy instances or pass the cost on to you. This sheer horsepower is also what makes blitz.io fun to use. You never see the autoscaling and provisioning happening behind the scenes.

Using S3 for engine boots

We push a gem containing the Ruby controller code as well as the scale engine to a private bucket on S3. We then use cloud-init and the EC2 user-data to boot engines whenever we want. These engines, using the default Linux AMI, download the gem from S3, configure themselves and hook into the filtered _changes feed of the geographically closest CouchDB instance. Each scale engine uses the EC2 metadata to identify itself and its region to the filtered _changes feed.
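
The identification step can be sketched roughly as follows. The metadata paths and the feed, filter and since query parameters are standard EC2 and CouchDB features, but the filter name engines/by_region, the extra region and instance_id parameters, the host name and the database name are all hypothetical, chosen only to illustrate the idea.

```ruby
require "net/http"
require "uri"

METADATA_BASE = "http://169.254.169.254/latest/meta-data"

# Fetch one value from the EC2 instance metadata service. This only works
# on an EC2 instance and assumes unauthenticated (IMDSv1-style) access.
def metadata(path)
  Net::HTTP.get(URI("#{METADATA_BASE}/#{path}")).strip
end

# A standard availability zone name such as "us-east-1a" is the region
# name plus one trailing letter, so dropping the letter gives the region.
def region_from_az(az)
  az.sub(/[a-z]\z/, "")
end

# Build the URL for a continuous, filtered _changes feed. The filter name
# and the region/instance_id parameters are hypothetical.
def changes_url(couch, db, instance_id, region, since = 0)
  query = URI.encode_www_form(
    "feed"        => "continuous",
    "filter"      => "engines/by_region",
    "since"       => since,
    "region"      => region,
    "instance_id" => instance_id
  )
  "#{couch}/#{db}/_changes?#{query}"
end

# On a real instance these values would come from metadata("instance-id")
# and metadata("placement/availability-zone"); literals are used here so
# the sketch runs anywhere.
puts changes_url("http://couch-va.example.com:5984", "blitz",
                 "i-0123abcd", region_from_az("us-east-1a"))
```

CouchDB passes the extra query parameters through to the filter function, which is what would let a filter decide whether a change is relevant to a given engine.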

Each scale engine also creates and updates its own document in CouchDB, keyed by the EC2 instance-id, and has no local disk storage. These are simply high-compute, high-network-IO instances. They use the update_seq in CouchDB to checkpoint themselves and to handle restarts and reboots.
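
A minimal sketch of sequence-based checkpointing, assuming the engine remembers the last seq it handled and passes it back as the since parameter after a restart. The sample feed lines and document ids are invented, and the sketch works on an array of lines rather than a live HTTP stream.

```ruby
require "json"

# Process raw lines from a continuous _changes feed and return the last
# sequence number handled, which becomes the next `since` value.
# `checkpoint` is the seq previously stored by the engine.
def consume_changes(lines, checkpoint)
  lines.each do |line|
    next if line.strip.empty?              # heartbeats arrive as blank lines
    change = JSON.parse(line)
    seq = change["seq"]
    next if seq.nil? || seq <= checkpoint  # skip anything already handled
    yield change if block_given?
    checkpoint = seq
  end
  checkpoint
end

# Invented sample feed: one JSON object per line, as CouchDB 1.x emits.
feed = [
  '{"seq":41,"id":"test-a","changes":[{"rev":"1-abc"}]}',
  '',
  '{"seq":42,"id":"test-b","changes":[{"rev":"1-def"}]}',
  '{"seq":43,"id":"test-c","changes":[{"rev":"2-123"}]}'
]

handled = []
last = consume_changes(feed, 41) { |change| handled << change["id"] }
puts handled.inspect  # ["test-b", "test-c"]
puts last             # 43
```

Because the checkpoint lives in CouchDB rather than on local disk, a rebooted or replaced instance can resume from the stored value without replaying changes it has already handled.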

A Heroku worker constantly reconciles the instances registered in CouchDB against what the AWS API reports, garbage collecting instances that have been shut off during auto-scaling.
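
At its core, this reconciliation is a set difference. The sketch below assumes the worker already has two lists of instance ids, one read from CouchDB and one from the AWS API; the helper name and the ids are made up for illustration.

```ruby
require "set"

# Return the instance ids that are registered in CouchDB but that AWS no
# longer reports as running; their documents can be garbage collected.
def stale_instances(registered_ids, running_ids)
  (Set.new(registered_ids) - Set.new(running_ids)).to_a.sort
end

registered = ["i-aaa111", "i-bbb222", "i-ccc333"]  # from CouchDB (made up)
running    = ["i-aaa111", "i-ccc333", "i-ddd444"]  # from the AWS API (made up)

puts stale_instances(registered, running).inspect  # ["i-bbb222"]
```

An instance that AWS reports but CouchDB does not know about (i-ddd444 here) is left alone, since it may simply not have registered yet.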

Summary

blitz.io is cool, we think, in that it uses a set of technologies in a unique way to pull off the first multi-tenant cloud load and performance testing solution that is targeted at developers and fits snugly into continuous integration and deployment.

More importantly, we strive to keep the gamification aspect of blitz.io in order to:

Make Load and Performance testing a fun sport!

What programmable load-testing gives you are very cool apps like these:
