Scaling it out – Building high performance mobile apps

We’ve been using blitz.io to optimize the backend server of a customer’s location-aware iPhone app that’s going to be released soon, and we learned some interesting things that I thought I’d pass on. The first key observation is that performance optimization is a continuous, iterative and agile process. This means the tools you use to validate performance have to be painless and friendly, so you spend your time tuning your code instead of figuring out how to run the tests.

RESTful backends for iPhone and Android apps are exploding. I think we are past the era of simple utility apps like Sudoku and Chess; the social element now connects these apps to RESTful backend servers that are expected to scale to the extreme, given the sheer number of mobile devices out there.

Server Architecture

The web server has 16 cores and plenty of horsepower, and is connected to two MySQL databases with a gazillion bytes of RAM and a fast network connection in and out. The web application speaks RESTful JSON and inserts the user’s location when the iPhone app starts up. Other RESTful calls run geo-spatial queries to count how many users are within a given lat/long bounding box. The web application is modeled roughly like this:

iphone-app.png

It looks awfully like Heroku dynos and workers, and it’s built to scale with hundreds of thousands of iPhones calling in. The RESTful call (simplified) that does the registration looks like this:

http://cool.iphone.app/register?latitude=30.23&longitude=-100

When we ran a sprint (a single RESTful request with a randomized lat/long) with blitz.io, the response time was ~30 milliseconds. Beautiful!
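To give a feel for what a request with a randomized lat/long looks like, here is a minimal Python sketch that builds such URLs. This is not blitz.io syntax. The function name `random_register_url` and the coordinate ranges are assumptions made for illustration; only the endpoint and the two parameter names come from the simplified call above.

```python
import random
from urllib.parse import urlencode

# Simplified endpoint taken from the post; everything else is illustrative.
BASE_URL = "http://cool.iphone.app/register"


def random_register_url(rng: random.Random) -> str:
    """Build one registration URL with a random latitude and longitude."""
    params = {
        # Assumed ranges: the full globe, rounded to two decimals.
        "latitude": round(rng.uniform(-90.0, 90.0), 2),
        "longitude": round(rng.uniform(-180.0, 180.0), 2),
    }
    return f"{BASE_URL}?{urlencode(params)}"


if __name__ == "__main__":
    rng = random.Random(42)  # seeded so repeated runs produce the same URLs
    for _ in range(3):
        print(random_register_url(rng))
```

Randomizing the coordinates matters because a fixed lat/long could end up measuring a warm cache instead of the real insert path.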

And then something happened…

When we started rushing and increased the concurrency from a single iPhone to 10,000 iPhones (using a single EC2 instance, if I may add), we noticed something odd.

chart.png

The average response time started to increase in proportion to the number of concurrent iPhones in the load test, while the hit rate (requests/sec) remained fairly constant (in the low hundreds). We checked the memory and CPU utilization on the servers and they were shrugging off the traffic as if nothing was going on. Why is that?
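Before getting to the cause, it helps to see that the shape of the chart follows from simple arithmetic. If every simulated iPhone waits for its response before sending the next request (an assumption about the test, with zero think time), Little's Law says concurrency = hit rate × response time. A flat hit rate therefore forces the response time to grow linearly with concurrency. The sketch below uses a hypothetical hit rate of 300 requests/sec; the post only says "low hundreds".

```python
def expected_response_time(concurrency: int, hit_rate: float) -> float:
    """Little's Law rearranged: response time (s) = concurrency / hit rate.

    Assumes a closed-loop test where each client has one request in flight
    and does not pause between requests.
    """
    if hit_rate <= 0:
        raise ValueError("hit_rate must be positive")
    return concurrency / hit_rate


if __name__ == "__main__":
    ASSUMED_HIT_RATE = 300.0  # hypothetical value, not a measurement
    for clients in (100, 1_000, 10_000):
        seconds = expected_response_time(clients, ASSUMED_HIT_RATE)
        print(f"{clients:>6} clients -> {seconds:.2f} s average response time")
```

So when throughput refuses to climb as you add clients, something is serializing the work, and the extra clients are simply waiting in line.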

Lock contention, the bane of scalability

It turns out the workers and the web server were sharing an inter-process lock, with the intent of keeping web latency short and pushing the database insertions into the background. Unfortunately, this strategy backfired. With just a few workers and 10,000 iPhones coming in, the lock contention resulted in a major pipeline stall. Think of it as a traffic jam: a 6-lane highway has 2 lanes under construction, and the cars back up and slow down. Every car gets through eventually, but it takes longer and longer to get past the jam.

People tout their load tests all the time, with their pretty graphs and metrics and whatnot. The real value of any load testing solution comes when it’s transparent and gets out of your way, yet helps you isolate and improve the performance of your app on a continuous basis.
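To make the lock contention described above concrete, here is a small Python sketch. It uses threads and an in-process `threading.Lock` as a stand-in for the inter-process lock, which is an assumption; the post does not say how the real lock was implemented. The constants `HOLD_SECONDS` and `REQUESTS` are hypothetical, and `time.sleep` stands in for waiting on I/O while the lock is held.

```python
import threading
import time

HOLD_SECONDS = 0.01  # hypothetical time each request spends in the critical section
REQUESTS = 20        # hypothetical number of requests arriving at once


def run(shared_lock):
    """Start REQUESTS threads together and return each request's latency (s).

    If shared_lock is given, every request must hold it while it "works";
    if it is None, the requests proceed independently.
    """
    latencies = [0.0] * REQUESTS
    start = time.perf_counter()

    def handle(index):
        if shared_lock is not None:
            with shared_lock:             # only one request at a time gets past here
                time.sleep(HOLD_SECONDS)  # sleeping: the CPU is idle while others wait
        else:
            time.sleep(HOLD_SECONDS)
        latencies[index] = time.perf_counter() - start

    threads = [threading.Thread(target=handle, args=(i,)) for i in range(REQUESTS)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return latencies


if __name__ == "__main__":
    for label, lock in (("shared lock", threading.Lock()), ("no lock", None)):
        result = run(lock)
        print(f"{label}: mean {sum(result) / len(result):.3f} s, max {max(result):.3f} s")
```

With the shared lock, the slowest request waits for every other request's hold time, yet the machine does almost no work, which matches servers that look idle while response times climb.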

Concurrency vs. Hit Rate

If you are building a web app, whether for mobile devices or for browsers, what looks super scalable for a single user can easily fall apart when concurrency goes up. When we launched Studio Scale some time ago, I put together a video showing how we tested a node.js/CouchDB app under scale. The CouchDB queries were (apparently) awfully slow, and if you read the update on the blog, it turns out we were using a single http.createClient() to make all of the CouchDB requests. What looked awesome for a single user resulted in exactly the same pipeline stall described above once concurrency went up, because the single http.createClient() was queuing up the requests and increasing the overall response time perceived by each client.
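The effect of funneling every request through one client can be estimated with a small deterministic model. The Python sketch below is not the node.js code from that app. It assumes all requests arrive at once, each takes the same service time, and a pool of `pool_size` connections serves them in order. A pool size of 1 mimics the single shared client. The larger pool is a hypothetical alternative for comparison, and the 30 ms service time is an illustrative value.

```python
import heapq


def mean_response_time(num_requests: int, pool_size: int, service_time: float) -> float:
    """Mean completion time (s) when num_requests arrive at t=0 and are served
    in arrival order by pool_size connections, one request per connection at a time.
    """
    if num_requests <= 0 or pool_size <= 0:
        raise ValueError("num_requests and pool_size must be positive")
    free_at = [0.0] * pool_size  # min-heap of times at which each connection is free
    heapq.heapify(free_at)
    total = 0.0
    for _ in range(num_requests):
        begin = heapq.heappop(free_at)   # earliest available connection
        finish = begin + service_time
        total += finish                  # arrival is t=0, so latency == finish time
        heapq.heappush(free_at, finish)
    return total / num_requests


if __name__ == "__main__":
    SERVICE_TIME = 0.03  # hypothetical per-request time in seconds
    for pool in (1, 10):
        mean = mean_response_time(100, pool, SERVICE_TIME)
        print(f"pool size {pool:>2}: mean response time {mean:.3f} s")
```

A single user never notices the queue because there is nothing ahead of them in it; the cost only appears once many requests share the same connection.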

blitz.io: Making load and performance testing a fun sport!
