Over the weekend, I was experimenting with CouchDB to see if it can pass the C10K barrier. Some of the performance optimizations I made along the way are really OS-level optimizations that affect MochiWeb (erlang web server) and fairly well documented in many blogs. This one by @metabrew in particular is a pretty good read, since it focuses on Erlang and MochiWeb. While I am a performance junkie, I am not an Erlang hacker. So this is a call for help to the CouchDB hackers for recommendations on scaling out CouchDB.
All load tests were done on an m2.2xlarge EC2 instance in us-west region with blitz.io. According to the AWS documentation, m2.2xlarge (High-Memory Double Extra Large Instance) comes with 32GB of memory, 4 virtual cores, high I/O performance and 850GB instance storage. Instance storage, supposedly, has better read performance and doesn’t have the network contention as EBS.
This is a first run against a vanilla CouchDB 1.1 that was simply compiled from source using a modified bootup script that @evilmike wrote. We used the following blitz line to ramp up to 1,024 users and then sustain for another 30 seconds:
-r california -p 1-1024:30,1024-1024:30 http://our.couch/
As you can see, as soon as we get to 1,024 users, all hell breaks loose. The hit rate drops, errors pile up. This is a simple fix. Since we are running CouchDB as a couchdb unprivileged user, we simply add this line to /etc/security/limits.conf and restart CouchDB.
echo "couchdb - nofile 10240" >> /etc/security/limits.conf
This increases the max open files for CouchDB instantly reducing the errors encountered at 1,024 concurrent users. Let’s add a few more tweaks that we already use in blitz.io‘s scale engines.
cat << 'EOF' >> /etc/sysctl.conf net.core.rmem_max = 16777216 net.core.wmem_max = 16777216 net.ipv4.tcp_rmem = 4096 87380 16777216 net.ipv4.tcp_wmem = 4096 65536 16777216 net.ipv4.tcp_mem = 50576 64768 98152 net.ipv4.tcp_no_metrics_save = 1 net.core.netdev_max_backlog = 5000 net.ipv4.tcp_tw_recycle = 1 net.ipv4.tcp_tw_reuse = 1 net.ipv4.tcp_max_syn_backlog = 100000 net.ipv4.tcp_sack = 0 net.core.somaxconn = 65535 net.ipv4.ip_local_port_range = 1024 65535 EOF
And yes add this too:
ifconfig eth0 txqueuelen 10000
Let’s see what happens if we ramp to 10,000 users over a couple of minutes:
As you can see, while we have improved things quite a bit, we still have a lot of work to do. These load tests are currently not hitting the database, view indexes, etc, etc, just to keep the problem bounded. We are simply seeing how much the CouchDB MochiWeb application scales out. Up until 2,300 concurrent hits, the hit rate is pretty linear. Beyond that the hit rate starts fluctuating quite a bit, the response time increases from 7ms to close to 3 seconds!
Like I said in the beginning of the blog, I’m a Erlang n00b, so I’m sure there are others in the community that can help with the performance optimization tuning. Based on a few tweets exchanged with @janl, it appears that Erlang uses system malloc/free. Replacing this with tcmalloc will obviously be a pretty big improvement. There’s also tweaking the scheduler, number of allocated processes, heap/stack sizes, invoking the GC at critical points that all could help with this kind of scale.
So if you are a CouchDB/Erlang committer/developer/hacker and you are interested in working on this (with patches landing in trunk), we are happy to bump your blitz.io concurrency to 10,000 for a week so you can see what kind of performance improvements you can make. This can include, simple ERL_FLAGS or much more involved than that. If you are interested drop me a note: @pcapr.
Update: Or if you simply have config changes/suggestions, leave a comment and I’ll be happy to try it out and report back the results.