CouchDB, DNS and Scaling the Cloud

Just got back from Interop, where I was part of a panel on cloud computing. We discussed lots of interesting topics like migration, scaling, hybrid clouds and what not. NoSQL was definitely a discussion point, since I personally believe you can’t talk about the cloud without also talking about NoSQL.

The scaling part, though, got me thinking. The current approach to scaling any cloud app is to have your IaaS provider add more compute power and deal with it. I tend to think a little differently. xtractr on pcapr, for example, uses a hybrid cloud model. You download a single binary that you use to index large packet captures. When you then want to search, extract or report on that data, the application is delivered to your browser, which uses JSONP (until HTML5 cross-domain Ajax requests are truly prevalent) to communicate with your own instance of xtractr. What this means is that while you are busy crunching packets, the server load on pcapr is zero! That’s effectively infinite scaling, ’cos the load is truly distributed across all of our users.
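JSONP works by wrapping a JSON payload in a call to a JavaScript function the page has already defined, so the browser can load it from another origin with a plain script tag. Here’s a minimal Python sketch of the server side of that. The `callback` parameter name is a common convention and an assumption here, not necessarily what xtractr actually uses.

```python
import json
import re

# Only accept simple (optionally dotted) JavaScript identifiers as callback
# names, so a caller can't inject arbitrary script into the response.
_CALLBACK_RE = re.compile(r"[A-Za-z_$][\w$]*(\.[A-Za-z_$][\w$]*)*")


def jsonp_response(payload, callback):
    """Wrap `payload` as JSONP: callback(<json>);

    `callback` would typically come from a query parameter such as
    ?callback=handleResults (a hypothetical name for illustration).
    Returns (content_type, body).
    """
    if not _CALLBACK_RE.fullmatch(callback):
        raise ValueError("invalid JSONP callback name: %r" % callback)
    body = "%s(%s);" % (callback, json.dumps(payload))
    return "application/javascript", body


if __name__ == "__main__":
    ctype, body = jsonp_response({"packets": 42}, "handleResults")
    print(ctype)  # application/javascript
    print(body)   # handleResults({"packets": 42});
```

Validating the callback name matters because the body is executed as script by whichever page includes it.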

DNS

DNS is something that we all use every day. It’s intricately woven into the fabric of the Net. To oversimplify, DNS maps human-readable domain names to numeric IP addresses. But if you pause for a moment and think about it:

DNS was the first NoSQL, multi-master, pull replication, distributed document database that was created over 20 years ago!

With DNS, nobody is running a DNS cloud. Instead, the information is truly distributed across DNS servers that anyone can query. All you need is the next authoritative hop: referrals (NS records in the authority section of a response) and caching (the TTL in each resource record) automatically distribute the data at run time. If a server goes down, the next one is tried and life just goes on. Nobody owns or holds all of the DNS records.
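To make the referral idea concrete, here’s a toy Python sketch of iterative resolution. The “servers” are hypothetical in-memory dictionaries standing in for real name servers, and all names and addresses are placeholders; a real resolver also deals with glue records, retries across multiple servers and much more.

```python
# Hypothetical servers: each either answers for names it owns or refers the
# client to a server that is closer to the answer (a delegation).
SERVERS = {
    "root": {"referrals": {"com": "com-server"}, "answers": {}},
    "com-server": {"referrals": {"mudynamics.com": "mu-server"}, "answers": {}},
    "mu-server": {"referrals": {}, "answers": {"mudynamics.com": "1.2.3.4"}},
}


def query(server, name):
    """Ask one server about `name`.

    Returns ("answer", address), ("referral", next_server) or ("nxdomain", None).
    """
    data = SERVERS[server]
    if name in data["answers"]:
        return "answer", data["answers"][name]
    # Pick the most specific delegated zone that contains the name.
    best = None
    for zone, next_server in data["referrals"].items():
        if name == zone or name.endswith("." + zone):
            if best is None or len(zone) > len(best[0]):
                best = (zone, next_server)
    if best:
        return "referral", best[1]
    return "nxdomain", None


def resolve(name, start="root", max_hops=10):
    """Iteratively follow referrals from `start` until we get an answer."""
    server = start
    for _ in range(max_hops):
        kind, value = query(server, name)
        if kind == "answer":
            return value
        if kind == "nxdomain":
            return None
        server = value  # referral: ask the next, more specific server
    raise RuntimeError("too many referrals for %s" % name)


if __name__ == "__main__":
    print(resolve("mudynamics.com"))  # 1.2.3.4
    print(resolve("example.org"))     # None (no delegation for org here)
```

No server in this sketch holds everything; each one only knows its own records and where to send you next, which is the sharding the post is talking about.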

This is beautiful sharding at work, on a global scale.

You can think of a record in DNS as a document. And yes, there are multiple types of documents, like CNAME, A, AAAA, TXT etc. Each document has a TTL (time to live), which tells you how long a copy of it may be cached before it has to be fetched again from the source.
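The TTL is what makes it safe to keep copies of documents around. A minimal Python sketch of a TTL-respecting record cache might look like this; the record fields (`name`, `type`, `ttl`) and the injectable clock are assumptions for illustration, not part of any DNS library.

```python
import time


class TTLCache:
    """Cache DNS-style record documents until their TTL runs out."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock      # injectable so the cache is easy to test
        self._entries = {}       # (name, type) -> (record, expires_at)

    def put(self, record):
        key = (record["name"], record["type"])
        self._entries[key] = (record, self._clock() + record["ttl"])

    def get(self, name, rtype):
        """Return the cached record, or None if it's missing or expired."""
        entry = self._entries.get((name, rtype))
        if entry is None:
            return None
        record, expires_at = entry
        if self._clock() >= expires_at:
            # Expired: drop it so the caller re-fetches from the source.
            del self._entries[(name, rtype)]
            return None
        return record


if __name__ == "__main__":
    now = [0.0]
    cache = TTLCache(clock=lambda: now[0])
    cache.put({"name": "mudynamics.com", "type": "a",
               "address": "1.2.3.4", "ttl": 300})
    print(cache.get("mudynamics.com", "a"))  # still fresh: the record dict
    now[0] = 301.0
    print(cache.get("mudynamics.com", "a"))  # None: TTL expired
```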

CouchDB

The parallels between DNS and CouchDB are just too striking. CouchDB is a fully distributed JSON document store with multi-master replication and map/reduce for queries. I was idly wondering (had some time at the airport) what a CouchDB-based DNS server would look like. In other words, all you run is CouchDB, and you resolve domain names by making HTTP requests to your server. If you patch gethostbyname to talk HTTP, this becomes transparent to applications.
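Here’s a rough Python sketch of what a gethostbyname that talks HTTP might boil down to. It assumes a design document `domains` with a list function `resolve` over a view `q` (the same layout as the example URL further down), and, hypothetically, that the list function returns a JSON array of addresses. The HTTP transport is injectable so the sketch can be exercised without a server.

```python
import json
import urllib.parse
import urllib.request


def resolve_url(base, name, db="db", ddoc="domains", list_fn="resolve", view="q"):
    """Build the list-function URL for looking up `name`.

    CouchDB expects the `key` query parameter to be JSON, so the name is
    JSON-encoded (quoted) and then URL-encoded.
    """
    key = urllib.parse.quote(json.dumps(name))
    return "%s/%s/_design/%s/_list/%s/%s?key=%s" % (base, db, ddoc, list_fn, view, key)


def http_fetch(url):
    """Default transport: a plain HTTP GET returning the body as text."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        return resp.read().decode("utf-8")


def http_gethostbyname(name, base="http://1.2.3.4", fetch=http_fetch):
    """gethostbyname() over HTTP: return the first address, like the C call.

    Assumes (hypothetically) the list function returns e.g. ["1.2.3.4"].
    """
    addresses = json.loads(fetch(resolve_url(base, name)))
    if not addresses:
        raise LookupError("host not found: %s" % name)
    return addresses[0]


if __name__ == "__main__":
    fake = lambda url: '["1.2.3.4"]'  # stand-in for a real CouchDB server
    print(resolve_url("http://1.2.3.4", "mudynamics.com"))
    print(http_gethostbyname("mudynamics.com", fetch=fake))  # 1.2.3.4
```

Patching a real libc gethostbyname is a different exercise; this only shows the lookup a patched version would perform.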

BTW, you can administer your DNS server purely through the Futon UI, or we can build a CouchApp for it using show and list functions. You don’t have to deal with gnarly label compression, compression-pointer loops in DNS messages and other crazy things under the hood. Just plain old JSON documents. And you get to replace DNSSEC with HTTP Digest Authentication. Like I said, idle dreaming.
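For a taste of what plain JSON sidesteps: DNS messages compress names by replacing a suffix with a 2-byte pointer (top two bits set) to an earlier offset in the message (RFC 1035, section 4.1.4), and a malformed or malicious packet can make those pointers loop. A minimal Python decoder that guards against that might look like this:

```python
def read_name(msg: bytes, offset: int):
    """Decode a possibly compressed domain name starting at `offset`.

    Returns (name, next_offset), where next_offset is where parsing of the
    enclosing record continues. Raises ValueError on malformed input,
    including compression-pointer loops.
    """
    labels = []
    next_offset = None   # fixed once we follow the first pointer
    seen = set()         # pointer targets already visited
    while True:
        if offset >= len(msg):
            raise ValueError("name runs past end of message")
        length = msg[offset]
        if length & 0xC0 == 0xC0:                  # compression pointer
            if offset + 1 >= len(msg):
                raise ValueError("truncated pointer")
            target = ((length & 0x3F) << 8) | msg[offset + 1]
            if target in seen:
                raise ValueError("compression pointer loop")
            seen.add(target)
            if next_offset is None:
                next_offset = offset + 2
            offset = target
        elif length & 0xC0:
            raise ValueError("unsupported label type")
        elif length == 0:                           # root label ends the name
            if next_offset is None:
                next_offset = offset + 1
            return ".".join(labels), next_offset
        else:
            label = msg[offset + 1:offset + 1 + length]
            if len(label) != length:
                raise ValueError("truncated label")
            labels.append(label.decode("ascii"))
            offset += 1 + length


if __name__ == "__main__":
    # "mudynamics.com" at offset 0, then "www" + pointer back to offset 0.
    msg = b"\x0amudynamics\x03com\x00" + b"\x03www\xc0\x00"
    print(read_name(msg, 16))  # ('www.mudynamics.com', 22)
```

Since plain labels only move forward through the message, any loop has to pass through the same pointer twice, which the `seen` set catches.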

DNS Records as JSON

This is prolly the simplest part of all. Each DNS record, with all its gory details, is stored as a simple JSON document. For example, an A record might look like this:

{
    "name": "mudynamics.com",
    "type": "a",
    "address": "1.2.3.4"
}

And here’s an SOA (start of authority) record:

{
    "name": "mudynamics.com",
    "type": "soa",
    "mname": "mudynamics.com",
    "rname": "root.mudynamics.com",
    "serial": 2009010800,
    "refresh": 10800,
    "retry": 1800,
    "expire": 604800,
    "minimum": 86400
}

To look up a domain name, you would just define a view that emits each record’s name as the key and its address as the value; the matching rows are then returned as answers. With a CouchDB list function on top of that view, you can easily provide a nice little RESTlet to resolve domain names. For example:

http://1.2.3.4/db/_design/domains/_list/resolve/q?key="mudynamics.com"
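In that URL, `resolve` is the list function and `q` is the view. CouchDB map and list functions are normally written in JavaScript; to keep every example here in one language, the Python sketch below only emulates what such a view and list function would compute. The `name` field and the JSON-array response format are assumptions.

```python
import json


def map_records(doc):
    """Emulates a CouchDB map function: emit(name, address) for A/AAAA records."""
    if doc.get("type") in ("a", "aaaa") and "name" in doc:
        yield doc["name"], doc["address"]


def build_view(docs):
    """Emulates the view index: all emitted rows, sorted by key.

    (CouchDB uses Unicode collation; plain string order is close enough here.)
    """
    rows = [{"key": k, "value": v} for doc in docs for k, v in map_records(doc)]
    return sorted(rows, key=lambda row: row["key"])


def list_resolve(rows, key):
    """Emulates the list function for a request with ?key=...

    CouchDB filters rows by key before the list function sees them; the
    filter is done here instead. Output: a JSON array of answers.
    """
    return json.dumps([row["value"] for row in rows if row["key"] == key])


if __name__ == "__main__":
    docs = [
        {"name": "mudynamics.com", "type": "a", "address": "1.2.3.4"},
        {"name": "mudynamics.com", "type": "soa", "mname": "mudynamics.com"},
    ]
    print(list_resolve(build_view(docs), "mudynamics.com"))  # ["1.2.3.4"]
```

The SOA document is simply skipped by the map step, so one database can hold every record type while the view only indexes the ones that resolve to addresses.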

Caching and Replication

Let’s say you get a request for a domain that isn’t in the view index. You can either generate an HTTP redirect to the next hop or recursively fetch the document from the next hop. The DNS protocol itself has an RD (recursion desired) bit that tells the server which to do: return the redirect (or a 404 Not Found if there’s nowhere to send the client), or pursue the next hop on the client’s behalf. As long as the records exchanged between CouchDB instances carry an HTTP Cache-Control: max-age header matching the TTL of the records, you effectively get to use the HTTP caches and CDNs deployed all over the Net for caching. :-)
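A rough Python sketch of that decision logic follows. It assumes a hypothetical `lookup` callable backed by the view, a hypothetical `NEXT_HOP` base URL with a made-up `/<name>` path scheme, and an injected `fetch_next_hop` for the recursive case. It returns (status, headers, body) instead of talking to a real HTTP server.

```python
import json

NEXT_HOP = "http://5.6.7.8/db"  # hypothetical upstream CouchDB instance


def handle_query(name, rd, lookup, fetch_next_hop):
    """Answer a name lookup over HTTP.

    lookup(name) -> record dict with "address" and "ttl", or None.
    fetch_next_hop(name) -> record dict from the next hop, or None.
    rd mirrors the DNS RD (recursion desired) bit.
    """
    record = lookup(name)
    if record is None and rd:
        record = fetch_next_hop(name)  # recurse on the client's behalf
    if record is not None:
        # Let HTTP caches and CDNs keep the answer exactly as long as its TTL.
        headers = {"Content-Type": "application/json",
                   "Cache-Control": "max-age=%d" % record["ttl"]}
        return 200, headers, json.dumps(record)
    if not rd:
        # No recursion wanted: redirect the client to the next hop instead.
        return 302, {"Location": "%s/%s" % (NEXT_HOP, name)}, ""
    return 404, {}, ""


if __name__ == "__main__":
    rec = {"name": "mudynamics.com", "address": "1.2.3.4", "ttl": 3600}
    local = lambda n: rec if n == "mudynamics.com" else None
    upstream = lambda n: None
    print(handle_query("mudynamics.com", False, local, upstream)[:2])
    print(handle_query("pcapr.net", False, local, upstream)[:2])
```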

We do need to handle TTL expiration ourselves, since CouchDB has no built-in document expiry: something has to delete the expired documents, and those deletions then propagate automatically during replication.
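A minimal sketch of such a sweep in Python, assuming each cached document carries hypothetical `fetched_at` (Unix seconds) and `ttl` fields. It builds the body for CouchDB’s `_bulk_docs` endpoint, where a document sent with its current `_id`, `_rev` and `"_deleted": true` is deleted; the resulting tombstones are what replication carries to other instances.

```python
import json
import time


def expired_deletions(docs, now=None):
    """Return a _bulk_docs payload that deletes every expired document.

    Each doc is expected (by assumption) to have _id, _rev, fetched_at and ttl.
    """
    if now is None:
        now = time.time()
    stale = [d for d in docs if d["fetched_at"] + d["ttl"] <= now]
    return {"docs": [{"_id": d["_id"], "_rev": d["_rev"], "_deleted": True}
                     for d in stale]}


if __name__ == "__main__":
    docs = [
        {"_id": "a-mudynamics.com", "_rev": "1-abc", "fetched_at": 0, "ttl": 300},
        {"_id": "a-pcapr.net", "_rev": "1-def", "fetched_at": 0, "ttl": 86400},
    ]
    # POST this JSON to http://<host>:5984/<db>/_bulk_docs
    print(json.dumps(expired_deletions(docs, now=600)))
```

In practice you’d run something like this periodically, feeding it the rows of a view keyed on expiry time rather than every document in the database.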

Topic Cloud

Here’s another example for you to consider. Let’s say we use dotted domain-name notation for topics (similar to newsgroups like alt.computers.sale or comp.lang.ruby). I can bring up a CouchDB instance that has various JSON objects about, say, packets. My authoritative server tells the whole world all there is to know about packets. Now when I bring up this app and add it to the Topic Cloud, replication kicks in, and pretty soon people looking for all.about.packets start replicating from my app. How cool is that? If I have a my.family.pictures topic, then I’m going to allow only my family to replicate its documents from my server, using HTTP Authentication. Simple, no?
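Subscribing to a topic could be little more than a CouchDB replication request. The Python sketch below builds the JSON body you’d POST to CouchDB’s `/_replicate` endpoint. Because CouchDB database names can’t contain dots, it maps the topic to a database name by replacing dots with underscores (a hypothetical convention), and it assumes a local CouchDB at `http://127.0.0.1:5984`. Private topics like my.family.pictures get an HTTP Basic `Authorization` header on the source; CouchDB’s replicator accepts `url` plus `headers` objects, but check the docs for the version you run.

```python
import base64
import re

# CouchDB database-name rule: lowercase letter first, then a-z 0-9 _ $ ( ) + - /
_DB_NAME_RE = re.compile(r"[a-z][a-z0-9_$()+/-]*")


def topic_db_name(topic):
    """Hypothetical mapping from a dotted topic name to a CouchDB db name."""
    name = topic.lower().replace(".", "_")
    if not _DB_NAME_RE.fullmatch(name):
        raise ValueError("topic can't be mapped to a db name: %r" % topic)
    return name


def replication_request(topic, publisher, user=None, password=None,
                        local="http://127.0.0.1:5984", continuous=True):
    """Body for POST <local>/_replicate, pulling `topic` from `publisher`."""
    db = topic_db_name(topic)
    source = {"url": "%s/%s" % (publisher.rstrip("/"), db)}
    if user is not None:
        token = base64.b64encode(("%s:%s" % (user, password)).encode()).decode()
        source["headers"] = {"Authorization": "Basic " + token}
    return {"source": source,
            "target": "%s/%s" % (local.rstrip("/"), db),
            "create_target": True,
            "continuous": continuous}


if __name__ == "__main__":
    print(replication_request("all.about.packets", "http://1.2.3.4:5984"))
```

With `continuous` set, the subscriber keeps pulling new documents as the publisher adds them, which is the “replication kicks in” part of the idea.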

The Network News Transfer Protocol did exactly this, but the big difference when using CouchDB is:

The app itself, along with the data, is replicated and cached!

Disclaimer

Okay, there are a million reasons why this wouldn’t work and why you shouldn’t even attempt anything like this, but the point of this post is just to use DNS as one example of a truly distributed document store with pull replication that doesn’t require spinning up another compute instance to scale your cloud app. Since the documents are spread across all the servers, with nice caching in the middle, you get effectively infinite scale as long as people participate.

So if you have some free cycles, get on GitHub and go nuts with this idea. Git, BTW, is another great example of a completely decentralized revision control system. Linus has already talked quite a bit about this, so I’ll stop already.
