Multi-dimensional data visualization

Way back in grad school, I worked on a project involving auralization. The key idea was that your ear can process multi-dimensional data (pitch, volume, instruments, silence, tempo, etc.) far better than your eyes can (try closing your eyes and listening to a Bach fugue). So back then, we took these kinds of data (stocks, sales reports, expenses, etc.) and turned them into MIDI files to understand trends. Ever since I saw Hans Rosling’s TED Talk, I’ve wondered how well this type of visualization would apply to something other than economics.

Enter pcapr! With the recent influx of packet captures on pcapr, we are rapidly exceeding the amount of data one person can take in, and searching gets harder simply because there are so many protocols and packets. So we focused on a few questions to unravel the meaning of it all:

  • How do the coverage and number of pcaps for a given protocol trend over time?
  • When was a protocol first introduced into pcapr?
  • How do I quickly get to a pcap uploaded on a certain date?
  • What is 42 and what does it have to do with packet captures?

Gapminder, the software Hans Rosling used, has since been turned by Google into a visualization mashup component. And CouchDB, which powers all of pcapr, provides all the ingredients needed to harness this data quickly using map/reduce.
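Two of the questions above ("When was a protocol first introduced?" and "How do I get to a pcap uploaded on a certain date?") map naturally onto map/reduce-style indexes. The following is a minimal Python sketch, not pcapr's actual code: the `UPLOADS` records and the `first_seen` and `pcaps_on` helpers are hypothetical. The first helper is a "min" reduce keyed by protocol; the second is a range lookup over a date-sorted index, similar in spirit to querying a date-keyed CouchDB view for a single key.

```python
from bisect import bisect_left, bisect_right
from datetime import date

# Hypothetical upload index (not pcapr's real data): (upload date, pcap id, protocols).
UPLOADS = [
    (date(2009, 1, 5), "pcap-001", {"tcp", "http"}),
    (date(2009, 1, 5), "pcap-002", {"udp", "dns"}),
    (date(2009, 2, 9), "pcap-003", {"tcp", "udp", "sip"}),
]


def first_seen(uploads):
    """Map each protocol to its earliest upload date (a 'min' reduce keyed by protocol)."""
    earliest = {}
    for uploaded, _pcap_id, protocols in uploads:
        for proto in protocols:
            if proto not in earliest or uploaded < earliest[proto]:
                earliest[proto] = uploaded
    return earliest


def pcaps_on(uploads, day):
    """Return ids of pcaps uploaded on `day` via binary search over a date-sorted index."""
    index = sorted((uploaded, pcap_id) for uploaded, pcap_id, _ in uploads)
    dates = [uploaded for uploaded, _ in index]
    lo = bisect_left(dates, day)   # first entry on or after `day`
    hi = bisect_right(dates, day)  # first entry after `day`
    return [pcap_id for _, pcap_id in index[lo:hi]]
```

Sorting once and binary-searching mirrors how a view keeps its keys ordered, so a date lookup touches only the matching slice rather than scanning every pcap.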

Armed with a 20-line Ruby script, we extracted 5-dimensional data from pcapr:

[ time, protocol, %coverage, #pcaps, %contribution ]

and ended up with Trends.
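The original 20-line Ruby script isn't shown, so here is a rough Python sketch of how such 5-tuples could be derived with a map step and a summing reduce step. Everything in it is an assumption: the `PCAPS` schema is hypothetical, %coverage is taken to mean the share of packets within pcaps containing a protocol that belong to that protocol, and %contribution the protocol's share of all packets uploaded so far. Rows are cumulative per day, which is what a time slider steps through.

```python
from collections import defaultdict
from datetime import date

# Hypothetical pcap records (not pcapr's real schema): upload date, total packet
# count, and packets per protocol. A packet may count toward several protocols.
PCAPS = [
    {"uploaded": date(2009, 1, 5), "packets": 100, "protocols": {"tcp": 100, "http": 40}},
    {"uploaded": date(2009, 1, 5), "packets": 50, "protocols": {"udp": 50, "dns": 50}},
    {"uploaded": date(2009, 2, 9), "packets": 200, "protocols": {"tcp": 150, "sip": 50, "udp": 50}},
]


def map_pcap(pcap):
    """Map step: emit one ((day, protocol), (1, protocol packets, pcap packets)) per protocol."""
    for proto, proto_packets in pcap["protocols"].items():
        yield (pcap["uploaded"], proto), (1, proto_packets, pcap["packets"])


def reduce_stats(values):
    """Reduce step: sum the emitted tuples element-wise."""
    return tuple(sum(column) for column in zip(*values))


def trends(pcaps):
    """Return cumulative [time, protocol, %coverage, #pcaps, %contribution] rows per day."""
    grouped = defaultdict(list)
    day_totals = defaultdict(int)  # all packets uploaded on each day
    for pcap in pcaps:
        day_totals[pcap["uploaded"]] += pcap["packets"]
        for key, value in map_pcap(pcap):
            grouped[key].append(value)
    reduced = {key: reduce_stats(values) for key, values in grouped.items()}

    # Running totals per protocol: [pcaps, protocol packets, packets in those pcaps].
    running = defaultdict(lambda: [0, 0, 0])
    all_packets = 0
    rows = []
    for day in sorted(day_totals):
        all_packets += day_totals[day]
        for (d, proto), stats in reduced.items():
            if d == day:
                for i, v in enumerate(stats):
                    running[proto][i] += v
        # Emit a snapshot of every protocol seen so far, as of this day.
        for proto in sorted(running):
            n_pcaps, proto_pkts, pcap_pkts = running[proto]
            coverage = 100.0 * proto_pkts / pcap_pkts
            contribution = 100.0 * proto_pkts / all_packets
            rows.append([day, proto, round(coverage, 1), n_pcaps, round(contribution, 1)])
    return rows
```

With the sample data, `sip` first appears in the 2009-02-09 snapshot, which is exactly the "when did this protocol enter pcapr" signal the slider exposes. In CouchDB, the map and reduce steps would live in a view rather than in client code.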


Pcapr Trends

What you find is a beautiful orchestration of multi-dimensional data, visualized in a nifty way. As you drag the slider, it becomes immediately obvious when a protocol entered pcapr, the overall coverage of the pcaps that contained the protocol, the total number of pcaps that contained it and, finally, the meaning of 42.

Just so you know, there were no packets harmed in this process!
