What started off as a way to fully unit test xtractr turned out to be a gem, literally. First xtractr, then nuggets and now a gem. You follow? Seriously though, we are happy to announce a Ruby gem for xtractr that takes all the goodness of Ruby and talks RESTfully to xtractr for oh-so-fun packet mining and troubleshooting, all from within IRB.
We took the logical objects in xtractr’s index and turned them into a set of Ruby objects that support method chaining and map/reduce. We still like the collaborative app on pcapr, but for those packet geeks who want to play, this gem is for you.
After you install the gem and start the xtractr web server, here is how the story unfolds. First, launch IRB with auto-completion and require mu/xtractr.
$ irb -rirb/completion -rmu/xtractr
Create an instance of Mu::Xtractr that can talk to the xtractr web server RESTfully.
irb> xtractr = Mu::Xtractr.new
=> #<xtractr localhost:8080>
And now for the fun.
What are the top 3 DNS queries?
First we grab all DNS flows (a collection) and then map/reduce the dns.qry.name field across those flows. Since what comes back is just a plain old Ruby Array, already sorted by count, we simply slice off the first three elements.
irb> xtractr.flows('flow.service:DNS').count('dns.qry.name')[0..2]
=> [#<count ax.search.itunes.apple.com 8>, #<count a1.phobos.apple.com 6>, #<count ax.itunes.apple.com 6>]
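To make the map/reduce step concrete, here is a rough sketch in plain Ruby of what a count over a field amounts to: map each flow to the field’s value, tally the values, and sort by descending count. The flow hashes and the count_field helper are hypothetical stand-ins for illustration, not the gem’s internals.

```ruby
# Hypothetical in-memory flows; each hash stands in for one indexed flow.
# The field names mirror the xtractr query fields, but the data is made up.
queries = %w[
  ax.search.itunes.apple.com ax.search.itunes.apple.com ax.search.itunes.apple.com
  a1.phobos.apple.com a1.phobos.apple.com
  ax.itunes.apple.com ax.itunes.apple.com
  time.apple.com
]
flows = queries.map { |q| { 'flow.service' => 'DNS', 'dns.qry.name' => q } }
flows << { 'flow.service' => 'HTTP', 'http.request.uri' => '/index.html' }

# Map each flow to the value of +field+ (skipping flows without it),
# reduce by tallying occurrences, then sort by descending count.
# Ties are broken alphabetically so the order is deterministic.
def count_field(flows, field)
  flows.filter_map { |f| f[field] }
       .tally
       .sort_by { |value, n| [-n, value] }
end

dns  = flows.select { |f| f['flow.service'] == 'DNS' }
top3 = count_field(dns, 'dns.qry.name')[0..2]
p top3
```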
Services used by the top talker (based on bytes sent/received)
Here we start with all flows and run a map/reduce that sums the bytes keyed by unique source IP. This returns an array of Mu::Xtractr::Sum objects sorted by bytes. We then take the first one (the top talker) and map/reduce again to determine the unique services that this IP used.
irb> xtractr.flows.sum('flow.src', 'flow.bytes').first.count('flow.service')
=> [#<count HTTP 20>, #<count DNS 11>, #<count MDNS 1>]
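The same idea extends to sums. This sketch, again over hypothetical in-memory flows, groups flows by source IP, totals their bytes, sorts the groups largest first, and then runs a second count over the top talker’s flows. The Sum struct and sum_field helper are illustrative assumptions; the gem’s own Mu::Xtractr::Sum may work differently.

```ruby
# Hypothetical flows with source IP, byte count and service.
flows = [
  { 'flow.src' => '192.168.1.10', 'flow.bytes' => 5_000, 'flow.service' => 'HTTP' },
  { 'flow.src' => '192.168.1.10', 'flow.bytes' => 3_000, 'flow.service' => 'HTTP' },
  { 'flow.src' => '192.168.1.10', 'flow.bytes' =>   200, 'flow.service' => 'DNS'  },
  { 'flow.src' => '192.168.1.20', 'flow.bytes' => 4_000, 'flow.service' => 'HTTP' },
  { 'flow.src' => '192.168.1.20', 'flow.bytes' =>   100, 'flow.service' => 'MDNS' },
]

# Illustrative result type: the key, its total, and the flows behind it,
# so a result can be reduced again (chained).
Sum = Struct.new(:key, :total, :flows)

# Group flows by key_field, add up value_field per group,
# and sort the groups by total, largest first.
def sum_field(flows, key_field, value_field)
  flows.group_by { |f| f[key_field] }
       .map { |key, fs| Sum.new(key, fs.sum { |f| f[value_field] }, fs) }
       .sort_by { |s| -s.total }
end

top_talker = sum_field(flows, 'flow.src', 'flow.bytes').first

# Second map/reduce: count services among the top talker's flows only.
services = top_talker.flows.map { |f| f['flow.service'] }
                     .tally
                     .sort_by { |svc, n| [-n, svc] }
p top_talker.key, services
```

Keeping the underlying flows inside each result is what makes the second reduction possible without going back to the full data set.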
Trend analysis of frequently occurring terms
We want to find the most frequently occurring terms in the HTTP request URI. So we grab that field from xtractr and ask for its terms, which are already sorted by frequency of occurrence. Frequency here is the number of packets in which the term was seen within the URI.
irb> xtractr.field('http.request.uri').terms[0..2]
=> [#<term:http.request.uri us 81>, #<term:http.request.uri 75.jpg 46>, #<term:http.request.uri r1000 45>]
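Here is one way the packet-based term frequency could be computed. The tokenizer below, which splits URIs on '/', '?', '&' and '=', is an assumption for illustration; xtractr’s actual tokenization is not described here. Each term is counted at most once per packet, matching the definition of frequency above.

```ruby
# Hypothetical request URIs, one per packet that carried an HTTP request.
uris = [
  '/us/r1000/75.jpg',
  '/us/r1000/12.jpg',
  '/us/index.html',
  '/uk/75.jpg',
]

# Assumed tokenizer: split on '/', '?', '&' and '=' and drop empty pieces.
def terms_of(uri)
  uri.split(%r{[/?&=]}).reject(&:empty?)
end

# Frequency = number of packets whose URI contains the term,
# so duplicates within a single URI are collapsed with uniq.
freq = Hash.new(0)
uris.each { |uri| terms_of(uri).uniq.each { |t| freq[t] += 1 } }

# Most frequent first; ties broken alphabetically.
ranked = freq.sort_by { |term, n| [-n, term] }
p ranked[0..2]
```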
Save the packets that contain the trending term
Okay, we know the top trending term is "us". We now want to grab all the packets that contain this term and save them into a new pcap for further analysis. Remember that xtractr can index flows across multiple pcaps, so the newly constructed pcap may well be stitched together from multiple sources.
irb> xtractr.field('http.request.uri').terms.first.packets.save('trends.pcap')
=> #<packets:http.request.uri:us>
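Stitching packets from several captures into one file can be sketched with the classic libpcap file format: a 24-byte global header, then a 16-byte record header plus the raw bytes for each packet. The Packet struct, the dummy payloads and the write_pcap helper are hypothetical; the sketch only merges the sources by timestamp and writes them into an in-memory buffer.

```ruby
require 'stringio'

# Hypothetical packets from two different capture files: a timestamp
# (seconds, microseconds) and raw frame bytes. The payloads are dummy
# bytes, not valid Ethernet frames, so a dissector would not decode them.
Packet = Struct.new(:ts_sec, :ts_usec, :data)

from_a = [Packet.new(100, 500, "\x01".b * 60), Packet.new(102, 0, "\x02".b * 80)]
from_b = [Packet.new(101, 250, "\x03".b * 70)]

# Classic libpcap layout, little-endian:
#   global header: magic, version 2.4, thiszone, sigfigs, snaplen, linktype
#   per packet:    ts_sec, ts_usec, incl_len, orig_len, then the bytes
def write_pcap(io, packets, linktype: 1, snaplen: 65_535)
  io.write([0xa1b2c3d4, 2, 4, 0, 0, snaplen, linktype].pack('VvvVVVV'))
  packets.each do |pkt|
    len = pkt.data.bytesize
    io.write([pkt.ts_sec, pkt.ts_usec, len, len].pack('VVVV'))
    io.write(pkt.data)
  end
end

# Stitch: merge every source and order by capture time before writing.
stitched = (from_a + from_b).sort_by { |p| [p.ts_sec, p.ts_usec] }
out = StringIO.new(''.b)
write_pcap(out, stitched)
```

To produce an actual file, the buffer could be written out with File.binwrite('trends.pcap', out.string).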
Find URLs that are taking too long to download
Since xtractr calculates the duration of each flow during indexing, this one’s pretty easy. We first get the collection of HTTP flows that lasted more than one second and then map/reduce the request URI within those flows.
irb> xtractr.flows('flow.proto:HTTP flow.duration:>1').count('http.request.uri')
=> #<count /big_image.jpg 2>
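A flow’s duration is easy to derive once packets are grouped by flow: it is the span between the first and last packet timestamps. The sketch below assumes a hypothetical hash of flows with per-packet timestamps, keeps the flows longer than one second, and counts their URIs.

```ruby
# Hypothetical flows keyed by id, each with its request URI and the
# capture timestamps (in seconds) of its packets.
flows = {
  1 => { uri: '/index.html',    times: [0.00, 0.05, 0.20] },
  2 => { uri: '/big_image.jpg', times: [1.00, 1.40, 2.75] },
  3 => { uri: '/big_image.jpg', times: [3.00, 4.60] },
  4 => { uri: '/small.png',     times: [5.00, 5.30] },
}

# Duration: time between the earliest and latest packet in the flow.
def duration(times)
  times.max - times.min
end

# Keep flows lasting more than one second, then count URIs among them.
slow      = flows.values.select { |f| duration(f[:times]) > 1 }
slow_uris = slow.map { |f| f[:uri] }.tally.sort_by { |uri, n| [-n, uri] }
p slow_uris
```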
How about we save that flow that took so long?
From the count returned in the last step, we grab the first packet, then the flow that packet belongs to, and save that flow to a new pcap.
irb> xtractr.flows('flow.proto:HTTP flow.duration:>1').count('http.request.uri').first.packets.first.flow.save('slow.pcap')
=> #<flow:15 HTTP 192.168.1.10:49170 > 126.96.36.199:80 GET /big_image.jpg HTTP/1.1>
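Chains like this work because each object keeps references to its related objects. The classes below are hypothetical stand-ins, not the gem’s real classes; they only mimic the navigation from a count to its packets, from a packet to its flow, and from that flow to all of its packets.

```ruby
# Hypothetical flow: an id plus the packets that belong to it.
class Flow
  attr_reader :id, :packets

  def initialize(id)
    @id = id
    @packets = []
  end
end

# Hypothetical packet: remembers its owning flow and registers itself there.
class Packet
  attr_reader :flow, :uri

  def initialize(flow, uri = nil)
    @flow = flow
    @uri = uri
    flow.packets << self
  end
end

# Hypothetical count: a value and the packets in which it was seen.
Count = Struct.new(:value, :packets)

slow_a = Flow.new(15)
slow_b = Flow.new(21)
req_a  = Packet.new(slow_a, '/big_image.jpg')
Packet.new(slow_a)                     # response packet, no request URI
req_b  = Packet.new(slow_b, '/big_image.jpg')

count = Count.new('/big_image.jpg', [req_a, req_b])

# Navigate: first matching packet -> its flow -> every packet in that flow.
flow = count.packets.first.flow
p flow.id, flow.packets.size
```

In the gem itself, the last step would be calling save('slow.pcap') on that flow, as in the IRB session above.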
Now that we’ve piqued your interest, here’s the code:
Enjoy and remember to download xtractr first!