Personally, as a consumer, I love Netflix, but it hasn’t been the darling of service providers and ISP’s lately. You can read about the Canadian ISP saga here. Our imminent next release of Mu Studio will enable our customers to recreate 1,000,000 concurrent Netflix users watching a movie, so they can understand the impact of their application aware networks. One thing is pretty clear: compared to YouTube, Netflix inflicts so much more pain on the network. Credit for this blog goes to Yuri who did all the reverse engineering. And he’s signed up to Netflix to watch movies during work for “research” purposes. :)
Quick Firewall, DPI Primer
Before we talk about exactly how we recreated Netflix traffic flows here’s a quick primer on DPI and performance costs inside most firewalls. There are four common performance costs within firewalls:
The firewall has to allocate and manage some state for each logical connection that’s flowing through it. The more connections/second, the higher the state setup/teardown overhead and hence higher cost-per-flow. From its perspective, once a flow has been marked by the policy as “safe”, the packets for that flow typically will go through the network processors directly. The more packets/second on flows that need to be continuously inspected, the higher the cost-per-packet. Finally there’s the cost-per-byte. For application-aware networks that can do application QoS, security and policy enforcement, the more bytes that are scanned for patterns using signatures, the higher the cost. Pretty straightforward.
The cost-per-message on these networks tends to be very high if the network does on-the-fly SSL decryption, HTTP de-chunking and gzip/zlib decompression to inspect the application payloads. For example, transferring 1MByte of data using a single HTTP request-response pair is a lot friendlier to the network than transferring the exact same amount of data using 100 HTTP request-response pairs.
How we recreated Netflix flows
Since a big chunk of the Netflix traffic runs over SSL, starting with packet captures was a no-go. So we instead went with the Tamper Data plugin for Firefox which can output all of the exchanges (pre-SSL) into XML. This preserves all of the headers, cookies, the actual payloads. In short, everything that transpired during the transaction. While we also have converters from HAR to MuSL, we mostly went with Tamper Data because we could use pull parsers for incremental parsing instead of loading up a very large JSON file into memory. Watching a simple 30-second movie on Netflix resulted in a 31MB XML file. We then proceeded to convert this XML file to MuSL (Mu Scenario Language) which can easily model multi-host, multi-transport, stateful transactions.
Statistics and Lies
Here are some quick observations about the Netflix session. During the login, watch-movie, logout sequence, we observed 38 distinct HTTP(s) connections to 36 different hosts with over 750 distinct request-response pairs to various URL’s. Including the DNS resolution for the hosts, you get about 72 distinct flows in the application-aware network between the browser and Netflix’s CDN’s for a single user. Almost all of the Netflix flows were gzip-compressed and the video stream comprised of multiple requests to byte-range-encoded URL’s. Each video segment requested would return about 300-350K bytes of data in rapid succession.
Here’s the list of hosts that were contacted during this time:
800+ request-response pairs/minute for a single user and you understand why the operators and ISP’s are not happy.
This is very different from say, YouTube, which streams in the mp4 data over a single HTTP request-response pair. Bandwidth, turns out, is just one dimension of why streaming video has an adverse impact on application-aware networks. The other dimension is the number of flows that are being setup and teardown as well the sheer number of request-response pairs that cause the state on the firewalls to be constantly churned.
We are planning on releasing the Tamper-Data -> MuSL translator to our customers soon. We also plan on publishing the resulting scenario to Mu TestCloud shortly so you can take an instance of this one user watching the video and instantly scale it out to 1,000,000 concurrent users to assess the impact it has on your application-aware network. In the meantime enjoy this graph from Mu Studio (thx Soo-Hwan!) that shows the scaling up of 200,000 Netflix users concurrently watching the 1-minute movie.