Benchmarking Adobe HDS HTTP Video Streaming
Originally published 5th December, 2011
EDIT: After performing these tests I noticed my methodology was SERIOUSLY flawed and these results are basically useless.
In creating the list of 7000 URLs, I generated fragment numbers by incrementing an integer counter.
However — HDS fragment numbers are NOT necessarily contiguous over a range.
e.g. fragments 470 and 471 might not exist at all, and the player plays 468, 469, 472, 473 perfectly.
Requesting non-existent fragments from the module yields HTTP 503, which accounts for the high number of failures observed. I'm re-testing now with correct fragment lists, and have not seen a single failure, even at high concurrency.
To obtain the complete list of fragments, you can examine the packaged file with something like:
f4fpackager --inspect-fragments --input-file infile.f4f
I then edit the results (you could grep -v/sed or whatever; I just use vim) to take only the last batch of fragments, and remove any lines saying "discontinuity = 1".
Then you have the correct, complete list of fragments.
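If you want to script the grep part, something like this would do (the file names are placeholders, and the exact layout of f4fpackager's output may differ):

f4fpackager --inspect-fragments --input-file infile.f4f > fragments-raw.txt
grep -v 'discontinuity = 1' fragments-raw.txt > fragments-clean.txt   # drop the discontinuity lines

Taking only the last batch of fragments is still a manual step.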
Original Post follows:
So I’ve been evaluating methods of HTTP Adaptive Streaming — namely Adobe HTTP Dynamic Streaming (HDS) and Apple HTTP Live Streaming (HLS).
The implementation of HDS seemed ugly and inefficient to me when I first learnt about it, and it looked like it would not scale well. The main reason: for every request, for every fragment of a movie, the origin must look up an index for that movie, then use the information so gained to open a media file (.f4v) and extract a byte range from it. That seems like quite a bit of CPU, memory and IO overhead, especially compared to "just be a stupid web server and serve the file".
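In shell terms, the per-fragment work is roughly the equivalent of the following sketch (the file name, offset and length here are invented for illustration; the module actually gets them from the index):

# Hypothetical: pull one fragment's byte range out of a packaged segment
dd if=sampleSeg1.f4f of=Seg1-Frag3 bs=1 skip=1048576 count=3460300

A static server, by contrast, just hands back a file that already exists.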
I guess how often unique content is requested from an edge depends on things like how deep your edge caches are, the total size of your library, etc., but it's reasonable to assume the exact requested fragment will often not be cached on the edge, which will then have to ask the origin.
So in order to test my suspicion (hey, perhaps it doesn't matter; perhaps the architecture scales OK), today I've been benchmarking on a dev machine to get an idea of the overhead and efficiency of the Apache module for HDS.
The server is a Dell with a single Xeon X3323 and 4GB RAM, so not super-beefy.
Web servers were vanilla installs of Apache and Nginx running on CentOS 5.5.
To compare serving through the module against serving without it, I used my browser to request a fragment from the module, saved that fragment to disk, and copied it to a static content directory on the web server. This pre-extracted fragment was then used for the test cases that did not involve the module.
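The shell equivalent would be something like this (the URL shape and paths are illustrative, not the actual ones):

# Pull one fragment through the HDS module, then stage it as static content
curl -o Seg1-Frag1 'http://origin.example.com/vod/sampleSeg1-Frag1'
cp Seg1-Frag1 /var/www/html/static/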
Cases I compared:
1. Apache + HDS Module
2. Apache static
3. Nginx static
This should give me an idea of the overhead of the HDS module versus an implementation with no intermediate module (e.g. HLS), plus, as a bonus, a comparison of Nginx and Apache for e.g. edges.
For performing the benchmarks, first I tried just ab, and the HDS module did pretty well compared to straight Apache; surprisingly, it sometimes did better.
Until I realized the module is possibly doing some caching, and that this result was only possible because ab can only request the one URL. That's not exactly what happens in the real world, where you get requests for all sorts of different, most likely unique fragments.
So I duplicated that one segment/fragment into 7000 directories in the filesystem, and used a load-testing tool called "siege" to request them at random. The main advantage of siege in this case is that it can request from a list of URLs supplied in a text file. Cool.
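Something along these lines does the duplication and builds the URL list (the paths and hostname are placeholders, not the ones actually used):

# Fan the pre-extracted fragment out into 7000 directories,
# writing a matching URL list for siege as we go
for i in $(seq 1 7000); do
  mkdir -p /var/www/html/streams/$i
  cp Seg1-Frag1 /var/www/html/streams/$i/
  echo "http://testbox/streams/$i/Seg1-Frag1" >> urls-7000.txt
done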
Arguments given to siege were:
-f ./urls-7000.txt # my text file of 7000 urls from which to select requests
-i # internet emulation; randomize selected URL from file
-b # no delay between requests; benchmark
-r 1 # run the tests once
-c 1000 # number of concurrent requests; adjusted for each test
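Putting those flags together, the 1000-concurrency run, for example, was invoked along the lines of:

siege -f ./urls-7000.txt -i -b -r 1 -c 1000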
For each concurrency level I ran a batch of requests 5 times and averaged the results across runs (actually I ran more than 5, to confirm the results were normal).
This time results were more what I expected, a summary of the important parts follows:
Concurrent Requests | HDS Module           | Apache               | Nginx
                    | Avail(%)   Req/sec   | Avail(%)   Req/sec   | Avail(%)   Req/sec
100                 | 100        13.1      | 100        16.9      | 100        18.5
500                 | 76.1       11.9      | 99.5       12.9      | 100        17.0
1000                | 58.2       9.6       | 97.9       13.7      | 100        17.2
Basically, by 500 concurrent requests (perhaps fewer) the module has begun to fall over; by 1000 it's pretty much toast. Straight Apache and Nginx carry on well: Apache is a little unhappy, while Nginx powers on undaunted. I wonder how big companies using HDS scale it. Perhaps they just have monstrous Apache instances, dunno.
If I had more time, some other things I’d like to try:
- Optimizing web servers for the machine
- Different server machines
- More levels of concurrency; this would give an interesting way to see (and graph), for each method, how the two measured metrics change as a function of concurrency, though the coarse approach here is still quite informative
- Faster machine from which to test; my workstation couldn’t push nginx fast enough to break it
Addendum (crosspost from the Adobe forums):
- I think the poor performance across the board re: txn/sec was due to the large number of concurrent requests (100, 500 and 1000 are quite high) and the size of each response (3.2MB).
- On the data used — for the without-HDS scenarios i.e. serving the fragment straight — I extracted that fragment via the f4f module first, and saved to the server’s html dir.
For through-HDS scenarios, I requested e.g. Seg1-Frag1, which was the same fragment, but extracted on the fly by HDS. Granted the f4f segment was 15M and contained 5 fragments — something I should change to have a totally fair comparison.
The 7000 copies are just duplicates of these files, in separate subdirectories; this is simply to simulate 7000 potential unique streams/pieces of content.
1. Tuning Apache: yes, this is needed for accurate Apache vs. Nginx comparisons, but mainly I'm after the overhead of the HDS module.
Still, I did some adjustment of MaxClients, but found 256 to be around optimal.
2. I should have described the content: it was a 3.2MB fragment.
3. Initial runs were made from my local box, over a gigabit connection to a box physically connected to the same switch as me.
But again, network issues shouldn't matter as much, since I'm really after a relative comparison of the HDS module vs. without.
Later tests (see below) were done locally, though, with similar trends.
a. Makes sense; so you'd have more origins to spread the load.
b. By prepackaging the content, you mean pre-extracting fragments? I don't think priming the edges is feasible for a large library (which is what I was aiming to simulate with the 7000 copies).
c. Yes, CDN architecture is another topic though; this is just about the performance of an origin running the HDS module.
So, adjusting:
- MaxClients to 256 (I tried 256, 512 and 1024; 256 was the best for this machine)
- reduced the number of concurrent requests to just 20
- ran the test 10 times per batch (instead of once)
- ran 5 batches for each setup, averaging the results
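In siege terms, each batch of that re-run was along these lines (assuming the 10 repetitions map to siege's -r flag; same URL file as before):

siege -f ./urls-7000.txt -i -b -r 10 -c 20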
Summary of results:
- In all cases, 100% availability (as you would hope)
- txn/sec for the HDS module, straight Apache, and straight Nginx were, in order: 14.6, 15.5 and 18.7
My conclusion is that the module still induces overhead, as would be expected. It's not huge (in this lightweight scenario ca. 5.8%, i.e. 14.6 vs. 15.5 txn/sec; at concurrency 100, ca. 17%), but it's still considerable, and something I would think an architect would count as a risk at scale.
Losing the module and running straight Nginx (e.g. something you could do with your suggestion of pre-extraction, such as the extractor from John Crosby of RealEyes referred to, but not available, on http://www.thekuroko.com/) would seem to give an improvement in the vicinity of 28% (at the low concurrency of 20) to 41% (at concurrency 100).
Again, and I guess this is the main outcome, confirming my suspicions: dumping the HDS module for straight Nginx would, to be conservative, give an expected speedup somewhere between 28% and 41%.