What I know about HTTP Adaptive Video Streaming
Originally published 1st December, 2011
For my current project, I’ve been looking at video streaming over HTTP, aka adaptive streaming. It’s the way of the future, etc.
(actually some big CDNs are abandoning RTMP support, to support only http adaptive — you have been warned)
The basic concept is: you take one video file, split it into small segments (typically 1–5 MB each) stored as multiple smaller files, and make an index of the segments.
When the client wants to play back, it gets the index, and based on where in the timeline it wants to play, gets the appropriate segments from the server.
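That client-side lookup can be sketched in a few lines. The function name and the index layout here are illustrative, not from any particular spec:

```python
# Sketch of the client-side logic, assuming a parsed index of
# (duration_seconds, url) pairs. Names are made up for illustration.

def segment_for_time(index, seek_seconds):
    """Return the (url, offset_into_segment) to start playback from."""
    elapsed = 0.0
    for duration, url in index:
        if seek_seconds < elapsed + duration:
            return url, seek_seconds - elapsed
        elapsed += duration
    return None, 0.0  # seek position is past the end of the stream

index = [(3.0, "tintin-1.ts"), (3.0, "tintin-2.ts"), (3.0, "tintin-3.ts")]
print(segment_for_time(index, 4.5))  # -> ('tintin-2.ts', 1.5)
```

Seeking, start-up, and resuming after a stall are all just this same lookup followed by plain HTTP GETs.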
I must admit, when I started learning about it I thought that whole segmenting thing was
- Unnecessarily complex
- Just introduced an extra step and point of failure in the content preparation phase, and
- Made assets harder to manage (split as they are over multiple files)
However my current thoughts on such criticisms are
- Complexity is actually not too bad — architecturally it’s pretty straightforward (though implementations differ in this — more later), and implementations in C using ffmpeg and libav are <1kloc and run reasonably quickly — though I’m still not sure I’d want to do it on-the-fly for many concurrent streams
- The step is quite painless and simple, so potential for failure is not great
- That’s what subdirectories are for
The main nice things about the approach are
- Quite simple
- Very well-suited to CDNs & edge caching (this is the biggest)
- Servers can be relatively stupid, simple and fast (but more on that later)
- Works over existing protocol; anyone familiar with HTTP knows how the transport works
- HTTP on port 80 means it gets through most firewalls without hassle, unlike e.g. rtmp
- Fast scrubbing — when you move the position slider, the player just looks up the index, and starts requesting the relevant segments
- Client-side caching — the segments are just files the client retrieved and are easily cached
- The index gives flexibility, such as having one stream/filegroup for
  - target devices — i.e. encodings targeted at specific playback platforms
  - bandwidth conditions (this is the “adaptive” part — the ability for the player to switch between streams depending on network connectivity)
- If you were Apple, you would add “does not require Flash” — more on this later
(yes I know you could do some of these with RTMP, RTP etc, it’s not “what RTMP can’t do”, just positives of the approach)
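The “adaptive” part boils down to a simple policy loop in the player: measure throughput while downloading segments, then pick the variant stream that fits. A toy sketch, where the variant list (assumed sorted low to high) and the 0.8 safety factor are invented for illustration:

```python
# Toy "adaptive" selection: pick the highest-bitrate variant that fits
# the measured throughput, with some headroom. The variant list and the
# 0.8 headroom factor are illustrative assumptions, not from any spec.

VARIANTS = [  # (bitrate_bits_per_sec, playlist_url), sorted ascending
    (400_000,   "low/index.m3u8"),
    (1_200_000, "mid/index.m3u8"),
    (3_500_000, "high/index.m3u8"),
]

def choose_variant(measured_bps, variants=VARIANTS, headroom=0.8):
    """Highest bitrate not exceeding headroom * measured throughput."""
    usable = measured_bps * headroom
    best = variants[0]  # always fall back to the lowest rung
    for bitrate, url in variants:
        if bitrate <= usable:
            best = (bitrate, url)
    return best

print(choose_variant(2_000_000))  # -> (1200000, 'mid/index.m3u8')
```

Real players smooth the throughput estimate and watch their buffer level too, but the segment boundaries are what make the switch seamless: the player just requests the next segment from a different variant’s index.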
When you get to actual implementations, it starts to get less pretty; unsurprisingly, Adobe, Microsoft, and Apple each have their own mutually exclusive implementation. Well done, everyone; way to play together.
Apple — “HTTP Live Streaming” aka HLS
From what I can tell, Apple have been the main proponents of the “http adaptive streaming” concept, as a way to get long-format streaming video on their devices (and the web) without Adobe and Flash.
Their devices and browser support native playback of HLS — so you can make an HTML page with a video tag, src=index.m3u8, and it plays natively in Safari, and QuickTime can natively play index.m3u8.
Many people come to HLS because they need to support it in order to stream to Apple devices, and for them it’s a bit of a pain to have to jump through this shiny and well-designed hoop which Steve has obstinately placed in their path.
I actually came to it willingly, from the direction of just looking for a good streaming protocol and finding it to be the best. Because I’ve decided to use HLS for this project, it’s the implementation I know (hence will write) the most about.
Overview of HLS
The video stream is segmented into multiple .ts (transport stream) files
These are indexed in a .m3u8 playlist, a brief sample of which might be:
#EXTM3U
#EXT-X-TARGETDURATION:3
#EXTINF:3,
tintin-1.ts
#EXTINF:3,
tintin-2.ts
#EXT-X-ENDLIST
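The format is simple enough that a minimal parser for a flat media playlist like the one above fits in a dozen lines. This is only a sketch — a real parser has many more tags and edge cases to handle:

```python
# Minimal parser for a flat HLS media playlist like the sample above.
# Handles only #EXTINF and plain URI lines; everything else is skipped.

def parse_m3u8(text):
    segments = []           # list of (duration_seconds, uri)
    pending_duration = None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("#EXTINF:"):
            # "#EXTINF:3," -> the duration is the part before the comma
            pending_duration = float(line[len("#EXTINF:"):].split(",")[0])
        elif line and not line.startswith("#"):
            segments.append((pending_duration, line))
            pending_duration = None
    return segments

playlist = """#EXTM3U
#EXT-X-TARGETDURATION:3
#EXTINF:3,
tintin-1.ts
#EXTINF:3,
tintin-2.ts
#EXT-X-ENDLIST"""
print(parse_m3u8(playlist))  # -> [(3.0, 'tintin-1.ts'), (3.0, 'tintin-2.ts')]
```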
Segments are optionally encrypted with 128-bit AES, in which case you put an “EXT-X-KEY” tag above the segments it covers, specifying where to get the key for decryption. There’s a very useful thread on encrypting ts for HLS using OpenSSL — the final solution as posted by Barry O worked well for me.
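For illustration, here is the sample playlist from above with encryption turned on. The key URI is hypothetical; per the draft spec, the tag applies to the segments that follow it, and if no IV attribute is given the segment’s sequence number is used as the IV:

```
#EXTM3U
#EXT-X-TARGETDURATION:3
#EXT-X-KEY:METHOD=AES-128,URI="https://example.com/keys/tintin.key"
#EXTINF:3,
tintin-1.ts
#EXTINF:3,
tintin-2.ts
#EXT-X-ENDLIST
```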
Things I like about the implementation
- Lots of well-written doco: http://developer.apple.com/resources/http-streaming/
- Proposed as a standard to the IETF: http://tools.ietf.org/html/draft-pantos-http-live-streaming-07
- Architecturally the simplest (compared to Adobe — see later)
  - Indexes are 100% plaintext .m3u8 playlists (extended .m3u)
  - One segment per file
- Good open source tools & support (though not from Apple, grr)
- Not tied to any one platform
- Servers don’t need to do anything except what they do best — serve files over HTTP; there is ZERO processing overhead — very important for those considering scalability
- Based on proven, open, industry-standard technology — plaintext, SSL, AES
Segmenting for HLS
Apple provide only a binary for OSX, which is a bit disappointing; if they really want to encourage adoption they should certainly go open source.
The stream segmenter is preinstalled on many Macs; try just typing “mediastreamsegmenter” in your console. Note the stream segmenter is targeted at on-the-fly or live streams over UDP, not static files. To get a segmenter meant to operate on files, you need an Apple developer account; unfortunately, when I installed Xcode from the CDs that came with my MacBook, I didn’t get the file segmenter. More info about Apple’s segmenter.
Thankfully, individuals have written and distributed open source alternatives, which I’ve tested, and they work well.
The first I found was this one by a guy called Carson McDonald, which he writes about here (there’s a link to his segmenter on github in that post). It didn’t work for me out of the box, as it was written against a previous version of ffmpeg and libav; I had to make some minor changes to get it to compile against the latest versions, and my fork of this is here.
In general, as I said, it works, but I think Carson hasn’t had a whole lot of time to update it, and there don’t seem to be many people working on it.
Note his work also provides ruby code intended to wrap the segmenter (which is written in c) and do some extra things he obviously needed to do — automate distribution/pushing of the output files over ssh/ftp, store presets for this in a config yaml.
The second, which I found more recently is on google code here.
The guy that wrote that took some code (with attribution) from Carson’s work and built on it.
There seem to be more people working on it — commits from people from Ooyala and Sorenson (which would lead me to believe that they are using it), and (at least as at 2011–12–01) it compiles against the latest version of ffmpeg and libav.
Playback of HLS
As I mentioned, Apple software can play HLS natively, and it does it very nicely.
However for my project, I needed to play it back within Flash.
For a long time this was a challenge. Obviously Adobe only want to push their own implementation, so they have only added support for that to their Open Source Media Framework (OSMF).
Ex-Adobe employee (and now director of Skype) Matthew Kaufman has written code to play back HLS in OSMF. Unfortunately, it’s only Matthew doing this (with one other minor committer), and obviously he can’t devote heaps of time to it — though he informs me he might work on it over the coming holidays, so it’s far from dead.
Because of this, when I checked it out, it didn’t compile against the latest version of OSMF; they have recently refactored this area of the framework (things like event handling and class abstraction). With some hacking I did manage to get the current version of Matthew’s code to compile against the latest OSMF, but playback was audio-only, with long pauses.
I’ll be honest, I’m no ActionScript developer, and the prospect of making this work 100% was slightly daunting (and in my role, not something I should be focusing much time on anyway). So I have had some devs look at it, but it’s nontrivial stuff, and tbh most average coders aren’t up to it; you need someone who really knows their stuff to get that working (have since made contact with a couple, and they’re understandably in high demand).
The breakthrough in HLS playback with Flash came when I found this development branch of jwplayer implementing HLS. There are sample tests of this code here, and all except the first worked well with one click for me. I admit, I was quite excited to open up Wireshark and see the player working perfectly as advertised.
According to this thread in which JW himself is participating, there are still some issues to iron out, but I’ve downloaded, compiled and tested this and it’s looking pretty good.
It also doesn’t support the segment encryption yet, but tbh that shouldn’t be too difficult to implement with as3crypto.
It should be noted though, this adaptive playback in JW Player is only in development — further testing shows some medium-level issues with blockiness, see my posts ITT
Criticisms of HLS
[Edit — I notice Wowza provide some licence functionality for HLS which I’ll definitely have to check out]
Probably the strongest criticism of HLS, especially in a commercial context with copyrighted content, is lack of built-in DRM; whereas Adobe and Microsoft’s implementations specify out-of-the-box licence servers which authenticate users and securely provide decryption keys, HLS does not. Which is ok if you’re up for rolling your own (as we are), but for those businesses more on the “consumer” side of things, without such technical skill, and desiring the comfort of letting someone else deal with all that, I can see this being a problem.
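“Rolling your own” here mostly means putting something in front of the EXT-X-KEY URI that checks who’s asking before handing out the 16-byte AES key. A sketch of the server-side check, where the token scheme (an HMAC over the content id with a shared secret) and all names are invented for illustration:

```python
# Sketch of DIY key delivery for HLS. The player fetches the EXT-X-KEY
# URI with some proof of identity; the server releases the 16-byte AES
# key only if that checks out. The HMAC token scheme is an illustrative
# assumption, not part of HLS itself.

import hashlib
import hmac

SECRET = b"server-side-secret"        # hypothetical shared secret
KEYS = {"tintin": bytes(range(16))}   # content id -> 16-byte AES key

def make_token(content_id):
    """What the auth layer would issue to a logged-in user."""
    return hmac.new(SECRET, content_id.encode(), hashlib.sha256).hexdigest()

def serve_key(content_id, token):
    """Return the decryption key, or None if the token doesn't verify."""
    expected = make_token(content_id)
    if not hmac.compare_digest(expected, token):
        return None
    return KEYS.get(content_id)

good = serve_key("tintin", make_token("tintin"))
bad = serve_key("tintin", "forged")
print(len(good), bad)  # -> 16 None
```

In production you’d serve this over HTTPS and tie the token to a session, but the shape of the problem is exactly this small.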
Just on DRM — it has been interesting following content owners’ and distributors’ shifting focus away from draconian DRM; gone are the days of “you can only play this back on one authorized device, and if anything in the chain gets broken, you’re screwed”. I guess with sites like thepiratebay just a click away, the advantages of restrictive DRM soon become relatively small compared to the inconvenience to consumers (which results in ill will and lost revenue). My impression is that content owners are more likely to prefer an approach which makes it very difficult for your average consumer to rip content, but does not inconvenience your honest consumer at all. Something like this guy talks about.
Finally xkcd has some very good comics on DRM here, here and here.
Adobe — “HTTP Dynamic Streaming” aka HDS
Adobe introduced new functionality to Flash to support HTTP adaptive streaming in Flash 10.1, things like:
- NetStream.play2 — method to seamlessly switch between video streams
- NetStream.appendBytes — method to append raw bytes to video pipeline
Adobe’s page on HDS is here.
The protocol is reasonably documented (though not as well as HLS), with some incomplete doco about their file formats here. I say incomplete because I couldn’t find anything about the .f4x format.
Key Points
Perhaps to be fleshed out when I get time.
In general: for a commercial solution provided by a big company which makes money from selling you the products to implement it, I found the out-of-the-box workiness and the support (forums) disappointing.
Relies on free but closed-source tools; not nearly as many open source alternatives as there are for HLS.
There is an extra level of abstraction in segmentation
In HDS the original media is broken into .f4v “segments” (one per file), and segments contain multiple “fragments”, where fragments are the fundamental media atoms requested by players.
This is potentially confusing when compared to HLS, where the fundamental media atom as requested by players is a “segment”, and there’s only one per .ts file
Each segment has its own index (.f4x), which tells the server where in the segment to get a certain fragment. This file is binary and tbh I couldn’t find specs for it.
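The two-level scheme itself is just arithmetic, once you know how fragments are grouped. A hedged sketch, assuming a fixed number of fragments per segment (the real .f4x indexes are binary and, as noted, undocumented, so the grouping below is a simplifying assumption):

```python
# Sketch of the HDS two-level abstraction: players request fragments,
# but fragments live grouped inside segment files. Assuming a fixed
# frags_per_segment (a simplification -- real .f4x indexes are binary
# and can vary), mapping a fragment to its segment is integer arithmetic.

def locate_fragment(frag_num, frags_per_segment=4):
    """Map a 1-based global fragment number to
    (segment file number, 1-based fragment index within that segment)."""
    seg = (frag_num - 1) // frags_per_segment + 1
    local = (frag_num - 1) % frags_per_segment + 1
    return seg, local

print(locate_fragment(6))  # -> (2, 2): fragment 6 is the 2nd in segment 2
```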
The primary index for a stream is a “manifest” (.f4m)
This is an xml file, where the important stuff (i.e. fragments) is a base64-encoded serialized AMF. This is very human-unreadable, and anything creating/maintaining these files will need to deal with it. As an example, a simple f4m is:
<?xml version="1.0" encoding="UTF-8"?>
<manifest xmlns="http://ns.adobe.com/f4m/1.0">
<id>
tintin-crop
</id>
<streamType>
recorded
</streamType>
<duration>
31.021999999999998
</duration>
<bootstrapInfo
profile="named"
id="bootstrap5634"
>
AAAA5GFic3QAAAAAAAAABwAAAAPoAAAAAAAAeRYAAAAAAAAAAAAAAAAAAQAAACFhc3J0AAAAAAAAAAACAAAAAQAAAAQAAAACAAAAAwEAAACXYWZydAAAAAAAAAPoAAAAAAgAAAABAAAAAAAAAAAAABRQAAAAAgAAAAAAABRdAAAOEAAAAAMAAAAAAAAiigAADIAAAAAEAAAAAAAALuAAAAAAAQAAAAUAAAAAAABGiwAAElwAAAAGAAAAAAAAWPQAAAdsAAAABwAAAAAAAGBzAAAYnAAAAAAAAAAAAAAAAAAAAAAA
</bootstrapInfo>
<media
streamId="tintin-crop"
url="tintin-crop"
bootstrapInfoId="bootstrap5634"
>
<metadata>
AgAKb25NZXRhRGF0YQgAAAAAAAhkdXJhdGlvbgBAPwWhysCDEgAFd2lkdGgAQJ4AAAAAAAAABmhlaWdodABAiYAAAAAAAAAMdmlkZW9jb2RlY2lkAgAEYXZjMQAMYXVkaW9jb2RlY2lkAgAEbXA0YQAKYXZjcHJvZmlsZQBAWQAAAAAAAAAIYXZjbGV2ZWwAQEQAAAAAAAAADnZpZGVvZnJhbWVyYXRlAEA4AfuKCWrQAA9hdWRpb3NhbXBsZXJhdGUAQOWIgAAAAAAADWF1ZGlvY2hhbm5lbHMAQAAAAAAAAAAACXRyYWNraW5mbwoAAAACAwAGbGVuZ3RoAEFFR3YAAAAAAAl0aW1lc2NhbGUAQPX5AAAAAAAACGxhbmd1YWdlAgADdW5kAAAJAwAGbGVuZ3RoAEE04AYAAAAAAAl0aW1lc2NhbGUAQOWIgAAAAAAACGxhbmd1YWdlAgADZW5nAAAJAAAJ
</metadata>
</media>
</manifest>
I’m probably overly critical about this non-human-readability; any decent coder who can base64 encode/decode then read AMF in their head should have no problems editing these files with their favourite text editor (vim). It’s probably really neat once you get the hang of that.
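You can at least verify from Python that the blob really is base64 over AMF; decoding the first bytes of the manifest’s metadata field shows the AMF0 string marker (0x02), a two-byte length (10), and the literal “onMetaData”:

```python
# Decode the first bytes of the <metadata> blob from the manifest above.
# 0x02 is the AMF0 string type marker, 0x000A the string length (10),
# followed by the string "onMetaData".

import base64

blob = "AgAKb25NZXRhRGF0YQ=="   # first characters of <metadata>, padded
raw = base64.b64decode(blob)
print(raw)  # -> b'\x02\x00\nonMetaData'
```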
Segmenting for HDS
AFAIK no open source version.
Adobe provide binaries for Windows and Linux.
In my trials (and those of others on the Adobe forums) — their segmenters are picky about the input file’s format/keyframes/GOP and can silently produce incorrect output if these aren’t right.
HDS Playback
- Flash — currently OSMF only; however, this means that using AIR it can play back on most big Smart TVs
The server requires Apache with Adobe’s closed source f4fhttp module installed.
This module is required in order for incoming player requests for fragments (which, as you recall, are grouped into segment files) to be translated into segment and byte offsets, then extracted from the files and sent to the client.
This means that your origin server — the core bottleneck at the heart of your CDN — must run Apache, and must do this work, for the entire network, for every fragment requested.
This CPU and I/O work (and the impact on memory) required of the origin is one thing I really dislike about HDS; it’s inefficient and there’s absolutely no way it will scale as well as HLS.
A workaround which a couple of smart guys have suggested, is to pre-extract the fragments and just store them on the origin. If you do this then you can eliminate the module and the web server can just do what it does best. Wow just like HLS. Except for the stupid fact your content preparation pipeline is now the silly-walk of: encode -> segment -> unsegment.
DRM with Flash Access
Adobe’s doco on Flash Access is here
Tbh I think it’s more complex than it has to be, but it obviously works.
The short version is
- Java (Tomcat)
- Media is encrypted with 128-bit AES
- The key for this is called the “Content Encryption Key” (CEK); this is encrypted with the Licence Server’s public key and stored with the content
- Player must first authenticate with the Licence Server, whereupon the Licence Server
  - retrieves the CEK
  - decrypts the CEK
  - encrypts the CEK with the player’s public key
  - sends this to the player
- Player can now decrypt the CEK, hence the media
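That key hand-off is a classic envelope pattern, and the steps above can be sketched with textbook RSA on tiny primes. To be clear: this is nothing like production crypto (no padding, toy key sizes, keys invented for the example); it only illustrates the flow:

```python
# Toy sketch of the CEK envelope flow: encrypt the CEK to the licence
# server at packaging time, re-encrypt it to the player at licence time.
# Textbook RSA with tiny primes -- purely illustrative, not usable crypto.

def make_keypair(p, q, e):
    n = p * q
    d = pow(e, -1, (p - 1) * (q - 1))   # modular inverse (Python 3.8+)
    return (e, n), (d, n)               # (public, private)

server_pub, server_priv = make_keypair(61, 53, 17)
player_pub, player_priv = make_keypair(89, 97, 5)

cek = 42  # the "Content Encryption Key", as a small number for the toy

# Packaging time: CEK encrypted with the licence server's public key,
# stored alongside the content
stored = pow(cek, *server_pub)

# Licence time: server decrypts the CEK, re-encrypts it for the
# now-authenticated player
recovered = pow(stored, *server_priv)
for_player = pow(recovered, *player_pub)

# Player side: decrypt with its own private key, then decrypt the media
print("player recovered CEK:", pow(for_player, *player_priv))  # -> 42
```

The point of the envelope is that the CEK never travels or rests in the clear; only the two private keys can unwrap it at their respective ends.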
Microsoft — “Smooth Streaming”
- It’s Microsoft
- So unfortunately it is cancer, and you need to run Windows and IIS and start playing Halo
- Which means I have only cursorily investigated, but really it was never on the table
- Looks ok, though seems a similar level of complexity to Adobe, very DRM-oriented
- Netflix use it
- Only player implementation is currently in Silverlight