YouTube has recently changed the way it delivers video content. Until last year it relied heavily on HTTP progressive download (fetching a video file with a single request, just like downloading any data file). This year, however, YouTube has made a number of changes to improve video delivery, such as switching to chunking and adaptive streaming. This post focuses on the switch to chunking.
1. Conventional Video Delivery of YouTube
Before April 2013, YouTube videos were requested and delivered as follows:
Please note that only downloading to PCs will be discussed in this post.
YouTube provides videos in many different resolutions (1080p, 720p, 480p, 360p, 240p). With the conventional delivery method, only 360p videos were requested and delivered in chunks; videos in every other resolution were downloaded as one whole file with a single request (HTTP GET).
For a detailed description, please see our technical document "Analysis of YouTube video request and delivery (360p, 720p)".
2. What's In and What's Out? (Progressive Download is out!)
One big change concerns the files themselves. Until last year, each resolution (1080p~240p) was delivered as a single file containing both video and audio, so users could see the video and hear the audio simply by playing that file. Since April 2013, video and audio have been kept in separate files, so the device has to download both of them separately.
The most important change, however, is in the way content is delivered: from HTTP progressive download to chunking. Here, chunks are not real chunks, i.e. pieces split out and kept on the server as separate files (1.ts, 2.ts, 3.ts, ...) as in HLS (Apple's HTTP Live Streaming). They are virtual chunks: individual parts of a single video file that a device (PC) downloads one by one, as needed, by issuing range requests (range=0-4227071, range=4227072-8454143, ...). YouTube therefore needs to keep only one video file per resolution, which makes the files easier to manage.
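As a rough illustration of virtual chunks, the sketch below splits one file into consecutive byte ranges like those quoted above. The chunk size of 4,227,072 bytes is inferred from the two example ranges; the function names and the 10,000,000-byte file size are hypothetical, and the code only computes the ranges without contacting any server.

```python
# Chunk size inferred from the example ranges in the text
# (0-4227071, 4227072-8454143); treat it as an assumption.
CHUNK_SIZE = 4227072


def chunk_ranges(file_size, chunk_size=CHUNK_SIZE):
    """Yield inclusive (start, end) byte offsets that cover the whole file."""
    for start in range(0, file_size, chunk_size):
        # The last chunk may be shorter than chunk_size.
        end = min(start + chunk_size, file_size) - 1
        yield start, end


def format_range(start, end):
    """Format a range the way it is written in the text (hypothetical helper)."""
    return f"range={start}-{end}"


if __name__ == "__main__":
    # Hypothetical file size, chosen only to show a short final chunk.
    for start, end in chunk_ranges(10_000_000):
        print(format_range(start, end))
```

Because every chunk is just a pair of offsets into the same file, the server does not need to store any pre-split pieces; the device decides which range to ask for next.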
The figure below shows how the download pattern changed for a video watched in 720p resolution. Before April 2013, as seen in [BEFORE], the whole 88.8 MB file was downloaded with a single request (HTTP GET). In our measurement, the Gangnam Style video file, 4 minutes and 12 seconds long, was fully downloaded in 40 seconds. This means the whole file ends up on the PC even if the user stops watching and leaves YouTube after only 40 seconds. From the figure, we can also see that i) the YouTube server worked hard, for nothing, to deliver the unwatched part of the video (from the 41st second to the end) to the PC, and ii) it delivered the initial part of the video at its maximum rate and then, once a certain amount had been delivered, the rest at a lower rate by pacing (throttling) the delivery. Both i) and ii) put a high load on the YouTube server.
[AFTER] shows the new pattern of requesting and delivering content. The device made back-to-back requests for the first 3~4 chunks to fill its receive buffer, and then requested one more chunk about every 20 seconds. If the user leaves without finishing the video, no further chunks are requested, so only about as much as was actually watched is downloaded. As a result, the load on the YouTube server is significantly reduced.
Changes in YouTube streaming in the particular resolution selected by user
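To get a feel for the difference between the two patterns, here is a minimal back-of-the-envelope model using the measured figures (88.8 MB, 4 min 12 s, 40 s download time). It rests on several simplifying assumptions that are not in the measurement: progressive download is treated as running at a constant rate (pacing is ignored), the device is assumed to fetch 4 chunks up front, and each chunk is assumed to hold about 20 seconds of video because one chunk is requested about every 20 seconds. The function names are hypothetical.

```python
FILE_SIZE_MB = 88.8       # measured size of the 720p file
DURATION_S = 252          # 4 minutes 12 seconds
DOWNLOAD_TIME_S = 40      # measured time to download the whole file
INITIAL_CHUNKS = 4        # assumption: upper end of the "3~4" initial chunks
CHUNK_INTERVAL_S = 20     # one further chunk about every 20 seconds

# Assumption: in steady state a chunk is consumed in the time between
# requests, so each chunk carries roughly 20 seconds of video.
CHUNK_SIZE_MB = FILE_SIZE_MB / DURATION_S * CHUNK_INTERVAL_S


def progressive_mb(watch_s):
    """MB delivered if the user leaves after watch_s seconds (constant rate)."""
    return min(FILE_SIZE_MB, FILE_SIZE_MB * watch_s / DOWNLOAD_TIME_S)


def chunked_mb(watch_s):
    """MB delivered with an initial burst plus one chunk per interval."""
    chunks = INITIAL_CHUNKS + int(watch_s // CHUNK_INTERVAL_S)
    return min(FILE_SIZE_MB, chunks * CHUNK_SIZE_MB)


if __name__ == "__main__":
    for t in (20, 60, 120, 252):
        print(f"{t:>3} s watched: progressive {progressive_mb(t):5.1f} MB, "
              f"chunked {chunked_mb(t):5.1f} MB")
```

Under these assumptions, a user who leaves after one minute has already received the entire file with progressive download, but only a little more than half of it with chunking. The exact numbers depend on the real chunk sizes and buffer policy, which this sketch does not know.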
Why did YouTube switch?
With the conventional HTTP progressive download method, a file can only be downloaded as a whole. As seen in the figure above, the entire file, 4 minutes and 12 seconds long, is downloaded within 40 seconds. So even when the user leaves after watching only the first minute, the YouTube server has already delivered the entire file.
In other words, the YouTube server delivers, in vain, even the part the user will never watch to the device (PC). With the new method, on the other hand, a device requests chunks only while the user is still watching the video. As a result, the load on the server is significantly reduced, making it possible for YouTube to serve more users with fewer servers.
Cutting this unwatched traffic will probably also go some way toward addressing telecom operators' complaints about YouTube's growing free-riding traffic.
In short, by moving the intelligence that used to sit in the YouTube server into the devices and leaving the server dumb, YouTube has converted its video delivery logic into one that allows less costly CDN expansion.
I've not read on this concept of chunking and it's an interesting topic. With regards to LTE mobile operators, is chunking implemented as well? Or is it just adaptive streaming?