This tutorial gives an overview of the MP4 and QuickTime media file formats, highlighting the most important format fields as they appear when a file is analyzed with AtomBox Studio.
MP4 and QuickTime are among the most popular media file formats, used not only in file-based media solutions but in live streaming workflows as well. In most cases, when common audio and video compression formats are used, MP4 solutions are compatible with QuickTime files and vice versa.
Starting the AtomBox Studio application and opening an MP4 media file
MP4 files are built out of elements called boxes. A box in MP4 corresponds to an atom in QuickTime. Both elements have the same structure and are used the same way in MP4 and QuickTime, so we'll refer to them as atoms. In the AtomBox Studio window, the overall atom structure is shown in the upper-left corner. Each atom is listed with its structural position, file offset, name and size.
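The offset, name and size shown for each atom can also be read directly from the file. The following is a minimal Python sketch, not AtomBox Studio's own method. It assumes the standard atom header layout (a 32-bit big-endian size followed by a four-character type, with size 1 signalling a 64-bit size after the type and size 0 meaning "to the end of the enclosing data"). The function name iter_atoms is hypothetical.

```python
import struct

def iter_atoms(data, start=0, end=None):
    """Yield (offset, name, size) for each atom found in data[start:end]."""
    end = len(data) if end is None else end
    pos = start
    while pos + 8 <= end:
        # Basic header: 32-bit big-endian size, then the 4-character type.
        size, name = struct.unpack_from(">I4s", data, pos)
        header = 8
        if size == 1:
            # A 64-bit size follows the type field.
            if pos + 16 > end:
                break
            size = struct.unpack_from(">Q", data, pos + 8)[0]
            header = 16
        elif size == 0:
            # The atom extends to the end of the enclosing data.
            size = end - pos
        if size < header or pos + size > end:
            break  # malformed or truncated atom, stop walking
        yield pos, name.decode("latin-1"), size
        pos += size
```

Calling iter_atoms on the bytes of a whole file lists the root-level atoms. Calling it again on the byte range of a container atom (from its offset plus the header length up to its offset plus its size) lists the atoms nested inside it.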
Two important atoms are located at the root level. ‘moov’ contains all the multiplex information, such as the number of tracks, the media description of each track, the number of frames in each track and, in general, everything that describes the media carried in the file. ‘mdat’ holds the actual media samples, multiplexed (interleaved) in one binary block.
The ‘moov’ atom is built out of several sub-atoms. The first mandatory atom is ‘mvhd’ (Movie Header), which contains general information about the media file, such as its creation time and duration. The duration is presented with two values: the time scale, denoting the number of units per second, and the duration expressed in time scale units. The duration of the media file in seconds is therefore the duration divided by the time scale.
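As a worked example of that division, here is a small Python sketch that reads the two values from the contents of a Movie Header atom. It assumes the usual field layout (a version byte and three flag bytes, creation and modification times, then time scale and duration, with 64-bit times and duration when the version is 1). The function name mvhd_duration is hypothetical.

```python
import struct

def mvhd_duration(payload):
    """Return (time_scale, duration, seconds) from the Movie Header contents.

    `payload` is the atom data that follows the 8-byte size/type header.
    """
    version = payload[0]
    if version == 1:
        # 4 bytes version/flags + two 64-bit timestamps = offset 20
        time_scale, duration = struct.unpack_from(">IQ", payload, 20)
    else:
        # 4 bytes version/flags + two 32-bit timestamps = offset 12
        time_scale, duration = struct.unpack_from(">II", payload, 12)
    if time_scale == 0:
        raise ValueError("time scale must be non-zero")
    return time_scale, duration, duration / time_scale
```

For instance, a time scale of 600 and a duration of 3000 units correspond to 3000 / 600 = 5 seconds.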
Each media file carries one or more tracks with audio, video and other data. Each track is described in a ‘trak’ atom, located inside the ‘moov’ atom. The first atom in the ‘trak’ atom is named ‘tkhd’ (Track Header).
Some of the important fields in this atom are the unique track ID identifying the track, the track duration in time scale units, the video resolution for a video track and the audio volume for an audio track. The second important atom in the ‘trak’ atom is called ‘mdia’ (Media); it contains more detailed information about the media carried in the track. The ‘mdhd’ (Media Header) atom is the first atom contained in ‘mdia’ and holds the exact duration of the track, again represented with a time scale and a duration.
‘hdlr’ (Handler Reference) is the second informational atom and denotes the type of the track. For a video track, the ComponentSubType field in the ‘hdlr’ atom carries the value ‘vide’. For an audio track it carries ‘soun’, and for a timecode track it carries ‘tmcd’.
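The track type check described above can be sketched in a few lines of Python. This assumes the ComponentSubType is the four-character code at byte offset 8 of the Handler Reference atom contents (after the version/flags and the component type). The names TRACK_KINDS and track_kind are hypothetical.

```python
# Four-character codes named in the text, mapped to readable labels.
TRACK_KINDS = {b"vide": "video", b"soun": "audio", b"tmcd": "timecode"}

def track_kind(payload):
    """Classify a track from the Handler Reference atom contents.

    `payload` is the atom data that follows the 8-byte size/type header.
    """
    subtype = bytes(payload[8:12])
    # Fall back to the raw code for handler types not listed above.
    return TRACK_KINDS.get(subtype, subtype.decode("latin-1"))
```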
The ‘minf’ (Media Information) atom holds further descriptive atoms for the samples carried in the track. The most important of them is ‘stbl’ (Sample Table).
The first atom that any solution accessing the media track looks for is ‘stsd’ (Sample Description), inside ‘stbl’. The ‘stsd’ atom describes the stream carried in the track. For a video track it holds the resolution of the video, the compression format and additional compression information, such as the SPS and PPS for H.264 video streams, carried in separate sub-atoms at the end of the ‘stsd’ atom.
For an audio track, the ‘stsd’ atom gives the audio sample rate, number of channels and sample size, as well as additional audio compression information, carried in separate atoms at the end of the ‘stsd’ atom.
The ‘stts’ (Time-To-Sample) atom contains the duration of each sample in the track. It provides a mechanism for synchronizing multiple tracks and keeping lip sync in case of media interruptions. Consecutive samples with the same duration can be grouped and represented with a single entry in the ‘stts’ table.
The duration of the samples is represented in time scale units.
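To illustrate how the grouped table expands into per-sample timing, here is a short Python sketch. It assumes the usual Time-To-Sample layout: version/flags, a 32-bit entry count, then pairs of (sample count, sample duration). The function names are hypothetical, and the time scale is expected to come from the track's Media Header.

```python
import struct

def parse_stts(payload):
    """Return the (sample_count, sample_duration) entries of the table."""
    (entry_count,) = struct.unpack_from(">I", payload, 4)
    return [struct.unpack_from(">II", payload, 8 + 8 * i)
            for i in range(entry_count)]

def sample_start_times(entries, time_scale):
    """Expand the grouped entries into a start time in seconds per sample."""
    times = []
    elapsed = 0  # running total in time scale units
    for sample_count, duration in entries:
        for _ in range(sample_count):
            times.append(elapsed / time_scale)
            elapsed += duration
    return times
```

A constant-frame-rate video track typically needs only one entry here, since every sample has the same duration.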
The next descriptive atom in ‘stbl’ is ‘stsz’ (Sample Size), which holds the number of samples in the track and the size of each sample in bytes.
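A minimal Python sketch of reading this table follows. It assumes the usual Sample Size layout: version/flags, a 32-bit uniform sample size, a 32-bit sample count, and a per-sample size table that is present only when the uniform size is zero. The function name parse_stsz is hypothetical.

```python
import struct

def parse_stsz(payload):
    """Return a list with the size in bytes of every sample in the track."""
    uniform_size, sample_count = struct.unpack_from(">II", payload, 4)
    if uniform_size:
        # All samples share one size, so no per-sample table is stored.
        return [uniform_size] * sample_count
    return list(struct.unpack_from(f">{sample_count}I", payload, 12))
```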
‘stsc’ (Sample-To-Chunk) maps each sample to a group of samples called a chunk. Samples are usually not stored individually in the ‘mdat’ atom but grouped into chunks.
The ‘stsc’ atom holds an entry count and a table. Each table entry describes a run of consecutive chunks: the number of the first chunk in the run, the number of samples in each chunk of the run and the sample description ID used by the run. This mechanism provides a way to change the media format of the track in the middle of the stream.
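Because each table entry covers a run of chunks, a reader has to expand the table before it can address individual chunks. The Python sketch below shows one way to do that. It assumes chunk numbers start at 1, that each run lasts until the next entry's first chunk, and that the total chunk count is taken from the chunk offset table. The function name samples_per_chunk is hypothetical.

```python
def samples_per_chunk(stsc_entries, chunk_count):
    """Expand run-length entries into one (samples, description_id) per chunk.

    `stsc_entries` is a list of (first_chunk, samples_in_chunk, description_id)
    tuples in table order; `chunk_count` is the total number of chunks.
    """
    expanded = []
    for i, (first_chunk, samples, description_id) in enumerate(stsc_entries):
        if i + 1 < len(stsc_entries):
            # The run ends just before the next entry's first chunk.
            last_chunk = stsc_entries[i + 1][0] - 1
        else:
            # The final run extends to the last chunk of the track.
            last_chunk = chunk_count
        for _ in range(first_chunk, last_chunk + 1):
            expanded.append((samples, description_id))
    return expanded
```

For example, the entries (1, 4, 1) and (3, 2, 1) with four chunks in total mean that chunks 1 and 2 hold four samples each and chunks 3 and 4 hold two samples each.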
‘stco’ (Chunk Offset) holds the offsets of the chunks stored in the ‘mdat’ atom. The offsets are absolute file offsets. ‘stss’ (Sync Sample) contains a list of the sync points of a video track that uses temporal compression. Players use this atom for seeking. It contains the number of sync samples and a table with the sample number of each sync sample.
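Taken together, the chunk offsets, the samples in each chunk and the sample sizes are enough to locate every sample in the file, and the sync sample list tells a player where it can start decoding. The Python sketch below illustrates both steps on already-parsed lists. It assumes samples are stored back to back inside a chunk and that sample numbers start at 1. The function names are hypothetical.

```python
from bisect import bisect_right

def sample_offsets(chunk_offsets, samples_in_chunk, sample_sizes):
    """Return the absolute file offset of every sample, in sample order."""
    offsets = []
    sample_index = 0
    for chunk_offset, count in zip(chunk_offsets, samples_in_chunk):
        position = chunk_offset
        for _ in range(count):
            offsets.append(position)
            # The next sample in the chunk starts right after this one.
            position += sample_sizes[sample_index]
            sample_index += 1
    return offsets

def sync_sample_at_or_before(sync_samples, target):
    """Return the last sync sample number <= target, or None if there is none.

    `sync_samples` is the sorted list of 1-based sample numbers.
    """
    i = bisect_right(sync_samples, target)
    return sync_samples[i - 1] if i else None
```

To seek to sample 45 in a track whose sync samples are 1, 31 and 61, a player would start decoding at sample 31 and discard frames until it reaches sample 45.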
Those were the most important atoms and fields that describe the multiplexed media tracks in every MP4 and QuickTime MOV file.
The overall media file information is shown in the top-right box. It contains the media duration as well as the most important information for each track. The Hex view box at the bottom of the window shows the actual file data in hexadecimal format.