The MPEG standard

What is MPEG ?

MPEG (Moving Pictures Experts Group) is a group of people that meet under ISO (the International Organization for Standardization) to generate standards for digital video (sequences of images in time) and audio compression. In particular, they define a compressed bit stream, which implicitly defines a decompressor. However, the compression algorithms are up to the individual manufacturers, and that is where proprietary advantage is obtained within the scope of a publicly available international standard. MPEG meets roughly four times a year, for roughly a week each time. A great deal of work is done by the members between meetings; the work is organized and planned at the meetings. MPEG itself is a nickname. The official name is ISO/IEC JTC1 SC29 WG11.

   ISO:  International Organization for Standardization
   IEC:  International Electro-technical Commission
   JTC1: Joint Technical Committee 1
   SC29: Sub-committee 29
   WG11: Work Group 11  (moving pictures with audio)

Is MPEG patented?

MPEG core technology is covered by many patents held by different companies and individuals worldwide, but the MPEG committee only sets the technical standards; it does not deal with patents and intellectual property issues. The effort to form an MPEG-related licensing entity that would provide efficient access to the intellectual property rights (IPR) necessary to implement MPEG technology worldwide started with resolution 3.9.6, adopted at the New York (July 1993) MPEG meeting, which recommended that WG 11 support an initiative leading to the establishment of a patent pool for MPEG-2, outside of MPEG: "...WG 11 expresses its support for the (outside of MPEG) meeting of licensing experts which Mr. Baryn Futa [of Cable Television Laboratories, Inc.] has offered to convene..."

The MPEG-IPR group activities were officially announced after the Paris (March 1994) meeting, which was attended by representatives of more than 50 companies who are among the manufacturers and users of digital compression technology worldwide.

Does it have anything to do with JPEG or MHEG?

Well, it sounds the same, and they are part of the same subcommittee of ISO along with JBIG and MHEG, and they usually meet at the same place at the same time. However, they are different sets of people with few or no common individual members, and they have different charters and requirements.

JPEG is for still image compression; JBIG is for binary (bi-level) image compression (like faxes).

MHEG is an emerging software standard that seeks to specify a common use of multimedia/hypermedia objects across applications, platforms, and services. (See IEEE Communications Magazine, May 1992, Standardizing Hypermedia Information Objects p. 60+.)

The most fundamental difference between MPEG and JPEG is MPEG's use of block-based motion compensated prediction (MCP), a general method falling into the temporal DPCM category.
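The idea behind block-based MCP can be shown with a toy full-search matcher. This is an illustrative sketch only: the frame sizes, block size, and search range below are made-up values, not taken from the MPEG spec, and real encoders use much larger search windows and faster search strategies.

```python
# Sketch of block-based motion compensated prediction (MCP): for one
# 8x8 block of the current frame, exhaustively search a small window in
# the reference frame for the best-matching block (minimum sum of
# absolute differences), then code only the motion vector and the
# (hopefully small) residual.

def sad(cur, ref, bx, by, dx, dy, n=8):
    """Sum of absolute differences between the current block at (bx, by)
    and the reference block displaced by (dx, dy)."""
    return sum(abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
               for y in range(n) for x in range(n))

def motion_search(cur, ref, bx, by, rng=2, n=8):
    """Exhaustive (full) search over displacements in [-rng, rng]."""
    h, w = len(ref), len(ref[0])
    best = None
    for dy in range(-rng, rng + 1):
        for dx in range(-rng, rng + 1):
            if 0 <= bx + dx and bx + dx + n <= w and 0 <= by + dy and by + dy + n <= h:
                cost = sad(cur, ref, bx, by, dx, dy, n)
                if best is None or cost < best[0]:
                    best = (cost, dx, dy)
    return best  # (SAD, dx, dy)

# Toy frames: the current frame is the reference shifted right by 1 pixel,
# so the best match for an interior block is at displacement dx = -1.
ref = [[(x + 7 * y) % 256 for x in range(16)] for y in range(16)]
cur = [[ref[y][max(x - 1, 0)] for x in range(16)] for y in range(16)]
print(motion_search(cur, ref, 4, 4))  # -> (0, -1, 0): zero residual
```

Only the residual (here zero) and the vector (-1, 0) would then go through the transform coder, which is what makes temporal prediction so effective on typical video.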

The second most fundamental difference is in the target application. JPEG adopts a general-purpose philosophy: independence from color space (up to 255 components per frame) and quantization tables for each component. Extended modes in JPEG include two sample precisions (8- and 12-bit sample accuracy) and combinations of frequency-progressive, spatially progressive, and amplitude-progressive scanning modes. Color independence is made possible by downloadable Huffman tables.

Since MPEG is targeted at a set of specific applications, there is only one color space (4:2:0 YCbCr), one sample precision (8 bits), and one scanning mode (sequential). Luminance and chrominance share quantization tables. The range of sampling dimensions is more limited as well. MPEG adds adaptive quantization at the macroblock (16 x 16 pixel area) layer. This permits both smoother bit rate control and more perceptually uniform quantization throughout the picture and image sequence. Adaptive quantization is part of the JPEG-2 charter. MPEG variable length coding tables are non-downloadable, and are therefore optimized for a limited range of compression ratios appropriate to the target applications.
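Macroblock-adaptive quantization amounts to choosing a quantizer step per 16x16 area. The sketch below is hypothetical: the variance-based activity measure, the threshold, and the scale mapping are illustrative choices, not the rate-control scheme of any MPEG encoder.

```python
# Sketch of macroblock-adaptive quantization: each 16x16 macroblock
# carries its own quantizer scale, so busy areas (where coding error is
# perceptually masked) can be quantized more coarsely than flat areas.
# Activity measure and mapping below are toy choices, not from the spec.

def block_variance(block):
    """Sample variance of all pixel values in a macroblock."""
    vals = [v for row in block for v in row]
    mean = sum(vals) / len(vals)
    return sum((v - mean) ** 2 for v in vals) / len(vals)

def choose_quantizer_scale(block, base=8):
    """Coarser quantizer step for high-activity macroblocks."""
    return base * 2 if block_variance(block) > 100 else base

flat = [[128] * 16 for _ in range(16)]                       # smooth area
busy = [[(x * y) % 256 for x in range(16)] for y in range(16)]  # textured
print(choose_quantizer_scale(flat), choose_quantizer_scale(busy))  # 8 16
```

A real encoder would also feed buffer fullness into the scale choice, which is where the "smoother bit rate control" mentioned above comes from.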

The local spatial decorrelation methods in MPEG and JPEG are very similar. Picture data is block transform coded with the two-dimensional orthonormal 8x8 DCT. The resulting 63 AC transform coefficients are mapped in a zig-zag pattern to statistically increase the runs of zeros. The coefficients are then uniformly scalar quantized, run-length coded, and finally the run-length symbols are variable length coded using a canonical (JPEG) or modified (MPEG) Huffman scheme. Global frame redundancy is reduced by 1-D DPCM of the block DC coefficients, followed by quantization and variable length entropy coding.

            MCP                     DCT                       ZZ
     Frame ---> 8x8 spatial block ---> 8x8 frequency block ---> zig-zag scan

              Q                  RLC                    VLC
      ---> quantization ---> run-length coding ---> variable length coding
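The shared back end of this pipeline (zig-zag scan, uniform scalar quantization, and run-length coding) can be sketched as follows. The coefficient values and step size are made up, the DCT stage is skipped, and the tokens follow the MPEG-style (run, level) form described below rather than any normative table.

```python
# Sketch of the pipeline's back end: zig-zag scan of an 8x8 coefficient
# block, uniform scalar quantization, and run-length coding of the 63 AC
# coefficients as (run-of-zeros, level) pairs.

def zigzag_order(n=8):
    """Index pairs (y, x) in zig-zag order for an n x n block:
    diagonals of constant y+x, alternating traversal direction."""
    return sorted(((y, x) for y in range(n) for x in range(n)),
                  key=lambda p: (p[0] + p[1],
                                 p[0] if (p[0] + p[1]) % 2 else p[1]))

def run_level_code(coeffs, step=16):
    """Quantize AC coefficients and emit (run, level) tokens."""
    tokens, run = [], 0
    for c in coeffs:
        level = int(c / step)      # uniform quantizer, truncating toward 0
        if level == 0:
            run += 1
        else:
            tokens.append((run, level))
            run = 0
    return tokens                  # trailing zeros become an EOB in practice

# A typical post-DCT block: energy concentrated in low frequencies.
block = [[0] * 8 for _ in range(8)]
block[0][0], block[0][1], block[1][0], block[2][2] = 1024, 80, -64, 40
ac = [block[y][x] for (y, x) in zigzag_order()[1:]]   # skip the DC term
print(run_level_code(ac))   # -> [(0, 5), (0, -4), (9, 2)]
```

Note how the zig-zag ordering turns the scattered non-zero coefficients into a short token list: that is exactly the "statistically increase the runs of zeros" step described above.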
The similarities have made possible the development of hard-wired silicon that can code both standards. Even microcoded architectures can be better optimized through hardwired instruction primitives or functional blocks. There are many additional minor differences. They include:
  1. DCT and quantization precision in MPEG is 9 bits, since the macroblock difference operation expands the 8-bit signal precision by one bit.
  2. Quantization in MPEG-1 forces quantized coefficients to become odd values (oddification).
  3. JPEG run-length coding produces run-size tokens (run of zeros, size category of the next non-zero coefficient, followed by the coefficient's amplitude bits) whereas MPEG produces fully concatenated run-level tokens that do not require magnitude differential bits.
  4. DC values in MPEG-1 are limited to 8-bit precision (a constant step size of 8), whereas JPEG DC precision can occupy all possible 11 bits. MPEG-2, however, re-introduced extra DC precision.
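Point 2, oddification, can be shown in a few lines. This is a simplified reading of the rule as stated above (non-zero quantized values forced odd), not the full MPEG-1 reconstruction arithmetic; its purpose is to reduce IDCT mismatch between different decoder implementations.

```python
# Sketch of MPEG-1 "oddification": non-zero quantized coefficient
# levels are forced to the next odd value toward zero, so reconstructed
# coefficients avoid the even values where different IDCT
# implementations are most likely to disagree.  Simplified illustration,
# not the exact MPEG-1 reconstruction formula.

def oddify(level):
    """Force a non-zero even quantized level to the odd value toward zero."""
    if level > 0 and level % 2 == 0:
        return level - 1
    if level < 0 and level % 2 == 0:
        return level + 1
    return level                       # zero and odd levels pass through

print([oddify(v) for v in (-4, -3, 0, 1, 2, 6)])  # -> [-3, -3, 0, 1, 1, 5]
```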

How do MPEG and H.261 differ ?

H.261 was targeted at teleconferencing applications, where motion is naturally more limited. Motion vectors are restricted to a range of +/- 15 pixels, and accuracy is reduced since H.261 motion vectors are restricted to integer-pel accuracy (MPEG-1 permits half-pel accuracy). Other syntactic differences include: no B-pictures and a different quantization method.

H.261 is also known as P*64. P is an integer meant to represent multiples of 64 kbit/sec. In the end, this nomenclature probably won't be used, as many services other than video will adopt the philosophy of arbitrary B-channel (64 kbit/sec) bitrate scalability.

Is H.261 the de facto teleconferencing standard ?

Not exactly. To date, about seventy percent of the industrial teleconferencing hardware market is controlled by PictureTel of Massachusetts. The second largest market share belongs to Compression Labs of Silicon Valley. PictureTel hardware includes compatibility with H.261 as a lowest common denominator, but when communicating with other PictureTel hardware, it can switch to a proprietary mode superior at low bit rates (less than 300 kbits/sec). In fact, over 2/3 of all teleconferencing is done at two-times switched-56 (roughly P = 2) bandwidth; long distance ISDN ain't cheap. In each direction, video and audio are coded at an aggregate of 112 kbits/sec (2 x 56 kbits/sec).

The PictureTel proprietary compression algorithm is acknowledged to be a combination of a spatial pyramid, a lattice vector quantizer, and an unidentified entropy coding method. Motion compensation is considerably more refined and sophisticated than the 16x16 integer-pel block method specified in H.261.

The Compression Labs proprietary algorithm also offers significant improvement over H.261 when linked to other CLI hardware.

Currently, the ITU-T (International Telecommunication Union, Telecommunication Standardization Sector), formerly CCITT, is quietly defining an improvement to H.261 with the participation of industry vendors.