Osama Al-Shaykh, et al. "Video Sequence Compression." 2000 CRC Press LLC. <http://www.engnetbase.com>.
Video Sequence Compression

Osama Al-Shaykh
University of California, Berkeley

Ralph Neff
University of California, Berkeley

David Taubman
Hewlett Packard

Avideh Zakhor
University of California, Berkeley
55.1 Introduction
55.2 Motion Compensated Video Coding
    Motion Estimation and Compensation • Transformations • Discussion • Quantization • Coding of Quantized Symbols
55.3 Desirable Features
    Scalability • Error Resilience
55.4 Standards
    H.261 • MPEG-1 • MPEG-2 • H.263 • MPEG-4
Acknowledgment
References
The image and video processing literature is rich with video compression algorithms. This chapter overviews the basic blocks of most video compression systems, discusses some important features required by many applications, e.g., scalability and error resilience, and reviews the existing video compression standards such as H.261, H.263, MPEG-1, MPEG-2, and MPEG-4.
55.1 Introduction
Video sources produce data at very high bit rates. In many applications, the available bandwidth is usually very limited. For example, the bit rate produced by a 30 frame/s color common intermediate format (CIF) (352 × 288) video source is 73 Mbits/s. In order to transmit such a sequence over a 64 Kbits/s channel (e.g., an ISDN line), we need to compress the video sequence by a factor of 1140. A simple approach is to subsample the sequence in time and space. For example, if we subsample both chroma components by 2 in each dimension, i.e., 4:2:0 format, and the whole sequence temporally by 4, the bit rate becomes 9.1 Mbits/s. However, to transmit the video over a 64 Kbits/s channel, it is necessary to compress the subsampled sequence by another factor of 143. To achieve such high compression ratios, we must tolerate some distortion in the subsampled frames.
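The arithmetic behind these figures is easy to check. The short sketch below is an illustration only; it assumes 8 bits per color sample and uses the frame sizes and rates quoted above.

```python
# Sketch: reproduce the bit-rate figures quoted above (8 bits per sample assumed).

def bit_rate(width, height, frames_per_s, bits_per_pixel):
    """Raw bit rate in bits/s for an uncompressed video source."""
    return width * height * bits_per_pixel * frames_per_s

# 30 frame/s CIF (352 x 288), three color components at 8 bits each (24 bits/pixel).
raw = bit_rate(352, 288, 30, 24)
print(raw / 1e6)    # ~73.0 Mbits/s
print(raw / 64e3)   # ~1140: compression factor needed for a 64 Kbits/s channel

# 4:2:0 chroma subsampling (12 bits/pixel) and temporal subsampling by 4 (7.5 frame/s).
sub = bit_rate(352, 288, 30 / 4, 12)
print(sub / 1e6)    # ~9.1 Mbits/s
print(sub / 64e3)   # ~143: remaining compression factor
```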
Compression can be either lossless (reversible) or lossy (irreversible). A compression algorithm is lossless if the signal can be reconstructed exactly from the compressed information; otherwise it is lossy. The compression performance of any lossy algorithm is usually described in terms of its rate-distortion curve, which represents the potential trade-off between the bit rate and the distortion associated with the lossy representation. The primary goal of any lossy compression algorithm is to optimize the rate-distortion curve over some range of rates or levels of distortion. For video applications, rate
is usually expressed in terms of bits per second. The distortion is usually expressed in terms of the
peak-signal-to-noise ratio (PSNR) per frame or, in some cases, measures that try to quantify the
subjective nature of the distortion.
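For concreteness, a minimal sketch of the per-frame PSNR computation follows. It assumes 8-bit frames (peak value 255) stored as NumPy arrays, which is an illustrative choice rather than anything mandated by the text.

```python
import numpy as np

def psnr(reference, decoded, peak=255.0):
    """Peak-signal-to-noise ratio (dB) between a reference frame and a decoded frame."""
    mse = np.mean((reference.astype(np.float64) - decoded.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical frames
    return 10.0 * np.log10(peak ** 2 / mse)
```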
In addition to good compression performance, many other properties may be important or even critical to the applicability of a given compression algorithm. Such properties include robustness to errors in the compressed bit stream, low complexity encoders and decoders, low latency requirements, and scalability. Developing scalable video compression algorithms has attracted considerable attention in recent years. Generally speaking, scalability refers to the potential to effectively decompress subsets of the compressed bit stream in order to satisfy some practical constraint, e.g., display resolution, decoder computational complexity, and bit rate limitations.
The demand for compatible video encoders and decoders has resulted in the development of different video compression standards. The International Organization for Standardization (ISO) has developed MPEG-1 to store video on compact discs, MPEG-2 for digital television, and MPEG-4 for a wide range of applications including multimedia. The International Telecommunication Union (ITU) has developed H.261 for video conferencing and H.263 for video telephony.
All existing video compression standards are hybrid systems. That is, the compression is achieved in two main stages. The first stage, motion compensation and estimation, predicts each frame from its neighboring frames, compresses the prediction parameters, and produces the prediction error frame. The second stage codes the prediction error. All existing standards use a block-based discrete cosine transform (DCT) to code the residual error. In addition to the DCT, other non-block-based coders, e.g., wavelet and matching pursuit coders, can be used.
In this chapter, we will provide an overview of hybrid video coding systems. In Section 55.2, we discuss the main parts of a hybrid video coder. This includes motion compensation, signal decompositions and transformations, quantization, and entropy coding. We compare various transformations such as the DCT, subband decompositions, and matching pursuits. In Section 55.3, we discuss scalability and error resilience in video compression systems. We also describe a non-hybrid video coder that provides scalable bit-streams [28]. Finally, in Section 55.4, we review the key video compression standards: H.261, H.263, MPEG-1, MPEG-2, and MPEG-4.
55.2 Motion Compensated Video Coding
Virtually all video compression systems identify and reduce four basic types of video data redundancy: inter-frame (temporal) redundancy, interpixel redundancy, psychovisual redundancy, and coding redundancy. Figure 55.1 shows a typical diagram of a hybrid video compression system. First, the current frame is predicted from previously decoded frames by estimating the motion of blocks or objects, thus reducing the inter-frame redundancy. Afterwards, to reduce the interpixel redundancy, the residual error after frame prediction is transformed to another format or domain such that the energy of the new signal is concentrated in a few components and these components are as uncorrelated as possible. The transformed signal is then quantized according to the desired compression performance (subjective or objective). The quantized transform coefficients are then mapped to codewords that reduce the coding redundancy. The rest of this section will discuss the blocks of the hybrid system in more detail.
55.2.1 Motion Estimation and Compensation
Neighboring frames in typical video sequences are highly correlated. This inter-frame (temporal)
redundancy can be significantly reduced to produce a more compressible sequence by predicting
each frame from its neighbors. Motion compensation is a nonlinear predictive technique in which
the feedback loop contains both the inverse transformation and the inverse quantization blocks, as shown in Fig. 55.1.

FIGURE 55.1: Motion compensated coding of video.
Most motion compensation techniques divide the frame into regions, e.g., blocks. Each region is then predicted from the neighboring frames. The displacement of the block or region, d, is not fixed and must be encoded as side information in the bit stream. In some cases, different prediction models are used to predict regions, e.g., affine transformations. These prediction parameters should also be encoded in the bit stream.

To minimize the amount of side information, which must be included in the bit stream, and to simplify the encoding process, motion estimation is usually block based. That is, every pixel i in a given rectangular block is assigned the same motion vector, d. Block-based motion estimation is an integral part of all existing video compression standards.
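As an illustration of block-based motion estimation, the following sketch performs a full search over a small window using the sum of absolute differences (SAD). The 16 × 16 block size, ±7 search range, SAD criterion, and function name are illustrative assumptions; actual coders and standards differ in these choices.

```python
import numpy as np

def full_search_motion(prev, curr, block=16, search=7):
    """Full-search block matching: one motion vector d per block of `curr`,
    found by minimizing the sum of absolute differences (SAD) in `prev`."""
    rows, cols = curr.shape
    motion = {}
    for y in range(0, rows - block + 1, block):
        for x in range(0, cols - block + 1, block):
            target = curr[y:y + block, x:x + block].astype(np.int32)
            best, best_d = None, (0, 0)
            for dy in range(-search, search + 1):
                for dx in range(-search, search + 1):
                    py, px = y + dy, x + dx
                    if py < 0 or px < 0 or py + block > rows or px + block > cols:
                        continue
                    cand = prev[py:py + block, px:px + block].astype(np.int32)
                    sad = np.abs(target - cand).sum()
                    if best is None or sad < best:
                        best, best_d = sad, (dy, dx)
            motion[(y, x)] = best_d  # side information to be encoded
    return motion
```

In a real coder the resulting motion vectors d would be entropy coded as side information, and the prediction error frame would be passed to the transform stage described next.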
55.2.2 Transformations
Most image and video compression schemes apply a transformation to the raw pixels or to the residual error resulting from motion compensation before quantizing and coding the resulting coefficients. The function of the transformation is to represent the signal in a few uncorrelated components. The most common transformations are linear transformations, i.e., the multi-dimensional sequence of input pixel values, f[i], is represented in terms of the transform coefficients, t[k], via

f[\mathbf{i}] = \sum_{k} t[k]\, w_k[\mathbf{i}] \qquad (55.1)

for some w_k[i]. The input image is thus represented as a linear combination of basis vectors, w_k.
It is important to note that the basis vectors need not be orthogonal. They only need to form an over-complete set (matching pursuits), a complete set (DCT and some subband decompositions), or a set very close to complete (some subband decompositions). This is important since the coder should be able to code a variety of signals. The remainder of the section discusses and compares the DCT, subband decompositions, and matching pursuits.
The DCT
There are two properties desirable in a unitary transform for image compression: the energy
should be packed into a few transform coefficients, and the coefficients should be as uncorrelated as possible. The optimum transform under these two constraints is the Karhunen-Loève transform (KLT), where the eigenvectors of the covariance matrix of the image are the vectors of the transform [10]. Although the KLT is optimal under these two constraints, it is data-dependent and expensive to compute. The discrete cosine transform (DCT) performs very close to the KLT, especially when the input is a first-order Markov process [10].
The DCT is a block-based transform. That is, the signal is divided into blocks, which are independently transformed using orthonormal discrete cosines. The DCT coefficients of a one-dimensional signal, f, are computed via

t_{\mathrm{DCT}}[Nb + k] = \frac{1}{\sqrt{N}}
\begin{cases}
\displaystyle\sum_{i=0}^{N-1} f[Nb + i], & k = 0 \\
\displaystyle\sum_{i=0}^{N-1} \sqrt{2}\, f[Nb + i] \cos\frac{(2i + 1)k\pi}{2N}, & 1 \le k < N
\end{cases}
\quad \forall b \qquad (55.2)

where N is the size of the block and b denotes the block number.
The orthonormal basis vectors associated with the one-dimensional DCT transformation of Eq. (55.2) are

w^{\mathrm{DCT}}_{k}[i] = \frac{1}{\sqrt{N}}
\begin{cases}
1, & k = 0,\ 0 \le i < N \\
\sqrt{2}\cos\frac{(2i + 1)k\pi}{2N}, & 1 \le k < N,\ 0 \le i < N
\end{cases}
\qquad (55.3)
Figure 55.2(a) shows these basis vectors for N = 8.
FIGURE 55.2: DCT basis vectors (N = 8): (a) one-dimensional and (b) separable two-dimensional.
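The basis vectors of Eq. (55.3) can be generated and checked numerically. The sketch below builds the N × N DCT matrix and applies it block by block as in Eq. (55.2); it is a plain NumPy illustration of the formulas (the helper names are ours), not code from any particular codec.

```python
import numpy as np

def dct_basis(N=8):
    """Orthonormal 1-D DCT basis vectors w_k[i] of Eq. (55.3), returned as rows of an N x N matrix."""
    W = np.zeros((N, N))
    for k in range(N):
        for i in range(N):
            if k == 0:
                W[k, i] = 1.0 / np.sqrt(N)
            else:
                W[k, i] = np.sqrt(2.0 / N) * np.cos((2 * i + 1) * k * np.pi / (2 * N))
    return W

def block_dct(f, N=8):
    """DCT coefficients t[N*b + k] of Eq. (55.2) for a 1-D signal whose length is a multiple of N."""
    W = dct_basis(N)
    blocks = np.asarray(f, dtype=float).reshape(-1, N)  # one row per block b
    return (blocks @ W.T).reshape(-1)                   # t[N*b + k] = sum_i f[N*b + i] * w_k[i]

W = dct_basis(8)
print(np.allclose(W @ W.T, np.eye(8)))   # True: the basis vectors are orthonormal
print(block_dct(np.arange(16.0), N=8))   # coefficients of two 8-sample blocks
```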
The one-dimensional DCT described above is usually separably extended to two dimensions for
image compression applications. In this case, the two-dimensional basis vectors are formed by the
tensor product of one-dimensional DCT basis vectors and are given by
w^{\mathrm{DCT}}_{\mathbf{k}}[\mathbf{i}] = w^{\mathrm{DCT}}_{k_1,k_2}[i_1,i_2] = w^{\mathrm{DCT}}_{k_1}[i_1] \cdot w^{\mathrm{DCT}}_{k_2}[i_2]; \qquad 0 \le k_1, k_2, i_1, i_2 < N
Figure 55.2(b) shows the two-dimensional basis vectors for N = 8.
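Because the two-dimensional basis is a tensor product of one-dimensional bases, the 2-D block DCT can be computed separably, i.e., a 1-D DCT along the rows followed by a 1-D DCT along the columns. A minimal, self-contained sketch of this separable computation follows; the helper names are ours.

```python
import numpy as np

def dct_basis(N=8):
    """Rows are the orthonormal 1-D DCT basis vectors of Eq. (55.3)."""
    k = np.arange(N).reshape(-1, 1)
    i = np.arange(N).reshape(1, -1)
    W = np.sqrt(2.0 / N) * np.cos((2 * i + 1) * k * np.pi / (2 * N))
    W[0, :] = 1.0 / np.sqrt(N)
    return W

def dct2(block):
    """Separable 2-D DCT of an N x N block: 1-D transform along rows, then along columns."""
    W = dct_basis(block.shape[0])
    return W @ block @ W.T   # t[k1, k2] = sum_{i1,i2} f[i1,i2] w_k1[i1] w_k2[i2]

block = np.arange(64, dtype=float).reshape(8, 8)
coeff = dct2(block)
print(np.allclose(dct_basis(8).T @ coeff @ dct_basis(8), block))  # True: the transform is invertible
```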
The DCT is the most common transform in video compression. It is used in the JPEG still image compression standard and in all existing video compression standards. This is because it performs reasonably well at different bit rates. Moreover, there are fast algorithms and special hardware chips to compute the DCT efficiently.
The major objection to the DCT in image or video compression applications is that the non-overlapping blocks of basis vectors, w_k, are responsible for distinctly "blocky" artifacts in the decompressed frames, especially at low bit rates. This is due to the quantization of the transform coefficients of a block independently of neighboring blocks. An overlapped DCT representation addresses this problem [15]; however, the common solution is to post-process the frame by smoothing the block boundaries [18, 22].
Due to bit rate restrictions, some blocks are only represented by one or a small number of coarsely quantized transform coefficients, hence the decompressed block will only consist of these basis vectors. This will cause artifacts commonly known as ringing and mosquito noise.

Figure 55.8(b) shows frame 250 of the 15 frame/s CIF Coast-guard sequence coded at 112 Kbits/s using a DCT hybrid video coder.¹ This figure provides a good illustration of the "blocking" artifacts.

¹It is coded using H.263 [3], which is an ITU standard.
Subband Decomposition
The basic idea of subband decomposition is to split the frequency spectrum of the image into (disjoint) subbands. This is efficient when the image spectrum is not flat and is concentrated in a few subbands, which is usually the case. Moreover, we can quantize the subbands differently according to their visual importance.

As for the DCT, we begin our discussion of subband decomposition by considering only a one-dimensional source sequence, f[i]. Figure 55.3 provides a general illustration of an N-band one-dimensional subband system. We refer to the subband decomposition itself as analysis and to the inverse transformation as synthesis.

FIGURE 55.3: 1D, N-band subband analysis and synthesis block diagrams. (Source: Taubman, D., Chang, E., and Zakhor, A., Directionality and scalability in subband image and video compression, in Image Technology: Advances in Image Processing, Multimedia, and Machine Vision, Jorge L.C. Sanz, Ed., Springer-Verlag, New York, 1996. With permission.)

The transformation coefficients of bands 1, 2, ..., N are denoted by the sequences u_1[k], u_2[k], ..., u_N[k], respectively. For notational convenience and consistency with the DCT formulation above, we write t_SB[·] for the sequence of all subband coefficients, arranged according to t_SB[(β − 1) + Nk] = u_β[k], where 1 ≤ β ≤ N is the subband number. These coefficients are generated by filtering the input sequence with filters H_1, ..., H_N and downsampling the filtered sequences by a factor of N, as depicted in Fig. 55.3. In subband synthesis, the coefficients for each band are upsampled, interpolated with the synthesis filters, G_1, ..., G_N, and the results summed to form a reconstructed sequence, f̃[i], as depicted in Fig. 55.3.
If the reconstructed sequence, f̃[i], and the source sequence, f[i], are identical, then the subband system is referred to as perfect reconstruction (PR) and the corresponding basis set is a complete basis set. Although perfect reconstruction is a desirable property, near perfect reconstruction (NPR), for which subband synthesis is only approximately the inverse of subband analysis, is often sufficient in practice. This is because the distortion introduced by quantization of the subband coefficients, t_SB[k], usually dwarfs that introduced by an imperfect synthesis system.
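To make the analysis/synthesis structure concrete, the sketch below implements a two-band (N = 2) perfect reconstruction system using the two-tap Haar filters, chosen here only because their PR property is trivial to verify; the 5/3 filters of Fig. 55.5 would be used in the same filter/downsample and upsample/filter arrangement but need more careful boundary handling.

```python
import numpy as np

def haar_analysis(f):
    """Two-band analysis: filter and downsample by 2 (Haar filters, even-length input assumed)."""
    f = np.asarray(f, dtype=float).reshape(-1, 2)
    u1 = (f[:, 0] + f[:, 1]) / np.sqrt(2.0)  # low-pass subband coefficients u_1[k]
    u2 = (f[:, 0] - f[:, 1]) / np.sqrt(2.0)  # high-pass subband coefficients u_2[k]
    return u1, u2

def haar_synthesis(u1, u2):
    """Two-band synthesis: upsample, filter, and sum to form the reconstructed sequence."""
    even = (u1 + u2) / np.sqrt(2.0)
    odd = (u1 - u2) / np.sqrt(2.0)
    out = np.empty(2 * len(u1))
    out[0::2], out[1::2] = even, odd
    return out

f = np.random.randn(16)
u1, u2 = haar_analysis(f)
print(np.allclose(haar_synthesis(u1, u2), f))  # True: perfect reconstruction (PR)
```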
The filters, H_1, ..., H_N, are usually designed to have band-pass frequency responses, as indicated in Fig. 55.4, so that the coefficients u_β[k] for each subband, 1 ≤ β ≤ N, represent different spectral components of the source sequence.
FIGURE 55.4: Typical analysis filter magnitude responses. (Source: Taubman, D., Chang, E., and Zakhor, A., Directionality and scalability in subband image and video compression, in Image Technology: Advances in Image Processing, Multimedia, and Machine Vision, Jorge L.C. Sanz, Ed., Springer-Verlag, New York, 1996. With permission.)
The basis vectors for subband decomposition are the N-translates of the impulse responses, g_1[i], ..., g_N[i], of the synthesis filters G_1, ..., G_N. Specifically, denoting the kth basis vector associated with subband β by w^{\mathrm{SB}}_{Nk+\beta-1}, we have

w^{\mathrm{SB}}_{Nk+\beta-1}[i] = g_\beta[i - Nk] \qquad (55.4)

Figure 55.5 illustrates five of the basis vectors for a particularly simple, yet useful, two-band PR subband decomposition, with symmetric FIR analysis and synthesis impulse responses. As shown in Fig. 55.5 and in contrast with the DCT basis vectors, the subband basis vectors overlap.
As for the DCT, one-dimensional subband decompositions may be separably extended to higher
dimensions. By this we mean that a one-dimensional subband decomposition is first applied along
one dimension of an image or video sequence. Any or all of the resulting subbands are then further
decomposed into subbands along another dimension, and so on. Figure 55.6 depicts a separable two-dimensional subband system. For video compression applications, the prediction error is sometimes decomposed into subbands of equal size.
Two-dimensional subband decompositions have the advantage that they do not suffer from the
disturbing blocking artifacts exhibited by the DCT at high compression ratios. Instead, the most
noticeable quantization-induced distortion tends to be 'ringing' or 'rippling' artifacts, which become most bothersome in the vicinity of image edges. Figures 55.11(c) and 55.8(c) clearly show this effect. Figure 55.11 shows frame 210 of the Ping-pong sequence compressed using a scalable, three-dimensional subband coder [28] at 1.5 Mbits/s, 300 Kbits/s, and 60 Kbits/s. As the bit rate decreases, we notice loss of detail and the introduction of more ringing noise. Figure 55.8(c) shows frame 250 of the Coast-guard sequence compressed at 112 Kbits/s using a zerotree scalable coder [16]. The edges of the trees and the boat are affected by ringing noise.
FIGURE 55.5: Subband basis vectors with N = 2, h_1[−2..2] = √2 · (−1/8, 1/4, 3/4, 1/4, −1/8), h_2[−2..0] = √2 · (−1/4, 1/2, −1/4), g_1[−1..1] = √2 · (1/4, 1/2, 1/4), and g_2[−1..3] = √2 · (−1/8, −1/4, 3/4, −1/4, −1/8). h_i and g_i are the impulse responses of the H_i (analysis) and G_i (synthesis) filters, respectively. (Source: Taubman, D., Chang, E., and Zakhor, A., Directionality and scalability in subband image and video compression, in Image Technology: Advances in Image Processing, Multimedia, and Machine Vision, Jorge L.C. Sanz, Ed., Springer-Verlag, New York, 1996. With permission.)
Matching Pursuit
Representing a signal using an over-complete basis set implies that there is more than one
representation for the signal. For coding purposes, we are interested in representing the signal with
the fewest basis vectors. This is an NP-complete problem [14]. Different approaches have been
investigated to find or approximate the solution. Matching pursuits is a multistage algorithm, which in each stage finds the basis vector that minimizes the mean-squared error [14].

Suppose we want to represent a signal f[i] using basis vectors from an over-complete dictionary (basis set) G. Individual dictionary vectors can be denoted as:

w_\gamma[i] \in G. \qquad (55.5)
Here γ is an indexing parameter associated with a particular dictionary element. The decomposition begins by choosing γ to maximize the absolute value of the following inner product:

t = \langle f[i], w_\gamma[i] \rangle, \qquad (55.6)

where t is the transform (expansion) coefficient. A residual signal is computed as:

R[i] = f[i] - t\, w_\gamma[i]. \qquad (55.7)
This residual signal is then expanded in the same way as the original signal. The procedure continues iteratively until either a set number of expansion coefficients are generated or some energy threshold for the residual is reached. Each stage k yields a dictionary structure specified by γ_k, an expansion coefficient t[k], and a residual R_k, which is passed on to the next stage. After a total of M stages, the signal can be approximated by a linear function of the dictionary elements:

\hat{f}[i] = \sum_{k=1}^{M} t[k]\, w_{\gamma_k}[i]. \qquad (55.8)
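A minimal sketch of the decomposition in Eqs. (55.5) through (55.8) is given below. It assumes the dictionary is supplied as unit-norm rows of a matrix and uses a fixed number of stages as the stopping rule, which is only one of the two stopping criteria mentioned above; the function names and the toy dictionary are illustrative.

```python
import numpy as np

def matching_pursuit(f, dictionary, stages=10):
    """Greedy matching pursuit: at each stage pick the dictionary vector w_gamma with the
    largest |<R, w_gamma>|, record (gamma_k, t[k]), and update the residual (Eq. 55.7)."""
    R = np.asarray(f, dtype=float).copy()       # residual, initially the signal itself
    atoms = []                                  # list of (gamma_k, t[k]) pairs
    for _ in range(stages):
        inner = dictionary @ R                  # <R[i], w_gamma[i]> for every gamma
        gamma = int(np.argmax(np.abs(inner)))   # Eq. (55.6): maximize the |inner product|
        t = inner[gamma]
        R = R - t * dictionary[gamma]           # Eq. (55.7): subtract the chosen term
        atoms.append((gamma, t))
    return atoms, R

def reconstruct(atoms, dictionary):
    """Approximation of Eq. (55.8): linear combination of the chosen dictionary elements."""
    f_hat = np.zeros(dictionary.shape[1])
    for gamma, t in atoms:
        f_hat += t * dictionary[gamma]
    return f_hat

# Toy over-complete dictionary: 64 unit-norm random vectors in an 8-dimensional space.
rng = np.random.default_rng(0)
D = rng.standard_normal((64, 8))
D /= np.linalg.norm(D, axis=1, keepdims=True)
f = rng.standard_normal(8)
atoms, residual = matching_pursuit(f, D, stages=20)
print(np.linalg.norm(f - reconstruct(atoms, D)))  # residual energy shrinks as stages increase
```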
FIGURE 55.6: Separable spatial subband pyramid. Two-level analysis system configuration and subband passbands shown. (Source: Taubman, D., Chang, E., and Zakhor, A., Directionality and scalability in subband image and video compression, in Image Technology: Advances in Image Processing, Multimedia, and Machine Vision, Jorge L.C. Sanz, Ed., Springer-Verlag, New York, 1996. With permission.)
The above technique has useful signal representation properties. For example, the dictionary
element chosen at each stage is the element that provides the greatest reduction in mean square error between the true signal f[i] and the coded signal f̂[i]. In this sense, the signal structures are coded in order of importance, which is desirable in situations where the bit budget is limited. For image and video coding applications, this means that the most visible features tend to be coded first. Weaker image features are coded later, if at all. It is even possible to control which types of image features are coded well by choosing dictionary functions to match the shape, scale, or frequency of the desired features.
An interesting feature of the matching pursuit technique is that it places very few restrictions on
the dictionary set. The original Mallat and Zhang paper considers both Gabor and wave-packet function dictionaries, but such structure is not required by the algorithm itself [14]. Mallat and Zhang showed that if the dictionary set is at least complete, then f̂[i] will eventually converge to f[i], though the rate of convergence is not guaranteed [14]. Convergence speed and thus coding efficiency are strongly related to the choice of dictionary set. However, true dictionary optimization can be difficult because there are so few restrictions. Any collection of arbitrarily sized and shaped functions can be used with matching pursuits, as long as completeness is satisfied.
Bergeaud and Mallat used the matching pursuit technique to represent and process images [1].
Neff and Zakhor have used the matching pursuit technique to code the motion prediction error
signal [20]. Their coder divides each motion residual into blocks and measures the energy of each
block. The center of the block with the largest energy value is adopted as an initial estimate for the
inner product search. A dictionary of Gabor basis vectors, shown in Fig. 55.7, is then exhaustively
matched to an S × S window around the initial estimate. The exhaustive search can be thought of as follows. Each N × N dictionary structure is centered at each location in the search window, and the inner product between the structure and the corresponding N × N region of image data is computed. The largest inner product is then quantized. The location, basis vector index, and quantized inner product are then coded together.
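The block-energy pre-selection and windowed inner-product search described above can be sketched as follows. The block size, the window size S, the uniform quantizer step, and the form of the dictionary (any list of small unit-norm 2-D arrays) are illustrative assumptions; the actual coder of [20] uses the Gabor dictionary of Fig. 55.7 and its own quantization strategy.

```python
import numpy as np

def find_next_atom(residual, dictionary, block=16, S=16, step=0.5):
    """One stage of a matching-pursuit residual coder: pick the highest-energy block,
    search an S x S window around its center, and return (location, index, quantized
    inner product). `dictionary` is a list of small unit-norm 2-D arrays."""
    rows, cols = residual.shape
    # 1. The block with the largest energy gives the initial estimate (its center).
    best_e, cy, cx = -1.0, 0, 0
    for y in range(0, rows - block + 1, block):
        for x in range(0, cols - block + 1, block):
            e = float(np.sum(residual[y:y + block, x:x + block] ** 2))
            if e > best_e:
                best_e, cy, cx = e, y + block // 2, x + block // 2
    # 2. Exhaustive inner-product search in an S x S window around (cy, cx).
    #    Here (y, x) indexes the top-left corner of each candidate atom placement.
    best = (0.0, (cy, cx), 0)
    for idx, w in enumerate(dictionary):
        n = w.shape[0]
        for y in range(cy - S // 2, cy + S // 2):
            for x in range(cx - S // 2, cx + S // 2):
                if y < 0 or x < 0 or y + n > rows or x + n > cols:
                    continue
                p = float(np.sum(residual[y:y + n, x:x + n] * w))
                if abs(p) > abs(best[0]):
                    best = (p, (y, x), idx)
    p, loc, idx = best
    return loc, idx, step * round(p / step)  # location, basis vector index, quantized inner product
```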
Video sequences coded using matching pursuit do not suffer from either blocking or ringing
artifacts, because the basis vectors are only coded when they are well-matched to the residual signal.
As bit rate decreases, the distortion introduced by matching pursuit coding takes the form of a
gradually increasing blurriness (or loss of detail). Since matching pursuits involves exhaustive search,
it is more complex than DCT approaches, especially at high bit rates.
FIGURE 55.7: Separable two-dimensional 20 × 20 Gabor dictionary.
Figure 55.8(d) shows frame 250 of the 15 frame/s CIF Coast-guard sequence coded at 112 Kbits/s
using the matching pursuit video coder described by Neff and Zakhor [20]. This frame does not
suffer from the blocky artifacts, which affect the DCT coders as shown in Fig. 55.8(b). Moreover, it
does not suffer from the ringing noise, which affects the subband coders as shown in Figs. 55.8(c)
and 55.11(c).
55.2.3 Discussion
Figure 55.8 shows frame 250 of the 15 frame/s CIF Coast-guard sequence coded at 112 Kbits/s using DCT, subband, and matching pursuit coders. The DCT coded frame suffers from blocking artifacts. The subband coded frame suffers from ringing artifacts.

Figure 55.9 compares the PSNR performance of the matching pursuit coder [20] to a DCT (H.263) coder [3] and a zerotree subband coder [16] when coding the Coast-guard sequence at 112 Kbits/s. The matching pursuit coder [20] in this example has consistently higher PSNR than the H.263 [3] and the zerotree subband [16] coders. Table 55.1 shows the average luminance PSNRs for different sequences at different bit rates. In all examples mentioned in Table 55.1, the matching pursuit coder has a higher average PSNR than the DCT coder. The subband coder has the lowest average PSNR.
TABLE 55.1 The Average Luminance PSNR of Different Sequences at Different Bit Rates When Coded Using a DCT Coder (H.263) [3], a Zero-Tree Subband Coder (ZTS) [16], and a Matching Pursuit Coder (MP) [20]

  Sequence           Format   Bit rate    Frame rate         PSNR (dB)
                              (bits/s)    (frame/s)     DCT      ZTS      MP
  Container-ship     QCIF     10 K        7.5           29.43    28.01    31.10
  Hall-Monitor       QCIF     10 K        7.5           30.04    28.44    31.27
  Mother-Daughter    QCIF     10 K        7.5           32.50    31.07    32.78
  Container-ship     QCIF     24 K        10.0          32.77    30.44    34.26
  Silent-Voice       QCIF     24 K        10.0          30.89    29.41    31.71
  Mother-Daughter    QCIF     24 K        10.0          35.17    33.77    35.55
  Coast-Guard        QCIF     48 K        10.0          29.00    27.65    29.82
  News               CIF      48 K        7.5           30.95    29.97    31.96