IBM Skip to main content
  Home     Products & services     Support & downloads     My account  
  Select a country  
Journals Home  
  Systems Journal  
Journal of Research
and Development
  ·  Current Issue  
  ·  Recent Issues  
  ·  Papers in Progress  
  ·  Search/Index  
  ·  Orders  
  ·  Description  
  ·  Patents  
  ·  Recent publications  
  ·  Author's Guide  
  Staff  
  Contact Us  
Journal of Research and Development  
Volume 43, Number 4, 1999
Digital multimedia technology
 Table of contents: arrowHTML arrowPDF arrowASCII   This article: HTML arrowPDF arrowASCII   DOI: 10.1147/rd.434.0489 arrowCopyright info
   

Single-pass constant- and variable-bit-rate MPEG-2 video compression

by N. Mohsenian, R. Rajagopalan, and C. A. Gonzales
Most real-time MPEG-2 encoders are designed to perform in a constant-bit-rate (CBR) mode, in which buffer constraints are imposed to circumvent large deviations from a desired rate at any instant in time. Although such streams are generally good-quality sequences, certain types of operations or environments call for a more efficient real-time CBR encoder. The first part of the paper describes how a better-quality CBR video stream can be produced by estimating the relative complexity of a picture in comparison with the average complexity of the partially encoded stream and using it to adjust the compression parameters in a single-pass mode of operation. Our CBR encoder is particularly attractive for digital broadcast and editing environments, in which representations of higher-fidelity video objects in both display and freeze modes are constantly pursued. The second part of the paper describes the real-time generation of video streams with a variable-bit-rate (VBR) encoder. This mode of operation is highly desirable for home entertainment and recreational events. We propose a robust single-pass VBR video encoder algorithm which is capable of learning and adapting itself to the complexity of image segments and thereafter creating streams which have constant visual picture quality. The new VBR scheme displays a better performance than the CBR encoder, particularly when special effects such as scene transitions, fades, or luminance changes are to be compressed. Both CBR and VBR encoders are fully compliant with the MPEG-2 standard and are easily implementable with IBM encoder architecture. Compression results for the new single-pass encoding algorithms and comparisons with previous CBR schemes are provided. The result suggests the suitability of our VBR approach for record/playback in storage media such as digital video disc (DVD) players, disk-based camcorders, and digital videocassette recorders (DVCRs). It further reflects the importance of our single-pass CBR scheme for providers of broadcast services, for which it allows more video programs to be allocated to a selected communication link, and for in-studio applications, for which it greatly facilitates visual analysis of captured streams.

1. Introduction

The standardization of MPEG-2 [1] has greatly facilitated the transmission, representation, storage, and manipulation of digital video in various environments such as broadcast television, wireless communications, consumer electronics, and multimedia computers. Further, applications ranging from desktop publishing on personal computers (PCs) to home authoring with digital video disc (DVD) players demonstrate the major role played by the MPEG-2 standard in promoting new storage media for consumer video and interactive multimedia. This has enabled the home user to download video streams from a satellite system or an Internet site in order to create DVD video programs or multimedia presentations using a recordable medium. Although the syntax and specification of MPEG-2 bitstreams and multimedia programs (as in DVD titles) are well defined by the international standards, the actual encoding parameters and functions, which should lead to a fully compliant bitstream, have been and are the subject of many research efforts. The main challenge is how to achieve a close-to-optimum video quality in a compressed stream while reducing the amount of information in the source. This is because the statistical nature of any video source is either not known a priori or will change over time, and a true estimation of source distribution can be anywhere from computationally extensive, as in non-real-time multipass encoding, to almost impossible for real-time encoding.

The nonstationary nature of images makes them inherently variable; compression optimality for MPEG-2 coder-decoders (codecs) is achieved by carefully selecting a set of spatial or temporal image analyzers, quantizers, and variable-length entropy coders, of which some are frozen by the standard and some are to be defined by the designer. The results are variable1 streams which require a sophisticated buffering scheme to smooth out the variability of the signal before transmission over a fixed bandwidth is carried out. The receiver will have a similar buffering policy to convert the fixed-channel rate to variable streams prior to the decoding and display of each picture.

Since design and development of an MPEG-2 video encoder can become cumbersome as a result of formulating several mathematically or perceptually derived parameters, typical approaches [2] enforce a constant bit rate (CBR) for a group of pictures (GOP) regardless of the complexity of the video interval. This scheme assumes equal weighting of bit distribution among GOPs and reduces the degree of freedom of the encoding task. In short, the problem is reduced to minimizing (maximizing) the GOP distortion (quality) subject to a constant target rate. By CBR we mean that the sustainable rate of the encoded video stream per GOP is close to a constant target rate, but the instantaneous rate changes per picture depending on picture type2 or the quantization scaler. Another advantage of a CBR stream is that the transmitted signal may be terminated at any time and the user is assured of maintaining a rate close to the target rate. All CBR MPEG-2 encoders enforce different quantizing scalers for each picture type to achieve good-quality streams within a GOP. This method of compression works adequately when the complexity of the source varies slowly over time and therefore the encoding algorithm has time to adjust itself. However, if the statistical features of the source change rapidly over time, a constant-bit-rate operation may result in good picture quality for a short time window (e.g., a few frames or a GOP) and discontinuous quality when the whole video is perceived.

One way to improve the perceptual quality of a CBR stream while maintaining its constant rate from start to finish is to identify "difficult-to-encode" pictures and increase their bit budget accordingly. Conventional approaches to real-time CBR encoding use picture-to-picture3 correlations in terms of complexity measures to predict the level of difficulty of a picture. In this paper we improve upon this first-order prediction by estimating the encoding difficulty of a picture on the basis of the complexity of all previously encoded pictures. Our single-pass CBR algorithm employs an infinite impulse response (IIR) filter to dynamically determine a nominal value which represents the degree of difficulty of the partially analyzed video stream in real time. We claim that the number of bits consumed by a more (or less) "difficult- to-encode" picture can be adjusted by comparing the encoding difficulty of the substream against a local measurement. This method of real-time compression improves the overall quality of the video and maintains a constant rate throughout the stream. Therefore, it can potentially become a key encoding element in cable television (CATV), direct satellite, and terrestrial broadcast arenas as well as mobile and asynchronous transfer mode (ATM) communications.

Since we argued that the video is inherently variable, an even better compressed stream can be created by employing a variable-bit-rate (VBR) encoder algorithm. Applications for such a scheme are plentiful. VBR can be used for networks that employ a dynamic bandwidth, as in ATMs, or it can be exploited as a means of achieving statistical multiplexing for digital broadcast satellites. Other major arenas which can benefit from the use of a VBR encoder are consumer video and interactive multimedia, where recording of high-quality pictures onto a storage medium is desired. For the aforementioned environments, DVD video movies are created to be played on set-top players, while DVD-RAM drives are used for multimedia productions in computers. Some examples of end-to-end solutions of the above industry segments are shown in Figure 1.

Figure 1Figure 1

Some attempts at formulating a VBR video coding scheme have appeared in the literature. Most of them propose constraining the quantizer-scale parameter to a user-defined target value for long periods of time [3]. This value is adjusted via network negotiation or monitoring the system buffer to match quality of service (QoS). Although simple intuition suggests that fixing the quantizer scaler would redistribute the amount of bits among GOPs of differing complexity, there are no guarantees of obtaining a constant video quality. Further, in ATM applications an encoded stream is usually queued before a bandwidth is available in the network. This lead time enables pre-encoding tasks to be performed on particular streams subject to the overall channel bandwidth and buffer fullness. Others have investigated a simple variable-bandwidth estimation model for the number of ATM cells generated through packetization of video sources [4]. In this allocation scheme, a fixed quantizer-scale parameter is again used by the encoder. Authors in [5] have presented a VBR video encoder which takes advantage of the limits of the human observer to improve the perceived quality of the decoded sequence while maintaining the output bit rate within permitted bounds.

For consumer video, a popular mode of encoding operation has been the multipass VBR scheme. For example, high-end DVD mastering can be accomplished with an encoding system which typically uses a three-step procedure. The first step encodes the video source in a CBR mode and gathers a set of predefined statistical features. This information is then used to compute a set of optimized quantization parameters which closely match the source distribution of the video data and would provide a better compressed stream during a second-pass encoding. The last step is a postprocessing task which creates the final DVD format. The intermediate procedure can be carried out several times to produce an ideal video program. This type of application has the advantage that many lookahead parameters are known a priori when the VBR video coding algorithm is implemented. Unfortunately, for applications such as home DVD productions, writable DVDs for PC, digital camcorders, and DVCRs, the availability of a platform capable of analyzing and performing a multipass VBR encoding is unrealistic. Therefore, a VBR scheme is desired which is superior to CBR and can be implemented in one pass. This paper is intended to introduce such an algorithm.

In the first part of the paper we propose a new single-pass CBR encoder algorithm which is tailor-made for real-time compression and can easily be realized with the IBM encoder architecture. We add a further level of complexity to the CBR encoder and introduce a new real-time single-pass VBR encoder in the second part of the paper. Our VBR scheme employs a causal predictive model to distinguish the "hardness" or "softness" of the incoming video material on the fly and adapts itself accordingly. Moreover, it relies on a perceptual model to improve the quality of the stressful segments of the stream and produce high-fidelity video. The perceptual model is also responsible for adjustment of the average rate of the stream. The rest of the paper is organized as follows. Section 2 gives some background information for CBR and VBR algorithms. The new CBR encoding technique is presented in Section 3, and the VBR rate-control algorithm is defined in Section 4. Simulation results are given in Section 5, and concluding remarks are provided in Section 6.

2. Background formulation

An MPEG-2 sequence is typically partitioned into small intervals called GOPs (groups of pictures), which in turn are categorized by picture types I (intracoded or intrapicture), P (predicted), and B (bidirectionally predicted) [1]. The number of bits per GOP is distributed such that the allocation for an I-picture is more than that for a P-picture. This is because a P-picture uses a motion-estimation (ME) technique to estimate its content; as a result, a motion-compensated frame difference (MCFD) with a lower entropy than the original source is encoded. B-pictures use the smallest number of bits because their ME techniques are more intensive than those for P-pictures. This method provides a basis for maintaining the same picture quality within a GOP when pictures of different types are encoded. We may further lower the bit allocation of B-pictures, since they will not be used to estimate other pictures. Each picture type is subdivided into square blocks of pixels called macroblocks. Since producing an efficient compressed stream is the main mission of an MPEG-2 encoder, devising a robust rate-control (RC) algorithm becomes an integral part of the encoding task. The rate-control algorithm monitors the number of bits that should be allocated to each picture or macroblock on the basis of type or image feature, respectively. Moreover, it should ensure that the decoder buffer does not experience an overflow or underflow during the time the stream is received from the communication channel and prepared for decoding. In short, there is a need for the following items in any type of MPEG-2 compression scheme:

  • Target bits (or quantization scalers) for picture types.
  • Buffer regulation to avoid overflow/underflow conditions.
  • Maintenance of a target rate or consumption of no more than the bit budget.
  • A rate-control strategy which ensures that all of the above are monitored/satisfied.

In the remainder of this section, we build a framework for a CBR compression scheme, since most of these ideas are needed for the new single-pass CBR and VBR approaches. We assume that pictures of a video sequence can be modeled as a memoryless Gaussian source with variance sigma2 varying from picture to picture; hence, their rate-distortion (R-D) relationship is defined by R(D) = (1/2) log (sigma2/D). Experimental results in the literature suggest that a similar behavior exists between the rate R of the source and the quantization factor Q [6],

R(Q) = b1 log b2 (1)

.
Q

Instead of using the logarithmic model of (1), we adopt a simpler hyperbolic relationship which is easily implementable in real time and has proved to be an effective realization of the R-Q model [2]. The simplified equation takes the form

R(Q) = chi (2)

.
Q.

Equation (2) indicates that the picture rate is inversely proportional to the quantization factor. chi is a predefined measure of complexity for each picture type. However, since the nature of the source changes over time, a new complexity measure is required prior to encoding of each picture type. This parameter is usually computed using the past encoding parameters, e.g., bits, quantization factor, and/or some lookahead statistics. For each GOP4 of the MPEG-2 stream, we enforce a total number of bits given by Cscript l,

sum NxRx = Cscript l       x = I, P, B. (3)
x

Index script l denotes the GOP number, and x is the picture type. Nx is the number of pictures of type x in a GOP, and Rx is the number of target bits for picture type x. For a CBR sequence we have Cscript l = Cgop, where Cgop is a fixed number of GOP bits. For a given Cscript l, the video quality of the GOP is maximized by minimizing the average sum of the quantization scalers subject to the condition of Equation (3),

Psi = sumx NxQx (4)

.
sumx Nx

Instead of minimizing Psi subject to the constraint of (3), we remove this condition and use the Lagrange multiplier lambda to minimize the Lagrangian cost Upsilon:

Upsilon = PsilambdaCscript l. (5)

With the aid of the rate-quantization model described in (2), the target for each picture type is deduced:

Rx = chixCscript l (6)

.
sumx chixNx

Targets in Equation (6) represent only ideal picture bits; the actual bits would almost always deviate from this. The accumulated error must be computed and fed back to the rate-control algorithm to ensure that the final MPEG-2 bitstream meets the average bit rate or the total bit budget. Let Cscript l,ideal and Cscript l,actual represent the ideal and actual bits for GOP script l, respectively, and let deltascript l,gop = Cscript l,actualCscript l,ideal be the difference between the two. Further, let Ri,ideal and Ri,actual represent the ideal and actual bits for picture i, respectively, and deltai,pic = Ri,actualRi,ideal be the difference between the two. After n pictures have been encoded, the total accumulated error can be computed as

ng-1 deltascript l,gop + n-ngG-1 deltai,pic = Deltan-1,gop + Deltan-1,pic. (7)
sum sum
script l=0 i-0

The size of the GOP is given by G = summationx Nx, and ng = left floorn/Gright floor is the number of fully encoded GOPs. Suberror accumulation for all processed GOPs is given by Deltan-1,gop, while Deltan-1,pic is the suberror accumulation for the last but not yet finished GOP in the encoding order. The ideal picture target can now be adjusted for overproduction or underproduction of bits, computed from previously encoded pictures. The new set of ideal bits prior to encoding picture n belonging to GOP (script l = ng) is

Rxn,ideal = chixn-1(Cscript lalpha.gif1Deltan-1,gopalpha.gif2Deltan-1,pic) (8)

,
sumx chixn-1Nx ,

where alpha.gif1 and alpha.gif2 are constants that indicate how aggressively this adjustment is carried out. For a CBR scheme it is recommended to use alpha.gif1 = alpha.gif2 = alpha.gif with Deltan-1 = Deltan-1,gop + Deltan-1,pic, and compute the adjusted picture bits as

Rxn,ideal = chixn-1(Cgop-alpha.gifDeltan-1) (9)

.
sumx chixn-1Nx .

The rate-quantization framework described so far is based on the assumptions that the decoder buffer is of infinite size and that a large enough number of bits is always available. However, this is not true in a real-life scenario, where the buffer size is limited and defined by the MPEG-2 standard [1]. The encoding scheme is responsible for eliminating any overflow or underflow condition that the decoder buffer may encounter. This is accomplished by examining a hypothetical decoder buffer, i.e., video buffer verifier (VBV), and computing lower/upper bounds on the number of bits assigned for a picture type. The lower bound should be a large enough nonnegative number to prevent the overflow condition, while the upper bound should not be larger than a predetermined value above which the VBV buffer will underflow. Let Bn and B*n be the decoder buffer fullness before and after picture n is removed, respectively. B*n is then computed as

B*n = BnRn,actual , (10)

and the buffer fullness before removing the next picture is

Bn+1 = B*n + RchTn , (11)

where Rch is the channel rate (in Mb/s) at which the decoder buffer is being filled, and Tn is the display period for picture n. Figure 2 shows how the occupancy of a VBV buffer changes over time. Before we remove picture n, we have all the information in the buffer available to us; therefore, the upper bound Un becomes the buffer occupancy. To compute the lower bound Ln, suppose Rn,actual is just that, and picture n has been removed at once and delivered for display. The decoder buffer is filled at the rate Rch during this period, and the VBV buffer fullness before removing picture (n + 1) would have to be smaller than the total size of the video buffer verifier, i.e., Bvbv, in order to prevent overflow. Therefore, the nominal values by which the picture target bits are bounded are given by

Un = Bn,
Ln = max (0, Bn + RchTnBvbv). (12)

Figure 2Figure 2

After each picture is encoded, the complexity measures chix are updated, and a new target based on Equation (8) is computed for the next picture. This target should meet the constraints imposed by (12) to satisfy the VBV buffer policy. We clip the ideal picture bits using the picture bounds Un and Ln:

Rn,ideal = curly brace Un if (Rn,ideal > Un), (13)
Ln else if (Rn,ideal < Ln),
Rn,ideal else

Finally, a quantization scaler Q, which is defined on the hyperbola of Equation (2), is obtained. It should be noted that each picture type has its own composite R-Q curve, and further, for each picture type the Q factor may be adjusted slightly to ensure that all pictures are perceived to be of equal quality. In this paper we do not define how to incorporate perceptual effects, via modulating the Q factor, for a macroblock-level rate control during the encoding task; the reader can refer to [2] for this procedure. However, we must describe a strategy which ensures that the ideal target number of bits for each picture type is met. The above condition can be satisfied by monitoring the actual number of bits computed for a set of encoded macroblocks against the ideal average number of bits of a macroblock. Let rm,n,actual represent the actual number of bits calculated for macroblock m in picture n. In order to track how closely we match the ideal picture bit, a parameter Deltam,n,mb is defined:

Deltam,n,mb = m-1 rp,n,actual mRn,ideal , (14)
sum
p=0 Mmb

where Mmb is the total number of macroblocks in a picture. A positive Deltam,n,mb indicates an overproduction of bits at the macroblock level; therefore, the picture quantizer must be increased for the next macroblock to satisfy the target number of bits. Similarly, a negative value dictates the opposite scenario. A macroblock-level rate-control strategy can be formulated to modulate the Q factor on the basis of overproduction or underproduction of bits. We accomplish this by defining a factor qm,n, used to scale macroblock m in picture n, through [7]

qm,n = Qn curly brace Ac(Dn, Deltam,n,mb) if (Deltam,n,mb > 0), (15)
Af(En, Deltam,n,mb) else,

where Ac(., .) and Af(., .) are empirically derived functions which help in maintaining the picture target bits. They should behave such that Ac(., .) is always larger than 1 and Af(., .) is smaller than or equal to 1. Qn is the picture quantizer, and differential targets Dn and En are

Dn = UnRn,ideal ,
En = Rn,idealLn . (16)

We have now defined a CBR MPEG-2 encoder specified by {Ri, Qi}ni=0. In the following section, we describe how this framework can be modified to produce a better CBR stream.

3. Single-pass CBR video

Previous approaches to CBR video compression such as the schemes defined in Section 2 and in [2] have used only the encoding parameters of one previously encoded picture to estimate the R-Q relationship of a picture having the same type. After a picture is encoded, its complexity chix is updated as defined in [2] and used for the next picture of the same type,

chixn-1 = curly brace Rxn-1,actualM -1mb Mmb-1 qxm,n-1     if (n-1) is of type x, (17)
sum
m=0
 
chixn-2 for all other types,

and the complexity measures are used to predict a quantization value for picture n,

Qxn = chiIn-1NI + chiPn-1NP + chiBn-1NB . (18)

Cgopalpha.gifDeltan-1

An examination of Equation (18) reveals that a first-order prediction is used to estimate the position of an (Ri, Qi) pair for each picture type to be encoded. This method of estimation can be improved by observing the long-term complexity of all pictures processed so far and determining the relative complexity of a current picture to be encoded. The long-term complexity chi barxn-1 of a picture type x is defined as

chi barxn-1 = curly brace (Gammax – 1)chi barxn-2 + chixn-1     if (n-1) is of type x, (19)

Gammax
 
chi barxn-2 for all other types.

Equation (19) characterizes an IIR filter structure with the complexity of a picture as an input and the average complexity of a subset of the stream as the output. Coefficient Gammax is fixed over time for a picture type x and reflects how strongly the output of the filter is dependent on the previous input and output samples. The canonical structure of the IIR filter represents an efficient and simple way to respond quickly to the statistical variations of a video sequence in real time. The recurrence in Equation (19) can be factored to write the average complexity in terms of the individual complexity of encoded pictures,

chi barxn-1 = (Gammax-1)n-1 chi barx0 + n-1 (Gammax-1)n-(1+i) chixi , (20)

sum
Gammaxn-1 i=1 Gammaxn-i

with

chi barI0 = chi barinit ,
chi barx0 = kappaxchiI0       for x member {P, B}. (21)

Since each GOP typically starts with an I-picture, the selection of the magnitude of chi barinit determines how well we may encode the first few Is of the sequence. Constants kappaP and kappaB are intended to maintain a complexity ratio among the start-up values. We now take advantage of the fact that before compression of a "difficult-to-encode" picture, the differential target Dn may be a large positive number which can be used to adjust the ideal target allocation of such a picture. Similarly, the bit allocation of an "easy- to-encode" picture can be lowered on the basis of the differential target En. It should be noted that the notion of difficulty of encoding reflects the relative complexity of a picture among all processed pictures. This conception is substituted for the expression "more (less) difficult to encode" throughout this formulation. In order to identify the level of difficulty of a picture type, we compare its complexity against the output of the IIR filter, which indicates the history of all previously encoded pictures of this type. If (chixn-1 greater or equal to chi barxn-1), we label this picture as "difficult to encode" and increase its bit allocation. For the opposite scenario, i.e., (chixn-1 < chi barxn-1), the picture is labeled as "easy to encode," and bit allocation is lowered. The actual adjustments are defined by modifying Rn,ideal of Equation (13) through

Rxn,ideal = curly brace min parenthesis Rxn,ideal + chixn-1chi barxn-1 rhomaxDn, numaxRxn,ideal parenthesis

chixn-1 + chi barxn-1
    if (chixn-1 greater or equal to chi barxn-1),
max parenthesis Rxn,ideal + chixn-1chi barxn-1 rhominEn, numinRxn,ideal parenthesis

chixn-1 + chi barxn-1
else,
(22)

and the picture quantizer is computed as

Qxn = chixn-1 (23)

Rxn,ideal .

The new targets in Equation (22) suggest that for extreme cases, i.e., pictures that are very difficult (or easy) to encode, we may come very close to the upper or lower bound of a picture and force the decoder buffer to underflow or overflow. To resolve this problem, constants rhomax and rhomin are incorporated to use only a portion of the available buffers Dn or En. The min and max operations, along with user-defined numax and numin parameters, are used to prevent a picture from using too many or too few bits. A further precaution is taken to avoid decoder buffer overflow by adding a guard band gL to increase the lower picture bound:

Ln,adj = Ln + gL ; (24)

the new adjusted differential targets (buffers) are given by

Dn,adj = UnRn,idealgU ,
En,adj = Rn,idealLn,adj . (25)

Again, a guard band gU is used to prevent the decoder buffer from underflowing. After the new target for a picture is set through the conditions given in Equation (22), we must still clip the bits using the adjusted lower bound Ln,adj and the upper bound Un. The updated parameters Dn,adj and En,adj are used to ensure that the new targets are achieved by implementing a macroblock-level rate-control strategy, as defined in (15). Figure 3 displays a graphical representation of how a transformation of a plot of the relative complexity of a picture is used to set a new bit target. It further reflects how VBV buffer compliance is guaranteed and picture bit budgets are satisfied by enforcing the actual bit production to operate within certain limits and cling to the ideal bit allocation.

Figure 3Figure 3

The robustness of any type of constant-bit-rate control algorithm is directly dependent on the speed with which it can respond to the changes of the video content of an incoming stream. For real-time applications in which an efficient hardware implementation must be realized with dedicated integrated circuits, the veracity of the RC algorithm may be severely tested under stressful conditions. This is because such customized solutions do not use any preprocessing functions to pre-analyze the nature of the stream; consequently, the availability of the true encoding parameters of a picture always lags the assigned values. A real-time RC algorithm must process and memorize a few GOPs of the same complexity before it can reach its optimality. We argue, on the basis of the following observations, that our new approach to CBR video compression, as formulated in Equation (22), offers a quicker response and better results than the conventional method of CBR encoding.

Consider an encoder that has processed a few "difficult-to-encode" pictures and, further, that this new group of pictures belong to the same GOP. The increase in picture difficulty may be due to the sudden (or gradual) appearance of a high degree of spatial image detail, an increase in the velocity of many moving objects of different scales, directional changes of objects, or some form of higher-order combinations. At the start of a new GOP, we must encode an I-picture. Since we have already compressed one difficult I-picture, there is a high probability that the next I-picture is also "difficult to encode," as reflected by the complexity of the previously analyzed picture. Our CBR scheme can adjust to this picture much more quickly by knowing that the I-picture is significantly different from the rest of the video sequence. This observation is made by comparing the output of the IIR filter against the estimated I-complexity. As a result, the I-picture consumes more bits than the normal bit allocation generated by a conventional method of CBR encoding. Therefore, a higher-quality I-picture is reconstructed at the decoder output. I-pictures are used as references to predict a block of pixels in P- and B-pictures. A better prediction is now obtained for the non-intracoded pictures, resulting in better reconstruction of such pictures at the cost of a small number of bits. Hence, we improve the perceptual quality of a GOP while still adhering to the constant bit rate of the stream. If an easy picture is to be encoded, it consumes fewer bits, and the remaining bit budget is used to encode the future pictures of a GOP. Overall, the number of bits and the picture quality average out over an "easy-to-encode" GOP. It should be noted that the human observer finds image distortions in "difficult-to-encode" pictures most annoying; for long video programs, a small degradation in "easy-to-encode" pictures is tolerated. Further assessment of the argument presented in this section and comparison with the method described in Section 2 are provided in Section 5, which discusses the simulation results.

4. VBR video

The CBR rate-control strategy of the previous section is inspired by the fact that for a fixed target Cgop, constant quality is achieved within a GOP by modulating the quantization parameters. Any statistical variations or bit offshoots are exploited to help stabilize the CBR RC algorithm over time and maintain a desired rate. In addition, an MPEG-2 CBR encoder takes advantage of a set of universal constants and predetermined initial complexity measures to ensure a certain ratio between the number of bits allocated among different picture types [2]. However, efforts in classifying a group of continuous pictures, such as GOPs or video segments, into different types of time intervals in terms of complexity "hardness" or "softness" have been limited for real-time single-pass MPEG-2 encoding. We define "hardness" ("softness") of the video by the large (small) number of bits that it requires to produce high-fidelity results. For multipass CBR or VBR encoding, such information is known; hence, quality improvements can be made. One way of achieving single-pass VBR compression is to compare the Q factor, derived by the CBR algorithm, against a fixed parameter [8]:

Qvbr = max (Qfix, Q). (26)

The method in (26) is intended to provide an upper bound on the quality of pictures which belong to soft video segments. When soft segments are analyzed, it is very likely that we have (Qfix > Q). Therefore, a quantization scaler, larger than the values normally assigned by the CBR encoder, is assigned to pictures in soft segments. A VBR stream is produced by distributing the surplus bits among the hard segments of the video. Qfix may be obtained through experimentation with a large number of sequences, but finding a near-optimum value is difficult if not impossible. A better scheme can be formulated by calculating a Qfix scaler in real time using prior image statistics [7]. The concept behind the method in [7] is to combine the Qfix approach of [8] with that of the CBR RC algorithm.

Our real-time single-pass VBR encoder exploits an R-Q compression model to differentiate the degree of "hardness" or "softness" of video segments, each segment corresponding to a particular hyperbola similar to the one defined by (2). The actual encoding parameters of the video segments are computed along this hyperbola. We also use an R-Q perceptual model to prioritize the video segments in terms of visual importance. To satisfy the average rate of the VBR stream, the position of the perceptual model must be changed over time. In this paper we specify two methods for meeting this condition. The perceptual model and the VBR RC algorithms are described next.

Rate-quantization perceptual model
The VBR scheme proposed in this paper is conceptually motivated by the fact that each video segment is associated with a level of encoding difficulty, and this difficulty can be measured by various source statistics or compression parameters such as total picture bits, quantization scaler, spatial activity, temporal activity, signal-to-noise ratio, or any combination of them. A larger number of bits should be allocated to a video segment with a high level of encoding difficulty. This is a different approach from that of the CBR RC algorithm, in which a fixed number of bits is allocated to each GOP regardless of the degree of complexity of the source. Research efforts have shown that for a large number of test cases composed of complex, moderate, and easy materials, a strong correlation exists between the rate of the video interval and its corresponding quantization scaler [9]:

Ra proportional (a1 + a2Qßa). (27)

The R-Q relationship of (27) is deduced on the basis of the criterion that all types of video intervals should be perceived equally. Further, it suggests that difficult video segments producing a large quantization scaler should consume more than the average bit rate of the compressed stream, while easy segments would use smaller bits. This provides a natural building block in formulating a robust single-pass VBR encoding algorithm, since it takes advantage of the variability of the video source. The actual number of bits allocated to each interval is determined by the slope K of the perceptual model,

Ra = K(a1 + a2Qßa) = KF(Qa). (28)

Constant K (measured in Mb/s) is modulated for each video interval to ensure that the average rate of the compressed stream meets the desired target. For the case of ß = 1.0 the perceptual model reduces to a linear relationship.

VBR rate-control algorithms
The efficiency of a single-pass VBR encoder is assessed by the speed with which its rate-control algorithm can learn and adjust itself to the "softness" or "hardness" of the video stream. For regions where image discontinuities or special effects occur, degradations in picture quality should be minimized. Since for single-pass encoding, image statistics are limited by the previously analyzed pictures, the learning rate of the RC algorithm must be adequate to predict the content of the future video intervals, yet not aggressive enough to result in algorithmic instabilities. One way to solve the twofold problem is to adjust the quality of the encoded stream for every time interval and let the RC algorithm learn the local content of each picture within that time interval.

In this subsection we claim that a VBR video stream Svbr is a concatenation of several contiguous video intervals, each operating at a different CBR bit rate. We further use the GOP terminology as the definition of a video (or time) interval for the remainder of the paper, and compute the quantization parameters of the single-pass VBR algorithm for each GOP. Therefore, Svbr can be partitioned into a number of piecewise-continuous GOPs specified by {Sscript l}script l=0m. For GOP Sscript l we define an average number of bits C barscript l and an average quantization scaler Q barscript l such that [C barscript lQ barscript l] = G -1[Cscript lQscript l]. We modify the terms of (28) to form a dependency between the average number of bits and quantization scaler of GOP Sscript l,

C barscript l = f -1KF(Q barscript l). (29)

The picture rate of the video is defined by f. Equation (29) represents the perceptual rate-quantization model for a GOP. We further take advantage of the theoretical rate-quantization model and assume a hyperbolic dependency between the average number of bits and quantization scaler of a GOP as in (2). The GOP rate-quantization model takes the form of

C barscript l(Q barscript l) = chicscript l . (30)

Q barscript l

The average complexity of a GOP is given by chicscript l. To understand how the two perceptual and theoretical rate-quantization models work in harmony, we should look at their graphical representations in Figure 4. The perceptual model is overlaid on the ideal rate-quantization behavior of GOPs. Once the VBR encoder processes a few GOPs, the C bar-Q bar relationship of Equation (30) can be realized. Depending on the actual compression parameters, the model of Equation (30) characterizes a "hard" or "soft" GOP. Then the C bar-Q bar perceptual model is used to emphasize the visual importance of a "hard" GOP. The positive slope of this model ensures that "hard" GOPs are assigned more bits. A more detailed description of the VBR rate-control algorithm is presented later in this section.

Figure 4Figure 4

Since our goal is to formulate a VBR algorithm which is readily applicable for hardware implementations, we form a linear approximation for the model of Equation (30). The new model uses previously processed GOPs to construct a straight line for the C bar-Q barrelationship. We call this line theC bar-Q barpredictive model and define its slope and C barintercept by xiscript l and etascript l, respectively. Next, we explain how to derive this model.

Let (C barscript l-2,actual, Q barscript l-2,actual) and (C barscript l-1,actual, Q barscript l-1,actual) represent the average number of bits and average quantization scaler pairs of GOPs Sscript l-2 and Sscript l-1 just encoded.5 Then, theC bar-Q barrelationship for GOP Sscript l to be encoded is given by

C barscript l = -xiscript lQ barscript l + etascript l , (31)

with

curly brace xiscript l = DeltaC barscript l(DeltaQ barscript l) -1 etascript l = C barscript l-1,actual + xiscript lQ barscript l-1,actual ,
DeltaC barscript l = C barscript l-1,actualC barscript l-2,actual DeltaQ barscript l = Q barscript l-2,actualQ barscript l-1,actual .

Equation (31) defines the instantaneous rate-quantization behavior of a particular GOP under analysis and will change over time. To find the optimum operating point, we solve the causal predictive model of (31) together with the perceptual model of (29) and encode the next GOP at the average number of bits of

C barscript l = K(xiscript la1 + etascript la2) . (32)

fxiscript l + a2K

Constant K is modulated for each GOP to ensure that the total number of bits produced by the VBR stream is not more than the size of the storage or retrieval device, e.g., a DVD disc. Assume that the total number of bits available is RTOT. Then, after every GOP of the video sequence is analyzed and encoded, we can use the perceptual model of (29) to compute K:

fRTOT (33)
K =
.
G sum Ngop-1 F(Q barscript l,actual)
script l=0

The number of GOPs in the video sequence is given by Ngop. Given a number of pictures and a bit budget RTOT, the single-pass VBR encoder has the responsibility of fitting all of the produced bits into the digital medium. To prevent overruns or underruns, RTOT must be dynamically modified for each GOP. The adjusted budget Rscript l,tot is obtained by subtracting the actual bits from RTOT and is used to set a new slope,

fRscript l,tot (34)
Kscript l =
.
G sum Ngop-1 F(Q barscript l,actual)
script l=0

We call the VBR encoder which uses this method of bit assignment (or slope modulation) VBR method 1 (VBR-1). The denominator of Equation (34) can easily be computed in a multipass encoding scheme, but is not available for a single-pass real-time compression. Instead, we use a pre-encoded phantom sequence with a set of GOP quantizers defined by {Q bar*script l}script l=0N*gop-1, and adjust the summation of Equation (34) after each GOP of the test sequence is encoded. N*gop is the number of encoded GOPs in the phantom sequence. The learning procedure for (34) is formulated as follows.

Learning procedure 1

  1. Let P be a pre-encoded phantom defined by the tuple {N*gop, {Q barscript l}script l=0N*gop-1}, and set Z0 = [sumscript l=0N*gop-1 F(Q bar*script l)], N0 = N*gop.
  2. Initialize the VBR-1 RC algorithm by encoding the first two GOPs of the video sequence, i.e., S0 and S1, at a rate of (fC bargop) using a CBR MPEG-2 encoder. The nominal value of (fC bargop) should correspond to the average rate of the VBR stream.
  3. After Sscript l-1 is encoded, update Zscript l and Nscript l as Zscript l = Zscript l-1 + gammaF(Q barscript l-1,actual), Nscript l = Nscript l-1 + gamma, where gamma is the update speed of the learning algorithm. Compute Cscript l-1,actual and then the sum sumj=0script l-1 Cj,actual.
    1. If (script l – 1 = 0) implies increment script l by one, go to step 3.
    2. Otherwise, compute xiscript l and etascript l according to the causal predictive model defined in (31).
  4. Use the sum term of step 3 to determine the remaining bits Rscript l,tot in the budget. Update the number of remaining pictures Pscript l to be encoded.
  5. Measure the new slope: Kscript l = fRscript l,totNscript l(Pscript lZscript l) -1.
  6. Calculate the new target C barscript l and encode Sscript l in VBR mode.
  7. If there are unfinished GOPs in the sequence, go to step 3; otherwise stop.

Term Zscript l is intended to adapt itself to the image content of each GOP and smooth out the volatility of the VBR video. It acts as a safety measure to prohibit unrealistic bit allocations to GOPs of very high (low) complexity. Finally, the average number of bits of the GOP Sscript l to be encoded by the VBR-1 encoder is set by

C barscript l = parenthesis Rscript l,totNscript l parenthesis bracket xiscript la1 + etascript la2 bracket . (35)


Pscript lZscript l xiscript l + a2 parenthesis Rscript l,totNscript l parenthesis

Pscript lZscript l

To adjust the target bits of each picture type, script l is multiplied by the GOP size G and Equation (8) is used to set the VBR picture bits. For a VBR scenario, upper and lower picture bounds are defined differently from those for the CBR mode of operation. An overflow condition cannot occur for the decoder buffer of a VBR codec. This is because the task of filling the decoder buffer is immediately stopped after the VBV buffer occupancy reaches its maximum level. Therefore, a lower bound of zero is imposed for picture bound. Moreover, the decoder buffer is filled at the maximum rate of Rmax,6 set by the user. Figure 5 shows an example of how the occupancy of a VBV buffer changes over time for the VBR mode of compression. We clip the target bits of Equation (8) by the newly derived picture bounds

Un = min (Bvbv, Bn-1 + RmaxTn-1Rn-1,actual),

Ln = 0.

(36)

Figure 5Figure 5

Picture quantization scalers are computed as in (23) and the macroblock-level rate-control strategy of Section 2 is used to maintain the picture targets. We have now defined a single-pass VBR MPEG-2 encoder specified by {RTOT, {Ri, Qi}ni=0}. The encoder operates along constellations which are formed by jointly solving for a time-varying perceptual model and a bank of C bar-Q bar models. The relative ideal position of the (C barscript l, Q barscript l) pair of a GOP within the constellation is first determined by the "softness" or "hardness" of the GOP and then adjusted by the remaining number of bits in the bit budget. The local position of the (Ri, Qi) pair of a picture is represented by the R-Q model of the picture type. Figure 4 displays how a constellation of (C barscript l, Q barscript l) pairs is formed by the VBR-1 encoder. In this figure, the constant line defined by C bar = C bargop indicates the ideal location of all (C barscript l, Q barscript l) pairs if they were to be encoded in CBR mode. Moreover, it reflects that the average quantization scalers increase as the hardness of GOPs increases. The ideal and actual operating points are depicted by light and dark circles, respectively. Let (C bar1, Q bar1) and (C bar2, Q bar2) be the first two pairs of a "soft" GOP, which are obtained by initialization of the VBR-1 encoder in the beginning of a video sequence. The next ideal operating point, denoted by "3," is the intersection of the perceptual model (solid line) and the causal predictive model. The position of point "3" indicates that an average number of bits smaller than the sequence average C bargop is allocated. However, the output of the encoder, i.e., point (C bar3, Q bar3), will be different, and as a result, a new perceptual model (shown by the dotted line) is derived to meet the bit budget constraint. The pairs (C bar2, Q bar2) and (C bar3, Q bar3) are then used to obtain a new predictive model (shown by the dotted line) and compute a new operating point, i.e., "4." If the video sequence contains a large number of contiguous "soft" GOPs, the perceptual model will eventually converge to a line which intersects theC bar = C bargop line in close proximity to the CBR output pairs (C bar1, Q bar1) and (C bar2, Q bar2).

If the video material starts with "hard" GOPs, the location of the actual point (C bar3, Q bar3) is above the C bar = C bargop line. This ensures the optimization of the available bit budget for regions where a CBR encoder is most vulnerable, i.e., "hard" GOPs. In this scenario, a new perceptual model (shown by the dashed line) is formed to gradually lower the allocated GOP bits and comply with the bit budget constraint. This model, along with the new causal predictive model (also shown by the dashed line), determines the next operating point, denoted by "4." For the case in which the incoming sequence is composed only of "hard" GOPs, the perceptual model will eventually conform to a line which meets theC bar = C bargop line at a location close to the (C bar1, Q bar1) and (C bar2, Q bar2) points. For a typical video program, it is unlikely that we operate along the same GOP (or a constellation of previously computed points) for a long duration of time. It is, however, likely that we jump out of a GOP to the next neighboring GOP after a short time. Therefore, we can deduce the following behavior. Before the perceptual model settles in a situation where it can monotonously take away from "hardness" or "softness" of a GOP, the RC algorithm will migrate to a new GOP. The actual quantization scaler values for previously encoded picture types determine the migration to a "harder" or "softer" GOP. For a future "harder" ("softer") GOP, we move to the right (left) of the previously encoded GOP as depicted in Figure 4. Hence, the point at which we intersect the perceptual model produces a number of GOP bits which is higher (lower) than the number of actual bits consumed by a previous GOP. This mechanism ensures that, for a collection of contiguous GOPs of similar "level of encoding difficulty," the effect of the "hardness" (or "softness") reduction or augmentation is distributed over the encoded content of the corresponding video interval, while the most (least) complex GOP still gets the largest (smallest) bit allocation relative to its lookalike GOPs.

While the VBR-1 encoding framework is based on modulating the slope K of the perceptual model, an alternative VBR RC algorithm can be formulated by translation of the perceptual model to meet the total number of bits set by the user. This method is labeled as VBR-2 and is defined next. For the VBR-2 bit-modulation scheme, we fix the slope K at a CBR rate of fC bargop and translate the position of the C bar-Q bar perceptual model by solving for parameter a1 of (29) through

RTOTCgopa2 sum Ngop-1 Q barscript l (37)
script l=0
a1 =
.
CgopNgop

The sum term in the above equation is undefined for the real-time single-pass VBR-2 encoder. We use a pre-encoded phantom sequence (as in the previous algorithm) to initialize the VBR-2 RC algorithm. To ensure that the digital storage medium does not experience overruns/underruns, we monitor the remaining bits periodically and use them as the instantaneous total bit budget, i.e., Rscript l,tot. Translation of the perceptual model is varied as

a1script l = Pscript l -1Rscript l,tot a2 Yscript l = rscript laa2Escript l . (38)


C bargop Nscript l

Term Rscript la represents a ratio between the long-term rate of the GOPs not yet encoded and the CBR rate of the video stream. For "hard" GOPs we have (C barscript l > C bargop) and, therefore, ratio Rscript la must become smaller over time to satisfy the total bit budget. Further enforcement is provided by Escript l to move the position of the (C barscript l,Q barscript l) pairs downstream, along the constellation, to lower the rate of the compressed stream. The opposite scenario takes place if several soft GOPs are encoded over time. The average number of bits of each GOP is set as

C barscript l = xiscript la1script l + etascript la2 . (39)

xiscript lC bar -1gop + a2

Term Yscript l is initialized and updated during a learning procedure given below. The rest of the encoding parameters are defined with the formulation of the VBR-1 algorithm.

Learning procedure 2

  1. Let P be a pre-encoded phantom defined by the tuple {N*gop, {Q barscript l}script l=0N*gop-1}, and set Y0 = (sumscript l=0N*gop-1 Q bar]*script l), N0 = N*gop.
  2. Initialize the VBR-1 RC algorithm by encoding the first two GOPs of the video sequence, i.e., S0 and S1, at a rate of (fC bargop) using a CBR MPEG-2 encoder. The nominal value of (fC bargop) should correspond to the average rate of the VBR stream.
  3. After Sscript l-1 is encoded, update Yscript l and Nscript l as Yscript l = Yscript l-1 + gamma Q barscript l-1,actual, Nscript l = Nscript l-1 + gamma, where gamma is the update speed of the learning algorithm. Compute Cscript l-1,actual and then the sum sumj=0script l-1 Cj,actual.
    1. If (script l – 1 = 0) implies increment script l by one, go to step 3.
    2. Otherwise, compute xiscript l and etascript l according to the causal predictive model defined in (31).
  4. Use the sum term of step 3 to determine the remaining bits Rscript l,tot in the budget. Update the number of remaining pictures Pscript l to be encoded.
  5. Compute Rscript la, Escript l, and the new translation factor a1script l.
  6. Calculate the new target C barscript l and encode Sscript l in VBR mode.
  7. If there are unfinished GOPs in the sequence, go to step 3; otherwise stop.

Constraints on the VBV buffer occupancy and target adjustments are handled as before. Figure 6 displays how the instantaneous causal predictive models are formed within each GOP and used in conjunction with a time-varying perceptual model to estimate a new (C barscript l, Q barscript l) point for the VBR-2 encoder. If several soft (or hard) GOPs are encoded over time, the perceptual model will be shifted upward (or downward) to meet the bit budget constraint. In Section 5 we evaluate the efficiency and robustness of the single-pass VBR compression schemes by encoding several video sequences.

Figure 6Figure 6

Scene transitions
The reliability of any type of real-time single-pass RC algorithm is directly related to the number of pictures processed after the encoder start-up. The encoder is initialized with a set of statistical information which is updated over time as the encoder learns and adjusts itself to the spatial or temporal perturbation of image details. As a result, the output stream may suffer some degradations at the beginning of the video sequence. This distortion is often ignored, since the human visual system (HVS) requires a recovery time to comprehend changes in the image distortion [10], and, more significantly, no past encoded pictures are available as a reference at the beginning of the video. After a few GOPs are encoded, the learning curve of the RC algorithm reaches an equilibrium level and produces streams which display acceptable image quality upon decoding.

During the course of the encoding procedure, the RC algorithm can become unreliable. This is usually caused by special effects or naturally occurring phenomena such as scene cuts, slow- or fast-moving fades, and luminance changes. For such cases, the encoder scheme undergoes temporal transitions for which parameter adjustments become nearly impossible. These image discontinuities typically result in serious image degradations which the human observer finds very distracting. One way to overcome this problem is to detect the temporal position of the special effect in the video and replace the quantization parameters with a properly adjusted set. The former is easily derivable for hard scene cuts but can become complicated for fades, while the latter requires a new framework for deriving a new set of target rates and quantization scalers.

The existence of image discontinuities is even more troublesome for a VBR video because of the volatile nature of the compressed video parameters. Since our intention is to formulate a VBR RC algorithm offering continuous picture quality, we seek a VBR encoding scheme which displays graceful degradations when special effects are compressed. We accomplish this by detecting scenarios in which, as a result of scene transitions, the causal predictive model cannot provide a faithful estimation. Under these circumstances we set the GOP targets along a trajectory different from the procedure described in the subsection on VBR rate-control algorithms. Figure 7 shows two cases (for the VBR-1 video encoder) in which transitions from "soft" GOP to "hard" GOP and vice versa make incorrect estimations for a future GOP to be encoded. The incorrect estimation is denoted by "bad operating point" in the figure. A much better estimation can be made by applying a three-point median filter to a set of carefully selected parameters. For the VBR-1 RC scheme, the new targets are derived using the following rule:

Figure 7Figure 7

Cscript l,new = G curly brace median 1f -1Kscript lF(Q barscript l-1,actual), µ2C bargop, µ3f -1Kscript l]
if (xiscript l < 0),
C barscript l else;
(40)

for the VBR-2 RC scheme, the targets are

Cscript l,new = G curly brace median bracket µ4C bargop(a1script l + a2Q barscript l-1,actual), µ5C bargop, µ6 Rscript l,tot bracket

Pscript l
if (xiscript l < 0),
C barscript l else;
(41)

The median filter is used to compensate for instability conditions that occur in the transition of video scenery among all types of GOPs, i.e., "soft" to "hard" and vice versa. The constant set {µi|1 less or equal to i less or equal to 6} enforces a bound on the number of bits produced during a scene transition. For the "soft"-to-"hard" GOP transition, the output of the filter in Equation (40) is marked by "1" in Figure 7, while point "2" is the new estimate for the reverse transition. The nonlinear nature of the filtering schemes in (40) and (41) is an effective way of controlling the rate of the VBR encoder for higher-order temporal changes. Further, it does not require a local (picture-level) scene-change detector. However, if the encoder is equipped with a scene-change detector, we may take advantage of the GOP target allocation of Equations (40) and (41).

C bar-Q bar irregularities
The R-Q relationship of Equation (2) is known to work fairly well for a diverse class of video sequences and is a fundamental concept used for many real-time MPEG and non-MPEG compression systems. In this paper we have used it to build a causal predictive model to estimate the average bits of a present GOP to be encoded on the basis of the actual average number of GOP bits and average quantization factor of previous GOPs. However, this predictive model can become unreliable for some video segments even when there are no special effects, scene transitions, or scene cuts. The cause of this unreliability is related to the nonstationary nature of video sources, e.g., several highly detailed objects moving slowly across a background of significant luminance or chrominance image details, the presence of unwanted noise in the scene, etc. Figure 8 displays two points derived by an encoder, with one being outside the composite C bar-Q bar curve. This point is a result of the aforementioned criteria and does not obey the C bar-Q bar relationship. Further, it will contribute to the calculation of a "bad operating point." Our VBR encoder suppresses this irregularity by making an additional contribution to the strategies defined in (40) and (41). We set the GOP bits according to the condition

Cscript l,new = curly brace max (Cscript l,new, GphimaxC barscript l,fil)
  if (xiscript l < 0)tepee(thetaL < |DeltaQ bar| < thetaU)\tepee(C barscript l,fil > C bargop),
min (Cscript l,new, GphiminC barscript l,fil)
else if (xiscript l < 0)tepee(thetaL < |DeltaQ bar| < thetaU)tepee(C barscript l,fil < C bargop),
GC bargop
else if (xiscript l < 0)tepee(thetaL < |DeltaQ bar| < thetaU)tepee(C barscript l,fil = C bargop),
(42)

with

C barscript l,fil = omega1C barscript l-2,actual + omega2C barscript l-1,actual . (43)

omega1 + omega2

Figure 8Figure 8

C barscript l,fil is an average number of GOP bits which is determined by applying a linear filter on the average number of bits of the previously encoded GOPs. The output of this filter is the new operating point, and (omega1, omega2) represent the weights of the filter. Constants phimax and phimin are used to scale the filter output. Thresholds thetaL and thetaU are chosen under the assumption that a future GOP belongs to the same video segment as the already processed GOPs. The max and min functions are implemented to filter out the encoding bit parameters that result in C bar-Q bar irregularities.

5. Simulation results and discussion

The performance of the proposed single-pass CBR and VBR video encoders is evaluated by compressing a library of test sequences. This library contains a diverse class of composite sequences, as described in Table 1. The video sequences are digitized according to the CCIR 601 resolution [11], with a color sampling ratio of 4:2:2. A 4:2:0 color sampling ratio is created by applying preprocessing filters on the chrominance samples of each source. The compression procedure is carried out in 4:2:0 mode using MPEG-2-compliant main profile at main level encoder software developed at our research laboratory. Since optimizing the communications bandwidth or the digital storage medium is very important for bro