|
1. Introduction
In typical broadcast systems, such as in direct broadcast satellite (DBS) applications, multiple video programs are encoded in parallel, and the digitally compressed bitstreams are multiplexed onto a single, constant bit rate channel. The simplest approach to this multiprogram encoding is to divide the available channel bandwidth equally among all programs. This method has the disadvantage that at any instant in time, the resulting quality of the video programs is uneven because of the different scene content of the programs and changes of scene content over time. The explanation for this result lies in the rate-distortion theory [1]. To achieve equal video quality (i.e., equal distortion) for all programs, the available channel bandwidth should be distributed unevenly among the programs, namely, in proportion to the information content (e.g., complexity) of each of the video sources. Thus, the objective of statistical multiplexing is to dynamically distribute the available channel bandwidth among the video programs in order to maximize the overall picture quality of the system. This is achieved by using a joint rate-control algorithm that guides the operation of the individual encoders based on a continuous monitoring of the scene content of each of the video sources.
Basically, two different approaches can be distinguished for joint rate control: the feedback approach and the look-ahead approach. In the feedback approach [2-4], statistical measurements of video complexity are generated by the encoders as a by-product of the compression process. The statistics from all encoders are compared and used to control the bit allocation for the subsequent video. In the look-ahead approach [3, 5], the complexity statistics are computed by preprocessing all video programs prior to encoding. These statistics are then used to more accurately predict the bit rate allocation needed for optimum compression of the video sources in the rate-distortion sense. Finding the best statistics to describe the complexity of a program is a challenging task. In the feedback approach, the statistics are limited primarily to coding-related parameters. The look-ahead approach provides more freedom of choice, but at the price of extra computational complexity and additional cost. In either case, the main feature of the statistical multiplexing (stat-mux) system is that each encoder will produce a variable rate bitstream [6].
In this paper, we propose a solution for statistical multiplexing of MPEG-2 compressed video programs [7]. In particular, an external joint rate control algorithm is proposed that dynamically allocates bit rates for the program encoders using the feedback approach. In our algorithm, the bit rate for a given program encoder is updated only at boundaries of groups of pictures (GOPs), or when a scene change is detected in the given program. This strategy allows the encoders to operate in a constant bit rate mode within the GOPs, resulting in piecewise-variable bit rate bitstreams. In contrast to previously published works in this area [2-6], the MPEG encoders in the proposed system are not required to have identical GOP structures. GOP boundaries may occur at arbitrary times in each encoded bitstream. Furthermore, for scene changes, a new GOP is started dynamically at or near the beginning of each new scene, ensuring quick reaction to video complexity changes. Because of these features, a channel buffer and a corresponding buffer control feedback loop are required in the proposed system.
In Section 2 we describe the proposed multiprogram video compression system. The joint rate control algorithm is presented in Section 3. The strategy for joint rate control in the event of scene change is described in Section 4. Determination of the minimum channel buffer size and the corresponding channel buffer control algorithm is given in Section 5. Finally, in Section 6, we present the experimental results obtained by computer simulations of the proposed system, followed by conclusions.
2. Multiprogram video compression system
Figure 1 shows an example multiprogram video compression system using the proposed feedback approach to joint rate control. The system consists of several MPEG-2 video encoders, buffers connected to each encoder, a joint rate controller, a multiplexer, and a channel buffer. The encoders produce bitstreams compatible with the MPEG-2 standard [7]. Along with the compressed bitstream, each encoder generates statistics related to the picture that has just been encoded. No preprocessing of the input sources is required, with the exception of scene change detection, which may be performed by the encoders or external to them.
Figure 1
The bit rate of each encoder is determined dynamically by the joint rate controller on the basis of the relative complexities of the programs and the occurrence of scene changes in the programs. Coding statistics generated by the encoders are input to the joint rate controller. The joint rate controller calculates the relative complexities of the programs and the bit rates based on these statistics. According to the proposed joint rate control algorithm, each encoder changes its bit rate only at GOP boundaries or near scene changes, where new GOPs are inserted. If a scene change does not occur, bit rate changes may still take effect at any GOP boundary. The reason for this is that the calculation of the program bit rates from GOP to GOP is based on the relative complexities of the programs. The joint rate controller acts to minimize the deviation of the sum of the program bit rates from the predefined channel bit rate. This scheme allows the encoders to operate at a constant bit rate (CBR) within the GOPs using the CBR video buffer verifier model according to the MPEG-2 standard [6]. Overall, it results in a piecewise variable bit rate compression.
We emphasize that the encoders are not restricted to identical GOP structures and lengths. Since GOP boundaries in the encoders are not aligned in time and bit rates of each encoder are changed only at GOP boundaries, there are time intervals during which the sum of the individual bit rates is higher or lower than the predefined channel bit rate. To compensate for this occasional deviation from the channel bandwidth, a channel buffer is included in the system. Furthermore, feedback of channel buffer occupancy, or "fullness," is incorporated into the joint rate control algorithm to prevent channel buffer overflow or underflow.
All MPEG-2 encoders used in the proposed multiprogram video compression system must be capable of providing at least the necessary coding statistics required by our joint rate control algorithm. In addition, encoders must have the ability to change bit rates at GOP boundaries. To further exploit the advantages of the proposed system in the event of scene changes, encoders must be able to change GOP structure dynamically, carry out scene change detection and reaction internally, or react to external scene change detection. As an example, IBM's commercially available single chip MPEG-2 encoders fulfill the above requirements [8].
The following sections describe in more detail the proposed joint rate control algorithm, the required minimum channel buffer size, and the corresponding channel buffer control.
3. Joint rate control
The proposed joint rate control algorithm is based on the feedback concept. Statistics are produced by the encoders along with the compressed bitstream. These statistics are continuously fed into the joint rate controller from each encoder after compression of a picture. These coding statistics, together with the information on channel buffer fullness, are used to dynamically compute the bit rate allocation for the individual encoders. The bit rate of a program is proportional to the ratio between the complexity of that program and the sum of the complexities of all programs:
where Ri is the bit rate of program i, Rc is the channel rate, and Xi is the complexity of program i.
While other measures of video complexity are possible, in our algorithm the complexity of a picture is derived from the bit production model of MPEG-2 Test Model 5 [9]:
where the model parameter cj is such that in order to produce a target number of bits bj in picture j, the target quantization scale is set to Qj. Using Equation (2), the bit rate of program i can be calculated for a GOP display time interval as
|
|
(cij/Qij)
|
|
j
|
|
Ri =
|
|
,
|
(3)
|
|
Ni/fi,
|
where cij is the bit production model parameter for picture j, Qij is the quantization parameter for picture j, Ni is the number of pictures in a GOP, and fi is the frame rate of program i. In a stat-mux system, we wish to distribute the channel bandwidth among the programs such that
|
Ri Rc.
|
(4)
|
|
i
|
To achieve the goal of equalizing the picture quality of all programs, an ideal quantization parameter can be derived by using Equations (3) and (4):
|
Qideal =
|
1
|
|
|
(fi/Ni)
|
|
cij
|
|
.
|
(5)
|
|
|
Rc
|
|
i
| |
j
|
This ideal quantization parameter can result in equal picture quality for all pictures in each program. By substituting Qideal for Qij in Equation (3), the bit rate for a GOP in program i is calculated as
|
Ri = Rc
|
(fi/Ni)
|
|
cij
|
.
|
(6)
|
|
j
|
|
|
|
(fi/Ni)
|
|
cij
|
|
|
i
| |
j
|
In the proposed stat-mux system, cij is equal to bijQij, where bij is a bit used for encoding picture j and Qij is the average quantization parameter in that picture. The complexity of a particular program is estimated as the average of the picture complexities over a sliding window of the GOP size of that program.
Equation (6) is used to determine dynamically the bit rates for each GOP of each encoder. As was explained previously, bit rate changes may occur in a program at any of the GOP boundaries, even if a scene change does not take place in that program. If bit rate changes are too abrupt in a program with no scene cut, the picture quality may vary significantly from GOP to GOP. Although the total quality of the system may improve, a noticeable change in picture quality between GOPs at the same scene is not desirable. To prevent this situation, a limit is placed on bit rate changes between GOPs of the same scene. In our experiments we allow a change of no more than 10% relative to the previous bit rate at the GOP boundary if no scene change occurs. If a scene change does occur, no limitation is placed on bit rate change.
4. Joint rate control at scene changes
In a program, scene changes may occur at any time. They may happen for any picture type and at any GOP position. If we assume that each encoder has its own fixed GOP structure and length and that bit rate changes are effective only at GOP boundaries, reaction of the system to complexity changes in the source programs may be slow because of the placement of the scene change within the GOP. To reduce the reaction time of the system to scene changes, the following strategy is set forth.
Let us assume that scene changes can be detected accurately either inside the encoders or externally, and that the location of the first picture of the new scene is known prior to the encoding of this first picture. Whenever a scene change is detected, the current GOP is ended prematurely. The first picture in the new scene is encoded as the last picture of the truncated GOP, because its statistics are used to predict the complexity of the new scene. These statistics are also used to calculate the bit rate for the first GOP of the new scene using Equation (6). This strategy allows a more insightful setting of the bit rate for the new scene, compared with depending upon default complexity values or average bit rate at the onset of the new GOP. Figure 2 shows the original GOP structures and the new ones as scene changes occur. Three cases are distinguished by picture type at scene change occurrence.
Figure 2
The prediction of the new-scene complexity is based on the complexity of the first picture of the new scene and on empirically determined ratios among the complexities of the different picture types. If the picture type1 of the first picture of the new scene (which is the last picture of the truncated GOP) is P, every macroblock is encoded as an intra-macroblock, and the complexity is considered that of an I-picture. On the basis of this I-complexity, the average complexity of the new scene, Xi, is estimated as Xi
|
Xi =
|
XI(1 + rPnP + rBnB)
|
,
|
(7)
|
|
|
Ni/fi
|
where XI is the complexity of the I-picture, nP and nB are the number of P- and B-pictures in a GOP, and rP and rB are the ratios of the P- and B-picture complexities with respect to the I-picture complexity. Typical values of rP and rB are 0.5 and 0.25, respectively. The complexity Xi is used in Equation (6) for the bit-rate calculation of the new scene. As more pictures are encoded in the first GOP of the new scene, the complexity is continuously updated by applying the actual bit count and average quantization parameters used to encode the pictures.
Previously it was stated that the encoders are running in CBR mode inside the GOPs and that each encoder uses a CBR video buffer verifier model. No buffer underflow or overflow is allowed. Often, a goal of CBR rate control algorithms is to ensure that the buffer fullness at the end of a GOP is the same as the initial buffer fullness (e.g., 80% of the buffer size) prior to encoding the first picture of a sequence. However, this goal is not often achieved because of a mismatching of the target bit budget and the actual bits used per picture. Because of the overproduction or underproduction of bits in a GOP, the buffer fullness will be under or over the initial level at the end of the GOP, respectively. A considerable buffer fullness error may accumulate, resulting in a large bit surplus or deficit carried over to the next GOP. This rate control strategy works well if little or no bit rate change takes place at GOP boundaries. However, if abrupt bit rate changes do occur, a buffer fullness error (BFE) strategy is developed to further improve the picture quality at scene changes.
If a scene change is detected, the BFE is considered to be zero for the bit allocation of the first picture of the first GOP in the new scene. In this case, to prevent underflow or overflow of the encoder buffers, the bit rate calculated for this first GOP of the new scene must be modified by the BFE as
|
Rimod = Ri + E(fi/Ni),
|
(8)
|
where Ri is the calculated bit rate for program i according to Equation (6), E is the number of BFE bits to be eliminated, fi is the frame rate for program i, and Ni is the number of pictures in a GOP. The bit rate of the program increases if the BFE is positive (the buffer fullness at the beginning of the GOP was less than the initial fullness at the start of encoding), or decreases if E is negative. This BFE strategy enhances overall picture quality at scene changes.
5. Channel buffer size and feedback control
Because the encoders can operate at different GOP lengths and structures, or may start to encode at different times, there may be time intervals when the sum of the individual bit rates is larger or smaller than the predefined channel bit rate. To remedy this, a channel buffer is required to output the multiplexed bitstream at exactly the channel bit rate. Two issues must be considered with respect to this buffer: One is the determination of the minimum size of the buffer, and the other is a control strategy to prevent channel buffer underflow and overflow.
Let us assume that the maximum total deviation of the sum of program bit rates from the channel bit rate is Rmax. In this calculation, it is valid to use the sum of the individual program bit rates because the bitstreams from each encoder are fed into each corresponding encoder buffer. These buffers output the bitstreams at exactly the calculated program bit rates, regardless of any bit rate fluctuations inside the GOPs. In the worst case, the maximum duration of this deviation can be as large as the longest GOP time among the encoders. For this case, the required minimum size of the channel buffer is determined as
Bs = 2 Rmaxtgopmax,
|
(9)
|
where
Rmax =
|
|
Ri Rc
|
|
i
|
and tgopmax is the maximum GOP time. In Equation (9) a factor of 2 is used because both underproduction and overproduction of the channel bit rate are allowed. It is assumed that at first the buffer is filled to half of its size, Bs, after which it continuously outputs the multiplexed bitstream at the rate of Rc. In this case the time required to fill the buffer to half of its size represents the initial delay. As an example, using Equation (9), if the channel buffer output bit rate is 16 Mb/s, Rmax is 8 Mb/s, and tgopmax is 0.5 s, the minimum buffer size is 8 Mb. For this example, the corresponding initial delay is 0.25 s at a frame rate of 30 frames/s. Note that if a smaller channel buffer than the one determined by Equation (9) is desired for use in the stat-mux system, the maximum total deviation from the channel bit rate must be limited accordingly.
To prevent channel buffer underflow or overflow, the buffer model shown in Figure 3 is used. The channel buffer model includes predefined guard bands at the top and the bottom of the buffer. These guard bands are used to regulate the distribution of the bit rates. To prevent underflow and overflow, the buffer fullness Bf at any time must fulfill
Figure 3
The parameter a determines the size of the guard band; for example, a may have a value of 0.25.
The three possible buffer fullness cases and their corresponding bit rate modifications are listed in the following subsections.
Case 1
Buffer fullness falls between the guard bands:
aBs Bf (1-a)Bs.
In this case the calculated bit rates for the programs are generally not modified, except in extreme circumstances.
If Ri > Rc and Ri Rc > (Bs Bf)/tgopmax,
then Ri = Ri{Rc + [(1-a)Bs Bf]/tgopmax} Ri
(no overflow);
if Ri < Rc and Rc Ri > Bf/tgopmax,
then Ri = Ri[Rc (Bf aBs)/tgopmax] Ri
(no underflow);
otherwise: no action.
Case 2
Buffer fullness falls in the upper guard band:
Bf > (1-a)Bs.
In this case we allow only bit rate changes which will decrease the buffer fullness or maintain the current Bf.
If Ri > Rc,
then Ri = Ri Rc Ri
(scaling down);
if Ri < Rc and Rc Ri > Bf/tgopmax,
then Ri = Ri[Rc (Bf aBs)/tgopmax] Ri
(no underflow);
otherwise: no action.
Case 3
Buffer fullness falls in the lower guard band:
Bf < aBs.
In this case we allow only bit rate changes which will increase the buffer fullness or maintain the current Bf.
If Ri < Rc,
then Ri = Ri Rc Ri
|