2.1. Introduction
High-Frequency Reconstruction/Regeneration (HFR), or Bandwidth Extension (BWE), techniques have been researched in the speech coding community for decades [2, 3]. The underlying hypothesis is that it should be possible to reconstruct the higher frequencies of a signal given only the corresponding low-frequency content. In the speech coding community the goal of this research was to accurately reconstruct the high-band of a speech signal given only the low-pass filtered low-band signal and no other a priori information about the high-band of the original signal. Typically the high-band was recreated by upsampling the low-band signal without subsequent low-pass filtering (aliasing), or by means of broad-band frequency translation (single side-band modulation) of the low-band signal [2, 3]. The spectral envelope of the recreated high-band was either simply whitened and tilted with a suitable roll-off at higher frequencies or, in more elaborate versions [4], estimated by means of statistical models. As of today, this research has not led to any wide adoption of such HFR-based speech enhancement in the market.
The original SBR technique (the development of which started in early 1997) differs from previously known HFR techniques in the following respects [5, 6].
(i) The primary means for extending the bandwidth is transposition, which ensures that the correct harmonic structure is maintained for single- and multipitched signals alike.
(ii) Spectral envelope information is always sent from the encoder to the decoder, making sure that the spectral envelope of the reconstructed high-band is correct [7].
(iii) Additional means such as inverse filtering, noise, and sinusoidal addition, guided by transmitted information, compensate for shortcomings of any bandwidth extension method originating from occasional fundamental dissimilarities between low-band and high-band [8, 9].
These features successfully enabled the use of a bandwidth extension technique not only for speech signals but for arbitrary signals. The fundamental topology of a system employing SBR is shown in Figure 1. An audio input signal is first processed by an SBR encoder, resulting in a low-pass filtered audio signal and SBR data. The audio signal is subsequently encoded using a core encoder. Finally, the SBR data and the core-coder output are combined into an output bit stream. The decoder performs the reverse process.
Since the HFR method enables a reduction of the core coder bandwidth and the HFR technique requires significantly lower bit rate to code the high-frequency range than a waveform coder would, a coding gain can be achieved by reducing the bit rate allocated to the waveform core coder while maintaining full audio bandwidth. Naturally, this gives the possibility to decrease the total data rate by lowering the crossover frequency between core coder and the HFR part. However, since the audio quality of the HFR part cannot scale towards transparency, this crossover frequency is always a delicate tradeoff between core coder and HFR related artifacts.
This paper only covers SBR in the MPEG context, where it is standardized for use together with AAC, forming the High Efficiency AAC (HE-AAC) Profile. However, the algorithm and bit stream are essentially core codec agnostic, and SBR has successfully been applied to other codecs such as MPEG Layer-2 [10] and MPEG Layer-3 (the latter combination is known as mp3PRO, see [11]). It is also included in HDC (High Definition Codec), the proprietary codec used by iBiquity, and is standardized within Digital Radio Mondiale (DRM) for use together with the CELP and HVXC speech codecs [12]. Furthermore, it is worth noting that the transposition method included in the MPEG-4 standard is a carefully selected tradeoff between implementation cost and quality, relaxing the strict requirements on harmonic continuation that are met by more advanced transposition methods.
2.2. System Overview
2.2.1. SBR Encoding Process
Overview
Following the general process of MPEG to standardize transmission formats and decoder operation (and hence allowing future encoder-side improvements) the SBR amendment contains an informative (as opposed to normative) encoder description. Hence this section gives a generic overview of the various elements of an encoder; the exact design of these elements is left up to the implementer. However, for detailed information on a realization of the encoder capable of high perceptual performance, the 3GPP specification of the SBR encoder is a good source, see [13].
The basic layout of an SBR encoder is depicted in the block diagram of Figure 2. Central to the operation of both encoder and decoder are dedicated, complex-valued filter banks of the Quadrature Mirror Filter (QMF) type. The encoder has an analysis bank per input channel, and the decoder has an analysis and synthesis pair per channel. Most of the SBR processing, such as encoder-side parameter extraction and decoder-side bandwidth extension and spectral envelope adjustment, is performed in the QMF domain.
QMF Analysis
The original time-domain input signal is first filtered in a 64-channel analysis QMF bank. The filter bank splits the time-domain signal into complex-valued subband signals and is thus oversampled by a factor of two compared to a regular real-valued QMF bank [14]. For every 64 time-domain input samples, the filter bank produces 64 subband samples. At 44.1 kHz sample rate this corresponds to a nominal bandwidth of 344 Hz, and a time resolution of 1.4 ms. All the subsequent modules in the encoder operate on the complex-valued subband samples.
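The resolution figures quoted above follow directly from the channel count and the sample rate. The short Python sketch below reproduces the arithmetic; the function name `qmf_resolution` is purely illustrative and not taken from any specification.

```python
def qmf_resolution(sample_rate_hz, num_channels=64):
    """Return (nominal subband bandwidth in Hz, time resolution in ms).

    Assumes a uniform bank that splits 0..Nyquist into num_channels bands
    and emits one sample per subband for every num_channels input samples.
    """
    nyquist_hz = sample_rate_hz / 2.0
    bandwidth_hz = nyquist_hz / num_channels
    time_step_ms = 1000.0 * num_channels / sample_rate_hz
    return bandwidth_hz, time_step_ms


bw, dt = qmf_resolution(44100)
print(f"{bw:.1f} Hz, {dt:.2f} ms")  # prints: 344.5 Hz, 1.45 ms
```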
Transient Detection
A transient detector (part of the "Control parameter extraction" in Figure 2) operates on the complex-valued subband signals in order to assist the envelope estimator in the time/frequency (T/F) grid selection. Generally, longer time segments of higher frequency resolution are produced by the envelope estimator during quasistationary passages, while shorter time segments of lower frequency resolution are used for dynamic passages. The transient detection is, for example, accomplished by calculating running short-term energies and detecting significant changes.
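As an illustration of the running-energy approach mentioned above, the following Python sketch flags time slots whose energy rises sharply above a smoothed average of the preceding slots. It is a minimal sketch only: the threshold ratio, the smoothing constant, and the function name are assumptions chosen for the example, and an actual encoder (for instance the one described in [13]) may use a different detector.

```python
def detect_transients(slots, threshold_ratio=4.0, smoothing=0.8):
    """Return indices of time slots that look like transient onsets.

    slots: list of time slots, each a list of complex QMF subband samples.
    threshold_ratio and smoothing are illustrative tuning values (assumed).
    """
    onsets = []
    running = None  # smoothed short-term energy of previous slots
    for index, samples in enumerate(slots):
        energy = sum(abs(x) ** 2 for x in samples)
        if running is not None and energy > threshold_ratio * max(running, 1e-12):
            onsets.append(index)
        if running is None:
            running = energy
        else:
            running = smoothing * running + (1.0 - smoothing) * energy
    return onsets


quiet = [0.5 + 0j] * 4
loud = [2.0 + 0j] * 4
print(detect_transients([quiet] * 5 + [loud] + [quiet] * 3))  # prints: [5]
```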
T/F Grid Selection and Envelope Estimation
The estimated envelope data are obtained by averaging subband sample energies within segments in time and frequency. The time borders of these segments are determined mainly by the output from the transient detector, and are subsequently signaled to the decoder. When the transient detector signals a transient to the envelope estimator, the estimator defines segments of shorter duration, starting with a minimal segment whose leading border is placed at the onset of the transient. Following this short segment at the transient, somewhat longer segments are used to correctly track a potential decay of the transient, and finally long segments are used for the stationary part of the signal.
The main objective is to avoid pre- and postechoes that otherwise would be induced by the envelope adjustment process in the decoder for transient input signals.
The envelope estimator also decides on the frequency resolution to use within each time segment. The variable frequency resolution is achieved by employing two different schemes for grouping of QMF samples in frequency: high resolution and low resolution, where the number of estimates differs by a factor of two. In order to reduce instantaneous peaks in the SBR bit rate, the envelope estimator typically trades one high-resolution envelope for two low resolution ones. The grouping in frequency can be either linearly spaced or (approximately) log spaced where the number of bands to use per octave is variable. An example of a T/F grid selection is given in Figure 3 where the grid is superimposed on a spectrogram of the input signal. As is clear from the figure, the time resolution is higher around the transient events, albeit with lower frequency resolution, and vice versa for the more stationary parts of the signal.
Although the segment borders can be chosen with a high degree of freedom, the temporal resolution, as well as the frequency resolution, is constrained by the analysis QMF bank resolution. The filter bank is designed to provide a resolution in both time and frequency that is considered adequate for the adjustment of the envelope for all signal types. Hence the filter bank resolution is not adaptive, as is usually the case for filter banks in perceptual waveform coders, and the estimates are achieved by, within a filter bank of fixed size, adaptively grouping and averaging of subband sample energies as outlined above.
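The grouping and averaging of subband sample energies on a fixed-size filter bank can be sketched as follows. The Python code is a simplified illustration under assumed conventions: borders are given as ascending index lists, each segment is half-open, and the names `estimate_envelope`, `time_borders`, and `band_borders` are hypothetical.

```python
def estimate_envelope(subbands, time_borders, band_borders):
    """Average |X|^2 over each cell of a time/frequency grid.

    subbands[t][k]: complex QMF sample at time slot t and subband k.
    time_borders, band_borders: ascending indices; segment i covers
    [borders[i], borders[i + 1]). Returns envelope[i][j] as mean energy.
    """
    envelope = []
    for t0, t1 in zip(time_borders, time_borders[1:]):
        row = []
        for k0, k1 in zip(band_borders, band_borders[1:]):
            total = sum(abs(subbands[t][k]) ** 2
                        for t in range(t0, t1)
                        for k in range(k0, k1))
            row.append(total / ((t1 - t0) * (k1 - k0)))
        envelope.append(row)
    return envelope


# Four time slots, four subbands: energy 1 in the first half, 4 in the second.
grid = [[1 + 0j] * 4, [1 + 0j] * 4, [2j] * 4, [2j] * 4]
print(estimate_envelope(grid, [0, 2, 4], [0, 2, 4]))  # two bands per segment
print(estimate_envelope(grid, [0, 2, 4], [0, 4]))     # one band per segment
```

Passing a coarser `band_borders` list while keeping the same samples corresponds to the low-resolution grouping; only the grouping changes, never the filter bank itself.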
Noise Floor Estimation
An important task of the SBR encoder is to assess to what extent the tonal-to-noise ratio of the reconstructed high-band will be correct. For this purpose, the encoder estimates the amount of additional noise that needs to be added at the decoder side after regeneration of the high-band. This is done in an analysis-by-synthesis fashion, as illustrated in Figure 4. The top panel of the figure shows a spectrum of the input signal. In this particular example the input signal is a synthetically generated test signal whose tonal (harmonic) structure ends abruptly above 5.5 kHz. The remaining spectrum of the signal consists of noise. The lower panel of the figure shows a spectrum of the high-band as produced by the HF generation method used in the decoder, without additional correction of tonal-to-noise properties. In this case, the tonal structure of the low-band has propagated to the high-band, and hence within the region from 5.5 kHz to 15 kHz there is a mismatch in signal characteristics between the original input and the reconstructed high-band signal. The transmission of additional noise information allows correction of such mismatches. It should be noted that the spectrum in the lower panel illustrates the low-band signal in combination with the high-band signal after HF generation without any subsequent envelope adjustment.
Missing Harmonics Detection
Similarly to the above situation, the encoder also needs to assess whether strong tonal components in the original high-band signal will be missing after the high-frequency reconstruction. In Figure 5 an example is given where three strong tonal components are not reconstructed by the high-frequency regeneration based on the low-band signal. Again an analysis-by-synthesis approach can be beneficial. For this example a glockenspiel signal is used. In the upper panel of Figure 5 the spectrum for the input signal is given, where three strong tonal components in the high-band are indicated by circles. In the lower panel of Figure 5 the spectrum of the HF-generated signal is given similarly to the example in Figure 4. Clearly the three strong tonal components will not be properly regenerated by the HF generator, and therefore need to be replaced by sinusoids generated separately in the decoder. Information on the (frequency) location of these strong tonal components is transmitted to the decoder, and the missing components are inserted in the high-band signal.
Quantization and Encoding
The SBR envelope data, tonal component data, and noise-floor data are quantized and differentially coded in either the time or frequency direction in order to minimize the bit rate. All data is entropy coded using Huffman tables. Details about SBR data coding are given in the next section.
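To illustrate differential coding in either the time or the frequency direction, the Python sketch below computes both sets of deltas for one envelope of quantized values and keeps the direction with the smaller sum of absolute deltas. That selection criterion is an assumed stand-in for the actual cost, which would be the number of Huffman bits; the function names are hypothetical.

```python
def delta_code(current, previous=None):
    """Return (direction, deltas) for one envelope of quantized integers.

    Frequency direction: first value is sent as is, the rest as differences
    to the lower neighbour. Time direction: differences to the previous
    envelope. The sum of |delta| is only a rough proxy for the bit cost.
    """
    freq = [current[0]] + [b - a for a, b in zip(current, current[1:])]
    if previous is None or len(previous) != len(current):
        return "freq", freq
    time = [c - p for c, p in zip(current, previous)]
    if sum(abs(d) for d in time) < sum(abs(d) for d in freq):
        return "time", time
    return "freq", freq


def undo_delta(direction, deltas, previous=None):
    """Invert delta_code; previous is required for the time direction."""
    if direction == "time":
        return [p + d for p, d in zip(previous, deltas)]
    values, acc = [], 0
    for d in deltas:
        acc += d
        values.append(acc)
    return values


print(delta_code([10, 12, 11]))                # ('freq', [10, 2, -1])
print(delta_code([10, 12, 11], [10, 12, 12]))  # ('time', [0, 0, -1])
```

Coding purely in the frequency direction removes the dependency on the previous frame, which relates to the tradeoff between bit-error robustness and coding efficiency.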
2.2.2. SBR Bit Stream
Overview
To ensure consistent coding of transients regardless of localization within codec frames, the SBR frames have variable time boundaries, that is, the exact duration in time covered by one SBR frame may vary from frame to frame. The bit stream is designed for maximum flexibility such that it scales well from the lowest bit rate applications up to medium and high bit rate use cases, and is easy to adapt for different core codec frame lengths. Furthermore, it is possible to trade bit-error robustness against increased coding efficiency by selecting the degree of interframe dependencies, and the signaling scheme offers error detection capabilities in addition to a Cyclic Redundancy Check (CRC).
2.2.3. SBR Decoding Process
Overview
The block scheme of the SBR decoder is given in Figure 6. The bit stream is input to the core decoder providing the low-band signal, and the SBR relevant bit stream to the SBR decoder. The SBR decoder performs a 32 subband analysis of the low-band signal, which is subsequently used, along with control data from the bit stream, by the HF generator to create the high-band signal. The envelope of the recreated high-band signal is subsequently adjusted and additional signal components are added to the high-band. The combined low-band and high-band are finally synthesized by a 64 subband QMF synthesis filter bank in order to obtain the time-domain output signal. The analysis and synthesis filter banks are constructed such that an upsampling of the low-band signal by a factor of two is inherently obtained in the processing. A detailed description of the decoder can be found in the MPEG-4 Audio standard [15]. In the following, we merely outline the various decoding steps.
An example is given in Figure 7. The original input signal spectrum is shown in the top-left panel. The spectrum of a low-band output from the AAC core decoder is given in the top right panel of Figure 7. It is clear that the signal is low-pass filtered at approximately 6 kHz which is the bandwidth covered by the core coder for the setting corresponding to the bit rate used in this example. It should be noted that in the figure the signal has been upsampled to the sampling frequency of the original signal (and also that of the final output signal) in order to allow for spectrum comparison.
The HF Generator transposes parts of the low-band frequency range to the high-band frequency range covered by SBR as indicated in the bit stream. In the bottom left panel of Figure 7 the spectrum of the transposed intermediate signal in combination with the low-band signal is displayed. This is how the output would look if no envelope adjustment of the recreated high-band were performed.
The envelope adjuster adjusts the spectral envelope of the recreated high-band signal according to the envelope data and time/frequency grid that was transmitted in the bit stream. Additionally, noise and sinusoid components are added as signaled in the bit stream. The output from the SBR decoder after envelope adjustment is depicted in the bottom right panel of Figure 7. In the following the decoding steps are examined in more detail.
QMF Analysis
The time-domain audio signal, supplied by the core decoder and usually sampled at half the frequency of the original signal, is first filtered in the analysis QMF bank. The filter bank splits the time-domain signal into 32 subband signals. For every 32 time-domain samples, the filter bank produces 32 complex-valued subband samples and is thus oversampled by a factor of two compared to a regular real-valued QMF bank. The oversampling enables significant reduction of impairments emerging from modifications of subband signals. The oversampling is accomplished through extension of a cosine modulated filter bank with an imaginary sine modulated part, forming a complex-exponential modulated filter bank. In a conventional cosine modulated filter bank the analysis and synthesis filters $h_k(n)$ and $f_k(n)$ are cosine modulated versions of a symmetric low-pass prototype filter $p_0(n)$ as
$$h_k(n) = p_0(n)\cos\left\{\frac{\pi}{2M}(2k+1)\left(n - \frac{N}{2} - \frac{M}{2}\right)\right\},$$
$$f_k(n) = p_0(n)\cos\left\{\frac{\pi}{2M}(2k+1)\left(n - \frac{N}{2} + \frac{M}{2}\right)\right\},$$
where $k = 0, 1, \ldots, M-1$, $M$ is the number of channels and $n = 0, 1, \ldots, N$, where $N$ is the prototype filter order. Figure 8 depicts a simplified block scheme for the implementation of a cosine modulated filter bank. For complex modulation both filters are obtained from
$$h_k(n) = f_k(n) = p_0(n)\exp\left\{i\,\frac{\pi}{2M}(2k+1)\left(n - \frac{N}{2}\right)\right\}.$$
The terms containing $M/2$ (terms needed for aliasing cancellation) present in the traditional cosine modulated filter bank are omitted because of the complex-valued representation [14]. In Figure 9 the corresponding block scheme for a complex-valued filter bank implementation is outlined. The complex-exponential modulation creates complex-valued subband signals that can be interpreted as the analytic versions of the signals obtained from the real part of the filter bank. This feature provides a subband representation suitable for various modifications, and also an inherent measure of the instantaneous energy for the subband signals [14]. The prototype filter used for HE-AAC is of order 640 ($N = 640$) and gives a reconstruction error of −65 dB.
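The complex-exponential modulation of a prototype filter can be sketched in a few lines of Python. The sketch assumes the generic form $h_k(n) = p_0(n)\exp\{i\frac{\pi}{2M}(2k+1)(n - N/2)\}$ and uses a small Hann-shaped toy prototype; both the toy prototype and the function name are illustrative assumptions, and the sketch does not reproduce the actual HE-AAC prototype coefficients or the exact phase convention of the standard [15].

```python
import cmath
import math


def complex_modulated_filters(prototype, num_channels):
    """Modulate a real prototype p0(n), n = 0..N, into M complex filters.

    Returns filters[k][n] = p0(n) * exp(i * pi / (2M) * (2k + 1) * (n - N/2)).
    """
    order = len(prototype) - 1  # N
    m = num_channels            # M
    return [[p * cmath.exp(1j * math.pi / (2 * m) * (2 * k + 1) * (n - order / 2))
             for n, p in enumerate(prototype)]
            for k in range(m)]


# Toy prototype (assumed for illustration only): Hann window of order 16.
N = 16
p0 = [0.5 - 0.5 * math.cos(2 * math.pi * n / N) for n in range(N + 1)]
bank = complex_modulated_filters(p0, num_channels=4)

# The real part of each filter is the cosine modulated version without the
# M/2 phase term; the magnitude at every tap equals the prototype.
print(abs(bank[2][5]) - p0[5])
```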
HF Generation
The complex-valued subband signals obtained from the filter bank are processed in the high-frequency generation unit to obtain a set of high-band subband signals. The generation is performed by selecting low-band subband signals, according to specific rules, which are mirrored or copied to the high-band subband channels. The patches of QMF subband to be copied, their source range and target range, are derived from information on the borders of the SBR range, as indicated by the bit stream. The algorithm generating the patch structure has the following objectives.
(i) The patches should cover the frequency range up to 16 kHz with as few patches as possible, without using the QMF subband lowest in frequency (i.e., the subband including DC) in any patch.
(ii) If several patches constitute the high-band, a patch covering a lower frequency range should have a wider or equal bandwidth compared to a patch covering a higher frequency range. The motivation is that human hearing is more sensitive at lower frequencies, and therefore patches with wide bandwidth are preferred for lower frequencies in order to move any potential discontinuity between the first and the second patch as high up in frequency as possible.
(iii) The source frequency range for the patches should be as high up in frequency as possible.
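A simple greedy construction that satisfies the three objectives above is sketched below in Python. This is an illustrative simplification, not the normative patch construction of the standard [15], which is subject to further constraints not modeled here. Subband indices, the function name, and the choice of `lowest_usable=1` (excluding the DC subband) are assumptions of the sketch.

```python
def build_patches(first_hf_band, last_hf_band, lowest_usable=1):
    """Return patches as (source_start, target_start, width) tuples.

    The high-band [first_hf_band, last_hf_band) is filled from the low-band
    [lowest_usable, first_hf_band). Each patch is as wide as possible, so
    widths never increase with frequency, and each source range ends at the
    top of the low-band, so sources are as high in frequency as possible.
    """
    max_width = first_hf_band - lowest_usable
    if max_width <= 0:
        raise ValueError("no usable low-band subbands to copy from")
    patches = []
    target = first_hf_band
    while target < last_hf_band:
        width = min(max_width, last_hf_band - target)
        source = first_hf_band - width
        patches.append((source, target, width))
        target += width
    return patches


print(build_patches(12, 40))  # [(1, 12, 11), (1, 23, 11), (6, 34, 6)]
```

In the example the last patch is narrower than the others and takes its source from subbands 6 to 11, the upper end of the low-band.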
Creating the high-band in this way has several advantages and is the reason why SBR can be referred to as a semi-, or quasi-, parametric method. Although the high-band is synthetically generated and shaped by the SBR bit-stream data, the characteristics of the high-band are inherited from the low-band, and, which is the most important aspect, so is the temporal structure of the high-band. This makes the corrections of the high-band, in order to resemble the original, much more likely to succeed in the subsequent processing steps.
With the above in mind, the characteristics of the low-band and the high-band still vary for different audio signals. For example, the tonality is usually more pronounced in the low-band than in the high-band. Therefore, inverse filtering is applied to the generated high-band subband signals. The filtering is accomplished by in-band filtering of the complex-valued signals using adaptive low-order complex-valued FIR filters. The filter coefficients are determined through an analysis of the low-band in combination with control signals extracted from the SBR data stream. A second-order linear predictor is used to estimate the spectral whitening filter using the covariance method. The amount of inverse filtering is controlled by a chirp factor given by the bit stream. Hence, the HF-generated signal $X_{\mathrm{High}}(k, l)$ for QMF subband $k$ and time slot $l$ in the high-band can be defined according to
$$X_{\mathrm{High}}(k, l) = X_{\mathrm{Low}}(p, l) + B\,\alpha_0(p)\,X_{\mathrm{Low}}(p, l-1) + B^2\,\alpha_1(p)\,X_{\mathrm{Low}}(p, l-2),$$
where $\alpha_0(p)$ and $\alpha_1(p)$ are given by the prediction error filter estimated for the low-band subband $p$, and where $B$ is the chirp factor (between 0 and 1) controlled by the bit stream.
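The effect of the chirp factor can be illustrated with a short Python sketch that applies a second-order prediction error filter, with its coefficients scaled by $B$ and $B^2$, to the samples of one source subband. The sketch assumes the form $X_{\mathrm{High}}(k,l) = X_{\mathrm{Low}}(p,l) + B\alpha_0 X_{\mathrm{Low}}(p,l-1) + B^2\alpha_1 X_{\mathrm{Low}}(p,l-2)$, takes the predictor coefficients as given rather than estimating them with the covariance method, and treats samples before the start of the list as zero; these simplifications and the function name are assumptions.

```python
def hf_generate(x_low, alpha0, alpha1, chirp):
    """Copy one low-band subband to the high-band with inverse filtering.

    x_low: complex samples of source subband p in time order.
    alpha0, alpha1: prediction error filter coefficients for subband p.
    chirp: chirp factor B in [0, 1]; 0 disables the inverse filtering.
    Samples before the first slot are assumed to be zero (simplification).
    """
    out = []
    for l, x in enumerate(x_low):
        x1 = x_low[l - 1] if l >= 1 else 0j
        x2 = x_low[l - 2] if l >= 2 else 0j
        out.append(x + chirp * alpha0 * x1 + chirp ** 2 * alpha1 * x2)
    return out


samples = [1 + 0j, 1 + 0j, 1 + 0j]
print(hf_generate(samples, -0.5, 0.25, chirp=0.0))  # plain copy of the input
print(hf_generate(samples, -0.5, 0.25, chirp=1.0))  # full inverse filtering
```

With $B = 0$ the patch is a plain copy of the low-band subband, and with $B = 1$ the full prediction error filter is applied; intermediate values give a partially whitened result.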
In Figure 10 an example of patching and inverse filtering is given. In the top panel of the figure, a (power) spectrum of the low-band signal is displayed, and the maximum source region for the patching is indicated. For all subbands within this region, prediction error filters are estimated as outlined above. The source range in the low-band is patched, in this example, to regions A and B. The frequency plot of the patched signals in these regions is given in the lower panel of Figure 10. Here three inverse filtering regions are also indicated by 1, 2, and 3. The applied inverse filtering level is the same within each of these regions and its parameters are contained in the bit stream.
Since the subband signals are patched from the low-frequency region to regions A and B in Figure 10, so are the prediction error filter coefficients for the low-frequency region. Thus, suitable prediction error filter coefficients are available for all subbands within regions A and B. Hence, for all the QMF subbands within region 1 in Figure 10, inverse filtering is done within each subband, given the prediction error filter estimated on the corresponding low-band subband samples and the chirp factor signaled in the bit stream for the specific region.
It should be noted that all the processing done in the HF Generation module is done frame-based on a time segment indicated by the outer borders of the SBR frame.
The generated high-band signals are subsequently fed to the envelope adjusting unit.
Envelope Adjustment
The most important, and also the largest, part of the SBR data stream is the spectrotemporal envelope representation of the high-band. This envelope representation is used to adjust the energy of the generated high-band subband signals. The envelope adjusting unit first performs an energy estimate of the high-band signals. An accurate estimate is possible because of the complex-valued subband signal representation. The resulting energy samples are subsequently averaged within segments according to control signals from the data stream. This averaging produces the estimated envelope samples. Based on the estimated envelope and the envelope representation extracted from the data stream, the energy of the high-band subband samples in the respective segments is adjusted.
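In its simplest form, the adjustment for one time/frequency segment amounts to scaling the samples by the square root of the ratio between the transmitted and the estimated energy. The Python sketch below shows this basic step only; it leaves out the noise floor, the added sinusoids, and the gain limiter, and the function name and the small constant `eps` are assumptions.

```python
import math


def adjust_segment(hf_samples, reference_energy, eps=1e-12):
    """Scale the samples of one T/F segment to the transmitted mean energy.

    hf_samples: complex high-band samples belonging to the segment.
    reference_energy: mean energy for the segment taken from the bit stream.
    eps guards against division by zero (assumed value).
    Returns (gain, adjusted_samples).
    """
    estimated = sum(abs(x) ** 2 for x in hf_samples) / len(hf_samples)
    gain = math.sqrt(reference_energy / (estimated + eps))
    return gain, [gain * x for x in hf_samples]


gain, adjusted = adjust_segment([2 + 0j, 2j], reference_energy=1.0)
print(round(gain, 6))  # prints: 0.5
```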
As previously outlined, sinusoids present in the original high-band signal that have no corresponding sinusoid in the generated high-band are synthesized in the decoder, and random white noise is added to the high-band signal to compensate for diverging tonal-to-noise ratios of the high-band and low-band.
A noise floor level $Q$ is used to derive the level of noise to be added to the recreated high-band signal. It is defined as the energy ratio between the HF-generated (by means of patching in the HF generator) signal energy and the noise signal energy of the final output signal.
Given the calculated gain values, a limiting procedure is applied. This is designed to avoid the need for excessively high gain values due to large differences between the transposed signal energy and the reference energy given by the original input signal. The limiter restricts high narrowband gain values while ensuring that the correct wide-band energy is maintained.
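One way to realize such a limiter is sketched below in Python: gains within a group of subbands are capped relative to the wide-band gain of the group, and all gains are then rescaled so that the total energy of the group still matches the reference. This is a simplified illustration; the value of `limit_factor`, the function name, and the absence of any cap on the compensation are assumptions, and the normative procedure is specified in [15].

```python
import math


def limit_gains(gains, current_energies, reference_energies,
                limit_factor=2.0, eps=1e-12):
    """Cap narrowband gains while preserving the wide-band energy.

    gains[i]: unlimited gain for subband i within one limiter group.
    current_energies[i], reference_energies[i]: energy before adjustment
    and target energy. limit_factor is an assumed tuning value.
    """
    total_ref = sum(reference_energies)
    total_cur = sum(current_energies)
    g_max = limit_factor * math.sqrt(total_ref / (total_cur + eps))
    limited = [min(g, g_max) for g in gains]
    achieved = sum(e * g * g for e, g in zip(current_energies, limited))
    boost = math.sqrt(total_ref / (achieved + eps))  # restores total energy
    return [g * boost for g in limited]


cur = [1.0, 1.0, 0.0001]
ref = [1.0, 1.0, 1.0]
raw = [math.sqrt(r / c) for r, c in zip(ref, cur)]  # third gain is 100
out = limit_gains(raw, cur, ref)
print(max(out) < max(raw))
```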
QMF Synthesis
The generated high-band signals and the delay-compensated (resulting from the HF generation process) low-band signals are finally supplied to the 64-channel synthesis filter bank, which usually operates at the sampling frequency of the original signal. Like the analysis filter bank, the synthesis filter bank is complex-valued; however, the imaginary part of the output signal is discarded. Thus, the filter bank generates a real-valued full bandwidth output signal having twice the sampling frequency of the core coder signal.
2.2.4. Other Aspects
Low Power SBR
The SBR tool as outlined in the previous sections is defined in two versions: a High Quality Version and a Low-Power version. The main difference is that the Low-Power version utilizes real-valued QMF filter banks, while the High Quality version utilizes complex-valued filter banks. In order to make the SBR Tool work in the real-valued domain, additional tools are included that strive to minimize the introduction of aliasing in the SBR processing. The main feature is an aliasing detection algorithm that identifies adjacent QMF subbands with strong tonal components in the overlapping range. The detection is done by studying the reflection coefficient of a first-order in-band linear predictor. By observing the signs of the reflection coefficients for adjacent subbands, the subbands prone to introduce aliasing can be identified. For the identified subbands restrictions are put on how much the gain adjustment is allowed to vary between the two subbands.
The following text and figures provide an example of low-power SBR. Envelope adjustment in a real-valued QMF filter bank is displayed in Figure 11.
The upper panel of Figure 11 illustrates a high-resolution frequency analysis of the input signal superimposed on a stylized visualization of the QMF frequency response. In the middle panel the gain values to be applied to every subband are displayed. As can be seen, these vary from subband to subband. In the bottom panel the high-resolution frequency analysis is again displayed, this time after application of the gain values. As can be observed from the figure, aliasing is introduced.
Figure 12 demonstrates aliasing detection and aliasing reduction. This figure is very similar to Figure 11 except for a new panel with "channel signs." These signs are derived from the reflection coefficients of a first-order predictor: the sign assigned to subband $k$ (indexed from zero) follows from the coefficient $\alpha_0$ of the prediction error filter $A(z)$ obtained by in-band linear prediction of the subband samples. Given the definition of the signs and certain relations between the signs of adjacent subbands, the reduction of aliasing can be established by modifying the gain values in the gain vector. For adjacent subbands where the lower subband (in frequency) has a positive sign, and the higher subband (in frequency) has a negative sign, the gain values must be calculated dependently. For all other situations the gain values for the adjacent subbands can be calculated independently. As can be seen from the bottom panel of Figure 12, the use of this algorithm avoids aliasing.
Downsampled SBR
It has been made clear in the previous sections that the combination of AAC and SBR is a dual-rate system. This means that the sampling rate of the output signal from the HE-AAC decoder will always be twice that of the sampling rate of the underlying AAC decoder. Hence, for a normal operation point the AAC will operate at 24 kHz, while the SBR Tool operates at 48 kHz. The dual-rate operation is evident from Figure 13.
For some situations it may be desirable to have an output sampling frequency that is the same as that of the core coder (AAC). One reason is complexity, since for some scenarios a lower sampling rate output may be desired due to the cost of D/A converters supporting high sampling rates. This is achieved by operating the SBR Tool in a downsampled mode. When the HE-AAC decoder is operated in the downsampled mode, the synthesis filter bank at the final stage of the SBR decoder is modified. The 64-band QMF synthesis filter bank is replaced by a 32-band QMF synthesis filter bank processing only the lower half of the spectrum of the combined AAC and SBR signal. The result is equivalent to operating the decoder in the normal dual-rate mode, followed by low-pass filtering and downsampling by a factor of two. Apart from the modification of the synthesis filter bank, the remainder of the HE-AAC decoder is left unchanged. This is displayed in Figure 14.
Apart from the application where a low sampling rate output is desired due to complexity constraints, the downsampled SBR mode also serves another purpose. When scaling towards higher bit rates it may be desirable to run the AAC core coder at a higher sampling frequency, for example, 44.1 kHz. Hence, an SBR encoder can operate on a 44.1 kHz input signal, and upsample the signal in the encoder to 88.2 kHz, thus enabling the dual-rate mode. The SBR decoder subsequently operates on the 44.1/88.2 kHz dual-rate signal, but does so in a downsampled mode, ensuring that the output signal has the 44.1 kHz sampling rate equal to that of the original input signal. More information on sampling rate modes in High Efficiency AAC is given in [16].
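The relation between core sampling rate, synthesis filter bank size, and output sampling rate in the two modes can be summarized in a small Python sketch. The function and key names are hypothetical; the band counts are those given in the text.

```python
def sbr_decoder_rates(core_rate_hz, downsampled=False):
    """Return filter bank sizes and output rate for an HE-AAC decoder.

    The analysis bank always has 32 bands at the core rate; the synthesis
    bank has 64 bands in dual-rate mode and 32 bands in downsampled mode.
    """
    analysis_bands = 32
    synthesis_bands = 32 if downsampled else 64
    output_rate_hz = core_rate_hz * synthesis_bands // analysis_bands
    return {
        "analysis_bands": analysis_bands,
        "synthesis_bands": synthesis_bands,
        "output_rate_hz": output_rate_hz,
    }


print(sbr_decoder_rates(24000)["output_rate_hz"])        # prints: 48000
print(sbr_decoder_rates(44100, True)["output_rate_hz"])  # prints: 44100
```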
Scalable Systems
For certain applications scalable systems may be of interest. Scalable in this context refers to a data stream where different information is put in different layers of the stream and, depending on reception conditions, a decoder can choose how many of the layers it decodes. As an example, a base layer or lower layers in the stream may have a higher amount of error protection, while higher layers may not, hence requiring better reception conditions in order to allow decoding. Examples of these kinds of scalable systems using SBR include Digital Radio Mondiale (DRM). The use of SBR as an additional bandwidth extension tool for an underlying core coder lends itself very well to scalable systems. One common way of achieving scalability with waveform codecs is to vary the audio bandwidth depending on the available layers. If only the core layer is available, the output signal has a reduced bandwidth, and when additional layers are available the bandwidth of the output signal is increased. The downside of this approach is that it can be highly annoying to listen to a signal with varying audio bandwidth. Since SBR is a bandwidth extension tool, it is well suited to solving this problem. When SBR is combined with a scalable core codec such as AAC Scalable, the SBR information is put in the core layer. The SBR bit stream comprises data that enables reconstruction of the maximum amount of SBR bandwidth used for any of the layers in the stream. Hence, even if only the lowest layer is available, the output signal will have full audio bandwidth. If higher layers are available, parts of the SBR frequency range will be replaced by waveform coded segments obtained from decoding the enhancement layer with the underlying core coder. This process is illustrated in Figure 15.
In the top left panel of Figure 15 a spectrum of the two AAC layers (the core layer and the enhancement layer) is given. In the top right of the figure, the frequency range that can be recreated using the SBR data stored in the core layer is displayed, and a spectrum of the SBR signal available for this range is shown. It is clear that the SBR information covers the widest frequency range required for any combination of layers. In the bottom left panel, the bandwidth relation of the core coder and the SBR tool is illustrated for the scenario where only the core layer is available. In the bottom right panel, the bandwidth relation of the core coder and the SBR tool is illustrated for the scenario where the core layer and the first enhancement layer are available. As can be seen from the bottom right panel, the lowest part of the SBR range has been replaced by the core coder.
Apart from supporting bandwidth scalable core coders, the SBR tool can also work in conjunction with mono to stereo scalability. This means that the SBR data can be divided into two groups, one group representing the general SBR data and level information of the one or two channels, and the other group representing the stereo information. If the core coder employs mono/stereo scalability, that is, the base layer contains the mono signal, and the enhancement layer contains the stereo information, the SBR decoder can apply only the mono-relevant SBR data to a mono signal and omit the stereo-specific parts if only a mono core coder signal is available. If the enhancement layer is decoded, and the core coder outputs a stereo signal, the SBR tool operates on the stereo signal as normal using the complete SBR data in the stream.
MPEG-2 Bit Streams
Although the focus of the present paper is on the MPEG-4 version of SBR, it should be noted that the exact same tool is standardized in MPEG-2 as well. Hence, the MPEG-2 AAC and SBR combination is also defined. This is important for certain applications relying on MPEG-2 technology while still wanting to achieve state-of-the-art compression by using SBR in combination with AAC.
2.3. Listening Tests
At the end of the two-year standardization process a rigorous verification test was performed. Two types of tests were done, a MUSHRA (MUlti Stimulus test with Hidden Reference and Anchor) test [17] and a CMOS (Comparative Mean Opinion Score) test [18]. The MUSHRA test compared the performance of MPEG-4 HE-AAC with that of MPEG-4 AAC when coding mono and stereo signals at bit rates in the range of 24 kbps per channel, while the CMOS test was used to show the difference between High Quality SBR and Low Power SBR. Two test sets were selected, one for mono testing, and one for stereo testing. The items were selected from 50 potential candidates by a selection panel identifying ten items considered critical for all of the systems under test.
The codecs under test for the verification tests are outlined in Table 1. The listening tests were performed at France Télécom, T-Systems Nova, Panasonic, NEC, and Coding Technologies.
Table 1: Codecs under test.
The listening test results are presented in Figures 16 and 17. From the listening tests it is clear that the SBR enhanced AAC technology (High Efficiency AAC Profile) performs better than the MPEG-4 AAC Profile when the latter is operating at a 25% higher bit rate (i.e., 30 versus 24 kbps for mono, and 60 versus 48 kbps for stereo).
The SBR technology in combination with AAC as standardized in MPEG under the name High Efficiency AAC (also known as aacPlus) offers a substantial improvement in compression efficiency compared to previous state-of-the-art codecs. It is the first audio codec to offer full bandwidth audio at good quality at low bit-rate. This makes it the ideal codec (and enabler) for low bit-rate applications such as Digital Radio Mondiale and streaming to mobile phones.