1 Introduction

The advancement in the manufacturing of high-performance electronic devices and their display technologies, such as smart mobile phones, tablets, and televisions, has resulted in increased demand for ultrahigh-resolution video content delivery with low processing delay. Furthermore, most commercially available displays nowadays support spatial resolutions up to 8 K (7680 × 4320) [1]. Such high-resolution display capabilities can consume most of the available bandwidth in conventional networks. Hence, to deliver high-quality video effectively, it is necessary to use efficient video coding tools that support high-resolution video applications. The most recent video coding standard is the high-efficiency video coding (H.265|HEVC) standard [2].

The main target in developing the H.265|HEVC standard was to double the coding efficiency of the MPEG-4 Part 10, advanced video coding (H.264|AVC) standard, that is, to keep the same video quality at half the encoding bit rate [3]. Furthermore, H.265|HEVC extends the applications already supported by the H.264|AVC coding standard with more efficient video coding tools for high-resolution video and parallel processing applications [4].

However, the increase in coding efficiency comes at the expense of more inter-prediction and motion compensation activity in the encoding process, since the higher efficiency is mainly achieved by removing more temporal and spatial redundancy [5]. On the negative side, this high coding efficiency also comes at the cost of high computational complexity.

In other words, a highly compressed bitstream retains less redundant video information, so each encoded bit carries more information. Consequently, compressed video content becomes more sensitive to channel bit errors than bitstreams encoded with previous standards.

Therefore, transmitting a highly compressed video bitstream over an unreliable channel degrades the perceived visual quality at the video decoder; if errors hit sensitive encoded data such as slice headers, the decoding process can fail for the whole video sequence [6]. Figure 1 illustrates the effect of an unreliable wireless channel on the received video quality.

Fig. 1 Video transmission issue scenario

The H.265|HEVC codec is a hybrid video coding system, which means its compression techniques depend on removing temporal redundancy first and then spatial redundancy.

Thus, errors injected into the transmitted video bitstream propagate spatially and temporally through the perceived video quality. Consequently, even a single-bit error in the encoded bitstream can lead to severe visual quality degradation.

In the development of H.265|HEVC, the primary target of both standardisation organisations (i.e. ITU-T and ISO/IEC) was to increase the bit rate saving to more than 50% compared to the previous H.264|AVC video coding standard [7, 8]. This high bit rate saving was achieved by adding new coding features that support more efficient coding and make the codec friendlier to parallel processing applications [7]. However, developing error resilience tools is outside the scope of the video coding standard itself [6].

There are two main error control categories for reducing the effects of transmission errors on perceived visual quality. The first employs traditional data error control methods that use lossless channel coding tools for data recovery, such as Automatic Repeat reQuest (ARQ) schemes. However, such error recovery tools are less efficient for compressed video delivery because the compressed bitstream consists of variable-length codes, which makes recovering corrupted video content a very challenging task.

The second category comprises video error control techniques implemented within the video coding system. In this case, to minimise the effects of transmission errors efficiently at the decoder side, video error control can be divided into three approaches: forward error recovery, error concealment, and interactive error recovery.

In the forward error recovery approach, the video encoder takes full responsibility for inserting redundant error resilience codes, making the coded bitstream more robust against errors.

The second error control approach is error concealment, in which the decoder is responsible for concealing errors spatially and temporally. Spatial error concealment exploits correctly received information by interpolating from the surrounding macroblocks. When the whole macroblock is lost, the simplest and most common concealment technique replaces the lost macroblock with the spatially co-located macroblock of the previously decoded frame. Temporal error concealment techniques, in contrast, extrapolate the correctly received motion vectors of the current and previous frames [9].
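As an illustration, the following is a minimal sketch of the frame-copy concealment described above, assuming 16 × 16 macroblocks, numpy luma arrays, and a list of lost macroblock indices in raster order (all hypothetical inputs, not part of any particular codec implementation):

```python
import numpy as np

MB = 16  # assumed macroblock size in luma samples


def conceal_frame_copy(curr: np.ndarray, prev: np.ndarray, lost_mbs):
    """Replace each lost macroblock with the spatially co-located
    macroblock of the previously decoded frame (temporal frame copy)."""
    out = curr.copy()
    mbs_per_row = curr.shape[1] // MB
    for mb_idx in lost_mbs:
        r = (mb_idx // mbs_per_row) * MB
        c = (mb_idx % mbs_per_row) * MB
        out[r:r + MB, c:c + MB] = prev[r:r + MB, c:c + MB]
    return out
```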

The third video error control approach uses joint encoder–decoder error resilience techniques, in which the encoder and decoder work interactively to reduce the effects of channel errors on the perceived visual quality. In this category, a backward feedback channel from the decoder to the encoder keeps the encoder updated.

All lossy error resilience techniques take advantage of the human visual system's tolerance of distorted visual quality. The design of video error resilience tools depends on the video coding tools employed.

The developed error resilience tools should keep a balance between error robustness and video encoding efficiency, whilst aiming to maintain the video quality of service under error-prone conditions.

Developing error resilience tools at the encoder is one of the most efficient solutions to mitigate the effects of transmission errors in real-time video delivery.

This paper presents an adaptive error resilience algorithm to increase the perceived visual quality of the H.265|HEVC coding system under error-prone conditions.

The proposed work is based on a forward error recovery method (without a feedback channel) and an interactive error recovery method (with feedback updates from the decoder to the encoder).

The intra-encoding mode is applied at slice level instead of picture level to support low delay delivery applications and to keep a balance between coding efficiency and error resilience performance. The design stage of the proposed work considers the bit rate overhead, the processing delay caused by computational complexity, and the video start-up delay. The evaluation is further extended to investigate the effect of LTE network traffic load on the perceived visual quality at the decoder.

This paper is organised as follows. An overview of relevant literature that supports the proposed error resilience algorithm is presented in Sect. 2. Section 3 describes the proposed adaptive slice algorithm. The evaluation process and encoding configuration for testing the proposed algorithm are reported in Sect. 4. Section 5 discusses the obtained objective and frame by frame quality assessment results, together with computational complexity and processing time analysis. Finally, Sect. 6 presents the paper's conclusions and future work recommendations.

2 Background

2.1 Error resilience using region of interest extraction

This section presents a literature review of state-of-the-art video error resilience algorithms for low delay video delivery applications.

One of the main encoding requirements of low delay or conversational video applications is to keep the number of reference frames to a minimum. This low delay requirement can be met by avoiding future reference frames in the motion estimation process.

The first work to select a group of MBs to be encoded with intra-refresh in H.264|AVC video coding was by Hoaming Chen et al. [10]. Their error resilience coding scheme is based on adaptive intra-refresh: it selects the important regions depending on the reported network conditions, i.e. different packet loss rates (PLRs), and on the video motion information. To keep coding efficiency at an acceptable level, the refresh cycle size ranges over (4, 8, 16). The selected area depends on the PLR value reported over the feedback channel. On receiving a low PLR value, e.g. \(10^{-4}\), a smaller cycle size, e.g. 4, is selected. In contrast, under high error-prone conditions, such as a PLR of \(10^{-1}\), the refresh cycle size is increased to obtain a balance between error resilience and H.264|AVC coding efficiency.
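The following sketch illustrates this PLR-driven cycle-size selection; the PLR thresholds are illustrative assumptions, not the exact switching points used in [10]:

```python
def refresh_cycle_size(plr: float) -> int:
    """Map a reported packet loss rate to an intra-refresh cycle size
    from the set {4, 8, 16} described in [10]: low loss selects the
    small cycle, high loss the large one (thresholds are assumed)."""
    if plr <= 1e-4:
        return 4
    if plr <= 1e-2:
        return 8
    return 16
```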

Two recent studies have improved the error resilience of the H.265|HEVC coding standard by treating moving areas as regions of interest. In 2015, the authors considered the moving area of the encoded video as a region of interest (ROI) to be protected against transmission errors. Their error resilience algorithm generates an activity map at frame level; the moving regions are segmented into blocks and, based on the maximum depth level of the CTUs, the moving region parts are calculated and protected at the encoder [5]. In 2016, they improved their region of interest algorithm by utilising rate control in the region of interest extraction process [6]. The objective quality results presented by the authors show a significant improvement of 0.88 dB compared with the H.265|HEVC reference selection method at a PLR of 5% [6]. The moving region extraction methodologies of these studies are based on a work proposed by Hai-Miao Hu et al. [5]. That work (an ROI-based rate control scheme) aimed to improve the coding efficiency of the H.264/AVC coding standard by allocating a larger encoding bit budget to moving regions, improving the perceived quality of the ROI area at the cost of non-ROI quality.

However, using the moving region as a region of interest does not provide precise selectivity of the selected moving areas, because of the flexibility of block partitioning in the H.265|HEVC encoding process.

Under these circumstances, an adaptive slice encoding (ASE) algorithm is developed and proposed based on an understanding of the previous related work.

One of the most challenging tasks in using a region of interest extraction process in the H.265|HEVC coding system is keeping the computational complexity to a minimum. Another challenge is implementing accurate ROI extraction under low delay transmission constraints, such as in conversational video applications.

The rate control tool in video coding is responsible for calculating the best trade-off between image quality and the required bit rate. Most video coding systems are lossy, so it is important to keep the bit rate saving at the highest level while maintaining the highest visual quality for the targeted video applications.

There is a great deal of research interest in proposing efficient algorithms for moving region extraction [11]. The extraction process should take computational analysis into consideration; such analysis is based on the human visual system and includes, for example, frame texture detail, skin colour, and object motion speed. These requirements make rate control adaptation more challenging in real-time processing applications.

The moving region extraction process starts after the motion estimation stage, where quantisation parameters are adjusted accordingly. There is a dilemma between region segmentation and motion estimation priorities: quantisation parameters (QP) need to be adjusted before the rate-distortion optimisation (RDO) process starts, but motion information only becomes available after RDO, whereas QP is set before it. In the moving region extraction process, however, the QP needs to be adjusted based on the motion information of the ROI. To solve this dilemma, the researchers in [12] proposed the motion differencing method, in which the motion vector of each macroblock in the current frame is compared with that of the macroblock in the previous frame. However, this method does not give acceptable results for fast-moving objects. In [13], the researchers proposed distinguishing the importance of each macroblock (MB) using the mean absolute difference (MAD) between the current and previous macroblocks [14]. The researchers in [11] and [15] use the same region segmentation methodology, which gives very accurate results when the background of the moving objects is stable. However, a slight movement in the object background area (e.g. temporal changes in lighting conditions or camera zooming) can adversely affect the extraction of the region of interest areas [16].

2.2 H.265|HEVC encoding with feedback channel

In general, low delay video applications such as video conferencing require special attention to reference picture management. In unreliable networks, an encoder with feedback capabilities usually receives acknowledgement signals from the decoder over a channel with some delay (in milliseconds). In video codec systems, there are two types of acknowledgement signals, transmitted at slice level from the decoder to the encoder [17]. The first type is the positive acknowledgement (ACK) signal, which indicates a correctly received slice. The reference frame is chosen depending on the ACK updates received from the decoder. If the encoder does not receive an ACK signal within a predefined interval, it assumes that an error occurred and applies intra-coding.

The second feedback signal type is the negative acknowledgement (NACK), sent by the decoder to notify the encoder that an error or loss has occurred in the currently decoded bitstream. In addition to acknowledging correctly received slices, the addresses of the corrupted parts are signalled back to the encoder. The reference picture buffer is updated on each received acknowledgement signal. At the same time, the decoder can apply an error concealment technique to reduce temporal–spatial error propagation when an error occurs.
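A schematic sketch of the encoder-side bookkeeping implied by the two feedback modes is given below; the slice identifiers, timeout value, and data structures are hypothetical simplifications, not part of the H.265|HEVC specification:

```python
import time


class FeedbackState:
    """Encoder-side bookkeeping for ACK/NACK feedback: choose
    references known to be intact at the decoder, and force intra
    refresh for slices that were NACKed or never acknowledged."""

    def __init__(self, ack_timeout_s: float = 0.1):  # assumed interval
        self.acked = set()    # slice ids confirmed by the decoder
        self.nacked = set()   # slice ids reported lost or corrupted
        self.timeout = ack_timeout_s

    def on_ack(self, slice_id: int) -> None:
        self.acked.add(slice_id)
        self.nacked.discard(slice_id)

    def on_nack(self, slice_id: int) -> None:
        self.nacked.add(slice_id)

    def must_intra_code(self, slice_id: int, sent_at: float) -> bool:
        """True if the slice was NACKed, or if no ACK arrived within
        the predefined interval (the encoder then assumes an error)."""
        if slice_id in self.nacked:
            return True
        timed_out = time.monotonic() - sent_at > self.timeout
        return slice_id not in self.acked and timed_out
```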

2.3 H.265|HEVC video coding standard

The H.265|HEVC video coding standard is the result of continuous hard work aimed at enhancing the coding efficiency of the previous video codec standards [18].

Several changes have been introduced in the H.265|HEVC coding tools, such as the video coding control signals and the bitstream structure [4]. A new design concept of flexible coding unit sizes has been added to the standard: the basic unit in H.265|HEVC is called a coding tree unit (CTU), within which coding units range from 8 × 8 up to 64 × 64 luma samples. This helps to increase the coding efficiency for video resolutions higher than high definition (HD) [7]. All of these points make the H.265|HEVC coding standard an attractive candidate for meeting the high visual quality requirements of wireless and multimedia applications.

2.3.1 Slice structure in H.265|HEVC system

As discussed earlier, each frame can be represented by one or more slices, and each slice contains a group of dependent and independent slice segments. As can be seen in Fig. 2, the first slice segment is encoded independently of previously encoded slices. The remaining slice segments are encoded depending on the first slice segment, with the same encoding mode for their coding units.

Fig. 2 Slice segment process sequence

In lossy packet networks, the packetized slices should not exceed the maximum transmission unit (MTU) size. The slicing structure helps decrease the transmitted packet length: decreasing the number of encoded CTUs in each slice shortens the transmitted packets and, as a result, reduces error propagation at the decoder [19], as sketched below.
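For illustration, the following sketch packs CTUs into slices so that no packetized slice exceeds the MTU; the MTU value, header overhead, and per-CTU byte sizes are assumed inputs, not values from the paper:

```python
MTU = 1500            # assumed maximum transmission unit in bytes
HEADER_BYTES = 40     # assumed slice/packet header overhead


def pack_ctus_into_slices(ctu_sizes):
    """Greedily group consecutive CTUs into slices whose payload plus
    header stays within the MTU, so that one lost packet affects fewer
    CTUs. A single CTU larger than the MTU still gets its own slice."""
    slices, current, used = [], [], HEADER_BYTES
    for idx, size in enumerate(ctu_sizes):
        if current and used + size > MTU:
            slices.append(current)
            current, used = [], HEADER_BYTES
        current.append(idx)
        used += size
    if current:
        slices.append(current)
    return slices
```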

One main use of the slice segmentation concept in the H.265|HEVC coding system is that it can reduce the effect of errors on corrupted video samples: only the affected slice is discarded or recovered from other correctly received slice segments. Moreover, the slice segment concept helps to expedite the resynchronisation process using the correctly received independent slice segment header.

However, increasing the number of encoded slices adversely affects the coding efficiency. For example, intra- and motion prediction are not allowed across slice boundaries, which reduces spatial frame prediction. Furthermore, the slice structure itself increases the header overhead.

There are three slice types used in the H.265|HEVC coding system: the intra mode slice, the uni-prediction slice (P-slice), and the bi-prediction slice (B-slice). Each slice header contains a complete reference picture list update. The reference picture set concept of the H.265|HEVC coding standard is explained in detail later.

A slice header contains information shared between slice segments; this shared information differs from slice to slice at frame level. A reference selection list is updated in each slice header and signalled explicitly. Parameters shared by the slices of a picture are carried in the picture parameter set (PPS); each PPS refers to a sequence parameter set (SPS), and the video parameter set (VPS) contains information shared across SPSs. The interconnection of the three parameter sets is illustrated in Fig. 3.

Fig. 3 Activation of parameter sets in H.265|HEVC slice headers

3 Proposed work

The aim of the proposed algorithm is to reduce error propagation at slice level. An adaptive encoding algorithm is introduced at the video encoder to encode and protect the most active slices. A coded video sequence is represented as a series of access units (AUs) in sequential order sharing the same sequence parameters. Each access unit is represented by a group of NAL units, and a prefix code, the access unit delimiter, identifies the start of a new AU in the NAL unit bitstream. A primary encoded AU contains a group of VCL NAL units comprising one or multiple slices, which carry the video sample data. A redundant coded picture is encoded as additional VCL NAL units; these additional VCL samples are used for error recovery when the primary video samples are lost or corrupted. In this case, the decoder parses the contents of the correctly received data to recover the corrupted video samples. In error-free conditions, the decoder discards the additional redundant video data. At the end of each coded video sequence, a non-VCL NAL unit is encoded to indicate the end of the NAL unit bitstream.
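The following simplified sketch illustrates this primary/redundant handling at the decoder; the dictionary-based NAL unit representation and its fields are hypothetical stand-ins for the actual H.265|HEVC syntax:

```python
def decode_access_unit(nal_units):
    """Walk the VCL NAL units of one access unit: keep correctly
    received primary slices, substitute a redundant coded slice for a
    corrupted primary one, and discard unused redundant data."""
    redundant = {n["slice_addr"]: n for n in nal_units
                 if n["kind"] == "redundant"}
    decoded = {}
    for nal in (n for n in nal_units if n["kind"] == "primary"):
        if not nal["corrupted"]:
            decoded[nal["slice_addr"]] = nal
        elif nal["slice_addr"] in redundant:
            decoded[nal["slice_addr"]] = redundant[nal["slice_addr"]]
        # else: slice lost with no redundant copy -> conceal later
    return decoded
```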

The concept of the proposed algorithm is illustrated in Fig. 4. For instance, suppose the ship in the video sequence is the most important area that requires protection against errors. The algorithm extracts the active slices, i.e. those covering the ship, and these active slices are then encoded in intra mode.

Fig. 4 Encoding the slice of the most important areas in the ASE algorithm

As discussed earlier in the slice structure section, the independent slice segment header address identifies the exact location of a slice segment at picture level, using the coding tree block (CTB) count in a fixed scan order. The objective of the ASE algorithm is to reduce temporal error propagation by encoding the most sensitive and important coding units (CUs) in intra mode at slice level. The following subsections describe how the activation map indicating the active slices to be intra-coded is generated, as well as the rate control mechanism for the subdivided frame regions.

3.1 Important area protection

The proposed algorithm is described as follows. At the first stage, a slice-level differencing method is applied to the current and previous frames. The active area consists of content changes with new texture information; a moving slice is a highly important slice that must be protected to reduce error propagation. Each slice is mapped to its greyscale representation, and the current and previous encoded slices are projected into row and column projection curves (\({\text{CV}}_{n}^{x}\)) and (\({\text{CV}}_{n}^{y}\)), respectively. Each slice in the current frame is thus projected into one-dimensional vectors.

The greyscale projections along the slice rows, Ln (x), and slice columns, Ln (y), are calculated as in Eqs. (1) and (2), respectively:

$$L_{n} \left( x \right) = \sum\nolimits_{y} {L\left( {x,y} \right)} ,$$
(1)
$$L_{n} \left( y \right)\; = \;\sum\nolimits_{x} {L\left( {x,y} \right)} ,$$
(2)

where L(x, y) are the greyscale values of frame (p). The average values of Ln (x) and Ln (y) are calculated over the number of grey sample rows (r) and columns (c), respectively, as defined in Eqs. (3) and (4):

$$L_{\text{avn}} \left( x \right) = \frac{{\mathop \sum \nolimits_{x} L_{n} \left( x \right)}}{r},$$
(3)
$$L_{\text{avn}} \left( y \right) = \frac{{\mathop \sum \nolimits_{y} L_{n} \left( y \right)}}{c}.$$
(4)

Then, the averaged 1-D projected curves are normalised using Eqs. (5) and (6):

$${\text{CV}}_{n}^{x} = L_{n} \left( x \right) - L_{\text{avn}} \left( x \right),$$
(5)
$${\text{CV}}_{n}^{y} = L_{n} \left( y \right) - L_{\text{avn}} \left( y \right),$$
(6)

where (\({\text{CV}}_{n}^{x}\)) and (\({\text{CV}}_{n}^{y}\)) represent the 1-D projected curves for the slice number (n).
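A compact numpy sketch of Eqs. (1)–(6), assuming the slice is supplied as a 2-D array of luma samples with rows indexed by x and columns by y:

```python
import numpy as np


def projection_curves(luma: np.ndarray):
    """Normalised 1-D projection curves CV_x and CV_y of a greyscale
    slice, following Eqs. (1)-(6)."""
    Lx = luma.sum(axis=1)   # row projection, Eq. (1)
    Ly = luma.sum(axis=0)   # column projection, Eq. (2)
    Lx_avg = Lx.mean()      # Eq. (3): average over the r rows
    Ly_avg = Ly.mean()      # Eq. (4): average over the c columns
    CVx = Lx - Lx_avg       # Eq. (5)
    CVy = Ly - Ly_avg       # Eq. (6)
    return CVx, CVy
```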

For better coding efficiency performance, the extraction process uses an intra-refresh map calculated with the greyscale projection method (GPM) [20]. As a result, moving objects are extracted with accurate selectivity even when the object background is unstable. The GPM is used in image stabilisation applications because of its implementation simplicity and its accuracy in selecting moving objects [20].

A cross-correlation between the current and previous slices is then calculated [20]. The difference vector \({\text{DV}}_{n} \left( p \right)\) for each slice is then calculated as in Eq. (7):

$${\text{DV}}_{n} \left( p \right) = \frac{1}{256}\sum\nolimits_{{\left( {i,j} \right) \in p}}^{\text{TS}} {\left| {L_{n} \left( {i,j} \right) - L_{n - 1} \left( {i + {\text{CV}}_{n}^{x} , j + {\text{CV}}_{n}^{y} } \right) } \right|} ,$$
(7)

where \(L_{n} \left( {i,\;j} \right)\) and \(L_{n - 1} \left( {i,\;j} \right)\) are the luma sample values of the current slice (n) and previous slice (n − 1), and (TS) is the total number of encoded slices in the current frame.

The searching area to find the maximum cross-correlation of the normalised projection curves between the slices of the current and the previous frames can be calculated as in Eq. (8):

$${\text{Searching area}} = \frac{{{\text{number of}}\; ({\text{CU}}_{\text{level1}}^{p} )\; + \;{\text{number of}}\;({\text{CU}}_{\text{level1}}^{p - 1} )}}{2},$$
(8)

where \(({\text{CU}}_{\text{level1}}^{p} )\) and \(({\text{CU}}_{\text{level1}}^{p - 1} )\) are the encoded units of (32 × 32) blocks at coding level 1 for the current and previous frames, respectively. Equation (8) balances the encoding processing delay (the additional computational cost resulting from the motion estimation calculations) against the error resilience performance.

The difference vector \({\text{DV}}_{n} \left( p \right)\) calculations are shown in Fig. 5.

Fig. 5 Difference vector calculation
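The following sketch computes \({\text{DV}}_{n} \left( p \right)\) of Eq. (7) for one slice, assuming the displacement (shift_x, shift_y) has already been obtained as the cross-correlation peak of the projection curves; the circular shift and integer cast are simplifications that ignore border handling:

```python
import numpy as np


def difference_vector(curr: np.ndarray, prev: np.ndarray,
                      shift_x: int, shift_y: int) -> float:
    """DV_n(p) of Eq. (7): absolute luma difference between the
    current slice and the previous slice displaced by the projection
    cross-correlation peak, scaled by 1/256 as in Eq. (7)."""
    shifted = np.roll(prev, shift=(shift_x, shift_y), axis=(0, 1))
    diff = np.abs(curr.astype(np.int32) - shifted.astype(np.int32))
    return float(diff.sum()) / 256.0
```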

3.2 Subdivision of non-active area

In general, people pay more attention to moving objects in the foreground due to the nature of the human visual system; they also focus more on the middle area of the picture. To get the best trade-off between coding efficiency and perceived quality, the non-active area is further subdivided into a high textured area, which contains highly detailed stationary content, and a passive (or flat) area, which covers the fixed background with the lowest texture detail. A simple subdivision example is shown in Fig. 6. The decoded video quality is reduced gradually from the highly important areas to the textured and passive areas, respectively.

Fig. 6 Division areas in the ASE algorithm

An adaptive modified grey projection \(({\text{AMGP}}_{\text{w}} )\) based on the grey projection method in [20] is implemented to achieve a more accurate selection of motion-active slices.

To establish the relation between the encoded slice location within the frame and the adaptive modified grey projection (AMGP) value, different weighting factors ranging from 0.1 to 0.9 were objectively evaluated to obtain the best rate control optimisation with the ASE algorithm. The selected weighting factors were obtained through trial-and-error optimisation experiments on the modified HM16.06+ASE encoder. Due to limited space, one selected test result is presented in Fig. 7, which shows the results obtained for the Akiyo video test sequence encoded at a frame rate of 25 fps.

Fig. 7 Weighting factors for the three ASE areas

The idea of allocating different weighting factors at this stage is to reconcile the proposed algorithm with the video coding efficiency. In the proposed algorithm, the frame content is divided into three main areas, and the weighting factor allocation depends on the encoded areas' sizes, which are proportional to the frame dimensions.

As the human visual system focuses more on the central frame area, the probability of active areas falling there is highest, so the central area is encoded with the highest weighting factor (0.9). The area between the centre and the corners, where moving areas are less probable, is assigned 0.6, and the corner areas are allocated the lowest weighting factor (0.1). These predefined weighting factors were selected through trial-and-error experiments to be optimised with the intra-coding refresh.

A weighting factor is assigned for each frame region according to Eq. (9):

$${\text{AMGP}}_{\text{w}} = \left\{ {\begin{array}{*{20}l} {0.9,\quad {\text{if the block location lies within the central frame area}}} \\ {0.1,\quad {\text{if the block location lies within the corner frame areas}}} \\ {0.6,\quad {\text{otherwise}}} \\ \end{array} } \right..$$
(9)

The active area extraction process mainly depends on the calculated difference vector \({\text{DV}}_{n} \left( p \right)\) and the weighting factor of the current frame. Finally, the most active slice areas in the current frame (p) are encoded.

The encoding unit in the active map is encoded with intra mode as defined in Eq. (10):

$${\text{AMGP}}_{n} \left( p \right) = \left\{ {\begin{array}{*{20}l} {1,\quad {\text{if}}\;{\text{AMGP}}_{\text{w}} \times {\text{DV}}_{n} \left( p \right)/{\text{average}}\left[ {{\text{DV}}_{n} \left( p \right)} \right] > {\text{AMGP}}_{\text{th}} } \\ {0,\quad {\text{otherwise}}} \\ \end{array} } \right.,$$
(10)

where \({\text{DV}}_{n} \left( p \right)\) is the calculated difference vector and \({\text{AMGP}}_{\text{w}}\) is the calculated weighting factor for the currently encoded frame.
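A minimal sketch of the decision logic in Eqs. (9) and (10); the region membership tests and the threshold \({\text{AMGP}}_{\text{th}}\) are assumed to be supplied by the caller:

```python
def amgp_weight(in_centre: bool, in_corner: bool) -> float:
    """Region weighting of Eq. (9): centre 0.9, corners 0.1, else 0.6."""
    if in_centre:
        return 0.9
    if in_corner:
        return 0.1
    return 0.6


def is_active_slice(dv: float, dv_avg: float,
                    weight: float, amgp_th: float) -> bool:
    """Active-map decision of Eq. (10): the slice is intra-coded when
    the weighted, normalised difference vector exceeds AMGP_th."""
    return weight * dv / dv_avg > amgp_th
```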

3.3 Non-active area selection

As discussed earlier, the non-active area of each frame is further divided into two regions according to content features. This further subdivision helps preserve perceived visual quality at the transition from high-quality regions (active areas) to lower-quality ones (textured and passive areas). Furthermore, it allows larger weighting factors for active areas and a lower bit budget for non-active regions, so that more bits are assigned to important frame areas. The extraction of high textured areas from the non-active area is done using mean absolute difference (MAD) calculations between the current and previous frames. In this work, a value of 0.35 is selected as the threshold for generating the high textured map, as defined in Eq. (11):

$$H_{n} \left( p \right) = \left\{ {\begin{array}{*{20}l} {1,\quad {\text{if}}\;H_{n - 1} \left( p \right) < {\text{threshold}}} \\ {0,\quad {\text{otherwise}}} \\ \end{array} } \right.,$$
(11)

where \(H_{n - 1} \left( p \right)\) refers to the macroblock in the previous frame. The remaining map areas are then extracted and encoded as the least complex areas (the passive areas).
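A sketch of this textured/passive classification, assuming block-wise MAD on luma normalised to [0, 1] and following the inequality direction as printed in Eq. (11):

```python
import numpy as np


def textured_map(curr_blocks: np.ndarray, prev_blocks: np.ndarray,
                 threshold: float = 0.35) -> np.ndarray:
    """Split the non-active area: per-block MAD between co-located
    current/previous blocks (shape: n_blocks x h x w, luma in [0, 1]),
    compared against the 0.35 threshold of Eq. (11). Blocks not marked
    here form the passive map."""
    mad = np.abs(curr_blocks - prev_blocks).mean(axis=(1, 2))
    return mad < threshold  # H_n(p) = 1 where Eq. (11) holds
```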

At each slice header, a full reference picture list is extracted from the decoded picture buffer (DPB). To identify whether the current slice is suitable for use in the prediction process, the reference picture set (RPS) data in the slice header is compared with the reference pictures in the DPB.

For error detection and recovery purposes, a feedback channel from the decoder notifies the encoder about errors that have occurred. The H.265|HEVC codec uses a flag named used_by_curr_pic_X_flag; the decoder parses the slice header and checks the flag activation [21]. At the decoder side, the slice header RPS is checked against the available reference pictures in the DPB. If the RPS in the slice header lists an update that is not available in the DPB and the flag is not set, the slice is considered not used in the current prediction process. However, if the flag is activated, the current slice is intended to be used in prediction, which indicates loss or corruption of the reference pictures at the decoder.
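The following schematic sketch illustrates this RPS-versus-DPB check; the picture order count (POC) lists and the flag representation are simplifications of the actual H.265|HEVC syntax:

```python
def missing_references(slice_rps, dpb_pocs, used_by_curr_flags):
    """Compare the slice-header reference picture set (RPS) against the
    decoded picture buffer (DPB). A picture flagged as used by the
    current slice but absent from the DPB signals a lost or corrupted
    reference whose identifier can be fed back to the encoder."""
    dpb = set(dpb_pocs)
    return [poc for poc, used in zip(slice_rps, used_by_curr_flags)
            if used and poc not in dpb]
```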

A flowchart illustrating the proposed ASE algorithm without feedback channel implementation is depicted in Fig. 8a.

Fig. 8 Adaptive slice encoding flowchart. a ASE algorithm without feedback. b ASE algorithm with feedback

3.3.1 Error resilience algorithm with feedback update

The proposed error resilience algorithm is further extended to work with feedback updates of acknowledgement (ACK) signals to enhance the perceived visual quality at the decoder. The H.265|HEVC coding system requires a feedback channel to locate a damaged slice. For more accurate error localisation, the segment header information of the corrupted slices is sent back to the encoder via the feedback channel. This header information contains the most recent update of the reference picture list, including the address of the most recent erroneous slice. A flowchart of the proposed algorithm with feedback channel implementation is shown in Fig. 8b.

3.4 Rate control adaptation of the proposed algorithm

The challenging tasks in implementing region of interest extraction are keeping the computational complexity of the extraction process to a minimum and implementing an accurate ROI process under low delay transmission.

In this algorithm, the encoder optimises the trade-off between the number of intra-coded slices per frame and the coding efficiency target. A frame is divided into passive (flat) areas and high texture (complex) areas. In the HM16 reference software, lambda-domain rate control is used to balance the encoding bit rate (bit allocation budget) and the video quality (target quantisation parameters) [22]. The encoding bit rate is adjusted based on the target bit rate and the picture buffer size for each group of pictures (GOP); the encoder then allocates the required encoding bit budget at the LCU level. Depending on the calculated target bit rate, the number of bits per pixel (bpp) determines the rate-distortion parameter via Eq. (12):

$$\lambda \; = \;\alpha .{\text{bpp}}^{\beta } ,$$
(12)

where bpp is the number of bits per pixel, and \(\alpha\) and \(\beta\) are predefined parameter values. Once \(\lambda\) is calculated, the QP value can be obtained using Eq. (13) and the quantisation step size using Eq. (14).

$${\text{QP}}\; = \;4.2\;\ln \lambda \; + \;13.7,$$
(13)
$${\text{Q}}_{\text{step}} \; = \;2^{{({\text{QP}} - 4)/6}} .$$
(14)
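A small sketch of the λ-domain chain of Eqs. (12)–(14); the clipping range is the standard HEVC QP range, and the default α and β are the commonly cited initial values of the HM implementation (an assumption here, not values stated in this paper):

```python
import math


def rate_control_qp(bpp: float, alpha: float = 3.2003,
                    beta: float = -1.367):
    """R-lambda model of Eqs. (12)-(14): derive lambda from the bits
    per pixel, QP from lambda, and the quantisation step from QP."""
    lam = alpha * (bpp ** beta)              # Eq. (12)
    qp = round(4.2 * math.log(lam) + 13.7)   # Eq. (13)
    qp = max(0, min(51, qp))                 # clip to the valid HEVC QP range
    q_step = 2 ** ((qp - 4) / 6)             # Eq. (14)
    return qp, q_step


# Example: a budget of 0.05 bits per pixel
# qp, q_step = rate_control_qp(0.05)
```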

3.4.1 Network testbed setup

NS3 is chosen and installed on a Linux operating system. The long-term evolution (LTE) module embedded within the NS3 environment simulates the core LTE network. To integrate the NS3 simulator and its LTE module with a real physical Ethernet interface, the hardware-in-the-loop (HIL) platform in [24] is employed. Each node in the NS3/LTE network is connected using the carrier-sense multiple access (CSMA) scheme, and the LTE serving gateway (SGW)/packet data network gateway (PGW) uses a point-to-point internet connection. The NS3/LTE network simulator is configured with the network parameters reported in Table 1.

Table 1 LTE network parameters

3.4.2 Hardware and software testbed setup

Three PCs are used in the experimental work. Two PCs serve as the video server (Dell PowerEdge T410, CPU: quad-core 2.35 GHz, RAM: 16 GB, operating system: Microsoft Windows 10) and the video receiver (Dell XPS, CPU: Intel Core i5-7200 @ 2.5 GHz, RAM: 8 GB, operating system: Microsoft Windows 10). The open-source network simulator version 3 (NS3) is installed on a separate PC (HP Compaq 8200, CPU: Core i5-2500s, RAM: 8 GB, operating system: Ubuntu Server 15.04). The open-source cross-platform multimedia player (VLC) is used to stream the video test sequences at the sender side and to visualise the perceived visual quality at the receiver.

3.4.3 Error-prone environment setup

To evaluate the performance of the proposed algorithm under error-prone conditions, various packet loss rates are injected into the encoded video bitstreams. A modified version of the NAL unit loss software from [25] is used to inject different packet loss rates into the encoded bitstream; it was adapted to support the NAL unit structure of the H.265|HEVC coding standard.
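As an illustration, a minimal sketch of random NAL unit loss injection, under the stated assumption that losses are independent and uniform at the given rate (the actual tool in [25] may use a different loss model):

```python
import random


def inject_nal_losses(nal_units, plr: float, seed: int = 0):
    """Randomly drop whole NAL units at the given packet loss rate,
    mimicking a NAL-unit loss tool; the seed makes a test run
    reproducible across repetitions."""
    rng = random.Random(seed)
    return [nal for nal in nal_units if rng.random() >= plr]
```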

4 Experimental setup

In this section, the hardware and software experimental setup and the video encoding configurations are presented. The main objectives are to test the performance of the proposed algorithm under different error-prone conditions and encoding bit rates, and to assess the computational complexity of the modified HM16 reference software.

Pre-selected video test sequences are used in the experiments. The input sequences are in raw (YUV) format with 4:2:0 colour subsampling. They are classified into two groups according to their texture detail and motion activity: Class A represents sequences with low texture detail and slow motion, and Class B represents sequences with high texture detail and high motion. The sequence characteristics are presented in Table 2.

Table 2 Characteristics of the test video sequence [23]

5 Experimental results and discussion

Each video test sequence is tested 30 times with different seeds, and the averaged Y-PSNR values are recorded. The ASE algorithm's performance is compared with the default reference software (HM16.06), the region-based error-resilient (ROI) algorithm in [26], and the improved region of interest (IROI) algorithm in [27]. All video sequences were randomly injected with packet loss rates (PLRs) ranging from 2 to 18% using the packet loss generator software, with the encoding settings of Table 3. The results of the experimental work are shown in Fig. 9a.

Table 3 Encoding parameters of modified HM 16.06
Fig. 9 Video quality evaluation of the proposed ASE algorithm. a Performance of the proposed algorithm with different PLRs. b Rate distortion performance with BER \(\left( {10^{ - 5} } \right)\)

The evaluation includes both error-free and error-prone conditions. Under error-prone conditions, the video test sequences are injected with random bit errors generated at BER = \(1 \times 10^{ - 5}\). All tested video sequences are in CIF resolution.

Further evaluation tests are performed to measure the effectiveness of the proposed ASE algorithm under packet loss conditions with different encoding bit rates. The objective quality evaluation of the ASE algorithm with different encoding bit rates is shown in Fig. 9b.

Table 4 reports the obtained Y-PSNR gains with and without feedback updates for three video sequences (Coastguard, Hall, and Mobile) in CIF resolution. Notably, the average Y-PSNR of the ASE algorithm across the different PLRs improved by 4.521 dB, 2.283 dB, and 1.076 dB compared to the HM16 reference for H.265|HEVC, the ROI algorithm, and the IROI algorithm, respectively.

Table 4 ASE performance comparison in terms of Y-PSNR (dB)

In error-free conditions (PLR = 0%), the Y-PSNR of the ASE algorithm is reduced by 1.096 dB, 0.605 dB, and 0.318 dB relative to HM16, ROI, and IROI, respectively.

The test results in Table 4 indicate that the proposed algorithm is less effective in error-free conditions. The most complex processing part of the proposed algorithm is the frame partitioning according to the complexity of the frame content; the three transition areas contribute to obtaining the best balance between coding efficiency and error resilience performance.

5.1 Frame by frame video quality assessment

The proposed ASE algorithm is further evaluated using subjective quality assessment. Pre-selected frames are extracted from the raw video test sequence for assessment. The Coastguard sequence in CIF resolution was encoded at 30 fps, and packet errors at a mean rate of 2% were injected into the test sequence. The obtained results are shown in Fig. 10 using frame by frame visual quality assessment. It can be seen from the decoded frames that the ASE algorithm produces better perceptual visual quality than the reference error resilience algorithms.

Fig. 10 Frame by frame visual quality assessment

5.2 Network congestion and time delay

The proposed algorithm is further evaluated by streaming the encoded video over an LTE network under varying network load, with the number of end users ranging from 10 to 30 per base station. The experimental work uses the settings described in the network testbed setup section, and frame-copy concealment is used at the decoder to avoid failures in the decoding process. Two main objectives are targeted in this experimental work. The first is to show the effect of different numbers of LTE network clients sharing bandwidth on the objective decoded video quality.

The second one is the start-up time, as it is a critical factor for meeting the user’s quality of experience requirements [28].

The start-up time is defined as the time the decoder buffer needs before the decoded pictures can be displayed.

The authors in Ref. [28] recommended that the start-up delay in video streaming applications should not exceed 2 s. In this experimental work, we chose 500 ms and 1000 ms as realistic use cases.

Based on the network configuration parameters and the network testbed described earlier, the encoding configuration settings are reported in Table 3. Eighteen video sequences were selected, and the average for each tested client group (10, 20, and 30 end users) is recorded. The network load is categorised into three levels: light (10 users), medium (20 users), and heavy (30 users).

Each test is repeated ten times for more reliable verification.

Our algorithm is integrated with the video evaluation platform.

The average Y-PSNR results for the pre-selected test sequences are recorded for network load behaviour.

Figures 11 and 12 show the effect of increasing the number of users on the perceptual visual quality in terms of the Y-PSNR at the decoder.

Fig. 11 Average Y-PSNR for video streaming with start-up delay (500 ms)

Fig. 12 Average Y-PSNR for video streaming with start-up delay (1000 ms)

It is evident that, as the number of clients increases, the proposed algorithm outperforms the default reference software.

When the encoded video sequences are streamed under a high network load, the number of dropped packets increases significantly due to network congestion. Therefore, the objective video quality deteriorates further at higher network loads in a shared-bandwidth network.

It is noted from Figs. 11 and 12 that as the start-up delay becomes longer, the number of decoded redundant intra slices increases; in return, the probability of recovering damaged slices increases, because the encoded redundant slices are used to resynchronise the corrupted areas. Hence, in real-time video streaming, the decoded video quality in the proposed algorithm is affected by two factors: the encoding bit rate and the GOP structure used.

In general, a longer GOP yields higher coding efficiency because a lower encoding bit rate is required. However, a larger GOP size means more dependent frames are encoded between I-frames, and this longer interval lowers the error resilience, in addition to increasing the picture decoding delay. The acceptable total delay lies between 500 ms and 1000 ms before a picture is ready for display.

5.3 Computational complexity

The aim of this part of the work is to determine the encoding/decoding computational complexity from a processing time perspective. It is worth noting that the reference software (HM) is mainly intended for developing H.265|HEVC video coding algorithms and does not practically support real-time video encoding. Although the HM suffers from slow encoding and decoding execution, some speed improvements have been made across successive HM versions.

In the experimental work, we measured the encoding/decoding processing time of the proposed algorithm and compared the results with the default HM16 reference software.

The video test sequences are encoded using the same encoding settings reported in Table 3, and the experiment uses the same video test sequences reported in Table 2.

As stated in the JCT-VC common test conditions [29], there are three main encoding configurations: low delay-B, all intra (AI), and random access (RA). In the low delay-B configuration, the first frame is encoded as an intra frame and the following frames are encoded as bi-predicted B-frames, which give higher coding efficiency (and coding delay) than uni-predicted P-frames [21, 30].

In the AI configuration, intra mode is used to encode the whole video sequence. This encoding type gives a low encoding time but requires very high bit rates.

In the RA configuration, the encoded video frames are organised in a hierarchical B structure. This mode gives higher compression efficiency than the other modes but is not suitable for low delay applications, because it requires additional processing to reorder the decoded pictures at the far-end decoder.

In this paper, we evaluated the proposed algorithm in the low delay-B configuration with the QP set to 32. The encoding/decoding execution time for each video test sequence without the feedback channel is reported in Table 5.

Table 5 Encoding and decoding time of the ASE algorithm compared to the HM16 reference software

The table presents the encoding and decoding run times as an indication of algorithmic complexity compared with the default reference (HM 16).

To examine the processing time of the proposed error resilience algorithm, the average encoding/decoding time over the 18 video test sequences is obtained. Figure 13 shows the percentage increase in encoding/decoding processing time compared with the HM16 software without the error resilience tool.

Fig. 13 Average increase in encoding/decoding processing time compared to HM16

The results show that the encoder consumes more additional time than the decoder (Fig. 13). The additional computation in the modified HM16 encoder arises from the rate control adaptation for encoding different areas at different bit rates and from allocating different quantisation parameters at the largest coding unit levels. It also stems from segmenting the picture into subregions, a process that includes the differencing method between the current and previous frames as an additional computation.

At the decoder, a large share of the additional time is spent parsing the redundant slices, which increases the reference sample generation at the decoded picture buffer. A further part of the decoding time is spent on the scanning process at slice boundaries, in addition to reference sample generation for intra slice prediction. The increased time in both encoding and decoding also arises from the added set of C++ classes implementing the error resilience tools.

6 Conclusions and recommendation

This paper presents an efficient H.265|HEVC error resilience algorithm to support low delay video delivery applications. The novelty of the algorithm lies in automatically selecting the most active frame regions and protecting them against transmission errors, at the cost of an increase in the encoding bit rate overhead and in encoding/decoding computational complexity. The proposed work also takes coding efficiency into consideration by subdividing the non-active regions into flat and highly textured areas. The bit budget saved in the non-active areas is spent on the active frame areas, yielding the best trade-off between coding efficiency and error resilience performance.

We conducted several simulation scenarios to evaluate the proposed algorithm. First, we presented the different network testbeds used for the modified video codec performance evaluation. The experimental work was conducted in error-prone and error-free environments, with average packet loss rates ranging from 2 to 18%. The results show that the proposed algorithm yields a Y-PSNR gain of 4.52 dB over the HM16 reference software and outperforms the state-of-the-art ROI and IROI algorithms by 2.28 dB and 1.07 dB, respectively. However, in error-free conditions, the proposed algorithm suffered its highest loss, 1.09 dB, against the default HM16 software.

Furthermore, the encoding and decoding processing times of the tested video sequences were analysed and reported in terms of computational complexity. The results showed that the encoding and decoding times increased by 19% and 11%, respectively.

The algorithm was further investigated with start-up video play delays of 0.5 s and 1 s in a long-term evolution (LTE) network. The results showed that when the start-up delay at the decoder increases from 0.5 to 1 s, the objective decoded video quality increases remarkably (1 dB on average). These results indicate that the proposed algorithm can be used in low delay video applications even without a feedback channel. Our future work includes implementing a Gilbert–Elliott model with the proposed algorithm to provide real-time quality-of-service estimation; the model will enable automatic control and adjustment of the encoding parameters.