Advertisement

Multimedia Tools and Applications

, Volume 77, Issue 23, pp 30683–30701 | Cite as

Efficient decoding algorithm for 3D video over wireless channels

  • S. L. P. Yasakethu
  • C. T. E. R. Hewage
Open Access
Article
  • 277 Downloads

Abstract

In wireless communication networks, additive channel noise and time varying multipath fading are considered amongst the main challenges and these effects will bring significant quality degradation to the transmitted 2D/3D video streams. This paper presents an efficient decoding algorithm for Distributed Video Coding (DVC) for enhanced performance of colour and depth based 3D video over error prone wireless channels. The maximum a-posteriori (MAP) algorithm for turbo decoding is modified considering the effects of the channel errors on both Wyner-Ziv and key frame bit streams of colour and depth videos. The proposed codec is simulated using a W-CDMA wireless channel model and the results are analyzed to determine the effect of the modifications. The performance of the state-of-the-art H.264/AVC video codec is also presented for comparison under similar conditions. The results show that the proposed modifications provide a significant improvement in the DVC codec performance under unfavourable channel conditions for colour plus depth based 3D video.

Keywords

3D video DVC Wireless channels Quality of Experience (QoE) 

1 Introduction

The 3D video has gone one step beyond the conventional 2D video by providing the depth perception of the scene to the users. These formats of enhanced multimedia content have the potential to render high quality immersive experiences (i.e., immersive QoE) for a variety of applications in home, in workplace, and in public spaces. Distributed Video Coding (DVC) is an emerging video coding approach, particularly attractive due to its flexibility for building extremely simple and low cost video encoders. This feature could be very effectively exploited in several 3D video application domains including wireless sensor networks for 3D mobile video communications and future 3D video surveillance systems, where the conventional video coding technologies, with highly computational intensive algorithms become weak candidates (the usage in sensor networks).

A 3DTV broadcasting approach was proposed by the European Information Society Technologies (IST) project ‘Advanced Three-Dimensional Television System Technologies’ (ATTEST), [9]. The 3D image/video representation used in this scenario is based on a monocular colour video and an associated per-pixel depth map. An example of this representation is illustrated in Fig. 1 for the “Orbi” test 3D video sequence. Depth Image Based Rendering (DIBR) methodology utilizes this colour and its depth videos to generate artificial stereoscopic views, one for each eye of a viewer. The main advantage of DIBR technique compared to traditional representation of 3D video with the left-right views is that it provides high quality 3D video with smaller bandwidth required for transmission [6, 7, 8, 10]. It is a common scenario to deploy wireless networks for 3D video sensor networks, where the adverse effects of channel noise and time varying multipath fading on the reconstructed video quality is a significant problem. When DVC is concerned, even though its performance over noisy channels has been the focus of some recent research, notably for packet based networks [15], it is noted that the performance degradation due to fading with wireless channels has received limited attention. This paper considers this fading wireless channel scenario and modifies the DVC decoding algorithm to enhance the decoded quality of 3D video (rendered left and right videos) of the transform domain DVC codec. This discussion is based on a DVC codec that follows the Stanford framework [1] with turbo coding, where the Wyner-Ziv coded bit stream and the key frame stream of both colour and depth videos traverses the communication channel. Thus the effects of the channel errors on both Wyner-Ziv and key frame streams are considered herein. Appropriate noise models will be developed for each stream at the decoder and the suitable compensative solutions will be proposed.
Fig. 1

A Frame from the “Orbi” test sequence

2 System overview

The considered transform domain DVC codec architecture [4] with the W-CDMA wireless channel simulation system is presented in this section. Colour and depth videos are encoded independently at the DVC encoder and are transmitted over the wireless channel. After independently decoding both colour and depth videos DIBR is used to generate stereoscopic views (left and right) at the decoder. A block diagram of the system is shown in Fig. 2. It should be noted that this architecture is common to both colour and depth videos of 3D content. The DVC encoder generate separate punctured parity bit streams (Wyner-Ziv coded bit stream) [4] for colour and depth videos and are transmitted over the fading wireless channel. As shown in Fig. 2, a quadrature phase shift keying (QPSK) modulator and a demodulator with built-in rake receiver are used in the transmitting and receiving nodes of the wireless channel respectively. In wireless communications it is common to use a discrete time Rayleigh fading channel model consisting of multiple narrow-band channel paths separated by delay elements as illustrated in Fig. 3. With these channel models, it is common in communication systems to use rake receivers to improve the performance by collecting the energy dispersed over multiple fading paths. In Fig. 4, which illustrates the rake receiver, R(t) and Y(t) correspond to the inputs and the output of the rake receiver. G1, G2…GN are the complex fading coefficients of each path. The output of the rake receiver Y(t), with respect to signals received from N fading paths is computed as:
$$ {\displaystyle \begin{array}{c}{R}_i(t)=\sum \limits_{j=1}^N\left({G}_jX\left(t+2-i-j\right)\right)+n(t),\kern0.5em \forall i\in \left\{1,2,\dots, N\right\}\\ {}Y(t)=\sum \limits_{i=1}^N{G}_i^{\ast }{R}_i\\ {}Y(t)=\left(\sum \limits_{i=1}^N{\left|{G}_i\right|}^2\right)X(t)+\sum \limits_{\begin{array}{l}i=1\\ {}i\ne j\end{array}}^N\sum \limits_{j=1}^N{G}_i^{\ast}\left\{{G}_jX\left(t+2-i-j\right)+n(t)\right\}Y(t)=\left(\sum \limits_{i=1}^N{\left|{G}_i\right|}^2\right)X(t)+{n}^{\prime }(t)\end{array}} $$
(1)
Fig. 2

Block diagram of the DVC codec with the noisy channel models

Fig. 3

Rayleigh fading channel model

Fig. 4

Block diagram of the rake receiver

where X(t) is the transmitted symbol, \( {G}_i^{\ast } \) represent the complex conjugate values of the fading coefficient of path index i (i = 1,2,…N), and n(t) is the composite noise in the rake receive output.

3 Proposed solution

In this section, we present the mathematical design of the novel noise model for the systematic and parity bit streams of colour and depth videos to be used in iterative MAP decoding of Wyner-Ziv frames in DVC. It is important to note that all the notations and derivations in the paper are common to both colour and depth videos, unless or otherwise stated.

The conditional Log Likelihood Ratio (LLR) L(uk| yk) of the data bit uk, is defined as [16]:
$$ L\left({u}_k|{y}_k\right)=\ln \left(\frac{P\left({u}_k=+1\right)\mid {y}_k}{P\left({u}_k=-1\right)\mid {y}_k}\right) $$
(2)
where yk is the received parity information. Incorporating the code’s trellis, this may be written as,
$$ L\left({u}_k\right)=\ln \left(\frac{\sum \limits_{s+}p\left({s}_{k-1}={s}^{\prime },{s}_k=s,y\right)/p(y)}{\sum \limits_{s-}p\left({s}_{k-1}={s}^{\prime },{s}_k=s,y\right)/p(y)}\right) $$
(3)
where sk ∈ S is the state of the encoder at time instance k, S+ is the set of ordered pairs (s, s) corresponding to all state transitions (sk − 1 = s) → (sk = s) caused by data input uk =  + 1, and S is similarly defined for uk =  − 1.

From the Bahl-Cocke-Jelinek-Raviv (BCJR) algorithm [16], p(s, s, y) = p(sk − 1 = s, sk = s, y), is computed as,

$$ p\left({s}^{\prime },s,y\right)={\alpha}_{k-1}\left({s}^{\prime}\right)\cdot {\gamma}_k\left({s}^{\prime },s\right)\cdot {\beta}_k(s) $$
(4)
where αk(s) and βk(s) are the forward and backward trellis state probabilities respectively, and γk(s', s) is the state transition probability, defined as follows:
$$ {\alpha}_k(s)\kern0.5em =\kern0.5em {\alpha}_{k-1}\left({v}_1\right)\;{\gamma}_k\left({v}_1,s\right)\kern0.5em +\kern0.5em {\alpha}_{k-1}\left({v}_2\right)\;{\gamma}_k\left({v}_2,s\right) $$
(5)
$$ {\beta}_k(s)\kern0.5em =\kern0.5em {\beta}_{k+1}\left({w}_1\right)\;{\gamma}_{k+1}\left({w}_1,s\right)\kern0.5em +\kern0.5em {\beta}_{k+1}\left({w}_2\right)\;{\gamma}_{k+1}\left({w}_2,s\right) $$
(6)
$$ {\displaystyle \begin{array}{l}{\gamma}_k\left({s}^{\prime },s\right)=P\left(\left.s\right|{s}^{\prime}\right)p\left(\left.{y}_k\right|{s}^{\prime },s\right)\\ {}\kern3.6em =P\left({u}_k\right)p\left(\left.{y}_k\right|{u}_k\right)\end{array}} $$
(7)
where v1 and v2 denote valid previous states and w1 and w2 denote valid next states for the current state.

It is reminded that, in the case of DVC, the parity stream for the turbo decoder is transmitted over a noisy wireless channel and the systematic stream is taken from the side information frame locally generated at the decoder which assumes a hypothetical Laplacian noise model. But in real world scenarios, both the transmitted parity and the intra-coded key frame information suffer the same noisy wireless channel conditions. Thus, under practical situations, the effects of corrupted parity information, corrupted intra-coded key frame information, and the hypothetical channel characteristics have been considered in our discussion for both colour and depth videos.

Considering the noise distribution in parity and systematic components of the input to the turbo decoder to be independent, p(yk| uk) in Eq. (7) can be written as:

$$ p\left({y}_k|{u}_k\right)=p\left(\left.{y}_k^p\right|{u}_k^p\right)p\left(\left.{y}_k^s\right|{u}_k^s\right) $$
(8)
where \( {y}_k=\left\{{y}_k^p,{y}_k^s\right\},{u}_k=\left\{{u}_k^p,{u}_k^s\right\} \) are the received and transmitted parity and systematic information respectively. The mathematical modelling of received parity, and systematic information are elaborated in the following sub-sections.

3.1 Modelling the parity information for colour and depth videos

For both colour and depth parity bit sequences received through a multipath fading wireless channel, the probability of the received bit \( {y}_k^p \) conditioned on the transmitted bit \( {u}_k^p \) can be written as:
$$ p\left(\left.{y}_k^p\right|{u}_k^p=\pm 1\right)=\frac{E_b}{\sqrt{2{\pi \sigma}^2}}{e}^{\frac{-{\left[{y}_k^p\mp \sum \limits_{i=1}^N{\left|{a}_i\right|}^2\right]}^2}{2{\sigma}^2}} $$
(9)
where ai is the complex fading coefficient, σ is the standard deviation in the Gaussian probability distribution, Eb is the energy per bit, i is the path index and N is the total number of multipaths.
The conditional LLR for parity information \( L\left({y}_k^p|{x}_k^p\right) \) can be written as:
$$ L\left(\left.{y}_k^p\right|{x}_k^p\right)=\ln \left(\frac{P\left(\left.{y}_k^p\right|{x}_k^p=+1\right)}{P\left(\left.{y}_k^p\right|{x}_k^p=-1\right)}\right) $$
(10)
Substituting (9) in (10), we get:
$$ {\displaystyle \begin{array}{l}L\left(\left.{y}_k^p\right|{x}_k^p\right)=\ln \left\{\left(\frac{\exp \left(-\frac{E_b}{2{\sigma}^2}{\left({y}_k^p-\sum \limits_{i=1}^N{\left|{a}_i\right|}^2\right)}^2\right)}{\exp \left(-\frac{E_b}{2{\sigma}^2}{\left({y}_k^p+\sum \limits_{i=1}^N{\left|{a}_i\right|}^2\right)}^2\right)}\right)\right\}\\ {}\kern4em =\left\{\left(-\frac{E_b}{2{\sigma}^2}{\left({y}_k^p-\sum \limits_{i=1}^N{\left|{a}_i\right|}^2\right)}^2\right)-\left(-\frac{E_b}{2{\sigma}^2}{\left({y}_k^p+\sum \limits_{i=1}^N{\left|{a}_i\right|}^2\right)}^2\right)\right\}=\frac{E_b}{2{\sigma}^2}4\sum \limits_{i=1}^N{\left|{a}_i\right|}^2{y}_k^p\\ {}\kern4em ={L}_C.{y}_k^p\end{array}} $$
(11)
where \( {L}_c=\frac{E_b}{2{\sigma}^2}4\sum \limits_{i=1}^N{\left|{a}_i\right|}^2 \) is defined as the channel reliability value, and depends only on the channel SNR and the fading amplitude.

3.2 Modelling the systematic information for colour and depth videos

Under the error free transmission of intra-coded key frames, the correlation of the original Wyner-Ziv frames to be encoded and the estimated side information is assumed to be Laplacian distributed as,
$$ f(x)=\frac{\alpha }{2}\exp \left(-\alpha \left|{y}_k^s-{u}_k^s\right|\right) $$
(12)
where α is the distribution parameter. But experimentally we have found that for the colour video, in the presence of a noisy wireless channel, the correlation between the estimated side information and the original information no longer exhibits a true Laplacian behaviour for the DC band of the DCT coefficients, even though the distributions of the AC bands can still be approximated as a Laplacian model. With the loss of information due to channel conditions, the new noise pdf of the DC band of the colour video (denoted as h(x) in Fig. 5c) is identified as the resultant additive effect of the original pdf of the DC band and the above Laplacian distribution, f(x). This phenomenon is illustrated for Interview test video sequence in Fig. 5. Tests were performed for several standard test video sequences and consistent results were observed. Thus, the conditional probability, \( p\left({y}_k^s|{u}_k^s\right) \) in Eq. (8), is derived from this additive channel model. However, the original DC coefficients of the encoded sequence for the current Wyner-Ziv frames are not available at the DVC decoder. Therefore, the pdf of the DC band distribution is estimated at the DVC decoder using the previously decoded Wyner-Ziv frame and is continuously updated during the decoding process. For the depth video, it is observed that, the correlation between the estimated side information and the original information for all the DC and AC bands can still be approximated as a Laplacian model.
Fig. 5

The noise modelling of systematic information (Colour video of Interview test sequence)

Thus the conditional probabilities, \( p\left({y}_k^s|{u}_k^s\right) \), calculated by modelling the side information, and \( p\left({y}_k^p|{u}_k^p\right) \), calculated by modelling the parity information, for colour and depth videos as discussed in section 3, are used in Eq. (8) and the results are used in Eq. (7) to calculate the branch metricγk(s', s).

Now, the SISO decoder is modified incorporating the proposed noise models for the MAP decoding algorithm as derived above. This modified turbo decoder is incorporated to the existing DVC decoder architecture [4].

When simulating the channel noise for the intra coded key frames, of both colour and depth videos transmitted over identical channel, the decoded key frames are too degraded for use as reference frames for side information estimation. Therefore, a channel coder [12] is incorporated to the intra codec in key frame coding. A low complex channel encoder (rate: 1/3, generator matrix G(D) = [1 1/(1 + D)]) is chosen not to overburden the DVC encoder and suitable channel encoders are incorporated, for both I and B frames in H.264/AVC codec which will be discussed later in the comparison of results.

The reverse channel used in DVC for parity-request transmission is assumed to be free of channel errors. This assumption is justified by the very small bit rate utilized in the reverse channel and the possibility of building very strong error correction coding to protect it using the sufficiently available reverse channel bandwidth, in a common symmetric bi-directional channel.

4 Simulation results and discussion

Simulation results of the DVC codec with the proposed modifications are discussed in this section. The test conditions are first explained followed by an analysis of results.

4.1 Test conditions

The common test conditions used for all simulations are listed here. Specific conditions adopted in each analysis scenario are described in the following sections. Three test video sequences are selected to represent different motion characteristics. “Breakdancers” (high motion/texture variation with static camera, at 15fps) “Orbi” (medium to high motion/texture variation and parallel camera motion, at 25fps) and “Interview” (low motion/texture variation with static camera, at 25fps). Colour video is represented in YUV 4:2:0 and the depth video is represented using the luminance component with 8-bit grey values. Where grey value 0 specifies the furthest value (i.e. away from camera) and the grey value 255 specifies the closest value (i.e. closer to the camera). Resolution is 176 × 144 and number of coded frames is 100 with a GOP of 2. The transform (DCT) coefficients are quantized using the quantization matrices in [4]. Key frames of colour and depth videos are H.264/AVC Intra coded with the quantization parameter (QP) optimized for each rate-distortion (RD) point for comparable picture quality in Wyner-Ziv and key frames. H.264/AVC codec [11] is simulated in IBIB GOP structure (JM10.1/Main profile) for identical channel conditions. Here, a channel coder identical to that used for intra coding the key frames, as described in section 3.2, is used. For both H.264/AVC and DVC codecs, the PSNR and bit rates are calculated for both intra coded and inter coded frames of colour and depth videos. Then the average PSNR of the rendered left and right videos and the total bit rate for colour and depth videos (The channel coding overhead bit rate is accounted.) are calculated and are presented in the simulation results. Previous studies by the authors have demonstrated the relationship between avaerage PSNR and true 3D image quality [6, 7, 8]. These studies provide correlation figures (mapping using a logistic function proposed in ITU-R/BT 500–13 [10]) to show how much the average PSNR of left and right video is correlating with MOS provided by the users. However, to provide the performance parameters against true 3D video quality, subjective quality evaluation tests were conducted. This will illustrate how significant the achieved results are interms of true 3D user pereption. The subjective test conditions and parameters are listed below.

The 42” Philips multi-view auto-stereoscopic display was used in the experiment to display the stereoscopic material. The optics of this display were optimized for a viewing distance of 3 m. Hence, the viewing distance for the observers was set to 3 m. This viewing distance was in compliance with the Preferred Viewing Distance (PVD) of the ITU-R BT.500-13 recommendation [10], which specifies the methodology for subjective assessment of the quality of television pictures. The 3-D display was calibrated using a GretagMacbeth Eye-One Display 2 calibration device. Peak luminance of the display was 200 cd/m2. The measured environmental illumination was 180 lx, which was slightly below the ITU-R BT.500-13 recommended value for home environments (i.e. 200 lux) [10]. The background luminance of the wall behind the monitor was 20 lx. The ratio of luminance of inactive screen to peak luminance was less than 0.02. These environmental luminance measures remained the same for all test sessions, as the lighting conditions of the test room were kept constant. Thirty two non-expert observers (ten female observers and 22 male observers) volunteered to participate in the experiment. They were divided into two groups in order to assess one attribute per group. The attribute assessed in this experiment was overall perceived image quality. The observers were mostly research students and staff with a technical background. Their ages ranged from 20 to 40. Fourteen observers had prior experience with 3-D video content using different viewing aids such as shutter glasses and red-blue anaglyph glasses. All participants had a visual acuity of ≥1 (as tested with the Snellen chart), good stereo vision <60 s of arc (as tested with the TNO stereo test), and good colour vision (as tested with the Ishihara test).

A W-CDMA wireless channel is used for all simulations. A slow fading channel is considered to closely resemble a practical scenario (Modulation: QPSK, Chip rate: 3.84Mchip/s, Spreading factor: 32, Spreading sequence: OVSF, Carrier freq.: 2GHz, Doppler speed: 100 km/h).

4.2 Analysis of results

The simulation results are analyzed in the following sub-sections for the effects of proposed noise models and the effects of the multiple fading paths, followed by a comparison with the state of the art in conventional video coding.

4.2.1 Effects of noise models

The performance of the proposed solution with respect to each modification (i.e. parity and side information) is shown in Fig. 6a and b for the Orbi sequence, with multipath fading (2 paths) and a channel SNR of 3 dB. It is observed that the DVC codec RD performance is improved by each individual modification. First, the proposed models for the parity (Wyner-Ziv bit stream) and systematic (side information) bit streams of colour and depth videos have been applied separately to verify the performance. The plots in Fig. 6a and b are described here in the order of performance improvement (bottom to top):
  1. 1.

    The original result without the proposed modifications

     
  2. 2.

    The proposed model applied for the systematic (side information) bit stream only. Parity bit stream (Wyner-Ziv bit stream) noise model is not used. Here, up to 0.5 dB gain in average PSNR of rendered views is observed for the same bit rate when compared with the no-model result (see Fig. 6b).

     
  3. 3.

    The proposed noise model applied only for the parity bit stream (Wyner-Ziv bit stream). Noise model for systematic bit stream is not used. Here, an average PSNR gain of up to 1.5 dB of rendered views is evident compared with the no-model result.

     
  4. 4.

    Both proposed noise models applied. An overall average PSNR gain of up to 1.8 dB is evident here. Similar gains can be visible in terms of true 3D video quality measures in Normalized MOS (see Fig. 6a)

     
Fig. 6

a RD results (Normalized MOS) for the 2 path fading model (channel SNR = 3 dB), Orbi test sequence. b. RD results for the 2 path fading model (channel SNR = 3 dB), Orbi test sequence

Performance of the proposed model with the Interview and Breakdancers sequences are shown in Figs. 7a/b and 8a/b, where a similar Normalized MOS/Average PSNR gain is evident when both parity and systematic bit models are incorporated. Thus, the observations for the three different sequences are consistent proving that the positive gain of the proposed modification comes irrespective of the motion level in the 3D video content.
Fig. 7

a RD results (Normalized MOS) for the 2 path fading model (channel SNR = 3 dB), Interview test sequence. b. RD results (Average PSNR) for the 2 path fading model (channel SNR = 3 dB), Interview test sequence

Fig. 8

a RD results (Normalized MOS) for the 2 path fading model (channel SNR = 3 dB), Breakdancers test sequence. b RD results (Average PSNR) for the 2 path fading model (channel SNR = 3 dB), Breakdancers test sequence

4.2.2 Effects of the multiple fading paths

The verification of the proposed model for the multipath fading channel model is further extended by varying the number of transmission paths for a single path and a 4-path case. Simulation results are shown in Figs. 9a/b and 10a/b, respectively for the single path and 4-path fading channels and it is evident that the performance improvement from the proposed modifications is consistent for each case.
Fig. 9

a. RD results (Normalized MOS) for single path in fading model, (channel SNR = 3 dB), Orbi test sequence. b. RD results (Average PSNR) for single path in fading model, (channel SNR = 3 dB), Orbi test sequence

Fig. 10

a RD results (Normalized MOS) for 4 path in fading model, (channel SNR = 3 dB), Orbi test sequence. b. RD results for 4 path in fading model, (channel SNR = 3 dB), Orbi test sequence

4.2.3 Comparison with the state-of-the-art (H.264/AVC)

Finally, the results of the proposed model, are compared with the state-of-the-art H.264/AVC codec (with default error concealment), for different channel SNR values. From the results presented in Fig. 11a and b, it is evident that the proposed DVC codec demonstrates a very reliable and consistent performance over various channel conditions. It is further noted that the H.264/AVC codec is very sensitive to adverse channel conditions. The low complex channel encoder (rate: 1/3, G(D) = [1 1/(1 + D)]) used for DVC did not yeild reliable performance for H.264/AVC. Accordingly, suitable new channel coders (only for H.264/AVC) were empirically determined for different SNR levels (for SNR = 3 dB: code rate = 1/3, G(D) = [1 (1 + D + D2+ D3) /(1 + D + D3)], for SNR = 5 dB: code rate = 1/3, G(D) = [1 (1 + D2) /(1 + D + D2)] and for SNR = 9 dB: code rate = 1/3, G(D) = [1 1/(1 + D2)]). Note here that the use of larger constraint lengths means significantly increased coding complexity. This approach was chosen over increasing the code rate (e.g. ¼) to keep the overall bit rate at a minimum.
Fig. 11

a DVC and H.264/AVC Performance comparison (using Normalized MOS) for Orbi test sequence (2 path fading). b DVC and H.264/AVC Performance comparison (using average PSNR) for Orbi test sequence (2 path fading)

4.3 Related work

Several optimization frameworks have been proposed in the literature for distributed processing or handling of data to maximize the performance under given resource limitations [2, 3, 13, 17, 18, 19]. AutoReplica [17] provides a solution to effectively balance the trade-off between I/O performance and fault tolerance using SSD-HDD tier storages. Primarily this explains replication across multiple storages to achieve fault-tolerance. However, this does not explain how high end applications such as next generation multimedia applications (e.g., 3D video) could use similar distributed architectures for distributed processing. This optimization framework proposed in [2] presents a performance approximation approach to model the computing performance of iterative, multi-stage applications running on a master-computer framework. Whist 3D video could be considered as a multi-stage application, this work is mainly for performance evaluation of such applications rather than end user quality, which is the main focus of our article. Authors in [18] explains automated framework for data placement in multi-tier all-flash data centres. This could be useful in multi-view 3D video applications where the importance of different views arise at different times and the proposed work can be extended to be used with such back end framework.

3D and multi-view video applications in proposed DVC architectures can be found in the literature [5, 14]. These work explain how DVC can be used instead of conventional video coding architectures which has high complexity at the encoder side due to motion prediction etc. At the same time, these articles highlight the challenges faced by DVC architectures to enable 3D/multi-video video. For example Side-Information (SI) generation for the same views as well as for adjacent views.

The fact that the DVC codec with the proposed noise model gives very reliable performance at minimal cost of channel coding yields very promising prospects for the target applications discussed earlier in the introduction. This is particulary significant since the applications are implemented over multipath fading wireless channel.

The improvement is proven to be significant irrespective of the motion level in the video content, or the conditions of the fading. Due to its very stable and consistent behaviour, DVC even outperforms the H.264/AVC codec when the channel effects are adverse. Furthermore, it is noted that these results are obtained with no additional complexity burden to the already extremely low complex DVC encoder.

Although the proposed solution shows improvements in the results there are few limitations due to the nature of the DVC codec. The main issue of DVC is its open-loop encoding architecture compared to traditional codecs. The latter got a perfect idea about the encoding artefacts and minimise them at the encoder side (closed-loop encoder). However, in the DVC architecture, neither encoder nor decoder got any idea about decoding artefacts. The delay is another issue. The decoder has to send a request for more data depending on the decoded quality. This repeat-request nature adds delay. However we can think of mechanisms to minimise the delay by sticking to an out-of-order transmission architecture with a buffer. Also, at the decoder in order to request parity bits from the encoder the decoded quality of the video frames needs to be measured. Currently this is done by statistical measures such as PSNR and MSE. However when comes to 3D video statistical measures such as PSNR and MSE fail to address the perceptual attributes of 3D video such as depth quality and visual discomfort [20]. Thus developing a no reference 3D video quality measure still remains as a challenge in the context of DVC.

5 Conclusions

In this paper, novel noise models are proposed for a DVC codec for colour and depth 3D video and the decoder is accordingly modified for the use on noisy wireless channels. The main contributions of the paper are the design and implementation of noise models for parity information (as discussed in section 3.1) and systematic information for colour and depth videos (as discussed in section 3.2). The performance of the proposed modifications is verified for a W-CDMA wireless channel with multipath fading. The simulation results clearly show that the two proposed noise models for the parity (Wyner-Ziv bit stream) and systematic (side information) bit streams have resulted in a significant gain in the RD performance of the codec when operated over noisy and fading channel conditions. The improvement is proven to be significant irrespective of the motion level in the video content, or the conditions of the fading. Due to its very stable and consistent behaviour, DVC even outperforms the H.264/AVC codec when the channel effects are adverse. Furthermore, it is noted that these results are obtained with no additional complexity burden to the already extremely low complex DVC encoder.

References

  1. 1.
    Aaron A, Zhang R, Girod B (2002) Wyner-Ziv coding of motion video. Proc. Asilomar conference on signals and systems, Pacific Grove, CA, Nov. 2002Google Scholar
  2. 2.
    Bhimani J, Mi N, Leeser M, Yang Z (2017) FIM: performance prediction for parallel computation in iterative data processing applications. 2017 IEEE 10th International Conference on Cloud Computing (CLOUD), Honolulu, CA, pp 359–366Google Scholar
  3. 3.
    Bhimani J, Yang Z, Leeser M, Mi N (2017) Accelerating big data applications using lightweight virtualization framework on enterprise cloud. 2017 IEEE High Performance Extreme Computing Conference (HPEC), Waltham, MA, pp 1–7Google Scholar
  4. 4.
    Brites C, Ascenso J, Pereira F (2006) Improving transform domain Wyner-Ziv video coding performance. Proc IEEE ICASSP 2:II-525–II-528Google Scholar
  5. 5.
    Guillemot C, Pereira F, Torres L, Ebrahimi T, Leonardi R, Ostermann J (2007) Distributed Monoview and Multiview video coding. IEEE Signal Process Mag 24(5):67–76CrossRefGoogle Scholar
  6. 6.
    Hewage CT (2008) Perceptual quality driven 3-D video over networks, Thesis, University of SurreyGoogle Scholar
  7. 7.
    Hewage CTER (2013) Perceptual quality driven 3D video communication. Scholars’ Press, Kingston upon ThamesGoogle Scholar
  8. 8.
    Hewage CT, Worrall S, Dogan S, Kondoz AM (2009) Quality evaluation of colour plus depth map based stereoscopic video. IEEE J Sel Top Signal Process 3(2):304–318 ISSN 1932-4553CrossRefGoogle Scholar
  9. 9.
    Ijsselsteijn WA, Seutiens PJH, Meesters LMJ (2002) State-of-the- art in human factors and quality issues of stereoscopic broadcast television. Technical report D1, IST-2001 34396 ATTEST 2002Google Scholar
  10. 10.
    International Telecommunication Union/ITU Radio communication Sector (2012) Methodology for the subjective assessment of the quality of television pictures. ITU-R BT.500-13, Jan 2012Google Scholar
  11. 11.
    Joint Video Team of ISO/IEC MPEG and ITU-T VCEG, H.264/AVC, Reference Software JM10.1 (online), http://iphome.hhi.de/suehring/tml/download/old_jm/jm10.1.zip
  12. 12.
    Lin S, Costello DJ Jr (2004) Error control coding, 2nd edn. Pearson Prentice Hall, Upper Saddle River, pp 810–811Google Scholar
  13. 13.
    Montalbano G, Slock DTM (2003) Joint common-dedicated pilots based estimation of time-varying channels for W-CDMA receivers. Proc IEEE VTC 2:1253–1257Google Scholar
  14. 14.
    Ouaret M, Dufaux F, Ebrahimi T (2006) Fusion-based multiview distributed video coding. In Proceedings of the 4th ACM international workshop on video surveillance and sensor networks (VSSN ‘06). ACM, New York, 139–144Google Scholar
  15. 15.
    Pedro J, Ducla Soares L, Pereira F (2007) Studying error resilience performance for a feedback channel based transform domain Wyner-Ziv Video Codec. Picture coding symposium, Lisbon – Portugal, November 2007Google Scholar
  16. 16.
    Ryan WE (1998) A turbo code tutorial. Proc. IEEE Globecom 1998, At: www.ece.arizona.edu/~ryan/publications/turbo2c.pdf
  17. 17.
    Yang Z, Wang J, Evans D, Mi N (2016) AutoReplica: automatic data replica manager in distributed caching and data processing systems. 2016 IEEE 35th International Performance Computing and Communications Conference (IPCCC), Las Vegas, NV, pp 1–6Google Scholar
  18. 18.
    Yang Z et al (2017) AutoTiering: automatic data placement manager in multi-tier all-flash datacenter. 2017 IEEE 36th International Performance Computing and Communications Conference (IPCCC), San Diego, CA, USA, pp 1–8Google Scholar
  19. 19.
    Yang Z et al (2017) H-NVMe: a hybrid framework of NVMe-based storage system in cloud computing environment. 2017 IEEE 36th International Performance Computing and Communications Conference (IPCCC), San Diego, CA, USA, pp 1–8Google Scholar
  20. 20.
    Yasakethu SLP, Hewage CTER, Fernando WAC, Kondoz A (2008) Quality analysis for 3D video using 2D video quality models. IEEE Trans Consum Electron 54(4):1969–1197CrossRefGoogle Scholar

Copyright information

© The Author(s) 2018

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

Authors and Affiliations

  1. 1.Department of Electrical, Electronic and Computer ScienceSri Lanka Technological Campus (SLTC)PadukkaSri Lanka
  2. 2.Cardiff School of TechnologyCardiff Metropolitan UniversityCardiffUK

Personalised recommendations