Content Adaptive and Error Propagation Aware Deep Video Compression

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 12347)


Recently, learning-based video compression methods have attracted increasing attention. However, previous works suffer from error propagation caused by the accumulation of reconstruction error in inter-predictive coding. Moreover, previous learning-based video codecs do not adapt to different video content. To address these two problems, we propose a content-adaptive and error-propagation-aware video compression system. Specifically, our method employs a joint training strategy that considers the compression performance of multiple consecutive frames instead of a single frame. Based on the learned long-term temporal information, our approach effectively alleviates error propagation in reconstructed frames. More importantly, instead of using the hand-crafted coding modes of traditional compression systems, we design an online encoder updating scheme. The proposed approach updates the encoder parameters according to the rate-distortion criterion but keeps the decoder unchanged at the inference stage. Therefore, the encoder adapts to different video content and achieves better compression performance by reducing the domain gap between the training and testing datasets. Our method is simple yet effective and outperforms state-of-the-art learning-based video codecs on benchmark datasets without increasing the model size or decreasing the decoding speed.
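The two ideas in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: it assumes scalar "frames", a linear encoder with hypothetical parameters `(w, b)`, a frozen decoder reduced to a single gain `dec_gain`, the squared latent as a stand-in for the bit rate, and a finite-difference gradient in place of backpropagation. The rate-distortion cost is accumulated over several consecutive frames, so errors carried through the reference frame are penalised, and only the encoder parameters are updated for the given content.

```python
def rd_loss(frames, enc, dec_gain, lam):
    """Mean rate-distortion cost over consecutive frames of a toy scalar video."""
    w, b = enc
    ref = 0.0    # previous reconstruction, used as the inter-prediction reference
    total = 0.0
    for x in frames:
        residual = x - ref
        y = w * residual + b          # encoder (adapted online)
        recon = ref + dec_gain * y    # decoder (kept frozen)
        # Distortion plus lam times a rate proxy (assumption: y**2 stands in for bits).
        total += (x - recon) ** 2 + lam * y * y
        ref = recon                   # reconstruction error propagates to later frames
    return total / len(frames)


def adapt_encoder(frames, enc, dec_gain, lam, steps=100, lr=0.5, eps=1e-6):
    """Update only the encoder parameters for this content; the decoder is untouched."""
    enc = list(enc)
    best = rd_loss(frames, enc, dec_gain, lam)
    for _ in range(steps):
        # Central finite differences approximate the gradient of the multi-frame cost.
        grad = []
        for i in range(len(enc)):
            up = list(enc)
            up[i] += eps
            down = list(enc)
            down[i] -= eps
            diff = rd_loss(frames, up, dec_gain, lam) - rd_loss(frames, down, dec_gain, lam)
            grad.append(diff / (2 * eps))
        trial = [p - lr * g for p, g in zip(enc, grad)]
        cost = rd_loss(frames, trial, dec_gain, lam)
        if cost < best:
            enc, best = trial, cost   # accept the step only if the cost decreases
        else:
            lr *= 0.5                 # otherwise shrink the step size
    return enc, best


if __name__ == "__main__":
    frames = [0.5, 0.6, 0.7, 0.8]     # hypothetical test content
    start = (1.0, 0.0)                # hypothetical "pre-trained" encoder parameters
    print("before:", rd_loss(frames, start, 0.8, 0.01))
    print("after: ", adapt_encoder(frames, start, 0.8, 0.01)[1])
```

Because `dec_gain` never changes, a receiver holding the original decoder could still decode the adapted bitstream, which is the property that lets this kind of update run at inference time without affecting the decoder.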



This work was supported in part by the National Natural Science Foundation of China (61771306), the Natural Science Foundation of Shanghai (18ZR1418100), the 111 plan (B07022), and the Shanghai Key Laboratory of Digital Media Processing and Transmissions (STCSM 18DZ2270700). Dong Xu was partially supported by the Australian Research Council (ARC) Future Fellowship under Grant FT180100116. Wanli Ouyang was supported by SenseTime, the Australian Research Council Grant DP200103223, and the Australian Medical Research Future Fund MRFAI000085.




Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. Beijing Institute of Technology, Beijing, China
  2. School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University, Shanghai, China
  3. School of Electrical and Information Engineering, The University of Sydney, Sydney, Australia
