
Deep Neural Network Based Frame Reconstruction for Optimized Video Coding

  • Dandan Ding
  • Peng Liu
  • Yu Chen
  • Zheng Zhu
  • Zoe Liu
  • James Bankoski
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10970)

Abstract

Video coding has served as a key enabling technology for the explosion in online video sharing and consumption, including live video streaming, video conferencing, video surveillance, remote medicine, online education, online gaming, video broadcasting, cloud video services, and many other applications. The recently released open-source, royalty-free video coding standard AV1, designed and developed by the Alliance for Open Media (AOM), achieves a 30%–40% data rate reduction over the previous generation of video coding standards, including VP9 and HEVC. This paper outlines paradigms that may provide further coding performance gains over AV1. Image restoration has proven effective at improving coding performance in AV1. In the same vein, this paper describes techniques that use deep neural networks (DNNs) to optimize frame reconstruction and thereby further improve coding performance. Initial explorations of the proposed approach have shown promising results.
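The abstract does not specify the network architecture, so the following is only a minimal sketch of the general idea of DNN-based frame reconstruction, not the authors' method. It assumes a generic residual formulation: a small convolutional network predicts a correction that is added to the decoded (reconstructed) frame, and the result is clipped to the 8-bit range. The function names (`conv3x3`, `restore_frame`) and the weights in `EXAMPLE_LAYERS` are hypothetical placeholders; a real filter would use many channels and trained weights.

```python
from typing import List, Tuple

Frame = List[List[float]]
Layer = Tuple[List[List[float]], float]  # (3x3 kernel, bias), single channel


def conv3x3(img: Frame, kernel: List[List[float]], bias: float = 0.0) -> Frame:
    """Single-channel 3x3 cross-correlation (as in CNN layers) with edge
    replication, so the output has the same size as the input."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = bias
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy = min(max(y + dy, 0), h - 1)  # replicate border rows
                    xx = min(max(x + dx, 0), w - 1)  # replicate border columns
                    acc += kernel[dy + 1][dx + 1] * img[yy][xx]
            out[y][x] = acc
    return out


def relu(img: Frame) -> Frame:
    """Elementwise ReLU nonlinearity."""
    return [[max(v, 0.0) for v in row] for row in img]


def restore_frame(recon: Frame, layers: List[Layer]) -> List[List[int]]:
    """Residual restoration: output = clip(recon + f(recon)), where f is a
    stack of conv layers with ReLU between them (none after the last)."""
    feat = recon
    for i, (kernel, bias) in enumerate(layers):
        feat = conv3x3(feat, kernel, bias)
        if i < len(layers) - 1:
            feat = relu(feat)
    # Add the predicted residual to the decoded pixels and clip to 8 bits.
    return [
        [min(max(round(r + d), 0), 255) for r, d in zip(rrow, drow)]
        for rrow, drow in zip(recon, feat)
    ]


# Hypothetical, untrained weights chosen only so the example runs.
EXAMPLE_LAYERS: List[Layer] = [
    ([[1 / 9] * 3 for _ in range(3)], 0.0),  # averaging feature extractor
    ([[0.0, -0.1, 0.0], [-0.1, 0.4, -0.1], [0.0, -0.1, 0.0]], 0.0),  # zero-sum kernel
]

if __name__ == "__main__":
    decoded = [[50, 50, 150, 150] for _ in range(4)]  # toy decoded frame with an edge
    print(restore_frame(decoded, EXAMPLE_LAYERS))
```

The residual formulation means the network only has to learn the coding error rather than the whole frame. Because the second kernel sums to zero, flat regions receive no correction in this toy setup; with trained weights, such a filter would typically be applied either inside the coding loop or as a post-filter after decoding.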

Keywords

Deep learning; Neural networks; Video coding; AOM/AV1; Frame reconstruction


Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  • Dandan Ding (1, corresponding author)
  • Peng Liu (1)
  • Yu Chen (1)
  • Zheng Zhu (2)
  • Zoe Liu (3)
  • James Bankoski (3)
  1. Hangzhou Normal University, Hangzhou, China
  2. Visionular Inc., Hangzhou, China
  3. Google Inc., Mountain View, USA
