Deep Neural Network Based Frame Reconstruction for Optimized Video Coding

Ding, Dandan; Liu, Peng; Chen, Yu; Zhu, Zheng; Liu, Zoe; Bankoski, James

doi:10.1007/978-3-319-94361-9_18

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10970)

Included in the following conference series:

International Conference on AI and Mobile Services

Abstract

Video coding has served as a key enabling technology for the explosion in online video sharing and consumption, which spans live video streaming, video conferencing, video surveillance, remote medicine, online education, online gaming, video broadcasting, cloud video services, and many other applications. AV1, the recently released open-source, royalty-free video coding standard designed and developed by the Alliance for Open Media (AOM), achieves a 30%–40% data rate reduction over previous-generation video coding standards, including VP9 and HEVC. This paper outlines paradigms that may provide further coding performance gains over AV1. Image restoration has already proven effective at enhancing coding performance in AV1. In the same vein, this paper describes techniques that optimize frame reconstruction using deep neural networks (DNNs) to further improve coding performance. Initial explorations of the proposed approach have demonstrated promising results.

Author information

Authors and Affiliations

Hangzhou Normal University, Hangzhou, 311121, Zhejiang, China
Dandan Ding, Peng Liu & Yu Chen
Visionular Inc., Hangzhou, 310000, Zhejiang, China
Zheng Zhu
Google Inc., Mountain View, CA, 94043, USA
Zoe Liu & James Bankoski

Corresponding author

Correspondence to Dandan Ding.

Editor information

Editors and Affiliations

University of Stuttgart, Stuttgart, Germany
Marco Aiello
Tsinghua University, Beijing, China
Yujiu Yang
Peking University, Beijing, China
Yuexian Zou
Kingdee International Software Group Co., Ltd., Shenzhen, China
Liang-Jie Zhang

About this paper

Cite this paper

Ding, D., Liu, P., Chen, Y., Zhu, Z., Liu, Z., Bankoski, J. (2018). Deep Neural Network Based Frame Reconstruction for Optimized Video Coding. In: Aiello, M., Yang, Y., Zou, Y., Zhang, LJ. (eds) Artificial Intelligence and Mobile Services – AIMS 2018. Lecture Notes in Computer Science, vol 10970. Springer, Cham. https://doi.org/10.1007/978-3-319-94361-9_18

DOI: https://doi.org/10.1007/978-3-319-94361-9_18
Published: 21 June 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-94360-2
Online ISBN: 978-3-319-94361-9
eBook Packages: Computer Science, Computer Science (R0)
