Overview of Research in the field of Video Compression using Deep Neural Networks

Multimedia Tools and Applications

Abstract

Deep Neural Networks (DNNs) have emerged in recent years as a best-of-breed alternative for performing various classification, prediction and identification tasks in images and other fields of study. In the last few years, various research groups have been exploring ways to harness them to improve video coding, with the primary purpose of reducing bit rates while retaining the same video quality. Evolving neural-network-based video coding research efforts are focused on two different directions: (1) improving existing video codecs by performing better predictions that are incorporated within the same codec framework, and (2) holistic, end-to-end image/video compression schemes. While some of the results are promising and the prospects are good, no breakthrough has been reported as of yet. This paper provides an overview of state-of-the-art research work, with examples of a few prominent publications that illustrate and further explain the highlighted topics in the field of using DNNs for video compression. Our conclusion is that the benefits have not yet been fully explored and that additional work is needed to achieve the next generation of neural-network-based codecs.

Notes

  1. A heat map image that reflects the movement magnitude and direction of individual pixels between consecutive video frames.

  2. A basic processing unit of HEVC that is equivalent to a block in previous standards (such as H.264).

Author information

Corresponding author

Correspondence to Raz Birman.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Birman, R., Segal, Y. & Hadar, O. Overview of Research in the field of Video Compression using Deep Neural Networks. Multimed Tools Appl 79, 11699–11722 (2020). https://doi.org/10.1007/s11042-019-08572-3

  • DOI: https://doi.org/10.1007/s11042-019-08572-3
