Abstract
Deep Neural Networks (DNNs) have emerged in recent years as a best-of-breed alternative for performing various classification, prediction, and identification tasks in images and other fields of study. In the last few years, various research groups have been exploring ways to harness them to improve video coding, with the primary purpose of reducing compressed video bit rates while retaining the same video quality. Evolving neural-network-based video coding research efforts are focused on two different directions: (1) improving existing video codecs by performing better predictions that are incorporated within the same codec framework, and (2) holistic methods of end-to-end image/video compression schemes. Although some of the results are promising and the prospects are good, no breakthrough has been reported yet. This paper provides an overview of state-of-the-art research work, with examples of a few prominent publications that illustrate and further explain the highlighted topics in the field of using DNNs for video compression. Our conclusion is that the benefits have not yet been fully explored and that additional work is expected before the next generation of neural-network-based codecs is accomplished.
Notes
A heat map image that reflects the movement magnitude and direction of individual pixels between consecutive video frames.
A basic processing unit of HEVC that is equivalent to a block in previous standards (such as H.264).
About this article
Cite this article
Birman, R., Segal, Y. & Hadar, O. Overview of Research in the field of Video Compression using Deep Neural Networks. Multimed Tools Appl 79, 11699–11722 (2020). https://doi.org/10.1007/s11042-019-08572-3
DOI: https://doi.org/10.1007/s11042-019-08572-3