
Can learned frame prediction compete with block motion compensation for video coding?

  • Original Paper
  • Published in Signal, Image and Video Processing

Abstract

Given recent advances in learned video prediction, we investigate whether a simple video codec that uses a pretrained deep model to predict the next frame from previously encoded/decoded frames, without sending any motion side information, can compete with standard video codecs based on block motion compensation. Frame differences given the learned frame predictions are encoded by a standard still-image (intra) codec. Experimental results show that the rate-distortion performance of the simple codec with symmetric complexity is on average better than that of the x264 codec on 10 MPEG test videos, but does not yet reach the level of the x265 codec. This result demonstrates the power of learned frame prediction (LFP), since, unlike motion compensation, LFP does not use any information from the current picture. The implications of training with \(\ell ^1\), \(\ell ^2\), or combined \(\ell ^2\) and adversarial loss on prediction performance and compression efficiency are analyzed.
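The coding loop described in the abstract can be sketched as follows. This is a hypothetical illustration, not the paper's implementation: `predict_next_frame` stands in for the pretrained deep predictor (here it simply repeats the last decoded frame), and a uniform quantizer stands in for the still-image (intra) codec. The key structural point it shows is that the encoder and decoder run the same predictor on previously *decoded* frames, so they stay in sync without any motion side information.

```python
import numpy as np

def predict_next_frame(decoded_frames):
    # Stand-in for the learned predictor (a deep network in the paper):
    # here we simply repeat the last decoded frame.
    return decoded_frames[-1].copy()

def intra_code(residual, step=8.0):
    # Stub for the intra codec: uniform quantization of the residual.
    return np.round(residual / step).astype(np.int32)

def intra_decode(symbols, step=8.0):
    return symbols.astype(np.float32) * step

def encode_sequence(frames, step=8.0):
    # First frame is coded as intra (losslessly here, for simplicity).
    decoded = [frames[0].astype(np.float32)]
    bitstream = []
    for frame in frames[1:]:
        pred = predict_next_frame(decoded)
        symbols = intra_code(frame.astype(np.float32) - pred, step)
        bitstream.append(symbols)
        # Closed loop: the encoder reconstructs exactly what the decoder
        # will, and feeds that (not the original frame) to the predictor.
        decoded.append(pred + intra_decode(symbols, step))
    return decoded[0], bitstream

def decode_sequence(first_frame, bitstream, step=8.0):
    decoded = [first_frame]
    for symbols in bitstream:
        pred = predict_next_frame(decoded)
        decoded.append(pred + intra_decode(symbols, step))
    return decoded
```

Because prediction depends only on already-decoded frames, the bitstream carries residual symbols alone; this is what distinguishes the scheme from block motion compensation, which must transmit motion vectors derived from the current picture.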



Author information


Corresponding author

Correspondence to A. Murat Tekalp.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

A. M. Tekalp acknowledges support from the TUBITAK project 217E033 and Turkish Academy of Sciences (TUBA).


About this article


Cite this article

Sulun, S., Tekalp, A.M.: Can learned frame prediction compete with block motion compensation for video coding? SIViP 15, 401–410 (2021). https://doi.org/10.1007/s11760-020-01751-y

