Abstract
Recently, learned video compression has drawn lots of attention and show a rapid development trend with promising results. However, the previous works still suffer from some critical issues and have a performance gap with traditional compression standards in terms of widely used PSNR metric. In this paper, we propose several techniques to effectively improve the performance. First, to address the problem of accumulative error, we introduce a conditional-I-frame as the first frame in the GoP, which stabilizes the reconstructed quality and saves the bit-rate. Second, to efficiently improve the accuracy of inter prediction without increasing the complexity of decoder, we propose a pixel-to-feature motion prediction method at encoder side that helps us to obtain high-quality motion information. Third, we propose a probability-based entropy skipping method, which not only brings performance gain, but also greatly reduces the runtime of entropy coding. With these powerful techniques, this paper proposes AlphaVC, a high-performance and efficient learned video compression scheme. To the best of our knowledge, AlphaVC is the first E2E AI codec that exceeds the latest compression standard VVC on all common test datasets for both PSNR (−28.2% BD-rate saving) and MSSSIM (−52.2% BD-rate saving), and has very fast encoding (0.001x VVC) and decoding (1.69x VVC) speeds.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Similar content being viewed by others
References
Agustsson, E., Minnen, D., Johnston, N., Balle, J., Hwang, S.J., Toderici, G.: Scale-space flow for end-to-end optimized video compression. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 8503–8512 (2020)
Ahmed, N., Natarajan, T., Rao, K.R.: Discrete cosine transform. IEEE Trans. Comput. 100(1), 90–93 (1974)
Ballé, J., Laparra, V., Simoncelli, E.P.: End-to-end optimization of nonlinear transform codes for perceptual quality. In: 2016 Picture Coding Symposium (PCS), pp. 1–5. IEEE (2016)
Ballé, J., Minnen, D., Singh, S., Hwang, S.J., Johnston, N.: Variational image compression with a scale hyperprior. arXiv preprint arXiv:1802.01436 (2018)
Bellard, F.: BPG image format (2014). www.bellard.org/bpg/. Accessed 05 Aug 2016
Bossen, F.: Common test conditions and software reference configurations, document jctvc-l1100. JCT-VC, San Jose, CA (2012)
Bross, B., Chen, J., Ohm, J.R., Sullivan, G.J., Wang, Y.K.: Developments in international video coding standardization after AVC, with an overview of versatile video coding (VVC). Proc. IEEE 109(9), 1463–1493 (2021)
Cheng, Z., Sun, H., Takeuchi, M., Katto, J.: Learned image compression with discretized gaussian mixture likelihoods and attention modules. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 7939–7948 (2020)
Christopoulos, C., Skodras, A., Ebrahimi, T.: The JPEG2000 still image coding system: an overview. IEEE Trans. Consum. Electron. 46(4), 1103–1127 (2000)
Cisco: Cisco annual internet report (2018–2023) white paper. www.cisco.com/c/en/us/solutions/collateral/executive-perspectives/annual-internet-report/white-paper-c11-741490.html (2020)
Cui, Z., Wang, J., Gao, S., Guo, T., Feng, Y., Bai, B.: Asymmetric gained deep image compression with continuous rate adaptation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 10532–10541 (2021)
Dai, J., et al.: Deformable convolutional networks. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 764–773 (2017)
Duda, J.: Asymmetric numeral systems. arXiv preprint arXiv:0902.0271 (2009)
Feng, R., Guo, Z., Zhang, Z., Chen, Z.: Versatile learned video compression. arXiv preprint arXiv:2111.03386 (2021)
Guo, T., Wang, J., Cui, Z., Feng, Y., Ge, Y., Bai, B.: Variable rate image compression with content adaptive optimization. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, pp. 122–123 (2020)
Guo, Z., Feng, R., Zhang, Z., Jin, X., Chen, Z.: Learning cross-scale prediction for efficient neural video compression. arXiv preprint arXiv:2112.13309 (2021)
Habibian, A., Rozendaal, T.v., Tomczak, J.M., Cohen, T.S.: Video compression with rate-distortion autoencoders. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 7033–7042 (2019)
Howard, P.G., Vitter, J.S.: Arithmetic coding for data compression. Proc. IEEE 82(6), 857–865 (1994)
Hu, Z., Lu, G., Xu, D.: FVC: a new framework towards deep video compression in feature space. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 1502–1511 (2021)
Kingma, D.P., Welling, M.: Auto-encoding variational Bayes. arXiv preprint arXiv:1312.6114 (2013)
Li, J., Li, B., Lu, Y.: Deep contextual video compression. In: Advances in Neural Information Processing Systems, vol. 34 (2021)
Liu, J., et al.: Conditional entropy coding for efficient video compression. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12362, pp. 453–468. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58520-4_27
Lu, G., et al.: Content adaptive and error propagation aware deep video compression. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12347, pp. 456–472. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58536-5_27
Lu, G., Zhang, X., Ouyang, W., Chen, L., Gao, Z., Xu, D.: An end-to-end learning framework for video compression. IEEE Trans. Pattern Anal. Mach. Intell. 43(10), 3292–3308 (2020)
Mercat, A., Viitanen, M., Vanne, J.: UVG dataset: 50/120fps 4k sequences for video codec analysis and development. In: Proceedings of the 11th ACM Multimedia Systems Conference, pp. 297–302 (2020)
Minnen, D., Ballé, J., Toderici, G.D.: Joint autoregressive and hierarchical priors for learned image compression. In: Advances in Neural Information Processing Systems, vol. 31 (2018)
Pourreza, R., Cohen, T.: Extending neural p-frame codecs for b-frame coding. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 6680–6689 (2021)
Sheng, X., Li, J., Li, B., Li, L., Liu, D., Lu, Y.: Temporal context mining for learned video compression. arXiv preprint arXiv:2111.13850 (2021)
Sullivan, G.J., Ohm, J.R., Han, W.J., Wiegand, T.: Overview of the high efficiency video coding (HEVC) standard. IEEE Trans. Circuits Syst. Video Technol. 22(12), 1649–1668 (2012)
Sun, D., Yang, X., Liu, M.Y., Kautz, J.: PWC-Net: CNNs for optical flow using pyramid, warping, and cost volume. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 8934–8943 (2018)
Teed, Z., Deng, J.: RAFT: recurrent all-pairs field transforms for optical flow. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12347, pp. 402–419. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58536-5_24
Wallace, G.K.: The JPEG still picture compression standard. IEEE Trans. Consum. Electron. 38(1), xviii-xxxiv (1992)
Wang, H., et al.: MCL-JCV: a JND-based H. 264/AVC video quality assessment dataset. In: 2016 IEEE International Conference on Image Processing (ICIP), pp. 1509–1513. IEEE (2016)
Wang, Z., Simoncelli, E.P., Bovik, A.C.: Multiscale structural similarity for image quality assessment. In: The Thrity-Seventh Asilomar Conference on Signals, Systems & Computers, 2003, vol. 2, pp. 1398–1402. IEEE (2003)
Wiegand, T., Sullivan, G.J., Bjontegaard, G., Luthra, A.: Overview of the H. 264/AVC video coding standard. IEEE Trans. Circuits Syst. Video Technol. 13(7), 560–576 (2003)
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
1 Electronic supplementary material
Below is the link to the electronic supplementary material.
Rights and permissions
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Shi, Y., Ge, Y., Wang, J., Mao, J. (2022). AlphaVC: High-Performance and Efficient Learned Video Compression. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds) Computer Vision – ECCV 2022. ECCV 2022. Lecture Notes in Computer Science, vol 13679. Springer, Cham. https://doi.org/10.1007/978-3-031-19800-7_36
Download citation
DOI: https://doi.org/10.1007/978-3-031-19800-7_36
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-19799-4
Online ISBN: 978-3-031-19800-7
eBook Packages: Computer ScienceComputer Science (R0)