Conditional Entropy Coding for Efficient Video Compression

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 12362)


We propose a very simple and efficient video compression framework that focuses solely on modeling the conditional entropy between frames. Unlike prior learning-based approaches, we reduce complexity by not performing any explicit transformation between frames, and we assume each frame is encoded with an independent state-of-the-art deep image compressor. We first show that a simple architecture modeling the entropy between the image latent codes is competitive with other neural video compression methods and with standard video codecs, while being much faster and easier to implement. We then propose a novel internal learning extension on top of this architecture that yields an additional ~10% bitrate saving without trading off decoding speed. Importantly, we show that our approach outperforms H.265 and other deep learning baselines in MS-SSIM on higher-bitrate UVG video, and outperforms all video codecs at lower frame rates, while being thousands of times faster in decoding than deep models that use an autoregressive entropy model.
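The central idea, that a frame's latent code costs fewer bits when its probability model is conditioned on the previous frame's latent, can be illustrated with a toy bitrate estimate. This is a minimal sketch under stated assumptions, not the authors' model: it assumes integer (quantized) latents, a discretized Gaussian entropy model, synthetic latents, and a hypothetical predictor that simply uses the previous frame's latent as the mean. A learned conditional entropy model would instead predict the mean and scale for each element.

```python
import math
import random
from typing import Sequence


def gaussian_cdf(x: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def discretized_gaussian_bits(y: int, mu: float, sigma: float) -> float:
    """Bits to code integer y under a Gaussian integrated over [y - 0.5, y + 0.5]."""
    p = gaussian_cdf((y + 0.5 - mu) / sigma) - gaussian_cdf((y - 0.5 - mu) / sigma)
    return -math.log2(max(p, 1e-12))  # clamp so a tiny probability never gives log(0)


def total_bits(latent: Sequence[int], means: Sequence[float], sigma: float) -> float:
    """Ideal code length of a whole latent under per-element means and a shared scale."""
    return sum(discretized_gaussian_bits(y, m, sigma) for y, m in zip(latent, means))


def fit_sigma(residuals: Sequence[float]) -> float:
    """Maximum-likelihood scale for zero-mean residuals, floored for stability."""
    var = sum(r * r for r in residuals) / len(residuals)
    return max(math.sqrt(var), 0.1)


# Hypothetical data: latents of two consecutive frames that differ by small noise.
random.seed(0)
prev_latent = [round(random.gauss(0, 8)) for _ in range(1000)]
cur_latent = [y + round(random.gauss(0, 1)) for y in prev_latent]

# (a) Independent coding: zero-mean model whose scale is fitted to the latent itself.
sigma_indep = fit_sigma(cur_latent)
bits_indep = total_bits(cur_latent, [0.0] * len(cur_latent), sigma_indep)

# (b) Conditional coding: the mean comes from the previous frame's latent,
# and the scale is fitted to the inter-frame residual.
sigma_cond = fit_sigma([c - p for c, p in zip(cur_latent, prev_latent)])
bits_cond = total_bits(cur_latent, [float(p) for p in prev_latent], sigma_cond)

print(f"independent: {bits_indep:.0f} bits, conditional: {bits_cond:.0f} bits")
```

On this synthetic data the conditional total is well under half of the independent total, because the residual between consecutive latents has a much smaller spread than the latent itself. Note that no motion compensation or other explicit transformation between frames is applied; only the probability model changes.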

Supplementary material

Supplementary material 1 (MP4, 77,818 KB)
Supplementary material 2 (PDF, 5,932 KB)

Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. Uber ATG, St. Pittsburgh, USA
  2. University of Toronto, Toronto, Canada
  3. MIT, Cambridge, USA
