
Disentangling Multiple Features in Video Sequences Using Gaussian Processes in Variational Autoencoders

  • Conference paper

Computer Vision – ECCV 2020 (ECCV 2020)

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 12368)

Abstract

We introduce MGP-VAE (Multi-disentangled-features Gaussian Processes Variational AutoEncoder), a variational autoencoder which uses Gaussian processes (GP) to model the latent space for the unsupervised learning of disentangled representations in video sequences. We improve upon previous work by establishing a framework in which multiple features, static or dynamic, can be disentangled. Specifically, we use fractional Brownian motions (fBM) and Brownian bridges (BB) to enforce an inter-frame correlation structure in each independent channel, and show that varying this structure enables one to capture different factors of variation in the data. We demonstrate the quality of our representations with experiments on three publicly available datasets, and also quantify the improvement using a video prediction task. Moreover, we introduce a novel geodesic loss function which takes into account the curvature of the data manifold to improve learning. Our experiments show that the combination of the improved representations with the novel loss function enables MGP-VAE to outperform the baselines in video prediction.
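The abstract's central idea can be made concrete with the standard covariance kernels of the two processes it names. The following is a minimal NumPy sketch, not the paper's implementation: the helper names `fbm_cov` and `bridge_cov`, the 8-frame time grid, and the Hurst value 0.9 are illustrative assumptions. It shows how the two priors induce different inter-frame correlation structures in a single latent channel.

```python
import numpy as np

def fbm_cov(times, hurst):
    """Fractional Brownian motion kernel: K(s,t) = 0.5*(s^2H + t^2H - |s-t|^2H)."""
    s = times[:, None]
    t = times[None, :]
    return 0.5 * (s ** (2 * hurst) + t ** (2 * hurst)
                  - np.abs(s - t) ** (2 * hurst))

def bridge_cov(times, total):
    """Brownian bridge kernel on [0, total]: K(s,t) = min(s,t) - s*t/total."""
    s = times[:, None]
    t = times[None, :]
    return np.minimum(s, t) - s * t / total

frames = np.arange(1, 9, dtype=float)   # 8 video frames, t = 1..8
K_fbm = fbm_cov(frames, hurst=0.9)      # persistent paths: suits dynamic features
K_bb = bridge_cov(frames, total=9.0)    # pinned at both ends: suits static features

# Sample one latent channel over time under each prior.
rng = np.random.default_rng(0)
z_dynamic = rng.multivariate_normal(np.zeros(len(frames)), K_fbm)
z_static = rng.multivariate_normal(np.zeros(len(frames)), K_bb)
```

Varying the Hurst parameter interpolates between rough (H < 0.5) and smooth, persistent (H > 0.5) latent trajectories, while the bridge's pinned endpoints keep its channel's variance low across the sequence; in a GP-prior VAE such matrices would play the role of the prior covariance in the KL term.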


Notes

  1. http://www.cs.toronto.edu/~nitish/unsupervised_video
  2. https://github.com/deepmind/dsprites-dataset
  3. https://github.com/rubenvillegas/iclr2017mcnet
  4. https://github.com/ap229997/DRNET
  5. https://github.com/jthsieh/DDPAE-video-prediction
  6. https://github.com/SUTDBrainLab/MGP-VAE



Author information

Corresponding author: Nengli Lim

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 17122 KB)


Copyright information

© 2020 Springer Nature Switzerland AG

About this paper


Cite this paper

Bhagat, S., Uppal, S., Yin, Z., Lim, N. (2020). Disentangling Multiple Features in Video Sequences Using Gaussian Processes in Variational Autoencoders. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, JM. (eds) Computer Vision – ECCV 2020. ECCV 2020. Lecture Notes in Computer Science(), vol 12368. Springer, Cham. https://doi.org/10.1007/978-3-030-58592-1_7


  • DOI: https://doi.org/10.1007/978-3-030-58592-1_7

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-58591-4

  • Online ISBN: 978-3-030-58592-1

  • eBook Packages: Computer Science, Computer Science (R0)
