Abstract
We introduce MGP-VAE (Multi-disentangled-features Gaussian Processes Variational AutoEncoder), a variational autoencoder which uses Gaussian processes (GPs) to model the latent space for the unsupervised learning of disentangled representations in video sequences. We improve upon previous work by establishing a framework in which multiple features, static or dynamic, can be disentangled. Specifically, we use fractional Brownian motions (fBM) and Brownian bridges (BB) to enforce an inter-frame correlation structure in each independent channel, and show that varying this structure enables one to capture different factors of variation in the data. We demonstrate the quality of our representations with experiments on three publicly available datasets, and quantify the improvement using a video prediction task. Moreover, we introduce a novel geodesic loss function which takes into account the curvature of the data manifold to improve learning. Our experiments show that the combination of the improved representations with the novel loss function enables MGP-VAE to outperform the baselines in video prediction.
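To make the role of the GP priors concrete, the following sketch shows the standard covariance kernels of fractional Brownian motion (Hurst index H) and a Brownian bridge, and draws one latent sample path per channel via a Cholesky factorization. This is an illustration of the kernels named in the abstract, not the paper's implementation; the frame times, H value, and bridge horizon T are arbitrary choices for the example.

```python
import numpy as np

def fbm_cov(times, H):
    # Covariance of fractional Brownian motion with Hurst index H in (0, 1):
    # K(s, t) = 0.5 * (s^{2H} + t^{2H} - |s - t|^{2H}).
    s, t = times[:, None], times[None, :]
    return 0.5 * (s**(2 * H) + t**(2 * H) - np.abs(s - t)**(2 * H))

def bb_cov(times, T):
    # Covariance of a Brownian bridge pinned to 0 at times 0 and T:
    # K(s, t) = min(s, t) - s * t / T, for s, t in [0, T].
    s, t = times[:, None], times[None, :]
    return np.minimum(s, t) - s * t / T

def sample_path(cov, rng, jitter=1e-8):
    # Draw one zero-mean GP sample path; jitter keeps Cholesky stable.
    L = np.linalg.cholesky(cov + jitter * np.eye(cov.shape[0]))
    return L @ rng.standard_normal(cov.shape[0])

rng = np.random.default_rng(0)
frames = np.linspace(0.1, 1.0, 8)                   # one latent value per video frame
z_fbm = sample_path(fbm_cov(frames, H=0.9), rng)    # persistent (H > 0.5) channel
z_bb = sample_path(bb_cov(frames, T=1.1), rng)      # bridge channel, ends pulled to 0
```

Choosing a different correlation structure per independent latent channel (e.g. a persistent fBM versus a mean-reverting bridge) is what lets each channel specialize to a different factor of variation across frames.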
© 2020 Springer Nature Switzerland AG
Cite this paper
Bhagat, S., Uppal, S., Yin, Z., Lim, N. (2020). Disentangling Multiple Features in Video Sequences Using Gaussian Processes in Variational Autoencoders. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, JM. (eds) Computer Vision – ECCV 2020. ECCV 2020. Lecture Notes in Computer Science(), vol 12368. Springer, Cham. https://doi.org/10.1007/978-3-030-58592-1_7
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-58591-4
Online ISBN: 978-3-030-58592-1