Abstract
We introduce MGP-VAE (Multi-disentangled-features Gaussian Processes Variational AutoEncoder), a variational autoencoder which uses Gaussian processes (GPs) to model the latent space for the unsupervised learning of disentangled representations in video sequences. We improve upon previous work by establishing a framework in which multiple features, static or dynamic, can be disentangled. Specifically, we use fractional Brownian motions (fBM) and Brownian bridges (BB) to enforce an inter-frame correlation structure in each independent channel, and show that varying this structure enables one to capture different factors of variation in the data. We demonstrate the quality of our representations with experiments on three publicly available datasets, and quantify the improvement using a video prediction task. Moreover, we introduce a novel geodesic loss function which takes into account the curvature of the data manifold to improve learning. Our experiments show that the combination of the improved representations with the novel loss function enables MGP-VAE to outperform the baselines in video prediction.
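To make the role of the GP priors concrete, the following sketch shows the standard covariance kernels of fractional Brownian motion (Hurst index H) and a Brownian bridge, and draws one latent sample path per channel via a Cholesky factorization. This is an illustration of the kernels named in the abstract, not the paper's implementation; the frame times, H value, and bridge horizon T are arbitrary choices for the example.

```python
import numpy as np

def fbm_cov(times, H):
    # Covariance of fractional Brownian motion with Hurst index H in (0, 1):
    # K(s, t) = 0.5 * (s^{2H} + t^{2H} - |s - t|^{2H}).
    s, t = times[:, None], times[None, :]
    return 0.5 * (s**(2 * H) + t**(2 * H) - np.abs(s - t)**(2 * H))

def bb_cov(times, T):
    # Covariance of a Brownian bridge pinned to 0 at times 0 and T:
    # K(s, t) = min(s, t) - s * t / T, for s, t in [0, T].
    s, t = times[:, None], times[None, :]
    return np.minimum(s, t) - s * t / T

def sample_path(cov, rng, jitter=1e-8):
    # Draw one zero-mean GP sample path; jitter keeps Cholesky stable.
    L = np.linalg.cholesky(cov + jitter * np.eye(cov.shape[0]))
    return L @ rng.standard_normal(cov.shape[0])

rng = np.random.default_rng(0)
frames = np.linspace(0.1, 1.0, 8)                   # one latent value per video frame
z_fbm = sample_path(fbm_cov(frames, H=0.9), rng)    # persistent (H > 0.5) channel
z_bb = sample_path(bb_cov(frames, T=1.1), rng)      # bridge channel, ends pulled to 0
```

Choosing a different correlation structure per independent latent channel (e.g. a persistent fBM versus a mean-reverting bridge) is what lets each channel specialize to a different factor of variation across frames.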
© 2020 Springer Nature Switzerland AG
Cite this paper
Bhagat, S., Uppal, S., Yin, Z., Lim, N. (2020). Disentangling Multiple Features in Video Sequences Using Gaussian Processes in Variational Autoencoders. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, JM. (eds) Computer Vision – ECCV 2020. ECCV 2020. Lecture Notes in Computer Science(), vol 12368. Springer, Cham. https://doi.org/10.1007/978-3-030-58592-1_7
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-58591-4
Online ISBN: 978-3-030-58592-1