METRON, Volume 77, Issue 2, pp 119–135

Unsupervised separation of dynamics from pixels

  • Silvia Chiappa
  • Ulrich Paquet
Article

Abstract

We present an approach to learning the dynamics of multiple objects from image sequences in an unsupervised way. We introduce a probabilistic model that first generates noisy positions for each object through a separate linear state-space model, and then renders the positions of all objects in the same image through a highly non-linear process. This linear representation of the dynamics enables us to propose an inference method that uses exact and efficient inference tools and that can be deployed to query the model in different ways without retraining.
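The generative structure described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the constant-velocity dynamics, the noise levels, the Gaussian-blob `render` function (a stand-in for the learned non-linear renderer), and all function names are assumptions made for illustration. A standard Kalman filter stands in for the "exact and efficient inference tools", and here it is run directly on the simulated noisy positions rather than on quantities inferred from pixels.

```python
import numpy as np

def simulate_lgssm(A, C, Q, R, x0, T, rng):
    """Sample T noisy positions from x_t = A x_{t-1} + w_t, y_t = C x_t + v_t."""
    x = x0
    ys = []
    for _ in range(T):
        x = A @ x + rng.multivariate_normal(np.zeros(A.shape[0]), Q)
        ys.append(C @ x + rng.multivariate_normal(np.zeros(C.shape[0]), R))
    return np.array(ys)

def render(positions, size=32, width=1.5):
    """Draw all objects into one image as Gaussian blobs.

    Hypothetical placeholder for the highly non-linear rendering process.
    """
    rows, cols = np.mgrid[0:size, 0:size]
    img = np.zeros((size, size))
    for r, c in positions:
        img += np.exp(-((rows - r) ** 2 + (cols - c) ** 2) / (2 * width ** 2))
    return np.clip(img, 0.0, 1.0)

def kalman_filter(ys, A, C, Q, R, m0, P0):
    """Exact filtered state means for a linear Gaussian state-space model."""
    m, P = m0, P0
    means = []
    for y in ys:
        # Predict step
        m_pred = A @ m
        P_pred = A @ P @ A.T + Q
        # Update step
        S = C @ P_pred @ C.T + R
        K = P_pred @ C.T @ np.linalg.inv(S)
        m = m_pred + K @ (y - C @ m_pred)
        P = (np.eye(len(m)) - K @ C) @ P_pred
        means.append(m)
    return np.array(means)

# Assumed constant-velocity model: state = [row, col, v_row, v_col].
A = np.array([[1.0, 0.0, 1.0, 0.0],
              [0.0, 1.0, 0.0, 1.0],
              [0.0, 0.0, 1.0, 0.0],
              [0.0, 0.0, 0.0, 1.0]])
C = np.array([[1.0, 0.0, 0.0, 0.0],
              [0.0, 1.0, 0.0, 0.0]])
Q = 0.01 * np.eye(4)   # assumed process noise
R = 0.25 * np.eye(2)   # assumed position noise

rng = np.random.default_rng(0)
T = 20
starts = [np.array([5.0, 5.0, 0.5, 0.8]), np.array([25.0, 10.0, -0.4, 0.3])]

# One separate linear state-space model per object.
tracks = [simulate_lgssm(A, C, Q, R, x0, T, rng) for x0 in starts]

# All objects are rendered into the same image at each time step.
video = np.stack([render([trk[t] for trk in tracks]) for t in range(T)])

# Exact inference over the first object's state from its noisy positions.
filtered = kalman_filter(tracks[0], A, C, Q, R, m0=starts[0], P0=np.eye(4))
```

Because each object's dynamics are linear and Gaussian, the filtering step needs no training and could be swapped for smoothing or prediction, which is one plausible reading of how such a model can be queried in different ways without retraining.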

Keywords

  • Variational auto-encoders
  • Linear Gaussian state space models
  • Deep neural networks

Copyright information

© Sapienza Università di Roma 2019

Authors and Affiliations

  1. DeepMind, London, UK
