Where Will They Go? Predicting Fine-Grained Adversarial Multi-agent Motion Using Conditional Variational Autoencoders

  • Panna Felsen
  • Patrick Lucey
  • Sujoy Ganguly
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11215)


Simultaneously and accurately forecasting the behavior of many interacting agents is imperative for computer vision applications to be widely deployed (e.g., autonomous vehicles, security, surveillance, sports). In this paper, we present a technique using a conditional variational autoencoder that learns a model which “personalizes” prediction to individual agent behavior within a group representation. Given the volume of data available and its adversarial nature, we focus on the sport of basketball and show that our approach efficiently predicts context-specific agent motions. We find that our model's predictions are three times as accurate as those of previous state-of-the-art approaches (an error of 5.74 ft vs. 17.95 ft).
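The abstract names the core model without giving its details. As an illustration only, the sketch below shows the forward pass and loss of a generic conditional VAE in pure Python. It is not the authors' architecture. The assumptions are all hypothetical: linear layers stand in for deep networks, the target `x` is one agent's flattened future (x, y) positions, and the condition `c` is that agent's past positions concatenated with a one-hot agent identity as a stand-in for "personalization". Training (gradients, optimizer) is omitted.

```python
import math
import random


def linear(weights, bias, x):
    """Dense layer: returns W @ x + b, with W stored as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def init_layer(n_out, n_in, rng, scale=0.1):
    """Small random weights and zero bias for one linear layer."""
    weights = [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def gaussian_kl(mu, logvar):
    """KL( N(mu, diag(exp(logvar))) || N(0, I) ), summed over latent dims."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv) for m, lv in zip(mu, logvar))


class TinyCVAE:
    """Hypothetical linear CVAE: encoder q(z | x, c), decoder p(x | z, c).

    x: flattened future (x, y) positions of one agent (assumed layout).
    c: conditioning context, here past positions plus a one-hot agent id
       (an assumed stand-in for per-agent 'personalization').
    """

    def __init__(self, x_dim, c_dim, z_dim, seed=0):
        self.rng = random.Random(seed)
        self.z_dim = z_dim
        self.enc_mu = init_layer(z_dim, x_dim + c_dim, self.rng)
        self.enc_logvar = init_layer(z_dim, x_dim + c_dim, self.rng)
        self.dec = init_layer(x_dim, z_dim + c_dim, self.rng)

    def encode(self, x, c):
        h = x + c  # concatenate target and condition (list concatenation)
        return linear(*self.enc_mu, h), linear(*self.enc_logvar, h)

    def reparameterize(self, mu, logvar):
        # Reparameterization trick: z = mu + sigma * eps, eps ~ N(0, I)
        return [m + math.exp(0.5 * lv) * self.rng.gauss(0.0, 1.0)
                for m, lv in zip(mu, logvar)]

    def decode(self, z, c):
        return linear(*self.dec, z + c)

    def loss(self, x, c):
        """Negative-ELBO-style loss: squared reconstruction error (a Gaussian
        likelihood up to scale and constants) plus KL to the N(0, I) prior."""
        mu, logvar = self.encode(x, c)
        x_hat = self.decode(self.reparameterize(mu, logvar), c)
        recon = sum((a - b) ** 2 for a, b in zip(x, x_hat))
        return recon + gaussian_kl(mu, logvar), x_hat

    def sample(self, c):
        """Test time: draw z from the prior and decode conditioned on c only."""
        z = [self.rng.gauss(0.0, 1.0) for _ in range(self.z_dim)]
        return self.decode(z, c)


# Usage with made-up numbers (untrained weights, so outputs are arbitrary).
model = TinyCVAE(x_dim=4, c_dim=6, z_dim=2)
past = [0.0, 0.0, 1.0, 0.5]    # two observed (x, y) positions, in feet
identity = [1.0, 0.0]          # one-hot agent id (hypothetical)
future = [2.0, 1.0, 3.0, 1.5]  # two ground-truth future (x, y) positions
neg_elbo, reconstruction = model.loss(future, past + identity)
forecast = model.sample(past + identity)
```

At test time only the prior branch (`sample`) is needed, so repeated calls with the same context can yield different plausible futures. This is what lets a CVAE represent several possible outcomes instead of a single averaged trajectory.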


Keywords: Forecasting · Motion prediction · Multi-agent tracking · Context-aware prediction · Conditional variational autoencoders

Supplementary material

Supplementary material 1 (pdf 47 KB)

Supplementary material 2 (mp4 10701 KB)

Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  1. BAIR, UC Berkeley, Berkeley, USA
  2. STATS, Chicago, USA
