Predicting Actions from Static Scenes

  • Tuan-Hung Vu
  • Catherine Olsson
  • Ivan Laptev
  • Aude Oliva
  • Josef Sivic
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8693)


Human actions naturally co-occur with scenes. In this work we aim to discover action-scene correlations for a large number of scene categories and to use these correlations for action prediction. Towards this goal, we collect a new SUN Action dataset with manual annotations of typical human actions for 397 scenes. We next discover action-scene associations and demonstrate that scene categories can be well identified from their associated actions. Using the discovered associations, we address a new task of predicting human actions for images of static scenes. We evaluate prediction of 23 and 38 action classes for images of indoor and outdoor scenes, respectively, and show promising results. We also propose a new application of geo-localized action prediction and demonstrate the ability of our method to automatically answer queries such as “Where is a good place for a picnic?” or “Can I cycle along this path?”.
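The abstract does not say how the discovered associations are turned into predictions. As a minimal sketch of the general idea, and only under the assumption that an action can be scored by weighting per-scene action associations with a scene classifier's output, the following toy example marginalizes over scene categories. The function name `predict_actions`, the scene and action labels, and all numbers are hypothetical and are not taken from the paper or the SUN Action dataset.

```python
from typing import Dict


def predict_actions(
    scene_probs: Dict[str, float],
    action_given_scene: Dict[str, Dict[str, float]],
) -> Dict[str, float]:
    """Score each action as sum over scenes of P(scene | image) * P(action | scene).

    Both inputs are hypothetical: scene_probs stands in for the output of a
    scene classifier, action_given_scene for annotated action-scene associations.
    """
    scores: Dict[str, float] = {}
    for scene, p_scene in scene_probs.items():
        # Scenes without any annotated action contribute nothing.
        for action, p_action in action_given_scene.get(scene, {}).items():
            scores[action] = scores.get(action, 0.0) + p_scene * p_action
    return scores


# Toy values, invented for illustration only.
scene_probs = {"park": 0.7, "street": 0.3}
action_given_scene = {
    "park": {"picnic": 0.8, "cycling": 0.5},
    "street": {"picnic": 0.05, "cycling": 0.9},
}

scores = predict_actions(scene_probs, action_given_scene)
# Rank actions from most to least plausible for this image.
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)
```

A query such as “Can I cycle along this path?” would then amount to checking whether the score for the corresponding action exceeds some threshold, which is again an assumption about how such a system might be used rather than a description of the authors' method.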


Keywords: Action prediction, scene recognition, functional properties



Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Tuan-Hung Vu (1)
  • Catherine Olsson (2)
  • Ivan Laptev (1)
  • Aude Oliva (2)
  • Josef Sivic (1)

  1. WILLOW, ENS/INRIA/CNRS UMR 8548, Paris, France
  2. CSAIL, MIT, Cambridge, USA
