Human Activities as Stochastic Kronecker Graphs

  • Sinisa Todorovic
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7573)


A human activity can be viewed as a space-time repetition of activity primitives. Both the instances of the primitives and their repetition are stochastic. They can be modeled by a generative model-graph, whose nodes correspond to the primitives and whose adjacency matrix encodes their affinities for probabilistic grouping into observable video features. When a video of the activity is represented by a graph capturing the space-time layout of video features, such a video graph can be viewed as probabilistically sampled from the activity’s model-graph. This sampling is formulated as a successive Kronecker multiplication of the model’s affinity matrix. The resulting Kronecker-power matrix is taken as a noisy permutation of the adjacency matrix of the video graph. The paper presents: 1) the model-graph; 2) memory- and time-efficient, weakly supervised learning of activity primitives and their affinities; and 3) inference aimed at finding the best expected correspondences between the primitives and observed video features. Our results demonstrate good scalability on UCF50, and superior performance to that of the state of the art on individual, structured, and collective activities of the UCF YouTube, Olympic, and Collective datasets.
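The successive Kronecker multiplication mentioned in the abstract can be illustrated with a minimal sketch. It follows the generic stochastic Kronecker graph construction, not the paper's actual learning or inference procedure: the 2x2 affinity matrix `theta`, the power `k = 3`, and the helper names `kronecker_power` and `sample_graph` are all assumptions made for illustration, and the noisy permutation step is not modeled.

```python
import numpy as np


def kronecker_power(theta, k):
    """Return the k-th Kronecker power of a square affinity matrix."""
    result = np.array(theta, dtype=float)
    for _ in range(k - 1):
        result = np.kron(result, theta)
    return result


def sample_graph(theta, k, rng):
    """Sample a binary adjacency matrix, one independent Bernoulli draw per edge."""
    probs = kronecker_power(theta, k)
    return (rng.random(probs.shape) < probs).astype(int)


# Hypothetical affinities between two activity primitives (values in [0, 1]).
theta = np.array([[0.9, 0.5],
                  [0.5, 0.2]])

rng = np.random.default_rng(0)
probs = kronecker_power(theta, 3)        # 8 x 8 matrix of edge probabilities
adjacency = sample_graph(theta, 3, rng)  # one sampled 8 x 8 graph
```

Each Kronecker multiplication grows the matrix by a factor equal to the number of primitives in each dimension, so a small affinity matrix can account for a much larger video graph. This is the property that makes a compact model-graph plausible for long videos.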


Kronecker Product; Equal Error Rate; Training Video; Activity Primitive; Video Feature
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.



Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Sinisa Todorovic
    • Oregon State University, Corvallis, USA
