Modeling Complex Temporal Composition of Actionlets for Activity Prediction

  • Kang Li
  • Jie Hu
  • Yun Fu
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7572)


Early prediction of ongoing activities is increasingly valuable in a wide range of time-critical applications. To build an effective representation for prediction, we characterize a human activity as a complex temporal composition of constituent simple actions. In contrast to early recognition of short-duration simple activities, we propose a novel framework for predicting long-duration complex activities by discovering the causal relationships between constituent actions and the predictable characteristics of activities. The major contributions of our work are: (1) a novel activity decomposition method that monitors motion velocity to segment a long activity into a sequence of meaningful action units; (2) the use of a Probabilistic Suffix Tree (PST) to represent both high-order and low-order Markov dependencies between action units; and (3) a Predictive Accumulative Function (PAF) that depicts the predictability of each kind of activity. We evaluate the proposed method in two experimental scenarios: activities with middle-level complexity and activities with high-level complexity. The method achieves promising results and can predict both global activity classes and local action units.
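To make the idea of variable-order Markov dependencies between action units more concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes action units are already available as discrete symbols, and it replaces a full Probabilistic Suffix Tree (which would also prune contexts using probability thresholds) with a plain count table and back-off to the longest previously seen suffix. The class name `SuffixContextModel`, the `max_order` parameter, and the action labels are all hypothetical and chosen only for illustration.

```python
from collections import Counter, defaultdict


class SuffixContextModel:
    """Toy variable-order Markov model over action-unit symbols.

    Simplified stand-in for a Probabilistic Suffix Tree: it stores
    follower counts for every context up to max_order symbols long and
    predicts from the longest context seen in training (no pruning).
    """

    def __init__(self, max_order=3):
        self.max_order = max_order
        # Maps a context tuple (possibly empty) to counts of the next symbol.
        self.counts = defaultdict(Counter)

    def fit(self, sequences):
        for seq in sequences:
            for i, symbol in enumerate(seq):
                # Record the symbol under every context of length 0..max_order.
                for order in range(self.max_order + 1):
                    if i - order < 0:
                        break
                    context = tuple(seq[i - order:i])
                    self.counts[context][symbol] += 1
        return self

    def next_distribution(self, history):
        """Return P(next symbol | longest matching suffix of history)."""
        history = tuple(history)
        for order in range(min(self.max_order, len(history)), -1, -1):
            context = history[len(history) - order:]
            followers = self.counts.get(context)
            if followers:
                total = sum(followers.values())
                return {s: c / total for s, c in followers.items()}
        return {}

    def predict_next(self, history):
        dist = self.next_distribution(history)
        return max(dist, key=dist.get) if dist else None


# Hypothetical action-unit sequences, one list per observed activity.
training = [
    ["reach", "grasp", "lift", "pour"],
    ["reach", "grasp", "lift", "drink"],
    ["reach", "grasp", "lift", "pour"],
    ["walk", "reach", "push"],
]
model = SuffixContextModel(max_order=2).fit(training)

# Uses the full two-symbol context ("grasp", "lift").
print(model.predict_next(["grasp", "lift"]))
# ("jump", "reach") was never seen, so the model backs off to ("reach",).
print(model.predict_next(["jump", "reach"]))
```

The back-off step is what lets one model mix long and short contexts: a long context is used when the training data supports it, and a shorter suffix is used otherwise.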


Keywords (machine-generated, not supplied by the authors): Action Unit; Interest Point; Rand Index; Activity Prediction; Predictable Characteristic



Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Kang Li (1)
  • Jie Hu (2)
  • Yun Fu (1)
  1. Department of ECE and College of CIS, Northeastern University, Boston, USA
  2. Department of CSE, State University of New York, Buffalo, USA
