Propagative Hough Voting for Human Activity Recognition

  • Gang Yu
  • Junsong Yuan
  • Zicheng Liu
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7574)


Hough-transform-based voting has been successfully applied to both object detection and activity detection. However, most current Hough voting methods perform poorly when training data is insufficient. To address this problem, we propose propagative Hough voting for activity analysis. Instead of letting local features vote individually, we perform feature voting using random projection trees (RPT), which leverage low-dimensional manifold structure to match feature points in the high-dimensional feature space. The RPT indexes unlabeled feature points in an unsupervised way. After the trees are constructed, label and spatio-temporal configuration information is propagated from the training samples to the testing data through the RPT. The proposed activity recognition method does not rely on human detection or tracking, and it handles scale and intra-class variations of activity patterns well. Experiments on two benchmark activity datasets show that our method outperforms state-of-the-art techniques not only when training data is sufficient, as in activity recognition, but also when it is limited, as in activity search with a single query example.
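The voting pipeline above is described only at a high level, so the following Python sketch illustrates the general idea under assumptions that are ours, not the paper's. The trees are a simplified variant of Dasgupta and Freund's random projection trees: each node cuts at the median projection onto a random unit direction, whereas the original construction perturbs that split point. Only training descriptors are indexed, each training feature stores its offset to a hypothetical activity center, and votes are weighted by 1/(leaf size × number of trees) and quantized into spatio-temporal cells of an arbitrary `bin_size`. The function names (`build_rp_tree`, `find_leaf`, `propagative_vote`) and the synthetic data are hypothetical.

```python
import math
import random
from collections import defaultdict


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def random_unit_vector(dim, rng):
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(dot(v, v)) or 1.0
    return [x / norm for x in v]


def build_rp_tree(indices, descriptors, rng, leaf_size=4):
    """Unsupervised split: project onto a random direction and cut at the median.

    Labels are never consulted, so unlabeled descriptors could be indexed too.
    Simplification (assumption): no random perturbation of the split point.
    """
    if len(indices) <= leaf_size:
        return {"leaf": list(indices)}
    direction = random_unit_vector(len(descriptors[indices[0]]), rng)
    proj = sorted((dot(descriptors[i], direction), i) for i in indices)
    mid = len(proj) // 2
    # Threshold halfway between the two middle projections.
    threshold = 0.5 * (proj[mid - 1][0] + proj[mid][0])
    return {
        "dir": direction,
        "thr": threshold,
        "left": build_rp_tree([i for _, i in proj[:mid]], descriptors, rng, leaf_size),
        "right": build_rp_tree([i for _, i in proj[mid:]], descriptors, rng, leaf_size),
    }


def find_leaf(tree, x):
    """Route a query descriptor to a leaf and return the indices stored there."""
    while "leaf" not in tree:
        side = "left" if dot(x, tree["dir"]) < tree["thr"] else "right"
        tree = tree[side]
    return tree["leaf"]


def propagative_vote(forest, train, test_feats, bin_size=10.0):
    """Every training point sharing a leaf with a test feature propagates its
    label and its offset to the activity center to that test feature."""
    votes = defaultdict(float)
    for desc, pos in test_feats:
        for tree in forest:
            leaf = find_leaf(tree, desc)
            for i in leaf:
                _, train_pos, label, center = train[i]
                # Candidate center = test position + (training center - training position).
                cand = [p + (c - tp) for p, c, tp in zip(pos, center, train_pos)]
                cell = tuple(round(v / bin_size) for v in cand)
                votes[(label, cell)] += 1.0 / (len(leaf) * len(forest))
    return votes


if __name__ == "__main__":
    rng = random.Random(0)
    train = []  # (descriptor, (x, y, t), label, hypothetical activity center (x, y, t))
    for label, mu, center in [("A", 0.0, (100, 100, 50)), ("B", 5.0, (300, 80, 40))]:
        for _ in range(40):
            desc = [mu + rng.gauss(0.0, 1.0) for _ in range(8)]
            pos = tuple(c + rng.randint(-30, 30) for c in center)
            train.append((desc, pos, label, center))
    descriptors = [d for d, _, _, _ in train]
    forest = [build_rp_tree(list(range(len(train))), descriptors, rng) for _ in range(5)]
    # Synthetic query: class-A features re-observed 20 px to the right, 10 frames later.
    test = [(d, (p[0] + 20, p[1], p[2] + 10)) for d, p, lab, _ in train if lab == "A"]
    votes = propagative_vote(forest, train, test)
    print(max(votes, key=votes.get))  # highest-scoring (label, center cell)
```

In this sketch a training point and a test point that fall into the same leaf are treated as a match, so the tree lookup stands in for an explicit nearest-neighbour search, and the label and center offset travel with every match rather than coming from a single codeword.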


Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Gang Yu (1)
  • Junsong Yuan (1)
  • Zicheng Liu (2)
  1. School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore
  2. Microsoft Research, Redmond, USA
