Multi-feature hierarchical topic models for human behavior recognition

  • Research Paper
  • Science China Information Sciences

Abstract

Human behavior recognition is an important task in image processing and surveillance systems. A main challenge is how to model behaviors effectively in unconstrained videos, which exhibit tremendous variation in camera motion, background clutter, object appearance, and so on. In this paper, we propose two novel Multi-Feature Hierarchical Latent Dirichlet Allocation models for human behavior recognition by extending bag-of-words topic models such as the Latent Dirichlet Allocation model and the Multi-Modal Latent Dirichlet Allocation model. The two proposed models have three hierarchies: low-level visual features, feature topics, and behavior topics. They can effectively fuse two different types of features (motion and static visual features), avoid detecting or tracking moving objects, and improve recognition performance even when the extracted features contain a great amount of noise. We adopt the variational EM algorithm to learn the parameters of these models. Experiments on the YouTube dataset demonstrate the effectiveness of the proposed models.

Author information

Corresponding author

Correspondence to HePing Li.

About this article

Cite this article

Li, H., Zhang, F. & Zhang, S. Multi-feature hierarchical topic models for human behavior recognition. Sci. China Inf. Sci. 57, 1–15 (2014). https://doi.org/10.1007/s11432-013-4794-9

  • DOI: https://doi.org/10.1007/s11432-013-4794-9
