Mixture of Heterogeneous Attribute Analyzers for Human Action Detection

  • Yong Pei
  • Bingbing Ni
  • Indriyati Atmosukarto
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8925)

Abstract

We propose a human action detection framework called “mixture of heterogeneous attribute analyzers”. This framework integrates heterogeneous attributes learned from static and dynamic, local and global video features to boost action detection performance. To this end, we first detect and track multiple people using an SVM-HOG detector followed by tracklet generation. Multiple short human tracklets are then linked into long trajectories by spatio-temporal matching. Human key poses and local dense motion trajectories are then extracted within the tracked human bounding-box sequences. Second, we propose a mining method that learns discriminative attributes from three feature modalities: human bounding-box trajectories, key poses, and local dense motion trajectories. Finally, the learned discriminative attributes are integrated into a latent structural max-margin learning framework that also exploits the spatio-temporal relationships between the heterogeneous feature attributes. Experiments on the ChaLearn 2014 human action dataset demonstrate the superior detection performance of the proposed framework.
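The first stage of the pipeline (person detection and tracklet linking) can be illustrated with a minimal Python sketch. This is not the authors' implementation: it assumes OpenCV's default HOG people detector as a stand-in for the SVM-HOG detector described above, and it replaces the paper's spatio-temporal matching with a simple greedy IoU link between consecutive frames.

```python
# Hypothetical sketch (not the paper's code): HOG+SVM person detection and
# greedy tracklet linking, illustrating the first stage of the framework.
import cv2

hog = cv2.HOGDescriptor()
hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())

def detect_people(frame):
    """Return person bounding boxes (x, y, w, h) in one video frame."""
    boxes, _ = hog.detectMultiScale(frame, winStride=(8, 8))
    return list(boxes)

def iou(a, b):
    """Intersection-over-union of two (x, y, w, h) boxes."""
    ax2, ay2 = a[0] + a[2], a[1] + a[3]
    bx2, by2 = b[0] + b[2], b[1] + b[3]
    iw = max(0, min(ax2, bx2) - max(a[0], b[0]))
    ih = max(0, min(ay2, by2) - max(a[1], b[1]))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0

def link_tracklets(frames, iou_thresh=0.3):
    """Greedily link per-frame detections into short human tracklets.

    A simplification: each detection joins the tracklet whose last box
    (in the previous frame) overlaps it most; otherwise it starts a new one.
    """
    tracklets = []  # each tracklet is a list of (frame_index, box)
    for t, frame in enumerate(frames):
        for box in detect_people(frame):
            best, best_iou = None, iou_thresh
            for tr in tracklets:
                last_t, last_box = tr[-1]
                if t - last_t == 1 and iou(last_box, box) > best_iou:
                    best, best_iou = tr, iou(last_box, box)
            if best is not None:
                best.append((t, box))
            else:
                tracklets.append([(t, box)])
    return tracklets
```

The resulting tracklets would then be linked into long trajectories and fed to the key-pose and dense-trajectory attribute mining stages; those steps, and the latent structural max-margin integration, are beyond the scope of this sketch.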

Keywords

Human trajectory · Key pose · Local dense trajectories · Discriminative mining · Latent structural max-margin learning

Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  1. Advanced Digital Sciences Center, Singapore, Singapore