
Action Detection with Improved Dense Trajectories and Sliding Window

  • Zhixin Shu
  • Kiwon Yun
  • Dimitris Samaras
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8925)

Abstract

In this paper we describe an action/interaction detection system based on improved dense trajectories [19], multiple visual descriptors, and a bag-of-features representation. Because the actions/interactions are not mutually exclusive, we train a separate binary classifier for each predefined action/interaction. Temporal localization is achieved with a non-overlapping temporal sliding window. We tested our system on the ChaLearn Looking at People Challenge 2014 Track 2 dataset [1, 2], obtaining an average overlap of 0.4226 and placing 3rd in this track of the challenge. Finally, we provide an extensive analysis of the system's performance on different actions and suggest possible ways to improve a general action detection system.
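The pipeline described above combines three steps: quantizing local trajectory descriptors into a bag-of-features histogram, sliding a non-overlapping temporal window over the video, and running one binary classifier per action inside each window. The following sketch illustrates that structure; the codebook, the 0.5 detection threshold, and the classifier callables are illustrative assumptions, not values taken from the paper.

```python
import numpy as np

def bag_of_features(descriptors, codebook):
    """Quantize local descriptors against a codebook and return a
    normalized histogram (bag-of-features representation)."""
    # Assign each descriptor to its nearest codeword (Euclidean distance).
    dists = np.linalg.norm(descriptors[:, None, :] - codebook[None, :, :], axis=2)
    words = dists.argmin(axis=1)
    hist = np.bincount(words, minlength=len(codebook)).astype(float)
    return hist / max(hist.sum(), 1.0)

def detect_actions(frame_descriptors, codebook, classifiers, window=20):
    """Slide a non-overlapping temporal window over the video and score
    each window with one binary classifier per action.

    frame_descriptors: list with one (n_i, d) array of descriptors per frame.
    classifiers: dict mapping action name -> callable that takes a BoF
                 histogram and returns a score; detections use an assumed
                 threshold of 0.5.
    Returns a list of (action, start_frame, end_frame, score) tuples.
    """
    detections = []
    for start in range(0, len(frame_descriptors), window):
        chunk = frame_descriptors[start:start + window]
        descs = [d for d in chunk if len(d)]
        if not descs:
            continue
        hist = bag_of_features(np.vstack(descs), codebook)
        for action, clf in classifiers.items():
            score = clf(hist)
            if score > 0.5:
                detections.append((action, start, start + len(chunk) - 1, score))
    return detections
```

Since the windows do not overlap, each frame belongs to exactly one candidate interval, which keeps temporal localization simple at the cost of coarse boundaries (detections are only as precise as the window length).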

Keywords

Video analysis · Action recognition · Action detection · Dense trajectories

References

  1. Escalera, S., et al.: ChaLearn looking at people challenge 2014: dataset and results. In: Bronstein, M., Agapito, L., Rother, C. (eds.) Computer Vision - ECCV 2014 Workshops. LNCS, vol. 8925, pp. 459–473. Springer, Heidelberg (2015)
  2. Sánchez, D., Bautista, M., Escalera, S.: HuPBA 8k+: Dataset and ECOC-GraphCut based segmentation of human limbs. Neurocomputing (2014)
  3. Poppe, R.: A survey on vision-based human action recognition. Image and Vision Computing 28(6), 976–990 (2010)
  4. Aggarwal, J., Ryoo, M.: Human activity analysis: a review. ACM Comput. Surv. 43(3), 16:1–16:43 (2011)
  5. Blank, M., Gorelick, L., Shechtman, E., Irani, M., Basri, R.: Actions as space-time shapes. In: Proceedings of the International Conference on Computer Vision, ICCV (2005)
  6. Dollár, P., Rabaud, V., Cottrell, G., Belongie, S.: Behavior recognition via sparse spatio-temporal features. In: 2nd Joint IEEE International Workshop on Visual Surveillance and Performance Evaluation of Tracking and Surveillance, PETS (2005)
  7. Niebles, J., Wang, H., Fei-Fei, L.: Unsupervised learning of human action categories using spatial-temporal words. International Journal of Computer Vision 79(3), 299–318 (2008)
  8. Schuldt, C., Laptev, I., Caputo, B.: Recognizing human actions: a local SVM approach. In: Proceedings of the 17th International Conference on Pattern Recognition, ICPR (2004)
  9. Messing, R., Pal, C., Kautz, H.: Activity recognition using the velocity histories of tracked keypoints. In: Proceedings of the International Conference on Computer Vision, ICCV (2009)
  10. Kuehne, H., Jhuang, H., Garrote, E., Poggio, T., Serre, T.: HMDB: a large video database for human motion recognition. In: Proceedings of the International Conference on Computer Vision, ICCV (2011)
  11. Ayazoglu, M., Yilmaz, B., Sznaier, M., Camps, O.: Finding causal interactions in video sequences. In: Proceedings of the International Conference on Computer Vision, ICCV (2013)
  12. Ryoo, M.S., Aggarwal, J.K.: Spatio-temporal relationship match: video structure comparison for recognition of complex human activities. In: Proceedings of the International Conference on Computer Vision, ICCV (2009)
  13. Yun, K., Honorio, J., Chattopadhyay, D., Berg, T.L., Samaras, D.: Two-person interaction detection using body-pose features and multiple instance learning. In: IEEE Conference on Computer Vision and Pattern Recognition Workshops, CVPRW (2012)
  14. Yao, B., Fei-Fei, L.: Modeling mutual context of object and human pose in human-object interaction activities. In: IEEE Conference on Computer Vision and Pattern Recognition, CVPR (2010)
  15. Wang, H., Ullah, M.M., Kläser, A., Laptev, I., Schmid, C.: Evaluation of local spatio-temporal features for action recognition. In: British Machine Vision Conference, BMVC (2009)
  16. Laptev, I.: On space-time interest points. International Journal of Computer Vision 64(2-3), 107–123 (2005)
  17. Ali, S., Basharat, A., Shah, M.: Chaotic invariants for human action recognition. In: Proceedings of the International Conference on Computer Vision, ICCV (2007)
  18. Wang, H., Kläser, A., Schmid, C., Liu, C.L.: Action recognition by dense trajectories. In: IEEE Conference on Computer Vision and Pattern Recognition, CVPR (2011)
  19. Wang, H., Schmid, C.: Action recognition with improved trajectories. In: Proceedings of the International Conference on Computer Vision, ICCV (2013)
  20. Fathi, A., Mori, G.: Action recognition by learning mid-level motion features. In: IEEE Conference on Computer Vision and Pattern Recognition, CVPR (2008)
  21. Raptis, M., Kokkinos, I., Soatto, S.: Discovering discriminative action parts from mid-level video representations. In: IEEE Conference on Computer Vision and Pattern Recognition, CVPR (2012)
  22. Zhang, W., Zhu, M., Derpanis, K.: From actemes to action: a strongly-supervised representation for detailed action understanding. In: Proceedings of the International Conference on Computer Vision, ICCV (2013)
  23. Oneata, D., Verbeek, J., Schmid, C.: Efficient action localization with approximately normalized Fisher vectors. In: IEEE Conference on Computer Vision and Pattern Recognition, CVPR (2014)
  24. Simonyan, K., Zisserman, A.: Two-stream convolutional networks for action recognition in videos. arXiv:1406.2199v1 (2014)
  25. Harris, C., Stephens, M.: A combined corner and edge detector. In: Alvey Vision Conference, vol. 15, p. 50 (1988)
  26. Jain, M., Jégou, H., Bouthemy, P.: Better exploiting motion for better action recognition. In: IEEE Conference on Computer Vision and Pattern Recognition, CVPR (2013)
  27. Dalal, N., Triggs, B.: Histograms of oriented gradients for human detection. In: IEEE Conference on Computer Vision and Pattern Recognition, CVPR (2005)
  28. Laptev, I., Marszałek, M., Schmid, C., Rozenfeld, B.: Learning realistic human actions from movies. In: IEEE Conference on Computer Vision and Pattern Recognition, CVPR (2008)
  29. Dalal, N., Triggs, B., Schmid, C.: Human detection using oriented histograms of flow and appearance. In: Leonardis, A., Bischof, H., Pinz, A. (eds.) ECCV 2006. LNCS, vol. 3952, pp. 428–441. Springer, Heidelberg (2006)
  30. Raptis, M., Sigal, L.: Poselet key-framing: a model for human activity recognition. In: IEEE Conference on Computer Vision and Pattern Recognition, CVPR (2013)

Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  1. Stony Brook University, Stony Brook, USA
