
Hand Detection and Tracking in Videos for Fine-Grained Action Recognition

  • Conference paper
Computer Vision - ACCV 2014 Workshops (ACCV 2014)

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 9008)

Abstract

In this paper, we develop an effective method for detecting and tracking hands in uncontrolled videos based on multiple cues, including hand shape, skin color, upper-body position and optical flow. We apply our hand detection results to fine-grained human action recognition. We demonstrate that motion features extracted from hand areas can help classify actions even when the actions look similar and involve visually similar objects. We validate our hand detection and tracking method on the VideoPose2.0 dataset and apply our action classification method to the playing-instrument group of the UCF-101 dataset. Experimental results show the effectiveness of our approach.
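The abstract names the cues but not how they are combined, so the sketch below is only an illustration under stated assumptions, not the authors' method. It assumes each hand proposal already carries one score in [0, 1] per cue (hand shape, skin color, upper-body position, flow), fuses them with a weighted sum using placeholder weights, and keeps the top two proposals. For the "motion features extracted from hand areas" it assumes a simple magnitude-weighted histogram of optical-flow directions inside a hand box. `HandCandidate`, `CUE_WEIGHTS`, `fused_score`, `select_hands` and `flow_orientation_histogram` are hypothetical names.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple

# Hypothetical weights for combining cue scores; the abstract does not
# say how (or whether) the cues are weighted, so these are placeholders.
CUE_WEIGHTS: Dict[str, float] = {"shape": 0.4, "skin": 0.2, "body": 0.2, "flow": 0.2}


@dataclass
class HandCandidate:
    """A hypothetical hand proposal with one score in [0, 1] per cue."""
    box: Tuple[int, int, int, int]  # (x, y, width, height) in pixels
    shape: float  # hand-shape detector confidence (assumed)
    skin: float   # skin-color likelihood inside the box (assumed)
    body: float   # agreement with the detected upper-body position (assumed)
    flow: float   # consistency with motion from the previous frame (assumed)


def fused_score(c: HandCandidate, weights: Dict[str, float] = CUE_WEIGHTS) -> float:
    """Weighted linear fusion of the per-cue scores (an assumed scheme)."""
    return sum(w * getattr(c, cue) for cue, w in weights.items())


def select_hands(candidates: Sequence[HandCandidate], k: int = 2) -> List[HandCandidate]:
    """Keep the k highest-scoring candidates (default: two hands per person)."""
    return sorted(candidates, key=fused_score, reverse=True)[:k]


def flow_orientation_histogram(
    flow: Sequence[Sequence[Tuple[float, float]]],
    box: Tuple[int, int, int, int],
    bins: int = 8,
) -> List[float]:
    """Magnitude-weighted histogram of flow directions inside a hand box.

    flow[row][col] holds the (dx, dy) displacement of that pixel.
    The box is clipped to the flow field. The result is L1-normalised;
    an all-zero histogram means there was no motion inside the box.
    """
    x, y, w, h = box
    hist = [0.0] * bins
    for r in range(max(0, y), min(len(flow), y + h)):
        for col in range(max(0, x), min(len(flow[r]), x + w)):
            dx, dy = flow[r][col]
            mag = math.hypot(dx, dy)
            if mag == 0.0:
                continue  # static pixels carry no direction
            angle = math.atan2(dy, dx) % (2.0 * math.pi)  # in [0, 2*pi)
            hist[int(angle / (2.0 * math.pi) * bins) % bins] += mag
    total = sum(hist)
    return [v / total for v in hist] if total > 0 else hist
```

In a full pipeline one might compute such a histogram for each selected hand box in every frame and pool the results into a clip-level descriptor for a classifier; that pooling step, like the linear fusion, is an assumption made for illustration.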

Notes

  1. http://vision.grasp.upenn.edu/cgi-bin/index.php?n=VideoLearning.VideoPose2.

  2. http://groups.inf.ed.ac.uk/calvin/calvin_upperbody_detector/.

  3. http://pascallin.ecs.soton.ac.uk/challenges/VOC/voc2007/.

  4. http://pascallin.ecs.soton.ac.uk/challenges/VOC/voc2010/.

  5. http://www.robots.ox.ac.uk/~vgg/data/stickmen/.

  6. http://crcv.ucf.edu/ICCV13-Action-Workshop/.

Acknowledgement

This work was supported by JSPS KAKENHI Grant Number 26011435.

Author information

Correspondence to Nga H. Do.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Do, N.H., Yanai, K. (2015). Hand Detection and Tracking in Videos for Fine-Grained Action Recognition. In: Jawahar, C., Shan, S. (eds) Computer Vision - ACCV 2014 Workshops. ACCV 2014. Lecture Notes in Computer Science, vol 9008. Springer, Cham. https://doi.org/10.1007/978-3-319-16628-5_2

  • DOI: https://doi.org/10.1007/978-3-319-16628-5_2

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-16627-8

  • Online ISBN: 978-3-319-16628-5

  • eBook Packages: Computer Science, Computer Science (R0)
