Hand Detection and Tracking in Videos for Fine-Grained Action Recognition

Do, Nga H.; Yanai, Keiji

doi:10.1007/978-3-319-16628-5_2

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 9008)

Included in the following conference series:

Asian Conference on Computer Vision

Abstract

In this paper, we develop an effective method for detecting and tracking hands in uncontrolled videos based on multiple cues, including hand shape, skin color, upper body position, and flow information. We apply our hand detection results to fine-grained human action recognition. We demonstrate that motion features extracted from hand areas can help classify actions even when the actions look similar and are associated with visually similar objects. We validate our hand detection and tracking method on the VideoPose2.0 dataset and apply our action classification method to the playing-instrument group of the UCF-101 dataset. Experimental results show the effectiveness of our approach.

Acknowledgement

This work was supported by JSPS KAKENHI Grant Number 26011435.

Author information

Authors and Affiliations

Department of Informatics, The University of Electro-Communications, Tokyo, 1-5-1 Chofugaoka, Chofu, Tokyo, 182-8585, Japan
Nga H. Do & Keiji Yanai

Corresponding author

Correspondence to Nga H. Do.

Editor information

Editors and Affiliations

Center for Visual Information Technology, International Institute of Information Technology, Hyderabad, India
C.V. Jawahar
Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
Shiguang Shan

Cite this paper

Do, N.H., Yanai, K. (2015). Hand Detection and Tracking in Videos for Fine-Grained Action Recognition. In: Jawahar, C., Shan, S. (eds) Computer Vision - ACCV 2014 Workshops. ACCV 2014. Lecture Notes in Computer Science, vol 9008. Springer, Cham. https://doi.org/10.1007/978-3-319-16628-5_2

DOI: https://doi.org/10.1007/978-3-319-16628-5_2
Published: 12 April 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-16627-8
Online ISBN: 978-3-319-16628-5
eBook Packages: Computer Science, Computer Science (R0)
