Abstract
In this paper, we develop an effective method for detecting and tracking hands in uncontrolled videos based on multiple cues, including hand shape, skin color, upper-body position, and flow information. We apply the hand detection results to fine-grained human action recognition. We demonstrate that motion features extracted from hand areas help classify actions even when the actions look similar and involve visually similar objects. We validate our hand detection and tracking method on the VideoPose2.0 dataset and apply our action classification method to the playing-instrument group of the UCF-101 dataset. Experimental results show the effectiveness of our approach.
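As a rough illustration of what combining multiple cues can look like, the sketch below scores candidate hand boxes with a weighted sum of per-cue scores and keeps the two best. The abstract does not say how the cues are fused, so the linear fusion, the weights, the `Candidate` type, and the function names are all assumptions made for this example and not the authors' method.

```python
from dataclasses import dataclass

# Hypothetical cue weights; the abstract does not state how cues are combined.
CUE_WEIGHTS = {"shape": 0.4, "skin": 0.3, "body": 0.2, "flow": 0.1}


@dataclass
class Candidate:
    box: tuple  # (x, y, width, height) in pixels
    cues: dict  # cue name -> score in [0, 1]


def fused_score(candidate, weights=CUE_WEIGHTS):
    """Weighted sum of per-cue scores; a missing cue counts as 0."""
    return sum(w * candidate.cues.get(name, 0.0) for name, w in weights.items())


def select_hands(candidates, max_hands=2):
    """Return up to max_hands candidates with the highest fused score."""
    return sorted(candidates, key=fused_score, reverse=True)[:max_hands]


# Made-up scores for three candidate boxes in one frame.
candidates = [
    Candidate((120, 200, 40, 40), {"shape": 0.9, "skin": 0.8, "body": 0.7, "flow": 0.6}),
    Candidate((300, 210, 42, 38), {"shape": 0.7, "skin": 0.9, "body": 0.8, "flow": 0.2}),
    # Skin-colored but far from the expected hand position (for example, a face).
    Candidate((50, 30, 45, 45), {"shape": 0.6, "skin": 0.9, "body": 0.1, "flow": 0.0}),
]
hands = select_hands(candidates)
```

In this toy frame, the third candidate has a strong skin-color score but is penalized by the upper-body position and flow cues, which is the kind of ambiguity a multi-cue detector is meant to resolve.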
Acknowledgement
This work was supported by JSPS KAKENHI Grant Number 26011435.
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Do, N.H., Yanai, K. (2015). Hand Detection and Tracking in Videos for Fine-Grained Action Recognition. In: Jawahar, C., Shan, S. (eds) Computer Vision - ACCV 2014 Workshops. ACCV 2014. Lecture Notes in Computer Science, vol 9008. Springer, Cham. https://doi.org/10.1007/978-3-319-16628-5_2
DOI: https://doi.org/10.1007/978-3-319-16628-5_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-16627-8
Online ISBN: 978-3-319-16628-5
eBook Packages: Computer Science (R0)