Abstract
In this paper we describe our gesture detection and recognition system for Track 3 (Gesture Recognition) of the 2014 ChaLearn Looking at People challenge, held in conjunction with the ECCV 2014 conference. The competition's task was to learn a vocabulary of 20 types of Italian gestures and detect them in continuous video sequences. Our system adopts a multi-modality approach for both detecting and recognizing the gestures. The goal of our approach is to identify semantically meaningful content in a densely sampled spatio-temporal feature space for gesture recognition. To achieve this, we develop three concepts under the random forest framework: un-supervision, discrimination, and randomization. Un-supervision learns spatio-temporal features from two channels (grayscale and depth) of RGB-D video in an unsupervised way. Discrimination extracts the information in the densely sampled spatio-temporal space effectively. Randomization explores the densely sampled spatio-temporal feature space efficiently. An evaluation of our approach shows that we achieve a mean Jaccard Index of \(0.6489\) and a mean average accuracy of \(90.3\,\%\) over the test dataset.
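The pipeline sketched in the abstract — unsupervised feature learning per channel, multi-modality fusion, then a discriminative, randomized classifier — can be illustrated with a minimal Python sketch. This is not the authors' implementation: the paper learns spatio-temporal features with an ISA-style method, for which PCA serves here only as a simple unsupervised stand-in, and the synthetic arrays stand in for dense spatio-temporal patches.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Synthetic stand-ins for densely sampled spatio-temporal patches
# from the two RGB-D channels (grayscale and depth).
n_clips, patch_dim = 200, 64
gray = rng.normal(size=(n_clips, patch_dim))
depth = rng.normal(size=(n_clips, patch_dim))
labels = rng.integers(0, 20, size=n_clips)  # 20 Italian gesture classes

# "Un-supervision": learn a feature basis per channel without labels.
# (The paper uses ISA-style learning; PCA is only a simple stand-in.)
feat_gray = PCA(n_components=16, random_state=0).fit_transform(gray)
feat_depth = PCA(n_components=16, random_state=0).fit_transform(depth)
features = np.hstack([feat_gray, feat_depth])  # multi-modality fusion

# "Randomization" + "discrimination": a random forest draws random
# feature subsets per split and picks the most discriminative one.
forest = RandomForestClassifier(n_estimators=50, random_state=0)
forest.fit(features, labels)
pred = forest.predict(features)  # one gesture label per clip
```

The random forest naturally combines the two remaining concepts: bagging and per-split feature subsampling provide the randomized exploration of the feature space, while the split criterion supplies the discrimination.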
Copyright information
© 2015 Springer International Publishing Switzerland
Cite this paper
Chen, G. et al. (2015). Multi-modality Gesture Detection and Recognition with Un-supervision, Randomization and Discrimination. In: Agapito, L., Bronstein, M., Rother, C. (eds) Computer Vision - ECCV 2014 Workshops. ECCV 2014. Lecture Notes in Computer Science, vol 8925. Springer, Cham. https://doi.org/10.1007/978-3-319-16178-5_43
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-16177-8
Online ISBN: 978-3-319-16178-5