On Multi-modal Fusion for Freehand Gesture Recognition

Schak, Monika; Gepperth, Alexander

doi:10.1007/978-3-030-61609-0_68

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 12396)

Included in the following conference series:

International Conference on Artificial Neural Networks

Abstract

We present a study of multi-modal freehand gesture recognition relying on three sensory modalities: RGB images, depth data, and acceleration data from an IMD attached to the hand. Based on a new self-recorded dataset, we first establish that a deep Long Short-Term Memory (LSTM) network can correctly classify the individual data streams from each modality. Notably, classifying the IMD stream alone already yields very good results. In addition, we investigate two different strategies of multi-modal fusion, since there is no agreement in the literature as to which strategy is preferable. Combining the modalities improves recognition performance. Most importantly, fusion considerably improves ahead-of-time classification, i.e., gesture class estimates made before sequences are completed, for classes that are difficult to classify on their own.
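The abstract does not say which two fusion strategies are compared. A common pairing in the literature is early (input-level) fusion, which concatenates the features of all modalities before a single classifier, and late (decision-level) fusion, which combines the class-probability outputs of one classifier per modality. The following plain-Python sketch illustrates both under that assumption, together with ahead-of-time classification read from the per-time-step outputs of a recurrent classifier. All function names and example numbers are hypothetical; this is not the authors' implementation.

```python
import math
from typing import List, Optional, Sequence

Vector = List[float]


def early_fusion(streams: Sequence[Sequence[Sequence[float]]]) -> List[Vector]:
    """Input-level fusion: concatenate the feature vectors of all modalities
    at each time step. Assumes (hypothetically) that the streams were already
    resampled to a common length, since real sensors run at different rates."""
    lengths = {len(stream) for stream in streams}
    if len(lengths) != 1:
        raise ValueError("all modality streams must have the same number of time steps")
    n_steps = lengths.pop()
    fused: List[Vector] = []
    for t in range(n_steps):
        frame: Vector = []
        for stream in streams:
            frame.extend(stream[t])
        fused.append(frame)
    return fused


def late_fusion(per_modality_probs: Sequence[Sequence[float]],
                weights: Optional[Sequence[float]] = None) -> Vector:
    """Decision-level fusion: weighted average of the class-probability
    vectors produced by one classifier per modality (uniform by default)."""
    if weights is None:
        weights = [1.0] * len(per_modality_probs)
    if len(weights) != len(per_modality_probs):
        raise ValueError("need one weight per modality")
    total = sum(weights)
    n_classes = len(per_modality_probs[0])
    return [
        sum(w * probs[c] for w, probs in zip(weights, per_modality_probs)) / total
        for c in range(n_classes)
    ]


def late_fusion_over_time(per_modality_steps: Sequence[Sequence[Sequence[float]]]) -> List[Vector]:
    """Apply late fusion independently at every time step.
    per_modality_steps[m][t] is modality m's class-probability vector at step t."""
    n_steps = len(per_modality_steps[0])
    return [late_fusion([steps[t] for steps in per_modality_steps]) for t in range(n_steps)]


def ahead_of_time_prediction(per_step_probs: Sequence[Sequence[float]], fraction: float) -> int:
    """Class estimate after observing only the first `fraction` of a sequence,
    taken from a recurrent classifier that emits an output at every time step."""
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    t = max(1, math.ceil(fraction * len(per_step_probs))) - 1
    probs = per_step_probs[t]
    return max(range(len(probs)), key=lambda c: probs[c])


if __name__ == "__main__":
    # Hypothetical per-step outputs for a 2-class problem (not data from the paper).
    rgb = [[0.5, 0.5], [0.55, 0.45], [0.3, 0.7], [0.2, 0.8]]
    imd = [[0.4, 0.6], [0.3, 0.7], [0.2, 0.8], [0.1, 0.9]]
    print(ahead_of_time_prediction(rgb, 0.5))                               # 0
    print(ahead_of_time_prediction(late_fusion_over_time([rgb, imd]), 0.5))  # 1
```

In this toy example the RGB stream alone favours the wrong class halfway through the sequence, while the fused estimate at the same point already agrees with the final decision. Late fusion is shown with a simple weighted average; other combination rules (products of probabilities, a learned fusion layer) are equally plausible choices.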

Author information

Authors and Affiliations

Fulda University of Applied Sciences, 36037, Fulda, Germany
Monika Schak & Alexander Gepperth

Corresponding authors

Correspondence to Monika Schak or Alexander Gepperth.

Editor information

Editors and Affiliations

Department of Applied Informatics, Comenius University in Bratislava, Bratislava, Slovakia
Igor Farkaš
Department of Applied Mathematics and Computer Science, Technical University of Denmark, Kgs. Lyngby, Denmark
Paolo Masulli
Department of Informatics, University of Hamburg, Hamburg, Germany
Stefan Wermter

About this paper

Cite this paper

Schak, M., Gepperth, A. (2020). On Multi-modal Fusion for Freehand Gesture Recognition. In: Farkaš, I., Masulli, P., Wermter, S. (eds) Artificial Neural Networks and Machine Learning – ICANN 2020. ICANN 2020. Lecture Notes in Computer Science, vol 12396. Springer, Cham. https://doi.org/10.1007/978-3-030-61609-0_68

DOI: https://doi.org/10.1007/978-3-030-61609-0_68
Published: 14 October 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-61608-3
Online ISBN: 978-3-030-61609-0
eBook Packages: Computer Science, Computer Science (R0)
