
On Multi-modal Fusion for Freehand Gesture Recognition

  • Conference paper
  • In: Artificial Neural Networks and Machine Learning – ICANN 2020 (ICANN 2020)
  • Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 12396)

Abstract

We present a study of multi-modal freehand gesture recognition relying on three sensory modalities: RGB images, depth data, and acceleration data from an inertial measurement unit (IMU) attached to the hand. Based on a new self-recorded dataset, we first establish that a deep Long Short-Term Memory (LSTM) network can correctly classify the individual data stream of each modality; notably, classifying the IMU stream alone already yields very good results. We then investigate two different strategies of multi-modal fusion, since the literature offers no consensus on which strategy is preferable. Combining the modalities leads to better recognition performance. Most importantly, fusion considerably improves ahead-of-time classification, i.e., gesture class estimates produced before a sequence is complete, for classes that are difficult to classify on their own.
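
To make the two fusion strategies concrete, below is a minimal sketch (in PyTorch, which the paper does not specify) contrasting feature-level ("early") fusion with decision-level ("late") fusion of per-modality LSTM streams. All feature dimensions, the number of gesture classes, and the score-averaging rule for late fusion are illustrative assumptions, not the authors' configuration. Emitting a class estimate at every time step is what enables the ahead-of-time classification discussed in the abstract.

# Hypothetical sketch of early vs. late multi-modal fusion with LSTMs.
# Feature sizes and class count are placeholders, not taken from the paper.
import torch
import torch.nn as nn

NUM_CLASSES = 10                                  # assumed number of gesture classes
FEAT = {"rgb": 512, "depth": 256, "imu": 6}       # assumed per-frame feature sizes


class EarlyFusionLSTM(nn.Module):
    """Concatenate per-frame features of all modalities, then run one LSTM."""

    def __init__(self, hidden=128):
        super().__init__()
        self.lstm = nn.LSTM(sum(FEAT.values()), hidden,
                            num_layers=2, batch_first=True)
        self.head = nn.Linear(hidden, NUM_CLASSES)

    def forward(self, rgb, depth, imu):
        x = torch.cat([rgb, depth, imu], dim=-1)  # (B, T, 774)
        h, _ = self.lstm(x)
        # A class score at every time step allows ahead-of-time estimates,
        # i.e. predictions before the gesture sequence is complete.
        return self.head(h)                       # (B, T, NUM_CLASSES)


class LateFusionLSTM(nn.Module):
    """One LSTM and classifier per modality; average the class scores."""

    def __init__(self, hidden=128):
        super().__init__()
        self.streams = nn.ModuleDict({
            m: nn.LSTM(d, hidden, num_layers=2, batch_first=True)
            for m, d in FEAT.items()})
        self.heads = nn.ModuleDict({
            m: nn.Linear(hidden, NUM_CLASSES) for m in FEAT})

    def forward(self, rgb, depth, imu):
        inputs = {"rgb": rgb, "depth": depth, "imu": imu}
        logits = [self.heads[m](self.streams[m](inputs[m])[0]) for m in FEAT]
        return torch.stack(logits).mean(dim=0)    # (B, T, NUM_CLASSES)


if __name__ == "__main__":
    B, T = 4, 50                                  # 4 sequences of 50 frames each
    batch = [torch.randn(B, T, FEAT[m]) for m in ("rgb", "depth", "imu")]
    print(EarlyFusionLSTM()(*batch).shape)        # torch.Size([4, 50, 10])
    print(LateFusionLSTM()(*batch).shape)         # torch.Size([4, 50, 10])

Early fusion exposes cross-modal correlations to the recurrent layers at every frame, whereas late fusion keeps the streams independent until the score level; which variant works better is precisely the open question the study addresses.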

Author information


Correspondence to Monika Schak or Alexander Gepperth.


Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Schak, M., Gepperth, A. (2020). On Multi-modal Fusion for Freehand Gesture Recognition. In: Farkaš, I., Masulli, P., Wermter, S. (eds.) Artificial Neural Networks and Machine Learning – ICANN 2020. ICANN 2020. Lecture Notes in Computer Science, vol. 12396. Springer, Cham. https://doi.org/10.1007/978-3-030-61609-0_68

  • DOI: https://doi.org/10.1007/978-3-030-61609-0_68

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-61608-3

  • Online ISBN: 978-3-030-61609-0

  • eBook Packages: Computer Science, Computer Science (R0)
