Robustness of Deep LSTM Networks in Freehand Gesture Recognition

  • Conference paper
Artificial Neural Networks and Machine Learning – ICANN 2019: Image Processing (ICANN 2019)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 11729)

Abstract

We present an analysis of the robustness of deep LSTM networks for freehand gesture recognition against temporal shifts of the performed gesture with respect to the “temporal receptive field”. Such shifts inevitably occur when not only the gesture type but also its onset must be determined from sensor data, and it is imperative that recognizers be as invariant as possible to this effect, which we term gesture onset variability. Based on a real-world hand gesture classification task, we find that LSTM networks are very sensitive to this type of variability, which we confirm by creating a synthetic sequence classification task of similar dimensionality. Lastly, we show that including gesture onset variability in the training data through a simple data augmentation strategy leads to high robustness against all tested effects. We therefore conclude that LSTM networks can be considered good candidates for real-time and real-world gesture recognition.
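To make the idea of gesture onset variability concrete, the following is a minimal sketch of one way such an augmentation could look. The abstract does not describe the authors' actual procedure, so everything here is an assumption for illustration: the fixed-length window, the zero padding, the truncation of frames that run past the window, and the hypothetical helper names `shift_onset` and `augment_with_onsets`.

```python
import random
from typing import List, Optional, Sequence

Frame = List[float]


def shift_onset(frames: Sequence[Frame], window: int, onset: int,
                pad_value: float = 0.0) -> List[Frame]:
    """Place a gesture inside a fixed-length window, starting at `onset`.

    Assumption: positions not covered by the gesture are filled with
    constant padding frames, and frames past the window end are dropped.
    """
    if not frames:
        raise ValueError("need at least one frame")
    if not 0 <= onset < window:
        raise ValueError("onset must lie inside the window")
    dim = len(frames[0])
    out = [[pad_value] * dim for _ in range(window)]
    for i, frame in enumerate(frames):
        t = onset + i
        if t >= window:
            break  # gesture runs past the temporal receptive field
        out[t] = list(frame)
    return out


def augment_with_onsets(frames: Sequence[Frame], window: int, max_onset: int,
                        copies: int,
                        rng: Optional[random.Random] = None) -> List[List[Frame]]:
    """Return `copies` shifted versions with onsets drawn from [0, max_onset]."""
    rng = rng or random.Random()
    return [shift_onset(frames, window, rng.randint(0, max_onset))
            for _ in range(copies)]
```

Each augmented copy keeps the label of the original gesture, so a recognizer trained on them sees the same gesture at several positions within its input window.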

Author information

Corresponding author

Correspondence to Monika Schak.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Schak, M., Gepperth, A. (2019). Robustness of Deep LSTM Networks in Freehand Gesture Recognition. In: Tetko, I., Kůrková, V., Karpov, P., Theis, F. (eds) Artificial Neural Networks and Machine Learning – ICANN 2019: Image Processing. ICANN 2019. Lecture Notes in Computer Science, vol 11729. Springer, Cham. https://doi.org/10.1007/978-3-030-30508-6_27

  • DOI: https://doi.org/10.1007/978-3-030-30508-6_27

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-30507-9

  • Online ISBN: 978-3-030-30508-6

  • eBook Packages: Computer Science, Computer Science (R0)
