Robustness of Deep LSTM Networks in Freehand Gesture Recognition

Schak, Monika; Gepperth, Alexander

doi:10.1007/978-3-030-30508-6_27

Monika Schak¹² &
Alexander Gepperth¹²

Part of the book series: Lecture Notes in Computer Science ((LNTCS,volume 11729))

Included in the following conference series:

International Conference on Artificial Neural Networks

2623 Accesses
5 Citations

Abstract

We present an analysis of the robustness of deep LSTM networks for freehand gesture recognition against temporal shifts of the performed gesture w.r.t. the “temporal receptive field”. Such shifts inevitably occur when not only the gesture type but also its onset needs to be determined from sensor data, and it is imperative that recognizers be as invariant as possible to this effect which we term gesture onset variability. Based on a real-world hand gesture classification task we find that LSTM networks are very sensitive to this type of variability, which we confirm by creating a synthetic sequence classification task of similar dimensionality. Lastly, we show that including gesture onset variability in the training data by a simple data augmentation strategy leads to a high robustness against all tested effects, so we conclude that LSTM networks can be considered good candidates for real-time and real-world gesture recognition.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 39.99; Price excludes VAT (USA)

Softcover Book: USD 54.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

References

Camgoz, N.C., Hadfield, S., Koller, O., Bowden, R.: Using convolutional 3D neural networks for user-independent continuous gesture recognition. In: 2016 23rd International Conference on Pattern Recognition (ICPR), pp. 49–54. IEEE (2016). https://doi.org/10.1109/ICPR.2016.7899606
Caron, L.-C., Filliat, D., Gepperth, A.: Neural network fusion of color, depth and location for object instance recognition on a mobile robot. In: Agapito, L., Bronstein, M.M., Rother, C. (eds.) ECCV 2014. LNCS, vol. 8927, pp. 791–805. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-16199-0_55
Chapter Google Scholar
Duan, J., Wan, J., Zhou, S., Guo, X., Li, S.Z.: A unified framework for multi-modal isolated gesture recognition. ACM Trans. Multimed. Comput. Commun. Appl. (TOMM) 14(1s), 21 (2018). https://doi.org/10.1145/3131343
Article Google Scholar
Graves, A.: Generating sequences with recurrent neural networks (2013). arXiv preprint arXiv:1308.0850
Graves, A., Jaitly, N.: Towards end-to-end speech recognition with recurrent neural networks. In: International Conference on Machine Learning, pp. 1764–1772 (2014). https://doi.org/10.1186/s13636-018-0141-9
Graves, A., Mohamed, A.R., Hinton, G.: Speech recognition with deep recurrent neural networks. In: 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 6645–6649. IEEE (2013). https://doi.org/10.1109/ICASSP.2013.6638947
Hochreiter, S.: The vanishing gradient problem during learning recurrent neural nets and problem solutions. Int. J. Uncertain. Fuzziness Knowl. Based Syst. 6(02), 107–116 (1998). https://doi.org/10.1142/S0218488598000094
Article MATH Google Scholar
Karpathy, A., Fei-Fei, L.: Deep visual-semantic alignments for generating image descriptions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3128–3137 (2015). https://doi.org/10.1109/TPAMI.2016.2598339
Article Google Scholar
Lea, C., Flynn, M.D., Vidal, R., Reiter, A., Hager, G.D.: Temporal convolutional networks for action segmentation and detection. In: Proceedings - 30th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, vol. January 2017, pp. 1003–1012. Institute of Electrical and Electronics Engineers Inc., November 2017. https://doi.org/10.1109/CVPR.2017.113
Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proc. IEEE 86, 2278–2324 (1998)
Article Google Scholar
Miao, Q., et al.: Multimodal gesture recognition based on the ResC3D network. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 3047–3055 (2017). https://doi.org/10.1109/ICCVW.2017.360
Mikolov, T., Karafiát, M., Burget, L., Cernocký, J., Khudanpur, S.: Recurrent neural network based language model. In: 11th Annual Conference of the International Speech Communication Association, INTERSPEECH 2010, Makuhari, Chiba, Japan, 26–30 September 2010, pp. 1045–1048 (2010). http://www.isca-speech.org/archive/interspeech_2010/i10_1045.html
Nguyen, A., Kanoulas, D., Muratore, L., Caldwell, D.G., Tsagarakis, N.G.: Translating videos to commands for robotic manipulation with deep recurrent neural networks. In: 2018 IEEE International Conference on Robotics and Automation (ICRA), pp. 1–9. IEEE (2018). https://doi.org/10.1109/ICRA.2018.8460857
Ordóñez, F., Roggen, D.: Deep convolutional and LSTM recurrent neural networks for multimodal wearable activity recognition. Sensors 16(1), 115 (2016). https://doi.org/10.3390/s16010115
Article Google Scholar
Rusu, R.B., Blodow, N., Marton, Z.C., Beetz, M.: Aligning point cloud views using persistent feature histograms. In: 2008 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 3384–3391. IEEE (2008). https://doi.org/10.1109/IROS.2008.4650967
Sachara, F., Kopinski, T., Gepperth, A., Handmann, U.: Free-hand gesture recognition with 3D-CNNs for in-car infotainment control in real-time. In: 2017 IEEE 20th International Conference on Intelligent Transportation Systems (ITSC), pp. 959–964, October 2017. https://doi.org/10.1109/ITSC.2017.8317684
Sarkar, A., Gepperth, A., Handmann, U., Kopinski, T.: Dynamic hand gesture recognition for mobile systems using deep LSTM. In: Horain, P., Achard, C., Mallem, M. (eds.) IHCI 2017. LNCS, vol. 10688, pp. 19–31. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-72038-8_3
Chapter Google Scholar
Tsironi, E., Barros, P., Wermter, S.: Gesture recognition with a convolutional long short-term memory recurrent neural network. In: Proceedings of the European Symposium on Artificial Neural Networks Computational Intelligence and Machine Learning (ESANN), pp. 213–218 (2016)
Google Scholar
Wu, J., Ishwar, P., Konrad, J.: Two-stream CNNs for gesture-based verification and identification: learning user style. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp. 42–50 (2016). https://doi.org/10.1109/CVPRW.2016.21
Zhu, G., Zhang, L., Shen, P., Song, J.: Multimodal gesture recognition using 3-D convolution and convolutional LSTM. IEEE Access 5, 4517–4524 (2017). https://doi.org/10.1109/ACCESS.2017.2684186
Article Google Scholar
Zhu, G., Zhang, L., Mei, L., Shao, J., Song, J., Shen, P.: Large-scale isolated gesture recognition using pyramidal 3D convolutional networks. In: 2016 23rd International Conference on Pattern Recognition (ICPR), pp. 19–24. IEEE (2016). https://doi.org/10.1109/ICPR.2016.7899601

Download references

Author information

Authors and Affiliations

Fulda University of Applied Sciences, 36037, Fulda, Germany
Monika Schak & Alexander Gepperth

Authors

Monika Schak
View author publications
You can also search for this author in PubMed Google Scholar
Alexander Gepperth
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Monika Schak .

Editor information

Editors and Affiliations

Helmholtz Zentrum München - Deutsches Forschungszentrum für Gesundheit und Umwelt (GmbH), Neuherberg, Germany
Igor V. Tetko
Institute of Computer Science, Czech Academy of Sciences, Praha 8, Czech Republic
Věra Kůrková
Helmholtz Zentrum München - Deutsches Forschungszentrum für Gesundheit und Umwelt (GmbH), Neuherberg, Germany
Pavel Karpov
Helmholtz Zentrum München - Deutsches Forschungszentrum für Gesundheit und Umwelt (GmbH), Neuherberg, Germany
Fabian Theis

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Schak, M., Gepperth, A. (2019). Robustness of Deep LSTM Networks in Freehand Gesture Recognition. In: Tetko, I., Kůrková, V., Karpov, P., Theis, F. (eds) Artificial Neural Networks and Machine Learning – ICANN 2019: Image Processing. ICANN 2019. Lecture Notes in Computer Science(), vol 11729. Springer, Cham. https://doi.org/10.1007/978-3-030-30508-6_27

Download citation

DOI: https://doi.org/10.1007/978-3-030-30508-6_27
Published: 09 September 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-30507-9
Online ISBN: 978-3-030-30508-6
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics