Abstract
We present an analysis of the robustness of deep LSTM networks for freehand gesture recognition against temporal shifts of the performed gesture w.r.t. the “temporal receptive field”. Such shifts inevitably occur when not only the gesture type but also its onset needs to be determined from sensor data, and it is imperative that recognizers be as invariant as possible to this effect, which we term gesture onset variability. Based on a real-world hand gesture classification task, we find that LSTM networks are very sensitive to this type of variability, which we confirm by creating a synthetic sequence classification task of similar dimensionality. Lastly, we show that including gesture onset variability in the training data through a simple data augmentation strategy leads to high robustness against all tested effects. We therefore conclude that LSTM networks can be considered good candidates for real-time and real-world gesture recognition.
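The abstract does not specify how the augmentation is implemented. As an illustration only, the following sketch shows one plausible way to inject gesture onset variability into training data: each fixed-length sequence is shifted by a random number of frames inside its window, and vacated frames are filled with a constant. The function names `shift_onset` and `augment_with_onset_shifts`, the `(N, T, D)` array layout, the zero padding, and the uniform shift distribution are all assumptions, not details taken from the paper.

```python
import numpy as np


def shift_onset(seq, shift, pad_value=0.0):
    """Shift a (T, D) sequence by `shift` frames inside a fixed-length window.

    A positive shift delays the gesture (padding at the start); a negative
    shift advances it (padding at the end). Frames pushed out of the window
    are discarded. Padding with a constant is an assumption of this sketch.
    """
    T = seq.shape[0]
    out = np.full_like(seq, pad_value)
    if abs(shift) >= T:
        return out  # the whole gesture has left the window
    if shift >= 0:
        out[shift:] = seq[:T - shift]
    else:
        out[:T + shift] = seq[-shift:]
    return out


def augment_with_onset_shifts(X, y, max_shift, copies=1, seed=0):
    """Append `copies` randomly shifted versions of every sequence in X.

    X has shape (N, T, D) and y has shape (N,). Shifts are drawn uniformly
    from [-max_shift, max_shift] (an assumed distribution).
    """
    rng = np.random.default_rng(seed)
    X_parts, y_parts = [X], [y]
    for _ in range(copies):
        shifts = rng.integers(-max_shift, max_shift + 1, size=len(X))
        shifted = np.stack([shift_onset(s, int(k)) for s, k in zip(X, shifts)])
        X_parts.append(shifted)
        y_parts.append(y)
    return np.concatenate(X_parts), np.concatenate(y_parts)
```

Under these assumptions, calling `augment_with_onset_shifts(X, y, max_shift=5, copies=3)` would return the original samples followed by three shifted copies of each, with labels repeated accordingly. Whether padding, cropping from a longer recording, or another scheme best matches the setting studied here would need to be checked against the full paper.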
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Schak, M., Gepperth, A. (2019). Robustness of Deep LSTM Networks in Freehand Gesture Recognition. In: Tetko, I., Kůrková, V., Karpov, P., Theis, F. (eds) Artificial Neural Networks and Machine Learning – ICANN 2019: Image Processing. ICANN 2019. Lecture Notes in Computer Science(), vol 11729. Springer, Cham. https://doi.org/10.1007/978-3-030-30508-6_27
DOI: https://doi.org/10.1007/978-3-030-30508-6_27
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-30507-9
Online ISBN: 978-3-030-30508-6
eBook Packages: Computer Science (R0)