Abstract
Dynamic hand gestures are usually unique to individual users in the style, speed, and magnitude with which they are performed. A gesture recognition model trained on data from a group of users may not generalize well to unseen users, and its performance is likely to vary from user to user. To address these issues, this paper investigates fine-tuning a global model locally on user-specific data to personalize dynamic hand gesture recognition. Through comprehensive experiments with state-of-the-art convolutional neural network architectures for video recognition, we evaluate the impact of four choices on personalization performance: fine-tuning the earlier vs. the later layers of the network, the number of user-specific training samples, the batch size, and the learning rate. The user-specific data is collected from 11 users performing 7 gesture classes. Our findings show that, with proper selection of the fine-tuning strategy and hyperparameters, personalized models can achieve improved performance for all users by fine-tuning only a small portion of the network weights and using very few labeled user-specific training samples.
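The idea of fine-tuning only part of a global model on a handful of user samples can be illustrated with a toy sketch. This is not the authors' implementation: the two-layer model, the function names (`forward`, `fine_tune_head`, `mean_loss`), the toy data, and the default hyperparameter values are all hypothetical stand-ins for the video CNNs and gesture data used in the paper. It only shows the mechanics of freezing the earlier layer, updating the later layer, and exposing the number of samples, batch size, and learning rate as knobs.

```python
import math
import random

def forward(x, frozen_w, head_w):
    # Frozen feature layer: stand-in for the earlier layers of a global model.
    h = [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in frozen_w]
    # Trainable head: stand-in for the later layers that get personalized.
    logits = [sum(w * hi for w, hi in zip(row, h)) for row in head_w]
    return h, logits

def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def mean_loss(samples, frozen_w, head_w):
    # Average cross-entropy over (input, label) pairs.
    total = 0.0
    for x, y in samples:
        _, logits = forward(x, frozen_w, head_w)
        total -= math.log(softmax(logits)[y])
    return total / len(samples)

def fine_tune_head(samples, frozen_w, head_w, lr=0.1, batch_size=2,
                   epochs=50, seed=0):
    # Mini-batch SGD on cross-entropy that updates a copy of head_w only;
    # frozen_w is read but never modified.
    rng = random.Random(seed)
    head = [row[:] for row in head_w]
    data = list(samples)
    for _ in range(epochs):
        rng.shuffle(data)
        for start in range(0, len(data), batch_size):
            batch = data[start:start + batch_size]
            grad = [[0.0] * len(head[0]) for _ in head]
            for x, y in batch:
                h, logits = forward(x, frozen_w, head)
                probs = softmax(logits)
                for c in range(len(head)):
                    err = probs[c] - (1.0 if c == y else 0.0)
                    for j, hj in enumerate(h):
                        grad[c][j] += err * hj / len(batch)
            for c in range(len(head)):
                for j in range(len(head[0])):
                    head[c][j] -= lr * grad[c][j]
    return head

# Hypothetical "global" weights and four labeled samples from one user.
frozen_w = [[1.0, 0.0], [0.0, 1.0]]
global_head = [[0.0, 0.0], [0.0, 0.0]]
user_samples = [([1.0, 0.2], 0), ([0.9, 0.1], 0),
                ([0.1, 1.0], 1), ([0.2, 0.8], 1)]

personal_head = fine_tune_head(user_samples, frozen_w, global_head)
loss_before = mean_loss(user_samples, frozen_w, global_head)
loss_after = mean_loss(user_samples, frozen_w, personal_head)
print(loss_before, loss_after)
```

Under these assumptions, varying `lr`, `batch_size`, and the length of `user_samples` mirrors three of the four choices the abstract lists; the fourth (earlier vs. later layers) would correspond to choosing which weight matrix is copied and updated.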
© 2021 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Guo, J., Kurup, U., Shah, M. (2021). Efficacy of Model Fine-Tuning for Personalized Dynamic Gesture Recognition. In: Li, X., Wu, M., Chen, Z., Zhang, L. (eds) Deep Learning for Human Activity Recognition. DL-HAR 2021. Communications in Computer and Information Science, vol 1370. Springer, Singapore. https://doi.org/10.1007/978-981-16-0575-8_8
DOI: https://doi.org/10.1007/978-981-16-0575-8_8
Publisher Name: Springer, Singapore
Print ISBN: 978-981-16-0574-1
Online ISBN: 978-981-16-0575-8
eBook Packages: Computer Science, Computer Science (R0)