Efficacy of Model Fine-Tuning for Personalized Dynamic Gesture Recognition

  • Conference paper
Deep Learning for Human Activity Recognition (DL-HAR 2021)

Abstract

Dynamic hand gestures are usually unique to individual users in terms of the style, speed, and magnitude with which they are performed. A gesture recognition model trained with data from a group of users may not generalize well to unseen users, and its performance is likely to vary from user to user. To address these issues, this paper investigates fine-tuning a global model locally on user-specific data to personalize dynamic hand gesture recognition. Through comprehensive experiments with state-of-the-art convolutional neural network architectures for video recognition, we evaluate the impact of four choices on personalization performance: fine-tuning the earlier versus the later layers of the network, the number of user-specific training samples, the batch size, and the learning rate. The user-specific data is collected from 11 users performing 7 gesture classes. Our findings show that, with proper selection of the fine-tuning strategy and hyperparameters, personalized models can achieve improved performance for all users by fine-tuning only a small portion of the network weights and using very few labeled user-specific training samples.
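The idea of fine-tuning only a small portion of the weights on a few user samples can be illustrated with a toy sketch. This is not the authors' implementation, and it does not use the video CNNs studied in the paper: it assumes a tiny two-stage model in pure Python, where a fixed linear map with ReLU stands in for the frozen earlier layers and a softmax classifier head stands in for the fine-tuned later layers. The names `relu_features`, `fine_tune_head`, and `predict`, and the default values of `lr`, `batch_size`, and `epochs`, are hypothetical and chosen only for illustration.

```python
import math
import random


def relu_features(x, frozen_weights):
    """Stand-in for the frozen earlier layers: fixed linear map + ReLU."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x)))
            for row in frozen_weights]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fine_tune_head(head, frozen_weights, samples,
                   lr=0.1, batch_size=2, epochs=50, seed=0):
    """Mini-batch SGD on cross-entropy, updating ONLY the classifier head.

    `samples` is a small list of (input_vector, class_index) pairs,
    standing in for a user's few labeled gestures (assumption).
    """
    rng = random.Random(seed)
    head = [row[:] for row in head]  # copy so the global model is untouched
    data = list(samples)
    for _ in range(epochs):
        rng.shuffle(data)
        for start in range(0, len(data), batch_size):
            batch = data[start:start + batch_size]
            grad = [[0.0] * len(head[0]) for _ in head]
            for x, label in batch:
                feats = relu_features(x, frozen_weights)
                logits = [sum(w * f for w, f in zip(row, feats))
                          for row in head]
                probs = softmax(logits)
                for c, p in enumerate(probs):
                    err = p - (1.0 if c == label else 0.0)
                    for j, f in enumerate(feats):
                        grad[c][j] += err * f / len(batch)
            # Gradient step on the head only; frozen_weights never change.
            for c in range(len(head)):
                for j in range(len(head[c])):
                    head[c][j] -= lr * grad[c][j]
    return head


def predict(x, frozen_weights, head):
    """Return the index of the highest-scoring class."""
    feats = relu_features(x, frozen_weights)
    logits = [sum(w * f for w, f in zip(row, feats)) for row in head]
    return max(range(len(logits)), key=logits.__getitem__)


# Toy usage: two classes, four "user-specific" samples (made-up data).
frozen = [[1.0, 0.0], [0.0, 1.0]]
global_head = [[0.0, 0.0], [0.0, 0.0]]
user_samples = [([1.0, 0.0], 0), ([0.0, 1.0], 1),
                ([0.9, 0.1], 0), ([0.2, 0.8], 1)]
personal_head = fine_tune_head(global_head, frozen, user_samples)
```

In this sketch the three hyperparameters named in the abstract (number of user samples, batch size, learning rate) appear as `len(samples)`, `batch_size`, and `lr`. The choice of which layers to update is reduced to its simplest form: the feature stage is always frozen and only the head is trained.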

Author information

Corresponding author

Correspondence to Junyao Guo.

Copyright information

© 2021 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Guo, J., Kurup, U., Shah, M. (2021). Efficacy of Model Fine-Tuning for Personalized Dynamic Gesture Recognition. In: Li, X., Wu, M., Chen, Z., Zhang, L. (eds) Deep Learning for Human Activity Recognition. DL-HAR 2021. Communications in Computer and Information Science, vol 1370. Springer, Singapore. https://doi.org/10.1007/978-981-16-0575-8_8

  • DOI: https://doi.org/10.1007/978-981-16-0575-8_8

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-16-0574-1

  • Online ISBN: 978-981-16-0575-8

  • eBook Packages: Computer Science, Computer Science (R0)
