Hand Gesture Detection with Convolutional Neural Networks

  • Samer Alashhab
  • Antonio-Javier Gallego
  • Miguel Ángel Lozano
Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 800)


In this paper, we present a Deep Learning-based method for locating and recognizing hand gestures in images. Our goal is to provide an intuitive and accessible way to interact with Computer Vision-based mobile applications aimed at assisting visually impaired people (e.g., pointing a finger at an object in a real scene to zoom in for a close-up of that object). First, we defined a set of hand gestures that can be assigned to different actions. Next, we created a database of images corresponding to these gestures. Finally, we used this database to train Neural Networks with different topologies, testing different input sizes, weight initializations, and data augmentation processes. In our experiments, we obtained high accuracy in both localization (96%–100%) and recognition (99.45%) with networks suitable for porting to mobile devices.
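The experimental factors named in the abstract (input size, weight initialization, and data augmentation) can be pictured as a grid of training configurations. The following is only a minimal sketch of how such a grid might be enumerated. The concrete values, the names `INPUT_SIZES`, `WEIGHT_INITS`, `AUGMENTATION`, and the helper `experiment_grid` are hypothetical placeholders, not settings reported by the authors.

```python
from itertools import product

# Hypothetical values: the abstract names these three factors
# but does not state which settings were actually tested.
INPUT_SIZES = [(64, 64), (128, 128), (224, 224)]
WEIGHT_INITS = ["random", "pretrained"]
AUGMENTATION = [False, True]


def experiment_grid(input_sizes, weight_inits, augmentation):
    """Return one configuration dict per combination of the three factors."""
    return [
        {"input_size": size, "weight_init": init, "augment": aug}
        for size, init, aug in product(input_sizes, weight_inits, augmentation)
    ]


configs = experiment_grid(INPUT_SIZES, WEIGHT_INITS, AUGMENTATION)
for cfg in configs:
    # Each entry describes one network that would be trained and evaluated.
    print(cfg)
```

Each resulting dictionary would correspond to one training run, so the localization and recognition accuracies can be compared across runs that differ in a single factor.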


Hand Gestures, Convolutional Neural Network (CNN), Data Augmentation Process, Input Size, MobileNet



This work was partially supported by the project TIN2015-69077-P of the Spanish Government.



Copyright information

© Springer International Publishing AG, part of Springer Nature 2019

Authors and Affiliations

  • Samer Alashhab (1, corresponding author)
  • Antonio-Javier Gallego (1)
  • Miguel Ángel Lozano (2)

  1. Department of Software and Computing Systems, University of Alicante, Alicante, Spain
  2. Department of Computer Science and AI, University of Alicante, Alicante, Spain
