Deep Learning Based Hand Gesture Recognition and UAV Flight Controls

Abstract

Dynamic hand gesture recognition is a desirable alternative means of human-computer interaction. This paper presents a hand gesture recognition system designed for controlling the flight of unmanned aerial vehicles (UAVs). A data representation model is introduced that represents a dynamic gesture sequence by converting the 4-D spatiotemporal data to a 2-D matrix and a 1-D array. To train the system to recognize the designed gestures, skeleton data collected from a Leap Motion Controller are converted to two different data models. A training dataset of 9 124 samples and a testing dataset of 1 938 samples are created to train and test the three proposed deep learning neural networks: a 2-layer fully connected neural network, a 5-layer fully connected neural network and an 8-layer convolutional neural network. The static testing results show that the 2-layer fully connected neural network achieves an average accuracy of 96.7% on scaled datasets and 12.3% on non-scaled datasets. The 5-layer fully connected neural network achieves an average accuracy of 98.0% on scaled datasets and 89.1% on non-scaled datasets. The 8-layer convolutional neural network achieves an average accuracy of 89.6% on scaled datasets and 96.9% on non-scaled datasets. Testing on a drone-kit simulator and on a real drone shows that the system is feasible for drone flight control.
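The data representation described in the abstract can be illustrated with a minimal sketch. It assumes a gesture is a list of frames, each frame a list of (x, y, z) joint positions; the abstract does not give the actual layout, joint count, or scaling method. The names `to_matrix`, `to_array` and `min_max_scale` are hypothetical, and min-max scaling is only an assumed stand-in for whatever scaling the paper applies.

```python
from typing import List, Tuple

Joint = Tuple[float, float, float]   # (x, y, z) of one skeleton joint
Frame = List[Joint]                  # all joints captured at one time step


def to_matrix(frames: List[Frame]) -> List[List[float]]:
    """Flatten each frame's joints into one row: shape (T, J * 3)."""
    return [[coord for joint in frame for coord in joint] for frame in frames]


def to_array(matrix: List[List[float]]) -> List[float]:
    """Concatenate the rows of the 2-D matrix into a single 1-D array."""
    return [value for row in matrix for value in row]


def min_max_scale(values: List[float]) -> List[float]:
    """Rescale values to [0, 1] (assumed scaling, not taken from the paper)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant sequence carries no range information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Two frames with two joints each (hypothetical coordinates).
gesture = [
    [(0.0, 100.0, 0.0), (10.0, 120.0, -5.0)],
    [(5.0, 110.0, 0.0), (20.0, 140.0, -10.0)],
]
matrix = to_matrix(gesture)    # 2 rows (frames) by 6 columns (2 joints * 3)
array = to_array(matrix)       # 12 values in frame order
scaled = min_max_scale(array)  # same 12 values mapped into [0, 1]
```

In this sketch the 2-D form would suit a convolutional network and the 1-D form a fully connected one, which is one plausible reading of the "two different data models" mentioned above.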



Author information

Corresponding author

Correspondence to Jiacun Wang.

Additional information

Recommended by Associate Editor Xian-Dong Ma

Bin Hu received the B. Sc. degree in mechanical engineering from Xi’an Jiaotong University, China in 2000. He received two M. Sc. degrees in software engineering, from Xi’an Jiaotong University, China in 2005 and from Monmouth University, USA in 2018. From 2006 to 2016, he was an assistant professor at Xi’an University of Posts and Telecommunications, China. Currently, he is an adjunct professor in the Department of Computer Science at New Jersey City University, USA. He has published about 10 research papers in journals and conferences.

His research interests include software engineering, robotics, and wireless networking.

Jiacun Wang received the Ph. D. degree in computer engineering from Nanjing University of Science and Technology (NUST), China in 1991. He is currently a professor of software engineering at Monmouth University, USA. From 2001 to 2004, he was a member of the scientific staff at Nortel Networks in Richardson, USA. Prior to joining Nortel, he was a research associate in the School of Computer Science, Florida International University (FIU) at Miami, USA. Prior to joining FIU, he was an associate professor at NUST, China. He authored Timed Petri Nets: Theory and Application (Kluwer, 1998), Real-time Embedded Systems (Wiley, 2018) and Formal Methods in Computer Science (CRC Press, 2019), edited Handbook of Finite State Based Models and Applications (CRC, 2012), and has published about 90 research papers in journals and conferences. He was an Associate Editor of IEEE Transactions on Systems, Man and Cybernetics, Part C. He has served as general chair, program chair, special sessions chair or program committee member for many international conferences. He is a senior member of IEEE.

His research interests include software engineering, discrete event systems, formal methods, wireless networking, and real-time distributed systems.

About this article

Cite this article

Hu, B., Wang, J. Deep Learning Based Hand Gesture Recognition and UAV Flight Controls. Int. J. Autom. Comput. 17, 17–29 (2020). https://doi.org/10.1007/s11633-019-1194-7

Keywords

  • Deep learning
  • neural networks
  • hand gesture recognition
  • Leap Motion Controllers
  • drones