Abstract
We describe the design and validation of a vision-based system for the dynamic recognition of ramp signals performed by airport ground staff. Such a recognizer increases the autonomy of unmanned vehicles and prevents errors caused by visual misinterpretation or lapses of attention by the pilots of manned vehicles. The system is based on supervised machine learning techniques, trained on our own dataset, and comprises two models. The first model combines a pre-trained Convolutional Pose Machine with a classifier, for which we evaluated two options: a Random Forest and a Multi-Layer Perceptron. The second model is a single Convolutional Neural Network that classifies the gestures directly from the raw images. In our experiments, the first model proved more accurate and scalable than the second. Its strength lies in its greater capacity to extract information from the images, transforming the pixel domain into spatial vectors and thereby increasing the robustness of the classification layer. The second model, in turn, is better suited to gesture identification in low-visibility environments, such as night operations, under which the first model struggled to segment the operator's silhouette. Our results support the use of supervised learning and computer vision techniques for the correct identification and classification of ramp hand signals performed by airport marshallers.
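To make the first model's two-stage architecture concrete, the minimal sketch below shows how body keypoints produced by a pose estimator (such as a Convolutional Pose Machine) could be flattened into spatial vectors and classified with a Random Forest. This is an illustration under assumed placeholders, not the authors' implementation: the joint count, the number of gesture classes, and the randomly generated arrays stand in for real pose data and labels.

# Sketch of the first model's classification stage (hypothetical data).
# Keypoints from a pre-trained pose estimator replace raw pixels:
# each sample becomes a flat vector of (x, y) joint coordinates.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_samples, n_joints = 200, 14                     # 14 joints: assumed, not from the paper
keypoints = rng.random((n_samples, n_joints, 2))  # (x, y) per joint, placeholder data
labels = rng.integers(0, 9, size=n_samples)       # 9 signal classes: assumed

X = keypoints.reshape(n_samples, -1)              # pixel domain -> spatial vectors
X_train, X_test, y_train, y_test = train_test_split(X, labels, random_state=0)

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)
print("held-out accuracy:", clf.score(X_test, y_test))

A Multi-Layer Perceptron (e.g. scikit-learn's MLPClassifier) could be swapped in for the Random Forest without changing the rest of the pipeline; in both cases the classifier operates on spatial vectors rather than raw pixels, which is what the abstract credits for the first model's robustness.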
Data Availability
A repository with the dataset, the software, the figures, and the demonstration videos is publicly available at: https://github.com/astromaf/ramp_hand_signals_recognition
Change history
20 May 2023
Missing Open Access funding information has been added in the Funding Note.
Funding
Open Access funding provided thanks to the CRUE-CSIC agreement with Springer Nature. The authors declare that no funds, grants, or other support were received during the preparation of this manuscript.
Author information
Contributions
Miguel Ángel de Frutos (MAdF) initiated this project during his Master of Science at the Universidad Internacional de la Rioja (UNIR) and carried out the subsequent data analysis and manuscript preparation while pursuing his doctoral research at the Universidad Politécnica de Madrid (UPM). Fernando López Hernández (UCM) and J. Javier Rainer (UNIR) supervised the preparation of the manuscript. All authors discussed and approved the final manuscript.
Ethics declarations
Ethics Approval
The authors declare that this work is original and does not include experiments with animals.
Consent to Participate
All individuals participating in the study provided informed consent. The captured information has nonetheless been adequately anonymized.
Consent for Publication
The participants in the experiments provided informed consent for publication of the related images. Nevertheless, their faces or any other biometric data cannot be recognised in the relevant images.
Conflict of Interest
The authors have no relevant financial or non-financial interests to disclose.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Categories (6), (7).
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
de Frutos Carro, M.Á., López Hernández, F.C. & Granados, J.J.R. Real-Time Visual Recognition of Ramp Hand Signals for UAS Ground Operations. J Intell Robot Syst 107, 44 (2023). https://doi.org/10.1007/s10846-023-01832-3