Abstract
Purpose of Review
We investigate the first use of deep networks for victim identification in Urban Search and Rescue (USAR). Moreover, we provide the first experimental comparison of single-stage and two-stage networks for body part detection, under partial occlusions and varying illumination, on an RGB-D dataset obtained by a mobile robot navigating cluttered USAR-like environments.
Recent Findings
We considered the single-stage detectors Single Shot MultiBox Detector (SSD), You Only Look Once (YOLO), and RetinaNet, and the two-stage Feature Pyramid Network (FPN) detector. Experimental results show that RetinaNet achieves the highest mean average precision (77.66%) and recall (86.98%) for detecting victims with body part occlusions under different lighting conditions.
Summary
End-to-end deep networks can be used to find victims in USAR by autonomously extracting RGB-D image features from sensory data. We show that RetinaNet using RGB-D input is robust to body part occlusions and low-lighting conditions and outperforms the other detectors regardless of the image input type.
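The mean average precision and recall figures above follow the standard IoU-matching protocol for detection benchmarks. A minimal sketch of the underlying precision/recall computation at a single confidence threshold, assuming axis-aligned boxes in (x1, y1, x2, y2) form and an illustrative IoU threshold of 0.5 (this is a generic sketch, not the authors' evaluation code):

```python
def iou(a, b):
    # Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def precision_recall(detections, ground_truth, iou_thresh=0.5):
    # detections: list of (score, box); ground_truth: list of boxes.
    # Each ground-truth box may be matched by at most one detection,
    # assigned greedily in order of decreasing confidence.
    matched = set()
    tp = 0
    for score, box in sorted(detections, key=lambda d: -d[0]):
        best, best_iou = None, iou_thresh
        for i, gt in enumerate(ground_truth):
            if i in matched:
                continue
            overlap = iou(box, gt)
            if overlap >= best_iou:
                best, best_iou = i, overlap
        if best is not None:
            matched.add(best)
            tp += 1
    fp = len(detections) - tp
    fn = len(ground_truth) - tp
    precision = tp / (tp + fp) if detections else 0.0
    recall = tp / (tp + fn) if ground_truth else 0.0
    return precision, recall
```

Average precision then summarizes precision over all recall levels as the confidence threshold sweeps, and mAP averages that over classes.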
References
Papers of particular interest, published recently, have been highlighted as: • Of importance
Louie W-YG, Nejat G. A victim identification methodology for rescue robots operating in cluttered USAR environments. Adv Robot. 2013;27:373–84. https://doi.org/10.1080/01691864.2013.763743.
Hui N, Li-gang C, Ya-zhou T, Yue W. Research on human body detection methods based on the head features on the disaster scenes. In: 2010 3rd International Symposium on Systems and Control in Aeronautics and Astronautics. Harbin, China; 2010. p. 380–5.
Nguyen DT, Li W, Ogunbona PO. Human detection from images and videos: a survey. Pattern Recogn. 2016;51:148–75. https://doi.org/10.1016/j.patcog.2015.08.027.
Kadkhodamohammadi A, Gangi A, de Mathelin M, Padoy N. A multi-view RGB-D approach for human pose estimation in operating rooms. In: 2017 IEEE Winter Conference on Applications of Computer Vision (WACV). Santa Rosa; 2017. p. 363–72.
Li H, Liu J, Zhang G, et al. Multi-glimpse LSTM with color-depth feature fusion for human detection. In: IEEE International Conference on Image Processing. Beijing; 2017.
Pishchulin L, Insafutdinov E, Tang S, et al. DeepCut: joint subset partition and labeling for multi person pose estimation. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Las Vegas; 2016. p. 4929–37.
Insafutdinov E, Pishchulin L, Andres B, et al. DeeperCut: a deeper, stronger, and faster multi-person pose estimation model. In: Computer Vision – ECCV 2016. Cham: Springer; 2016. p. 34–50.
Iqbal U, Gall J. Multi-person pose estimation with local joint-to-person associations. In: Hua G, Jégou H, editors. Computer Vision – ECCV 2016 Workshops. Cham: Springer International Publishing; 2016. p. 627–42.
Papandreou G, Zhu T, Kanazawa N, et al. Towards accurate multi-person pose estimation in the wild. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR); 2017. p. 3711–9.
Liu Y, Nejat G. Multirobot cooperative learning for semiautonomous control in urban search and rescue applications. J Field Robot. 2016;33:512–36. https://doi.org/10.1002/rob.21597.
Doroodgar B, Liu Y, Nejat G. A learning-based semi-autonomous controller for robotic exploration of unknown disaster scenes while searching for victims. IEEE Trans Cybern. 2014;44:2719–32. https://doi.org/10.1109/TCYB.2014.2314294.
Zhang K, Niroui F, Ficocelli M, Nejat G. Robot navigation of environments with unknown rough terrain using deep reinforcement learning. In: 2018 IEEE International Symposium on Safety, Security, and Rescue Robotics (SSRR); 2018. p. 1–7.
Zhang Z, Nejat G, Guo H, Huang P. A novel 3D sensory system for robot-assisted mapping of cluttered urban search and rescue environments. Intell Serv Robot. 2011;4:119–34. https://doi.org/10.1007/s11370-010-0082-3.
Zhang Z, Nejat G. Intelligent sensing systems for rescue robots: landmark identification and three-dimensional mapping of unknown cluttered urban search and rescue environments. Adv Robot. 2009;23:1179–98. https://doi.org/10.1163/156855309X452511.
Zhang Z, Guo H, Nejat G, Huang P. Finding disaster victims: a sensory system for robot-assisted 3D mapping of urban search and rescue environments. In: Proceedings 2007 IEEE International Conference on Robotics and Automation; 2007. p. 3889–94.
Shamroukh R, Awad F. Detection of surviving humans in destructed environments using a simulated autonomous robot. In: 2009 6th International Symposium on Mechatronics and its Applications. Sharjah; 2009. p. 1–6.
Shu G, Dehghan A, Oreifej O, et al. Part-based multiple-person tracking with partial occlusion handling. In: 2012 IEEE Conference on Computer Vision and Pattern Recognition. Providence; 2012. p. 1815–21.
Liu J, Zhang G, Liu Y, Tian L, Chen YQ. An ultra-fast human detection method for color-depth camera. J Vis Commun Image Represent. 2015;31:177–85. https://doi.org/10.1016/j.jvcir.2015.06.014.
Ren S, He K, Girshick R, Sun J. Faster R-CNN: towards real-time object detection with region proposal networks. In: Advances in Neural Information Processing Systems; 2015.
Johnson S, Everingham M. Clustered pose and nonlinear appearance models for human pose estimation. In: Proceedings of the British Machine Vision Conference 2010. Aberystwyth: British Machine Vision Association; 2010. p. 12.1–12.11.
Pishchulin L, Andriluka M, Schiele B. Fine-grained activity recognition with holistic and pose based features. 2014. arXiv:1406.1881 [cs].
He K, Zhang X, Ren S, Sun J. Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR); 2016. p. 770–8.
Lin T-Y, Maire M, Belongie S, et al. Microsoft COCO: common objects in context. In: Fleet D, Pajdla T, Schiele B, Tuytelaars T, editors. Computer Vision – ECCV 2014. Cham: Springer International Publishing; 2014. p. 740–55.
Wang X, Hu J, Jin Y, et al. Human pose estimation via deep part detection. In: Zhai G, Zhou J, Yang X, editors. Digital TV and Wireless Multimedia Communication. Singapore: Springer Singapore; 2018. p. 55–66.
Liu W, Anguelov D, Erhan D, Szegedy C, Reed S, Fu CY, et al. SSD: single shot MultiBox detector. In: Leibe B, Matas J, Sebe N, Welling M, editors. Computer Vision – ECCV 2016. Cham: Springer International Publishing; 2016. p. 21–37.
Panteleris P, Oikonomidis I, Argyros A. Using a single RGB frame for real time 3D hand pose estimation in the wild. 2017. arXiv:1712.03866 [cs].
Deng J, Dong W, Socher R, et al. ImageNet: a large-scale hierarchical image database. In: 2009 IEEE Conference on Computer Vision and Pattern Recognition; 2009. p. 248–55.
Güler RA, Neverova N, Kokkinos I. DensePose: dense human pose estimation in the wild. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2018. p. 7297–306.
Li X, Yang L, Song Q, Zhou F. Detector-in-detector: multi-level analysis for human-parts. 2019. arXiv:1902.07017 [cs].
• Lin T-Y, Goyal P, Girshick RB, et al. Focal loss for dense object detection. In: 2017 IEEE International Conference on Computer Vision (ICCV); 2017. p. 2999–3007. This paper introduces RetinaNet, a single-stage object detector that uses focal loss to focus training on hard examples.
Liu L, Ouyang W, Wang X, et al. Deep learning for generic object detection: a survey. 2018. arXiv:1809.02165 [cs].
• Lin T-Y, Dollar P, Girshick R, et al. Feature pyramid networks for object detection. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Honolulu: IEEE; 2017. p. 936–44. This paper introduces Feature Pyramid Networks, a two-stage detector that uses lateral connections to build high-level semantic feature maps at all scales.
Girshick R, Donahue J, Darrell T, Malik J. Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation. In: Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition. Washington: IEEE Computer Society; 2014. p. 580–7.
Gkioxari G, Girshick R, Dollár P, He K. Detecting and recognizing human-object interactions. 2017. arXiv:1704.07333 [cs].
Lan X, Zhu X, Gong S. Person search by multi-scale matching. 2018. arXiv:1807.08582 [cs].
Redmon J, Farhadi A. YOLO9000: better, faster, stronger. 2016. arXiv:1612.08242 [cs].
• Redmon J, Farhadi A. YOLOv3: an incremental improvement. 2018. arXiv:1804.02767 [cs]. This paper introduces YOLOv3, an improvement of YOLOv2 using residual blocks and feature pyramids.
Kato S, Takeuchi E, Ishiguro Y, Ninomiya Y, Takeda K, Hamada T. An open approach to autonomous vehicles. IEEE Micro. 2015;35:60–8. https://doi.org/10.1109/MM.2015.133.
Computing Platforms Federated Laboratory. Autoware: open-source to self-driving. GitHub repository; 2018.
Thakar V, Saini H, Ahmed W, et al. Efficient single-shot Multibox detector for construction site monitoring. 2018. arXiv:1808.05730 [cs].
Simonyan K, Zisserman A. Very deep convolutional networks for large-scale image recognition. 2014. arXiv:1409.1556 [cs].
Everingham M, Van Gool L, Williams CKI, et al. The Pascal visual object classes (VOC) challenge. Int J Comput Vis. 2010;88:303–38. https://doi.org/10.1007/s11263-009-0275-4.
COCO - Common Objects in Context. http://cocodataset.org/#detection-leaderboard. Accessed 9 Oct 2018.
Non-max suppression – object detection. Coursera, Convolutional Neural Networks. https://www.coursera.org/lecture/convolutional-neural-networks/non-max-suppression-dvrjH. Accessed 14 Nov 2018.
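The non-max suppression referenced in the last entry above is the post-processing step all of the compared detectors use to discard duplicate boxes. A minimal greedy sketch, assuming boxes as (x1, y1, x2, y2) tuples and an illustrative IoU threshold (not the implementation used in the reviewed paper):

```python
def nms(boxes, scores, iou_thresh=0.5):
    # Greedy non-max suppression: repeatedly keep the highest-scoring box
    # and drop any remaining box whose IoU with it exceeds the threshold.
    def iou(a, b):
        ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
        ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
        inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
        union = ((a[2] - a[0]) * (a[3] - a[1])
                 + (b[2] - b[0]) * (b[3] - b[1]) - inter)
        return inter / union
    order = sorted(range(len(boxes)), key=lambda i: -scores[i])
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_thresh]
    return keep
```

For example, two heavily overlapping detections of the same victim collapse to the single higher-confidence box, while a distant detection survives untouched.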
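As a companion to the highlighted RetinaNet entry above, the focal loss it describes can be sketched for the binary case as follows, with the defaults γ = 2 and α = 0.25 reported in that paper (a generic illustration, not the reviewed implementation):

```python
import math

def focal_loss(p, y, gamma=2.0, alpha=0.25):
    # Binary focal loss: FL(p_t) = -alpha_t * (1 - p_t)**gamma * log(p_t),
    # where p is the predicted probability of the positive class and y is
    # the label in {0, 1}. The (1 - p_t)**gamma factor down-weights
    # well-classified examples so training concentrates on hard ones.
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)
```

With γ = 0 and balanced α this reduces to weighted cross-entropy; as γ grows, confident correct predictions contribute vanishingly little loss, which is what makes dense single-stage training tractable despite the extreme foreground/background imbalance.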
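Similarly, the lateral connections noted in the highlighted Feature Pyramid Networks entry merge a coarse, semantically strong map with a finer backbone map. A minimal NumPy sketch of one top-down merge step, where a plain per-pixel linear projection stands in for the 1×1 lateral convolution and all shapes and names are illustrative:

```python
import numpy as np

def upsample2x(feat):
    # Nearest-neighbor 2x spatial upsampling of an (H, W, C) feature map.
    return feat.repeat(2, axis=0).repeat(2, axis=1)

def merge_level(coarse, lateral, proj):
    # One FPN-style top-down merge: upsample the coarser, semantically
    # stronger map and add the laterally projected backbone map. `proj`
    # plays the role of the 1x1 conv that aligns channel counts.
    return upsample2x(coarse) + lateral @ proj

rng = np.random.default_rng(0)
coarse = rng.normal(size=(2, 2, 8))      # coarse pyramid level (toy sizes)
backbone = rng.normal(size=(4, 4, 16))   # finer backbone feature map
proj = rng.normal(size=(16, 8))          # illustrative lateral projection
merged = merge_level(coarse, backbone, proj)
print(merged.shape)  # (4, 4, 8)
```

Repeating this merge at each pyramid level yields feature maps that are both high-resolution and semantically rich, which is what lets one detection head operate at all scales.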
Funding
This research was supported by the Natural Sciences and Engineering Research Council of Canada (NSERC), the Canada Research Chairs (CRC) Program, and the NVIDIA GPU grant.
Author information
Contributions
The authors Angus Fung and Long Yu Wang contributed equally to this work.
Ethics declarations
Conflict of Interest
The authors declare that they have no conflict of interest.
Human and Animal Rights and Informed Consent
This article does not contain any studies with human or animal subjects performed by any of the authors.
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
This article is part of the Topical Collection on Defense, Military, and Surveillance Robotics
About this article
Cite this article
Fung, A., Wang, L.Y., Zhang, K. et al. Using Deep Learning to Find Victims in Unknown Cluttered Urban Search and Rescue Environments. Curr Robot Rep 1, 105–115 (2020). https://doi.org/10.1007/s43154-020-00011-8