Abstract
Deep learning has achieved impressive results in many machine learning tasks such as image recognition and computer vision. Its applicability to supervised problems is, however, constrained by the availability of high-quality training data, which typically consists of very large numbers (e.g., millions) of human-annotated examples. To overcome this limitation, the AI community has increasingly turned to artificially generated images and video sequences produced with photo-realistic rendering engines, such as those used in entertainment applications. In this way, large sets of training images can be created cheaply to train deep learning algorithms. In this paper, we generate photo-realistic synthetic image sets to train deep learning models to recognize the correct use of personal protective equipment (e.g., worker safety helmets, high-visibility vests, ear protection devices) during at-risk work activities. We then perform domain adaptation to real-world images using a very small set of real examples. We show that training on the generated synthetic set, followed by this domain adaptation phase, is an effective solution for applications where no real-world training set is available.
Acknowledgements
This work was partially supported by “Automatic Data and documents Analysis to enhance human-based processes” (ADA), funded by CUP CIPE D55F17000290009, and by the AI4EU project, funded by EC (H2020 - Contract n. 825619). We gratefully acknowledge the support of NVIDIA Corporation with the donation of a Jetson TX2 board used for this research.
Di Benedetto, M., Carrara, F., Meloni, E. et al. Learning accurate personal protective equipment detection from virtual worlds. Multimed Tools Appl 80, 23241–23253 (2021). https://doi.org/10.1007/s11042-020-09597-9