Face detection in the operating room: comparison of state-of-the-art methods and a self-supervised approach
Face detection is a necessary component for the automatic analysis of, and assistance with, human activities during surgical procedures. Efficient face detection algorithms can help detect and identify the people present in the room and can also be used to anonymize the data automatically. However, current algorithms trained on natural images do not generalize well to operating room (OR) images. In this work, we provide a comparison of state-of-the-art face detectors on OR data and also present an approach to train a face detector for the OR by exploiting non-annotated OR images.
We propose a comparison of six state-of-the-art face detectors on clinical data using MVOR-Faces, a dataset of OR images capturing real surgical activities. We then propose to use self-supervision, a domain adaptation method, for the task of face detection in the OR. The approach exploits non-annotated images to fine-tune a state-of-the-art detector for the OR without any human supervision.
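The core idea of fine-tuning on non-annotated images can be sketched as pseudo-labeling: a detector pretrained on natural images is run on unlabeled OR frames, and only its confident predictions are kept as training targets. The following is a minimal illustrative sketch, not the authors' actual pipeline; the detector output, box format, and confidence threshold are all hypothetical.

```python
def select_pseudo_labels(detections, conf_threshold=0.8):
    """Keep only high-confidence detections as pseudo ground truth.

    `detections` is a list of (box, score) pairs produced by a detector
    pretrained on natural images; low-scoring boxes are discarded so that
    noisy predictions do not pollute the fine-tuning set.
    """
    return [box for box, score in detections if score >= conf_threshold]


# Hypothetical detector output on one non-annotated OR image:
detections = [((12, 40, 60, 96), 0.93), ((200, 10, 230, 44), 0.41)]
pseudo_labels = select_pseudo_labels(detections)
# The surviving boxes would then serve as targets when fine-tuning the
# detector on OR images, without any human annotation.
```

In practice, such pseudo-label selection is typically combined with aggregation over data augmentations (as in data distillation) to make the kept labels more reliable.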
The results show that the best model, namely the tiny face detector, yields an average precision of 0.556 at an intersection over union (IoU) of 0.5. Our self-supervised model using non-annotated clinical data outperforms this result by 9.2%.
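The IoU-at-0.5 criterion behind the reported average precision counts a detection as a true positive when its box overlaps a ground-truth box with IoU of at least 0.5. A minimal sketch of that overlap measure, with boxes assumed to be (x1, y1, x2, y2) corner tuples:

```python
def iou(a, b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0


# A predicted box covering half of a ground-truth box sits exactly at the
# matching threshold used in the evaluation:
print(iou((0, 0, 10, 10), (0, 0, 10, 5)))  # 0.5
```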
We present the first comparison of state-of-the-art face detectors on OR images and show that results can be significantly improved by using self-supervision on non-annotated data.
Keywords: Face detection · Semi-supervised learning · MVOR-Faces dataset · Visual domain adaptation · Operating room
This work was supported by French state funds managed by the ANR within the Investissements d’Avenir program under references ANR-16-CE33-0009 (DeepSurg), ANR-11-LABX-0004 (Labex CAMI) and ANR-10-IDEX-0002-02 (IdEx Unistra). The authors would also like to thank the members of the Interventional Radiology Department at University Hospital of Strasbourg for their help in generating the dataset.
Compliance with ethical standards
Conflict of interest
The authors declare that they have no conflict of interest.
All procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Helsinki declaration and its later amendments or comparable ethical standards.
Informed consent was obtained from all patients for being included in the study.