Face detection in the operating room: comparison of state-of-the-art methods and a self-supervised approach

  • Thibaut Issenhuth
  • Vinkle Srivastav
  • Afshin Gangi
  • Nicolas Padoy
Original Article



Purpose

Face detection is a necessary component for the automatic analysis and assistance of human activities during surgical procedures. Efficient face detection algorithms can help detect and identify the persons present in the room and can also be used to anonymize the data automatically. However, current algorithms trained on natural images do not generalize well to operating room (OR) images. In this work, we provide a comparison of state-of-the-art face detectors on OR data and also present an approach to train a face detector for the OR by exploiting non-annotated OR images.


Methods

We compare six state-of-the-art face detectors on clinical data using MVOR-Faces, a dataset of multi-view OR images capturing real surgical activities. We then propose to use self-supervision, a domain adaptation method, for face detection in the OR. The approach uses non-annotated images to fine-tune a state-of-the-art detector for the OR without any human supervision.
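As an illustration of the general self-training idea behind this kind of domain adaptation, a pretrained detector can be run on non-annotated OR images and its high-confidence detections kept as pseudo-ground-truth boxes for fine-tuning. This is a minimal sketch, not the authors' exact pipeline; `detector`, `unlabeled_images`, and `confidence_threshold` are hypothetical names introduced here for clarity.

```python
def generate_pseudo_labels(detector, unlabeled_images, confidence_threshold=0.8):
    """Run a pretrained face detector on unlabeled OR images and keep only
    high-confidence detections as pseudo-ground-truth bounding boxes.

    `detector` is assumed to map an image to a list of (box, score) pairs.
    """
    pseudo_labeled = []
    for image in unlabeled_images:
        detections = detector(image)
        # Filtering by score is what removes most noisy detections before
        # the pseudo-labels are reused as training targets.
        boxes = [box for box, score in detections if score >= confidence_threshold]
        if boxes:
            pseudo_labeled.append((image, boxes))
    return pseudo_labeled
```

The detector would then be fine-tuned on these (image, boxes) pairs exactly as if the pseudo-labels were human annotations, so no manual supervision is needed on the target domain.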


Results

The results show that the best model, namely the tiny face detector, yields an average precision of 0.556 at an intersection over union (IoU) threshold of 0.5. Our self-supervised model trained on non-annotated clinical data outperforms this result by 9.2%.
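For readers unfamiliar with the evaluation metric, intersection over union measures the overlap between a predicted box and a ground-truth box; a detection typically counts as a true positive when IoU ≥ 0.5. A minimal implementation, assuming boxes given as (x1, y1, x2, y2) corner coordinates:

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    # Clamp to zero when the boxes do not overlap at all.
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# Two 10x10 boxes sharing half their width overlap by 50/150 ≈ 0.33,
# below the 0.5 threshold used in the paper's evaluation.
print(iou((0, 0, 10, 10), (5, 0, 15, 10)))
```

Average precision is then computed from the precision-recall curve of detections matched to ground truth at this IoU threshold.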


Conclusion

We present the first comparison of state-of-the-art face detectors on OR images and show that results can be significantly improved by using self-supervision on non-annotated data.


Keywords: Face detection · Semi-supervised learning · MVOR-Faces dataset · Visual domain adaptation · Operating room



Acknowledgements

This work was supported by French state funds managed by the ANR within the Investissements d'Avenir program under references ANR-16-CE33-0009 (DeepSurg), ANR-11-LABX-0004 (Labex CAMI) and ANR-10-IDEX-0002-02 (IdEx Unistra). The authors would also like to thank the members of the Interventional Radiology Department at University Hospital of Strasbourg for their help in generating the dataset.

Compliance with ethical standards

Conflict of interest

The authors declare that they have no conflict of interest.

Ethical standard

All procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Helsinki declaration and its later amendments or comparable ethical standards.

Informed consent

Informed consent was obtained from all patients for being included in the study.



Copyright information

© CARS 2019

Authors and Affiliations

  1. ICube, University of Strasbourg, CNRS, IHU Strasbourg, Strasbourg, France
  2. Radiology Department, University Hospital of Strasbourg, Strasbourg, France
