
ETM-face: effective training sample selection and multi-scale feature learning for face detection

Multimedia Tools and Applications

Abstract

In recent years, deep-learning-based face detectors have achieved promising results and have been successfully deployed in a wide range of practical applications. However, extreme appearance variations remain a major obstacle to robust and accurate face detection in the wild. To address this issue, we propose an Improved Training Sample Selection (ITSS) strategy for mining effective positive and negative samples during network training. The proposed ITSS procedure works together with face sampling during data augmentation and selects suitable positive sample centres and IoU overlaps for face detection. Moreover, we propose a Residual Feature Pyramid Fusion (RFPF) module that collects semantically robust features to improve the scale invariance of deep features and better represent faces at different feature pyramid levels. Experimental results on the FDDB and WiderFace datasets demonstrate the superiority of the proposed method over state-of-the-art approaches. Specifically, the proposed method achieves an AP of 96.9% and 96.2% on the easy and medium test sets of WiderFace, respectively.
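To make the idea of choosing positive samples by centre position and IoU overlap more concrete, the sketch below follows the general adaptive selection rule of [44], on which ITSS appears to build. It is not the ITSS procedure itself, whose details are not given in this preview. The function names, the single-level simplification, the value of `k` and the use of the population standard deviation are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def centre(box):
    """Centre point (x, y) of a box."""
    return ((box[0] + box[2]) / 2.0, (box[1] + box[3]) / 2.0)


def select_positives(gt, anchors, k=3):
    """Indices of anchors selected as positives for one ground-truth face.

    Hypothetical single-level simplification of the rule in [44]:
    candidates are the k anchors nearest to the face centre, and the IoU
    threshold is the mean plus the standard deviation of their IoUs.
    """
    gx, gy = centre(gt)

    def squared_distance(i):
        ax, ay = centre(anchors[i])
        return (ax - gx) ** 2 + (ay - gy) ** 2

    # Step 1: keep the k anchors whose centres are closest to the face centre.
    candidates = sorted(range(len(anchors)), key=squared_distance)[:k]

    # Step 2: derive an adaptive IoU threshold from the candidates themselves.
    ious = [iou(gt, anchors[i]) for i in candidates]
    threshold = mean(ious) + pstdev(ious)

    # Step 3: a positive must pass the threshold and have its centre in the face box.
    positives = []
    for i, overlap in zip(candidates, ious):
        ax, ay = centre(anchors[i])
        inside = gt[0] < ax < gt[2] and gt[1] < ay < gt[3]
        if overlap >= threshold and inside:
            positives.append(i)
    return sorted(positives)
```

Because the threshold is computed per face, a face covered only by poorly matching anchors can still receive positives, which a fixed IoU threshold would not guarantee.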

Data Availability

available

Code Availability

available

Notes

  1. This term corresponds to ‘centre-ness’ in [35].
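For reference, the centre-ness score of [35] can be computed as in the sketch below. The formula follows the definition in [35]; the function name and the choice to return 0 for points on or outside the box border are assumptions made for illustration, since [35] only defines the score for locations inside a box.

```python
import math


def centre_ness(point, box):
    """Centre-ness of a location relative to a box (x1, y1, x2, y2), as in [35].

    The score is 1 at the box centre and decays towards 0 near the border.
    """
    x, y = point
    x1, y1, x2, y2 = box
    left, right = x - x1, x2 - x
    top, bottom = y - y1, y2 - y
    # Assumed convention: locations on or outside the border score 0.
    if min(left, right, top, bottom) <= 0:
        return 0.0
    horizontal = min(left, right) / max(left, right)
    vertical = min(top, bottom) / max(top, bottom)
    return math.sqrt(horizontal * vertical)
```

For a box `(0, 0, 10, 10)`, the centre `(5, 5)` scores 1.0 and the off-centre point `(2, 5)` scores 0.5.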

References

  1. Cai Z, Vasconcelos N (2018) Cascade r-cnn: delving into high quality object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 6154–6162

  2. Chen P-Y, Hsieh J-W, Wang C-Y, Liao H-YM, Gochoo M (2019) Residual bi-fusion feature pyramid network for accurate single-shot object detection arXiv:1911.12051

  3. Chen K, Li J, Lin W, See J, Wang J, Duan L, Chen Z, He C, Zou J (2019) Towards accurate one-stage object detection with ap-loss. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 5119–5127

  4. Chi C, Zhang S, Xing J, Lei Z, Li SZ, Zou X (2019) Selective refinement network for high performance face detection. In: Proceedings of the AAAI conference on artificial intelligence, pp 8231–8238

  5. Dargan S, Kumar M, Ayyagari MR, Kumar G (2020) A survey of deep learning and its applications: a new paradigm to machine learning. Archives of Computational Methods in Engineering 27(4):1071–1092

  6. Deng J, Guo J, Ververas E, Kotsia I, Zafeiriou S (2020) Retinaface: single-shot multi-level face localisation in the wild. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 5203–5212

  7. Feng Z-H, Kittler J, Awais M, Huber P, Wu X-J (2018) Wing loss for robust facial landmark localisation with convolutional neural networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 2235–2245

  8. Gao S-H, Cheng M-M, Zhao K, Zhang X-Y, Yang M-H, Torr P (2019) Res2net: a new multi-scale backbone architecture. IEEE Trans Pattern Anal Mach Intell 43(2):652–662

  9. Girshick R (2015) Fast r-cnn. In: Proceedings of the IEEE international conference on computer vision (ICCV), pp 1440–1448

  10. Girshick R, Donahue J, Darrell T, Malik J (2014) Rich feature hierarchies for accurate object detection and semantic segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 580–587

  11. Guo C, Fan B, Zhang Q, Xiang S, Pan C (2020) Augfpn: improving multi-scale feature learning for object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 12595–12604

  12. He K, Gkioxari G, Dollár P, Girshick R (2017) Mask r-cnn. In: Proceedings of the IEEE international conference on computer vision (ICCV), pp 2961–2969

  13. He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 770–778

  14. Hendrycks D, Gimpel K (2016) Gaussian error linear units (gelus) arXiv:1606.08415

  15. Huang L, Yang Y, Deng Y, Yu Y (2015) Densebox: unifying landmark localization with end to end object detection arXiv:1509.04874

  16. Jain V, Learned-Miller E (2010) A benchmark for face detection in unconstrained settings. Technical report, UMass Amherst

  17. Ke W, Zhang T, Huang Z, Ye Q, Liu J, Huang D (2020) Multiple anchor learning for visual object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 10206–10215

  18. Kong T, Sun F, Liu H, Jiang Y, Li L, Shi J (2020) Foveabox: Beyound anchor-based object detection. IEEE Trans Image Process 29:7389–7398

  19. Kong T, Yao A, Chen Y, Sun F (2016) Hypernet: towards accurate region proposal generation and joint object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 845–853

  20. Li J, Wang Y, Wang C, Tai Y, Qian J, Yang J, Wang C, Li J, Huang F (2019) Dsfd: dual shot face detector. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 5060–5069

  21. Lin T-Y, Dollár P, Girshick R, He K, Hariharan B, Belongie S (2017) Feature pyramid networks for object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 2117–2125

  22. Lin T-Y, Goyal P, Girshick R, He K, Dollár P (2017) Focal loss for dense object detection. In: Proceedings of the IEEE international conference on computer vision (ICCV), pp 2980–2988

  23. Lin T-Y, Maire M, Belongie S, Hays J, Perona P, Ramanan D, Dollár P, Zitnick CL (2014) Microsoft coco: common objects in context. In: Proceedings of the european conference on computer vision (ECCV), pp 740–755

  24. Liu W, Anguelov D, Erhan D, Szegedy C, Reed S, Fu C-Y, Berg AC (2016) Ssd: single shot multibox detector. In: Proceedings of the european conference on computer vision (ECCV), pp 21–37

  25. Liu W, Liao S, Ren W, Hu W, Yu Y (2019) High-level semantic feature detection: A new perspective for pedestrian detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 5187–5196

  26. Liu S, Qi L, Qin H, Shi J, Jia J (2018) Path aggregation network for instance segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 8759–8768

  27. Liu Y, Tang X (2020) Bfbox: searching face-appropriate backbone and feature pyramid network for face detector. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 13568–13577

  28. Liu Y, Tang X, Han J, Liu J, Rui D, Wu X (2020) Hambox: delving into mining high-quality anchors on face detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 13043–13051

  29. Najibi M, Samangouei P, Chellappa R, Davis LS (2017) Ssh: single stage headless face detector. In: Proceedings of the IEEE international conference on computer vision (ICCV), pp 4875–4884

  30. Redmon J, Divvala S, Girshick R, Farhadi A (2016) You only look once: unified, real-time object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 779–788

  31. Redmon J, Farhadi A (2017) Yolo9000: better, faster, stronger. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 7263–7271

  32. Redmon J, Farhadi A (2018) Yolov3: An incremental improvement. arXiv:1804.02767

  33. Tan M, Pang R, Le QV (2020) Efficientdet: scalable and efficient object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 10781–10790

  34. Tang X, Du DK, He Z, Liu J (2018) Pyramidbox: a context-assisted single shot face detector. In: Proceedings of the european conference on computer vision (ECCV), pp 797–813

  35. Tian Z, Shen C, Chen H, He T (2019) Fcos: fully convolutional one-stage object detection. In: Proceedings of the IEEE international conference on computer vision (ICCV), pp 9627–9636

  36. Wang C, Luo Z, Zhong Z, Li S (2021) Safd: single shot anchor free face detector. Multimed Tools Appl 80(9):13761–13785

  37. Xiao Y, Cao D, Gao L (2020) Face detection based on occlusion area detection and recovery. Multimed Tools Appl 79(23):16531–16546

  38. Xiao Y, Tian Z, Yu J, Zhang Y, Liu S, Du S, Lan X (2020) A review of object detection based on deep learning. Multimed Tools Appl 79(33):23729–23791

  39. Yang S, Luo P, Loy C-C, Tang X (2016) Wider face: a face detection benchmark. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 5525–5533

  40. Yang W, Zhou L, Li T, Wang H (2019) A face detection method based on cascade convolutional neural network. Multimed Tools Appl 78(17):24373–24390

  41. Yashunin D, Baydasov T, Vlasov R (2020) Maskface: multi-task face and landmark detector arXiv:2005.09412

  42. Yu J, Jiang Y, Wang Z, Cao Z, Huang T (2016) Unitbox: an advanced object detection network. In: Proceedings of the 24th ACM international conference on multimedia, pp 516–520

  43. Zhang S, Chi C, Lei Z, Li SZ (2020) Refineface: refinement neural network for high performance face detection. IEEE Trans Pattern Anal Mach Intell 43(11):4008–4020

  44. Zhang S, Chi C, Yao Y, Lei Z, Li SZ (2020) Bridging the gap between anchor-based and anchor-free detection via adaptive training sample selection. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 9759–9768

  45. Zhang F, Fan X, Ai G, Song J, Qin Y, Wu J (2019) Accurate face detection for high performance arXiv:1905.01585

  46. Zhang B, Li J, Wang Y, Tai Y, Wang C, Li J, Huang F, Xia Y, Pei W, Ji R (2020) Asfd: automatic and scalable face detector arXiv:2003.11228

  47. Zhang X, Wan F, Liu C, Ji R, Ye Q (2019) Freeanchor: learning to match anchors for visual object detection. In: Proceedings of the 33rd international conference on neural information processing systems, pp 147–155

  48. Zhang S, Wen L, Bian X, Lei Z, Li SZ (2018) Single-shot refinement neural network for object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 4203–4212

  49. Zhang S, Zhu X, Lei Z, Shi H, Wang X, Li SZ (2017) S3fd: single shot scale-invariant face detector. In: Proceedings of the IEEE international conference on computer vision (ICCV), pp 192–201

  50. Zheng Z, Wang P, Liu W, Li J, Ye R, Ren D (2020) Distance-iou loss: faster and better learning for bounding box regression. In: Proceedings of the AAAI conference on artificial intelligence, pp 12993–13000

  51. Zheng Z, Wang P, Ren D, Liu W, Ye R, Hu Q, Zuo W (2020) Enhancing geometric factors in model learning and inference for object detection and instance segmentation arXiv:2005.03572

  52. Zhu Y, Cai H, Zhang S, Wang C, Xiong Y (2020) Tinaface: strong but simple baseline for face detection arXiv:2011.13183

  53. Zhu C, He Y, Savvides M (2019) Feature selective anchor-free module for single-shot object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 840–849

  54. Zhu X, Hu H, Lin S, Dai J (2019) Deformable convnets v2: more deformable, better results. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 9308–9316

  55. Zhu J, Li D, Han T, Tian L, Shan Y (2020) Progressface: scale-aware progressive learning for face detection. In: Proceedings of the european conference on computer vision (ECCV), pp 344–360

  56. Zitnick CL, Dollár P (2014) Edge boxes: locating object proposals from edges. In: Proceedings of the European conference on computer vision (ECCV), pp 391–405

  57. Zoph B, Le QV (2016) Neural architecture search with reinforcement learning arXiv:1611.01578

Acknowledgements

This work was supported by the Major Project of National Social Science Foundation of China (No. 21&ZD166), the National Natural Science Foundation of China (61876072, 61902153) and the Natural Science Foundation of Jiangsu Province (No. BK20221535).

Author information

Contributions

All authors contributed to the study conception and design. Material preparation, data collection and analysis were performed by Junyuan He. The first draft of the manuscript was written by Junyuan He, and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.

Corresponding authors

Correspondence to Xiaoning Song or Zhenhua Feng.

Ethics declarations

Ethics approval

informed consent

Consent to participate

consent

Consent for Publication

consent

Conflict of Interests

All authors certify that they have no affiliations with or involvement in any organization or entity with any financial interest or non-financial interest in the subject matter or materials discussed in this manuscript.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

He, J., Song, X., Feng, Z. et al. ETM-face: effective training sample selection and multi-scale feature learning for face detection. Multimed Tools Appl 82, 26595–26611 (2023). https://doi.org/10.1007/s11042-023-14859-3

  • DOI: https://doi.org/10.1007/s11042-023-14859-3
