Abstract
In recent years, deep-learning-based face detectors have achieved promising results and have been used successfully in a wide range of practical applications. However, extreme appearance variations remain a major obstacle to robust and accurate face detection in the wild. To address this issue, we propose an Improved Training Sample Selection (ITSS) strategy for mining effective positive and negative samples during network training. The proposed ITSS procedure works together with face sampling during data augmentation and selects suitable positive sample centres and IoU overlap for face detection. Moreover, we propose a Residual Feature Pyramid Fusion (RFPF) module that collects semantically robust features to improve the scale-invariance of deep features and to better represent faces at different feature pyramid levels. The experimental results obtained on the FDDB and WiderFace datasets demonstrate the superiority of the proposed method over state-of-the-art approaches. Specifically, the proposed method achieves an AP of 96.9% and 96.2% on the easy and medium test sets of WiderFace, respectively.
Data Availability
available
Code Availability
available
Notes
This term corresponds to ‘centre-ness’ in [35].
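For readers unfamiliar with the term, centre-ness in [35] scores a location inside a ground-truth box by how close it lies to the box centre, using the distances from the location to the four box sides. The snippet below is a minimal sketch of that definition only. The function name, argument names and the (x1, y1, x2, y2) box format are illustrative assumptions, and it is not the implementation used in this article.

```python
import math


def centerness(px, py, box):
    """Centre-ness of point (px, py) with respect to box (x1, y1, x2, y2).

    Follows the definition in [35]:
        sqrt( min(l, r) / max(l, r) * min(t, b) / max(t, b) )
    where l, r, t, b are the distances from the point to the left, right,
    top and bottom sides of the box. Names and box format are assumptions.
    """
    x1, y1, x2, y2 = box
    l, r = px - x1, x2 - px
    t, b = py - y1, y2 - py
    # Points on or outside the box boundary get a score of zero.
    if min(l, r, t, b) <= 0:
        return 0.0
    return math.sqrt((min(l, r) / max(l, r)) * (min(t, b) / max(t, b)))
```

The score is 1 at the exact centre of the box and decays towards 0 as the location approaches any side, which is why it can be used to down-weight off-centre positive samples.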
References
Cai Z, Vasconcelos N (2018) Cascade r-cnn: delving into high quality object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 6154–6162
Chen P-Y, Hsieh J-W, Wang C-Y, Liao H-YM, Gochoo M (2019) Residual bi-fusion feature pyramid network for accurate single-shot object detection. arXiv:1911.12051
Chen K, Li J, Lin W, See J, Wang J, Duan L, Chen Z, He C, Zou J (2019) Towards accurate one-stage object detection with ap-loss. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 5119–5127
Chi C, Zhang S, Xing J, Lei Z, Li SZ, Zou X (2019) Selective refinement network for high performance face detection. In: Proceedings of the AAAI conference on artificial intelligence, pp 8231–8238
Dargan S, Kumar M, Ayyagari MR, Kumar G (2020) A survey of deep learning and its applications: a new paradigm to machine learning. Archives of Computational Methods in Engineering 27(4):1071–1092
Deng J, Guo J, Ververas E, Kotsia I, Zafeiriou S (2020) Retinaface: single-shot multi-level face localisation in the wild. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 5203–5212
Feng Z-H, Kittler J, Awais M, Huber P, Wu X-J (2018) Wing loss for robust facial landmark localisation with convolutional neural networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 2235–2245
Gao S-H, Cheng M-M, Zhao K, Zhang X-Y, Yang M-H, Torr P (2019) Res2net: a new multi-scale backbone architecture. IEEE Trans Pattern Anal Mach Intell 43(2):652–662
Girshick R (2015) Fast r-cnn. In: Proceedings of the IEEE international conference on computer vision (ICCV), pp 1440–1448
Girshick R, Donahue J, Darrell T, Malik J (2014) Rich feature hierarchies for accurate object detection and semantic segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 580–587
Guo C, Fan B, Zhang Q, Xiang S, Pan C (2020) Augfpn: improving multi-scale feature learning for object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 12595–12604
He K, Gkioxari G, Dollár P, Girshick R (2017) Mask r-cnn. In: Proceedings of the IEEE international conference on computer vision (ICCV), pp 2961–2969
He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 770–778
Hendrycks D, Gimpel K (2016) Gaussian error linear units (gelus). arXiv:1606.08415
Huang L, Yang Y, Deng Y, Yu Y (2015) Densebox: unifying landmark localization with end to end object detection. arXiv:1509.04874
Jain V, Learned-Miller E (2010) A benchmark for face detection in unconstrained settings. Technical report, UMass Amherst
Ke W, Zhang T, Huang Z, Ye Q, Liu J, Huang D (2020) Multiple anchor learning for visual object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 10206–10215
Kong T, Sun F, Liu H, Jiang Y, Li L, Shi J (2020) Foveabox: Beyound anchor-based object detection. IEEE Trans Image Process 29:7389–7398
Kong T, Yao A, Chen Y, Sun F (2016) Hypernet: towards accurate region proposal generation and joint object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 845–853
Li J, Wang Y, Wang C, Tai Y, Qian J, Yang J, Wang C, Li J, Huang F (2019) Dsfd: dual shot face detector. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 5060–5069
Lin T-Y, Dollár P, Girshick R, He K, Hariharan B, Belongie S (2017) Feature pyramid networks for object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 2117–2125
Lin T-Y, Goyal P, Girshick R, He K, Dollár P (2017) Focal loss for dense object detection. In: Proceedings of the IEEE international conference on computer vision (ICCV), pp 2980–2988
Lin T-Y, Maire M, Belongie S, Hays J, Perona P, Ramanan D, Dollár P, Zitnick CL (2014) Microsoft coco: common objects in context. In: Proceedings of the European conference on computer vision (ECCV), pp 740–755
Liu W, Anguelov D, Erhan D, Szegedy C, Reed S, Fu C-Y, Berg AC (2016) Ssd: single shot multibox detector. In: Proceedings of the European conference on computer vision (ECCV), pp 21–37
Liu W, Liao S, Ren W, Hu W, Yu Y (2019) High-level semantic feature detection: A new perspective for pedestrian detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 5187–5196
Liu S, Qi L, Qin H, Shi J, Jia J (2018) Path aggregation network for instance segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 8759–8768
Liu Y, Tang X (2020) Bfbox: searching face-appropriate backbone and feature pyramid network for face detector. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 13568–13577
Liu Y, Tang X, Han J, Liu J, Rui D, Wu X (2020) Hambox: delving into mining high-quality anchors on face detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 13043–13051
Najibi M, Samangouei P, Chellappa R, Davis LS (2017) Ssh: single stage headless face detector. In: Proceedings of the IEEE international conference on computer vision (ICCV), pp 4875–4884
Redmon J, Divvala S, Girshick R, Farhadi A (2016) You only look once: unified, real-time object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 779–788
Redmon J, Farhadi A (2017) Yolo9000: better, faster, stronger. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 7263–7271
Redmon J, Farhadi A (2018) Yolov3: An incremental improvement. arXiv:1804.02767
Tan M, Pang R, Le QV (2020) Efficientdet: scalable and efficient object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 10781–10790
Tang X, Du DK, He Z, Liu J (2018) Pyramidbox: a context-assisted single shot face detector. In: Proceedings of the European conference on computer vision (ECCV), pp 797–813
Tian Z, Shen C, Chen H, He T (2019) Fcos: fully convolutional one-stage object detection. In: Proceedings of the IEEE international conference on computer vision (ICCV), pp 9627–9636
Wang C, Luo Z, Zhong Z, Li S (2021) Safd: single shot anchor free face detector. Multimed Tools Appl 80(9):13761–13785
Xiao Y, Cao D, Gao L (2020) Face detection based on occlusion area detection and recovery. Multimed Tools Appl 79(23):16531–16546
Xiao Y, Tian Z, Yu J, Zhang Y, Liu S, Du S, Lan X (2020) A review of object detection based on deep learning. Multimed Tools Appl 79(33):23729–23791
Yang S, Luo P, Loy C-C, Tang X (2016) Wider face: a face detection benchmark. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 5525–5533
Yang W, Zhou L, Li T, Wang H (2019) A face detection method based on cascade convolutional neural network. Multimed Tools Appl 78(17):24373–24390
Yashunin D, Baydasov T, Vlasov R (2020) Maskface: multi-task face and landmark detector. arXiv:2005.09412
Yu J, Jiang Y, Wang Z, Cao Z, Huang T (2016) Unitbox: an advanced object detection network. In: Proceedings of the 24th ACM international conference on multimedia, pp 516–520
Zhang S, Chi C, Lei Z, Li SZ (2020) Refineface: refinement neural network for high performance face detection. IEEE Trans Pattern Anal Mach Intell 43(11):4008–4020
Zhang S, Chi C, Yao Y, Lei Z, Li SZ (2020) Bridging the gap between anchor-based and anchor-free detection via adaptive training sample selection. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 9759–9768
Zhang F, Fan X, Ai G, Song J, Qin Y, Wu J (2019) Accurate face detection for high performance. arXiv:1905.01585
Zhang B, Li J, Wang Y, Tai Y, Wang C, Li J, Huang F, Xia Y, Pei W, Ji R (2020) Asfd: automatic and scalable face detector. arXiv:2003.11228
Zhang X, Wan F, Liu C, Ji R, Ye Q (2019) Freeanchor: learning to match anchors for visual object detection. In: Proceedings of the 33rd international conference on neural information processing systems, pp 147–155
Zhang S, Wen L, Bian X, Lei Z, Li SZ (2018) Single-shot refinement neural network for object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 4203–4212
Zhang S, Zhu X, Lei Z, Shi H, Wang X, Li SZ (2017) S3fd: single shot scale-invariant face detector. In: Proceedings of the IEEE international conference on computer vision (ICCV), pp 192–201
Zheng Z, Wang P, Liu W, Li J, Ye R, Ren D (2020) Distance-iou loss: faster and better learning for bounding box regression. In: Proceedings of the AAAI conference on artificial intelligence, pp 12993–13000
Zheng Z, Wang P, Ren D, Liu W, Ye R, Hu Q, Zuo W (2020) Enhancing geometric factors in model learning and inference for object detection and instance segmentation. arXiv:2005.03572
Zhu Y, Cai H, Zhang S, Wang C, Xiong Y (2020) Tinaface: strong but simple baseline for face detection. arXiv:2011.13183
Zhu C, He Y, Savvides M (2019) Feature selective anchor-free module for single-shot object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 840–849
Zhu X, Hu H, Lin S, Dai J (2019) Deformable convnets v2: more deformable, better results. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 9308–9316
Zhu J, Li D, Han T, Tian L, Shan Y (2020) Progressface: scale-aware progressive learning for face detection. In: Proceedings of the European conference on computer vision (ECCV), pp 344–360
Zitnick CL, Dollár P (2014) Edge boxes: locating object proposals from edges. In: Proceedings of the European conference on computer vision (ECCV), pp 391–405
Zoph B, Le QV (2016) Neural architecture search with reinforcement learning. arXiv:1611.01578
Acknowledgements
This work was supported by the Major Project of National Social Science Foundation of China (No. 21&ZD166), the National Natural Science Foundation of China (61876072, 61902153) and the Natural Science Foundation of Jiangsu Province (No. BK20221535).
Author information
Contributions
All authors contributed to the study conception and design. Material preparation, data collection and analysis were performed by Junyuan He. The first draft of the manuscript was written by Junyuan He, and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.
Ethics declarations
Ethics approval
informed consent
Consent to participate
consent
Consent for Publication
consent
Conflict of Interests
All authors certify that they have no affiliations with or involvement in any organization or entity with any financial interest or non-financial interest in the subject matter or materials discussed in this manuscript.
About this article
Cite this article
He, J., Song, X., Feng, Z. et al. ETM-face: effective training sample selection and multi-scale feature learning for face detection. Multimed Tools Appl 82, 26595–26611 (2023). https://doi.org/10.1007/s11042-023-14859-3
DOI: https://doi.org/10.1007/s11042-023-14859-3