ETM-face: effective training sample selection and multi-scale feature learning for face detection

He, Junyuan; Song, Xiaoning; Feng, Zhenhua; Xu, Tianyang; Wu, Xiaojun; Kittler, Josef

doi:10.1007/s11042-023-14859-3

Published: 09 March 2023

Volume 82, pages 26595–26611 (2023)
Multimedia Tools and Applications

Junyuan He¹, Xiaoning Song¹ (ORCID: orcid.org/0000-0002-5741-9318), Zhenhua Feng²,³, Tianyang Xu¹, Xiaojun Wu¹ & Josef Kittler³

Abstract

In recent years, deep-learning-based face detectors have achieved promising results and have been successfully used in a wide range of practical applications. However, extreme appearance variations remain a major obstacle to robust and accurate face detection in the wild. To address this issue, we propose an Improved Training Sample Selection (ITSS) strategy for mining effective positive and negative samples during network training. The proposed ITSS procedure works together with face sampling during data augmentation and selects suitable positive sample centres and IoU overlap for face detection. Moreover, we propose a Residual Feature Pyramid Fusion (RFPF) module that collects semantically robust features to improve the scale-invariance of deep features and better represent faces at different feature pyramid levels. The experimental results obtained on the FDDB and WiderFace datasets demonstrate the superiority of the proposed method over the state-of-the-art approaches. Specifically, the proposed method achieves 96.9% and 96.2% AP on the easy and medium test sets of WiderFace, respectively.
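
The abstract refers to selecting positive and negative training samples by IoU overlap. As a rough illustration only, the sketch below computes the IoU of two axis-aligned boxes and labels candidate boxes against a single face box. It is a generic IoU-threshold rule, not the ITSS procedure itself, whose details are not given on this page; the function names, the `(x1, y1, x2, y2)` box layout and the `pos_thresh` value of 0.5 are all assumptions made for the example.

```python
from typing import List, Tuple

# Assumed box layout: (x1, y1, x2, y2) with x1 < x2 and y1 < y2.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    # Corners of the intersection rectangle (may be empty).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def label_candidates(candidates: List[Box], face: Box,
                     pos_thresh: float = 0.5) -> List[int]:
    """Label each candidate 1 (positive) or 0 (negative) by IoU with one face.

    pos_thresh is a hypothetical value chosen for illustration; it is not
    taken from the paper.
    """
    return [1 if iou(c, face) >= pos_thresh else 0 for c in candidates]
```

For example, `label_candidates([(0, 0, 2, 2), (1, 1, 3, 3)], (0, 0, 2, 2))` marks only the first candidate as positive, because the second overlaps the face box with an IoU of 1/7.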

Data Availability

available

Code Availability

available

Notes

This term corresponds to ‘centre-ness’ in [35] (Tian et al. 2019, FCOS).
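
For readers unfamiliar with the term, here is a minimal sketch of centre-ness as defined in FCOS, assuming [35] is the FCOS entry (Tian et al. 2019) in the reference list. For a location inside a box, with distances `l`, `t`, `r`, `b` to the left, top, right and bottom sides, centre-ness is the square root of `min(l, r) / max(l, r)` times `min(t, b) / max(t, b)`. The function name and the zero return value for points on or outside the box are choices made for this example; how the paper itself uses the term is not shown on this page.

```python
import math


def centerness(l: float, t: float, r: float, b: float) -> float:
    """FCOS-style centre-ness of a location inside a bounding box.

    l, t, r, b are the distances from the location to the left, top,
    right and bottom sides of the box. Returns a value in [0, 1]:
    1 at the box centre, decaying towards 0 at the border.
    """
    # Assumption for this sketch: locations on or outside the box score 0.
    if min(l, t, r, b) <= 0:
        return 0.0
    horizontal = min(l, r) / max(l, r)
    vertical = min(t, b) / max(t, b)
    return math.sqrt(horizontal * vertical)
```

A location at the exact centre of a box has `l == r` and `t == b`, so its centre-ness is 1, while a location close to any side scores near 0.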

References

Cai Z, Vasconcelos N (2018) Cascade r-cnn: delving into high quality object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 6154–6162
Chen P-Y, Hsieh J-W, Wang C-Y, Liao H-YM, Gochoo M (2019) Residual bi-fusion feature pyramid network for accurate single-shot object detection. arXiv:1911.12051
Chen K, Li J, Lin W, See J, Wang J, Duan L, Chen Z, He C, Zou J (2019) Towards accurate one-stage object detection with ap-loss. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 5119–5127
Chi C, Zhang S, Xing J, Lei Z, Li SZ, Zou X (2019) Selective refinement network for high performance face detection. In: Proceedings of the AAAI conference on artificial intelligence, pp 8231–8238
Dargan S, Kumar M, Ayyagari MR, Kumar G (2020) A survey of deep learning and its applications: a new paradigm to machine learning. Archives of Computational Methods in Engineering 27(4):1071–1092
Deng J, Guo J, Ververas E, Kotsia I, Zafeiriou S (2020) Retinaface: single-shot multi-level face localisation in the wild. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 5203–5212
Feng Z-H, Kittler J, Awais M, Huber P, Wu X-J (2018) Wing loss for robust facial landmark localisation with convolutional neural networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 2235–2245
Gao S-H, Cheng M-M, Zhao K, Zhang X-Y, Yang M-H, Torr P (2019) Res2net: a new multi-scale backbone architecture. IEEE Trans Pattern Anal Mach Intell 43(2):652–662
Girshick R (2015) Fast r-cnn. In: Proceedings of the IEEE international conference on computer vision (ICCV), pp 1440–1448
Girshick R, Donahue J, Darrell T, Malik J (2014) Rich feature hierarchies for accurate object detection and semantic segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 580–587
Guo C, Fan B, Zhang Q, Xiang S, Pan C (2020) Augfpn: improving multi-scale feature learning for object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 12595–12604
He K, Gkioxari G, Dollár P, Girshick R (2017) Mask r-cnn. In: Proceedings of the IEEE international conference on computer vision (ICCV), pp 2961–2969
He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 770–778
Hendrycks D, Gimpel K (2016) Gaussian error linear units (gelus). arXiv:1606.08415
Huang L, Yang Y, Deng Y, Yu Y (2015) Densebox: unifying landmark localization with end to end object detection. arXiv:1509.04874
Jain V, Learned-Miller E (2010) A benchmark for face detection in unconstrained settings. Technical report, UMass Amherst
Ke W, Zhang T, Huang Z, Ye Q, Liu J, Huang D (2020) Multiple anchor learning for visual object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 10206–10215
Kong T, Sun F, Liu H, Jiang Y, Li L, Shi J (2020) Foveabox: Beyound anchor-based object detection. IEEE Trans Image Process 29:7389–7398
Kong T, Yao A, Chen Y, Sun F (2016) Hypernet: towards accurate region proposal generation and joint object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 845–853
Li J, Wang Y, Wang C, Tai Y, Qian J, Yang J, Wang C, Li J, Huang F (2019) Dsfd: dual shot face detector. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 5060–5069
Lin T-Y, Dollár P, Girshick R, He K, Hariharan B, Belongie S (2017) Feature pyramid networks for object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 2117–2125
Lin T-Y, Goyal P, Girshick R, He K, Dollár P (2017) Focal loss for dense object detection. In: Proceedings of the IEEE international conference on computer vision (ICCV), pp 2980–2988
Lin T-Y, Maire M, Belongie S, Hays J, Perona P, Ramanan D, Dollár P, Zitnick CL (2014) Microsoft coco: common objects in context. In: Proceedings of the european conference on computer vision (ECCV), pp 740–755
Liu W, Anguelov D, Erhan D, Szegedy C, Reed S, Fu C-Y, Berg AC (2016) Ssd: single shot multibox detector. In: Proceedings of the european conference on computer vision (ECCV), pp 21–37
Liu W, Liao S, Ren W, Hu W, Yu Y (2019) High-level semantic feature detection: A new perspective for pedestrian detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 5187–5196
Liu S, Qi L, Qin H, Shi J, Jia J (2018) Path aggregation network for instance segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 8759–8768
Liu Y, Tang X (2020) Bfbox: searching face-appropriate backbone and feature pyramid network for face detector. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 13568–13577
Liu Y, Tang X, Han J, Liu J, Rui D, Wu X (2020) Hambox: delving into mining high-quality anchors on face detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 13043–13051
Najibi M, Samangouei P, Chellappa R, Davis LS (2017) Ssh: single stage headless face detector. In: Proceedings of the IEEE international conference on computer vision (ICCV), pp 4875–4884
Redmon J, Divvala S, Girshick R, Farhadi A (2016) You only look once: unified, real-time object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 779–788
Redmon J, Farhadi A (2017) Yolo9000: better, faster, stronger. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 7263–7271
Redmon J, Farhadi A (2018) Yolov3: An incremental improvement. arXiv:1804.02767
Tan M, Pang R, Le QV (2020) Efficientdet: scalable and efficient object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 10781–10790
Tang X, Du DK, He Z, Liu J (2018) Pyramidbox: a context-assisted single shot face detector. In: Proceedings of the european conference on computer vision (ECCV), pp 797–813
Tian Z, Shen C, Chen H, He T (2019) Fcos: fully convolutional one-stage object detection. In: Proceedings of the IEEE international conference on computer vision (ICCV), pp 9627–9636
Wang C, Luo Z, Zhong Z, Li S (2021) Safd: single shot anchor free face detector. Multimed Tools Appl 80(9):13761–13785
Xiao Y, Cao D, Gao L (2020) Face detection based on occlusion area detection and recovery. Multimed Tools Appl 79(23):16531–16546
Xiao Y, Tian Z, Yu J, Zhang Y, Liu S, Du S, Lan X (2020) A review of object detection based on deep learning. Multimed Tools Appl 79(33):23729–23791
Yang S, Luo P, Loy C-C, Tang X (2016) Wider face: a face detection benchmark. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 5525–5533
Yang W, Zhou L, Li T, Wang H (2019) A face detection method based on cascade convolutional neural network. Multimed Tools Appl 78(17):24373–24390
Yashunin D, Baydasov T, Vlasov R (2020) Maskface: multi-task face and landmark detector. arXiv:2005.09412
Yu J, Jiang Y, Wang Z, Cao Z, Huang T (2016) Unitbox: an advanced object detection network. In: Proceedings of the 24th ACM international conference on multimedia, pp 516–520
Zhang S, Chi C, Lei Z, Li SZ (2020) Refineface: refinement neural network for high performance face detection. IEEE Trans Pattern Anal Mach Intell 43(11):4008–4020
Zhang S, Chi C, Yao Y, Lei Z, Li SZ (2020) Bridging the gap between anchor-based and anchor-free detection via adaptive training sample selection. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 9759–9768
Zhang F, Fan X, Ai G, Song J, Qin Y, Wu J (2019) Accurate face detection for high performance. arXiv:1905.01585
Zhang B, Li J, Wang Y, Tai Y, Wang C, Li J, Huang F, Xia Y, Pei W, Ji R (2020) Asfd: automatic and scalable face detector. arXiv:2003.11228
Zhang X, Wan F, Liu C, Ji R, Ye Q (2019) Freeanchor: learning to match anchors for visual object detection. In: Proceedings of the 33rd international conference on neural information processing systems, pp 147–155
Zhang S, Wen L, Bian X, Lei Z, Li SZ (2018) Single-shot refinement neural network for object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 4203–4212
Zhang S, Zhu X, Lei Z, Shi H, Wang X, Li SZ (2017) S3fd: single shot scale-invariant face detector. In: Proceedings of the IEEE international conference on computer vision (ICCV), pp 192–201
Zheng Z, Wang P, Liu W, Li J, Ye R, Ren D (2020) Distance-iou loss: faster and better learning for bounding box regression. In: Proceedings of the AAAI conference on artificial intelligence, pp 12993–13000
Zheng Z, Wang P, Ren D, Liu W, Ye R, Hu Q, Zuo W (2020) Enhancing geometric factors in model learning and inference for object detection and instance segmentation. arXiv:2005.03572
Zhu Y, Cai H, Zhang S, Wang C, Xiong Y (2020) Tinaface: strong but simple baseline for face detection. arXiv:2011.13183
Zhu C, He Y, Savvides M (2019) Feature selective anchor-free module for single-shot object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 840–849
Zhu X, Hu H, Lin S, Dai J (2019) Deformable convnets v2: more deformable, better results. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 9308–9316
Zhu J, Li D, Han T, Tian L, Shan Y (2020) Progressface: scale-aware progressive learning for face detection. In: Proceedings of the european conference on computer vision (ECCV), pp 344–360
Zitnick CL, Dollár P (2014) Edge boxes: locating object proposals from edges. In: Proceedings of the European conference on computer vision (ECCV), pp 391–405
Zoph B, Le QV (2016) Neural architecture search with reinforcement learning. arXiv:1611.01578

Acknowledgements

This work was supported by the Major Project of National Social Science Foundation of China (No. 21&ZD166), the National Natural Science Foundation of China (61876072, 61902153) and the Natural Science Foundation of Jiangsu Province (No. BK20221535).

Funding

This work was supported by the Major Project of National Social Science Foundation of China (No. 21&ZD166), the National Natural Science Foundation of China (61876072, 61902153) and the Natural Science Foundation of Jiangsu Province (No. BK20221535).

Author information

Authors and Affiliations

School of Artificial Intelligence and Computer Science, Jiangnan University, Lihu Avenue, Wuxi, 214122, China
Junyuan He, Xiaoning Song, Tianyang Xu & Xiaojun Wu
Department of Computer Science, University of Surrey, Guildford, Surrey, GU2 7XH, UK
Zhenhua Feng
Centre for Vision, Speech and Signal Processing, University of Surrey, Guildford, Surrey, GU2 7XH, UK
Zhenhua Feng & Josef Kittler

Contributions

All authors contributed to the study conception and design. Material preparation, data collection and analysis were performed by Junyuan He. The first draft of the manuscript was written by Junyuan He, and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.

Corresponding authors

Correspondence to Xiaoning Song or Zhenhua Feng.

Ethics declarations

Ethics approval

informed consent

Consent to participate

consent

Consent for Publication

consent

Conflict of Interests

All authors certify that they have no affiliations with or involvement in any organization or entity with any financial interest or non-financial interest in the subject matter or materials discussed in this manuscript.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

He, J., Song, X., Feng, Z. et al. ETM-face: effective training sample selection and multi-scale feature learning for face detection. Multimed Tools Appl 82, 26595–26611 (2023). https://doi.org/10.1007/s11042-023-14859-3

Received: 15 October 2021
Revised: 02 May 2022
Accepted: 06 February 2023
Published: 09 March 2023
Issue Date: July 2023
DOI: https://doi.org/10.1007/s11042-023-14859-3
