A Novel Face Detector Based on YOLOv3

Tuli, Sabrina Hoque; Mao, Anning; Liu, Wanquan

doi:10.1007/978-3-030-64984-5_5


Part of the book series: Lecture Notes in Computer Science (LNAI, volume 12576)

Included in the following conference series:

Australasian Joint Conference on Artificial Intelligence

Abstract

Face detection has broad applications, and deep learning methods have recently advanced it considerably. However, detecting small faces in real-world environments remains challenging because of low resolution, variability in size, differing poses, and occlusions. YOLOv3 is one of the main approaches to object detection and achieves comparatively good performance on small targets in real time. It still struggles, however, to detect groups of small faces: localization is inaccurate and the number of false positives increases. In this paper, we propose an efficient multiscale deep learning network based on YOLOv3 to detect groups of small faces. First, we select the optimum number of anchors so that the model better fits small face targets; second, we replace the bounding box regression loss in YOLOv3 with the CIoU loss to reduce false positives; third, we extend the number of detection scales in YOLOv3 from 3 to 4, specifically to detect small faces; fourth, of the six convolutional layers in each detection scale, we simplify four into two residual blocks to avoid vanishing gradients. The proposed model achieves state-of-the-art performance on the WIDER FACE face detection benchmark, especially on the hard subset, which contains many small faces with variability in scale, pose, and occlusion. Our model achieves 86.5% AP on the WIDER FACE hard validation subset, compared with 72.9% AP for YOLOv3. The run time is also satisfactory for real applications: 64.3 FPS on VGA-resolution images using an Nvidia Titan RTX.
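
The abstract names the CIoU loss but does not give its formula. As a minimal sketch, assuming the standard definition from the Distance-IoU paper by Zheng et al. (loss = 1 - IoU + squared center distance / squared enclosing-box diagonal + alpha * v, where v measures aspect-ratio mismatch), the loss for a single pair of boxes could be computed as follows. The function name `ciou_loss`, the `(x1, y1, x2, y2)` box layout, and the `eps` stabilizer are illustrative assumptions, not details taken from the paper.

```python
import math


def ciou_loss(box, gt, eps=1e-9):
    """CIoU loss for one predicted box and one ground-truth box.

    Both boxes are (x1, y1, x2, y2) with x2 > x1 and y2 > y1.
    Hypothetical helper: the box layout and eps are assumptions.
    """
    x1, y1, x2, y2 = box
    gx1, gy1, gx2, gy2 = gt
    w, h = x2 - x1, y2 - y1
    gw, gh = gx2 - gx1, gy2 - gy1

    # Plain IoU: intersection area over union area.
    inter_w = max(0.0, min(x2, gx2) - max(x1, gx1))
    inter_h = max(0.0, min(y2, gy2) - max(y1, gy1))
    inter = inter_w * inter_h
    union = w * h + gw * gh - inter
    iou = inter / (union + eps)

    # Squared distance between the two box centers.
    rho2 = ((x1 + x2) / 2 - (gx1 + gx2) / 2) ** 2 \
         + ((y1 + y2) / 2 - (gy1 + gy2) / 2) ** 2

    # Squared diagonal of the smallest box enclosing both boxes.
    enc_w = max(x2, gx2) - min(x1, gx1)
    enc_h = max(y2, gy2) - min(y1, gy1)
    c2 = enc_w ** 2 + enc_h ** 2 + eps

    # Aspect-ratio consistency term and its trade-off weight.
    v = (4 / math.pi ** 2) * (math.atan(gw / (gh + eps)) - math.atan(w / (h + eps))) ** 2
    alpha = v / (1 - iou + v + eps)

    return 1 - iou + rho2 / c2 + alpha * v


same = ciou_loss((0, 0, 10, 10), (0, 0, 10, 10))
shifted = ciou_loss((2, 2, 12, 12), (0, 0, 10, 10))
disjoint = ciou_loss((0, 0, 10, 10), (20, 20, 30, 30))
```

Unlike a plain IoU loss, the center-distance term still gives a useful signal when the boxes do not overlap at all, which is one plausible reason it helps with small faces, where a predicted box can easily miss its target entirely.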


Author information

Authors and Affiliations

Department of Computing, Curtin University, Perth, WA, Australia
Sabrina Hoque Tuli, Anning Mao & Wanquan Liu

Corresponding author

Correspondence to Sabrina Hoque Tuli.

Editor information

Editors and Affiliations

School of Information Technology and Electrical Engineering, University of Queensland, Brisbane, QLD, Australia
Marcus Gallagher
School of Engineering and Information Technology, University of New South Wales, Canberra, ACT, Australia
Nour Moustafa
School of Engineering and Information Technology, University of New South Wales, Canberra, ACT, Australia
Erandi Lakshika

About this paper

Cite this paper

Tuli, S.H., Mao, A., Liu, W. (2020). A Novel Face Detector Based on YOLOv3. In: Gallagher, M., Moustafa, N., Lakshika, E. (eds) AI 2020: Advances in Artificial Intelligence. AI 2020. Lecture Notes in Computer Science, vol 12576. Springer, Cham. https://doi.org/10.1007/978-3-030-64984-5_5

DOI: https://doi.org/10.1007/978-3-030-64984-5_5
Published: 27 November 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-64983-8
Online ISBN: 978-3-030-64984-5
eBook Packages: Computer Science, Computer Science (R0)
