Abstract
Scale variation stands out as one of the key challenges in face detection. Recent work has tried to cope with this issue by incorporating image/feature pyramids or by adjusting anchor sampling/matching strategies. In this work, we propose a novel scale-aware progressive training mechanism to address large scale variations across faces. Inspired by curriculum learning, our method learns face instances progressively, from large to small. The models learned earlier with easier samples (i.e., large faces) provide a good initialization for subsequent learning with harder samples (i.e., small faces), ultimately leading to a better optimum for the face detector. Moreover, we propose an auxiliary anchor-free enhancement module that facilitates the learning of small faces by supplying positive anchors that may not be covered under the IoU-overlap matching criterion. This anchor-free module is removed during inference, so it introduces no extra computational cost. Extensive experiments on the standard FDDB and WIDER FACE benchmarks demonstrate the superiority of our method over state-of-the-art approaches. In particular, our ProgressFace-Light with a MobileNet-0.25 backbone achieves 87.9% AP on the hard set of WIDER FACE, outperforming RetinaFace with the same backbone by a large margin of 9.7%. Code and our trained face detection models are available at https://github.com/jiashu-zhu/ProgressFace.
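The abstract does not spell out how the large-to-small curriculum is scheduled. As one possible reading, the minimal sketch below splits training into stages with a decreasing minimum face scale: faces below the current threshold are set aside as "ignore" regions (neither positive nor negative), so early stages supervise mostly large faces and the final stage supervises all of them. The `Stage` class, the helpers `stage_for_epoch` and `select_targets`, the use of sqrt(width * height) as face scale, and the 64/16/0-pixel thresholds are all hypothetical choices for illustration, not ProgressFace's actual schedule, which may instead operate on anchors or feature-pyramid levels.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


@dataclass
class Stage:
    """Hypothetical curriculum stage: supervise only faces with scale >= min_scale."""
    epochs: int
    min_scale: float


def face_scale(box: Box) -> float:
    """Scalar face size, taken here as sqrt(width * height) (an assumption)."""
    x1, y1, x2, y2 = box
    return (max(0.0, x2 - x1) * max(0.0, y2 - y1)) ** 0.5


def stage_for_epoch(stages: Sequence[Stage], epoch: int) -> Stage:
    """Return the stage active at `epoch`; stay in the last stage once the schedule ends."""
    start = 0
    for stage in stages:
        if epoch < start + stage.epochs:
            return stage
        start += stage.epochs
    return stages[-1]


def select_targets(boxes: Sequence[Box], stage: Stage) -> Tuple[List[Box], List[Box]]:
    """Split ground-truth faces into (supervised, ignored) for the current stage.

    Faces below the threshold are ignored rather than labeled as background,
    so the detector is not penalized for firing on small faces it learns later.
    """
    supervised: List[Box] = []
    ignored: List[Box] = []
    for box in boxes:
        (supervised if face_scale(box) >= stage.min_scale else ignored).append(box)
    return supervised, ignored


# Hypothetical large-to-small schedule: 10 + 10 + 20 epochs.
SCHEDULE = [Stage(10, 64.0), Stage(10, 16.0), Stage(20, 0.0)]

# Toy faces with scales of 100, 30 and 8 pixels.
faces = [(0, 0, 100, 100), (10, 10, 40, 40), (5, 5, 13, 13)]

for epoch in (0, 15, 35):
    stage = stage_for_epoch(SCHEDULE, epoch)
    kept, ignored = select_targets(faces, stage)
    print(epoch, stage.min_scale, len(kept), len(ignored))
# prints:
# 0 64.0 1 2
# 15 16.0 2 1
# 35 0.0 3 0
```

If each stage continues from the weights of the previous one, the large-face stages play the role of the initialization the abstract describes; the number of stages and the threshold values are open design choices in this sketch.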
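Why IoU matching can miss small faces is easy to see on a toy grid. In the sketch below, a 32×32 anchor overlaps an 8×8 face with IoU 64/1024 = 0.0625, far below a typical positive threshold, so IoU matching leaves that face without any positive anchor. As a stand-in for the auxiliary anchor-free branch, the sketch then marks anchors whose centers fall inside such an uncovered face as positives for a separate auxiliary target. This center-in-box rule is borrowed from anchor-free detectors such as FCOS and is an assumption, as are the 0.5 IoU threshold and the names `iou` and `assign_targets`; the abstract only states that the module supplies positives missed by the IoU criterion and is removed at inference.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def assign_targets(
    anchors: Sequence[Box], faces: Sequence[Box], pos_iou: float = 0.5
) -> Tuple[List[int], List[int]]:
    """Return (anchor_based, auxiliary) face indices per anchor, -1 meaning none.

    anchor_based: standard IoU matching for the anchor-based head.
    auxiliary:    hypothetical anchor-free targets, added only for faces that
                  got no IoU-matched anchor (this branch is dropped at inference).
    """
    anchor_based = [-1] * len(anchors)
    auxiliary = [-1] * len(anchors)
    if not faces:
        return anchor_based, auxiliary

    # Step 1: each anchor takes its best-overlapping face if IoU >= pos_iou.
    for i, anchor in enumerate(anchors):
        scores = [iou(anchor, f) for f in faces]
        best = max(range(len(faces)), key=scores.__getitem__)
        if scores[best] >= pos_iou:
            anchor_based[i] = best

    # Step 2: uncovered faces claim anchors whose centers lie inside the face box.
    for g, (x1, y1, x2, y2) in enumerate(faces):
        if g in anchor_based:
            continue  # face already has an IoU-matched positive
        for i, (ax1, ay1, ax2, ay2) in enumerate(anchors):
            cx, cy = (ax1 + ax2) / 2, (ay1 + ay2) / 2
            if auxiliary[i] == -1 and x1 <= cx <= x2 and y1 <= cy <= y2:
                auxiliary[i] = g
    return anchor_based, auxiliary


# Toy example: four 32x32 anchors and two faces (30x30 and 8x8).
anchors = [(0, 0, 32, 32), (32, 0, 64, 32), (0, 32, 32, 64), (32, 32, 64, 64)]
faces = [(0, 0, 30, 30), (44, 44, 52, 52)]

print(iou(anchors[3], faces[1]))  # 0.0625, below the 0.5 threshold
anchor_based, auxiliary = assign_targets(anchors, faces)
print(anchor_based, auxiliary)  # [0, -1, -1, -1] [-1, -1, -1, 1]
```

Keeping the auxiliary labels in a separate list mirrors the idea that the extra supervision trains a side branch only, so the anchor-based head used at inference is unchanged.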
Notes
1. Faces with an area smaller than 128 account for about 29% of the faces in WIDER FACE.
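The statistic in this note is a simple count over the box annotations. The sketch below assumes that "area" means box width times height in square pixels (the note does not state the unit) and uses made-up toy boxes rather than the actual WIDER FACE annotations; `small_face_fraction` is a hypothetical helper name.

```python
from typing import Iterable, Tuple


def small_face_fraction(sizes: Iterable[Tuple[float, float]], max_area: float = 128.0) -> float:
    """Fraction of faces whose box area (width * height) is strictly below max_area."""
    areas = [w * h for w, h in sizes]
    if not areas:
        return 0.0
    return sum(area < max_area for area in areas) / len(areas)


# Toy (width, height) pairs, not real annotations: areas 64, 120, 400 and 12000.
toy_sizes = [(8, 8), (10, 12), (20, 20), (100, 120)]
print(small_face_fraction(toy_sizes))  # 0.5
```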
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Zhu, J., Li, D., Han, T., Tian, L., Shan, Y. (2020). ProgressFace: Scale-Aware Progressive Learning for Face Detection. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, JM. (eds) Computer Vision – ECCV 2020. ECCV 2020. Lecture Notes in Computer Science, vol 12351. Springer, Cham. https://doi.org/10.1007/978-3-030-58539-6_21
DOI: https://doi.org/10.1007/978-3-030-58539-6_21
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-58538-9
Online ISBN: 978-3-030-58539-6
eBook Packages: Computer Science, Computer Science (R0)