ProgressFace: Scale-Aware Progressive Learning for Face Detection

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 12351)


Scale variation stands out as one of the key challenges in face detection. Recent attempts to cope with this issue incorporate image/feature pyramids or adjust anchor sampling/matching strategies. In this work, we propose a novel scale-aware progressive training mechanism to address large scale variations across faces. Inspired by curriculum learning, our method gradually learns face instances from large to small. The preceding models learned with easier samples (i.e., large faces) provide a good initialization for the succeeding learning with harder samples (i.e., small faces), ultimately leading to a better optimum for the face detector. Moreover, we propose an auxiliary anchor-free enhancement module to facilitate the learning of small faces by supplying positive anchors that may not be covered under the IoU overlap criterion. This anchor-free module is removed during inference, and hence no extra computation cost is introduced. Extensive experimental results demonstrate the superiority of our method over the state of the art on the standard FDDB and WIDER FACE benchmarks. In particular, our ProgressFace-Light with a MobileNet-0.25 backbone achieves 87.9% AP on the hard set of WIDER FACE, surpassing RetinaFace with the same backbone by a large margin of 9.7%. Code and our trained face detection models are available at
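
The two ideas in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the size thresholds, the anchor layout, the 0.5 IoU threshold, and the choice to make stages cumulative are all assumptions made for illustration. The sketch shows (a) a large-to-small schedule over ground-truth faces and (b) how a small face can end up with no positive anchor under an IoU matching criterion, which is the gap an anchor-free branch is meant to fill.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)


def progressive_stages(faces, size_thresholds):
    """Return one (threshold, faces) pair per stage, largest threshold first.

    Assumption: stages are cumulative, so each later stage keeps the larger
    faces and adds smaller (harder) ones.
    """
    stages = []
    for t in sorted(size_thresholds, reverse=True):
        kept = [f for f in faces if min(f[2] - f[0], f[3] - f[1]) >= t]
        stages.append((t, kept))
    return stages


def uncovered_faces(faces, anchors, iou_threshold=0.5):
    """Faces for which no anchor reaches the IoU threshold."""
    return [f for f in faces if all(iou(f, a) < iou_threshold for a in anchors)]


# Hypothetical data: square anchors of side 16 and 64 tiled with stride 16.
anchors = [(x, y, x + s, y + s)
           for s in (16, 64)
           for x in range(0, 128, 16)
           for y in range(0, 128, 16)]
faces = [(0, 0, 64, 64), (16, 16, 32, 32), (5, 5, 11, 11)]

stages = progressive_stages(faces, size_thresholds=[32, 16, 0])
missed = uncovered_faces(faces, anchors)
```

Here the 6x6 face overlaps its best anchor with an IoU of only 36/256, so IoU matching alone would give it no positive anchor, while the two larger faces each coincide with an anchor.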


Face detection; Progressive learning; Anchor-free methods




Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. Xilinx Inc., Beijing, China
