Semantic convolutional features for face detection

  • Special Issue Paper
  • Machine Vision and Applications

Abstract

Convolutional neural networks are a key component of many computer vision applications. Traditionally, convolutional features are learned hierarchically along the depth of the network to create multi-scale feature maps. As a result, strong semantic features are derived only at the top-level layers. This paper proposes a novel feature pyramid design that produces semantic features at all levels of the network, specifically to address the problem of face detection. In particular, a Semantic Convolutional Box (SCBox) is presented that merges the features from different layers in a bottom-up fashion. The proposed lightweight detector stacks alternating SCBox and Inception residual modules to learn visual features along both the depth and the width of the network. In addition, recently introduced objective functions (e.g., focal and CIoU losses) are incorporated to effectively address the problem of unbalanced data, resulting in stable training. The proposed model has been validated on the standard benchmarks FDDB and WIDER FACE against state-of-the-art methods. Experiments show promising results in terms of both processing time and detection accuracy. For instance, the proposed network achieves an average precision of \(96.8\%\) on FDDB and \(82.4\%\) on WIDER FACE, and reaches an inference speed of 106 FPS on a moderate GPU configuration or 20 FPS on a CPU machine.
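To make the two objective functions named in the abstract concrete, the following is a minimal sketch of a binary focal loss and a CIoU box-regression loss, written from their commonly published definitions. It is not the paper's implementation: the function names, the (x1, y1, x2, y2) box format, and the defaults alpha = 0.25 and gamma = 2.0 are assumptions made for illustration, and the abstract does not state which hyperparameters the detector uses.

```python
import math


def focal_loss(p, y, alpha=0.25, gamma=2.0, eps=1e-12):
    """Binary focal loss for a single prediction.

    p: predicted probability of the positive (face) class, in (0, 1).
    y: ground-truth label, 1 for face and 0 for background.
    alpha, gamma: assumed defaults, not values taken from the paper.
    """
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    # The factor (1 - p_t)^gamma shrinks the loss of well-classified
    # examples, so the many easy background anchors contribute less.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(max(p_t, eps))


def ciou_loss(box, gt, eps=1e-12):
    """CIoU loss between two boxes given as (x1, y1, x2, y2)."""
    # Overlap term: intersection over union.
    iw = max(0.0, min(box[2], gt[2]) - max(box[0], gt[0]))
    ih = max(0.0, min(box[3], gt[3]) - max(box[1], gt[1]))
    inter = iw * ih
    w, h = box[2] - box[0], box[3] - box[1]
    wg, hg = gt[2] - gt[0], gt[3] - gt[1]
    iou = inter / (w * h + wg * hg - inter + eps)

    # Distance term: squared centre distance divided by the squared
    # diagonal of the smallest box enclosing both boxes.
    cx, cy = (box[0] + box[2]) / 2.0, (box[1] + box[3]) / 2.0
    gx, gy = (gt[0] + gt[2]) / 2.0, (gt[1] + gt[3]) / 2.0
    rho2 = (cx - gx) ** 2 + (cy - gy) ** 2
    cw = max(box[2], gt[2]) - min(box[0], gt[0])
    ch = max(box[3], gt[3]) - min(box[1], gt[1])
    c2 = cw ** 2 + ch ** 2 + eps

    # Aspect-ratio term and its trade-off weight.
    v = (4.0 / math.pi ** 2) * (
        math.atan(wg / (hg + eps)) - math.atan(w / (h + eps))
    ) ** 2
    weight = v / ((1.0 - iou) + v + eps)
    return 1.0 - iou + rho2 / c2 + weight * v


if __name__ == "__main__":
    # An easy positive is penalised far less than a hard one.
    print(focal_loss(0.9, 1), focal_loss(0.1, 1))
    # Identical boxes give a loss near zero; disjoint boxes exceed one.
    print(ciou_loss((0, 0, 2, 2), (0, 0, 2, 2)))
    print(ciou_loss((0, 0, 1, 1), (2, 2, 3, 3)))
```

Unlike a plain IoU loss, the CIoU value keeps changing with the centre distance when the boxes do not overlap, so it still distinguishes near misses from distant ones.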

Notes

  1. http://vis-www.cs.umass.edu/fddb/results.html.

  2. http://shuoyang1213.me/WIDERFACE/.

  3. https://github.com/TropComplique/FaceBoxes-tensorflow.

Acknowledgements

This research is funded by the Vietnam National Foundation for Science and Technology Development (NAFOSTED) under Grant No. 102.05-2020.02.

Author information

Corresponding author

Correspondence to The-Anh Pham.

About this article

Cite this article

Pham, TA. Semantic convolutional features for face detection. Machine Vision and Applications 33, 3 (2022). https://doi.org/10.1007/s00138-021-01245-y

  • DOI: https://doi.org/10.1007/s00138-021-01245-y
