Abstract
Convolutional neural networks play a key role in many computer vision applications. Traditionally, convolutional features are learned hierarchically along the network depth to create multi-scale feature maps. As a result, strong semantic features are derived only at the top-level layers. This paper proposes a novel feature pyramid scheme that produces semantic features at all levels of the network, specifically to address the problem of face detection. In particular, a Semantic Convolutional Box (SCBox) is presented that merges the features from different layers in a bottom-up fashion. The proposed lightweight detector stacks alternating SCBox and Inception residual modules to learn visual features along both the depth and the width of the network. In addition, recently introduced objective functions (e.g., focal and CIoU losses) are incorporated to address the problem of unbalanced data, resulting in stable training. The proposed model has been validated on the standard benchmarks FDDB and WIDER FACE in comparison with state-of-the-art methods. Experiments showed promising results in terms of both processing time and detection accuracy. For instance, the proposed network achieves an average precision of \(96.8\%\) on FDDB and \(82.4\%\) on WIDER FACE, and reaches an inference speed of 106 FPS on a moderate GPU configuration or 20 FPS on a CPU machine.
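The abstract names the focal and CIoU losses without stating them. As a rough illustration only, the sketch below writes out the commonly published per-sample forms of both in plain Python. The function names, the defaults `alpha=0.25` and `gamma=2.0`, the `eps` guard, and the `(x1, y1, x2, y2)` box layout are assumptions made for this example; the abstract does not say which settings or implementation the proposed detector uses.

```python
import math


def focal_loss(p, y, alpha=0.25, gamma=2.0, eps=1e-9):
    """Binary focal loss for one prediction (hypothetical helper).

    p is the predicted face probability, y is the label (1 = face, 0 = background).
    alpha and gamma are assumed defaults, not values taken from the paper.
    """
    p = min(max(p, eps), 1.0 - eps)            # keep log() finite
    p_t = p if y == 1 else 1.0 - p             # probability of the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha
    # (1 - p_t)^gamma down-weights easy, well-classified samples
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


def ciou_loss(box, gt, eps=1e-9):
    """Complete-IoU loss for two boxes given as (x1, y1, x2, y2) (assumed layout)."""
    x1, y1, x2, y2 = box
    gx1, gy1, gx2, gy2 = gt
    w, h = x2 - x1, y2 - y1
    gw, gh = gx2 - gx1, gy2 - gy1

    # Intersection over union
    iw = max(0.0, min(x2, gx2) - max(x1, gx1))
    ih = max(0.0, min(y2, gy2) - max(y1, gy1))
    inter = iw * ih
    union = w * h + gw * gh - inter
    iou = inter / (union + eps)

    # Squared distance between the two box centres
    rho2 = ((x1 + x2) - (gx1 + gx2)) ** 2 / 4.0 + ((y1 + y2) - (gy1 + gy2)) ** 2 / 4.0

    # Squared diagonal of the smallest box enclosing both
    cw = max(x2, gx2) - min(x1, gx1)
    ch = max(y2, gy2) - min(y1, gy1)
    c2 = cw ** 2 + ch ** 2

    # Aspect-ratio consistency term and its trade-off weight
    v = (4.0 / math.pi ** 2) * (math.atan(gw / (gh + eps)) - math.atan(w / (h + eps))) ** 2
    a = v / ((1.0 - iou) + v + eps)

    return 1.0 - iou + rho2 / (c2 + eps) + a * v
```

In this form, a confident correct prediction contributes far less to the focal loss than a confident wrong one, which is how the loss counteracts the dominance of easy background samples. The CIoU loss still penalizes non-overlapping boxes through the centre-distance term, so it gives a useful signal even when the IoU is zero.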
Acknowledgements
This research is funded by the Vietnam National Foundation for Science and Technology Development (NAFOSTED) under Grant No. 102.05-2020.02.
Cite this article
Pham, TA. Semantic convolutional features for face detection. Machine Vision and Applications 33, 3 (2022). https://doi.org/10.1007/s00138-021-01245-y
DOI: https://doi.org/10.1007/s00138-021-01245-y