CMS-RCNN: Contextual Multi-Scale Region-Based CNN for Unconstrained Face Detection

Deep Learning for Biometrics

Part of the book series: Advances in Computer Vision and Pattern Recognition (ACVPR)

Abstract

Robust face detection in the wild is an essential component for a wide range of face-related tasks, e.g., unconstrained face recognition, facial periocular recognition, facial landmarking and pose estimation, facial expression recognition, and 3D facial model construction. Although face detection has been studied intensively for decades and is deployed in various commercial applications, it still struggles in some real-world scenarios because of numerous challenges, e.g., heavy facial occlusion, extremely low resolution, strong illumination, extreme pose variation, and image or video compression artifacts. In this paper, we present a face detection approach named Contextual Multi-Scale Region-based Convolutional Neural Network (CMS-RCNN) to robustly address these problems. Similar to region-based CNNs, our proposed network consists of a region proposal component and a region-of-interest (RoI) detection component. Unlike those networks, however, it makes two main contributions that play a significant role in achieving state-of-the-art face detection performance. First, multi-scale information is grouped in both region proposal and RoI detection to deal with tiny face regions. Second, the network allows explicit body contextual reasoning, inspired by the intuition of the human vision system. The proposed approach is benchmarked on two recent challenging face detection databases: the WIDER FACE dataset, which contains a high degree of variability, and the Face Detection Dataset and Benchmark (FDDB). The experimental results show that our approach, trained on the WIDER FACE dataset, outperforms strong baselines on WIDER FACE by a large margin and consistently achieves competitive results on FDDB against recent state-of-the-art face detection methods.
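
The two ideas named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how multi-scale features are grouped or how the body region is obtained, so the L2 normalization before concatenation, the function names, and the expansion factors `widen` and `extend_down` are all assumptions chosen purely for illustration.

```python
import math


def l2_normalize(vec, eps=1e-12):
    """Scale a feature vector to (approximately) unit L2 norm."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / (norm + eps) for v in vec]


def fuse_multiscale(features_per_layer):
    """Group features pooled from several layers into one descriptor.

    Assumption: each layer's vector is L2-normalized so that layers with
    large activations do not dominate, then all vectors are concatenated.
    """
    fused = []
    for feat in features_per_layer:
        fused.extend(l2_normalize(feat))
    return fused


def body_context_box(face_box, image_size, widen=0.5, extend_down=2.0):
    """Derive a hypothetical body-context region from a face box.

    face_box is (x1, y1, x2, y2) in pixels; image_size is (width, height).
    The box is widened by `widen` face-widths on each side and extended
    downward by `extend_down` face-heights, then clipped to the image.
    Both factors are illustrative assumptions, not values from the chapter.
    """
    x1, y1, x2, y2 = face_box
    img_w, img_h = image_size
    w, h = x2 - x1, y2 - y1
    cx1 = max(0.0, x1 - widen * w)
    cx2 = min(float(img_w), x2 + widen * w)
    cy1 = max(0.0, float(y1))
    cy2 = min(float(img_h), y2 + extend_down * h)
    return (cx1, cy1, cx2, cy2)


if __name__ == "__main__":
    # Two toy feature vectors standing in for shallow and deep layer outputs.
    print(fuse_multiscale([[3.0, 4.0], [0.0, 2.0]]))
    # A 20x20 face inside a 100x100 image.
    print(body_context_box((40, 20, 60, 40), (100, 100)))
```

In a detector built along these lines, the fused face descriptor and a descriptor pooled from the context box would be scored together, so that a visible torso can support a face that is tiny or occluded.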

Author information

Corresponding authors

Correspondence to Chenchen Zhu or Yutong Zheng.

Copyright information

© 2017 Springer International Publishing AG

About this chapter

Cite this chapter

Zhu, C., Zheng, Y., Luu, K., Savvides, M. (2017). CMS-RCNN: Contextual Multi-Scale Region-Based CNN for Unconstrained Face Detection. In: Bhanu, B., Kumar, A. (eds) Deep Learning for Biometrics. Advances in Computer Vision and Pattern Recognition. Springer, Cham. https://doi.org/10.1007/978-3-319-61657-5_3

  • DOI: https://doi.org/10.1007/978-3-319-61657-5_3

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-61656-8

  • Online ISBN: 978-3-319-61657-5

  • eBook Packages: Computer Science (R0)
