CMS-RCNN: Contextual Multi-Scale Region-Based CNN for Unconstrained Face Detection

Zhu, Chenchen; Zheng, Yutong; Luu, Khoa; Savvides, Marios

doi:10.1007/978-3-319-61657-5_3

Part of the book series: Advances in Computer Vision and Pattern Recognition (ACVPR)

Abstract

Robust face detection in the wild is a fundamental component supporting many face-related problems, e.g., unconstrained face recognition, facial periocular recognition, facial landmarking and pose estimation, facial expression recognition, and 3D facial model construction. Although face detection has been intensely studied for decades and has many commercial applications, it still struggles in some real-world scenarios because of numerous challenges, e.g., heavy facial occlusions, extremely low resolutions, strong illumination, exceptional pose variations, and image or video compression artifacts. In this paper, we present a face detection approach named Contextual Multi-Scale Region-based Convolutional Neural Network (CMS-RCNN) to robustly address these problems. Similar to the region-based CNNs, our proposed network consists of a region proposal component and a region-of-interest (RoI) detection component. Unlike those networks, however, our proposed network makes two main contributions that play a significant role in achieving state-of-the-art performance in face detection. First, multi-scale information is grouped in both region proposal and RoI detection to deal with tiny face regions. Second, the network allows explicit body contextual reasoning, inspired by the intuition of the human visual system. The proposed approach is benchmarked on two recent challenging face detection databases: the WIDER FACE Dataset, which contains a high degree of variability, and the Face Detection Dataset and Benchmark (FDDB). The experimental results show that our approach, trained on the WIDER FACE Dataset, outperforms strong baselines on WIDER FACE by a large margin and consistently achieves competitive results on FDDB against recent state-of-the-art face detection methods.
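
The two ideas named in the abstract can be illustrated with a small sketch. The abstract does not say how multi-scale features are grouped or how the body region is derived, so everything below is an assumption made for illustration only: per-layer L2 normalization before concatenation is one common way to combine features of different magnitudes, and the `widen` and `extend_down` factors, like all function names here, are hypothetical rather than the chapter's actual settings.

```python
import math
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def l2_normalize(vec: Sequence[float], eps: float = 1e-12) -> List[float]:
    """Scale a feature vector to (approximately) unit L2 norm."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / (norm + eps) for v in vec]


def fuse_multi_scale(features_per_layer: Sequence[Sequence[float]]) -> List[float]:
    """Concatenate features pooled from several layers.

    Assumption: each layer's vector is L2-normalized first so that no
    single layer dominates the fused descriptor.
    """
    fused: List[float] = []
    for feat in features_per_layer:
        fused.extend(l2_normalize(feat))
    return fused


def body_context_box(
    face: Box,
    image_size: Tuple[float, float],
    widen: float = 1.0,        # hypothetical: extra width per side, in face widths
    extend_down: float = 3.0,  # hypothetical: extra height below, in face heights
) -> Box:
    """Derive a body-context region from a face box, clipped to the image."""
    x0, y0, x1, y1 = face
    w, h = x1 - x0, y1 - y0
    img_w, img_h = image_size
    return (
        max(0.0, x0 - widen * w),
        max(0.0, y0),
        min(img_w, x1 + widen * w),
        min(img_h, y1 + extend_down * h),
    )


if __name__ == "__main__":
    # Two toy "layers" with very different magnitudes.
    print(fuse_multi_scale([[3.0, 4.0], [0.0, 2.0]]))
    # A 20x20 face in a 100x100 image.
    print(body_context_box((40.0, 20.0, 60.0, 40.0), (100.0, 100.0)))
```

In a detector along these lines, features pooled from the face box and from the derived body box would both feed the final classifier, so that a visible torso can support a face that is tiny or occluded.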

Author information

Authors and Affiliations

CyLab Biometrics Center and the Department of Electrical and Computer Engineering, Carnegie Mellon University, Pittsburgh, PA, USA
Chenchen Zhu, Yutong Zheng, Khoa Luu & Marios Savvides

Corresponding authors

Correspondence to Chenchen Zhu or Yutong Zheng.

Editor information

Editors and Affiliations

University of California, Riverside, California, USA
Bir Bhanu
Hong Kong Polytechnic University, Hong Kong, China
Ajay Kumar

About this chapter

Cite this chapter

Zhu, C., Zheng, Y., Luu, K., Savvides, M. (2017). CMS-RCNN: Contextual Multi-Scale Region-Based CNN for Unconstrained Face Detection. In: Bhanu, B., Kumar, A. (eds) Deep Learning for Biometrics. Advances in Computer Vision and Pattern Recognition. Springer, Cham. https://doi.org/10.1007/978-3-319-61657-5_3

DOI: https://doi.org/10.1007/978-3-319-61657-5_3
Published: 02 August 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-61656-8
Online ISBN: 978-3-319-61657-5
eBook Packages: Computer Science, Computer Science (R0)
