Hybrid Cascade Model for Face Detection in the Wild Based on Normalized Pixel Difference and a Deep Convolutional Neural Network

  • Darijan Marčetić
  • Martin Soldić
  • Slobodan Ribarić
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10425)


Efficient face detection in real scenes is the main precondition for applications such as face recognition and face de-identification for privacy protection. In this paper, we propose a hybrid cascade model for face detection in the wild. The two-stage cascade uses the fast normalized pixel difference (NPD) detector at the first stage and a deep convolutional neural network (CNN) at the second stage. The output of the NPD detector is characterized by a very small number of false negative (FN) detections and a much larger number of false positive (FP) detections; the FPs typically outnumber the FNs by an order of magnitude. This very high number of FPs has a negative impact on recognition and/or de-identification processing time and on the naturalness of the de-identified images. To reduce the number of FP detections, a CNN is used at the second stage. The CNN is applied only to ambiguous face region candidates, namely those whose NPD score lies in the interval between two experimentally determined thresholds. Experimental results on the Annotated Faces in the Wild (AFW) test set and the Face Detection Dataset and Benchmark (FDDB) show that the hybrid cascade model significantly reduces the number of FP detections while only slightly increasing the number of FN detections.
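The two-threshold routing described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function name `hybrid_cascade`, the parameters `t_low` and `t_high`, the box layout, and the `cnn_is_face` callback are all hypothetical. The abstract states only that the CNN is applied to candidates whose NPD score lies between the two thresholds; the sketch additionally assumes that candidates scoring at or above the upper threshold are accepted directly and that those at or below the lower threshold are rejected.

```python
from typing import Callable, Iterable, List, Tuple

# Hypothetical box layout: (x, y, width, height).
Box = Tuple[int, int, int, int]


def hybrid_cascade(
    candidates: Iterable[Tuple[Box, float]],
    cnn_is_face: Callable[[Box], bool],
    t_low: float,
    t_high: float,
) -> List[Box]:
    """Route first-stage candidates through a two-threshold cascade.

    candidates  : (box, npd_score) pairs produced by the first-stage detector.
    cnn_is_face : second-stage classifier, called only for ambiguous candidates.
    t_low/t_high: the two experimentally determined thresholds (names assumed).
    """
    accepted: List[Box] = []
    for box, score in candidates:
        if score >= t_high:
            # Assumption: a high first-stage score is trusted without the CNN.
            accepted.append(box)
        elif score > t_low:
            # Ambiguous score: let the (slower) CNN decide.
            if cnn_is_face(box):
                accepted.append(box)
        # Assumption: scores at or below t_low are rejected outright.
    return accepted
```

Because the CNN runs only on the ambiguous candidates, its cost scales with the size of that subset rather than with the total number of first-stage detections.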


Face detection in the wild; Normalized pixel difference model; Deep convolutional neural networks



This work has been supported by the Croatian Science Foundation under project 6733 De-identification for Privacy Protection in Surveillance Systems (DePPSS).



Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  1. Faculty of Electrical Engineering and Computing, University of Zagreb, Zagreb, Croatia
