
Facial point localization via neural networks in a cascade regression framework

Multimedia Tools and Applications


Facial point detection is gaining importance in computer vision, as it plays a vital role in applications such as facial expression recognition and human behavior analysis. In this work, we propose an approach that locates 49 facial points via neural networks in a cascade regression fashion. The localization process starts by detecting the face, continues with a refinement of the face crop, and finally arrives at the facial point locations through five cascades of regressors. In particular, we perform a guided initialization using holistic features extracted from the entire face patch. The point locations are then refined in the next four cascades using local features extracted from patches enclosing the prior estimates of the points. Generalization capability was improved by performing feature selection at each cascade. Evaluating our approach on samples gathered from four challenging databases, we achieved an average localization error for each point ranging between 0.72% and 1.57% of the face width. The proposed approach was further evaluated according to the 300-W challenge, where we achieved results competitive with those obtained by state-of-the-art approaches and commercial software packages. Moreover, our approach showed better generalization capability. Finally, we validated the proposed enhancements by studying the impact of several factors on the point localization accuracy.
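The five-cascade flow described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `holistic_features`, `local_features`, `run_cascade`, and `point_errors_percent` are hypothetical, the feature extractors are raw-pixel placeholders, and the regressors are arbitrary callables standing in for the trained neural networks. The sketch also assumes that each refinement stage predicts a displacement added to the previous estimate, which the abstract does not state.

```python
import math
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float]
Image = Sequence[Sequence[float]]  # grayscale face crop as rows of pixel values
Regressor = Callable[[List[float]], List[Point]]


def holistic_features(face: Image) -> List[float]:
    """Placeholder holistic descriptor: every pixel of the face crop, flattened."""
    return [float(v) for row in face for v in row]


def local_features(face: Image, points: Sequence[Point], half: int = 1) -> List[float]:
    """Placeholder local descriptor: raw pixels of a small patch around each point."""
    height, width = len(face), len(face[0])
    feats: List[float] = []
    for x, y in points:
        cx, cy = int(round(x)), int(round(y))
        for dy in range(-half, half + 1):
            for dx in range(-half, half + 1):
                # Clamp to the image border so patches near the edge stay valid.
                yy = min(max(cy + dy, 0), height - 1)
                xx = min(max(cx + dx, 0), width - 1)
                feats.append(float(face[yy][xx]))
    return feats


def run_cascade(face: Image, init_regressor: Regressor,
                refine_regressors: Sequence[Regressor]) -> List[Point]:
    """First cascade: guided initialization from holistic features.
    Remaining cascades: refinement from patches around the prior estimates."""
    points = init_regressor(holistic_features(face))
    for regressor in refine_regressors:
        # Assumption: each stage outputs a per-point displacement (dx, dy).
        deltas = regressor(local_features(face, points))
        points = [(x + dx, y + dy) for (x, y), (dx, dy) in zip(points, deltas)]
    return points


def point_errors_percent(pred: Sequence[Point], truth: Sequence[Point],
                         face_width: float) -> List[float]:
    """Per-point Euclidean error for one face, as a percentage of the face width."""
    return [100.0 * math.hypot(px - tx, py - ty) / face_width
            for (px, py), (tx, ty) in zip(pred, truth)]
```

With one initial regressor and four refinement regressors, `run_cascade` mirrors the five cascades named in the abstract. Averaging the output of `point_errors_percent` over many test faces, point by point, gives a figure of the same kind as the per-point error range reported there.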






This work is part of the project done within the Transregional Collaborative Research Centre SFB/TRR 62 Companion-Technology for Cognitive Technical Systems funded by the German Research Foundation (DFG).

Author information


Corresponding author

Correspondence to Anwar Saeed.


Cite this article

Saeed, A., Al-Hamadi, A. & Neumann, H. Facial point localization via neural networks in a cascade regression framework. Multimed Tools Appl 77, 2261–2283 (2018).
