Abstract
Facial landmark detection is a crucial pre-processing step for many applications, including face tracking, face recognition and facial affect recognition. We therefore investigate and experimentally compare the most successful open-source facial feature point detection algorithms published in the last decade. We first present an overview of surveys on facial feature detection algorithms to provide insight into the challenges and innovations. We then propose a consensus-based selection and stacked-regression-based fusion of facial landmark methods that combines their results to achieve superior accuracy. Five open-source algorithms from the literature are objectively compared on the same test data, and regression-based models are shown to be more successful. According to the extensive experimental results, the proposed consensus and stacking based fusion method gives the lowest facial landmark detection error compared to the five most successful algorithms in the literature, showing that consensus and stacking based fusion of an ensemble of methods boosts the performance of facial landmark detection. The proposed fusion method can also be applied to future methods as they emerge.
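The abstract does not spell out how the consensus selection and stacking steps are implemented, so the following is only a generic sketch of the two ideas, not the authors' method. It assumes each detector returns an array of (x, y) landmarks, uses agreement with the per-landmark median shape as the consensus criterion, and uses ordinary least squares as the stacking regressor. The function names (`consensus_select`, `fit_stacker`, `fuse`) and the `keep` parameter are hypothetical.

```python
import numpy as np

def consensus_select(preds, keep=3):
    """preds: (K, L, 2) landmarks from K detectors for one face.
    Returns sorted indices of the `keep` detectors closest to the
    per-landmark median shape (assumed consensus criterion)."""
    median_shape = np.median(preds, axis=0)                  # (L, 2)
    dist = np.linalg.norm(preds - median_shape, axis=2)      # (K, L)
    return np.sort(np.argsort(dist.mean(axis=1))[:keep])

def fit_stacker(train_preds, train_truth):
    """train_preds: (N, K, L, 2); train_truth: (N, L, 2).
    Learns one linear weight per detector by least squares, so that the
    weighted sum of detector outputs approximates the ground truth."""
    k = train_preds.shape[1]
    A = np.moveaxis(train_preds, 1, -1).reshape(-1, k)       # (N*L*2, K)
    b = train_truth.reshape(-1)                              # (N*L*2,)
    weights, *_ = np.linalg.lstsq(A, b, rcond=None)
    return weights

def fuse(preds, weights, keep=3):
    """Combine the consensus-selected detectors for one face using the
    stacking weights, clipped to be non-negative and renormalised."""
    idx = consensus_select(preds, keep)
    w = np.clip(weights[idx], 0.0, None)
    if w.sum() <= 1e-12:          # fall back to a plain average
        w = np.ones(len(idx))
    w = w / w.sum()
    return np.tensordot(w, preds[idx], axes=1)               # (L, 2)
```

In this sketch the stacking weights are learned once on training faces with known landmarks, while the consensus step runs per face at test time, so a detector that fails badly on a particular image is dropped before the weighted combination. Adding a newly published detector would only require appending its predictions along the detector axis and refitting the weights.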
Data Availability Statement
The datasets generated during and/or analysed during the current study are available in the IBUG repository, https://ibug.doc.ic.ac.uk/resources/300-W/.
Acknowledgements
This work has been supported by The Scientific and Technological Research Council of Turkey (TUBITAK) under the project number: 116E088.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Additional information
Deceased: Esra Nur Sandıkçı.
About this article
Cite this article
Ulukaya, S., Sandıkçı, E.N. & Eroğlu Erdem, Ç. Consensus and stacking based fusion and survey of facial feature point detectors. J Ambient Intell Human Comput 14, 9947–9957 (2023). https://doi.org/10.1007/s12652-021-03662-3
DOI: https://doi.org/10.1007/s12652-021-03662-3