Abstract
The rapid increase in the availability of accurate 3D scanning devices has moved facial recognition and analysis into the 3D domain. 3D facial landmarks are often used as a simple measure of anatomy, so accurate algorithms for automatic landmark placement are crucial. Current state-of-the-art approaches have yet to benefit from the dramatic performance gains that deep convolutional neural networks (CNNs) have brought to human pose tracking and 2D facial landmark placement. The development of deep learning approaches for 3D meshes has given rise to a new subfield, geometric deep learning, in which one topic is adapting meshes for use with deep CNNs. In this work, we demonstrate how methods derived from geometric deep learning, namely multi-view CNNs, can be combined with recent advances in human pose tracking. The method finds 2D landmark estimates and propagates this information to 3D space, where a consensus method determines the accurate 3D face landmark position. We evaluate the method on a standard 3D face dataset and show that it outperforms current methods by a large margin. Further, we demonstrate how models trained on 3D range scans can be used to accurately place anatomical landmarks in magnetic resonance images.
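The step in which 2D landmark estimates are propagated to 3D and merged by consensus can be illustrated with a minimal sketch. The abstract does not specify the consensus rule, so the code below assumes one simple possibility: each rendered view contributes a 3D ray (camera origin plus direction through the 2D estimate), and the landmark is taken as the point with the smallest summed squared distance to all rays. The function names (`closest_point_to_rays`, `solve3`, `det3`, `normalize`) are hypothetical, and the sketch omits any outlier rejection a robust consensus scheme would likely need.

```python
import math


def normalize(v):
    """Return v scaled to unit length."""
    n = math.sqrt(sum(c * c for c in v))
    return [c / n for c in v]


def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def solve3(m, b):
    """Solve the 3x3 system m x = b with Cramer's rule."""
    d = det3(m)
    if abs(d) < 1e-12:
        raise ValueError("rays are (nearly) parallel; no unique closest point")
    solution = []
    for col in range(3):
        # Replace one column of m with b and take the determinant ratio.
        mc = [[b[r] if c == col else m[r][c] for c in range(3)]
              for r in range(3)]
        solution.append(det3(mc) / d)
    return solution


def closest_point_to_rays(origins, directions):
    """Least-squares point closest to a set of 3D lines.

    Assumption (not from the paper): each view yields a line
    o_i + t * d_i, and the estimate p minimises
    sum_i ||(I - d_i d_i^T)(p - o_i)||^2.
    """
    A = [[0.0] * 3 for _ in range(3)]
    b = [0.0] * 3
    for o, d in zip(origins, directions):
        d = normalize(d)
        for r in range(3):
            for c in range(3):
                # Entry (r, c) of the projector onto the plane orthogonal to d.
                proj = (1.0 if r == c else 0.0) - d[r] * d[c]
                A[r][c] += proj
                b[r] += proj * o[c]
    return solve3(A, b)


# Usage: three axis-aligned rays that all pass through the same 3D point.
origins = [(0.0, 2.0, 3.0), (1.0, 0.0, 3.0), (1.0, 2.0, 0.0)]
directions = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
landmark = closest_point_to_rays(origins, directions)
```

A plain least-squares intersection like this is sensitive to a single bad view; a practical variant would fit on subsets of rays and keep the solution that most rays agree with.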
Keywords
- 3D facial landmarks
- Multi-view CNN
- Geometric deep learning
Acknowledgements
We gratefully acknowledge the support of NVIDIA Corporation with the donation of the Titan Xp GPU used for this research.
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Paulsen, R.R., Juhl, K.A., Haspang, T.M., Hansen, T., Ganz, M., Einarsson, G. (2019). Multi-view Consensus CNN for 3D Facial Landmark Placement. In: Jawahar, C., Li, H., Mori, G., Schindler, K. (eds) Computer Vision – ACCV 2018. ACCV 2018. Lecture Notes in Computer Science, vol 11361. Springer, Cham. https://doi.org/10.1007/978-3-030-20887-5_44
DOI: https://doi.org/10.1007/978-3-030-20887-5_44
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-20886-8
Online ISBN: 978-3-030-20887-5
eBook Packages: Computer Science, Computer Science (R0)