Multi-domain Multi-definition Landmark Localization for Small Datasets

  • Conference paper
  • First Online:
Computer Vision – ECCV 2022 (ECCV 2022)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13669)

Abstract

We present a novel method for multi-image-domain and multi-landmark-definition learning for facial landmark localization on small datasets. Training a small dataset alongside a large(r) dataset improves robustness for the former and provides a universal mechanism for facial landmark localization on new and/or smaller standard datasets. To this end, we propose a Vision Transformer encoder paired with a novel decoder that carries a definition-agnostic, shared landmark semantic-group structured prior, learnt as we train on more than one dataset concurrently. Thanks to this definition-agnostic group prior, the datasets may vary in both landmark definitions and image domains. In the decoder stage we use cross- and self-attention, whose output is fed into domain/definition-specific heads that minimize a Laplacian log-likelihood loss. We achieve state-of-the-art performance on standard landmark localization datasets such as \(\texttt{COFW}\) and \(\texttt{WFLW}\) when trained jointly with a bigger dataset. We also show state-of-the-art performance on several small datasets from varied image domains covering animals, caricatures, and facial portrait paintings. Further, we contribute a small dataset (150 images) of pareidolias to demonstrate the efficacy of our method. Finally, we provide several analyses and ablation studies to justify our claims.
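
The domain/definition-specific heads above are described as minimizing a Laplacian log-likelihood loss. The paper's exact parameterization is not reproduced on this page, but a minimal sketch of a standard per-coordinate Laplace negative log-likelihood, with illustrative names (LaplacianNLL, pred_xy, pred_log_b, and a hypothetical linear head) that are assumptions rather than the authors' code, could look as follows in PyTorch:

    # Sketch only: -log p(x | mu, b) = log(2b) + |x - mu| / b per coordinate,
    # averaged over batch, landmarks and coordinates. Not the authors' implementation.
    import torch
    import torch.nn as nn

    class LaplacianNLL(nn.Module):
        def forward(self, pred_xy, pred_log_b, target_xy):
            # pred_xy, target_xy: (batch, landmarks, 2) predicted / ground-truth coords
            # pred_log_b: (batch, landmarks, 2) log-scale; b = exp(.) keeps b > 0
            b = torch.exp(pred_log_b)
            nll = torch.log(2.0 * b) + torch.abs(target_xy - pred_xy) / b
            return nll.mean()

    # Illustrative use with a hypothetical definition-specific head that maps
    # decoder tokens to (x, y, log_bx, log_by) for each landmark:
    head = nn.Linear(256, 4)
    tokens = torch.randn(8, 68, 256)        # stand-in for decoder outputs
    out = head(tokens)
    pred_xy, pred_log_b = out[..., :2], out[..., 2:]
    loss = LaplacianNLL()(pred_xy, pred_log_b, torch.rand(8, 68, 2))

Predicting the scale b alongside the coordinates lets such a head down-weight landmarks it is uncertain about, which is one common motivation for likelihood-based losses over plain L1/L2 regression.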


Notes

  1. Labeling a landmark dataset for animal faces can take up to 6,833 h [22].

  2. We compare our results against previous work, with the caveat that our evaluation is on a subset of the dataset rather than the full dataset, and achieve SOTA performance on the \(\texttt{ArtFace}\) dataset, as shown in Table 4 (right).

References

  1. Bao, H., Dong, L., Wei, F.: BEiT: BERT pre-training of image transformers. arXiv preprint arXiv:2106.08254 (2021)

  2. Ben-David, S., Blitzer, J., Crammer, K., Kulesza, A., Pereira, F., Vaughan, J.W.: A theory of learning from different domains. Machine Learning, pp. 151–175 (2009). https://doi.org/10.1007/s10994-009-5152-4

  3. Bulat, A., Sanchez, E., Tzimiropoulos, G.: Subpixel heatmap regression for facial landmark localization. In: Proceedings of the British Machine Vision Conference (BMVC) (2021)

  4. Bulat, A., Tzimiropoulos, G.: How far are we from solving the 2D & 3D face alignment problem? (and a dataset of 230,000 3D facial landmarks). In: Proceedings of the IEEE International Conference on Computer Vision, pp. 1021–1030 (2017)

  5. Burgos-Artizzu, X.P., Perona, P., Dollár, P.: Robust face landmark estimation under occlusion. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 1513–1520 (2013)

  6. Cao, X., Wei, Y., Wen, F., Sun, J.: Face alignment by explicit shape regression. Int. J. Comput. Vis. 107, 177–190 (2012)

  7. Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision. pp. 213–229. Springer (2020)

  8. Chandran, P., Bradley, D., Gross, M.H., Beeler, T.: Attention-driven cropping for very high resolution facial landmark detection. 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 5860–5869 (2020)

  9. Dapogny, A., Bailly, K., Cord, M.: DeCaFA: deep convolutional cascade for face alignment in the wild. 2019 IEEE/CVF International Conference on Computer Vision (ICCV), pp. 6892–6900 (2019)

  10. Devlin, J., Chang, M.W., Lee, K., Toutanova, K.N.: BERT: pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805 (2018)

  11. Dong, X., Yu, S.I., Weng, X., Wei, S.E., Yang, Y., Sheikh, Y.: Supervision-by-registration: an unsupervised approach to improve the precision of facial landmark detectors. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 360–368 (2018)

  12. Dosovitskiy, A., et al.: An image is worth 16x16 words: transformers for image recognition at scale. arXiv preprint arXiv:2010.11929 (2020)

  13. Dvornik, N., Schmid, C., Mairal, J.: Selecting relevant features from a multi-domain representation for few-shot classification. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12355, pp. 769–786. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58607-2_45

  14. Feng, Z.H., Kittler, J., Awais, M., Huber, P., Wu, X.J.: Wing loss for robust facial landmark localisation with convolutional neural networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2235–2245 (2018)

  15. Hoffman, J., Tzeng, E., Darrell, T., Saenko, K.: Simultaneous deep transfer across domains and tasks. 2015 IEEE International Conference on Computer Vision (ICCV), pp. 4068–4076 (2015)

  16. Honari, S., Molchanov, P., Tyree, S., Vincent, P., Pal, C., Kautz, J.: Improving landmark localization with semi-supervised learning. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1546–1555 (2018)

  17. Huang, Y., Yang, H., Li, C., Kim, J., Wei, F.: ADNet: leveraging error-bias towards normal direction in face alignment. arXiv preprint arXiv:2109.05721 (2021)

  18. Jin, H., Liao, S., Shao, L.: Pixel-in-pixel net: towards efficient facial landmark detection in the wild. Int. J. Comput. Vision 129(12), 3174–3194 (2021)

  19. Jin, S., Feng, Z., Yang, W., Kittler, J.: Separable batch normalization for robust facial landmark localization with cross-protocol network training. arXiv preprint arXiv:2101.06663 (2021)

  20. Jin, S., Feng, Z., Yang, W., Kittler, J.: Separable batch normalization for robust facial landmark localization with cross-protocol network training. ArXiv abs/2101.06663 (2021)

  21. Joshi, M., Dredze, M., Cohen, W.W., Rosé, C.P.: Multi-domain learning: When do domains matter? In: EMNLP (2012)

  22. Khan, M.H., et al.: Animalweb: a large-scale hierarchical dataset of annotated animal faces. In: 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 6937–6946 (2020)

  23. Kingma, D.P., Ba, J.: Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014)

  24. Kowalski, M., Naruniec, J., Trzciński, T.: Deep alignment network: a convolutional neural network for robust face alignment. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pp. 2034–2043 (2017)

  25. Kumar, A., et al.: Luvli face alignment: estimating landmarks’ location, uncertainty, and visibility likelihood. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 8236–8246 (2020)

  26. Lan, X., Hu, Q., Cheng, J.: HIH: towards more accurate face alignment via heatmap in heatmap. arXiv preprint arXiv:2104.03100 (2021)

  27. Liu, R., Lehman, J., Molino, P., Such, F.P., Frank, E., Sergeev, A., Yosinski, J.: An intriguing failing of convolutional neural networks and the coordconv solution. In: NeurIPS (2018)

  28. Liu, Y., Shi, H., Si, Y., Shen, H., Wang, X., Mei, T.: A high-efficiency framework for constructing large-scale face parsing benchmark. arXiv preprint arXiv:1905.04830 (2019)

  29. Liu, Z., et al.: Swin Transformer: hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021)

  30. Nam, H., Han, B.: Learning multi-domain convolutional neural networks for visual tracking. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 4293–4302 (2016)

  31. Newell, A., Yang, K., Deng, J.: Stacked hourglass networks for human pose estimation. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9912, pp. 483–499. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46484-8_29

  32. Nibali, A., He, Z., Morgan, S., Prendergast, L.: Numerical coordinate regression with convolutional neural networks. arXiv preprint arXiv:1801.07372 (2018)

  33. Poggio, T., Torre, V., Koch, C.: Computational vision and regularization theory. Readings in Computer Vision, pp. 638–643 (1987)

  34. Qian, S., Sun, K., Wu, W., Qian, C., Jia, J.: Aggregation via separation: boosting facial landmark detector with semi-supervised style translation. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10153–10163 (2019)

  35. Raffel, C., et al.: Exploring the limits of transfer learning with a unified text-to-text transformer. arXiv preprint arXiv:1910.10683 (2019)

  36. Sagonas, C., Tzimiropoulos, G., Zafeiriou, S., Pantic, M.: 300 faces in-the-wild challenge: the first facial landmark localization challenge. In: Proceedings of the IEEE International Conference on Computer Vision Workshops, pp. 397–403 (2013)

  37. Saragih, J.M., Lucey, S., Cohn, J.F.: Face alignment through subspace constrained mean-shifts. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 1034–1041. IEEE (2009)

  38. Smith, B.M., Zhang, L.: Collaborative facial landmark localization for transferring annotations across datasets. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8694, pp. 78–93. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-10599-4_6

  39. Song, L., Wu, W., Fu, C., Qian, C., Loy, C.C., He, R.: Everything’s talkin’: Pareidolia face reenactment. arXiv preprint arXiv:2104.03061 (2021)

  40. Sun, K., et al.: High-resolution representations for labeling pixels and regions. arXiv preprint arXiv:1904.04514 (2019)

  41. Tang, Z., Peng, X., Li, K., Metaxas, D.N.: Towards efficient u-nets: a coupled and quantized approach. IEEE Trans. Pattern Anal. Mach. Intell. 42, 2038–2050 (2020)

  42. Touvron, H., Cord, M., Douze, M., Massa, F., Sablayrolles, A., Jégou, H.: Training data-efficient image transformers & distillation through attention. In: International Conference on Machine Learning, pp. 10347–10357. PMLR (2021)

  43. Valle, R., Buenaposada, J.M., Valdés, A., Baumela, L.: A deeply-initialized coarse-to-fine ensemble of regression trees for face alignment. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) Computer Vision – ECCV 2018. LNCS, vol. 11218, pp. 609–624. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01264-9_36

  44. Vaswani, A., et al.: Attention is all you need. In: Advances in Neural Information Processing Systems, pp. 5998–6008 (2017)

  45. Wang, X., Bo, L., Fuxin, L.: Adaptive wing loss for robust face alignment via heatmap regression. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 6971–6981 (2019)

  46. Wardle, S.G., Paranjape, S., Taubert, J., Baker, C.I.: Illusory faces are more likely to be perceived as male than female. Proceedings of the National Academy of Sciences 119(5) (2022)

  47. Watchareeruetai, U., et al.: Lotr: face landmark localization using localization transformer. arXiv preprint arXiv:2109.10057 (2021)

  48. Wei, S.E., Ramakrishna, V., Kanade, T., Sheikh, Y.: Convolutional pose machines. In: Proceedings of the IEEE conference on Computer Vision and Pattern Recognition, pp. 4724–4732 (2016)

  49. Wei, S.E., Saragih, J.M., Simon, T., Harley, A.W., Lombardi, S., Perdoch, M., Hypes, A., Wang, D., Badino, H., Sheikh, Y.: VR facial animation via multiview image translation. ACM Trans. Graph. (TOG) 38, 1–16 (2019)

  50. White, T.: Shared visual abstractions. arXiv preprint arXiv:1912.04217 (2019)

  51. Williams, J.: Multi-domain learning and generalization in dialog state tracking. In: SIGDIAL Conference (2013)

  52. Wu, W., Qian, C., Yang, S., Wang, Q., Cai, Y., Zhou, Q.: Look at boundary: a boundary-aware face alignment algorithm. In: CVPR (2018)

  53. Wu, W., Yang, S.: Leveraging intra- and inter-dataset variations for robust face alignment. 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pp. 2096–2105 (2017)

  54. Xiong, X., De la Torre, F.: Supervised descent method and its applications to face alignment. In: 2013 IEEE Conference on Computer Vision and Pattern Recognition, pp. 532–539 (2013)

  55. Yang, J., Liu, Q., Zhang, K.: Stacked hourglass network for robust facial landmark localisation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp. 79–87 (2017)

  56. Yaniv, J., Newman, Y.: The face of art: landmark detection and geometric style in portraits (2019)

  57. Zhang, J., Kan, M., Shan, S., Chen, X.: Leveraging datasets with varying annotations for face alignment via deep regression network. In: 2015 IEEE International Conference on Computer Vision (ICCV), pp. 3801–3809 (2015)

  58. Zhang, J., Cai, H., Guo, Y., Peng, Z.: Landmark detection and 3d face reconstruction for caricature using a nonlinear parametric model. Graph. Model. 115, 101103 (2021)

  59. Zheng, Y., et al.: General facial representation learning in a visual-linguistic manner. CoRR (2021)

  60. Zhu, S., Li, C., Loy, C.C., Tang, X.: Transferring landmark annotations for cross-dataset face alignment. arXiv preprint arXiv:1409.0602 (2014)

  61. Zhu, S., Li, C., Loy, C.C., Tang, X.: Face alignment by coarse-to-fine shape searching. 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 4998–5006 (2015)

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to David Ferman.

Editor information

Editors and Affiliations

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 513 KB)

Rights and permissions

Reprints and permissions

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Ferman, D., Bharaj, G. (2022). Multi-domain Multi-definition Landmark Localization for Small Datasets. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds) Computer Vision – ECCV 2022. ECCV 2022. Lecture Notes in Computer Science, vol 13669. Springer, Cham. https://doi.org/10.1007/978-3-031-20077-9_38

  • DOI: https://doi.org/10.1007/978-3-031-20077-9_38

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-20076-2

  • Online ISBN: 978-3-031-20077-9

  • eBook Packages: Computer Science, Computer Science (R0)
