Multi-domain Multi-definition Landmark Localization for Small Datasets

  • Conference paper
  • First Online:
Computer Vision – ECCV 2022 (ECCV 2022)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13669)

Abstract

We present a novel method for multi-image-domain and multi-landmark-definition learning for facial landmark localization on small datasets. Training a small dataset alongside a large(r) dataset improves robustness for the former and provides a universal mechanism for facial landmark localization on new and/or smaller standard datasets. To this end, we propose a Vision Transformer encoder paired with a novel decoder that carries a definition-agnostic, shared landmark semantic-group structured prior, learnt as we train on more than one dataset concurrently. Thanks to this definition-agnostic group prior, the datasets may vary in both landmark definitions and image domains. In the decoder stage we use cross- and self-attention, whose output is fed into domain/definition-specific heads that minimize a Laplacian log-likelihood loss. We achieve state-of-the-art performance on standard landmark localization datasets such as \(\texttt{COFW}\) and \(\texttt{WFLW}\) when trained jointly with a bigger dataset. We also show state-of-the-art performance on several small datasets from varied image domains covering animals, caricatures, and facial portrait paintings. Further, we contribute a small dataset (150 images) of pareidolias to demonstrate the efficacy of our method. Finally, we provide several analyses and ablation studies to justify our claims.
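
The domain/definition-specific heads above are described as minimizing a Laplacian log-likelihood loss. The paper's exact parameterization is not reproduced on this page, but a minimal sketch of a standard per-coordinate Laplace negative log-likelihood, with illustrative names (LaplacianNLL, pred_xy, pred_log_b, and a hypothetical linear head) that are assumptions rather than the authors' code, could look as follows in PyTorch:

    # Sketch only: -log p(x | mu, b) = log(2b) + |x - mu| / b per coordinate,
    # averaged over batch, landmarks and coordinates. Not the authors' implementation.
    import torch
    import torch.nn as nn

    class LaplacianNLL(nn.Module):
        def forward(self, pred_xy, pred_log_b, target_xy):
            # pred_xy, target_xy: (batch, landmarks, 2) predicted / ground-truth coords
            # pred_log_b: (batch, landmarks, 2) log-scale; b = exp(.) keeps b > 0
            b = torch.exp(pred_log_b)
            nll = torch.log(2.0 * b) + torch.abs(target_xy - pred_xy) / b
            return nll.mean()

    # Illustrative use with a hypothetical definition-specific head that maps
    # decoder tokens to (x, y, log_bx, log_by) for each landmark:
    head = nn.Linear(256, 4)
    tokens = torch.randn(8, 68, 256)        # stand-in for decoder outputs
    out = head(tokens)
    pred_xy, pred_log_b = out[..., :2], out[..., 2:]
    loss = LaplacianNLL()(pred_xy, pred_log_b, torch.rand(8, 68, 2))

Predicting the scale b alongside the coordinates lets such a head down-weight landmarks it is uncertain about, which is one common motivation for likelihood-based losses over plain L1/L2 regression.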


Notes

  1. Labeling a landmark dataset for animal faces can take up to 6,833 h [22].

  2. We compare our results against previous work, with the caveat that our evaluation is on a subset of the dataset rather than the full dataset, and achieve SOTA performance on the \(\texttt{ArtFace}\) dataset, as shown in Table 4 (right).

References

  1. Bao, H., Dong, L., Wei, F.: BEiT: BERT pre-training of image transformers. arXiv preprint arXiv:2106.08254 (2021)

  2. Ben-David, S., Blitzer, J., Crammer, K., Kulesza, A., Pereira, F., Vaughan, J.W.: A theory of learning from different domains. Machine Learning, pp. 151–175 (2009). https://doi.org/10.1007/s10994-009-5152-4

  3. Bulat, A., Sanchez, E., Tzimiropoulos, G.: Subpixel heatmap regression for facial landmark localization. In: Proceedings of the British Machine Vision Conference (BMVC) (2021)

  4. Bulat, A., Tzimiropoulos, G.: How far are we from solving the 2D & 3D face alignment problem? (and a dataset of 230,000 3D facial landmarks). In: Proceedings of the IEEE International Conference on Computer Vision, pp. 1021–1030 (2017)

  5. Burgos-Artizzu, X.P., Perona, P., Dollár, P.: Robust face landmark estimation under occlusion. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 1513–1520 (2013)

  6. Cao, X., Wei, Y., Wen, F., Sun, J.: Face alignment by explicit shape regression. Int. J. Comput. Vis. 107, 177–190 (2012)

  7. Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision. pp. 213–229. Springer (2020)

  8. Chandran, P., Bradley, D., Gross, M.H., Beeler, T.: Attention-driven cropping for very high resolution facial landmark detection. 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 5860–5869 (2020)

  9. Dapogny, A., Bailly, K., Cord, M.: DeCaFA: deep convolutional cascade for face alignment in the wild. 2019 IEEE/CVF International Conference on Computer Vision (ICCV), pp. 6892–6900 (2019)

  10. Devlin, J., Chang, M.W., Lee, K., Toutanova, K.N.: BERT: pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805 (2018)

  11. Dong, X., Yu, S.I., Weng, X., Wei, S.E., Yang, Y., Sheikh, Y.: Supervision-by-registration: an unsupervised approach to improve the precision of facial landmark detectors. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 360–368 (2018)

  12. Dosovitskiy, A., et al.: An image is worth 16x16 words: transformers for image recognition at scale. arXiv preprint arXiv:2010.11929 (2020)

  13. Dvornik, N., Schmid, C., Mairal, J.: Selecting relevant features from a multi-domain representation for few-shot classification. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12355, pp. 769–786. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58607-2_45

  14. Feng, Z.H., Kittler, J., Awais, M., Huber, P., Wu, X.J.: Wing loss for robust facial landmark localisation with convolutional neural networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2235–2245 (2018)

  15. Hoffman, J., Tzeng, E., Darrell, T., Saenko, K.: Simultaneous deep transfer across domains and tasks. 2015 IEEE International Conference on Computer Vision (ICCV), pp. 4068–4076 (2015)

  16. Honari, S., Molchanov, P., Tyree, S., Vincent, P., Pal, C., Kautz, J.: Improving landmark localization with semi-supervised learning. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1546–1555 (2018)

  17. Huang, Y., Yang, H., Li, C., Kim, J., Wei, F.: ADNet: leveraging error-bias towards normal direction in face alignment. arXiv preprint arXiv:2109.05721 (2021)

  18. Jin, H., Liao, S., Shao, L.: Pixel-in-pixel net: towards efficient facial landmark detection in the wild. Int. J. Comput. Vision 129(12), 3174–3194 (2021)

  19. Jin, S., Feng, Z., Yang, W., Kittler, J.: Separable batch normalization for robust facial landmark localization with cross-protocol network training. arXiv preprint arXiv:2101.06663 (2021)

  20. Jin, S., Feng, Z., Yang, W., Kittler, J.: Separable batch normalization for robust facial landmark localization with cross-protocol network training. ArXiv abs/2101.06663 (2021)

  21. Joshi, M., Dredze, M., Cohen, W.W., Rosé, C.P.: Multi-domain learning: When do domains matter? In: EMNLP (2012)

  22. Khan, M.H., et al.: Animalweb: a large-scale hierarchical dataset of annotated animal faces. In: 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 6937–6946 (2020)

  23. Kingma, D.P., Ba, J.: Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014)

  24. Kowalski, M., Naruniec, J., Trzciński, T.: Deep alignment network: a convolutional neural network for robust face alignment. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pp. 2034–2043 (2017)

  25. Kumar, A., et al.: Luvli face alignment: estimating landmarks’ location, uncertainty, and visibility likelihood. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 8236–8246 (2020)

  26. Lan, X., Hu, Q., Cheng, J.: HIH: towards more accurate face alignment via heatmap in heatmap. arXiv preprint arXiv:2104.03100 (2021)

  27. Liu, R., Lehman, J., Molino, P., Such, F.P., Frank, E., Sergeev, A., Yosinski, J.: An intriguing failing of convolutional neural networks and the coordconv solution. In: NeurIPS (2018)

  28. Liu, Y., Shi, H., Si, Y., Shen, H., Wang, X., Mei, T.: A high-efficiency framework for constructing large-scale face parsing benchmark. arXiv preprint arXiv:1905.04830 (2019)

  29. Liu, Z., et al.: Swin Transformer: hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021)

  30. Nam, H., Han, B.: Learning multi-domain convolutional neural networks for visual tracking. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 4293–4302 (2016)

  31. Newell, A., Yang, K., Deng, J.: Stacked hourglass networks for human pose estimation. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9912, pp. 483–499. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46484-8_29

  32. Nibali, A., He, Z., Morgan, S., Prendergast, L.: Numerical coordinate regression with convolutional neural networks. arXiv preprint arXiv:1801.07372 (2018)

  33. Poggio, T., Torre, V., Koch, C.: Computational vision and regularization theory. Readings in Computer Vision, pp. 638–643 (1987)

  34. Qian, S., Sun, K., Wu, W., Qian, C., Jia, J.: Aggregation via separation: boosting facial landmark detector with semi-supervised style translation. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10153–10163 (2019)

  35. Raffel, C., et al.: Exploring the limits of transfer learning with a unified text-to-text transformer. arXiv preprint arXiv:1910.10683 (2019)

  36. Sagonas, C., Tzimiropoulos, G., Zafeiriou, S., Pantic, M.: 300 faces in-the-wild challenge: the first facial landmark localization challenge. In: Proceedings of the IEEE International Conference on Computer Vision Workshops, pp. 397–403 (2013)

  37. Saragih, J.M., Lucey, S., Cohn, J.F.: Face alignment through subspace constrained mean-shifts. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 1034–1041. IEEE (2009)

  38. Smith, B.M., Zhang, L.: Collaborative facial landmark localization for transferring annotations across datasets. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8694, pp. 78–93. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-10599-4_6

  39. Song, L., Wu, W., Fu, C., Qian, C., Loy, C.C., He, R.: Everything’s talkin’: Pareidolia face reenactment. arXiv preprint arXiv:2104.03061 (2021)

  40. Sun, K., et al.: High-resolution representations for labeling pixels and regions. arXiv preprint arXiv:1904.04514 (2019)

  41. Tang, Z., Peng, X., Li, K., Metaxas, D.N.: Towards efficient u-nets: a coupled and quantized approach. IEEE Trans. Pattern Anal. Mach. Intell. 42, 2038–2050 (2020)

  42. Touvron, H., Cord, M., Douze, M., Massa, F., Sablayrolles, A., Jégou, H.: Training data-efficient image transformers & distillation through attention. In: International Conference on Machine Learning, pp. 10347–10357. PMLR (2021)

  43. Valle, R., Buenaposada, J.M., Valdés, A., Baumela, L.: A deeply-initialized coarse-to-fine ensemble of regression trees for face alignment. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) Computer Vision – ECCV 2018. LNCS, vol. 11218, pp. 609–624. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01264-9_36

  44. Vaswani, A., et al.: Attention is all you need. In: Advances in Neural Information Processing Systems, pp. 5998–6008 (2017)

  45. Wang, X., Bo, L., Fuxin, L.: Adaptive wing loss for robust face alignment via heatmap regression. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 6971–6981 (2019)

  46. Wardle, S.G., Paranjape, S., Taubert, J., Baker, C.I.: Illusory faces are more likely to be perceived as male than female. Proceedings of the National Academy of Sciences 119(5) (2022)

  47. Watchareeruetai, U., et al.: Lotr: face landmark localization using localization transformer. arXiv preprint arXiv:2109.10057 (2021)

  48. Wei, S.E., Ramakrishna, V., Kanade, T., Sheikh, Y.: Convolutional pose machines. In: Proceedings of the IEEE conference on Computer Vision and Pattern Recognition, pp. 4724–4732 (2016)

  49. Wei, S.E., Saragih, J.M., Simon, T., Harley, A.W., Lombardi, S., Perdoch, M., Hypes, A., Wang, D., Badino, H., Sheikh, Y.: VR facial animation via multiview image translation. ACM Trans. Graph. (TOG) 38, 1–16 (2019)

  50. White, T.: Shared visual abstractions. arXiv preprint arXiv:1912.04217 (2019)

  51. Williams, J.: Multi-domain learning and generalization in dialog state tracking. In: SIGDIAL Conference (2013)

  52. Wu, W., Qian, C., Yang, S., Wang, Q., Cai, Y., Zhou, Q.: Look at boundary: a boundary-aware face alignment algorithm. In: CVPR (2018)

  53. Wu, W., Yang, S.: Leveraging intra- and inter-dataset variations for robust face alignment. 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pp. 2096–2105 (2017)

  54. Xiong, X., De la Torre, F.: Supervised descent method and its applications to face alignment. In: 2013 IEEE Conference on Computer Vision and Pattern Recognition, pp. 532–539 (2013)

  55. Yang, J., Liu, Q., Zhang, K.: Stacked hourglass network for robust facial landmark localisation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp. 79–87 (2017)

  56. Yaniv, J., Newman, Y.: The face of art: landmark detection and geometric style in portraits (2019)

  57. Zhang, J., Kan, M., Shan, S., Chen, X.: Leveraging datasets with varying annotations for face alignment via deep regression network. In: 2015 IEEE International Conference on Computer Vision (ICCV), pp. 3801–3809 (2015)

  58. Zhang, J., Cai, H., Guo, Y., Peng, Z.: Landmark detection and 3d face reconstruction for caricature using a nonlinear parametric model. Graph. Model. 115, 101103 (2021)

  59. Zheng, Y., et al.: General facial representation learning in a visual-linguistic manner. CoRR (2021)

  60. Zhu, S., Li, C., Loy, C.C., Tang, X.: Transferring landmark annotations for cross-dataset face alignment. arXiv preprint arXiv:1409.0602 (2014)

  61. Zhu, S., Li, C., Loy, C.C., Tang, X.: Face alignment by coarse-to-fine shape searching. 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 4998–5006 (2015)

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to David Ferman.

Editor information

Editors and Affiliations

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 513 KB)

Rights and permissions

Reprints and permissions

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Ferman, D., Bharaj, G. (2022). Multi-domain Multi-definition Landmark Localization for Small Datasets. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds) Computer Vision – ECCV 2022. ECCV 2022. Lecture Notes in Computer Science, vol 13669. Springer, Cham. https://doi.org/10.1007/978-3-031-20077-9_38

  • DOI: https://doi.org/10.1007/978-3-031-20077-9_38

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-20076-2

  • Online ISBN: 978-3-031-20077-9

  • eBook Packages: Computer Science, Computer Science (R0)
