Lightweight facial landmark detection network based on improved MobileViT

Song, Limei; Hong, Chuanfei; Gao, Tian; Yu, Jiali

doi:10.1007/s11760-023-02975-4

Original Paper
Published: 18 January 2024

Volume 18, pages 3123–3131, (2024)
Signal, Image and Video Processing

Limei Song¹, Chuanfei Hong¹, Tian Gao¹ & Jiali Yu¹

Abstract

Current CNN-based facial landmark detection networks cannot model the long-range dependencies between facial landmarks, and they typically have many parameters and consume substantial computational resources. This paper proposes a multi-scale lightweight facial landmark detection network that runs CNN and Transformer branches in parallel. Built on MobileViT, the network incorporates the MobileOne Block and a simplified Ghost BottleNeck as lightweight structures. Compared with MobileViT on the WFLW dataset, the number of network parameters is reduced by 49.18%, the failure rate is reduced by 3.20%, the detection speed is improved by 41.73%, the FLOPs are reduced by 64.83%, and the NME is improved by 0.45% and 1.31% on the test and pose subsets, respectively. These results indicate that adding the Transformer structure makes the extraction of global information about facial landmarks more accurate. Comparisons with other networks show that the improved MobileViT achieves more accurate detection with fewer model parameters.
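The abstract reports NME and failure rate without defining them. The following is a minimal sketch of how these two metrics are commonly computed for landmark benchmarks, not the authors' evaluation code. It assumes NME is the mean point-to-point error divided by a normalizing distance (on WFLW this is usually the inter-ocular distance), and that an image counts as a failure when its NME exceeds a threshold (0.10 is a common choice). The function names `nme` and `failure_rate` and the default threshold are illustrative assumptions.

```python
import math


def nme(pred, gt, norm_dist):
    """Normalized mean error for one face.

    pred, gt: sequences of (x, y) landmark coordinates in the same order.
    norm_dist: normalizing distance (assumed here to be the inter-ocular
    distance; the paper's exact choice is not stated in the abstract).
    """
    if len(pred) != len(gt) or not pred:
        raise ValueError("pred and gt must be non-empty and the same length")
    if norm_dist <= 0:
        raise ValueError("norm_dist must be positive")
    # Sum of Euclidean distances between predicted and ground-truth points.
    total = sum(math.dist(p, g) for p, g in zip(pred, gt))
    return total / (len(pred) * norm_dist)


def failure_rate(nme_values, threshold=0.10):
    """Fraction of faces whose NME exceeds the threshold (assumed 0.10)."""
    if not nme_values:
        raise ValueError("nme_values must be non-empty")
    return sum(v > threshold for v in nme_values) / len(nme_values)
```

For example, with ground truth `[(0, 0), (10, 0)]`, prediction `[(3, 4), (10, 0)]`, and a normalizing distance of 10, the per-point errors are 5 and 0, so `nme` returns 0.25. A lower NME and a lower failure rate both mean more accurate detection.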

Data availability

The datasets used in this paper, WFLW and 300W, are publicly available at https://wywu.github.io/projects/LAB/WFLW.html and https://ibug.doc.ic.ac.uk/resources/300-W/, respectively.

Funding

This work was supported by the Program for Innovative Research Team in University of Tianjin (No. TD13-5036), and the Tianjin Science and Technology Popularization Project (No. 22KPXMRC00090).

Author information

Authors and Affiliations

Tianjin Key Laboratory of Intelligent Control of Electrical Equipment, Tiangong University, Tianjin, China
Limei Song, Chuanfei Hong, Tian Gao & Jiali Yu

Contributions

LS and CH completed the main manuscript text and experiments. TG and JY prepared Table 3. All authors reviewed the manuscript.

Corresponding author

Correspondence to Limei Song.

Ethics declarations

Conflict of interest

Not applicable.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Song, L., Hong, C., Gao, T. et al. Lightweight facial landmark detection network based on improved MobileViT. SIViP 18, 3123–3131 (2024). https://doi.org/10.1007/s11760-023-02975-4

Received: 08 March 2023
Revised: 14 November 2023
Accepted: 15 December 2023
Published: 18 January 2024
Issue Date: June 2024
DOI: https://doi.org/10.1007/s11760-023-02975-4
