Lightweight facial landmark detection network based on improved MobileViT

  • Original Paper
  • Signal, Image and Video Processing

Abstract

Current CNN-based facial landmark detection networks cannot model long-range dependencies between facial landmarks, and they typically have many parameters and consume substantial computational resources. This paper proposes a multi-scale lightweight facial landmark detection network with parallel CNN and Transformer branches. Built on MobileViT, the network incorporates the MobileOne Block and a simplified Ghost BottleNeck as lightweight structures. Compared with MobileViT on the WFLW dataset, the number of network parameters is reduced by 49.18%, the failure rate is reduced by 3.20%, the detection speed is improved by 41.73%, the FLOPs are reduced by 64.83%, and the NME is improved by 0.45% and 1.31% on the test and pose subsets, respectively. These results indicate that adding the Transformer structure makes the extraction of global facial landmark information more accurate. Comparisons with other networks show that the improved MobileViT achieves more accurate detection with fewer model parameters.
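The abstract reports results as NME (normalized mean error) and failure rate. The following is a minimal sketch of how these two metrics are commonly computed for landmark detection, not the authors' evaluation code. It assumes the error is normalized by the ground-truth distance between two reference landmarks (often the outer eye corners) and that a face counts as a failure when its NME exceeds 0.10; the function names, the index parameters, and the threshold are illustrative assumptions.

```python
import math


def nme(pred, truth, left_idx, right_idx):
    """Normalized mean error for a single face.

    pred, truth: sequences of (x, y) landmark coordinates in the same order.
    left_idx, right_idx: indices of the two ground-truth landmarks whose
    distance is used as the normalizer (assumed here: outer eye corners).
    """
    if len(pred) != len(truth) or len(truth) == 0:
        raise ValueError("pred and truth must be non-empty and equally long")
    norm = math.dist(truth[left_idx], truth[right_idx])
    if norm == 0:
        raise ValueError("normalizing distance must be non-zero")
    # Mean Euclidean distance between predicted and ground-truth landmarks.
    mean_err = sum(math.dist(p, t) for p, t in zip(pred, truth)) / len(truth)
    return mean_err / norm


def failure_rate(nmes, threshold=0.10):
    """Fraction of faces whose NME exceeds the threshold (assumed 0.10)."""
    if not nmes:
        raise ValueError("nmes must be non-empty")
    return sum(1 for e in nmes if e > threshold) / len(nmes)


# Toy data for illustration only; these are not values from the paper.
toy_truth = [(0.0, 0.0), (10.0, 0.0), (5.0, 5.0)]
toy_pred = [(0.0, 1.0), (10.0, 0.0), (5.0, 5.0)]
toy_nme = nme(toy_pred, toy_truth, left_idx=0, right_idx=1)
toy_fr = failure_rate([0.05, 0.20, 0.08, 0.12])
```

In this toy case one landmark is off by 1 pixel and the normalizing distance is 10 pixels, so the per-face NME is (1/3)/10. Two of the four sample NME values exceed the threshold, giving a failure rate of 0.5.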

Data availability

The datasets used in this paper, WFLW and 300W, are publicly available at https://wywu.github.io/projects/LAB/WFLW.html and https://ibug.doc.ic.ac.uk/resources/300-W/, respectively.

Funding

This work was supported by the Program for Innovative Research Team in University of Tianjin (No. TD13-5036), and the Tianjin Science and Technology Popularization Project (No. 22KPXMRC00090).

Author information

Contributions

LS and CH completed the main manuscript text and experiments. TG and JY prepared Table 3. All authors reviewed the manuscript.

Corresponding author

Correspondence to Limei Song.

Ethics declarations

Conflict of interest

Not applicable.

About this article

Cite this article

Song, L., Hong, C., Gao, T. et al. Lightweight facial landmark detection network based on improved MobileViT. SIViP 18, 3123–3131 (2024). https://doi.org/10.1007/s11760-023-02975-4

  • DOI: https://doi.org/10.1007/s11760-023-02975-4
