Abstract
Current CNN-based facial landmark detection networks cannot model the long-range dependencies between facial landmarks, and they typically have many parameters that consume substantial computational resources. This paper proposes a multi-scale lightweight facial landmark detection network in which CNN and Transformer branches run in parallel. The network is based on MobileViT and incorporates the MobileOne Block and a simplified Ghost Bottleneck as lightweight structures. Compared with MobileViT on the WFLW dataset, the number of network parameters is reduced by 49.18%, the failure rate is reduced by 3.20%, the detection speed is improved by 41.73%, the FLOPs are reduced by 64.83%, and the NME is improved by 0.45% and 1.31% on the test and pose subsets, respectively. These results indicate that global information about facial landmarks is extracted more accurately after the Transformer structure is added. The paper also compares the proposed network with other networks, and the results show that the improved MobileViT achieves more accurate detection with fewer model parameters.
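The NME and failure rate quoted above are standard landmark-detection metrics. The following is a minimal sketch of how they are commonly computed, not the authors' evaluation code. The function names, the choice of normalizing distance (the distance between two reference landmarks, such as the outer eye corners), and the 10% failure threshold are assumptions made for illustration; the abstract does not state them.

```python
import math

def per_image_nme(pred, gt, left_idx, right_idx):
    """Normalized mean error for one face.

    pred, gt: lists of (x, y) landmark coordinates in the same order.
    left_idx, right_idx: indices of the two ground-truth landmarks whose
    distance is used for normalization (an assumption, e.g. eye corners).
    """
    norm = math.dist(gt[left_idx], gt[right_idx])
    errors = [math.dist(p, g) for p, g in zip(pred, gt)]
    return sum(errors) / len(errors) / norm

def failure_rate(nme_values, threshold=0.10):
    """Fraction of images whose NME exceeds the threshold (assumed 10%)."""
    failures = sum(1 for v in nme_values if v > threshold)
    return failures / len(nme_values)

# Toy data: three landmarks, normalizing distance of 10 pixels.
gt = [(0.0, 0.0), (10.0, 0.0), (5.0, 5.0)]
good = [(x + 0.3, y + 0.4) for x, y in gt]  # every point off by about 0.5 px
bad = [(x + 3.0, y + 4.0) for x, y in gt]   # every point off by 5 px

scores = [per_image_nme(good, gt, 0, 1), per_image_nme(bad, gt, 0, 1)]
rate = failure_rate(scores)
```

In this toy case the two per-image NME values are about 0.05 and 0.5, so one of the two images counts as a failure under the assumed threshold.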
Data availability
The datasets used in this paper are the public datasets WFLW and 300W, available at https://wywu.github.io/projects/LAB/WFLW.html and https://ibug.doc.ic.ac.uk/resources/300-W/, respectively.
Funding
This work was supported by the Program for Innovative Research Team in University of Tianjin (No. TD13-5036), and the Tianjin Science and Technology Popularization Project (No. 22KPXMRC00090).
Contributions
LS and CH completed the main manuscript text and experiments. TG and JY prepared Table 3. All authors reviewed the manuscript.
Ethics declarations
Conflict of interest
Not applicable.
About this article
Cite this article
Song, L., Hong, C., Gao, T. et al. Lightweight facial landmark detection network based on improved MobileViT. SIViP 18, 3123–3131 (2024). https://doi.org/10.1007/s11760-023-02975-4
DOI: https://doi.org/10.1007/s11760-023-02975-4