
MobileFaceFormer: a lightweight face recognition model against face variations

Published in: Multimedia Tools and Applications

Abstract

In recent years, the study of lightweight models has been one of the most significant application streams of face recognition due to practical demands. However, typical lightweight face recognition models become less effective when dealing with large variations in facial features (e.g. age variation, pose variation). In this paper, we present a lightweight face recognition model, namely MobileFaceFormer. It takes advantage of both convolutional neural networks' (CNNs) effectiveness in capturing local features and vision transformers' effectiveness in computing global dependencies, yielding richer interpretations of facial features. To achieve this, a CNN branch and a vision transformer branch are run in parallel, and a bi-directional feature fusion bridge connecting the two branches is designed to concurrently retain local facial features and global facial interpretations. To enhance feature interpretation on both branches, a convolutional token initialization method is proposed for the transformer branch to perceive long-range facial information, while depthwise separable convolutions and attention mechanisms are adopted in the CNN branch to strengthen local facial feature extraction. Further, an attentive global depthwise convolution (AGDC) is proposed to concentrate on key facial information. Experiments across state-of-the-art face recognition datasets show that MobileFaceFormer achieves higher recognition performance, e.g. 99.60% on the LFW dataset compared to 99.28% for MobileFaceNets, with lower model complexity, e.g. 65M Multiply-Accumulate operations (MAdds) versus 221M for MobileFaceNets at a similar parameter size.
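The MAdds comparison in the abstract follows the standard cost model for convolutional layers, and the savings from depthwise separable convolutions can be seen directly from that model. Below is a minimal sketch of the arithmetic; the layer shapes (56×56 feature map, 64→128 channels, 3×3 kernel) are illustrative assumptions, not layer dimensions taken from the paper.

```python
# Multiply-accumulate (MAdds) cost model for a single convolutional layer,
# assuming stride 1, "same" padding, a k x k kernel, and an h x w feature map.
# The layer shapes used in __main__ are illustrative, not from the paper.

def madds_standard(h, w, c_in, c_out, k):
    """Standard convolution: every output channel sees every input channel."""
    return h * w * c_in * c_out * k * k

def madds_depthwise_separable(h, w, c_in, c_out, k):
    """Depthwise (per-channel k x k) convolution followed by a pointwise
    (1 x 1) convolution that mixes channels."""
    depthwise = h * w * c_in * k * k
    pointwise = h * w * c_in * c_out
    return depthwise + pointwise

if __name__ == "__main__":
    h = w = 56
    c_in, c_out, k = 64, 128, 3
    std = madds_standard(h, w, c_in, c_out, k)
    dws = madds_depthwise_separable(h, w, c_in, c_out, k)
    print(f"standard: {std:,}  depthwise-separable: {dws:,}  "
          f"savings: {std / dws:.1f}x")
```

For these shapes the standard convolution costs 231,211,008 MAdds against 27,496,448 for the depthwise separable factorization, roughly an 8.4× reduction, which is why the CNN branch of lightweight models such as MobileFaceNets and MobileFaceFormer is built on this factorization.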


Data availability

The data are available from the corresponding author upon reasonable request.


Funding

This work was supported by the National Key Research and Development Program of China (2019YFB2204200) and the National Natural Science Foundation of China (U1832217).

Author information


Correspondence to Jiarui Li or Jie Chen.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Li, J., Zhou, L. & Chen, J. MobileFaceFormer: a lightweight face recognition model against face variations. Multimed Tools Appl 83, 12669–12685 (2024). https://doi.org/10.1007/s11042-023-15954-1

