Abstract
In recent years, driven by practical demands, lightweight models have become one of the most active research streams in face recognition. However, typical lightweight face recognition models become less effective when dealing with large variations in facial features (e.g., age variation, pose variation). In this paper, we present a lightweight face recognition model, named MobileFaceFormer. It combines the effectiveness of convolutional neural networks (CNNs) at capturing local features with the effectiveness of vision transformers at modeling global dependencies, yielding richer interpretations of facial features. To this end, a CNN branch and a vision transformer branch run in parallel, and a bi-directional feature fusion bridge connecting the two branches is designed to concurrently retain local facial features and global facial interpretations. To enhance feature interpretation on both branches, a convolutional token initialization method is proposed for the transformer branch to perceive long-range facial information, while depthwise separable convolutions and attention mechanisms are adopted in the CNN branch to strengthen local facial feature extraction. Further, an attentive global depthwise convolution (AGDC) is proposed to concentrate the model on key facial information. Experiments on standard face recognition benchmarks show that MobileFaceFormer achieves higher recognition performance, e.g., 99.60% on the LFW dataset compared to 99.28% for MobileFaceNets. Meanwhile, MobileFaceFormer has lower model complexity: in terms of computation cost, it requires 65M multiply-accumulate operations (MAdds) versus 221M for MobileFaceNets at a similar parameter size.
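The MAdds advantage claimed above rests largely on depthwise separable convolutions, which factor a standard convolution into a per-channel depthwise pass plus a 1×1 pointwise projection. The sketch below is illustrative only: the layer shape (56×56 map, 64→128 channels, 3×3 kernel) is a hypothetical example, not a layer from MobileFaceFormer, but it shows how the multiply-accumulate count is derived and why the separable form is an order of magnitude cheaper.

```python
def conv_madds(h, w, cin, cout, k):
    """MAdds for a standard k x k convolution over an h x w
    feature map (stride 1, 'same' padding)."""
    return h * w * cin * cout * k * k

def dw_separable_madds(h, w, cin, cout, k):
    """MAdds for a depthwise separable convolution: a k x k
    depthwise pass followed by a 1 x 1 pointwise projection."""
    depthwise = h * w * cin * k * k     # one k x k filter per input channel
    pointwise = h * w * cin * cout      # 1 x 1 cross-channel projection
    return depthwise + pointwise

# Hypothetical layer: 56x56 map, 64 -> 128 channels, 3x3 kernel.
std = conv_madds(56, 56, 64, 128, 3)
sep = dw_separable_madds(56, 56, 64, 128, 3)
print(f"standard: {std:,}  separable: {sep:,}  ratio: {std / sep:.1f}x")
```

For this example layer, the separable form needs roughly one eighth of the MAdds of the standard convolution; savings of this kind, accumulated over the whole CNN branch, are what keep the model's total cost in the tens of millions of MAdds.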
Data availability
The data are available from the corresponding author upon reasonable request.
Funding
This work was supported by the National Key Research and Development Program of China (2019YFB2204200) and the National Natural Science Foundation of China (U1832217).
Author information
Authors and Affiliations
Corresponding authors
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Li, J., Zhou, L. & Chen, J. MobileFaceFormer: a lightweight face recognition model against face variations. Multimed Tools Appl 83, 12669–12685 (2024). https://doi.org/10.1007/s11042-023-15954-1