
Distilling Face Recognition Models Trained Using Margin-Based Softmax Function

  • THEMATIC ISSUE
  • Published in Automation and Remote Control

Abstract

Convolutional neural networks trained with a margin-based softmax function achieve the highest accuracy on the face recognition problem. The spread of embedded systems such as smart intercoms has increased interest in lightweight neural networks, and lightweight models trained with the margin-based softmax function have accordingly been proposed for face identification. In the present paper, we propose a distillation method that achieves higher accuracy than other distillation methods on the LFW, AgeDB-30, and MegaFace face recognition datasets. The main idea of our approach is to use the class centers of the teacher network to initialize the student network; the student is then trained to produce biometric vectors whose angles to the class centers are equal to the corresponding angles in the teacher network.
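The angle-matching idea admits a compact sketch. The following PyTorch code is a minimal illustration only, not the authors' implementation (their code is available at the repository cited in Ref. 27); the function name `angle_distillation_loss`, the tensor names, and the use of an MSE penalty on the cosines are assumptions made for this sketch.

```python
import torch
import torch.nn.functional as F

def angle_distillation_loss(student_emb: torch.Tensor,
                            teacher_emb: torch.Tensor,
                            teacher_centers: torch.Tensor) -> torch.Tensor:
    """Penalize the student when the angles from its biometric vectors to
    the teacher's class centers deviate from the teacher's own angles.

    student_emb, teacher_emb: (batch, d) embeddings of the same images.
    teacher_centers: (num_classes, d) class centers, i.e., the rows of the
        teacher's margin-based softmax classification layer.
    """
    centers = F.normalize(teacher_centers, dim=1)                  # (C, d)
    cos_student = F.normalize(student_emb, dim=1) @ centers.t()    # (B, C)
    cos_teacher = F.normalize(teacher_emb, dim=1) @ centers.t()    # (B, C)
    # Cosine is monotone on [0, pi], so matching cosines matches angles.
    return F.mse_loss(cos_student, cos_teacher)

# Hypothetical training step: the student's classification layer is first
# initialized with the teacher's class centers, and the angle-matching term
# is added to the usual margin-based softmax loss (lam is a weighting
# hyperparameter assumed for this sketch):
#   student_head.weight.data.copy_(teacher_head.weight.data)
#   loss = margin_softmax_loss(logits, labels) + lam * angle_distillation_loss(
#       s_emb, t_emb.detach(), teacher_head.weight.detach())
```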


REFERENCES

  1. Chen, S., Liu, Y., Gao, X., and Han, Z., Mobilefacenets: Efficient CNNs for accurate real-time face verification on mobile devices, in Chin. Conf. Biometric Recognit., Cham: Springer, 2018, pp. 428–438.

  2. Sandler, M., Howard, A., Zhu, M., Zhmoginov, A., and Chen, L.C., Mobilenetv2: Inverted residuals and linear bottlenecks, Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (2018), pp. 4510–4520.

  3. Deng, J., Guo, J., Xue, N., and Zafeiriou, S., Arcface: Additive angular margin loss for deep face recognition, Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (2019), pp. 4690–4699.

  4. Liu, W., Wen, Y., Yu, Z., Li, M., Raj, B., and Song, L., Sphereface: Deep hypersphere embedding for face recognition, Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (2017), pp. 212–220.

  5. Wang, H., Wang, Y., Zhou, Z., Ji, X., Gong, D., Zhou, J., Li, Z., and Liu, W., Cosface: Large margin cosine loss for deep face recognition, Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (2018), pp. 5265–5274.

  6. Huang, G.B., Mattar, M., Berg, T., and Learned-Miller, E., Labeled faces in the wild: A database for studying face recognition in unconstrained environments, Workshop Faces in ‘Real-Life’ Images: Detection, Alignment, and Recognition (2008).

  7. Moschoglou, S., Papaioannou, A., Sagonas, C., Deng, J., Kotsia, I., and Zafeiriou, S., Agedb: The first manually collected, in-the-wild age database, Proc. IEEE Conf. Comput. Vis. Pattern Recognit. Workshops (2017), pp. 51–59.

  8. Kemelmacher-Shlizerman, I., Seitz, S.M., Miller, D., and Brossard, E., The megaface benchmark: 1 million faces for recognition at scale, Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (2016), pp. 4873–4882.

  9. Hinton, G., Vinyals, O., and Dean, J., Distilling the knowledge in a neural network, 2015. arXiv:1503.02531.

  10. Fukuda, T., Suzuki, M., Kurata, G., Thomas, S., Cui, J., and Ramabhadran, B., Efficient knowledge distillation from an ensemble of teachers, Interspeech, 2017, pp. 3697–3701.

  11. Sau, B.B. and Balasubramanian, V.N., Deep model compression: Distilling knowledge from noisy teachers, 2016. arXiv:1610.09650.

  12. Furlanello, T., Lipton, Z., Tschannen, M., Itti, L., and Anandkumar, A., Born again neural networks, Int. Conf. Mach. Learn. PMLR (2018), pp. 1607–1616.

  13. Huang, Z. and Wang, N., Like what you like: Knowledge distill via neuron selectivity transfer, 2017. arXiv:1707.01219.

  14. Romero, A., Ballas, N., Kahou, S.E., Chassang, A., Gatta, C., and Bengio, Y., Fitnets: Hints for thin deep nets, 2014. arXiv:1412.6550.

  15. Chen, H., Wang, Y., Xu, C., Xu, C., and Tao, D., Learning student networks via feature embedding, IEEE Trans. Neural Networks Learn. Syst., 2020, vol. 32, no. 1, pp. 25–35.


  16. Park, W., Kim, D., Lu, Y., and Cho, M., Relational knowledge distillation, Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (2019), pp. 3967–3976.

  17. Feng, Y., Wang, H., Hu, H.R., Yu, L., Wang, W., and Wang, S., Triplet distillation for deep face recognition, 2020 IEEE Int. Conf. Image Process. (ICIP) (2020), pp. 808–812.

  18. Duong, C.N., Luu, K., Quach, K.G., and Le, N., Shrinkteanet: Million-scale lightweight face recognition via shrinking teacher–student networks, 2019. arXiv:1905.10620.

  19. Nekhaev, D., Milyaev, S., and Laptev, I., Margin based knowledge distillation for mobile face recognition, in Twelfth Int. Conf. Mach. Vis. (ICMV 2019), Int. Soc. Opt. Photonics, 2020, vol. 11433, 114330O.

  20. He, K., Zhang, X., Ren, S., and Sun, J., Deep residual learning for image recognition, Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (2016), pp. 770–778.

  21. Zhang, K., Zhang, Z., Li, Z., and Qiao, Y., Joint face detection and alignment using multitask cascaded convolutional networks, IEEE Signal Process. Lett., 2016, vol. 23, no. 10, pp. 1499–1503.


  22. Guo, Y., Zhang, L., Hu, Y., He, X., and Gao, J., Ms-celeb-1m: A dataset and benchmark for large-scale face recognition, in Eur. Conf. Comput. Vis., Cham: Springer, 2016, pp. 87–102.

  23. Ng, H.W. and Winkler, S., A data-driven approach to cleaning large face datasets, 2014 IEEE Int. Conf. Image Process. (ICIP) (2014), pp. 343–347.

  24. Robbins, H. and Monro, S., A stochastic approximation method, Ann. Math. Stat., 1951, vol. 22, no. 3, pp. 400–407.

  25. Grabovoy, A.V. and Strijov, V.V., Bayesian distillation of deep learning models, Autom. Remote Control, 2021, vol. 82, no. 11, pp. 1846–1856.


  26. Grabovoy, A.V. and Strijov, V.V., Probabilistic interpretation of the distillation problem, Autom. Remote Control, 2022, vol. 83, no. 1, pp. 123–137.


  27. MarginDistillation: distillation for margin-based softmax. https://github.com/david-svitov/margindistillation. Accessed January 8, 2022.


Author information


Correspondence to D. V. Svitov or S. A. Alyamkin.

Additional information

Translated by V. Potapchouck


About this article


Cite this article

Svitov, D.V., Alyamkin, S.A. Distilling Face Recognition Models Trained Using Margin-Based Softmax Function. Autom Remote Control 83, 1517–1526 (2022). https://doi.org/10.1134/S00051179220100046

