
Learning an Evolutionary Embedding via Massive Knowledge Distillation



Knowledge distillation methods aim to transfer knowledge from a large, powerful teacher network to a small, compact student network. Existing methods often focus on closed-set classification problems and on matching features between the teacher and student networks for individual samples. However, many real-world classification problems are open-set. This paper proposes an Evolutionary Embedding Learning (EEL) framework that learns a fast and accurate student network for open-set problems via massive knowledge distillation. First, we revisit the formulation of canonical knowledge distillation and adapt it to open-set problems with massive classes. Second, by introducing an angular constraint, we propose a novel correlated embedding loss (CEL) that matches the embedding spaces of the teacher and student networks from a global perspective. Lastly, we propose a simple yet effective paradigm for developing a fast and accurate student network via knowledge distillation. We show that an accelerated student network can be implemented without sacrificing accuracy compared with its teacher network. The experimental results are encouraging: EEL achieves better performance than other state-of-the-art methods on various large-scale open-set problems, including face recognition, vehicle re-identification and person re-identification.
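The paper's exact loss is not reproduced here, but the idea of matching embedding spaces "from a global perspective" under an angular constraint can be sketched as follows: instead of regressing each student feature onto the corresponding teacher feature sample by sample, one can match the pairwise cosine-similarity structure of a whole batch. The function name `correlated_embedding_loss` and the specific mean-squared formulation are illustrative assumptions, not the authors' exact CEL.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    # Project each embedding onto the unit hypersphere, so that inner
    # products become cosine similarities (the angular constraint).
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)

def correlated_embedding_loss(teacher_emb, student_emb):
    """Illustrative sketch (not the paper's exact CEL): penalize the
    discrepancy between the batch-wise cosine-similarity matrices of the
    teacher and student embeddings, rather than matching features of
    single samples in isolation."""
    t = l2_normalize(teacher_emb)
    s = l2_normalize(student_emb)
    # Pairwise cosine similarities within the batch capture the global
    # geometric structure of each embedding space.
    sim_t = t @ t.T
    sim_s = s @ s.T
    return np.mean((sim_t - sim_s) ** 2)
```

Under this formulation the loss is zero when the student reproduces the teacher's angular geometry exactly, even if the two embedding spaces differ by a rotation or scaling, which is what makes a batch-level (rather than per-sample) matching attractive for open-set retrieval tasks.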









This work was supported in part by the National Natural Science Foundation of China under Grant 61622310, Grant 61721004, and in part by the Beijing Natural Science Foundation Grant JQ18017.

Author information

Correspondence to Ran He.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Communicated by Li Liu, Matti Pietikäinen, Jie Qin, Jie Chen, Wanli Ouyang, Luc Van Gool.


About this article


Cite this article

Wu, X., He, R., Hu, Y. et al. Learning an Evolutionary Embedding via Massive Knowledge Distillation. Int J Comput Vis (2020).



Keywords

  • Knowledge distillation
  • Open-set problem
  • Face recognition
  • Vehicle re-identification
  • Person re-identification