Self-supervised Knowledge Distillation Using Singular Value Decomposition

  • Seung Hyun Lee
  • Dae Ha Kim
  • Byung Cheol Song
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11210)


To address deep neural networks' (DNNs') need for huge training datasets and their high computational cost, so-called teacher-student (T-S) DNNs, which transfer the knowledge of a large T-DNN to a small S-DNN, have been proposed. However, existing T-S DNNs have a limited range of use, and the knowledge of the T-DNN is insufficiently transferred to the S-DNN. To improve the quality of the knowledge transferred from the T-DNN, we propose a new knowledge distillation method using singular value decomposition (SVD). In addition, we define knowledge transfer as a self-supervised task and suggest a way to continuously receive information from the T-DNN. Simulation results show that an S-DNN with 1/5 the computational cost of the T-DNN can exceed the T-DNN by up to 1.1% in classification accuracy. Also, at the same computational cost, our S-DNN outperforms an S-DNN trained with state-of-the-art distillation by 1.79%. Code is available on
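The core idea above, compressing a layer's feature map via SVD so that only its most significant directions serve as transferable "knowledge", can be illustrated with a minimal sketch. This is not the paper's exact formulation; the rank `k`, the spatial flattening, and the simple subspace-alignment loss are assumptions chosen for clarity.

```python
import numpy as np

def distill_feature_svd(feature_map, k=4):
    """Summarize a conv feature map (H, W, C) by its top-k right singular
    vectors, a low-rank description of the channel correlations."""
    h, w, c = feature_map.shape
    # Flatten spatial dims: each row is one spatial position's channel vector.
    m = feature_map.reshape(h * w, c)
    # Thin SVD; rows of vt are orthonormal directions in channel space.
    u, s, vt = np.linalg.svd(m, full_matrices=False)
    return vt[:k]  # shape (k, C): compact "knowledge" of the layer

rng = np.random.default_rng(0)
teacher_feat = rng.standard_normal((8, 8, 32))
student_feat = rng.standard_normal((8, 8, 32))

vt_t = distill_feature_svd(teacher_feat)
vt_s = distill_feature_svd(student_feat)

# A sign-invariant alignment loss between the two k-dimensional subspaces:
# 0 when corresponding singular vectors coincide (up to sign), larger otherwise.
loss = 1.0 - np.abs(np.sum(vt_t * vt_s, axis=1)).mean()
print(vt_t.shape, float(loss))
```

Training the student to minimize such a loss pushes its dominant feature directions toward the teacher's, which is the intuition behind distilling SVD-compressed knowledge rather than raw feature maps.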


Keywords: Statistical methods and learning · Optimization methods · Recognition: detection · Categorization · Indexing · Matching



This research was supported by a National Research Foundation of Korea grant funded by the Korean Government (2016R1A2B4007353).



Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  1. Inha University, Incheon, Republic of Korea
