
Self-Referenced Deep Learning

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 11362)

Abstract

Knowledge distillation is an effective approach to transferring knowledge from a teacher neural network to a student target network, satisfying the low-memory and fast-execution requirements of practical use. Whilst able to produce stronger target networks than the vanilla non-teacher-based learning strategy, this scheme requires additionally training a large teacher model at expensive computational cost. In this work, we present a Self-Referenced Deep Learning (SRDL) strategy. Unlike both vanilla optimisation and existing knowledge distillation, SRDL distils the knowledge discovered by the in-training target model back to itself to regularise the subsequent learning procedure, thereby eliminating the need to train a large teacher model. SRDL improves model generalisation compared to vanilla learning and conventional knowledge distillation, at negligible extra computational cost. Extensive evaluations show that a variety of deep networks benefit from SRDL, with enhanced deployment performance on both coarse-grained object categorisation tasks (CIFAR10, CIFAR100, Tiny ImageNet, and ImageNet) and a fine-grained person instance identification task (Market-1501).
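The self-referenced scheme described in the abstract regularises training with the model's own earlier predictions in place of a separate teacher. A minimal sketch of the underlying distillation objective (an illustrative reading of the abstract, not the authors' exact algorithm; function names, temperature, and mixing weight are assumptions):

```python
import math

def softmax(logits, T=1.0):
    # Temperature-scaled softmax; T > 1 softens the distribution.
    m = max(z / T for z in logits)
    exps = [math.exp(z / T - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def self_distillation_loss(student_logits, snapshot_logits, true_label,
                           T=4.0, alpha=0.5):
    # Combined objective: (1 - alpha) * cross-entropy on the hard label,
    # plus alpha * T^2 * KL(snapshot || student) on softened outputs.
    # In the self-referenced setting, snapshot_logits come from an
    # earlier checkpoint of the same in-training network, not a teacher.
    p_student = softmax(student_logits)
    ce = -math.log(p_student[true_label])
    q_snap = softmax(snapshot_logits, T)
    q_stud = softmax(student_logits, T)
    kl = sum(qs * math.log(qs / qt) for qs, qt in zip(q_snap, q_stud))
    return (1 - alpha) * ce + alpha * (T ** 2) * kl
```

When the student's outputs match the snapshot exactly, the KL term vanishes and only the supervised cross-entropy remains, so the regulariser only penalises drift away from the previously discovered knowledge.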


Notes

  1. The computational cost of the knowledge extraction required by both SRDL and Knowledge Distillation [15] is marginal (less than \(0.67\%\) of the model training cost) and is hence omitted from the analysis for convenience.
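As the note indicates, the extracted knowledge amounts to soft targets obtained from one extra forward pass over the training data, which is why its cost is marginal relative to full training. A minimal sketch of this extraction step (names and the temperature value are illustrative assumptions, not taken from the paper):

```python
import math

def soften(logits, T=4.0):
    # Temperature-scaled softmax; T > 1 flattens the distribution
    # so that inter-class similarity structure is preserved.
    m = max(z / T for z in logits)
    exps = [math.exp(z / T - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def extract_soft_targets(predict, inputs, T=4.0):
    # One extra forward pass over the training set, caching the
    # softened outputs to be replayed as regularisation targets
    # during the subsequent learning phase.
    return [soften(predict(x), T) for x in inputs]
```

Because this is a single inference sweep with no gradient computation, its cost is a small fraction of a full multi-epoch training run.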

References

  1. Ba, J., Caruana, R.: Do deep nets really need to be deep? In: NIPS (2014)

  2. Bucilua, C., et al.: Model compression. In: SIGKDD. ACM (2006)

  3. Bucilua, C., Caruana, R., Niculescu-Mizil, A.: Model compression. In: SIGKDD (2006)

  4. Chang, X., Hospedales, T.M., Xiang, T.: Multi-level factorisation net for person re-identification. In: CVPR (2018)

  5. Chen, D., Yuan, Z., Chen, B., Zheng, N.: Similarity learning with spatial constraints for person re-identification. In: CVPR (2016)

  6. Chen, Y., Zhu, X., Gong, S., et al.: Person re-identification by deep learning multi-scale representations. In: ICCV Workshop (2017)

  7. Duchi, J., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. JMLR 12, 2121–2159 (2011)

  8. Erhan, D., Bengio, Y., Courville, A., Manzagol, P.A., Vincent, P., Bengio, S.: Why does unsupervised pre-training help deep learning? JMLR 11, 625–660 (2010)

  9. Furlanello, T., Lipton, Z.C., Tschannen, M., Itti, L., Anandkumar, A.: Born again neural networks (2018). arXiv preprint

  10. Glorot, X., Bengio, Y.: Understanding the difficulty of training deep feedforward neural networks. In: AISTATS, pp. 249–256 (2010)

  11. Han, S., Mao, H., Dally, W.J.: Deep compression: compressing deep neural networks with pruning, trained quantization and Huffman coding. In: ICLR (2016)

  12. He, K., et al.: Deep residual learning for image recognition. In: CVPR (2016)

  13. He, K., Zhang, X., Ren, S., Sun, J.: Delving deep into rectifiers: surpassing human-level performance on ImageNet classification. In: ICCV (2015)

  14. Hermans, A., Beyer, L., Leibe, B.: In defense of the triplet loss for person re-identification (2017). arXiv preprint

  15. Hinton, G., Vinyals, O., Dean, J.: Distilling the knowledge in a neural network (2015). arXiv preprint

  16. Huang, G., Li, Y., Pleiss, G., Liu, Z., Hopcroft, J.E., Weinberger, K.Q.: Snapshot ensembles: train 1, get M for free. In: ICLR (2017)

  17. Huang, G., Liu, Z., Weinberger, K.Q., van der Maaten, L.: Densely connected convolutional networks. In: CVPR (2017)

  18. Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS (2013)

  19. Keskar, N.S., et al.: On large-batch training for deep learning: generalization gap and sharp minima (2016). arXiv preprint

  20. Kingma, D., Ba, J.: Adam: a method for stochastic optimization (2014). arXiv preprint

  21. Krähenbühl, P., Doersch, C., Donahue, J., Darrell, T.: Data-dependent initializations of convolutional neural networks. In: ICLR (2016)

  22. Krizhevsky, A., et al.: ImageNet classification with deep convolutional neural networks. In: NIPS (2012)

  23. Krizhevsky, A., Hinton, G.: Learning multiple layers of features from tiny images (2009)

  24. Lan, X., Wang, H., Gong, S., Zhu, X.: Deep reinforcement learning attention selection for person re-identification (2017). arXiv preprint

  25. Lan, X., Zhu, X., Gong, S.: Knowledge distillation by on-the-fly native ensemble (2018). arXiv preprint arXiv:1806.04606

  26. Lan, X., Zhu, X., Gong, S.: Person search by multi-scale matching. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018, Part I. LNCS, vol. 11205, pp. 553–569. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01246-5_33

  27. Le, Y., Yang, X.: Tiny ImageNet visual recognition challenge. CS 231N (2015)

  28. Li, D., Chen, X., Zhang, Z., Huang, K.: Learning deep context-aware features over body and latent parts for person re-identification. In: CVPR (2017)

  29. Li, H., Kadav, A., Durdanovic, I., Samet, H., Graf, H.P.: Pruning filters for efficient ConvNets. In: ICLR (2017)

  30. Li, W., Zhu, X., Gong, S.: Person re-identification by deep joint learning of multi-loss classification. In: IJCAI (2017)

  31. Li, W., Zhu, X., Gong, S.: Harmonious attention network for person re-identification. In: CVPR (2018)

  32. Liu, X., et al.: HydraPlus-Net: attentive deep features for pedestrian analysis. In: ICCV (2017)

  33. Lopez-Paz, D., Bottou, L., Schölkopf, B., Vapnik, V.: Unifying distillation and privileged information (2015). arXiv preprint

  34. Mishkin, D., Matas, J.: All you need is a good init. In: ICLR (2015)

  35. Rastegari, M., Ordonez, V., Redmon, J., Farhadi, A.: XNOR-Net: ImageNet classification using binary convolutional neural networks. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016, Part IV. LNCS, vol. 9908, pp. 525–542. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46493-0_32

  36. Romero, A., Ballas, N., Kahou, S.E., Chassang, A., Gatta, C., Bengio, Y.: FitNets: hints for thin deep nets (2014). arXiv preprint

  37. Russakovsky, O., Deng, J., et al.: ImageNet large scale visual recognition challenge. IJCV 115(3), 211–252 (2015)

  38. Saxe, A.M., McClelland, J.L., Ganguli, S.: Exact solutions to the nonlinear dynamics of learning in deep linear neural networks (2013). arXiv preprint

  39. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition (2015). arXiv preprint

  40. Su, C., Li, J., Zhang, S., Xing, J., Gao, W., Tian, Q.: Pose-driven deep convolutional model for person re-identification. In: ICCV (2017)

  41. Sun, Y., Zheng, L., Deng, W., Wang, S.: SVDNet for pedestrian retrieval (2017). arXiv preprint

  42. Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., et al.: Going deeper with convolutions. In: CVPR (2015)

  43. Vapnik, V., Izmailov, R.: Learning using privileged information: similarity control and knowledge transfer. JMLR 16, 2023–2049 (2015)

  44. Varior, R.R., Haloi, M., Wang, G.: Gated siamese convolutional neural network architecture for human re-identification. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016, Part VIII. LNCS, vol. 9912, pp. 791–808. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46484-8_48

  45. Wang, Y., et al.: Resource aware person re-identification across multiple resolutions. In: CVPR (2018)

  46. Wang, Y., Chen, Z., Wu, F., Wang, G.: Person re-identification with cascaded pairwise convolutions (2018)

  47. Welling, M., Teh, Y.W.: Bayesian learning via stochastic gradient Langevin dynamics. In: ICML (2011)

  48. Yim, J., Joo, D., Bae, J., Kim, J.: A gift from knowledge distillation: fast optimization, network minimization and transfer learning. In: CVPR (2017)

  49. Zagoruyko, S., Komodakis, N.: Wide residual networks (2016). arXiv preprint

  50. Zeiler, M.D.: ADADELTA: an adaptive learning rate method (2012). arXiv preprint

  51. Zhang, B., Wang, L., Wang, Z., Qiao, Y., Wang, H.: Real-time action recognition with enhanced motion vector CNNs. In: CVPR (2016)

  52. Zheng, L., Shen, L., Tian, L., Wang, S., Wang, J., Tian, Q.: Scalable person re-identification: a benchmark. In: ICCV (2015)


Acknowledgements

This work was partly supported by the China Scholarship Council, Vision Semantics Limited, the Royal Society Newton Advanced Fellowship Programme (NA150459), and Innovate UK Industrial Challenge Project on Developing and Commercialising Intelligent Video Analytics Solutions for Public Safety (98111-571149).

Author information


Corresponding author

Correspondence to Xu Lan.



Copyright information

© 2019 Springer Nature Switzerland AG

About this paper


Cite this paper

Lan, X., Zhu, X., Gong, S. (2019). Self-Referenced Deep Learning. In: Jawahar, C., Li, H., Mori, G., Schindler, K. (eds.) Computer Vision – ACCV 2018. Lecture Notes in Computer Science, vol. 11362. Springer, Cham. https://doi.org/10.1007/978-3-030-20890-5_19


  • DOI: https://doi.org/10.1007/978-3-030-20890-5_19


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-20889-9

  • Online ISBN: 978-3-030-20890-5

  • eBook Packages: Computer Science (R0)
