Revisiting Distillation and Incremental Classifier Learning

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 11366)

Abstract

One of the key differences between the learning mechanism of humans and Artificial Neural Networks (ANNs) is the ability of humans to learn one task at a time. ANNs, on the other hand, can only learn multiple tasks simultaneously: any attempt to learn new tasks incrementally causes them to completely forget previous tasks. This forgetting of previously learned tasks when new ones are learned incrementally, called Catastrophic Forgetting, is considered a major hurdle in building a true AI system.

In this paper, our goal is to isolate the truly effective existing ideas for incremental learning from those that only work under certain conditions. To this end, we first thoroughly analyze the current state-of-the-art method for incremental learning (iCaRL) and demonstrate that the good performance of the system is not due to the reasons presented in the existing literature. We conclude that the success of iCaRL is primarily due to knowledge distillation, and we identify a key limitation of knowledge distillation: it often leads to biased classifiers. Finally, we propose a dynamic threshold moving algorithm that successfully removes this bias. We demonstrate the effectiveness of our algorithm on the CIFAR100 and MNIST datasets, showing near-optimal results. Our implementation is available at: https://github.com/Khurramjaved96/incremental-learning.
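The details of the dynamic threshold moving algorithm are given in the paper and the linked repository; the sketch below only illustrates the general idea of threshold moving, namely rescaling a classifier's softmax outputs with per-class correction factors so that classes the biased classifier systematically under-predicts are no longer suppressed. The function name threshold_moving and the particular scaling factors are illustrative assumptions, not the paper's exact procedure.

    import numpy as np

    def threshold_moving(probs, class_scale):
        """Rescale softmax outputs with per-class factors and renormalize.

        probs:       (N, C) array of softmax probabilities from the classifier.
        class_scale: (C,) correction factors; larger values for classes the
                     biased classifier tends to under-predict.
        """
        scaled = probs * class_scale                     # counteract the bias
        return scaled / scaled.sum(axis=1, keepdims=True)

    # Illustrative usage: the third (newly added) class gets a larger factor
    # because the biased classifier under-scores it; the corrected prediction
    # flips from class 0 to class 2.
    probs = np.array([[0.60, 0.25, 0.15]])
    class_scale = np.array([1.0, 1.0, 5.0])
    corrected = threshold_moving(probs, class_scale)
    print(corrected, corrected.argmax(axis=1))

How the per-class factors are chosen is the crux of the proposed method; the values above are purely hypothetical placeholders.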



Author information


Corresponding author

Correspondence to Khurram Javed.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Javed, K., Shafait, F. (2019). Revisiting Distillation and Incremental Classifier Learning. In: Jawahar, C., Li, H., Mori, G., Schindler, K. (eds.) Computer Vision – ACCV 2018. Lecture Notes in Computer Science, vol 11366. Springer, Cham. https://doi.org/10.1007/978-3-030-20876-9_1

Download citation

  • DOI: https://doi.org/10.1007/978-3-030-20876-9_1

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-20875-2

  • Online ISBN: 978-3-030-20876-9

  • eBook Packages: Computer Science, Computer Science (R0)
