
Lifelong Learning via Progressive Distillation and Retrospection

Part of the Lecture Notes in Computer Science book series (LNIP, volume 11207)


Abstract

Lifelong learning aims to adapt a learned model to new tasks while retaining the knowledge gained earlier. A key challenge in lifelong learning is to strike a balance, within a given model, between preserving performance on old tasks and adapting to a new one. Approaches that combine both objectives in training have been explored in previous works, yet performance still degrades considerably over a long sequence of tasks. In this work, we propose a novel approach to lifelong learning that seeks a better balance between preservation and adaptation via two techniques: Distillation and Retrospection. Specifically, the target model adapts to the new task by knowledge distillation from an intermediate expert, while previous knowledge is more effectively preserved by caching a small subset of data from old tasks. The combination of Distillation and Retrospection leads to a gentler learning curve for the target model, and extensive experiments demonstrate that our approach brings consistent improvements on both old and new tasks.
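As a rough illustration of the idea described above (not the authors' actual implementation), the combined objective can be sketched as two distillation terms: one that adapts the target model to the new task by matching an intermediate expert's softened outputs, and one that preserves old-task behavior by matching the original network's outputs on a small cached subset of old-task data. All function names and the weighting parameter `lam` below are our own assumptions, and regularization terms are omitted, as in the paper's footnote 1.

```python
import numpy as np

def softmax(z, T=1.0):
    # Temperature-scaled softmax; T > 1 softens the distribution
    # (Hinton et al., "Distilling the knowledge in a neural network").
    z = np.asarray(z, dtype=float) / T
    z = z - z.max(axis=-1, keepdims=True)  # for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, T=2.0):
    # Cross-entropy between the teacher's softened outputs (soft targets)
    # and the student's softened outputs, averaged over the batch.
    p = softmax(teacher_logits, T)
    log_q = np.log(softmax(student_logits, T))
    return float(-(p * log_q).sum(axis=-1).mean())

def lifelong_objective(new_logits, expert_logits,
                       old_logits, original_logits, lam=1.0, T=2.0):
    # Adaptation: distill an intermediate expert on new-task data.
    adapt = distillation_loss(new_logits, expert_logits, T)
    # Preservation (Retrospection): distill the original network on a
    # small cached subset of old-task data.
    preserve = distillation_loss(old_logits, original_logits, T)
    return adapt + lam * preserve
```

The loss is minimized (equal to the teacher's entropy) when the student's outputs match the teacher's, so both terms pull the target model toward the respective reference networks.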


Keywords

  • Lifelong learning
  • Knowledge distillation
  • Retrospection

S. Hou and X. Pan contributed equally (joint first authorship).

DOI: 10.1007/978-3-030-01219-9_27


Footnotes

  1. The regularization terms are omitted for simplicity.

  2. The results with Encoder-based-LwF in Table 5 are from our re-implementation, which largely agree with those in [20]. The models in [20] are implemented with MatConvNet [25], and data augmentation is adopted when recording the output of the Original CNN. Besides the five-task scenario, we also conduct experiments in the two-task scenario, which are provided in the supplementary material.



References

  2. Aljundi, R., Babiloni, F., Elhoseiny, M., Rohrbach, M., Tuytelaars, T.: Memory aware synapses: learning what (not) to forget. arXiv preprint arXiv:1711.09601 (2017)

  3. Aljundi, R., Chakravarty, P., Tuytelaars, T.: Expert gate: lifelong learning with a network of experts. In: CVPR (2017)

  4. Caruana, R.: Multitask learning. In: Thrun, S., Pratt, L. (eds.) Learning to Learn, pp. 95–133. Springer, Heidelberg (1998)

  5. Chen, T., Goodfellow, I., Shlens, J.: Net2Net: accelerating learning via knowledge transfer. In: ICLR (2016)

  6. Deng, J., Dong, W., Socher, R., Li, L.J., Li, K., Fei-Fei, L.: ImageNet: a large-scale hierarchical image database. In: CVPR (2009)

  7. Donahue, J., et al.: DeCAF: a deep convolutional activation feature for generic visual recognition. In: ICML (2014)

  8. Girshick, R., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: CVPR (2014)

  9. Goodfellow, I.J., Mirza, M., Xiao, D., Courville, A., Bengio, Y.: An empirical investigation of catastrophic forgetting in gradient-based neural networks. arXiv preprint arXiv:1312.6211 (2013)

  10. Hinton, G., Vinyals, O., Dean, J.: Distilling the knowledge in a neural network. arXiv preprint arXiv:1503.02531 (2015)

  11. Jia, Y., et al.: Caffe: convolutional architecture for fast feature embedding. arXiv preprint arXiv:1408.5093 (2014)

  12. Jung, H., Ju, J., Jung, M., Kim, J.: Less-forgetful learning for domain expansion in deep neural networks. In: AAAI (2018)

  13. Kirkpatrick, J., et al.: Overcoming catastrophic forgetting in neural networks. Proc. Natl. Acad. Sci. 114(13), 3521–3526 (2017)

  14. Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: NIPS (2012)

  15. Li, Z., Hoiem, D.: Learning without forgetting. IEEE Trans. Pattern Anal. Mach. Intell. (2018)

  16. Maji, S., Kannala, J., Rahtu, E., Blaschko, M., Vedaldi, A.: Fine-grained visual classification of aircraft. Technical report (2013)

  17. Mitchell, T.M.: The need for biases in learning generalizations. Department of Computer Science, Laboratory for Computer Science Research, Rutgers University, New Jersey (1980)

  18. Nilsback, M.E., Zisserman, A.: Automated flower classification over a large number of classes. In: Proceedings of the Indian Conference on Computer Vision, Graphics and Image Processing (2008)

  19. Quattoni, A., Torralba, A.: Recognizing indoor scenes. In: CVPR (2009)

  20. Rannen Ep Triki, A., Aljundi, R., Blaschko, M., Tuytelaars, T.: Encoder based lifelong learning. In: ICCV (2017)

  21. Rebuffi, S.A., Kolesnikov, A., Lampert, C.H.: iCaRL: incremental classifier and representation learning. In: CVPR (2017)

  22. Romero, A., Ballas, N., Kahou, S.E., Chassang, A., Gatta, C., Bengio, Y.: FitNets: hints for thin deep nets. In: ICLR (2015)

  23. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014)

  24. Tzeng, E., Hoffman, J., Darrell, T., Saenko, K.: Simultaneous deep transfer across domains and tasks. In: ICCV (2015)

  25. Vedaldi, A., Lenc, K.: MatConvNet - convolutional neural networks for MATLAB. In: ACM Multimedia (2015)

  26. Wah, C., Branson, S., Welinder, P., Perona, P., Belongie, S.: The Caltech-UCSD birds-200-2011 dataset. Technical report CNS-TR-2011-001, California Institute of Technology (2011)

  27. Zenke, F., Poole, B., Ganguli, S.: Continual learning through synaptic intelligence. In: ICML (2017)



Acknowledgements

This work is partially supported by the NSFC under Grant 61673362, the Youth Innovation Promotion Association CAS, and the Fundamental Research Funds for the Central Universities. It is also partially supported by the Big Data Collaboration Research grant from SenseTime Group (CUHK Agreement No. TS1610626) and the General Research Fund (GRF) of Hong Kong (No. 14236516, 14241716, 14224316, 14209217).

Author information



Corresponding author

Correspondence to Saihui Hou.


Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 272 KB)


Copyright information

© 2018 Springer Nature Switzerland AG

About this paper


Cite this paper

Hou, S., Pan, X., Loy, C.C., Wang, Z., Lin, D. (2018). Lifelong Learning via Progressive Distillation and Retrospection. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds) Computer Vision – ECCV 2018. ECCV 2018. Lecture Notes in Computer Science, vol. 11207. Springer, Cham.



  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-01218-2

  • Online ISBN: 978-3-030-01219-9

  • eBook Packages: Computer Science, Computer Science (R0)