Abstract
Continual learning, which better reflects real-world learning scenarios, has received increasing attention in recent years. To address the catastrophic forgetting problem in continual learning, researchers have proposed a variety of solutions, which can be broadly grouped into three categories: network-structure-based, rehearsal-based, and regularization-based methods. Inspired by pseudo-rehearsal and regularization methods, we propose a novel Continual Learning model based on Knowledge Distillation and Representation Learning (KRCL), which employs a Beta-VAE as a representation learning module to extract a representation shared across the learned tasks. The Beta-VAE also serves as a generative model that produces pseudo-samples of historical tasks; KRCL trains on these pseudo-samples jointly with the data of the current task, and applies knowledge distillation to transfer the dark knowledge of the old-task model, thereby alleviating catastrophic forgetting. We compare KRCL with the Finetune, LWF, IRCL and KRCL_real baseline methods on four benchmark datasets. The results show that KRCL achieves state-of-the-art performance on standard continual learning tasks.
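To make the combined objective described above concrete, the following is a minimal PyTorch-style sketch of one KRCL-style training step, not the authors' implementation. It assumes the model returns a reconstruction, the latent Gaussian parameters, and class logits; the hyperparameters `beta`, `T`, `lambda_kd` and the helper names `beta_vae_loss`, `distillation_loss`, `krcl_step` are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def beta_vae_loss(recon_x, x, mu, log_var, beta=4.0):
    """Beta-VAE objective (Higgins et al.): reconstruction error plus a
    beta-weighted KL term; beta > 1 encourages disentangled latents.
    Assumes inputs and reconstructions are scaled to [0, 1]."""
    recon = F.binary_cross_entropy(recon_x, x, reduction="sum")
    kl = -0.5 * torch.sum(1 + log_var - mu.pow(2) - log_var.exp())
    return recon + beta * kl

def distillation_loss(student_logits, teacher_logits, T=2.0):
    """Soft-target knowledge distillation (Hinton et al.): KL divergence
    between temperature-softened teacher and student distributions."""
    p_teacher = F.softmax(teacher_logits / T, dim=1)
    log_p_student = F.log_softmax(student_logits / T, dim=1)
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * T * T

def krcl_step(model, old_model, x_new, y_new, x_pseudo,
              lambda_kd=1.0, beta=4.0):
    """One hypothetical training step: supervised loss on the current task,
    Beta-VAE loss for the shared representation, and distillation of the
    frozen old model's outputs on generated pseudo-samples."""
    recon, mu, log_var, logits_new = model(x_new)
    loss = F.cross_entropy(logits_new, y_new)
    loss = loss + beta_vae_loss(recon, x_new, mu, log_var, beta)
    # Pseudo-rehearsal: the frozen old-task model provides soft targets.
    with torch.no_grad():
        _, _, _, teacher_logits = old_model(x_pseudo)
    _, _, _, student_logits = model(x_pseudo)
    loss = loss + lambda_kd * distillation_loss(student_logits, teacher_logits)
    return loss
```

In this sketch the pseudo-samples `x_pseudo` would be drawn by decoding latent codes sampled from the Beta-VAE's prior, so no stored exemplars of previous tasks are required.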
References
Aljundi, R., Babiloni, F., Elhoseiny, M., Rohrbach, M., Tuytelaars, T.: Memory aware synapses: learning what (not) to forget. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11207, pp. 144–161. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01219-9_9
Delange, M., et al.: A continual learning survey: defying forgetting in classification tasks. IEEE Trans. Pattern Anal. Mach. Intell. 44(7), 3366–3385 (2021)
Girshick, R., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014)
Higgins, I., et al.: beta-VAE: learning basic visual concepts with a constrained variational framework. In: International Conference on Learning Representations (2017)
Hinton, G., Vinyals, O., Dean, J.: Distilling the knowledge in a neural network. arXiv preprint arXiv:1503.02531 (2015)
Kingma, D.P., Welling, M.: Auto-encoding variational Bayes. arXiv preprint arXiv:1312.6114 (2013)
Kirkpatrick, J., et al.: Overcoming catastrophic forgetting in neural networks. Proc. Natl. Acad. Sci. 114(13), 3521–3526 (2017)
Krizhevsky, A., Hinton, G.: Learning multiple layers of features from tiny images. Technical report, University of Toronto (2009)
LeCun, Y.: The MNIST database of handwritten digits (1998). http://yann.lecun.com/exdb/mnist/
Li, Z., Hoiem, D.: Learning without forgetting. IEEE Trans. Pattern Anal. Mach. Intell. 40(12), 2935–2947 (2017)
Lopez-Paz, D., Ranzato, M.: Gradient episodic memory for continual learning. In: Advances in Neural Information Processing Systems, vol. 30 (2017)
Mallya, A., Lazebnik, S.: PackNet: adding multiple tasks to a single network by iterative pruning. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7765–7773 (2018)
Masana, M., Liu, X., Twardowski, B., Menta, M., Bagdanov, A.D., van de Weijer, J.: Class-incremental learning: survey and performance evaluation on image classification. arXiv preprint arXiv:2010.15277 (2020)
Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011)
Rebuffi, S.A., Kolesnikov, A., Sperl, G., Lampert, C.H.: iCaRL: incremental classifier and representation learning. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2001–2010 (2017)
Rusu, A.A., et al.: Progressive neural networks. arXiv preprint arXiv:1606.04671 (2016)
Sokar, G., Mocanu, D.C., Pechenizkiy, M.: Learning invariant representation for continual learning. arXiv preprint arXiv:2101.06162 (2021)
Thrun, S.: Lifelong learning algorithms. In: Thrun, S., Pratt, L. (eds.) Learning to Learn, pp. 181–209. Springer, Boston (1998). https://doi.org/10.1007/978-1-4615-5529-2_8
Xiao, H., Rasul, K., Vollgraf, R.: Fashion-MNIST: a novel image dataset for benchmarking machine learning algorithms. arXiv preprint arXiv:1708.07747 (2017)
Yoon, J., Yang, E., Lee, J., Hwang, S.J.: Lifelong learning with dynamically expandable networks. arXiv preprint arXiv:1708.01547 (2017)
Acknowledgement
This work was supported by the Science Foundation of China University of Petroleum, Beijing (No. 2462020YXZZ023).
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Chen, Xy., Liu, Jw., Li, Wt. (2022). Continual Learning Based on Knowledge Distillation and Representation Learning. In: Pimenidis, E., Angelov, P., Jayne, C., Papaleonidas, A., Aydin, M. (eds) Artificial Neural Networks and Machine Learning – ICANN 2022. ICANN 2022. Lecture Notes in Computer Science, vol 13532. Springer, Cham. https://doi.org/10.1007/978-3-031-15937-4_3
DOI: https://doi.org/10.1007/978-3-031-15937-4_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-15936-7
Online ISBN: 978-3-031-15937-4
eBook Packages: Computer Science, Computer Science (R0)