Abstract
Continual learning models are known to suffer from catastrophic forgetting. Existing regularization methods for countering forgetting operate by penalizing large changes to learned parameters. A significant downside to these methods, however, is that, by effectively freezing model parameters, they gradually suspend the model's capacity to learn new tasks. In this paper, we explore an alternative approach to the continual learning problem that aims to circumvent this downside. In particular, we ask: instead of forcing continual learning models to remember the past, can we modify the learning process from the start, such that the learned representations are less susceptible to forgetting? To this end, we explore multiple methods that could potentially encourage durable representations. We demonstrate empirically that the use of unsupervised auxiliary tasks achieves a significant reduction in parameter re-optimization across tasks, and consequently reduces forgetting, without explicitly penalizing it. Moreover, we propose a distance metric to track internal model dynamics across tasks, and use it to gain insight into the workings of our proposed approach, as well as other recently proposed methods.
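The notes below mention that the drift measure is based on a simplified KL divergence (section 4.1). As a rough illustration only, not the paper's exact procedure, one could track how much a model's output distributions on a fixed probe set move after training on a new task; the probe data and averaging scheme here are assumptions:

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between two discrete probability distributions."""
    p = np.clip(p, eps, 1.0)
    q = np.clip(q, eps, 1.0)
    return float(np.sum(p * np.log(p / q)))

def output_drift(probs_before, probs_after):
    """Mean KL divergence between a model's softmax outputs on a fixed
    probe set, recorded before and after training on a new task.
    Larger values indicate the internal representation moved more."""
    return float(np.mean([kl_divergence(p, q)
                          for p, q in zip(probs_before, probs_after)]))

# Toy example: softmax outputs for two probe samples, before and
# after training on a new task (made-up numbers).
before = [np.array([0.7, 0.2, 0.1]), np.array([0.1, 0.8, 0.1])]
after  = [np.array([0.6, 0.3, 0.1]), np.array([0.1, 0.7, 0.2])]
drift = output_drift(before, after)
```

A per-task variant (one such measure per previous task) is what the notes suggest as future work.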
Notes
In this work, we consider image classification tasks.
Note the change in notation here: we used \(\hat{\mathbf{y}}\) earlier to denote the probability distribution. In this case, \(\mathbf{y}\) denotes the class variable.
A trivial case of this is a single-layer network whose parameter vector is perpendicular to all input samples \(\mathbf{x}\). One can scale the parameter vector without affecting \(\theta^\top \mathbf{x}\).
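This invariance is easy to check numerically; a minimal sketch with a made-up two-sample input matrix:

```python
import numpy as np

# Two input samples in R^3, stacked as rows.
X = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])

# A parameter vector perpendicular to both samples.
theta = np.array([1.0, 1.0, -1.0])

# theta^T x is zero for every sample...
print(X @ theta)            # [0. 0.]

# ...and stays unchanged under arbitrary scaling of theta.
print(X @ (5.0 * theta))    # [0. 0.]
```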
Note that this corresponds to what is known in the literature as the “multi-head” setting.
We conjecture that our use of the simplified KL divergence measure described in section 4.1 may be obscuring some of the details of the behavior of LwF. We intend to explore this issue further in future work, using a per-task KL divergence measure.
Allowing Aux-1 to train for more epochs per task reduces its intransigence values. However, for a fair comparison, and due to limited computational resources, we limit all experiments to 400 iterations per task.
References
Aljundi R, Belilovsky E, Tuytelaars T, Charlin L, Caccia M, Lin M, Page-Caccia L (2019) Online continual learning with maximal interfered retrieval. In: Wallach H, Larochelle H, Beygelzimer A, d'Alché-Buc F, Fox E, Garnett R (eds) Advances in neural information processing systems, vol 32. Curran Associates, Inc., pp 11849–11860
Chaudhry A, Dokania PK, Ajanthan T, Torr PHS (2018) Riemannian walk for incremental learning: understanding forgetting and intransigence. In: Proceedings of European conference on computer vision (ECCV), pp 556–572
Clanuwat T, Bober-Irizar M, Kitamoto A, Lamb A, Yamamoto K, Ha D (2018) Deep learning for classical Japanese literature. arXiv:1812.01718
Devlin J, Chang M-W, Lee K, Toutanova K (2018) BERT: pre-training of deep bidirectional transformers for language understanding. arXiv:1810.04805
El Khatib A, Karray F (2019) Preempting catastrophic forgetting in continual learning models by anticipatory regularization. In: 2019 International joint conference on neural networks (IJCNN), pp 1–7
French RM, Chater N (2002) Using noise to compute error surfaces in connectionist networks: a novel means of reducing catastrophic forgetting. Neural Comput 14:1–15
Goodfellow I, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S, Courville A, Bengio Y (2014) Generative adversarial nets. In: Ghahramani Z, Welling M, Cortes C, Lawrence ND, Weinberger KQ (eds) Advances in neural information processing systems, vol 27. Curran Associates, Inc., pp 2672–2680
Jo J, Bengio Y (2017) Measuring the tendency of CNNs to learn surface statistical regularities. arXiv:1711.11561
Kirkpatrick J, Pascanu R, Rabinowitz N, Veness J, Desjardins G, Rusu AA, Milan K, Quan J, Ramalho T, Grabska-Barwinska A, Hassabis D, Clopath C, Kumaran D, Hadsell R (2017) Overcoming catastrophic forgetting in neural networks. Proc Natl Acad Sci 114(13):3521–3526
Krizhevsky A (2009) Learning multiple layers of features from tiny images. Technical report, University of Toronto
LeCun Y, Bottou L, Bengio Y, Haffner P (1998) Gradient-based learning applied to document recognition. Proc IEEE 86(11):2278–2324
Li Z, Hoiem D (2018) Learning without forgetting. IEEE Trans Pattern Anal Mach Intell 40(12):2935–2947
Netzer Y, Wang T, Coates A, Bissacco A, Wu B, Ng AY (2011) Reading digits in natural images with unsupervised feature learning. In: NIPS workshop on deep learning and unsupervised feature learning
Ratcliff R (1990) Connectionist models of recognition memory: constraints imposed by learning and forgetting functions. Psychol Rev 97(2):285–308
Robins A (1993) Catastrophic forgetting in neural networks: the role of rehearsal mechanisms. In: Proceedings of the first New Zealand international two-stream conference on artificial neural networks and expert systems, pp 65–68. https://doi.org/10.1109/ANNES.1993.323080
Rusu AA, Rabinowitz NC, Desjardins G, Soyer H, Kirkpatrick J, Kavukcuoglu K, Pascanu R, Hadsell R (2016) Progressive neural networks. arXiv:1606.04671
Shin H, Lee JK, Kim J, Kim J (2017) Continual learning with deep generative replay. In: Guyon I, Luxburg UV, Bengio S, Wallach H, Fergus R, Vishwanathan S, Garnett R (eds) Advances in neural information processing systems, vol 30. Curran Associates, Inc., pp 2990–2999
Terekhov AV, Montone G, O’Regan JK (2015) Knowledge transfer in deep block-modular neural networks. In: Wilson SP, Verschure PFMJ, Mura A, Prescott TJ (eds) Biomimetic and biohybrid systems. Springer International Publishing, pp 268–279
Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser Ł, Polosukhin I (2017) Attention is all you need. In: Advances in neural information processing systems, pp 5998–6008
Zenke F, Poole B, Ganguli S (2017) Continual learning through synaptic intelligence. arXiv:1703.04200
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Additional information
This work was supported in part by Electronics and Telecommunications Research Institute (ETRI) grant funded by the Korean government [20ZS1100, Core Technology Research for Self-Improving Integrated Artificial Intelligence System].
Cite this article
El Khatib, A., Karray, F. Toward durable representations for continual learning. Adv. in Comp. Int. 2, 7 (2022). https://doi.org/10.1007/s43674-021-00022-8