Abstract
The human brain is capable of learning tasks sequentially, mostly without forgetting. However, deep neural networks (DNNs) suffer from catastrophic forgetting when learning one task after another. We address this challenge in a class-incremental learning scenario, where the DNN sees test data without knowing the task from which the data originates. During training, Continual Prune-and-Select (CP&S) finds a subnetwork within the DNN that is responsible for solving a given task. Then, during inference, CP&S selects the correct subnetwork to make predictions for that task. A new task is learned by training the available (previously untrained) connections of the DNN and pruning them into a new subnetwork; this subnetwork may include previously trained connections belonging to other subnetwork(s) because shared connections are never updated. This eliminates catastrophic forgetting by creating specialized regions in the DNN that do not conflict with each other while still allowing knowledge transfer across them. We implement CP&S with different subnetwork selection strategies and show superior performance to state-of-the-art continual learning methods on various datasets (CIFAR-100, CUB-200-2011, ImageNet-100 and ImageNet-1000). In particular, CP&S is capable of sequentially learning 10 tasks from ImageNet-1000 while maintaining an accuracy of around 94% with negligible forgetting, a first-of-its-kind result in class-incremental learning. To the best of the authors’ knowledge, this represents an accuracy improvement of more than 10% over the best alternative method.
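For concreteness, the training rule summarized above can be sketched in a few lines of PyTorch. The snippet below is a minimal illustration, not the authors’ released implementation: the `frozen_masks` dictionary, which marks connections already assigned to earlier subnetworks, is an assumed name introduced for exposition. Shared connections participate in the forward pass but their gradients are zeroed, so they are never updated.

```python
import torch
import torch.nn as nn

def train_step(model: nn.Module, frozen_masks: dict, batch, loss_fn, optimizer):
    """One CP&S-style update on a new task (illustrative sketch).

    frozen_masks[name] is a boolean tensor marking the entries of
    parameter `name` that belong to previously trained subnetworks.
    """
    inputs, targets = batch
    optimizer.zero_grad()
    loss = loss_fn(model(inputs), targets)
    loss.backward()
    # Shared connections are read in the forward pass above, but their
    # gradients are zeroed here so they are never updated -- this is
    # what prevents forgetting in the scheme described above.
    for name, param in model.named_parameters():
        if name in frozen_masks and param.grad is not None:
            param.grad[frozen_masks[name]] = 0.0
    optimizer.step()
    return loss.item()
```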
Data Availability
Not applicable
Code Availability
A PyTorch [1] implementation of the code is available at: https://github.com/adekhovich/continual_prune_and_select
References
Paszke A, Gross S, Massa F, Lerer A, Bradbury J, Chanan G, Killeen T, Lin Z, Gimelshein N, Antiga L et al (2019) PyTorch: an imperative style, high-performance deep learning library. In: Advances in neural information processing systems, vol 32
French RM (1999) Catastrophic forgetting in connectionist networks. Trends Cognit Sci 3(4):128–135
Goodfellow IJ, Mirza M, Xiao D, Courville A, Bengio Y (2014) An empirical investigation of catastrophic forgetting in gradient-based neural networks. In: 2nd International conference on learning representations, ICLR
Thrun S (1998) Lifelong learning algorithms. In: Learning to learn. Springer, Boston, MA, pp 181–209
Masana M, Liu X, Twardowski B, Menta M, Bagdanov AD, Van De Weijer J (2020) Class-incremental learning: survey and performance evaluation on image classification. IEEE Trans Pattern Anal Mach Intell
Rebuffi S-A, Kolesnikov A, Sperl G, Lampert CH (2017) iCaRL: incremental classifier and representation learning. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2001–2010
Shmelkov K, Schmid C, Alahari K (2017) Incremental learning of object detectors without catastrophic forgetting. In: Proceedings of the IEEE international conference on computer vision, pp 3400–3409
Zhang J, Zhang J, Ghosh S, Li D, Tasci S, Heck L, Zhang H, Kuo C-CJ (2020) Class-incremental learning via deep model consolidation. In: Proceedings of the IEEE/CVF winter conference on applications of computer vision, pp 1131–1140
Michieli U, Zanuttigh P (2019) Incremental learning techniques for semantic segmentation. In: Proceedings of the IEEE/CVF international conference on computer vision workshops
Yan S, Zhou J, Xie J, Zhang S, He X (2021) An EM framework for online incremental learning of semantic segmentation. In: Proceedings of the 29th ACM international conference on multimedia, pp 3052–3060
Van De Ven GM, Siegelmann HT, Tolias AS (2020) Brain-inspired replay for continual learning with artificial neural networks. Nature Commun 11(1):1–14
Lerner Y, Honey CJ, Silbert LJ, Hasson U (2011) Topographic mapping of a hierarchy of temporal receptive windows using a narrated story. J Neurosci 31(8):2906–2915
Zadbood A, Chen J, Leong YC, Norman KA, Hasson U (2017) How we transmit memories to other brains: constructing shared neural representations via communication. Cerebral Cortex 27(10):4988–5000
Huttenlocher PR (1990) Morphometric study of human cerebral cortex development. Neuropsychologia 28(6):517–527
Lennie P (2003) The cost of cortical computation. Current Biol 13(6):493–497
Attwell D, Laughlin SB (2001) An energy budget for signaling in the grey matter of the brain. J Cerebral Blood Flow Metabolism 21(10):1133–1145
Krizhevsky A, Hinton G et al (2009) Learning multiple layers of features from tiny images
Deng J, Dong W, Socher R, Li L-J, Li K, Fei-Fei L (2009) ImageNet: a large-scale hierarchical image database. In: 2009 IEEE conference on computer vision and pattern recognition. IEEE, pp 248–255
Wah C, Branson S, Welinder P, Perona P, Belongie S (2011) The Caltech-UCSD Birds-200-2011 dataset. Tech. Report CNS-TR-2011-001, California Institute of Technology
Delange M, Aljundi R, Masana M, Parisot S, Jia X, Leonardis A, Slabaugh G, Tuytelaars T (2021) A continual learning survey: defying forgetting in classification tasks. IEEE Trans Pattern Anal Mach Intell
Zenke F, Poole B, Ganguli S (2017) Continual learning through synaptic intelligence. In: International conference on machine learning. PMLR, pp 3987–3995
Li Z, Hoiem D (2017) Learning without forgetting. IEEE Trans Pattern Anal Mach Intell 40(12):2935–2947
Dhar P, Singh RV, Peng K-C, Wu Z, Chellappa R (2019) Learning without memorizing. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 5138–5146
Liu X, Masana M, Herranz L, Van De Weijer J, Lopez AM, Bagdanov AD (2018) Rotate your networks: better weight consolidation and less catastrophic forgetting. In: 2018 24th International conference on pattern recognition (ICPR). IEEE, pp 2262–2268
Kirkpatrick J, Pascanu R, Rabinowitz N, Veness J, Desjardins G, Rusu AA, Milan K, Quan J, Ramalho T, Grabska-Barwinska A et al (2017) Overcoming catastrophic forgetting in neural networks. Proc National Acad Sci 114(13):3521–3526
Aljundi R, Babiloni F, Elhoseiny M, Rohrbach M, Tuytelaars T (2018) Memory aware synapses: learning what (not) to forget. In: Proceedings of the European conference on computer vision (ECCV), pp 139–154
Hou S, Pan X, Loy CC, Wang Z, Lin D (2019) Learning a unified classifier incrementally via rebalancing. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 831–839
Belouadah E, Popescu A (2019) IL2M: class incremental learning with dual memory. In: Proceedings of the IEEE/CVF international conference on computer vision, pp 583–592
Douillard A, Cord M, Ollion C, Robert T, Valle E (2020) PODNet: pooled outputs distillation for small-tasks incremental learning. In: Computer vision–ECCV 2020: 16th European conference, Glasgow, UK, 23–28 Aug 2020, proceedings, Part XX 16. Springer, pp 86–102
Yoon J, Yang E, Lee J, Hwang SJ (2018) Lifelong learning with dynamically expandable networks. In: 6th International conference on learning representations, ICLR
Wortsman M, Ramanujan V, Liu R, Kembhavi A, Rastegari M, Yosinski J, Farhadi A (2020) Supermasks in superposition. In: Advances in neural information processing systems
Sokar G, Mocanu DC, Pechenizkiy M (2021) SpaceNet: make free space for continual learning. Neurocomputing 439:1–11
Chaudhry A, Dokania PK, Ajanthan T, Torr PH (2018) Riemannian walk for incremental learning: understanding forgetting and intransigence. In: Proceedings of the European conference on computer vision (ECCV), pp 532–547
Shin H, Lee JK, Kim J, Kim J (2017) Continual learning with deep generative replay. In: Advances in neural information processing systems
Mensink T, Verbeek J, Perronnin F, Csurka G (2012) Metric learning for large scale image classification: generalizing to new classes at near-zero cost. In: European conference on computer vision. Springer, pp 488–501
Castro FM, Marín-Jiménez MJ, Guil N, Schmid C, Alahari K (2018) End-to-end incremental learning. In: Proceedings of the European conference on computer vision (ECCV), pp 233–248
Wu Y, Chen Y, Wang L, Ye Y, Liu Z, Guo Y, Fu Y (2019) Large scale incremental learning. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 374–382
Kang M, Park J, Han B (2022) Class-incremental learning by knowledge distillation with adaptive feature consolidation. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 16071–16080
Rajasegaran J, Hayat M, Khan S, Khan FS, Shao L (2019) Random path selection for incremental learning. In: Advances in neural information processing systems
Yan S, Xie J, He X (2021) DER: dynamically expandable representation for class incremental learning. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 3014–3023
Wang FL, Zhou D-W, Ye H-J, Zhan D-C (2022) FOSTER: feature boosting and compression for class-incremental learning. In: European conference on computer vision
Rajasegaran J, Khan S, Hayat M, Khan FS, Shah M (2020) iTAML: an incremental task-agnostic meta-learning approach. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 13588–13597
Han S, Pool J, Tran J, Dally WJ (2015) Learning both weights and connections for efficient neural networks. In: Advances in neural information processing systems, pp 1135–1143
Li H, Kadav A, Durdanovic I, Samet H, Graf HP (2017) Pruning filters for efficient convnets. In: 5th international conference on learning representations, ICLR
Frankle J, Carbin M (2019) The lottery ticket hypothesis: finding sparse, trainable neural networks. In: 7th International conference on learning representations, ICLR
Hu H, Peng R, Tai Y-W, Tang C-K (2016) Network trimming: a data-driven neuron pruning approach towards efficient deep architectures. arXiv:1607.03250
Huang Z, Wang N (2018) Data-driven sparse structure selection for deep neural networks. In: Proceedings of the European conference on computer vision (ECCV), pp 304–320
Luo J-H, Wu J, Lin W (2017) ThiNet: a filter level pruning method for deep neural network compression. In: Proceedings of the IEEE international conference on computer vision, pp 5058–5066
Dekhovich A, Tax DM, Sluiter MH, Bessa MA (2021) Neural network relief: a pruning algorithm based on neural activity. arXiv:2109.10795
LeCun Y, Denker JS, Solla SA (1990) Optimal brain damage. In: Advances in neural information processing systems, pp 598–605
Hassibi B, Stork DG, Wolff GJ (1993) Optimal brain surgeon and general network pruning. In: IEEE international conference on neural networks. IEEE, pp 293–299
Hassibi B, Stork DG (1993) Second order derivatives for network pruning: optimal brain surgeon. In: Advances in neural information processing systems, pp 164–171
Lebedev V, Lempitsky V (2016) Fast convnets using group-wise brain damage. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2554–2564
Mallya A, Lazebnik S (2018) PackNet: adding multiple tasks to a single network by iterative pruning. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 7765–7773
Golkar S, Kagan M, Cho K (2019) Continual learning via neural pruning. In: NeurIPS workshop on real neurons & hidden units
Mallya A, Davis D, Lazebnik S (2018) Piggyback: adapting a single network to multiple tasks by learning to mask weights. In: Proceedings of the European conference on computer vision (ECCV), pp 67–82
Hung C-Y, Tu C-H, Wu C-E, Chen C-H, Chan Y-M, Chen C-S (2019) Compacting, picking and growing for unforgetting continual learning. In: Advances in neural information processing systems, vol 32
Rusu AA, Rabinowitz NC, Desjardins G, Soyer H, Kirkpatrick J, Kavukcuoglu K, Pascanu R, Hadsell R (2016) Progressive neural networks. arXiv:1606.04671
Kim ES, Kim JU, Lee S, Moon S-K, Ro YM (2020) Class incremental learning with task-selection. In: 2020 IEEE international conference on image processing (ICIP). IEEE, pp 1846–1850
Lopez-Paz D, Ranzato M (2017) Gradient episodic memory for continual learning. In: Advances in neural information processing systems, vol 30, pp 6467–6476
Kingma DP, Ba J (2015) Adam: a method for stochastic optimization. In: 3rd International conference on learning representations, ICLR
Liu L, Jiang H, He P, Chen W, Liu X, Gao J, Han J (2020) On the variance of the adaptive learning rate and beyond. In: 8th International conference on learning representations, ICLR
Acknowledgements
The authors would like to thank SURFsara for providing access to the Snellius HPC cluster. A preprint version of this work is available on arXiv under the CC BY license: Dekhovich, A., Tax, D. M., Sluiter, M. H., & Bessa, M. A. Continual Prune-and-Select: Class-incremental learning with specialized subnetworks. arXiv preprint arXiv:2208.04952 (2022).
Funding
Not applicable
Author information
Authors and Affiliations
Contributions
Not applicable
Corresponding author
Ethics declarations
Conflict of Interests
The authors have no conflicts of interest to declare.
Ethics approval
Not applicable
Consent to participate
Not applicable
Consent for Publication
Not applicable
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendices
Appendix A: Additional information on CIFAR-100 experiments
Task-selection
We present CP&S results with different test batch sizes and task-selection strategies in Fig. 9.
We also provide an additional comparison between the maxoutput and importance scores (IS) selection strategies in Figs. 10 and 11. In both cases, we observe an advantage of IS over the maxoutput strategy when tasks are imbalanced.
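For concreteness, the maxoutput rule can be sketched as follows. The function below is a hypothetical illustration (names such as `subnetworks` are assumptions, not the released API): the test batch is routed through every task’s subnetwork, and the task whose classifier responds most strongly is selected.

```python
import torch

@torch.no_grad()
def select_task_maxoutput(subnetworks, batch):
    """Pick the task whose subnetwork produces the strongest maximal
    logits, summed over the test batch (illustrative sketch)."""
    scores = []
    for net in subnetworks:       # one pruned subnetwork per task
        logits = net(batch)       # shape: (batch_size, classes_in_task)
        scores.append(logits.max(dim=1).values.sum().item())
    return max(range(len(scores)), key=scores.__getitem__)
```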
Training hyperparameters
In Table 4, we show the hyperparameters used for the experiments on CIFAR-100 in Section 4. For iTAML, all parameters are taken from the original work and the results were reproduced using the official GitHub repository. The memory buffer contains 2000 training samples to mitigate forgetting. For CP&S, we used 3 pruning iterations, 1000 training samples per task to estimate importance scores in NNrelief, and αconv = 0.9. For retraining (after the pruning step), we use 40 epochs with a learning rate (LR) of 0.01, multiplied by 0.2 at epochs 15, 25 and 40.
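This retraining schedule maps naturally onto PyTorch’s `MultiStepLR`. The snippet below is a sketch of how such a schedule could be configured; the model and training loop are placeholders, not a fragment of the released code.

```python
import torch

model = torch.nn.Linear(8, 2)  # placeholder for the pruned subnetwork
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
# Decay the LR by a factor of 0.2 after epochs 15, 25 and 40.
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[15, 25, 40], gamma=0.2)

for epoch in range(40):
    # ... one pass over the task's training data would go here ...
    optimizer.step()   # placeholder parameter update
    scheduler.step()
```

The schedule for the Section 5 experiments (next paragraph) follows the same pattern, with an initial LR of 0.001, a decay factor of 0.1 and milestones at epochs 20 and 40.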
In Table 5, we present the training hyperparameters for the experiments in Section 5. To reproduce the results, we use the PODNet and AFC GitHub repositories with the hyperparameters from the original works. All the previous works use a fixed-size memory buffer of 2000 training samples to mitigate forgetting. For CP&S, we used 1 pruning iteration, 1000 training samples per task to estimate importance scores in NNrelief, and αconv = 0.9. For retraining (after the pruning step), we use 50 epochs with an LR of 0.001, multiplied by 0.1 at epochs 20 and 40.
Appendix B: ImageNet-100/1000 results
For ImageNet-100/1000, Tables 6 and 7 present the exact numbers from which the CP&S plots are constructed.
Appendix C: CUB-200-2011 additional comparison
In this section, we provide an additional comparison for ResNet-18 on the CUB-200-2011 dataset, using 5 test images per batch to predict the task-ID (Fig. 12).
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Dekhovich, A., Tax, D.M., Sluiter, M.H. et al. Continual prune-and-select: class-incremental learning with specialized subnetworks. Appl Intell 53, 17849–17864 (2023). https://doi.org/10.1007/s10489-022-04441-z