Abstract
The human brain is capable of learning tasks sequentially, mostly without forgetting. However, deep neural networks (DNNs) suffer from catastrophic forgetting when learning one task after another. We address this challenge in a class-incremental learning scenario, where the DNN sees test data without knowing the task from which the data originates. During training, Continual Prune-and-Select (CP&S) finds a subnetwork within the DNN that is responsible for solving a given task. Then, during inference, CP&S selects the correct subnetwork to make predictions for that task. A new task is learned by training the available (previously untrained) connections of the DNN and pruning them into a new subnetwork; this subnetwork may include previously trained connections belonging to other subnetwork(s) because shared connections are never updated. This eliminates catastrophic forgetting by creating specialized regions in the DNN that do not conflict with each other while still allowing knowledge transfer across them. We implement CP&S with different subnetwork selection strategies and show superior performance to state-of-the-art continual learning methods on various datasets (CIFAR-100, CUB-200-2011, ImageNet-100 and ImageNet-1000). In particular, CP&S is capable of sequentially learning 10 tasks from ImageNet-1000 while maintaining an accuracy of around 94% with negligible forgetting, a first-of-its-kind result in class-incremental learning. To the best of the authors’ knowledge, this represents an accuracy improvement of more than 10% over the best alternative method.
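For concreteness, the training rule summarized above can be sketched in a few lines of PyTorch. The snippet below is a minimal illustration, not the authors’ released implementation: the `frozen_masks` dictionary, which marks connections already assigned to earlier subnetworks, is an assumed name introduced for exposition. Shared connections participate in the forward pass but their gradients are zeroed, so they are never updated.

```python
import torch
import torch.nn as nn

def train_step(model: nn.Module, frozen_masks: dict, batch, loss_fn, optimizer):
    """One CP&S-style update on a new task (illustrative sketch).

    frozen_masks[name] is a boolean tensor marking the entries of
    parameter `name` that belong to previously trained subnetworks.
    """
    inputs, targets = batch
    optimizer.zero_grad()
    loss = loss_fn(model(inputs), targets)
    loss.backward()
    # Shared connections are read in the forward pass above, but their
    # gradients are zeroed here so they are never updated -- this is
    # what prevents forgetting in the scheme described above.
    for name, param in model.named_parameters():
        if name in frozen_masks and param.grad is not None:
            param.grad[frozen_masks[name]] = 0.0
    optimizer.step()
    return loss.item()
```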
Data Availability
Not applicable
Code Availability
A PyTorch [1] implementation of the code is available at: https://github.com/adekhovich/continual_prune_and_select
References
Paszke A, Gross S, Massa F, Lerer A, Bradbury J, Chanan G, Killeen T, Lin Z, Gimelshein N, Antiga L et al (2019) PyTorch: an imperative style, high-performance deep learning library. In: Advances in neural information processing systems, vol 32
French RM (1999) Catastrophic forgetting in connectionist networks. Trends Cognit Sci 3(4):128–135
Goodfellow IJ, Mirza M, Xiao D, Courville A, Bengio Y (2014) An empirical investigation of catastrophic forgetting in gradient-based neural networks. In: 2nd International conference on learning representations, ICLR
Thrun S (1998) Lifelong learning algorithms. In: Learning to learn. Springer, Boston, MA, pp 181–209
Masana M, Liu X, Twardowski B, Menta M, Bagdanov AD, Van De Weijer J (2020) Class-incremental learning: survey and performance evaluation on image classification. IEEE Trans Pattern Anal Mach Intell
Rebuffi S-A, Kolesnikov A, Sperl G, Lampert CH (2017) iCaRL: incremental classifier and representation learning. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2001–2010
Shmelkov K, Schmid C, Alahari K (2017) Incremental learning of object detectors without catastrophic forgetting. In: Proceedings of the IEEE international conference on computer vision, pp 3400–3409
Zhang J, Zhang J, Ghosh S, Li D, Tasci S, Heck L, Zhang H, Kuo C-CJ (2020) Class-incremental learning via deep model consolidation. In: Proceedings of the IEEE/CVF winter conference on applications of computer vision, pp 1131–1140
Michieli U, Zanuttigh P (2019) Incremental learning techniques for semantic segmentation. In: Proceedings of the IEEE/CVF international conference on computer vision workshops
Yan S, Zhou J, Xie J, Zhang S, He X (2021) An EM framework for online incremental learning of semantic segmentation. In: Proceedings of the 29th ACM international conference on multimedia, pp 3052–3060
Van De Ven GM, Siegelmann HT, Tolias AS (2020) Brain-inspired replay for continual learning with artificial neural networks. Nature Commun 11(1):1–14
Lerner Y, Honey CJ, Silbert LJ, Hasson U (2011) Topographic mapping of a hierarchy of temporal receptive windows using a narrated story. J Neurosci 31(8):2906–2915
Zadbood A, Chen J, Leong YC, Norman KA, Hasson U (2017) How we transmit memories to other brains: constructing shared neural representations via communication. Cerebral Cortex 27(10):4988–5000
Huttenlocher PR (1990) Morphometric study of human cerebral cortex development. Neuropsychologia 28(6):517–527
Lennie P (2003) The cost of cortical computation. Current Biol 13(6):493–497
Attwell D, Laughlin SB (2001) An energy budget for signaling in the grey matter of the brain. J Cerebral Blood Flow Metabolism 21(10):1133–1145
Krizhevsky A, Hinton G et al (2009) Learning multiple layers of features from tiny images
Deng J, Dong W, Socher R, Li L-J, Li K, Fei-Fei L (2009) ImageNet: a large-scale hierarchical image database. In: 2009 IEEE conference on computer vision and pattern recognition. IEEE, pp 248–255
Wah C, Branson S, Welinder P, Perona P, Belongie S (2011) The Caltech-UCSD Birds-200-2011 dataset. Tech. Report CNS-TR-2011-001, California Institute of Technology
Delange M, Aljundi R, Masana M, Parisot S, Jia X, Leonardis A, Slabaugh G, Tuytelaars T (2021) A continual learning survey: defying forgetting in classification tasks. IEEE Trans Pattern Anal Mach Intell
Zenke F, Poole B, Ganguli S (2017) Continual learning through synaptic intelligence. In: International conference on machine learning. PMLR, pp 3987–3995
Li Z, Hoiem D (2017) Learning without forgetting. IEEE Trans Pattern Anal Mach Intell 40(12):2935–2947
Dhar P, Singh RV, Peng K-C, Wu Z, Chellappa R (2019) Learning without memorizing. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 5138–5146
Liu X, Masana M, Herranz L, Van De Weijer J, Lopez AM, Bagdanov AD (2018) Rotate your networks: better weight consolidation and less catastrophic forgetting. In: 2018 24th International conference on pattern recognition (ICPR). IEEE, pp 2262–2268
Kirkpatrick J, Pascanu R, Rabinowitz N, Veness J, Desjardins G, Rusu AA, Milan K, Quan J, Ramalho T, Grabska-Barwinska A et al (2017) Overcoming catastrophic forgetting in neural networks. Proc National Acad Sci 114(13):3521–3526
Aljundi R, Babiloni F, Elhoseiny M, Rohrbach M, Tuytelaars T (2018) Memory aware synapses: learning what (not) to forget. In: Proceedings of the European conference on computer vision (ECCV), pp 139–154
Hou S, Pan X, Loy CC, Wang Z, Lin D (2019) Learning a unified classifier incrementally via rebalancing. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 831–839
Belouadah E, Popescu A (2019) IL2M: class incremental learning with dual memory. In: Proceedings of the IEEE/CVF international conference on computer vision, pp 583–592
Douillard A, Cord M, Ollion C, Robert T, Valle E (2020) PODNet: pooled outputs distillation for small-tasks incremental learning. In: Computer vision–ECCV 2020: 16th European conference, Glasgow, UK, 23–28 Aug 2020, proceedings, Part XX 16. Springer, pp 86–102
Yoon J, Yang E, Lee J, Hwang SJ (2018) Lifelong learning with dynamically expandable networks. In: 6th International conference on learning representations, ICLR
Wortsman M, Ramanujan V, Liu R, Kembhavi A, Rastegari M, Yosinski J, Farhadi A (2020) Supermasks in superposition. In: Advances in neural information processing systems
Sokar G, Mocanu DC, Pechenizkiy M (2021) SpaceNet: make free space for continual learning. Neurocomputing 439:1–11
Chaudhry A, Dokania PK, Ajanthan T, Torr PH (2018) Riemannian walk for incremental learning: understanding forgetting and intransigence. In: Proceedings of the European conference on computer vision (ECCV), pp 532–547
Shin H, Lee JK, Kim J, Kim J (2017) Continual learning with deep generative replay. In: Advances in neural information processing systems
Mensink T, Verbeek J, Perronnin F, Csurka G (2012) Metric learning for large scale image classification: generalizing to new classes at near-zero cost. In: European conference on computer vision. Springer, pp 488–501
Castro FM, Marín-Jiménez MJ, Guil N, Schmid C, Alahari K (2018) End-to-end incremental learning. In: Proceedings of the European conference on computer vision (ECCV), pp 233–248
Wu Y, Chen Y, Wang L, Ye Y, Liu Z, Guo Y, Fu Y (2019) Large scale incremental learning. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 374–382
Kang M, Park J, Han B (2022) Class-incremental learning by knowledge distillation with adaptive feature consolidation. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 16071–16080
Rajasegaran J, Hayat M, Khan S, Khan FS, Shao L (2019) Random path selection for incremental learning. In: Advances in neural information processing systems
Yan S, Xie J, He X (2021) DER: dynamically expandable representation for class incremental learning. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 3014–3023
Wang FL, Zhou D-W, Ye H-J, Zhan D-C (2022) FOSTER: feature boosting and compression for class-incremental learning. In: European conference on computer vision
Rajasegaran J, Khan S, Hayat M, Khan FS, Shah M (2020) iTAML: an incremental task-agnostic meta-learning approach. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 13588–13597
Han S, Pool J, Tran J, Dally WJ (2015) Learning both weights and connections for efficient neural networks. In: Advances in neural information processing systems, pp 1135–1143
Li H, Kadav A, Durdanovic I, Samet H, Graf HP (2017) Pruning filters for efficient convnets. In: 5th international conference on learning representations, ICLR
Frankle J, Carbin M (2019) The lottery ticket hypothesis: finding sparse, trainable neural networks. In: 7th International conference on learning representations, ICLR
Hu H, Peng R, Tai Y-W, Tang C-K (2016) Network trimming: a data-driven neuron pruning approach towards efficient deep architectures. arXiv:1607.03250
Huang Z, Wang N (2018) Data-driven sparse structure selection for deep neural networks. In: Proceedings of the European conference on computer vision (ECCV), pp 304–320
Luo J-H, Wu J, Lin W (2017) ThiNet: a filter level pruning method for deep neural network compression. In: Proceedings of the IEEE international conference on computer vision, pp 5058–5066
Dekhovich A, Tax DM, Sluiter MH, Bessa MA (2021) Neural network relief: a pruning algorithm based on neural activity. arXiv:2109.10795
LeCun Y, Denker JS, Solla SA (1990) Optimal brain damage. In: Advances in neural information processing systems, pp 598–605
Hassibi B, Stork DG, Wolff GJ (1993) Optimal brain surgeon and general network pruning. In: IEEE international conference on neural networks. IEEE, pp 293–299
Hassibi B, Stork DG (1993) Second order derivatives for network pruning: optimal brain surgeon. In: Advances in neural information processing systems, pp 164–171
Lebedev V, Lempitsky V (2016) Fast convnets using group-wise brain damage. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2554–2564
Mallya A, Lazebnik S (2018) PackNet: adding multiple tasks to a single network by iterative pruning. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 7765–7773
Golkar S, Kagan M, Cho K (2019) Continual learning via neural pruning. In: NeurIPS workshop on real neurons & hidden units
Mallya A, Davis D, Lazebnik S (2018) Piggyback: adapting a single network to multiple tasks by learning to mask weights. In: Proceedings of the European conference on computer vision (ECCV), pp 67–82
Hung C-Y, Tu C-H, Wu C-E, Chen C-H, Chan Y-M, Chen C-S (2019) Compacting, picking and growing for unforgetting continual learning. In: Advances in neural information processing systems, vol 32
Rusu AA, Rabinowitz NC, Desjardins G, Soyer H, Kirkpatrick J, Kavukcuoglu K, Pascanu R, Hadsell R (2016) Progressive neural networks. arXiv:1606.04671
Kim ES, Kim JU, Lee S, Moon S-K, Ro YM (2020) Class incremental learning with task-selection. In: 2020 IEEE international conference on image processing (ICIP). IEEE, pp 1846–1850
Lopez-Paz D, Ranzato M (2017) Gradient episodic memory for continual learning. In: Advances in neural information processing systems, vol 30, pp 6467–6476
Kingma DP, Ba J (2015) Adam: a method for stochastic optimization. In: 3rd International conference on learning representations, ICLR
Liu L, Jiang H, He P, Chen W, Liu X, Gao J, Han J (2020) On the variance of the adaptive learning rate and beyond. In: 8th International conference on learning representations, ICLR
Acknowledgements
The authors would like to thank SURFsara for providing access to the Snellius HPC cluster. A preprint version of this work is available on arXiv under the CC BY license: Dekhovich, A., Tax, D. M., Sluiter, M. H., & Bessa, M. A. Continual Prune-and-Select: Class-incremental learning with specialized subnetworks. arXiv preprint arXiv:2208.04952 (2022).
Funding
Not applicable
Author information
Authors and Affiliations
Contributions
Not applicable
Corresponding author
Ethics declarations
Conflict of Interests
The authors have no conflicts of interest to declare.
Ethics approval
Not applicable
Consent to participate
Not applicable
Consent for Publication
Not applicable
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendices
Appendix A: Additional information on CIFAR-100 experiments
Task-selection
We present CP&S results with different test batch sizes and task-selection strategies in Fig. 9.
We also provide an additional comparison between the maxoutput and importance scores (IS) selection strategies in Figs. 10 and 11. In both cases, we observe an advantage of IS over the maxoutput strategy when tasks are imbalanced.
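For concreteness, the maxoutput rule can be sketched as follows. The function below is a hypothetical illustration (names such as `subnetworks` are assumptions, not the released API): the test batch is routed through every task’s subnetwork, and the task whose classifier responds most strongly is selected.

```python
import torch

@torch.no_grad()
def select_task_maxoutput(subnetworks, batch):
    """Pick the task whose subnetwork produces the strongest maximal
    logits, summed over the test batch (illustrative sketch)."""
    scores = []
    for net in subnetworks:       # one pruned subnetwork per task
        logits = net(batch)       # shape: (batch_size, classes_in_task)
        scores.append(logits.max(dim=1).values.sum().item())
    return max(range(len(scores)), key=scores.__getitem__)
```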
Training hyperparameters
In Table 4, we show the hyperparameters used for the experiments on CIFAR-100 in Section 4. For iTAML, all parameters are taken from the original work and the results were reproduced using the official GitHub repository. The memory buffer contains 2000 training samples to mitigate forgetting. For CP&S, we used 3 pruning iterations, 1000 training samples per task to estimate importance scores in NNrelief, and αconv = 0.9. For retraining (after the pruning step), we use 40 epochs with a learning rate (LR) of 0.01, multiplied by 0.2 at epochs 15, 25 and 40.
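This retraining schedule maps naturally onto PyTorch’s `MultiStepLR`. The snippet below is a sketch of how such a schedule could be configured; the model and training loop are placeholders, not a fragment of the released code.

```python
import torch

model = torch.nn.Linear(8, 2)  # placeholder for the pruned subnetwork
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
# Decay the LR by a factor of 0.2 after epochs 15, 25 and 40.
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[15, 25, 40], gamma=0.2)

for epoch in range(40):
    # ... one pass over the task's training data would go here ...
    optimizer.step()   # placeholder parameter update
    scheduler.step()
```

The schedule for the Section 5 experiments (next paragraph) follows the same pattern, with an initial LR of 0.001, a decay factor of 0.1 and milestones at epochs 20 and 40.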
In Table 5, we present the training hyperparameters for the experiments in Section 5. To reproduce the results, we use the PODNet and AFC GitHub repositories with the hyperparameters from the original works. All the previous works use a fixed-size memory buffer of 2000 training samples to mitigate forgetting. For CP&S, we used 1 pruning iteration, 1000 training samples per task to estimate importance scores in NNrelief, and αconv = 0.9. For retraining (after the pruning step), we use 50 epochs with an LR of 0.001, multiplied by 0.1 at epochs 20 and 40.
Appendix B: ImageNet-100/1000 results
For ImageNet-100/1000, Tables 6 and 7 present the exact numbers from which the CP&S plots are constructed.
Appendix C: CUB-200-2011 additional comparison
In this section, we provide an additional comparison for ResNet-18 on the CUB-200-2011 dataset, using 5 test images per batch to predict the task-ID (Fig. 12).
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Dekhovich, A., Tax, D.M., Sluiter, M.H. et al. Continual prune-and-select: class-incremental learning with specialized subnetworks. Appl Intell 53, 17849–17864 (2023). https://doi.org/10.1007/s10489-022-04441-z