Abstract
Deep learning technology plays an important role in everyday life. Because deep learning relies on neural network models, it is still plagued by the catastrophic forgetting problem: a neural network model forgets previously learned knowledge after learning new knowledge. A neural network model learns from labeled samples and stores its knowledge in its parameters, so many existing methods address catastrophic forgetting by constraining parameters or storing samples; few address it by constraining the features output by the model. This paper proposes an incremental learning method with super constraints on model parameters. The method computes not only a parameter similarity loss between the old and new models but also a layer-output feature similarity loss, thereby suppressing changes in model parameters from two directions. In addition, we propose a new strategy for selecting representative samples from the dataset and for tackling the imbalance between stored samples and new-task samples. Finally, we utilize neural kernel mapping support vector machine theory to increase the interpretability of the model. To better reflect practical situations, five sample sets with different categories and sizes were employed in the experiments. The experiments demonstrate the effectiveness of our method: for example, after learning the last task, our method is at least 1.930% and 0.562% higher than the other methods on the training set and test set, respectively.
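The two-direction constraint described above can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: the function name `super_constraint_loss` and the weights `lam_p` and `lam_f` are illustrative assumptions, and the terms are plain squared-difference penalties between the old model's and new model's parameters and layer outputs.

```python
import numpy as np

def super_constraint_loss(old_params, new_params, old_feats, new_feats,
                          lam_p=1.0, lam_f=1.0):
    """Hedged sketch of a dual constraint: parameter similarity plus
    layer-output feature similarity between the old and new models.

    old_params / new_params: lists of parameter arrays, one per layer.
    old_feats / new_feats: lists of layer-output arrays on the same input.
    lam_p / lam_f: illustrative weights, not values from the paper.
    """
    # Parameter similarity term: penalize drift of new parameters from old ones.
    p_loss = sum(float(np.sum((pn - po) ** 2))
                 for po, pn in zip(old_params, new_params))
    # Feature similarity term: penalize drift of the layer outputs themselves,
    # constraining the parameters from a second direction.
    f_loss = sum(float(np.sum((fn - fo) ** 2))
                 for fo, fn in zip(old_feats, new_feats))
    return lam_p * p_loss + lam_f * f_loss
```

When the new model matches the old one exactly, both terms vanish and the loss is zero; any parameter or feature drift increases it.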
Acknowledgements
Our work is supported by the National Natural Science Foundation of China (61806013, 61876010, 61906005), General project of Science and Technology Plan of Beijing Municipal Education Commission (KM202110005028), Project of Interdisciplinary Research Institute of Beijing University of Technology (2021020101) and International Research Cooperation Seed Fund of Beijing University of Technology (2021A01).
Appendices
Appendix 1
Tables 8 and 9 show the experimental results on the training set and test set, respectively, for incremental learning without any strategy to prevent catastrophic forgetting. The two tables are shown to illustrate the severity of catastrophic forgetting in neural network models: without any preventive strategy, the model completely forgets its previous knowledge.
Appendix 2
Comparative experiments on the main hyperparameters of the Mixer Layer are presented here.
Tables 10 and 11 show the Top-1 average accuracies over all tasks on the training set and test set, respectively, when stacking different numbers of Mixer Layers. As Tables 10 and 11 show, stacking 8 Mixer Layers is the most appropriate setting.
Tables 12 and 13 show the Top-1 average accuracies over all tasks on the training set and test set, respectively, for different patch sizes. Table 12 indicates that a patch size of 16 performs best, whereas Table 13 suggests that a patch size of 4 is most appropriate. In the experiments, however, we found that the time cost and GPU memory consumption grow rapidly as the patch size decreases, and in Table 13 the performance at patch size 4 is only slightly higher than at patch size 16. We therefore set the patch size to 16 in the final experiments.
Appendix 3
To reduce the use of loop statements and improve the running efficiency of the program, we give here the formula for the Euclidean distance between the samples of set \({d}_{i}\) and their sample center \({center}_{i}\) within a single task, together with its derivation. We take an image set as an example. The class consists of m images, and an image is represented by \({I}_{[c,w,h]}\), where c, w and h denote the number of color channels, the width and the height of the image, respectively. The flattened \({I}_{[c,w,h]}\) therefore has length \(c*w*h\). Letting \(n=c*w*h\), the m flattened images can be represented by \({M}_{[m,n]}\). Let \({{\varvec{a}}}_{[1,n]}=({a}_{1},{a}_{2},\cdots ,{a}_{n})\) and \(M_{[m,n]} = \left[ {\begin{array}{*{20}c} {b_{11} } & {b_{12} } & \cdots & {b_{1n} } \\ {b_{21} } & {b_{22} } & \cdots & {b_{2n} } \\ \cdots & \cdots & \ddots & \cdots \\ {b_{m1} } & {b_{m2} } & \cdots & {b_{mn} } \\ \end{array} } \right] = \left[ {\begin{array}{*{20}c} {{\varvec{b}}_{1} } \\ {{\varvec{b}}_{2} } \\ \cdots \\ {{\varvec{b}}_{m} } \\ \end{array} } \right]\).
The Euclidean distance between each row of \({{\varvec{M}}}_{[m,n]}\) and the vector \({{\varvec{a}}}_{[1,n]}\) can be expressed by the following formula:
$${{\varvec{R}}}_{[m,1]}=\mathrm{sqrt}\left(\mathrm{sum}\left({M}_{[m,n]}^{2},\mathrm{ axis}=1\right)-2{M}_{[m,n]}{{\varvec{a}}}_{[1,n]}^{\mathrm{T}}+\mathrm{sum}\left({{\varvec{a}}}_{[1,n]}^{2}\right)\right),$$
where \({M}_{[m,n]}^{2}\) denotes raising each element of \({M}_{[m,n]}\) to the second power, \(\mathrm{sum}({M}_{[m,n]}^{2},\mathrm{ axis}=1)\) denotes the sum over the elements of each row of \({M}_{[m,n]}^{2}\), and sqrt(‘*’) denotes taking the square root of each element of ‘*’. The element \({{\varvec{R}}}_{i1}\) of \({{\varvec{R}}}_{[m,1]}\) is the Euclidean distance between the i-th row of \({{\varvec{M}}}_{[m,n]}\) and the vector \({{\varvec{a}}}_{[1,n]}\), where 1 ≤ i ≤ m.
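The loop-free computation above can be sketched in NumPy as follows. The function name `row_distances` is our own; the body follows the expansion \(\Vert \boldsymbol{b}_i - \boldsymbol{a}\Vert^2 = \mathrm{sum}(\boldsymbol{b}_i^2) - 2\,\boldsymbol{b}_i\boldsymbol{a}^{\mathrm{T}} + \mathrm{sum}(\boldsymbol{a}^2)\) used in the derivation.

```python
import numpy as np

def row_distances(M, a):
    """Euclidean distance between each row of M (shape [m, n]) and the
    vector a (shape [n]), computed without an explicit Python loop."""
    # Row-wise expansion: sum(M**2, axis=1) - 2*M@a + sum(a**2)
    # equals ||b_i - a||**2 for every row b_i of M.
    sq = np.sum(M ** 2, axis=1) - 2.0 * (M @ a) + np.sum(a ** 2)
    # Clip tiny negative values caused by floating-point error before sqrt.
    return np.sqrt(np.maximum(sq, 0.0))
```

Applied to the flattened image matrix \({M}_{[m,n]}\) with the class center as `a`, the result is the vector \({{\varvec{R}}}_{[m,1]}\) of per-sample distances, which can then be used to rank samples by closeness to the center.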
About this article
Cite this article
Han, J., Liu, Z., Li, Y. et al. SCMP-IL: an incremental learning method with super constraints on model parameters. Int. J. Mach. Learn. & Cyber. 14, 1751–1767 (2023). https://doi.org/10.1007/s13042-022-01725-1