CoSCL: Cooperation of Small Continual Learners is Stronger Than a Big One

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13686)

Abstract

Continual learning requires incremental compatibility with a sequence of tasks. However, the design of model architecture remains an open question: in general, learning all tasks with a shared set of parameters suffers from severe interference between tasks, while learning each task with a dedicated parameter subspace is limited by scalability. In this work, we theoretically analyze the generalization errors for learning plasticity and memory stability in continual learning, both of which can be uniformly upper-bounded by (1) the discrepancy between task distributions, (2) the flatness of the loss landscape, and (3) the cover of the parameter space. Then, inspired by the robust biological learning system that processes sequential experiences with multiple parallel compartments, we propose Cooperation of Small Continual Learners (CoSCL) as a general strategy for continual learning. Specifically, we present an architecture with a fixed number of narrower sub-networks that learn all incremental tasks in parallel, which naturally reduces both errors by improving the three components of the upper bound. To strengthen this advantage, we encourage the sub-networks to cooperate by penalizing differences between the predictions made from their feature representations. With a fixed parameter budget, CoSCL improves a variety of representative continual learning approaches by a large margin (e.g., by up to 10.64% on CIFAR-100-SC, 9.33% on CIFAR-100-RS, 11.45% on CUB-200-2011, and 6.72% on Tiny-ImageNet) and achieves new state-of-the-art performance. Our code is available at https://github.com/lywang3081/CoSCL.
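
To make the architecture described above concrete, here is a minimal sketch assuming a PyTorch-style implementation: a fixed number of narrow sub-networks ("small continual learners") process each input in parallel, and their features are averaged (feature ensemble) before a shared classifier head. The class names, sub-network width, and layer choices are illustrative placeholders rather than the paper's exact design, and task-specific heads or gating are omitted.

```python
# Illustrative sketch of the CoSCL idea (not the official implementation):
# several narrow sub-networks learn all tasks in parallel, and their features
# are averaged (feature ensemble, FE) before a shared classifier head.
import torch
import torch.nn as nn


class SmallLearner(nn.Module):
    """One narrow sub-network; width and depth here are placeholders."""

    def __init__(self, width: int = 32, feat_dim: int = 128):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, width, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(width, width, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(width, feat_dim), nn.ReLU(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.backbone(x)


class CoSCLNet(nn.Module):
    """Feature ensemble of a fixed number of small continual learners."""

    def __init__(self, num_learners: int = 5, feat_dim: int = 128, num_classes: int = 100):
        super().__init__()
        self.learners = nn.ModuleList(
            [SmallLearner(feat_dim=feat_dim) for _ in range(num_learners)]
        )
        self.head = nn.Linear(feat_dim, num_classes)

    def forward(self, x: torch.Tensor):
        feats = [learner(x) for learner in self.learners]       # per-learner features
        logits_each = [self.head(f) for f in feats]             # per-learner predictions
        logits_ens = self.head(torch.stack(feats).mean(dim=0))  # feature-ensemble prediction
        return logits_ens, logits_each
```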

L. Wang and X. Zhang contributed equally.

Notes

  1. In contrast to a single continual learning model with a wide network, we refer to such narrower sub-networks as “small” continual learners.

  2. A concurrent work observed that a regular CNN architecture indeed achieves better continual learning performance than more advanced architectures such as ResNet and ViT with the same number of parameters [27].

  3. Both are performed on a similar AlexNet-based architecture.

  4. Here we only use feature ensemble (FE) with the ensemble cooperation loss (EC); a simplified sketch of the cooperation penalty follows these notes.
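
Note 4 refers to the feature ensemble (FE) and the ensemble cooperation loss (EC). Continuing the sketch given after the abstract, below is one assumed, simplified way to write such a cooperation penalty on the per-learner predictions: each sub-learner's predictive distribution is pulled toward the (detached) ensemble mean. The choice of KL divergence, the detached target, and the equal weighting are illustrative assumptions, not necessarily the paper's exact formulation.

```python
# Assumed, simplified form of the ensemble cooperation (EC) penalty: discourage
# the small learners from drifting apart by penalizing, for each learner, the
# divergence of its predictive distribution from the ensemble mean.
import torch
import torch.nn.functional as F


def cooperation_penalty(logits_each):
    """logits_each: list of per-learner logit tensors of shape [batch, num_classes].
    Returns the average KL divergence from the (detached) ensemble-mean
    distribution to each sub-learner's predictive distribution."""
    log_probs = [F.log_softmax(logits, dim=-1) for logits in logits_each]
    mean_prob = torch.stack([lp.exp() for lp in log_probs]).mean(dim=0).detach()
    penalty = sum(
        F.kl_div(lp, mean_prob, reduction="batchmean") for lp in log_probs
    ) / len(log_probs)
    return penalty
```

In practice, such a term would be added, weighted by a coefficient, to whatever continual learning objective each sub-learner already optimizes (e.g., a regularization-based method such as EWC [18]).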

References

  1. Aljundi, R., Babiloni, F., Elhoseiny, M., Rohrbach, M., Tuytelaars, T.: Memory aware synapses: learning what (not) to forget. In: Proceedings of the European Conference on Computer Vision, pp. 139–154 (2018)

  2. Aljundi, R., Chakravarty, P., Tuytelaars, T.: Expert gate: lifelong learning with a network of experts. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3366–3375 (2017)

  3. Aso, Y., et al.: The neuronal architecture of the mushroom body provides a logic for associative learning. eLife 3, e04577 (2014)

  4. Ben-David, S., Blitzer, J., Crammer, K., Kulesza, A., Pereira, F., Vaughan, J.W.: A theory of learning from different domains. Mach. Learn. 79(1), 151–175 (2010)

  5. Cha, J., et al.: SWAD: domain generalization by seeking flat minima. arXiv preprint arXiv:2102.08604 (2021)

  6. Cha, S., Hsu, H., Hwang, T., Calmon, F., Moon, T.: CPR: classifier-projection regularization for continual learning. In: Proceedings of the International Conference on Learning Representations (2020)

  7. Chaudhry, A., Dokania, P.K., Ajanthan, T., Torr, P.H.: Riemannian walk for incremental learning: Understanding forgetting and intransigence. In: Proceedings of the European Conference on Computer Vision, pp. 532–547 (2018)

  8. Chen, X., He, K.: Exploring simple siamese representation learning. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 15750–15758 (2021)

  9. Cohn, R., Morantte, I., Ruta, V.: Coordinated and compartmentalized neuromodulation shapes sensory processing in Drosophila. Cell 163(7), 1742–1755 (2015)

  10. De Lange, M., et al.: A continual learning survey: defying forgetting in classification tasks. IEEE Trans. Pattern Anal. Mach. Intell. 44(7), 3366–3385 (2021)

  11. Deng, D., Chen, G., Hao, J., Wang, Q., Heng, P.A.: Flattening sharpness for dynamic gradient projection memory benefits continual learning. In: Proceedings of the Advances in Neural Information Processing Systems, vol. 34 (2021)

  12. Dinh, L., Pascanu, R., Bengio, S., Bengio, Y.: Sharp minima can generalize for deep nets. In: Proceedings of the International Conference on Machine Learning, pp. 1019–1028. PMLR (2017)

  13. Doan, T., Mirzadeh, S.I., Pineau, J., Farajtabar, M.: Efficient continual learning ensembles in neural network subspaces. arXiv preprint arXiv:2202.09826 (2022)

  14. Fernando, C., et al.: PathNet: evolution channels gradient descent in super neural networks. arXiv preprint arXiv:1701.08734 (2017)

  15. Hu, D., et al.: How well self-supervised pre-training performs with streaming data? arXiv preprint arXiv:2104.12081 (2021)

  16. Hurtado, J., Raymond, A., Soto, A.: Optimizing reusable knowledge for continual learning via metalearning. In: Proceedings of the Advances in Neural Information Processing Systems, vol. 34 (2021)

  17. Jung, S., Ahn, H., Cha, S., Moon, T.: Continual learning with node-importance based adaptive group sparse regularization. In: Proceedings of the Advances in Neural Information Processing Systems, vol. 33 (2020)

  18. Kirkpatrick, J., et al.: Overcoming catastrophic forgetting in neural networks. Proc. Natl. Acad. Sci. 114(13), 3521–3526 (2017)

  19. Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images. Technical report, Citeseer (2009)

  20. Liu, Yu., Parisot, S., Slabaugh, G., Jia, X., Leonardis, A., Tuytelaars, T.: More classifiers, less forgetting: a generic multi-classifier paradigm for incremental learning. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12371, pp. 699–716. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58574-7_42

  21. Long, M., Cao, Y., Wang, J., Jordan, M.: Learning transferable features with deep adaptation networks. In: Proceedings of the International Conference on Machine Learning, pp. 97–105. PMLR (2015)

  22. Lopez-Paz, D., et al.: Gradient episodic memory for continual learning. In: Proceedings of the Advances in Neural Information Processing Systems, pp. 6467–6476 (2017)

  23. Madaan, D., Yoon, J., Li, Y., Liu, Y., Hwang, S.J.: Rethinking the representational continuity: Towards unsupervised continual learning. arXiv preprint arXiv:2110.06976 (2021)

  24. McAllester, D.A.: PAC-Bayesian model averaging. In: Proceedings of the Twelfth Annual Conference on Computational Learning Theory, pp. 164–170 (1999)

  25. McCloskey, M., Cohen, N.J.: Catastrophic interference in connectionist networks: the sequential learning problem. In: Psychology of Learning and Motivation, vol. 24, pp. 109–165. Elsevier (1989)

  26. Mirzadeh, S.I., Chaudhry, A., Hu, H., Pascanu, R., Gorur, D., Farajtabar, M.: Wide neural networks forget less catastrophically. arXiv preprint arXiv:2110.11526 (2021)

  27. Mirzadeh, S.I., et al.: Architecture matters in continual learning. arXiv preprint arXiv:2202.00275 (2022)

  28. Mirzadeh, S.I., Farajtabar, M., Gorur, D., Pascanu, R., Ghasemzadeh, H.: Linear mode connectivity in multitask and continual learning. arXiv preprint arXiv:2010.04495 (2020)

  29. Mirzadeh, S.I., Farajtabar, M., Pascanu, R., Ghasemzadeh, H.: Understanding the role of training regimes in continual learning. In: Proceedings of the Advances in Neural Information Processing Systems, vol. 33, pp. 7308–7320 (2020)

  30. Modi, M.N., Shuai, Y., Turner, G.C.: The Drosophila mushroom body: from architecture to algorithm in a learning circuit. Annu. Rev. Neurosci. 43, 465–484 (2020)

  31. Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011)

  32. Qin, Q., Hu, W., Peng, H., Zhao, D., Liu, B.: BNS: building network structures dynamically for continual learning. In: Proceedings of the Advances in Neural Information Processing Systems, vol. 34 (2021)

  33. Ramesh, R., Chaudhari, P.: Model zoo: a growing brain that learns continually. In: NeurIPS 2021 Workshop on Distribution Shifts: Connecting Methods and Applications (2021)

  34. Rebuffi, S.A., Kolesnikov, A., Sperl, G., Lampert, C.H.: iCaRL: incremental classifier and representation learning. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2001–2010 (2017)

  35. Riemer, M., et al.: Learning to learn without forgetting by maximizing transfer and minimizing interference. arXiv preprint arXiv:1810.11910 (2018)

  36. Russakovsky, O., et al.: ImageNet large scale visual recognition challenge. Int. J. Comput. Vision 115(3), 211–252 (2015)

  37. Rusu, A.A., et al.: Progressive neural networks. arXiv preprint arXiv:1606.04671 (2016)

  38. Schwarz, J., et al.: Progress & compress: a scalable framework for continual learning. In: Proceedings of the International Conference on Machine Learning, pp. 4528–4537. PMLR (2018)

  39. Serra, J., Suris, D., Miron, M., Karatzoglou, A.: Overcoming catastrophic forgetting with hard attention to the task. In: Proceedings of the International Conference on Machine Learning, pp. 4548–4557. PMLR (2018)

  40. Shi, G., Chen, J., Zhang, W., Zhan, L.M., Wu, X.M.: Overcoming catastrophic forgetting in incremental few-shot learning by finding flat minima. In: Proceedings of the Advances in Neural Information Processing Systems, vol. 34 (2021)

  41. Wah, C., Branson, S., Welinder, P., Perona, P., Belongie, S.: The caltech-UCSD birds-200-2011 dataset (2011)

  42. Wang, L., Yang, K., Li, C., Hong, L., Li, Z., Zhu, J.: ORDisCo: effective and efficient usage of incremental unlabeled data for semi-supervised continual learning. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 5383–5392 (2021)

  43. Wang, L., et al.: AFEC: active forgetting of negative transfer in continual learning. In: Proceedings of the Advances in Neural Information Processing Systems, vol. 34 (2021)

  44. Wang, L., et al.: Memory replay with data compression for continual learning. In: Proceedings of the International Conference on Learning Representations (2021)

  45. Wen, Y., Tran, D., Ba, J.: BatchEnsemble: an alternative approach to efficient ensemble and lifelong learning. In: Proceedings of the International Conference on Learning Representations (2020)

  46. Wortsman, M., Horton, M.C., Guestrin, C., Farhadi, A., Rastegari, M.: Learning neural network subspaces. In: Proceedings of the International Conference on Machine Learning, pp. 11217–11227. PMLR (2021)

  47. Wortsman, M., et al.: Supermasks in superposition. In: Proceedings of the Advances in Neural Information Processing Systems, vol. 33, pp. 15173–15184 (2020)

  48. Yan, S., Xie, J., He, X.: DER: dynamically expandable representation for class incremental learning. arXiv preprint arXiv:2103.16788 (2021)

  49. Zbontar, J., Jing, L., Misra, I., LeCun, Y., Deny, S.: Barlow twins: self-supervised learning via redundancy reduction. In: Proceedings of the International Conference on Machine Learning, pp. 12310–12320. PMLR (2021)

  50. Zenke, F., Poole, B., Ganguli, S.: Continual learning through synaptic intelligence. In: Proceedings of the International Conference on Machine Learning, pp. 3987–3995. PMLR (2017)

Acknowledgements

This work was supported by the National Key Research and Development Program of China (2017YFA0700904, 2020AAA0106000, 2020AAA0104304, 2020AAA0106302, 2021YFB2701000), NSFC Projects (Nos. 62061136001, 62106123, 62076147, U19B2034, U1811461, U19A2081, 61972224), Beijing NSF Project (No. JQ19016), BNRist (BNR2022RC01006), Tsinghua-Peking Center for Life Sciences, Tsinghua Institute for Guo Qiang, Beijing Academy of Artificial Intelligence (BAAI), Tsinghua-OPPO Joint Research Center for Future Terminal Technology, the High Performance Computing Center, Tsinghua University, and China Postdoctoral Science Foundation (Nos. 2021T140377, 2021M701892).

Author information

Correspondence to Jun Zhu or Yi Zhong.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 2624 KB)

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Wang, L., Zhang, X., Li, Q., Zhu, J., Zhong, Y. (2022). CoSCL: Cooperation of Small Continual Learners is Stronger Than a Big One. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds) Computer Vision – ECCV 2022. ECCV 2022. Lecture Notes in Computer Science, vol 13686. Springer, Cham. https://doi.org/10.1007/978-3-031-19809-0_15

  • DOI: https://doi.org/10.1007/978-3-031-19809-0_15

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-19808-3

  • Online ISBN: 978-3-031-19809-0

  • eBook Packages: Computer Science, Computer Science (R0)
