Riemannian Walk for Incremental Learning: Understanding Forgetting and Intransigence

  • Arslan Chaudhry
  • Puneet K. Dokania
  • Thalaiyasingam Ajanthan (Email author)
  • Philip H. S. Torr
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11215)


Abstract

Incremental learning (IL) has received a lot of attention recently; however, the literature lacks a precise problem definition, proper evaluation settings, and metrics tailored specifically for the IL problem. One of the main objectives of this work is to fill these gaps so as to provide a common ground for a better understanding of IL. The main challenge for an IL algorithm is to update the classifier whilst preserving existing knowledge. We observe that, in addition to forgetting, a known issue while preserving knowledge, IL also suffers from a problem we call intransigence: its inability to update knowledge. We introduce two metrics to quantify forgetting and intransigence that allow us to understand, analyse, and gain better insights into the behaviour of IL algorithms. Furthermore, we present RWalk, a generalization of EWC++ (our efficient version of EWC [6]) and Path Integral [25] with a theoretically grounded KL-divergence based perspective. We provide a thorough analysis of various IL algorithms on the MNIST and CIFAR-100 datasets. In these experiments, RWalk obtains superior results in terms of accuracy, and also provides a better trade-off between forgetting and intransigence.
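The abstract names the two metrics without defining them. As a minimal sketch, assume that forgetting for an earlier task is the drop from the best accuracy it previously reached to its current accuracy, averaged over all earlier tasks, and that intransigence is the gap between a reference model's accuracy on the newest task and the incremental model's accuracy on it. The accuracy matrix `acc`, the `reference_acc` value, and both function names are hypothetical and chosen only for illustration; they are not taken from the text above.

```python
from typing import Sequence


def average_forgetting(acc: Sequence[Sequence[float]], k: int) -> float:
    """Average forgetting after training on the k-th task (1-indexed, k >= 2).

    Assumption: acc[l][j] is the accuracy on task j+1 measured right after
    training on task l+1. Only checkpoints from the task's own training
    step onward are considered when looking for its best past accuracy.
    """
    if k < 2:
        raise ValueError("forgetting needs at least one earlier task")
    drops = []
    for j in range(k - 1):
        # Best accuracy on task j+1 at any checkpoint before the current one.
        best_before = max(acc[l][j] for l in range(j, k - 1))
        drops.append(best_before - acc[k - 1][j])
    return sum(drops) / (k - 1)


def intransigence(acc: Sequence[Sequence[float]], k: int,
                  reference_acc: float) -> float:
    """Gap between a reference model and the incremental model on task k.

    Assumption: reference_acc is the accuracy on task k of a model that
    was not constrained to preserve earlier knowledge.
    """
    return reference_acc - acc[k - 1][k - 1]


# Toy accuracy matrix for three tasks (made-up numbers).
acc = [
    [0.90, 0.00, 0.00],
    [0.70, 0.80, 0.00],
    [0.60, 0.50, 0.85],
]
print(average_forgetting(acc, 3))
print(intransigence(acc, 3, reference_acc=0.90))
```

Under these assumptions a lower value is better for both quantities. A negative intransigence would mean that learning earlier tasks helped the newest one.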



This work was supported by The Rhodes Trust, EPSRC, ERC grant ERC-2012-AdG 321162-HELIOS, EPSRC grant Seebibyte EP/M013774/1 and EPSRC/MURI grant EP/N019474/1.

Supplementary material

Supplementary material 1: 474198_1_En_33_MOESM1_ESM.pdf (PDF, 790 KB)


References

  1. Amari, S.I.: Natural gradient works efficiently in learning. Neural Comput. 10, 251–276 (1998)
  2. Grosse, R., Martens, J.: A Kronecker-factored approximate Fisher matrix for convolution layers. In: ICML (2016)
  3. Hecht-Nielsen, R., et al.: Theory of the backpropagation neural network. Neural Netw. 1(Supplement–1), 445–448 (1988)
  4. Hinton, G., Vinyals, O., Dean, J.: Distilling the knowledge in a neural network. In: NIPS (2014)
  5. Kingma, D., Ba, J.: Adam: a method for stochastic optimization. In: ICLR (2015)
  6. Kirkpatrick, J., et al.: Overcoming catastrophic forgetting in neural networks. In: Proceedings of the National Academy of Sciences of the United States of America (PNAS) (2016)
  7. Krizhevsky, A., Hinton, G.: Learning multiple layers of features from tiny images (2009)
  8. Kullback, S., Leibler, R.A.: On information and sufficiency. Ann. Math. Stat. 22, 79–86 (1951)
  9. Le Roux, N., Manzagol, P.A., Bengio, Y.: Topmoumoute online natural gradient algorithm. In: NIPS (2007)
  10. LeCun, Y.: The MNIST database of handwritten digits (1998)
  11. Lee, J.M.: Riemannian Manifolds: An Introduction to Curvature, vol. 176. Springer, New York (2006)
  12. Lee, S.W., Kim, J.H., Ha, J.W., Zhang, B.T.: Overcoming catastrophic forgetting by incremental moment matching. In: NIPS (2017)
  13. Li, Z., Hoiem, D.: Learning without forgetting. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9908, pp. 614–629. Springer, Cham (2016)
  14. Lopez-Paz, D., Ranzato, M.: Gradient episodic memory for continuum learning. In: NIPS (2017)
  15. Martens, J., Grosse, R.: Optimizing neural networks with Kronecker-factored approximate curvature. In: ICML (2015)
  16. Nguyen, C.V., Li, Y., Bui, T.D., Turner, R.E.: Variational continual learning. In: ICLR (2018)
  17. Pascanu, R., Bengio, Y.: Revisiting natural gradient for deep networks. In: ICLR (2014)
  18. Rebuffi, S.A., Bilen, H., Vedaldi, A.: Learning multiple visual domains with residual adapters. In: NIPS (2017)
  19. Rebuffi, S.A., Kolesnikov, A., Lampert, C.H.: iCaRL: incremental classifier and representation learning. In: CVPR (2017)
  20. Rusu, A.A., et al.: Progressive neural networks. arXiv preprint arXiv:1606.04671 (2016)
  21. Schwarz, J., et al.: Progress & compress: a scalable framework for continual learning. In: ICML (2018)
  22. Shin, H., Lee, J.K., Kim, J., Kim, J.: Continual learning with deep generative replay. In: NIPS (2017)
  23. Terekhov, A.V., Montone, G., O’Regan, J.K.: Knowledge transfer in deep block-modular neural networks. In: Wilson, S.P., Verschure, P.F.M.J., Mura, A., Prescott, T.J. (eds.) LIVINGMACHINES 2015. LNCS (LNAI), vol. 9222, pp. 268–279. Springer, Cham (2015)
  24. Yoon, J., Yang, E., Lee, J., Hwang, S.J.: Lifelong learning with dynamically expandable networks. In: ICLR (2018)
  25. Zenke, F., Poole, B., Ganguli, S.: Continual learning through synaptic intelligence. In: ICML (2017)

Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  • Arslan Chaudhry (1)
  • Puneet K. Dokania (1)
  • Thalaiyasingam Ajanthan (1), Email author
  • Philip H. S. Torr (1)

  1. University of Oxford, Oxford, UK
