REMIND Your Neural Network to Prevent Catastrophic Forgetting

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 12353)


People learn throughout life. However, incrementally updating conventional neural networks leads to catastrophic forgetting. A common remedy is replay, which is inspired by how the brain consolidates memory. Replay involves fine-tuning a network on a mixture of new and old instances. While there is neuroscientific evidence that the brain replays compressed memories, existing methods for convolutional networks replay raw images. Here, we propose REMIND, a brain-inspired approach that enables efficient replay with compressed representations. REMIND is trained in an online manner, meaning it learns one example at a time, which is closer to how humans learn. Under the same constraints, REMIND outperforms other methods for incremental class learning on the ImageNet ILSVRC-2012 dataset. We probe REMIND's robustness to data ordering schemes known to induce catastrophic forgetting, and we demonstrate its generality by pioneering online learning for Visual Question Answering (VQA).
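The abstract's core idea, replaying compressed feature representations instead of raw images, can be illustrated with a minimal sketch. Note the assumptions: this toy buffer uses simple per-vector scalar quantization to `uint8` codes, whereas REMIND itself applies product quantization to mid-level CNN features; the class and method names below are illustrative, not from the paper's code.

```python
import numpy as np

class CompressedReplayBuffer:
    """Stores feature vectors as uint8 codes plus per-vector scale/offset,
    rather than raw float32 data. Illustrative only: REMIND uses product
    quantization of mid-level CNN features, not scalar quantization."""

    def __init__(self):
        self.codes, self.scales, self.mins, self.labels = [], [], [], []

    def add(self, feat, label):
        # Quantize one feature vector to 8 bits per element.
        lo, hi = feat.min(), feat.max()
        scale = (hi - lo) / 255.0 or 1.0  # guard against constant vectors
        self.codes.append(np.round((feat - lo) / scale).astype(np.uint8))
        self.mins.append(lo)
        self.scales.append(scale)
        self.labels.append(label)

    def sample(self, k, rng):
        # Reconstruct k stored features for replay alongside a new example.
        idx = rng.choice(len(self.codes), size=min(k, len(self.codes)),
                         replace=False)
        feats = [self.codes[i].astype(np.float32) * self.scales[i]
                 + self.mins[i] for i in idx]
        return np.stack(feats), [self.labels[i] for i in idx]
```

In an online loop, each incoming example would be trained on together with a mini-batch drawn via `sample`, then compressed into the buffer via `add`, so only the quantized codes are ever retained.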


Keywords: Online learning · Brain-inspired · Deep learning



This work was supported in part by the DARPA/MTO Lifelong Learning Machines program [W911NF-18-2-0263], AFOSR grant [FA9550-18-1-0121], NSF award #1909696, and a gift from Adobe Research. We thank NVIDIA for the GPU donation. The views and conclusions contained herein are those of the authors and should not be interpreted as representing the official policies or endorsements of any sponsor. We thank Michael Mozer, Ryne Roady, and Zhongchao Qian for feedback on early drafts of this paper.

Supplementary material

Supplementary material 1 (PDF, 2.3 MB): 504445_1_En_28_MOESM1_ESM.pdf



Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. Rochester Institute of Technology, Rochester, USA
  2. Adobe Research, San Jose, USA
  3. Paige, New York, USA
  4. Cornell Tech, New York, USA
