
PODNet: Pooled Outputs Distillation for Small-Tasks Incremental Learning

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 12365)

Abstract

Lifelong learning has attracted much attention, but existing works still struggle to fight catastrophic forgetting and accumulate knowledge over long stretches of incremental learning. In this work, we propose PODNet, a model inspired by representation learning. By carefully balancing the compromise between remembering the old classes and learning new ones, PODNet fights catastrophic forgetting, even over very long runs of small incremental tasks, a setting so far unexplored by current works. PODNet innovates on existing art with an efficient spatial-based distillation loss applied throughout the model and a representation comprising multiple proxy vectors for each class. We validate those innovations thoroughly, comparing PODNet with three state-of-the-art models on three datasets: CIFAR100, ImageNet100, and ImageNet1000. Our results showcase a significant advantage of PODNet over existing art, with accuracy gains of 12.10, 6.51, and 2.85 percentage points, respectively.
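
The two innovations named above can be made concrete with a short sketch. The following minimal PyTorch illustration shows (a) a pooled-outputs distillation term that compares old- and new-model feature maps only after pooling them along each spatial axis, and (b) a classifier that scores each class through several proxy vectors instead of a single weight vector. The tensor shapes, the squaring before pooling, and the softmax weighting over proxies are illustrative assumptions based on the abstract, not the authors' reference implementation.

    import torch
    import torch.nn.functional as F

    def pod_spatial_loss(feats_old, feats_new):
        # feats_old / feats_new: lists of (B, C, H, W) tensors, one per
        # distilled layer, from the frozen old model and the current model.
        # Pooling each map along height and along width before comparing
        # constrains activation statistics instead of forcing an exact
        # pixel-wise match.
        loss = 0.0
        for a_old, a_new in zip(feats_old, feats_new):
            pooled = []
            for a in (a_old, a_new):
                a = a.pow(2)                      # assumption: square activations first
                h_view = a.sum(dim=3).flatten(1)  # (B, C*H), pooled over width
                w_view = a.sum(dim=2).flatten(1)  # (B, C*W), pooled over height
                pooled.append(F.normalize(torch.cat([h_view, w_view], dim=1), dim=1))
            loss = loss + torch.norm(pooled[0] - pooled[1], p=2, dim=1).mean()
        return loss / len(feats_old)

    def multi_proxy_scores(embeddings, proxies):
        # embeddings: (B, D); proxies: (num_classes, K, D). Each class is
        # represented by K proxies; its score is a softmax-weighted average
        # of the cosine similarities to that class's proxies.
        z = F.normalize(embeddings, dim=1)
        p = F.normalize(proxies, dim=2)
        sims = torch.einsum('bd,ckd->bck', z, p)  # (B, num_classes, K)
        weights = F.softmax(sims, dim=2)          # attention over the K proxies
        return (weights * sims).sum(dim=2)        # (B, num_classes)

    if __name__ == "__main__":
        old = [torch.randn(4, 64, 8, 8)]
        new = [torch.randn(4, 64, 8, 8)]
        print(pod_spatial_loss(old, new).item())  # scalar distillation term
        scores = multi_proxy_scores(torch.randn(4, 128), torch.randn(100, 10, 128))
        print(scores.shape)                       # torch.Size([4, 100])

In an incremental-learning loop, a term like pod_spatial_loss would be weighted against the classification loss, which is how the compromise between remembering old classes and learning new ones is balanced.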

Keywords

Incremental learning, Representation learning, Pooling

Notes

Acknowledgement

E. Valle is funded by FAPESP grant 2019/05018-1 and CNPq grants 424958/2016-3 and 311905/2017-0. This work was performed using HPC resources from GENCI–IDRIS (Grant 2019-AD011011588). We also wish to thank Estelle Thou for the helpful discussions.

Supplementary material

Supplementary material 1: 504476_1_En_6_MOESM1_ESM.pdf (218 KB)


Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. Heuritech, Paris, France
  2. Sorbonne University, Paris, France
  3. Valeo.ai, Paris, France
  4. University of Campinas, Campinas, Brazil
