Imbalanced Continual Learning with Partitioning Reservoir Sampling

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 12358)

Abstract

Continual learning from a sequential stream of data is a crucial challenge for machine learning research. Most studies on this topic have been conducted under the single-label classification setting, along with an assumption of a balanced label distribution. This work expands this research horizon towards multi-label classification. In doing so, we identify an unanticipated adversity innately present in many multi-label datasets: the long-tailed distribution. We jointly address the two previously independently solved problems, Catastrophic Forgetting and the long-tailed label distribution, by first empirically showing a new challenge of destructive forgetting of the minority concepts on the tail. Then, we curate two benchmark datasets, COCOseq and NUS-WIDEseq, that allow the study of both intra- and inter-task imbalances. Lastly, we propose a new sampling strategy for replay-based approaches named Partitioning Reservoir Sampling (PRS), which allows the model to maintain a balanced knowledge of both head and tail classes. We publicly release the datasets and the code on our project page.
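To make the core idea concrete, below is a minimal illustrative sketch of a class-partitioned replay buffer in which each partition is filled by classic reservoir sampling (Vitter, 1985). The class name, the equal per-class budget, and the single-label simplification are our assumptions for illustration only; the paper's actual PRS derives each class's share of memory from the running label statistics so that tail classes are not crowded out, and handles samples carrying multiple labels.

```python
import random
from collections import defaultdict


class PartitionedReplayBuffer:
    """Illustrative sketch: a replay memory split into per-class
    partitions, each maintained by reservoir sampling (Vitter, 1985).

    NOT the paper's PRS algorithm: PRS computes imbalance-aware
    partition sizes from observed label frequencies and supports
    multi-label samples; here the capacity is simply divided
    equally among classes and each sample has a single label."""

    def __init__(self, capacity, num_classes):
        self.capacity = capacity
        self.num_classes = num_classes
        self.partitions = defaultdict(list)  # class id -> stored samples
        self.seen = defaultdict(int)         # class id -> count seen in stream

    def _budget(self):
        # Equal share per class (assumption); PRS instead allocates
        # budgets based on the running, long-tailed label statistics.
        return max(1, self.capacity // self.num_classes)

    def add(self, sample, label):
        self.seen[label] += 1
        part, budget = self.partitions[label], self._budget()
        if len(part) < budget:
            part.append(sample)
        else:
            # Classic reservoir step: after n stream items of this class,
            # each item remains stored with probability budget / n.
            j = random.randrange(self.seen[label])
            if j < budget:
                part[j] = sample

    def sample_batch(self, k):
        # Draw a replay mini-batch across all partitions.
        pool = [s for part in self.partitions.values() for s in part]
        return random.sample(pool, min(k, len(pool)))
```

Because every class keeps its own reservoir, a tail class observed only rarely in the stream still retains exemplars for replay, which is the balancing effect the abstract attributes to PRS; a single shared reservoir would instead converge to the stream's skewed label distribution.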

Keywords

Imbalanced learning · Continual learning · Multi-label classification · Long-tailed distribution · Online learning

Notes

Acknowledgements

We express our gratitude for the helpful comments on the manuscript by Soochan Lee, Junsoo Ha and Hyunwoo Kim. This work was supported by Samsung Advanced Institute of Technology, an Institute of Information & communications Technology Planning & Evaluation (IITP) grant (No. 2019-0-01082, SW StarLab), and the international cooperation program by the NRF of Korea (NRF-2018K2A9A2A11080927).


Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

1. Neural Processing Research Center, Seoul National University, Seoul, Korea