
Meta-learning with Network Pruning

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 12364)

Abstract

Meta-learning is a powerful paradigm for few-shot learning. Despite its remarkable success in many applications, existing optimization-based meta-learning models with over-parameterized neural networks have been shown to overfit on training tasks. To remedy this deficiency, we propose a network-pruning-based meta-learning approach that reduces overfitting by explicitly controlling the capacity of the network. A uniform concentration analysis reveals the benefit of the capacity constraint for reducing the generalization gap of the proposed meta-learner. We implement our approach on top of Reptile, combined with two network pruning routines: Dense-Sparse-Dense (DSD) and Iterative Hard Thresholding (IHT). Extensive experimental results on benchmark datasets with different over-parameterized deep networks demonstrate that our method not only effectively alleviates meta-overfitting but also, in many cases, improves overall generalization performance on few-shot classification tasks.
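
To make the pruning-during-meta-training idea concrete, below is a minimal NumPy sketch (not the paper's implementation) of a Reptile-style outer loop interleaved with an IHT-style hard-thresholding step that caps the number of nonzero meta-parameters. The callables sample_task and inner_sgd, the flattened parameter vector theta, and the sparsity level k are illustrative assumptions.

import numpy as np

def hard_threshold(theta, k):
    """Keep the k largest-magnitude entries of theta and zero out the rest."""
    pruned = np.zeros_like(theta)
    top_k = np.argsort(np.abs(theta))[-k:]      # indices of the k largest |theta_i|
    pruned[top_k] = theta[top_k]
    return pruned

def reptile_with_iht(theta, sample_task, inner_sgd, k,
                     meta_steps=1000, meta_lr=0.1):
    """Reptile-style outer loop with a capacity constraint enforced by
    hard-thresholding the meta-parameters after each meta-update."""
    for _ in range(meta_steps):
        task = sample_task()                     # draw one few-shot training task
        phi = inner_sgd(theta.copy(), task)      # task-specific inner adaptation
        theta = theta + meta_lr * (phi - theta)  # Reptile meta-update
        theta = hard_threshold(theta, k)         # prune: keep at most k weights
    return theta

The sketch only illustrates how a sparsity constraint can be folded into the meta-update; the paper's DSD variant additionally re-densifies the network after a sparse training phase.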

Keywords

Meta-learning · Few-shot learning · Network pruning · Sparsity · Generalization analysis

Notes

Acknowledgements

Xiao-Tong Yuan is supported in part by the National Major Project of China for New Generation of AI under Grant No. 2018AAA0100400 and in part by the Natural Science Foundation of China (NSFC) under Grant Nos. 61876090 and 61936005. Qingshan Liu is supported by NSFC under Grant Nos. 61532009 and 61825601.

Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. B-DAT Lab, Nanjing University of Information Science and Technology, Nanjing, China
  2. JD Finance America Corporation, Mountain View, USA
