
Controlling Model Complexity in Probabilistic Model-Based Dynamic Optimization of Neural Network Structures

  • Shota Saito
  • Shinichi Shirakawa
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11728)

Abstract

A method that simultaneously optimizes both the structure of a neural network and its connection weights in a single training loop can reduce the enormous computational cost of neural architecture search. We focus on probabilistic model-based dynamic neural network structure optimization, which considers a probability distribution over structure parameters and simultaneously optimizes both the distribution parameters and the connection weights using gradient methods. Because the existing algorithm searches only for structures that minimize the training loss, it may find overly complicated structures. In this paper, we propose introducing a penalty term to control the model complexity of the obtained structures. We formulate the penalty term using the number of weights or units and derive its analytical natural gradient. The proposed method minimizes the objective function with the penalty term injected, using stochastic gradient descent. We apply the proposed method to unit selection in a fully-connected neural network and connection selection in a convolutional neural network. The experimental results show that the proposed method can control model complexity while maintaining performance.
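To illustrate the idea described in the abstract, the following is a minimal sketch, not the authors' exact algorithm. It assumes a Bernoulli distribution over binary structure variables (e.g., unit on/off masks) with parameters `theta`, a Monte Carlo estimate of the natural gradient of the expected training loss, and the analytical natural gradient of a complexity penalty equal to the expected number of active units. The function `training_loss`, the penalty coefficient `epsilon`, and all sizes and hyperparameters are hypothetical placeholders.

```python
import numpy as np

# Sketch of a penalized stochastic natural gradient update for the
# distribution parameters theta of D Bernoulli structure variables.
# Assumptions: the penalty is E[sum_j m_j] = sum_j theta_j, whose natural
# gradient under the Bernoulli Fisher metric is theta * (1 - theta).

rng = np.random.default_rng(0)
D = 8                      # number of structure variables (hypothetical)
theta = np.full(D, 0.5)    # Bernoulli parameters of the structure distribution
eta = 0.1                  # learning rate for theta
epsilon = 0.01             # penalty coefficient (complexity trade-off)
lam = 2                    # number of Monte Carlo samples per step


def training_loss(mask):
    """Placeholder loss of the network under structure `mask`.

    In the actual method this would be the mini-batch training loss of the
    network with the sampled units/connections enabled.
    """
    return float(mask.sum())  # dummy value for the sketch


for step in range(100):
    # Sample structures from the current distribution and evaluate each one.
    masks = (rng.random((lam, D)) < theta).astype(float)
    losses = np.array([training_loss(m) for m in masks])

    # Monte Carlo estimate of the natural gradient of E[loss] w.r.t. theta
    # (for Bernoulli variables, the natural gradient of the log-likelihood
    # of a sample m is simply m - theta).
    loss_ngrad = (losses[:, None] * (masks - theta)).mean(axis=0)

    # Analytical natural gradient of the penalty E[sum_j m_j]:
    # plain gradient is 1, inverse Fisher is diag(theta * (1 - theta)).
    penalty_ngrad = theta * (1.0 - theta)

    # Descend the penalized objective and keep theta away from 0 and 1.
    theta -= eta * (loss_ngrad + epsilon * penalty_ngrad)
    theta = np.clip(theta, 1.0 / D, 1.0 - 1.0 / D)
```

In this sketch, increasing `epsilon` pushes each `theta_j` toward zero more strongly, so the sampled structures use fewer units; setting `epsilon = 0` recovers an unpenalized update driven only by the training loss.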

Keywords

Neural networks · Structure optimization · Stochastic natural gradient · Model complexity · Stochastic relaxation

Notes

Acknowledgment

This work is partially supported by the SECOM Science and Technology Foundation.


Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Yokohama National University, Yokohama, Japan
  2. SkillUp AI Co., Ltd., Tokyo, Japan
