DENSER: deep evolutionary network structured representation
Abstract
Deep evolutionary network structured representation (DENSER) is a novel evolutionary approach for the automatic generation of deep neural networks (DNNs) that combines the principles of genetic algorithms (GAs) with those of dynamic structured grammatical evolution (DSGE). The GA level encodes the macro structure of evolution, i.e., the layers, learning, and/or data augmentation methods (among others); the DSGE level specifies the parameters of each GA evolutionary unit and the valid range of those parameters. The use of a grammar makes DENSER a general-purpose framework for generating DNNs: to deal with different network and layer types, different problems, or different parameter ranges, one only needs to adapt the grammar. DENSER is tested on the automatic generation of convolutional neural networks (CNNs) for the CIFAR-10 dataset, with the best-performing networks reaching accuracies of up to 95.22%. Furthermore, we take the fittest networks evolved on CIFAR-10 and apply them to classify MNIST, Fashion-MNIST, SVHN, Rectangles, and CIFAR-100. The results show that the DNNs discovered by DENSER during evolution generalise, are robust, and scale. The most impressive result is the 78.75% classification accuracy on CIFAR-100, which, to the best of our knowledge, sets a new state of the art among methods that seek to automatically design CNNs.
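The two-level encoding described above can be illustrated with a minimal sketch. The grammar fragment, production names, and parameter ranges below are hypothetical, chosen only to show the idea of the representation, not the grammar used in the paper: the GA level holds an ordered sequence of evolutionary units (layers), and the DSGE level expands each unit into concrete parameter values within the ranges the grammar permits.

```python
import random

# Toy grammar fragment (illustrative, not the paper's grammar).
# "features" lists the units the GA level may place in sequence;
# each unit maps to DSGE-level parameters with their valid ranges.
GRAMMAR = {
    "features": ["conv", "pooling"],
    "conv": {
        "num_filters": ("int", 32, 256),
        "kernel_size": ("int", 2, 5),
        "activation": ("cat", ["relu", "sigmoid"]),
    },
    "pooling": {
        "pool_size": ("int", 2, 5),
    },
}

def expand_unit(unit):
    """DSGE level: sample valid parameter values for one GA unit."""
    params = {}
    for name, spec in GRAMMAR[unit].items():
        if spec[0] == "int":
            params[name] = random.randint(spec[1], spec[2])
        else:  # categorical choice
            params[name] = random.choice(spec[1])
    return {"unit": unit, "params": params}

def random_individual(num_units=4):
    """GA level: an ordered sequence of units drawn from 'features'."""
    return [expand_unit(random.choice(GRAMMAR["features"]))
            for _ in range(num_units)]

individual = random_individual()
```

Because every parameter is sampled from (and mutated within) the ranges the grammar declares, changing the search space reduces to editing the grammar, which is the generality the abstract claims.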
Keywords
Automated machine learning · NeuroEvolution · Deep neural networks · Convolutional neural networks · Dynamic structured grammatical evolution
Acknowledgements
This work is partially funded by Fundação para a Ciência e Tecnologia (FCT), Portugal, under Grant SFRH/BD/114865/2016. We would also like to thank NVIDIA for providing us with Titan X GPUs.