A Framework for Selecting Deep Learning Hyper-parameters

  • Jim O’Donoghue
  • Mark Roantree
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9147)


Recent research has found that deep learning architectures show significant improvements over traditional shallow algorithms when mining high-dimensional datasets. Because the choice of algorithm, the hyper-parameter settings, the number of hidden layers and the number of nodes within each layer must all be considered together, identifying an optimal configuration can be a lengthy process. Our work provides a framework for building deep learning architectures via a stepwise approach, together with an evaluation methodology to quickly identify poorly performing architectural configurations. Using a dataset with high dimensionality, we illustrate how different architectures perform and how one algorithm configuration can provide input for fine-tuning more complex models.
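
As a rough illustration of the kind of search the abstract describes, the sketch below samples configurations at random and discards poorly performing ones after a cheap evaluation. It is not the authors' framework: the names `sample_config`, `random_search`, `toy_evaluate` and `reject_above`, the value ranges, and the toy error surface are all assumptions made for this example, and a real evaluation would train a network on held-out data.

```python
import math
import random


def sample_config(rng):
    """Draw one hypothetical configuration: depth, width and learning rate."""
    return {
        "hidden_layers": rng.randint(1, 3),
        "hidden_nodes": rng.choice([10, 30, 100, 300]),
        "learning_rate": 10 ** rng.uniform(-4, -1),  # log-uniform draw
    }


def random_search(evaluate, n_trials=20, reject_above=0.5, seed=0):
    """Sample configurations at random and keep only the promising ones.

    `evaluate(config)` is assumed to return a validation error in [0, 1]
    obtained from a short, cheap training run.
    """
    rng = random.Random(seed)
    kept = []
    for _ in range(n_trials):
        config = sample_config(rng)
        error = evaluate(config)
        if error <= reject_above:  # drop poorly performing configurations
            kept.append((error, config))
    kept.sort(key=lambda pair: pair[0])  # best (lowest error) first
    return kept


def toy_evaluate(config):
    """Stand-in for real training: a made-up error surface, not real data."""
    lr_penalty = abs(math.log10(config["learning_rate"]) + 2.5) / 3.0
    size = config["hidden_layers"] * config["hidden_nodes"]
    return min(1.0, lr_penalty + 1.0 / math.sqrt(size))


if __name__ == "__main__":
    for error, config in random_search(toy_evaluate, n_trials=50)[:3]:
        print(round(error, 3), config)
```

The surviving configurations could then seed a longer training run or a more complex model, which is one way to read the abstract's remark about one configuration providing input for fine-tuning another.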


Keywords: Hidden Layer, Hidden Node, Random Search, Feature Representation, Stochastic Gradient Descent
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.



Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  1. Insight Centre for Data Analytics, School of Computing, DCU, Dublin 9, Ireland
