Introduction to Neural Networks

  • Grégoire Montavon
Chapter
Part of the Lecture Notes in Physics book series (LNP, volume 968)

Abstract

Machine learning has become an essential tool for extracting regularities from data and for making inferences. Neural networks, in particular, provide the scalability and flexibility that are needed to convert complex datasets into structured and well-generalizing models. Pretrained models have strongly facilitated the application of neural networks to image and text data. Application to other types of data, e.g., in physics, remains more challenging and often requires ad hoc approaches. In this chapter, we give an introduction to neural networks with a focus on the latter applications. We present practical steps that ease the training of neural networks, and then review simple approaches for introducing prior knowledge into the model. The discussion is supported by theoretical arguments as well as examples showing how well-performing neural networks can be implemented easily in modern neural network frameworks.

Notes

Acknowledgements

This work was supported by the German Ministry for Education and Research as Berlin Center for Machine Learning (01IS18037I). The author is grateful to Klaus-Robert Müller for the valuable feedback.

Copyright information

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. Electrical Engineering and Computer Science, Technische Universität Berlin, Berlin, Germany
