
Basics of Neural Networks

Deep Learning and Physics

Part of the book series: Mathematical Physics Studies (MPST)

Abstract

In this chapter, we derive neural networks from the viewpoint of physical models. A neural network is a nonlinear function that maps an input to an output, and, in the case of supervised learning, specifying the network is equivalent to specifying a function called the error function. By regarding the output as dynamical degrees of freedom and the input as an external field, various neural networks and their deep versions emerge from simple Hamiltonians. Training (learning) is a procedure for reducing the value of the error function, and we will learn the concrete method of backpropagation using the bra-ket notation familiar from quantum mechanics. Finally, we will see how the "universal approximation theorem" works, which explains why neural networks can express relationships between various kinds of data.


Notes

  1. For conventional explanations, we recommend Ref. [20]. Reading it together with this book may complement your understanding.

  2. If the reader is new to analytical mechanics, it is fine to simply read "mechanical degrees of freedom" as coordinates (or momenta) and "Hamiltonian" as energy.

  3. In the Boltzmann distribution, the factor H/(k_B T), the Hamiltonian divided by the temperature, appears in the exponent. We redefine J as J k_B T to absorb the temperature, so that the temperature does not appear explicitly in the exponent.

  4. The reason that the sigmoid function resembles the Fermi distribution function can be understood from the fact that d takes a binary value in the present system: d = 0 corresponds to the state in which the fermion site is unoccupied (a vacancy), and d = 1 to the state in which a fermion excitation exists. (A short numerical sketch of this correspondence is given after these notes.)

  5. By the way, \(-\sum P(\mathbf{x}, d) \log P(d|\mathbf{x})\), which is equivalent to the "J-independent part" of (3.13), is called the conditional entropy. Because relative entropy is non-negative, the first term of (3.13) must be greater than this conditional entropy. If (3.14), which approximates the first term, falls below this bound, it is a clear sign of over-training. Since it is difficult to evaluate this term in practice, we ignore it in the following.

  6. It is essentially the same as (3.16). To make them exactly the same, set d = (d, 1 − d) in (3.16).

  7. In this way, increasing the number of degrees of freedom while sharing the parameters corresponds to treating the bit h as an ensemble, which improves the accuracy in a statistical sense. This is thought to lead to the improvement of performance [30]. In our case, J is shifted by 0.5, which simplifies 〈h〉 as shown in the main text, so the computational cost is lower than that of an ordinary ensemble.

  8. Here, every |h_l〉 is expanded in the basis |m〉, meaning that all the vectors have the same dimension. However, if the range of the sum in (3.59) is written out explicitly, this notation can also be applied when the dimensions differ.

  9. Visit Nielsen's website [6] for an intuitive understanding of the proof.

  10. The situation is the same as in physics calculations at zero temperature, where the Fermi distribution function becomes a step function.

  11. This part corresponds to the denseness argument in Cybenko's proof [5].
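
As a small numerical illustration of the correspondence described in note 4, the following Python sketch (our own minimal example, not taken from the book; the function names and the choices mu = 0, k_B T = 1 are ours) checks that the Fermi-Dirac occupation number of a single level coincides with a sigmoid of minus the energy.

import numpy as np

def sigmoid(x):
    # Logistic sigmoid, sigma(x) = 1 / (1 + exp(-x)).
    return 1.0 / (1.0 + np.exp(-x))

def fermi(eps, mu=0.0, kT=1.0):
    # Fermi-Dirac occupation number of a single level with energy eps.
    return 1.0 / (np.exp((eps - mu) / kT) + 1.0)

# With mu = 0 and k_B T = 1 the occupation number equals sigmoid(-eps):
# the expectation value of the binary variable d (occupied / empty) in the
# canonical distribution is a sigmoid of the energy.
eps = np.linspace(-5.0, 5.0, 11)
assert np.allclose(fermi(eps), sigmoid(-eps))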

References

  1. Bishop, C.M.: Pattern Recognition and Machine Learning. Springer (2006)

  2. Nair, V., Hinton, G.E.: Rectified linear units improve restricted Boltzmann machines. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 807–814 (2010)

  3. Teh, Y.W., Hinton, G.E.: Rate-coded restricted Boltzmann machines for face recognition. In: Advances in Neural Information Processing Systems, pp. 908–914 (2001)

  4. Rumelhart, D.E., Hinton, G.E., Williams, R.J.: Learning internal representations by error propagation. Technical report, Institute for Cognitive Science, University of California, San Diego (1985)

  5. Cybenko, G.: Approximation by superpositions of a sigmoidal function. Math. Control Signals Syst. 2(4), 303–314 (1989)

  6. Nielsen, M.: A visual proof that neural nets can compute any function. http://neuralnetworksanddeeplearning.com/chap4.html

  7. Lee, H., Ge, R., Ma, T., Risteski, A., Arora, S.: On the ability of neural nets to express distributions. arXiv preprint arXiv:1702.07028 (2017)

  8. Sonoda, S., Murata, N.: Neural network with unbounded activation functions is universal approximator. Appl. Comput. Harmon. Anal. 43(2), 233–268 (2017)


Appendices

Column: Statistical Mechanics and Quantum Mechanics

Canonical Distribution in Statistical Mechanics

The statistical mechanics used in this book is based on the canonical distribution at a given temperature. The physical situation is as follows. Suppose we have some physical degree of freedom, such as a spin or a particle position, and denote it by d. Since this is a physical degree of freedom, it has an energy that depends on its value; let it be H(d). When this system is immersed in an environment at temperature T (called a heat bath), the probability of finding the value d, with energy H(d), is known to be

$$\displaystyle \begin{aligned} P(d) = \frac{e^{- \frac{H(d)}{k_B T}}}{Z} \, . \end{aligned} $$
(3.92)

Here, k_B is an important constant called the Boltzmann constant, and Z is called the partition function, defined as

$$\displaystyle \begin{aligned} Z = \sum_{d} e^{- \frac{H(d)}{k_B T}} \, . \end{aligned} $$
(3.93)

In the main text, we set k_B T = 1.
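
As a concrete illustration of (3.92) and (3.93), here is a short Python sketch (our own toy example; the choice of a single binary variable d ∈ {0, 1} with energy H(d) = −J d and the value of J are arbitrary) that computes the canonical probabilities with k_B T = 1, as in the main text.

import numpy as np

# Toy canonical distribution: one binary degree of freedom d in {0, 1}
# with energy H(d) = -J * d (an illustrative choice), and k_B T = 1.
J = 0.7
d_values = np.array([0, 1])
H = -J * d_values

Z = np.sum(np.exp(-H))   # partition function, Eq. (3.93) with k_B T = 1
P = np.exp(-H) / Z       # canonical probabilities, Eq. (3.92)

print(P)                 # P(d = 1) = 1 / (1 + exp(-J)), a sigmoid in J
print(np.sum(P))         # the probabilities sum to 1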

Simple example: law of equipartition of energy

As an example, consider d as the position x and momentum p of a particle in a box of length L, with the standard kinetic energy function as its energy. The expectation value of energy is

$$\displaystyle \begin{aligned} \langle E \rangle = \int_0^L dx \int_{-\infty}^{+\infty} dp \ \frac{p^2}{2m} \frac{e^{ - \frac{p^2}{2 m k_B T } } }{Z} \, . \end{aligned} $$
(3.94)

Here

$$\displaystyle \begin{aligned} Z = \int_0^L dx \int_{-\infty}^{+\infty} dp \ e^{ - \frac{p^2}{2 m k_B T } } = L (2 \pi m k_B T)^{1/2} \, . \end{aligned} $$
(3.95)

The calculation of 〈E〉 looks a bit difficult, but using \( \beta = \frac {1}{k_B T} \) we find

$$\displaystyle \begin{aligned} &Z = L \Big( \frac{2\pi m}{\beta} \Big)^{1/2}, \end{aligned} $$
(3.96)
$$\displaystyle \begin{aligned} &\langle E \rangle = - \frac{\partial}{\partial \beta} \log Z = \frac{1}{2} \frac{1}{\beta} = \frac{1}{2} k_BT \, . \end{aligned} $$
(3.97)

In other words, when the system temperature T is high (i.e. when the system is hot), the expectation value of energy is high, and when the temperature is low, the expectation value of energy is low. This is consistent with our intuition. In addition, if we consider the case of three spatial dimensions, we can obtain the famous formula \( \frac {3}{2}k_BT \) as the expectation value of energy.
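
The equipartition result (3.97) can also be checked numerically. The sketch below (our own illustration, using SciPy's quad routine; the values of m, L, and k_B T are arbitrary) evaluates the integrals (3.94) and (3.95) directly and compares them with k_B T/2 and L(2πm k_B T)^{1/2}.

import numpy as np
from scipy import integrate

m, L, kBT = 1.3, 2.0, 0.8   # arbitrary mass, box length, and temperature

# Partition function, Eq. (3.95): the x integral just gives a factor of L.
Z = L * integrate.quad(lambda p: np.exp(-p**2 / (2 * m * kBT)),
                       -np.inf, np.inf)[0]

# Expectation value of the kinetic energy, Eq. (3.94).
E = L * integrate.quad(lambda p: (p**2 / (2 * m))
                       * np.exp(-p**2 / (2 * m * kBT)),
                       -np.inf, np.inf)[0] / Z

print(E, kBT / 2)                             # equipartition, Eq. (3.97)
print(Z, L * np.sqrt(2 * np.pi * m * kBT))    # check of Eq. (3.95)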

Bracket Notation in Quantum Mechanics

In the derivation of the backpropagation method, bra-ket notation was introduced as a convenient shorthand. It is nothing more than writing a vector as a ket,

$$\displaystyle \begin{aligned} \mathbf{v} = | v \rangle \, . \end{aligned} $$
(3.98)

There are some notational conveniences. For example, the inner product is conventionally written as

$$\displaystyle \begin{aligned} \mathbf{w} \cdot \mathbf{v} = \langle w | v \rangle, \end{aligned} $$
(3.99)

which means that the inner product is regarded as a matrix product,

$$\displaystyle \begin{aligned} \mathbf{w} \cdot \mathbf{v} = \begin{pmatrix} w_1, & w_2, &\cdots \end{pmatrix} \begin{pmatrix} v_1 \\ v_2 \\ \vdots \end{pmatrix} \, . \end{aligned} $$
(3.100)

In the main text we have also encountered

$$\displaystyle \begin{aligned} |v \rangle \langle w | \end{aligned} $$
(3.101)

and this, too, can be regarded as a "matrix," built from a matrix product,

$$\displaystyle \begin{aligned} |v \rangle \langle w | &= \begin{pmatrix} v_1 \\ v_2 \\ \vdots \end{pmatrix} \begin{pmatrix} w_1, & w_2, &\cdots \end{pmatrix} = \begin{pmatrix} v_1 w_1& v_1 w_2&\cdots \\ v_2 w_1& v_2 w_2&\cdots \\ \vdots & \vdots & \ddots \end{pmatrix} \, . \end{aligned} $$
(3.102)
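
In terms of arrays, these bra-ket manipulations are ordinary linear algebra. The following NumPy sketch (our own illustration with arbitrary three-dimensional vectors) reproduces (3.99)-(3.102): the inner product 〈w|v〉 as a row vector times a column vector, and |v〉〈w| as the outer-product matrix with entries v_i w_j.

import numpy as np

v = np.array([1.0, 2.0, 3.0])   # the ket |v>
w = np.array([4.0, 5.0, 6.0])   # the ket |w>; the bra <w| is its transpose

# Inner product <w|v>, Eqs. (3.99)-(3.100): row vector times column vector.
inner = w @ v                   # same as np.dot(w, v); here 32.0

# Outer product |v><w|, Eqs. (3.101)-(3.102): column vector times row vector,
# i.e. the matrix with entries v_i * w_j.
outer = np.outer(v, w)

print(inner)
print(outer)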


Copyright information

© 2021 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.


Cite this chapter

Tanaka, A., Tomiya, A., Hashimoto, K. (2021). Basics of Neural Networks. In: Deep Learning and Physics. Mathematical Physics Studies. Springer, Singapore. https://doi.org/10.1007/978-981-33-6108-9_3
