Artificial Neural Network in Cosmic Landscape

In this paper we propose that artificial neural network, the basis of machine learning, is useful to generate the inflationary landscape from a cosmological point of view. Traditional numerical simulations of a global cosmic landscape typically need an exponential complexity when the number of fields is large. However, a basic application of artificial neural network could solve the problem based on the universal approximation theorem of the multilayer perceptron. A toy model in inflation with multiple light fields is investigated numerically as an example of such an application.


Introduction
Inflation is one of the most promising cosmological paradigms to describe the dynamics of spacetime and quantum fields in the very early universe [1][2][3], although many details of inflationary dynamics are still mysterious and are not confirmed by theories and observations [4]. For instance, we are still curious that, if inflation is governed by scalar fields, what is the corresponding inflationary potential? Indicated by string theory 1 , there should be a complicated vacuum structure and a tower of light scalar fields is predicted and might drive the dynamics of cosmic inflation. Theoretically, we might treat the inflationary potential V (ϕ j ) = V (ϕ 1 , ϕ 2 , ϕ 3 , · · · , ϕ n ) (1.1) to be highly random [13][14][15][16][17][18] on the field space.
A directly way of studying physical properties of inflation governed by such a random potential is computer simulation (although some specific problems could also be answered with mathematics, for instance, studying statistics using Kac-Rice formula, see [19,20]). In order to realize this idea in a real computer and get some predictions, we typically need to use Monte-Carlo method to generate a random function up to some measures, and try to solve the dynamical equation of fields in inflation. The methods to investigate such a problem in the existing research can be classified in two types: global and local methods (see also in [21]). The advantages and disadvantages are given as following.
The local type of methods is trying to generate the random function V (ϕ j ) around some specific inflationary trajectories ϕ j (t), namely, solving the differential equation at the same time when generating random functions (where their work typically established on random matrix theory or statistical physics, see [22] for early contribution on using random matrix theory in cosmic landscape, and also [23,24] for locally constructing inflationary trajectories).
In each time step, one can generate a random potential in a small neighbourhood for the corresponding point in the field space, and then use that potential to evolve the time towards the next step by treating the differential equations as difference equations, and then generate a new, consistent random potential at the second step and so on. Redo this algorithm for several times, one can obtain an inflationary trajectory. This type of methods is often very fast to generate a high-dimensional landscape with polynomial computational complexity, and capture most of physics related to inflationary trajectories. However, lacking of a global structure of the landscape may lead to an inconsistency for self-crossing trajectories without special treatment [25], which might make it harder to study some specific phenomena in inflation, such as periodic, self-crossing trajectories and bifurcation.
The global type is trying to directly generate the whole random function V in a compact or non-compact region of field space. The related technologies include Fourier transformation [26,27] and interpolation [28]. This method has no inconsistency between different trajectories because the landscape is global, and it is safe to study special phenomena which need the consistency in the space of trajectories, like self-crossing. However, it is hard to discuss a high-dimensional landscape (which might be more realistic according to string theory vacua) because the computational complexity is exponential if we don't use further assumptions [21].
In this paper we argue that, artificial neural network, the basic construction of machine learning in the computer science [29][30][31][32], is naturally suitable for globally generating a landscape. Based on the celebrated universal approximation theorem [33,34], artificial neural network is a universal function simulator. Thus introducing the randomness in the parameters of artificial neural network will generate all possible random functions in the corresponding range. For a given construction of artificial neural network, the computational complexity is polynomial with field degrees of freedom, up to some controlled parameters labeling the randomness (or capacity) of the output. Then we will do some numerical investigations on the landscape generation, and try to evolve the inflationary trajectories based on the artificialneural-network-generated landscape in an inflationary model. This paper will be organized as following. In Section 2, after a discussion on computational complexity of traditional algorithms, we will discuss artificial neural network and its application on the landscape. In Section 3, we will present some numerical tests of landscape constructed by artificial neural network, and possible applications for evolving inflationary trajectories. In Section 4, we will arrive at a conclusion and some outlooks on the future directions.

Random function and artificial neural network
In order to globally construct a random function, the most direct method might be interpolation. The algorithm can be simply described as following (for simplicity we assume the dimension n = 1 This method can be easily generalized to high-dimensional case, where the interpolation and sub-intervals should be high-dimensional. For instance, we can interpolate a smooth curve as a two-dimensional random function where two-dimensional domain is split to many small boxes. Moreover, we can interpolation on one fixed surface with some random bumps. In the following example, we choose the quadratic potential surface on the domain [0, and after random perturbations, we have the interpolation where ∆ x ϕ = ϕmax N is the distance for each adjacent pair of two-dimensional intervals, Rand ij are random numbers in [0, 1] chosen for each (i, j) and belong to one fixed distribution, and A is the amplitude of randomness. This function is made to be asymmetric in i and j (corresponds to x and y), because the mean part of this random function is driven by the one-dimensional variable x (like eq.2.1), while some small two-dimensional random perturbations are around such a function. (In cosmology, one can imagine that it is a two-dimensional inflationary model with one inflaton and another light field which will not drive inflation. The interaction between two fields is due to a random landscape). Figures with different randomness are shown as Figure 1.
However, this method is practically useless for high-dimensional case. Just see the number of random variables related on the dimension n. If we choose N intervals on each direction, we will obtain N n high-dimensional intervals. As a result, the number of random variables are N n . And we must do high-dimensional interpolation on these N n random numbers. So the complexity is exponentially blowing up with the dimension n. Potential surface with null, small and large randomness. We choose ∆ x ϕ = 0.05, m = 10 −6 , ϕ max = 30, and A = 0, 0.1, 1 respectively for the left, middle and right figure. The distribution is uniform for random numbers in the interpolation function, and the interpolation is for C 1 functions.
The same situation happens on other global methods. Taking classical Fourier series for example, the one-dimensional Fourier expansion is So we have N Fourier coefficients (after truncation from infinite sum for a practical computation in the computer). The two dimensional Fourier expansion is So we have N 2 Fourier coefficients after truncation. Generally, the n dimensional Fourier expansion is So we will have N n Fourier coefficients. When we want to use classical Fourier series to simulate the high-dimensional random function, we must set all the coefficients to be random numbers and the computational complexity will be O(e n ). In the inflationary landscape, the number of light fields could be larger than O(10)-O(100). Thus, we need a new way to construct high-dimensional random functions.
Recently there is a rising interest in physics to apply computer science technologies into physical problems (for instance, see [35][36][37][38]). As one of the most promising area in computer science, machine learning (or artificial intelligence) [29][30][31] is widely used to understand difference areas of physics, including condensed matter phases [39], high energy experiment [40], energy landscape [41][42][43], particle phenomenology [44], tensor networks [45] and cosmic non-Gaussianities [46]. In recent research string theorists also find that machine learning algorithm is efficient to study manifold data in the string landscape [47][48][49][50], which may give us the motivation to think about the learning algorithm landscape from cosmological point of view.
Artificial neural network is a fundamental construction in modern machine learning. Inspired by the research of brain and neural biology, artificial neural network is proposed as a mathematical model to simulate the activity of calculation and identification of human brains, as highly non-linear, complicated biological computers. Mathematically speaking, artificial neural network is given by some specific linear combinations of activation function. A mathematical proof is given to claim that all possible continuous functions on a compact space could be simulated by artificial neural networks [33,34]. By learning algorithms, one can tune the input parameters in the network to realize some computational objects efficiently, like pattern recognition.
The unit in artificial neural network is called neuron. A neuron is mathematically defined as a function mapping a vector to a scalar. It is combined by the following objects, • Synapse. Synapses are typically connected by some weights. To be concrete, an input signal for the synapse and connected to a neuron, which is called x j , is multiplied by a corresponding weight w j .
• Adder. Adder is the addition operation of all input signals.
• Activation function. An activation function ψ is used to model the output of the neuron from external influence. Typical types of activation functions include sigmoid function or Heaviside function, which are widely used in machine learning. The function should often be assumed as nonconstant, bounded, and monotonically-increasing continuous, as the requirement of the universal approximation theorem. However, some periodic functions, like sine function, are also used to be the approximator. (For instance, see [51].) • Bias. The bias b is defined as a constant shift for input signals.
To be specific, for an n-dimensional signal vector {x j } n j=1 , the neural output y is defined as A simple example of artificial neural network is the multilayer perceptron. In order to realize some computational objects, we need to use a well-defined, organized, layered neural network of multiple neurons, which should have at least two layers: the output and input layers of neurons. For some complicated constructions, their might be also some layers in the middle, which are called hidden layers. In the simplest construction, signals are moving in only one direction, namely the connections between units will not form cycles. This is called feedforward type, or multilayer feed-forward architecture. If some of the activation functions are non-linear, it is called the multilayer perceptron.
More hidden layers will significantly increase the quality of statistics as the feedback of stimulation for an artificial neural network. In a real network which is designed for complicated computational tasks, the number of outputs, inputs and hidden layers is large. In our practical motivation, we are only interested in the random function, thus we will consider a multilayer perceptron with one output, and we will only consider one hidden layer for simplicity. Namely, if we have a multilayer perceptron neural network with one hidden layer that consists of h hidden units, and the network has n inputs, then we can formalise the output of the network as a function of the inputs by where a ij is the weight of the synapse that goes form the input x i to the jth hidden neuron, b j is the bias of the jth hidden neuron, ψ is the activation function, and w i is the weights of the synapse that goes to the output neuron to form the jth neuron.
For this configuration, we have the simplest universal approximation theorem. The universal approximation theorem shows that the standard multilayer feed-forward networks with arbitrary activation functions, are universal approximators for all possible function in the corresponding variable dimension and the compact domain. This property is not because of the form of activation functions, but the structure of multilayer perceptron itself [33,34]. The theorem claims that, Let ψ be the activation function. Let X n ⊂ R n and X n be compact. Then for ∀k = 1, 2, . . . , ∞, This theorem could be regarded as the theoretical foundation of artificial neural networks, and could be easily generated to more complicated cases. According to this theorem, properly choices of parameters will lead to an efficient approximation to all possible high-dimensional functions. Thus, including the randomness in the parameters will give a nice and efficient construction of random functions.
About specific choice of activation functions, the original theorem needs the activation function to have some specific properties, to be concrete, bounded and monotonically-increasing.
However, some other types of functions could also be used in the practical construction of neural networks. In the following application, we will try sigmoid function and sine function ψ(x) = sin(x) (2.10) A related idea also appear in the deep learning and statistical physics, which is searching for patterns in some statistical models energy landscape (for instance, see [41][42][43]). There problem is related to the solution of a minimal (local or global) in a high dimensional surface (where is called landscape), or specifically finding a minimal energy in some models from statistical physics, and the high dimension comes from the large N setup in the corresponding model. The energy landscape, especially in higher dimensions, are very hard to explore globally. However, the task could be related to machine learning algorithms and might be solved efficiently. These algorithms might benefit the future research of string landscape for solving some statistical patterns.

Random function generation
From the universal approximation theorem mentioned in the last subsection, one con approximate any arbitrary continuous or kth order continuous function by choosing different parameters h, a ij , b j , w j . As a result, if we choose these parameters randomly, we can construct a smooth random function living in some functional distribution. This is the basic idea of our random generation method.
Some comments are given as following.
• The number of neurons in the hidden layer h, should be fixed in a practical generation. In fact, h, and also the number of hidden layers for more complicated networks, can be roughly regarded as the capability of random statistics it could generate. Larger h could always in principle reproduce the result for lower h. h could be regarded as an adjustable variable for users according to the computational ability of their system.
• The range of parameters a ij , b j , w j should be bounded, and the output signals will live in a given range. For instance, one can assume for uniform distributions.
• The generation is fast. We can do a simple analysis on the time scale it will take. If we assume that when we compute ϕ(x) once for some x, we need the time t 1 ; when we compute the addition or multiplication once, we need the time t 2 ; when we generate a random number, we need t 3 , then for a total construction process of one random function, we need time for one hidden layer. For finite number of hidden layers, one sample random function could be generated in a polynomial time.
In Figure 2 we show slices of a 100-dimensional random function generated by multilayer perceptron with one hidden layer.

Properties
In the previous sections, we argue that artificial neural network provides a new effective approach to simulate a landscape. Thus, it is valuable to obtain basic properties of random functions generated by neural networks. Some analytic and numerical results are summarized in the following discussions.

Volume and symmetry in the field space
Because of finiteness of digital numbers in the computer, it is typically computational expensive to explore the field range in full R n , or an extremely large compact subspace of it (and this is also not allowed by the universal approximation theorem). Considering practical applications, in the context of inflationary cosmology, some trans-Planckian arguments [52] or the Lyth bound [53], may constrain the range of primordial fields and their perturbations. Thus, it should be crucial to constrain the valid volume where random functions could appear and fluctuate in the field configuration space.
In the traditional method, one can try to cut off the volume by brute force. For instance, in the low dimensional interpolation approach, one can choose the range of field space for the corresponding random sampling. In the Fourier method, one can also directly choose the field range. The range of field that has been cut off is typically the range where the random functions could appear. For Taylor expansion, there are some clear asymptotic dependence at infinity of the field space, thus it should be necessary to cut out the boundary asymptotic range to remain enough randomness.
In the context of artificial neural networks, the volume in the field space strongly depends on the activation function we choose. A simple observation is, for the sine activation function, the field space is always full, because it is periodic, fluctuating from −1 to 1. The norm of fluctuation is set by the range of random number w j only (namely, O(1) × w), in the simplest one layer of hidden neurons. However, the sigmoid function has some different properties. It will quickly remain constant if the input is away from the origin. As a rough estimation, go to the negative direction from the origin, the amplitude decays to 1/e when x = − log(−1 + e) = −0.54 = O(1). Thus, the function is activated around a bounded regime by an O(1) number. As an estimation, consider the neural network construction, the activated regime of one layer of hidden neurons is For sufficiently large number of hidden neurons.
Sometimes people may consider field space with some symmetric structure, for example, rotational invariance (for cosmological applications, sometimes we consider scalar fields and their perturbations may be separated as inflaton and isocurvatons with radial and angular directions in the two dimensional or higher dimensional field space [54]). In this case, the neural network construction should be slightly modified. A possible approach would be modify the matrix a ij in the network eq.2.7 to be rotational invariant, namely, sampling with a proper measure which is invariant under the rotation group SO(n). Namely for measure µ(a ) on a h × n matrix, where a = (a ij ) T in eq.2.7, we have The SO(n) invariant measure construction on a h × n matrix is a well-defined problem in harmonic analysis and measure theory. Here we only show the solution of the h = n case (it is definitely possible for a practical simulation, where h and n are both large numbers), where in this case we can directly use an uniform measure on SO(n) [55]. The only measure which is left and right invariant, and should also be normalized, is the celebrated Haar measure [56], which is widely used in applied mathematics and theoretical physics, and more proper for our current motivation. The Haar measure construction for SO(n) is defined as follows. For an n dimensional unit sphere S n , the hypersurface coordinate, denoted by Σ n (θ 1 , θ 2 . . . θ n ), is defined as where θ i are angular coordinates on S n , compactified by Based on the construction of Σ n , we define the following matrix The group manifold n(n − 1)/2 different coordinates, where we denote by (φ i,j ) 1≤i≤j≤n−1 and it is compactified by The Haar parametrization of the matrix from SO(n) is defined by recursion. Start from a 2 (φ 11 ) = cos φ 11 sin φ 11 − sin φ 11 cos φ 11 (3.10) for a 2 ∈ SO(2), we define a n ∈ SO(n) by recursive matrix product a n (φ i,j ) 1≤i≤j≤n−1 = M n (φ 1,j ) 1≤j≤n−1 a ext n−1 (φ i,j ) 1≤i≤j≤n−2 (3.11) where the operation ext means that mapping vectors in R n−1 to a hyperplane subspace in R n . Sampling from the uniform distribution of angles, we obtain the Haar random matrix samples. Obviously the computational complexity for this generation algorithm is polynomial in n. Using this method, we can obtain a ij from Haar measure, multiplying with an overall constant α to control the random range, we obtain an rotational symmetry SO(n) in the field space by construction.
Some other methods may also achieve this goal, for instance, decomposing the field contents to radial and angular pieces. And those discussions could also be extended to some other symmetry groups. We leave the detailed discussions to future research.

Taylor statistics
The number of degree of freedom increases polynomially in the neural networks, while exponentially in the Taylor or Fourier expansion. Thus, the random Taylor or Fourier series coefficients might be correlated from a neural network random function, especially when the number of hidden neurons is small enough. This fact will not effect the universal approximation statement, because it is natural to imagine that one coefficient set from corresponding complete basis could be non-trivially transforms to another coefficient set with a different complete basis, while the transformed coefficients have some induced correlations. However, testing this property from a neural network random function is also helpful especially when we are dealing with some specific real problems where independence of some given coefficients could play a crucial rule, and then we have to modify the structure and the activation of the neural network, or change the distribution of random elements in a network construction.
This problem is hard to answer universally from analytic and numerical point of view. From analytic side, it is hard to control the network especially for an arbitrary activation function setup, while from numerically point of view, it is significantly hard to test the network with higher dimensions through Taylor or Fourier decomposition (because of exponential growth 2 ).
However, in this paper we will try to deal with this problem in the simplest setup, namely, considering the single layer network with O(100)-O(1000) hidden layers for a two dimensional random function, and calculating the Taylor series. We consider 1000 random realizations to get some statistical predictions. Given a random function, if we expand through Taylor series, an important feature is that we often obtain a non-perturbative series, namely, the coefficients from higher orders are crucially larger than the lower cases. This is because the function is highly random such that the local Taylor expansion is only valid in a small regime, which is divergent outside this regime. The non-perturbative coefficients might be highly random and strongly fluctuating over a very large regime, while the lower order coefficients are relatively stable.
Firstly, it should be valuable to check the random distribution of Taylor coefficients. We consider some lowest expansions of Taylor series as examples.
The following Figure 3 shows the probability distribution of some Taylor coefficients as a sample. One can see that they are both roughly symmetrically distributed around zero, while the sine activation has a larger random range, means that sine activation is more random in Figure 3. The probability distribution of the Taylor series coefficients a 10 and a 11 for neural network in two dimension. We consider the sigmoid activation (upper) and the sine activation (lower) respectively. The data is constructed via x 0 = y 0 = 1, α = 100, β = 100, w = 1/h and h = 100, and we realize the network for 1000 times.
this range 3 .
Secondly, in order to do some simple tests to probe some hidden correlations between different Taylor coefficients, a nice type of statistical quantities is covariance (3.14) and the correlation for random variable X and Y with number of samples N . Those quantities are standard statistical justification of the correlation (or more precisely, linear correlation) of some random variables. When the absolute value of η is near 1, then variables are strongly correlated. When the absolute value of η is approaching zero, then variables have weak or no (linear) correlation.
In the following Figure 4, we give the plots of the absolute values of the correlations between different Taylor coefficients. Based on this analysis, we could conclude the following results, • We do find that several Taylor coefficients are strongly correlated (|η| ≥ 0.5). However, there are also several Taylor coefficients that are weakly correlated (|η| < 0.1). The categories of coefficients justifying correlation, are activation function dependent. Namely, it is not simply due to the structure of neural networks.
• One can turn up and down several parameters of the network. However, most parameters justifying random ranges in the network are effectively rescaling the input and output variables, except the number of hidden neurons, h, parametrizing the complexity of the network. Thus, in the plots we test the dependence between the correlation and the variable h. However, the dependence is not clear at least in our testing range. Maybe it is because the h in our setup is too small. At larger scale of h, the correlation might change dramatically. However, by our computational resource it is hard to make a concrete statement.
• There are some other issues about this numerical tests. For instance, the correlation might also depend on the location when we do Taylor expansion, the shape and structure of neural networks, and the input measure of random variables. And one can also seek for higher order correlations instead of linear. We leave those issues to future research.

Cosmological application
As a simple application, we will test a concrete inflation model. Considering multiple fields living in a high dimensional landscape, the Friedman equation is written as where i = 1, 2 . . . , n for n fields. In inflation the Hubble parameter would be approximated as in the slow roll limit. This equation the classical dynamics of the background fields. To include quantum fluctuation, Starobinsky method [57] is typically used for numerical simulations of an inflationary trajectory. Integrating out the sub-horizon modes, quantum fluctuations could be effectively serving as a source term in the equation of motion, namelÿ Figure 4. The absolute value of (linear) correlations between Taylor series coefficients. We consider the sigmoid activation (upper) and the sine activation (lower) respectively. The data is constructed via x 0 = y 0 = 1, α = 100, β = 100, w = 1/h and h changes from 100 to 1000, and we realize the network for 1000 times.
where the η i term is the random source to denote the quantum corrections and follows independent Gaussian distribution. We construct the normalization condition to constrain the η i term, such that during a Hubble time and averaged on a Hubble volume, the quantum fluctuation on each field direction is normalized as H/(2π). In numerical calculation, the discrete time interval ∆t = t k − t k−1 has to be used. While replacing differential equations into difference equations, the Dirac delta function shall be replaced by So the difference strategy is where η i (k) ∼ N (0, 1/ √ ∆t), and Gaussian distributions are independent for different i.
In our simple model, we will assume that inflation is mainly only by one field ϕ 1 , and the potential looks like V = V 0 (ϕ 1 ) + V rand (ϕ 1 , ϕ 2 , . . . , ϕ n ) (3.22) where V 0 would be chosen to be quadratic (large field inflation) The random part V rand , is generated by neural networks with some random amplitudes A.
As a simple example, we choose the parameters as n = 20 (for 20-dimensional field space), m = 10 −6 (inflaton mass normalized by Planck mass), ϕ max = 30 (the initial value and bounded value of fields), ∆t = 10 5 (discrete time scale). The initial value of fields are: ϕ 1 = ϕ max , ϕ i =1 = ϕ max /2. During the evolution of the trajectory, if field one field drops out of the range [0, ϕ max ], we think it as the end of inflation and breaks the recursion. The evolution of trajectories for dimension 20. In this case we choose the sigmoid activation function and we set α = β = 50, ω = 1, A = 10 −14 and h = 100. And also we set V rand = A h (ϕ 1 − ϕ max /2, (ϕ i =1 − ϕ max /2) × 10). We show the trajectories ϕ 1 (k) (left) as a function of time step k for inflaton, and ϕ another (k) (left) for another light field. Figure 6. The evolution of trajectories for dimension 20. In this case we choose the sine activation function and we set α = 6000, β = 100, ω = 1, A = 10 −15 and h = 100. And also we set V rand = A h (ϕ 1 , ϕ i =1 ). We show the trajectories ϕ 1 (k) (left) as a function of time step k for inflaton, and ϕ another (k) (left) for another light field.

Conclusion and discussion
In this paper we show that artificial neural network is useful to construct a cosmic landscape for computer simulation. After a simple discussion on the computational complexity of random functions, we introduce artificial neural network, the foundation of machine learning and make the training parameter to be random in order to construct random functions. We show that neural network construction of a random landscape is fast up to the construction of hidden layers and number of hidden neurons. Finally, we present an example to use such a random landscape to simulate the dynamics of inflationary trajectories, where the dimension of field space is hard to achieve for ordinary methods.
We are trying to show the possible computational power for neural network technologies in the application of high-dimensional cosmological landscape. Some further researches could be done in the future. Firstly, in this paper we choose a multilayer perceptron with one hidden-layer for simplicity, but in the future, to increase the complexity of random potential one should consider neural networks with more complicated structures, like more hidden layers and recurrent neural networks. Furthermore, more cosmological models and their properties could be addressed and simulated with neural networks when considering landscape potentials. We leave these interesting possibilities to future works.