Stochastic analysis of heterogeneous porous material with modified neural architecture search (NAS) based physics-informed neural networks using transfer learning

In this work, a modified neural architecture search (NAS) based physics-informed deep learning model is presented for stochastic analysis of heterogeneous porous material. A Monte Carlo method based on a randomized spectral representation is first employed to construct a stochastic model for the simulation of flow through porous media. To solve the governing equations of the stochastic groundwater flow problem, we build a modified NAS model based on physics-informed neural networks (PINNs) with transfer learning that can fit different partial differential equations (PDEs) with less computation. The performance estimation strategy adopted is constructed from an error estimation model using the method of manufactured solutions. A sensitivity analysis is performed to obtain prior knowledge of the PINNs model and narrow the range of parameters in the search space; hyper-parameter optimization algorithms are then used to determine the parameter values. Furthermore, the NAS-based PINNs model saves the weights and biases of the most favorable architectures, which are then used in the fine-tuning process. It is found that the log-conductivity field with a Gaussian correlation function performs much better than the exponential correlation case, being better suited to the PINNs model, and that the modified neural architecture search based PINNs model shows great potential in approximating solutions to PDEs. Moreover, a three-dimensional stochastic flow model is built to provide a benchmark for the simulation of groundwater flow in highly heterogeneous aquifers. The NAS-based deep collocation method is verified to be effective and accurate through numerical examples in different dimensions using different manufactured solutions.


Introduction
In recent years, groundwater pollution has become one of the most pressing environmental problems worldwide. In order to protect groundwater quality, it is necessary to study groundwater flow and solute transport. Given these societal needs and the further development of science and technology, it is important to investigate the groundwater flow model, which includes flow through aquifers comprising saturated sediment and rock with rather heterogeneous hydrologic properties. In this study, stochastic analysis of heterogeneous porous material is investigated with a deep learning method, in the hope of giving more insight into the mechanism of the groundwater flow model.
The hydraulic conductivity is the physical property that measures the ability of the medium to transmit fluid through pore spaces; it depends on the structure of the medium (particle size, arrangement, void filling, etc.) and the physical properties of the water (viscosity of the liquid). It is common to use random fields with a given statistical structure to describe porous media because of their intrinsic complexity [1]. Freeze [2] showed that the hydraulic conductivity field can be well characterized by a random log-normal distribution. This approach is often used for flow analysis in the saturated zone, e.g., in [3]. Both Gaussian [4] and exponential [5] correlations are commonly chosen for the log-normal probability distribution. Different correlation functions can lead to different accuracy of the model. Therefore, the two correlation functions are compared in this paper.
Different approaches have been used to construct functional equations for permeability as a function of pore structure parameters [6][7][8]. Analytical spectral representation methods were first used by Bakr [9] to solve the stochastic flow and solute transport equations perturbed with a random hydraulic conductivity field. They pointed out that if a random field is homogeneous and has zero mean, then it is always possible to represent the process by a kind of Fourier (or Fourier-Stieltjes) decomposition into essentially uncorrelated random components; these random components in turn give the spectral density function, which is the distribution of variance over different wave numbers k. This theory is widely used to construct hydraulic conductivity fields, and a number of construction methods have been derived, such as the turning bands method [10], the HYDRO GEN method [11] and the Kraichnan algorithm [12]. Ababou and coworkers [13] used the turning bands method to verify the feasibility of solving this problem with numerical methods, and also indicated the scale range of the relevant parameters. Wörman and Kronnäs [13] tested a gradual increase of the heterogeneity of the flow resistance and compared the numerically simulated residence time PDF with the observed one, based on the HYDRO GEN method. Unlike the other two methods, Kraichnan proposed an approximation algorithm with direct control of the accuracy of the random field, through the number of modes, and of its variability, through the variance of the log-hydraulic conductivity random field. Inspired by these results, Kolyukhin and Sabelfeld [1] constructed a randomized spectral model (RSM) for the simulation of steady flow in porous media in the 3D case under small-fluctuation assumptions. This approach is used in this paper to generate hydraulic conductivity fields.
With the development of information technology, machine learning has become one of the hottest topics nowadays. As a new branch of machine learning, deep learning was first brought up in the realm of artificial intelligence in 2006; it uses deep neural networks to learn features of data with high levels of abstraction [14]. Deep neural networks adopt artificial neural network architectures with various hidden layers, which exponentially reduce the computational cost and the amount of training data in some applications [15]. The two major desirable traits of deep learning lie in the nonlinear processing in multiple hidden layers, in supervised or unsupervised learning [16]. Several types of deep neural networks, such as convolutional neural networks (CNNs) and recurrent/recursive neural networks (RNNs) [17], have been created, which further boost the application of deep learning in image processing [18], object detection [19], speech recognition [20] and many other domains including genomics [21] and even finance [22]. In some cases, certain physical laws of the model to be predicted are known a priori, and these laws can be used to formulate learning algorithms as well as to set up neural network configurations. This leads to physics-informed neural networks (PINNs), which are widely used in practical research, for example, earthquake velocity model prediction [23].
Besides the application of deep learning in regression and classification, some researchers have employed deep learning for the solution of PDEs. Mills et al. deployed a deep convolutional neural network to solve the Schrödinger equation, directly learning the mapping between potential and energy [24]. E et al. applied deep learning-based numerical methods to high-dimensional parabolic PDEs and forward-backward stochastic differential equations, which proved efficient and accurate for 100-dimensional nonlinear PDEs [25,26]. E and Yu also proposed the Deep Ritz method for solving variational problems arising from partial differential equations [27]. Raissi et al. applied probabilistic machine learning to solving linear and nonlinear differential equations using Gaussian processes and later introduced data-driven numerical Gaussian processes to solve time-dependent and nonlinear PDEs, circumventing the need for spatial discretization [28,29]. Later, they [30,31] introduced physics-informed neural networks for supervised learning of nonlinear partial differential equations, from Burgers' equation to the Navier-Stokes equations. Two distinct models were tailored for spatio-temporal datasets: continuous-time and discrete-time models. They also applied deep neural networks to solving coupled forward-backward stochastic differential equations and their corresponding high-dimensional PDEs [35]. Beck et al. [32,33] studied deep learning for solving stochastic differential equations and Kolmogorov equations. Anitescu et al. [34] and Guo et al. [35] applied deep neural network based collocation methods to second order and fourth order boundary value problems. Rabczuk et al. [36,37] proposed an energy approach to the solution of partial differential equations in computational mechanics via machine learning.
Goswami [38] applied a transfer learning enhanced physics-informed neural network (PINN) algorithm to solving brittle fracture problems based on phase-field modeling.
However, in a typical machine learning application, the practitioner must apply appropriate data preprocessing, feature engineering, feature extraction and feature selection methods to make the dataset suitable for machine learning. Following these preprocessing steps, practitioners must perform algorithm selection and hyper-parameter optimization to maximize the predictive performance of the final model. The PINNs model is no exception, even though randomly distributed collocation points are generated for the calculation without the need for data preprocessing. Most of the time spent on PINNs-based models goes into tuning the neural architecture configuration, which strongly influences the accuracy and stability of the numerical results. Since many of these steps are typically beyond the capabilities of non-experts, automated machine learning (AutoML) has become a trend as an AI-based solution to the growing challenges of machine learning applications; it seeks to automate the many time-consuming and laborious steps of the machine learning pipeline to make it easier to apply machine learning to real-world problems.
The oldest AutoML library is AutoWEKA [39], first released in 2013, which automatically selects models and hyper-parameters. Other notable AutoML libraries include auto-sklearn [40], H2O AutoML [41], and TPOT [42]. Neural architecture search (NAS) [43] is a technique for the automatic design of neural networks, which allows algorithms to automatically design high-performance network structures based on sample sets. NAS aims to find configurations comparable to those of human experts on certain tasks and even to discover network structures that have not been proposed by humans before, which can effectively reduce the cost of using and implementing neural networks. In [44], the authors added a controller to efficient NAS (ENAS), which learns to discover neural network architectures by searching for an optimal subgraph within a large computational graph. Parameter sharing between the subgraphs makes the computing process faster. The controller decides which parameter matrices are used by choosing the previous indices; therefore, in ENAS, all recurrent cells in a search space share the same set of parameters. Liu [45] used a sequential model-based optimization (SMBO) strategy to learn a surrogate model that guides the search through the structure space.
To build up a neural architecture search (NAS) model, it is necessary to conduct dimensionality reduction and identify valid parameter bounds to reduce the computation involved in auto-tuning. A global sensitivity analysis can be used to identify valid regions in the search space and subsequently decrease its dimensionality [46], which can serve as a starting point for an efficient calibration process.
The remainder of this paper is organized as follows. In Section 2, we describe the physical model of the groundwater flow problem and the randomized spectral method used to generate hydraulic conductivity fields, and we construct the manufactured solutions used to verify the accuracy of our model, which is helpful in building up the performance estimation strategy. In Section 3, we introduce the neural architecture search model, including its structure and how each component works. We also present an efficient sensitivity analysis method and compare several hyper-parameter optimizers in order to find an accurate and efficient search method. In Section 4, we introduce the finite difference method and the finite element method as comparisons to verify the feasibility of our method through numerical experiments. Finally, conclusions are drawn in Section 5. The generation process of the random fields and the expressions for the manufactured solutions are presented in Appendix A and Appendix B.
2. Generating stochastic flow fields in a heterogeneous porous medium with a randomized spectral model

Darcy equation for groundwater flow problem
Consider the continuity equation for steady-state aquifer flow in a porous medium governed by Darcy's law, q = −K∇h (Equation (1)), where q is the Darcy velocity, K the hydraulic conductivity, and h the hydraulic head, h = H + δh, with mean H and perturbation δh.
To describe the variation of hydraulic conductivity as a function of the position vector x, it is convenient to introduce the variable Y(x) = ln K(x), where Y(x) is the hydraulic log-conductivity with mean Y and perturbation Y′(x), where E[Y′(x)] = 0, and Y′(x) is taken to be a three-dimensional statistically homogeneous random field characterized by its correlation function C_Y(r) = E[Y′(x)Y′(x + r)], where r is the separation vector. According to the conservation equation ∇ · q = 0, Equation (1) can be rewritten as ∑_{i=1}^{N} ∂/∂x_i (K(x) ∂h/∂x_i) = 0, with N the dimension, subject to Neumann and Dirichlet boundary conditions on the respective parts of the boundary. The groundwater flow problem then boils down to finding a solution h such that Equations (5) and (8) hold.
E is an operator that maps elements of vector space H to vector space V and represents the investigated physical model. To be more specific, for solving Equation (5), the Dirichlet and Neumann boundary conditions can be assumed accordingly, where J is the mean slope of the hydraulic head in the x direction, a dimensionless constant [9].
As suggested by Ababou [13], the scale of fluctuation should be much smaller than the scale of the domain to obtain statistically meaningful results. Considering the simplicity of the calculations, we set L_x, L_y, L_z ten times larger than λ. They also suggested a reasonable resolution requirement; accordingly, we set the number of discretization points in each direction to 100. As Y is homogeneous and isotropic, we consider two correlation functions: the exponential correlation function [47], C_Y(r) = σ² exp(−|r|/λ), and the Gaussian correlation function [48], C_Y(r) = σ² exp(−r²/λ²), where λ is the log-conductivity correlation length scale.
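For reference, the two correlation models can be written directly in code (a small illustration with our own function names; σ² = 0.1 and λ = 1 m match settings used later in the paper):

```python
import numpy as np

# Hedged sketch: the two isotropic log-conductivity correlation models,
# with variance sigma2 and correlation length lam (names are ours).

def exp_corr(r, sigma2=0.1, lam=1.0):
    """Exponential correlation C(r) = sigma^2 * exp(-|r|/lambda)."""
    return sigma2 * np.exp(-np.abs(r) / lam)

def gauss_corr(r, sigma2=0.1, lam=1.0):
    """Gaussian correlation C(r) = sigma^2 * exp(-r^2/lambda^2)."""
    return sigma2 * np.exp(-(r / lam) ** 2)
```

Note that the Gaussian model is flat at r = 0 (zero slope), while the exponential model has a kink there; this difference is what later makes the Gaussian case smoother and easier for the network to fit.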

Generate the hydraulic conductivity fields
In the application of spectral methods, the Wiener-Khinchin theorem plays a very important role. It states that the power spectral density of a stationary stochastic process is the Fourier transform of its autocorrelation function. Applied to the heterogeneous groundwater flow problem, the spectral density S(k) of the Gaussian random field is the Fourier transform of the correlation function (Equation (4)), with S(k) the spectral function of the random field Y(x). According to the Fourier transform table, and substituting Equations (10), (11), (14) and (15) into Equation (13), the spectral functions under the exponential and the Gaussian correlation functions can be derived (Equations (16) and (17)). Kramer [49] has given an expression for a Gaussian homogeneous random field in the general case (Equation (18)), where W(dk) is a complex-valued white noise random measure with zero mean. According to Euler's formula, Equation (18) can be rewritten in trigonometric form. It is obvious from Equations (16) and (17) that the two spectral functions are even, so the field can be written in terms of W₁(dk) and W₂(dk), two independent real-valued Wiener processes (Equation (20)). Choosing the probability density function (pdf) of the random variable k as p(k) = 2S(k)/σ² and substituting it into Equation (20), the variance can be calculated. The following approximations can then be obtained using Monte Carlo integration, from which the final approximate form is derived, where the ξᵢ are mutually independent Gaussian random variables. For the random variable kᵢ, the probability density function is p(k), and the cumulative distribution function (cdf) is F(k) = ∫_{−∞}^{k} p(x)dx; given another uniformly distributed random variable θ, the inverse function k = F⁻¹(θ) can be obtained, and k then obeys the p(k) distribution. The choices of k and the proof are detailed in Appendix A.
Using these expressions for k, we can write MATLAB programs to generate the random fields we need. As shown in Figures 1 and 2, realizations of the random field are generated with K fixed to 15, σ² to 0.1, and N to 1000.
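A minimal Python sketch of such a randomized spectral sampler, for the Gaussian correlation model in 2-D, might look as follows (an illustration under our own naming; the paper's actual implementation is in MATLAB, and the exponential-correlation case requires a different wave-number sampler):

```python
import numpy as np

# Hedged sketch of a Kraichnan-type randomized spectral sampler for a
# 2-D Gaussian-correlated log-conductivity field. For the Gaussian
# correlation C(r) = sigma^2 exp(-r^2/lambda^2), the normalized spectral
# density p(k) = 2 S(k)/sigma^2 is itself Gaussian, so the wave vectors
# k_i can be drawn directly as normal variates (std sqrt(2)/lambda).

def sample_log_conductivity(x, sigma2=0.1, lam=1.0, n_modes=1000, rng=None):
    """Evaluate one realization of Y(x) at points x of shape (n_pts, dim)."""
    rng = np.random.default_rng() if rng is None else rng
    n_pts, dim = x.shape
    k = rng.normal(0.0, np.sqrt(2.0) / lam, size=(n_modes, dim))
    xi1 = rng.normal(size=n_modes)           # independent Gaussian amplitudes
    xi2 = rng.normal(size=n_modes)
    phase = x @ k.T                          # (n_pts, n_modes)
    y = (np.cos(phase) @ xi1 + np.sin(phase) @ xi2) / np.sqrt(n_modes)
    return np.sqrt(sigma2) * y               # Var[Y] -> sigma2 as n_modes grows

# a conductivity realization would then be K(x) = exp(<Y> + Y(x))
```

Averaged over realizations, the empirical variance of the sampled field approaches σ², which is the basic accuracy control the Kraichnan construction provides through the number of modes.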

Defining the experimental model
After determining the expression for the groundwater flow problem, we need to set the geometric and physical parameters involved in the simulation. For the hydraulic conductivity, the number of modes N and the variance σ² have a large impact. With Lipschitz continuity in mind, experiments have found that when the exponential correlation function is used, an excessively large N makes the realization of the K field approach a non-differentiable function [50]. So in this paper we set N to 500, 1000 and 2000 to verify the fit of the model for different numbers of modes. σ² determines the heterogeneity of the hydraulic conductivity, a larger σ² indicating larger heterogeneity. In real geological formations, σ² varies widely. As summarized in Sudicky's study [51], σ² = 0.29 in the weakly heterogeneous Canadian Forces Base Borden aquifer and 0.14 at Cape Cod, but σ² = 4.5 in the highly heterogeneous Columbus aquifer. First-order analysis [9] has provided a solid basis for predictions, and numerical simulations [11] indicate that the first-order results are robust and applicable for σ² close to and even above 1. With this approximation, we get e^Y = K exp(−σ²/2) for the one- and two-dimensional cases [52], and e^Y = K exp(−σ²/6) for the three-dimensional case [53]. So in this paper we set σ² to 0.1, 1 and 3, covering the three cases from small to moderate and large. The mean hydraulic conductivity is fixed to K = 15 m/day, a value representative of gravel or coarse sand aquifers [54]. We set all the correlation lengths in the one- and two-dimensional cases equal to 1 m; in the three-dimensional case we set them differently: λ₁ = 0.5 m, λ₂ = 0.2 m and λ₃ = 0.1 m. Based on the above settings, we have finalized our test domain:

Manufactured solutions
To verify the accuracy of our model and build up an error estimation model, we use the method of manufactured solutions (MMS), which provides a general procedure for generating analytical solutions [55]. Malaya et al. [56] discussed the method of manufactured solutions for constructing an error estimator for solution verification when one simulates the phenomenon of interest with no a priori knowledge of the solution. Instead of relying on the availability of an exact solution to the governing equations, MMS specifies a manufactured solution. This artificial solution is then substituted into the partial differential equations. Naturally, there will be a residual term, since the chosen function is unlikely to be an exact solution to the original partial differential equations. This residual can then be added to the code as a source term; subsequently, the MMS test uses the code to solve the modified equations and checks that the chosen function is recovered.
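The procedure above can be sketched in a few lines of Python for a 1-D model problem (a hedged illustration: the conductivity K(x) and the manufactured solution below are simple choices of ours, not the forms used in this paper, which appear in Appendix B):

```python
import numpy as np

# 1-D illustration of MMS: choose an artificial solution h_mms, substitute
# it into the governing operator d/dx(K dh/dx), and take the residual as
# the source term f. K and h_mms are illustrative choices.

def K(x):          return 2.0 + np.sin(x)            # conductivity
def dK(x):         return np.cos(x)
def h_mms(x, a=2): return np.sin(a * x)              # manufactured solution
def dh(x, a=2):    return a * np.cos(a * x)
def d2h(x, a=2):   return -a * a * np.sin(a * x)

def source(x, a=2):
    """f = d/dx(K dh/dx) = K' h' + K h'' evaluated at the manufactured h."""
    return dK(x) * dh(x, a) + K(x) * d2h(x, a)

# the modified Dirichlet data are simply h_mms evaluated on the boundary
x = np.linspace(0.1, 3.0, 2001)
bc = (h_mms(x[0]), h_mms(x[-1]))

# consistency check: finite-difference divergence of the flux matches f
flux = K(x) * dh(x)
err = np.max(np.abs(np.gradient(flux, x)[5:-5] - source(x)[5:-5]))
```

Solving the modified equation with this f and boundary data should recover h_mms, which is exactly the check the MMS test performs.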
With MMS, the original problem of finding the solution to Equation (5) is thus changed to the following form: for the operator E(ĥ), we now get a source term f. By adding the source term to the original governing equation E, a slightly modified governing equation (Equation (25)) is obtained, which is solved by the manufactured solution ĥ. The Neumann and Dirichlet boundary conditions are modified accordingly. We adopt the form of the manufactured solution mentioned in Tremblay's study [55], where the aᵢ are arbitrary non-zero real numbers. When the manufactured solution (28) is applied to the left side of Equation (5), we obtain a source term f. To verify the adaptability of our model to different solutions, we also use another form of manufactured solution [56], ĥ, with the same parameter values as in Equation (28), and derive the corresponding source term. This leads to a change of the boundary conditions from Equation (8). These source terms can be used as a physical law describing the system, and also as a basis for evaluating the neural networks. The specific forms of the manufactured solutions and source terms f used in this paper are given in Appendix B.

Neural architecture search model

The convolutional NAS approach has three main dimensions [57]. The first is a collection of candidate neural network structures called the search space, which can be narrowed by combining it with the conditions of a known practical model. The second is the search strategy, which represents in detail how the NN works in the search space. The last is performance evaluation: the performance of a neural network is measured by certain metrics such as precision and speed. Inspired by Park [58], we construct the system configuration of the NAS fitted to the PINNs model in Figure 3. It consists of sensitivity analyses (SA), search methods, and a physics-informed neural network (NN) generator, which eventually output the optimum neural architecture configuration and the corresponding weights and biases.
A transfer learning model is eventually built based on these weights and biases and the selected neural network configurations.

• Search Space. The search space defines which architectures can in principle be represented. Combined with a priori knowledge of the typical properties of architectures well suited to the task, its size can be reduced and the search simplified. For the model in this study, the prior knowledge of the search space is gained from the global sensitivity analysis. Figure 4(b) shows a common global search space with a chain structure in NAS work.
The chain-structured neural network architecture can be written as a sequence of n layers, where the i-th layer Lᵢ receives its input from layer i − 1 and its output serves as input for layer i + 1: Lᵢ = Oᵢ(Lᵢ₋₁), where the Oᵢ are operations.
• Search Method. The search method is an initial filtering step that helps narrow down the search space. In this paper, hyper-parameter optimizers are used to accomplish this goal. It should be noted that the choice of search space largely determines the difficulty of the optimization problem, which may remain (i) non-continuous and (ii) relatively high-dimensional. Thus, prior knowledge of the model features needs to be taken into consideration.
• Search Strategy. The search strategy details how to explore the search space. On the one hand, it is desirable to find well-performing architectures quickly; on the other hand, premature convergence to a region of suboptimal architectures should be avoided.
• Performance Estimation Strategy. Performance estimation is the process of estimating a candidate architecture's performance; the simplest option is standard training and validation on the data for that architecture. Based on the discussion in Section 2.4, we define a relative error for the performance estimation strategy (Equation (34)).
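One plausible concrete form of such a metric is a relative L2 norm of the prediction against the manufactured solution on a set of evaluation points (an assumption on our part; the paper's exact definition is its Equation (34)):

```python
import numpy as np

# Relative L2 error between a network prediction and the manufactured
# solution, evaluated at the same points (illustrative form).

def relative_l2(h_pred, h_mms):
    return np.linalg.norm(h_pred - h_mms) / np.linalg.norm(h_mms)
```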

Modified NAS
To integrate with the physics-informed machine learning model, some changes have been made. For the modified model shown in Figure 3, the NAS is divided into four main phases. The first is a sensitivity analysis that uncovers the prior knowledge behind the physics-informed machine learning model, which helps to construct the search space with less dependence on human experts. The second phase is the search strategy; there is a wide choice of optimization methods, and in this paper we have tested several commonly used ones, including the randomized search method, the Bayesian optimization method, the Hyperband optimization method, and the Jaya optimization method. The third phase is the neural network generator, which produces physics-informed deep neural networks tailored to a mechanical model based on the information from the optimization. The final phase is training and validation: given the input neural architectures, it outputs the performance estimate. A suitable estimate is recommended in Equation (34).
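As an illustration of the simplest of these strategies, a random search over a small PINN search space can be sketched as follows (the search space values and the stand-in objective are our own; in the actual model the objective is the performance estimate obtained by training a candidate PINN):

```python
import numpy as np

# Random search baseline: sample configurations from a (sensitivity-
# reduced) search space and keep the best under the performance estimate.
# `validation_error` is an illustrative stand-in, not a real PINN run.

search_space = {"layers": [2, 4, 6, 8],
                "neurons": [10, 20, 40, 80],
                "lr_exp":  [-4, -3, -2]}

def validation_error(cfg):                    # stand-in objective
    return ((cfg["layers"] - 6) ** 2
            + (cfg["neurons"] - 40) ** 2 / 100
            + (cfg["lr_exp"] + 3) ** 2)

def random_search(n_trials=300, seed=0):
    rng = np.random.default_rng(seed)
    best_cfg, best_err = None, np.inf
    for _ in range(n_trials):
        cfg = {k: v[rng.integers(len(v))] for k, v in search_space.items()}
        err = validation_error(cfg)
        if err < best_err:
            best_cfg, best_err = cfg, err
    return best_cfg, best_err

best_cfg, best_err = random_search()
```

Bayesian optimization and Hyperband follow the same sample-evaluate-keep loop but choose the next configuration adaptively instead of uniformly.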

Neural networks generator
How to approximate a function, in particular when solving a partial differential equation, has long been a problem in mathematics. Mathematicians have developed many tools to approximate functions, such as interpolation theory, frames, spectral methods, finite elements, etc. From the perspective of approximation theory, neural networks can be viewed as nonlinear smooth function approximators. Using the neural network, we obtain an output value that reflects the quality, validity, etc. of the input data, adjust the configuration of the neural network based on this result, recalculate the result, and repeat these steps until the target is reached. Physics-informed neural networks, on the other hand, add physical conservation laws and prior physical knowledge to the existing neural network; they require substantially less training data and can result in simpler neural network structures, while achieving high accuracy [59]. The diagram of the structure is as follows:

Physics-informed neural network
The physics-informed network generator mainly includes a neural network interpreter, which represents the configuration of a NN, and a physical information checker. The neural network interpreter comprises a deep neural network with multiple layers: an input layer, one or more hidden layers, and an output layer. Each layer consists of one or more nodes called neurons, shown in Figure 5 by small coloured circles, which are the basic units of computation. In a fully interconnected structure, every two neurons in neighbouring layers are connected, each connection represented by a weight, see Figure 5. Mathematically, the output of a node is computed by y = σ(∑ᵢ wᵢ zᵢ + b), with zᵢ the inputs, wᵢ the weights, b the bias and σ the activation function. With those concepts, the network approximation can be defined as a composition of such layers, with d the dimension of the inputs, n the number of field variables, θ consisting of parameters such as weights and biases, and ∘ denoting element-wise composition.
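The layerwise computation described above can be sketched as follows (a hedged illustration with sizes and initialization of our own choosing; the actual model builds such networks in a deep learning framework with automatic differentiation):

```python
import numpy as np

# Feedforward network where each node computes y = sigma(sum_i w_i z_i + b);
# layers compose into the approximation h(x; theta). Sizes are illustrative.

rng = np.random.default_rng(1)

def init_params(layers):
    """One (W, b) pair per layer transition; `layers` lists all widths."""
    return [(rng.normal(size=(m, n)) / np.sqrt(m), np.zeros(n))
            for m, n in zip(layers[:-1], layers[1:])]

def forward(params, x, sigma=np.tanh):
    for W, b in params[:-1]:
        x = sigma(x @ W + b)        # hidden layers: affine map + activation
    W, b = params[-1]
    return x @ W + b                # linear output layer

theta = init_params([3, 30, 30, 1])  # 3-D input -> scalar hydraulic head
h = forward(theta, np.random.default_rng(2).uniform(size=(10, 3)))
```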
The universal approximation theorem [60,61] reveals that such a continuous bounded function F with nonlinear activation σ can be adopted to capture the smooth, nonlinear properties of the system. Accordingly, a theorem follows [62]:

Deep collocation method
The collocation method is a widely used method for seeking numerical solutions to ordinary and partial differential equations as well as integral equations [63]. It is a popular method for trajectory optimization in control theory. A set of randomly distributed points (also known as collocation points) is often deployed to represent a desired trajectory that minimizes the loss function while satisfying a set of constraints. The collocation method tends to be relatively insensitive to instabilities (such as exploding/vanishing gradients with neural networks) and is a viable way to train deep neural networks [64].
The modified Darcy equation (25) boils down to the solution of a second-order differential equation with boundary constraints. Hence we first discretize the physical domain with collocation points denoted by x_Ω = (x₁, ..., x_{N_Ω})ᵀ. Another set of collocation points, denoted by x_Γ = (x₁, ..., x_{N_Γ})ᵀ, is employed to discretize the boundary conditions. Then the hydraulic head ĥ is approximated with the aforementioned deep feedforward neural network ĥ_h(x; θ). A loss function can thus be constructed to find the approximate solution ĥ_h(x; θ) by minimizing the residuals of the governing equation and the boundary conditions; the mean squared error form is taken here.
Substituting ĥ_h(x_Ω; θ) into the governing equation results in a physics-informed deep neural network E(x_Ω; θ). The boundary conditions illustrated in Section 2 can also be expressed through the neural network approximation ĥ_h(x_Γ; θ). Note that the induced physics-informed neural networks E(x; θ) and q(x; θ) share the same parameters as ĥ_h(x; θ). Considering the generated collocation points in the domain and on the boundaries, the parameters can be learned by minimizing the mean squared error loss function L(θ), whose minimizer is then a solution for the hydraulic head. Here, the defined loss function measures how well the approximation satisfies the physical law (governing equation) and the boundary conditions. Our goal is to find a set of parameters θ such that the approximation ĥ_h(x; θ) minimizes the loss L. If L is very small, the approximation ĥ_h(x; θ) closely satisfies the governing equations and boundary conditions. The solution of groundwater flow problems by the deep collocation method is thereby reduced to an optimization problem. To train the deep feedforward neural network, gradient-based optimization algorithms such as Adam are employed: a descent step is taken at collocation point xᵢ with Adam-based learning rate αᵢ, and the process in Equation (43) is repeated until a convergence criterion is satisfied. A combined Adam-L-BFGS-B minimization algorithm is used to train the physics-informed neural networks, since adding physics constraints makes them more difficult to train. This strategy consists of training the network first with the Adam algorithm and, after a defined number of iterations, performing L-BFGS-B optimization of the loss with a small limit of executions.
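The structure of the collocation loss can be illustrated on a drastically simplified 1-D stand-in (our own toy problem: −h″ = f on (0, 1) with a two-parameter ansatz in place of a deep network, so that the residual derivatives are analytic; the real model differentiates the network by automatic differentiation and trains with Adam followed by L-BFGS-B):

```python
import numpy as np

# Collocation loss sketch for -h'' = f on (0,1), f = pi^2 sin(pi x),
# whose exact solution is h = sin(pi x). The ansatz
# h = t1 sin(pi x) + t2 sin(2 pi x) already satisfies h(0) = h(1) = 0,
# so the loss reduces to the mean squared PDE residual at the
# collocation points; plain gradient descent stands in for Adam.

rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=200)            # interior collocation points
f = np.pi ** 2 * np.sin(np.pi * x)

phi1 = np.pi ** 2 * np.sin(np.pi * x)          # -d2/dx2 of sin(pi x)
phi2 = 4 * np.pi ** 2 * np.sin(2 * np.pi * x)  # -d2/dx2 of sin(2 pi x)

def residual(theta):
    return theta[0] * phi1 + theta[1] * phi2 - f

def loss(theta):
    return np.mean(residual(theta) ** 2)       # MSE of the PDE residual

theta = np.zeros(2)
for _ in range(2000):                          # gradient descent steps
    r = residual(theta)
    grad = 2 * np.array([np.mean(r * phi1), np.mean(r * phi2)])
    theta -= 1e-4 * grad
```

The loss drives θ toward (1, 0), recovering the exact solution; with a deep network the same residual-plus-boundary loss is minimized over the weights and biases instead of two coefficients.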
Further, the approximation ability of neural networks for the potential problem needs to be proved. The approximation power of neural networks for quasilinear parabolic PDEs has been proved by Sirignano et al. [65]. For groundwater flow problems, whose governing equation is an elliptic partial differential equation, the proof boils down to the following. The groundwater flow problem has a unique solution ĥ ∈ C²(Ω) with its derivatives uniformly bounded. Also, the heterogeneous hydraulic conductivity function K(x) is assumed to be C^{1,1} (C¹ with Lipschitz continuous derivative). The smoothness of the K field is essentially determined by the correlation of the random field Y. According to [66], the smoothness conditions are fulfilled if the correlation of Y has a Gaussian shape and is infinitely differentiable. The smoothness of the source term is determined by the constructed manufactured solution ĥ_MMS in Equations (28) and (30), which is obviously continuous and infinitely differentiable, f ∈ C^∞(Ω).

Theorem 2.
Assume that Ω is compact and consider measures ℓ₁, ℓ₂, and ℓ₃ whose supports are contained in Ω, Γ_D, and Γ_N, respectively. Also, the governing Equation (25) subject to (27) is assumed to have a unique classical solution, and the conductivity function K(x) is assumed to be C^{1,1} (C¹ with Lipschitz continuous derivative). Then, for all ε > 0, there exists λ > 0, which may depend on sup_Ω |ĥᵢᵢ| and sup_Ω |ĥᵢ|, such that there exists ĥ_h ∈ Fⁿ satisfying L(θ) ≤ λε.

Proof. For the governing Equation (25) subject to (27), recall that the loss is constructed in the form shown in Equation (40). For MSE_G, applying the triangle inequality yields a bound on the residual; considering the C^{1,1} conductivity function (45), a corresponding bound is obtained; on the boundaries Γ_D and Γ_N, analogous bounds hold. Therefore, using Equations (47) and (48), L(θ) → 0 as n → ∞.

With Theorem 2 and the conditions that Ω is a bounded open subset of Rᴺ and, for all n ∈ N⁺, ĥ_h ∈ Fⁿ ⊂ L²(Ω), it can be concluded from Sirignano et al. [65] that:

Theorem 3. For all p < 2, ĥ_h ∈ Fⁿ converges to ĥ strongly in Lᵖ(Ω) as n → ∞, with ĥ being the unique solution to the potential problem.
In summary, for feedforward neural networks F^n in the L^p space (p < 2), the approximated solution ĥ_h ∈ F^n will converge to the solution of this PDE. This justifies the application of the physics-informed and data-driven deep learning method in solving groundwater flow problems.

Sensitivity analysis (SA)
Sensitivity analysis is also a very useful tool in the model parameter calibration process, as it determines which aspects of the model introduce the most uncertainty into the system description. It can be used to quantify the influence of each model parameter on the output and to identify those that are most important to consider in the calibration process. Parameters that have little or no effect on the model results can be disregarded, which significantly reduces the workload of model calibration [67-69].
At this stage, global sensitivity analysis of parameters in hydrological models has gradually become a research hotspot. In this work, the parameter sensitivity analysis experiment contributes to the whole NAS model by offering prior knowledge of the DCM model, which helps to reduce dimensions of the search space and further improve the computational efficiency for optimization method.
Global sensitivity analysis methods can be subdivided into qualitative methods, such as the Morris method [70] and the Fourier amplitude sensitivity test (FAST) [71], and quantitative methods, such as the Sobol' method [72] and extended FAST (eFAST) [73]. Scholars have conducted numerous experiments to compare the advantages and disadvantages of these methods [74-76]. The results show that the Sobol' method can provide quantitative results for SA, but it requires a large number of runs to obtain stable results. eFAST is more efficient and stable than the Sobol' method and can therefore be seen as a good alternative to it. The Morris method is able to correctly screen the most and least sensitive parameters of a highly parameterized model with 300 times fewer model evaluations than the Sobol' method. Therefore, we take the same approach as Crosetto [77]: first test all the hyper-parameters using the Morris method, remove the two most and the least influential parameters, and then filter the remainder again with the eFAST method. In this way, we can achieve the highest accuracy in a relatively small amount of time.

Morris method
The Morris method [70] was proposed by Morris in 1991. It works as follows: select a variable X_i from the model parameters, specify an objective function y(x) = f(X_1, X_2, ..., X_n), change the variables X_i over specific ranges, and calculate the elementary effect e_i to discern the extent to which each parameter change affects the output function. Taking n elements as input parameters to the model, the sensitivity of all n parameters can be obtained. The method is computationally very efficient.
where f(x) represents the prior point in the trajectory. Using the single trajectories shown in Equation (50), the elementary effect of each parameter can be calculated with only p + 1 model evaluations. After sampling the trajectories, the resulting set of elementary effects is averaged to obtain the total-order sensitivity µ*_i of the i-th parameter, and its variance σ²_i can be calculated as well. Both the mean µ* and the variance σ² are important to the sensitivity, so we consider σ² + µ*² as the criterion for ranking.
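The trajectory-based screening above can be sketched in a few lines of NumPy. This is a minimal illustration, not the implementation used in this work: the toy objective and the step size delta are illustrative assumptions, and each trajectory costs p + 1 model evaluations because exactly one parameter is perturbed per step.

```python
import numpy as np

def morris_elementary_effects(f, n_params, n_traj=20, delta=0.1, seed=0):
    """Estimate Morris measures mu* and sigma^2 for f on the unit hypercube."""
    rng = np.random.default_rng(seed)
    effects = np.zeros((n_traj, n_params))
    for t in range(n_traj):
        x = rng.uniform(0, 1 - delta, size=n_params)  # base point of the trajectory
        y_prev = f(x)
        for i in rng.permutation(n_params):           # perturb one parameter at a time
            x[i] += delta
            y_new = f(x)
            effects[t, i] = (y_new - y_prev) / delta  # elementary effect of parameter i
            y_prev = y_new
    mu_star = np.abs(effects).mean(axis=0)            # total-order sensitivity mu*_i
    sigma2 = effects.var(axis=0, ddof=1)              # variance of the effects
    return mu_star, sigma2

# Toy model: x0 dominates, x2 is inert (illustrative only)
mu_star, sigma2 = morris_elementary_effects(lambda x: 10 * x[0] + x[1] + 0 * x[2], 3)
```

For this linear toy model the elementary effects are exact, so µ* recovers the coefficients and σ² is zero; for a nonlinear model σ² would flag interactions.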

eFAST method
The eFAST method [78] obtains the spectrum of the Fourier series through Fourier transformation, from which each parameter's contribution to the variance of the model results, including interactions, can be read off. Using a suitable search function, the model y(x) = f(X_1, X_2, ..., X_n) can be transformed into y = f(s). The spectral curve of the Fourier series is defined as Λ_j = A²_j + B²_j, and the variance of the model results due to the uncertainty in the parameter X_i is obtained from the spectrum at the parametric frequency ω_i and its harmonics pω_i, where p ranges over the non-zero integers.
The total variance can be obtained by cumulatively summing the spectra at all frequencies, and the sensitivity of a parameter to the output is its partial variance divided by the total variance. When finding the total sensitivity of X_i, the frequency of X_i is set to ω_i, while a different frequency ω is assigned to all other parameters. By calculating the spectra at the frequency ω_i and its higher harmonics pω_i, the output variance D_{−i} due to all parameters except X_i and their interrelationships can be obtained.
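The variance decomposition above can be illustrated with a minimal first-order FAST computation. The search curve, the choice of frequencies, and the toy model below are illustrative assumptions, not the exact eFAST implementation used in this work:

```python
import numpy as np

def fast_first_order(f, freqs, n_samples=1025, M=4):
    """First-order FAST sensitivity indices S_i = D_i / D (minimal sketch)."""
    s = np.linspace(-np.pi, np.pi, n_samples, endpoint=False)
    # Classic search curve mapping s onto the unit cube, one frequency per parameter
    X = 0.5 + np.arcsin(np.sin(np.outer(freqs, s))) / np.pi
    y = np.array([f(x) for x in X.T])
    # Fourier coefficients A_j, B_j and spectrum Lambda_j = A_j^2 + B_j^2
    harmonics = np.arange(1, n_samples // 2)
    A = np.array([np.mean(y * np.cos(j * s)) for j in harmonics])
    B = np.array([np.mean(y * np.sin(j * s)) for j in harmonics])
    Lam = A ** 2 + B ** 2
    D = 2 * Lam.sum()                                # total variance over all frequencies
    S = []
    for w in freqs:
        idx = [p * w - 1 for p in range(1, M + 1)]   # harmonics p * w_i of parameter i
        S.append(2 * Lam[idx].sum() / D)             # partial variance D_i over D
    return np.array(S)

# Toy model: the first parameter carries most of the variance (illustrative)
S = fast_first_order(lambda x: 5 * x[0] + x[1], freqs=np.array([11, 35]))
```

The frequencies must be chosen so that the first M harmonics of the different parameters do not overlap; here 11 and 35 satisfy that for M = 4.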

SALib
SALib [79] is a sensitivity analysis library in Python developed by Jon Herman and his coworkers, which provides several sensitivity analysis methods, such as Sobol', Morris, and FAST. One can easily use it to perform sensitivity analysis on a model that has already been built. It also comes with a sensitivity-result visualization feature that helps present the results clearly.

Search methods for NNs
After the sensitivity analysis screening, hyper-parameter optimization is employed to find the most suitable values for the physics-informed neural architectures. Hyper-parameter optimization is a combinatorial optimization problem and cannot simply be solved by the gradient descent methods used for ordinary parameters. The time cost of evaluating a set of hyper-parameter configurations is usually very high, especially when the model is complex with many parameters, so finding an appropriate algorithm is crucial. Many optimization methods are available; in this application, we consider the classical random search method, the Bayesian optimization method, and two recently proposed methods, the Hyperband algorithm and the Jaya algorithm, the latter being a branch of heuristic learning methods.

Randomization search method (RSM)
The randomization search method is one of the most basic algorithms: hyper-parameters are simply drawn from the search space in random combinations to find the configuration that performs best. The algorithm itself is very simple and does not make use of correlations between different combinations of hyper-parameters. If there is a narrow cost-minimum region close to the boundary, solutions near it may effectively be excluded because their neighbors' costs are all very high, so there is almost no probability of finding such a global minimum. This is a common flaw of randomized algorithms, and there is no good way around it.
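A minimal sketch of random search over a small discrete hyper-parameter space; the search space and the toy objective (with its "optimum" at 6 layers and 16 neurons) are illustrative assumptions, not the actual PINNs loss:

```python
import random

def random_search(objective, space, n_trials=50, seed=0):
    """Draw hyper-parameter combinations uniformly at random; keep the best."""
    rng = random.Random(seed)
    best_cfg, best_loss = None, float('inf')
    for _ in range(n_trials):
        cfg = {name: rng.choice(values) for name, values in space.items()}
        loss = objective(cfg)
        if loss < best_loss:
            best_cfg, best_loss = cfg, loss
    return best_cfg, best_loss

# Illustrative space and objective with a minimum at (6 layers, 16 neurons)
space = {'layers': list(range(2, 11)), 'neurons': [8, 16, 32, 64]}
obj = lambda c: (c['layers'] - 6) ** 2 + (c['neurons'] - 16) ** 2 / 64
best, loss = random_search(obj, space, n_trials=200)
```

Because each draw is independent, no information from earlier evaluations guides later ones, which is exactly the limitation noted above.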

Bayesian optimization
Bayesian optimization is an adaptive hyper-parameter search method that predicts the next combination likely to bring the most benefit, based on the hyper-parameter combinations tested so far [80].
Assuming that the function f(x) of hyper-parameter optimization obeys a Gaussian process, p(f(x) | x) is a normal distribution. The Bayesian optimization process models the results of the existing N experiments, H = {(x_n, y_n)}_{n=1}^N, as a Gaussian process and calculates the posterior distribution. After obtaining the posterior distribution of the objective function, an acquisition function a(x, H) is defined to trade off between sampling where the model predicts a promising objective value and sampling where the prediction uncertainty is high. The goal is then to maximize the acquisition function to determine the next sampling point. Assuming y* = min{y_n : 1 ≤ n ≤ N} is the optimal value among the current samples, the expected improvement function can be defined accordingly. The algorithm works as follows:
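The loop can be sketched as follows for a one-dimensional toy problem, with a Gaussian-process surrogate and the expected-improvement acquisition. The RBF kernel, length scale, grid-based acquisition maximization, and toy objective are illustrative assumptions, not the production implementation:

```python
import numpy as np
from math import erf, sqrt, pi, exp

def Phi(z):  # standard normal CDF
    return 0.5 * (1 + erf(z / sqrt(2)))

def phi(z):  # standard normal PDF
    return exp(-z * z / 2) / sqrt(2 * pi)

def gp_posterior(X, y, Xs, length=1.0, noise=1e-6):
    """Posterior mean/std of a zero-mean GP with an RBF kernel (minimal sketch)."""
    k = lambda A, B: np.exp(-0.5 * ((A[:, None] - B[None, :]) / length) ** 2)
    K = k(X, X) + noise * np.eye(len(X))
    Ks = k(X, Xs)
    mu = Ks.T @ np.linalg.solve(K, y)
    var = 1.0 - np.sum(Ks * np.linalg.solve(K, Ks), axis=0)
    return mu, np.sqrt(np.maximum(var, 1e-12))

def bayes_opt(f, bounds, n_init=3, n_iter=10, seed=0):
    rng = np.random.default_rng(seed)
    X = rng.uniform(bounds[0], bounds[1], n_init)
    y = np.array([f(x) for x in X])
    grid = np.linspace(bounds[0], bounds[1], 200)
    for _ in range(n_iter):
        mu, sd = gp_posterior(X, y, grid)
        z = (y.min() - mu) / sd
        # Expected improvement for minimization: (y* - mu) Phi(z) + sd phi(z)
        ei = (y.min() - mu) * np.vectorize(Phi)(z) + sd * np.vectorize(phi)(z)
        x_next = grid[np.argmax(ei)]        # maximize the acquisition function
        X, y = np.append(X, x_next), np.append(y, f(x_next))
    return X[np.argmin(y)], y.min()

# Toy objective with its minimum at x = 2 (illustrative)
x_best, y_best = bayes_opt(lambda x: (x - 2.0) ** 2, bounds=(0.0, 5.0))
```

Each iteration refits the posterior to all evaluations so far, which is what makes the search adaptive, in contrast to random search.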

Hyperband algorithm
Hyperband [81] is an algorithm derived from the SuccessiveHalving algorithm [82]. The core idea of successive halving is as follows: assume there are n sets of hyper-parameter combinations; uniformly allocate budgets to these n sets and perform a validation evaluation, eliminate the half of the hyper-parameter sets that perform poorly, and iterate this process until an optimal hyper-parameter combination is found. The disadvantage of this algorithm is that it is not possible to control the ratio between the budget and the number of hyper-parameter sets, so when either side is too large the predictions will not be good. Hyperband, on the other hand, offers a way of weighing the two. The specific algorithm is shown in Table 2; its output is the configuration λ with the lowest validation loss seen so far.
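A compact sketch of the Hyperband loop follows. The bracket and budget bookkeeping follow the published algorithm; the toy configuration space and the budget-dependent loss (which becomes more reliable as the budget grows) are illustrative assumptions:

```python
import math
import random

def hyperband(get_config, run_config, max_budget=81, eta=3, seed=0):
    """Run successive halving under several trade-offs between the number of
    configurations n and the budget per configuration (Hyperband brackets)."""
    rng = random.Random(seed)
    s_max = int(math.log(max_budget, eta))
    best_cfg, best_loss = None, float('inf')
    for s in range(s_max, -1, -1):
        n = int(math.ceil((s_max + 1) * eta ** s / (s + 1)))  # initial configs
        r = max_budget * eta ** (-s)                          # initial budget each
        configs = [get_config(rng) for _ in range(n)]
        for i in range(s + 1):                                # successive halving
            budget = r * eta ** i
            losses = [run_config(c, budget) for c in configs]
            order = sorted(range(len(configs)), key=lambda j: losses[j])
            if losses[order[0]] < best_loss:
                best_loss, best_cfg = losses[order[0]], configs[order[0]]
            configs = [configs[j] for j in order[:max(1, len(configs) // eta)]]
    return best_cfg, best_loss

# Toy setting: a config is a scalar "learning rate"; the observed loss
# approaches the true loss (minimum at 0.3) as the budget grows
get_cfg = lambda rng: rng.uniform(0, 1)
run_cfg = lambda c, b: (c - 0.3) ** 2 + 1.0 / b
best, loss = hyperband(get_cfg, run_cfg)
```

The outer loop over s is what distinguishes Hyperband from plain successive halving: aggressive brackets try many cheap configurations, conservative brackets try few expensive ones.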

Jaya algorithm
The Jaya algorithm is easy to implement and does not require the adjustment of any algorithm-specific parameters. During each iteration, the solution is randomly updated with an update equation in which r(i, j, 1) and r(i, j, 2) are random numbers drawn from the range [0, 1], to ensure good diversity in the search. The main goal of the Jaya algorithm is to improve the fitness of each candidate solution in the population. Thus, the algorithm attempts to move the objective function value of each solution towards the best solution by updating the values of the variables. Once the values are updated, the updated solution is compared with the corresponding old solution, and only the better one is carried into the next generation. In each generation the solutions move closer to the best solution and away from the worst solution, so good intensification and diversity are achieved in the search process.
With the Jaya algorithm, the objective function value gradually approaches the optimum through updates of the variables, and in the process the fitness of each candidate solution in the population improves. The process of the Jaya algorithm is presented in the flowchart from [83]. The performance of the Jaya algorithm is measured by the minimum function value found.
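The update rule can be sketched for a toy two-variable minimization; the population size, iteration count, and test function are illustrative assumptions, and the update X' = X + r1(X_best − |X|) − r2(X_worst − |X|) follows the standard Jaya formulation:

```python
import numpy as np

def jaya(f, bounds, pop_size=20, n_iter=100, seed=0):
    """Jaya: move each candidate toward the best and away from the worst."""
    rng = np.random.default_rng(seed)
    lo, hi = np.array(bounds, dtype=float).T
    pop = rng.uniform(lo, hi, (pop_size, len(lo)))
    fit = np.array([f(x) for x in pop])
    for _ in range(n_iter):
        best, worst = pop[np.argmin(fit)], pop[np.argmax(fit)]
        r1 = rng.random(pop.shape)   # r(i, j, 1)
        r2 = rng.random(pop.shape)   # r(i, j, 2)
        trial = pop + r1 * (best - np.abs(pop)) - r2 * (worst - np.abs(pop))
        trial = np.clip(trial, lo, hi)
        trial_fit = np.array([f(x) for x in trial])
        improved = trial_fit < fit   # greedy selection: keep only better solutions
        pop[improved], fit[improved] = trial[improved], trial_fit[improved]
    return pop[np.argmin(fit)], fit.min()

# Toy sphere function with its minimum at (1, 1) (illustrative)
x_best, f_best = jaya(lambda x: np.sum((x - 1.0) ** 2),
                      bounds=[(-5, 5), (-5, 5)])
```

No algorithm-specific control parameters appear beyond the population size and iteration count, which is the algorithm's main selling point.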

Transfer learning (TL)
As introduced in the previous section, since physical constraints are added to the loss function of physics-informed neural networks, the model becomes difficult to train, and a combined optimizer is adopted for training. To improve computational efficiency and inherit the learnt knowledge of the trained model, a transfer learning algorithm is added to the entire model. Transfer learning is a research area in machine learning that focuses on storing knowledge gained while solving one problem and applying it to a different but related problem. The basic architecture of the transfer learning method of this model is shown in Figure 7. It is mainly composed of a pre-trained model and several fine-tuned models. For the pre-trained model, the optimum neural architecture configuration is obtained through a hyper-parameter optimization algorithm during the neural architecture search, and the corresponding weights and biases are saved. These weights and biases are then transferred to the fine-tuning models. It is shown in the numerical example section that this inheritance can greatly shorten the calculation time and improve learning efficiency. With transfer learning, for different statistical parameters of the random log-hydraulic conductivity field, there is no need to train the whole model from scratch, and the solution to the modified Darcy equation (25) can be obtained easily.
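The weight-inheritance step can be sketched with a small NumPy feedforward network. The architecture, the tanh activation, and the in-memory serialization are illustrative assumptions; the actual model saves the weights and biases of the searched PINNs architecture:

```python
import io
import numpy as np

def init_mlp(layers, seed=0):
    """Xavier-style initialization of weights and biases for an MLP."""
    rng = np.random.default_rng(seed)
    return [(rng.normal(0.0, np.sqrt(2.0 / (m + n)), (m, n)), np.zeros(n))
            for m, n in zip(layers[:-1], layers[1:])]

def forward(params, x):
    """tanh MLP forward pass with a linear output layer."""
    for i, (W, b) in enumerate(params):
        x = x @ W + b
        if i < len(params) - 1:
            x = np.tanh(x)
    return x

# "Pre-training": pretend these weights were tuned on one conductivity field
pretrained = init_mlp([1, 16, 16, 1])

# Save the favorable architecture's weights and biases (in memory here)
buf = io.BytesIO()
np.savez(buf, *[arr for Wb in pretrained for arr in Wb])
buf.seek(0)

# "Fine-tuning": a model for a different sigma^2 starts from the saved weights
data = np.load(buf)
finetune = [(data[f'arr_{2 * i}'], data[f'arr_{2 * i + 1}'])
            for i in range(len(pretrained))]
```

The fine-tuned model starts from the restored parameters rather than a fresh initialization, so its first forward pass already reproduces the pre-trained network.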

Numerical examples
In this section, numerical examples in different dimensions and with different boundary conditions are studied and compared. Firstly, the choice between exponential and Gaussian correlation functions is discussed; after experimentally arriving at the optimal choice, we conduct our numerical experiments with the constructed heterogeneous hydraulic conductivity field. Next, we filter the algorithm-specific parameters by means of sensitivity analysis and select the parameters that have the greatest impact on the model as our search space. Then four different hyper-parameter optimization algorithms are compared in both accuracy and efficiency, in the hope of finding a well-balanced search method for the NAS model. The relative error in Equation (34) between the predicted results and the manufactured solution is used to build the search strategy for the NAS model. The results of the above selection from the proposed NAS based model are then substituted into the already built PINNs, and we can start solving the groundwater flow problem. For comparison, we use a finite difference method to solve the partial differential equations under the same conditions as the PINNs model. The simulations are performed on a 64-bit Windows 10 server with an Intel(R) Core(TM) i7-7700HQ CPU and 8 GB memory. The accuracy of the numerical results is measured by the relative error of the hydraulic head, defined as follows, where ‖·‖ refers to the l2-norm.
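The relative error used throughout the examples can be computed as in the following sketch; the stand-in arrays for the manufactured solution and the network prediction are illustrative:

```python
import numpy as np

def relative_error(h_pred, h_exact):
    """Relative l2 error between predicted and exact hydraulic head."""
    return np.linalg.norm(h_pred - h_exact) / np.linalg.norm(h_exact)

x = np.linspace(0, 25, 100)
h_exact = np.sin(np.pi * x / 25)       # stand-in for a manufactured solution
h_pred = h_exact + 1e-3 * np.cos(x)    # stand-in for a network prediction
err = relative_error(h_pred, h_exact)
```

The same scalar serves both as the performance-estimation criterion of the NAS search and as the accuracy measure in the tables below.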

Comparison of Gaussian and exponential correlations
Before we proceed to the next step of our analysis, we first compare the two correlation functions, Gaussian and exponential. These two are the most widely used correlations for random field generation, so it is important to determine which is most suitable for neural network approximators. We calculated the results obtained with these two correlation functions for the one-dimensional (1D), two-dimensional (2D), and three-dimensional (3D) stochastic groundwater flow cases, respectively, with the same choice of parameters. The number of hidden layers and the number of neurons per layer are uniformly set to 6 and 16. Meanwhile, the models with and without transfer learning (TL) are compared accordingly. The results are shown below:

One dimensional groundwater flow with both correlations
The non-homogeneous 1D flow problem for the Darcy equation can be reduced to solving Equation (25) subject to Equation (B.2). The hydraulic conductivity K is constructed from Equation (24) by the random spectral method as Equation (B.3). The source term f, adopting the manufactured solution in Equation (B.1), can thus be obtained as Equation (B.4). The detailed derivation can be found in Appendix B.
The relative errors δĥ of the predicted hydraulic head for the exponential and Gaussian correlations of the ln(K) field are shown in Tables 3 and 4. For the presented benchmark, the Gaussian correlation is clearly much more accurate for all N and σ². With the transfer learning model, the accuracy improves further. The predicted hydraulic head and velocity and the manufactured solution for both correlations with σ² = 0.1 and N = 2000 are shown in Figure 8. The results predicted with the proposed deep collocation method nearly coincide with the manufactured solution (B.1) in the 1D domain. Additionally, Figure 9 shows the log(Loss) vs. iteration graph for different parameters in constructing the random log-hydraulic conductivity field. Two crucial aspects can be observed from it: 1) the loss value for the Gaussian correlation is much smaller than for the exponential correlation for all σ² and N values; 2) with transfer learning, the loss function drops significantly faster and the number of required iterations is greatly reduced, which eventually yields more accurate results with much less calculation time. Judging from the above performance, the Gaussian correlation outperforms the exponential correlation in generating the random log-hydraulic conductivity field for physics-informed machine learning.

Two dimensional groundwater flow with both correlations
To solve the non-homogeneous 2D flow problem for the Darcy equation, the manufactured solution in Equation (B.5) is adopted for model verification. The hydraulic conductivity K is constructed from Equation (24) by the random spectral method as Equation (B.7), and the source term f takes the form in Equation (B.8); the detailed derivation is given in Appendix B. Again, the exponential and Gaussian correlations for the heterogeneous hydraulic conductivity are tested with varying σ² and N values. The same conclusion can be drawn from Tables 5 and 6: with increasing N, the predicted hydraulic head becomes more accurate, but when σ² grows, the accuracy deteriorates in most cases. Without doubt, the PINNs with the Gaussian correlation based hydraulic conductivity produce much better results than with the exponential correlation, and with the transfer learning technique the accuracy further improves. The contour plots of the predicted hydraulic head and velocity and the manufactured solution for both correlations with σ² = 0.1 and N = 2000 are shown in Figures 10, 11, 12 and 13. The physical patterns predicted with the proposed deep collocation method agree well with the manufactured solution in the 2D case. For the velocity, once again, the Gaussian correlation outperforms the exponential correlation. Further, the log(Loss) vs. iteration graph for different parameters in constructing the random log-hydraulic conductivity field is shown in Figure 14. It becomes even more conspicuous that the loss for PINNs with Gaussian correlations is much smaller and decreases faster. For exponential correlations, although the iterations terminate quickly in the L-BFGS stage, the loss is not fully minimized; this is largely improved for PINNs with Gaussian correlations.
The effect of transfer learning is well shown in the convergence graph: the loss function plummets within fewer iterations, which largely reduces the training time. Thus, for two-dimensional groundwater flow, the Gaussian correlation shows much better performance than the exponential correlation for physics-informed machine learning, and the transfer learning technique is conducive to boosting the whole model.

Three dimensional groundwater flow with both correlations
The non-homogeneous 3D flow problem for the Darcy equation is further studied and verified, which complements the benchmark construction in [50]. The manufactured solution in Equation (B.16) is adopted for model verification of the three-dimensional groundwater flow simulation. The hydraulic conductivity K is constructed according to Equation (B.14), and the source term f takes the form in Equation (B.15). The exponential and Gaussian correlations for the heterogeneous hydraulic conductivity are tested with varying σ² and N values. Tables 7 and 8 list the relative error of the hydraulic head for the DCM with and without transfer learning. It can be observed that for different σ² and N values, the performance of the PINNs with the two correlations varies largely. On the whole, the Gaussian correlation yields much better numerical results, and with the transfer learning model the predicted results are much more accurate in most cases. This is really promising, since transfer learning not only reduces the number of iterations, which results in less computational time, but also improves the accuracy of the model for different σ² and N values and for both correlation functions. The hydraulic heads predicted with both correlation functions for σ² = 0.1 and N = 2000 are shown in Figures 15, 16, 17 and 18. Both correlation functions predict results in agreement with the exact solutions. In the loss-vs.-iteration graph, both formulations decrease to a stable value around the y = 0 axis; however, the transfer learning model with the Gaussian correlation function converges within fewer iterations. Finally, the training times for the DCM with both correlation functions are shown in Table 9; the Gaussian correlation function clearly requires less calculation time.
When equipped with the transfer learning model, the training time decreases considerably, which is favorable considering that the loss with physics-informed constraints is usually difficult to train. In summary, the comparison reveals that the loss function with the Gaussian correlation tends to decrease faster than with the exponential one, and that the error under the Gaussian correlation is much smaller and more stable. Regarding training time, in all three dimensions, using Gaussian correlations requires less computation time than exponential ones. It is worth noting that in the three-dimensional case, the loss function with the exponential correlation suffers gradient explosion when the number of collocation points is greater than 150, while with the Gaussian correlation this does not occur even when the number of collocation points equals 1000.
From this we conclude that the PINNs model fits better with Gaussian correlations, and therefore we will use it in the next experiments.

Sensitivity analysis results
The purpose of sensitivity analysis is to eliminate irrelevant variables, especially when there are many parameter variables, and thereby reduce the time taken by the hyper-parameter optimizer. It is often used in the analysis of practical problems because of the complexity of real situations. Five hyper-parameters are considered in this flow problem, with their search ranges listed in Table 7 (for example, the maximum line search steps of the L-BFGS algorithm ranges over [30, 300]). Substituting these parameters into our model for the sensitivity analysis, we obtain the following results. From Figures 20 and 21 we can easily conclude that the number of layers has the greatest impact on the model sensitivity, with the number of neurons almost as influential, whereas the maximum line search of L-BFGS has almost no effect. So we remove the number of layers and maxls and continue to calculate the sensitivity of the remaining three parameters with the eFAST model; the results are shown in Figure 22. From these we conclude that the number of neurons is the second parameter we should emphasize. As a result, the numbers of layers and neurons are chosen as the hyper-parameters of the search space for the automated machine learning program.

Hyperparameter optimization method comparison
In order to select the hyper-parameter optimization algorithm most suitable for our model, namely one with a small fitness value and fast computational speed, we use the two hyper-parameters selected by the sensitivity analysis in Section 4.2 as search variables and run the four algorithms presented in the previous section, with all remaining conditions equal. In Figure 23 below, the horizontal coordinates represent the number of neurons per layer, the vertical coordinates represent the number of hidden layers, and the points circled by the box are the points of the optimal configuration, corresponding to the optimal numbers of layers and neurons. The colormap is set to the value of the performance estimation strategy defined in Equation (34). From this we can conclude that, for our model, the Bayesian method gives the best accuracy in the shortest time. Therefore, in the following calculations, we will use the Bayesian algorithm.
It should be noted that, due to the limited number of searches, the optimal solution found by the algorithm is not necessarily the true optimum; it gradually approaches the optimal configuration as the number of searches increases. However, our experiments show that the configuration found can still achieve particularly good accuracy even when the number of searches is small.
We continue the screening of the 2D and 3D neural network configurations using the Bayesian approach, with the following results. The optimal configurations obtained after screening are shown in the following table; these neural network configurations will be used as input for the subsequent numerical tests.

Finite difference method (FDM)
In this section, the finite difference method (FDM) for solving the modified governing equation is introduced, which will be used for comparison in the later model evaluation. We also tried the FEM for this groundwater flow in heterogeneous porous media, but the numerical results were rather poor, especially with increasing σ² in the heterogeneous random hydraulic conductivity; the FDM results were more accurate. The FDM is a class of numerical methods for differential equations that seeks approximate solutions by approximating derivatives with finite differences. The finer the discretization of the target region, the more accurate the approximate solution obtained with the finite difference method, although the corresponding computation time grows considerably. The modified Darcy equation is solved with the following FDM scheme [84]. For 1D, f_i represents the data f at the collocation points x_i and ĥ_i is the value of the numerical solution at the same points. For 2D, f_{i,j} represents the data f at the grid points (x_i, y_j) and ĥ_{i,j} is the value of the numerical solution at the same grid points, with the notations used in formula (64). For 3D, f_{i,j,k} represents the data f at the grid points (x_i, y_j, z_k) and ĥ_{i,j,k} is the value of the numerical solution at the same grid points, with the notations used in formula (66). Applying the above equations, we write MATLAB programs to obtain the approximate solutions.
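As an illustration of the 1D scheme, the following is a minimal Python sketch (the actual computations in this work were done in MATLAB). The half-point sampling of the conductivity and the manufactured solution with constant K are illustrative assumptions, not the exact scheme of [84]:

```python
import numpy as np

def fdm_darcy_1d(K, f, h0, h1, n=200, L=1.0):
    """Solve d/dx(K dh/dx) = f on [0, L], Dirichlet ends, central differences."""
    x = np.linspace(0, L, n + 1)
    dx = x[1] - x[0]
    Km = K(0.5 * (x[:-1] + x[1:]))        # conductivity at the half points
    A = np.zeros((n - 1, n - 1))
    b = dx ** 2 * f(x[1:-1])
    for i in range(n - 1):                # assemble the tridiagonal system
        A[i, i] = -(Km[i] + Km[i + 1])
        if i > 0:
            A[i, i - 1] = Km[i]
        if i < n - 2:
            A[i, i + 1] = Km[i + 1]
    b[0] -= Km[0] * h0                    # fold Dirichlet values into the rhs
    b[-1] -= Km[-1] * h1
    h = np.empty(n + 1)
    h[0], h[-1] = h0, h1
    h[1:-1] = np.linalg.solve(A, b)
    return x, h

# Manufactured solution h = sin(pi x) with K = 1, so f = -pi^2 sin(pi x)
x, h = fdm_darcy_1d(lambda x: np.ones_like(x),
                    lambda x: -np.pi ** 2 * np.sin(np.pi * x),
                    h0=0.0, h1=0.0)
```

The scheme is second-order accurate, so halving dx cuts the error roughly by four, which is the trade-off between accuracy and computation time noted above.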

Model validation in different dimensions
Now we solve the modified Darcy Equation (25) by the DCM method with the PINNs configurations found by the modified NAS model constructed in Section 3; the FDM method of Section 4.4 is added to the comparison to test the feasibility of our approach. We fix σ² to 0.1 and N to 1000, and observe the hydraulic head and velocity distributions inside the physical domain. In order to demonstrate the adaptability of our algorithm, we construct a new set of manufactured solutions; their specific forms and source terms are detailed in Appendix B.

One dimensional case model validation
The manufactured solution used for the 1D case is Equation (B.1). We validate the two methods by comparing the hydraulic head in the x-direction over the interval [0, 25]. It is clear from Figure 26 that, in one dimension, both methods approximate the hydraulic head of the partial differential equation particularly well.

Two dimensional case model validation
The manufactured solution used for the 2D case is Equation (B.9). We validate the two methods by comparing the hydraulic head and velocity calculated in the x-direction, along the midline y = 10, over the interval. As seen from Figure 27, for this two-dimensional case, both methods match the exact solution well in computing the hydraulic head. But as shown in Figure 28, the FDM simulation of v_x is a bit off, while the results predicted by the proposed method still agree well.

Three dimensional case model validation
The manufactured solution used for the 3D case is Equation (B.16). We validate the two methods by comparing the hydraulic head and velocity in the x-direction, along the line y = 1, z = 0.5, over the interval. The situation is quite different in the 3D case: at the chosen mesh density, the results obtained with the FDM diverge far from the actual situation and do not fit at all, while the results predicted by the DCM are still in good agreement with the exact solution. The specific accuracies and calculation times for the two methods are shown in Tables 13 and 14. The results show that the higher the dimensionality, the more pronounced the difference between the two methods. Predictably, while the FDM results become more accurate as the mesh is refined, this takes an exceptionally long time and has little value for practical applications. For the proposed method, the time reduction from transfer learning is smaller in the one-dimensional case, where its computational time is even on a par with the FDM. For higher dimensions, however, the computational time with transfer learning is greatly reduced, with the accuracy even slightly improved.
The contours of the hydraulic head and velocity for the three-dimensional case obtained with the proposed method are shown below. They demonstrate that the proposed method agrees well with the adopted exact solution and that the Gaussian random field with Gaussian correlation function fits the PINNs based model very well. The isosurface diagrams of the predicted head and velocity in the 3D case are also shown below. We can see from Figures 30-33 that our model performs the task of fitting the modified Darcy equation in the three-dimensional case very well, with higher precision and less computational time compared with the FDM.

Conclusion
In this paper, we describe the composition of a physics-informed modified NAS model fitted to the deep collocation method with sensitivity analysis and transfer learning. The random spectral method in closed form is adopted for the generation of log-normal hydraulic conductivity fields. The model was calibrated computationally to generate the heterogeneous hydraulic conductivity field with a Gaussian correlation function, and by sensitivity analysis and comparison of hyper-parameter optimizers, the Bayesian algorithm was identified as the most suitable optimizer for the search strategy in the NAS model. Further, a three-dimensional benchmark for groundwater flow in highly heterogeneous aquifers is constructed and used for the verification of the proposed method. By introducing transfer learning, the training time is greatly reduced and the accuracy improved.
Since no feature engineering is involved in our physics-informed model, the modified NAS based deep collocation method can be considered as truly automated "meshfree" method and can be used to approximate any continuous function, which is verified to be very suitable for the analysis of groundwater flow problems. The automated deep collocation method is very simple in implementation, which can be further applied in a wide variety of engineering problems.
Our numerical experiments clearly demonstrate that the modified NAS based deep collocation method, with randomly distributed collocation points and physics-informed neural networks, performs very well with a combined MSE loss function minimized by the two-step Adam and L-BFGS optimizer. To achieve the same accuracy, the FDM requires a tighter grid; although the FDM takes only 2.8 s in the one-dimensional case, the proposed method with transfer learning is much faster and much more accurate in higher dimensions, especially in 3D. This lends more credence to extending the proposed approach to more complex practical situations. Most importantly, once the deep neural networks are trained, they can be used to evaluate the solution at any desired points with minimal additional computation time.
The k in the two-dimensional case can be derived by analogy from the above inference as k = (1/√(2π))(µ₁/λ₁, µ₂/λ₂). For exponential correlations in the two-dimensional case, the PDF of k can be transformed into the following form. A possible way to calculate the cumulative distribution function (CDF) is the transformation from Cartesian into polar coordinates, i.e. a representation like k₁ = r cos(2πĥ)/λ₁, k₂ = r sin(2πĥ)/λ₂.