1 Introduction

In recent years, groundwater pollution has become one of the most pressing environmental problems worldwide. Protecting groundwater quality requires predicting groundwater flow and solute transport, which in turn rests on the theory of porous media. The heterogeneity and complexity of porous media still pose a major challenge for such groundwater flow problems. The hydraulic conductivity describes the ability of the medium to transmit fluid through its pore spaces. Because of this intrinsic complexity, it is common to describe the porous medium by random fields with a given statistical structure [1]. Freeze [2] showed that hydraulic conductivity fields are well characterized by a log-normal distribution, an approach often used for flow analysis in saturated zones [3]. Both Gaussian [4] and exponential [5] correlations are commonly chosen for the log-normal probability distribution.

Different approaches have been used to express the permeability as a function of pore structure parameters [6,7,8]. Analytical spectral representation methods were first used by Bakr [9] to solve the stochastic flow and solute transport equations perturbed with a random hydraulic conductivity field. If a random field is homogeneous and has zero mean, it can always be represented by a Fourier (or Fourier–Stieltjes) decomposition into essentially uncorrelated random components. These random components in turn yield the spectral density function, i.e. the distribution of variance over the wave numbers \(\textit{k}\). This theory is widely used to construct hydraulic conductivity fields, and a number of construction methods have been derived, such as the turning bands method [10], the HYDRO_GEN method [11] and the Kraichnan algorithm [12]. Ababou et al. [13] used the turning bands method and narrowed down a range of relevant parameters. Wörmann and Kronnäs [13] tested a gradual increase of the heterogeneity of the flow resistance and compared the numerically simulated residence time PDF with the observed one based on the HYDRO_GEN method. Unlike the other two methods, Kraichnan proposed an approximation algorithm in which the accuracy of the random field is controlled directly by the number of modes for a given variance of the log-hydraulic conductivity random field. Inspired by these results, Kolyukhin and Sabelfeld [1] constructed a randomized spectral model (RSM) studying steady flow in porous media in 3D assuming small fluctuations. We adopt this approach here to generate the hydraulic conductivity fields.

Deep learning methods have become very popular since the development of deep neural networks (DNNs) [14, 15]. They have been applied to a variety of problems including image processing [16], object detection [17] and speech recognition [18], to name just a few. While the majority of applications employ DNNs as regression models, there has been growing interest in exploiting DNNs for the solution of PDEs. Mills et al., for instance, solved the Schrödinger equation with convolutional neural networks by directly learning the mapping between the potential and the energy [19]. Weinan E et al. presented deep learning-based numerical methods for high-dimensional parabolic PDEs and backward stochastic differential equations [20, 21]. Raissi et al. devised a machine learning approach for the solution of linear and nonlinear differential equations using Gaussian processes [22]. In [23, 24], they presented a so-called physics-informed neural network for the supervised learning of nonlinear partial differential equations, solving for instance Burgers' equation and the Navier–Stokes equations. Beck et al. [25] solved stochastic differential and Kolmogorov equations with neural networks. For several forward and inverse problems in solid mechanics, we presented a deep collocation method (DCM) [26, 27]. Instead of exploiting the strong form of the boundary value problem (BVP), we also presented a deep energy method (DEM) [28,29,30], which requires the definition of an energy potential instead of a BVP.

In a typical machine learning application, the practitioner must apply appropriate data pre-processing, feature engineering, feature extraction and feature selection methods to make the dataset suitable for machine learning. Following these pre-processing steps, practitioners must perform algorithm selection and hyper-parameter optimization to maximize the predictive performance of their final model. The physics-informed neural network (PINN) models discussed in the previous paragraph are no exception, although the randomly distributed collocation points are generated without the need for data pre-processing. Most of the time spent on PINN models is related to the tuning of the neural architecture configuration, which strongly influences the accuracy and stability of the approach. Since many of these steps are typically beyond the capabilities of non-experts, automated machine learning (AutoML) has become popular. The oldest AutoML library is AutoWEKA [31], first released in 2013, which automatically selects models and hyper-parameters. Other notable AutoML libraries include auto-sklearn [32], H2O AutoML [33] and TPOT [34]. Neural architecture search (NAS) [35] is a technique for the automatic design of neural networks, which allows algorithms to automatically design high-performance network structures based on sample sets. NAS aims to find configurations comparable to those of human experts on certain tasks and even to discover network structures not previously proposed by humans, which can effectively reduce the cost of using and implementing neural networks. In [36], the authors added a controller to the efficient NAS (ENAS), which learns to discover neural network architectures by searching for an optimal subgraph within a large computational graph. They used parameter sharing between the subgraphs to speed up the computing process. The controller decides which parameter matrices are used by choosing the previous indices; therefore, in ENAS, all recurrent cells in a search space share the same set of parameters. Liu [37] used a sequential model-based optimization (SMBO) strategy to learn a surrogate model that guides the search through the structure space. To build a NAS model, it is necessary to reduce the dimensionality of the search space and identify valid parameter bounds in order to limit the computation involved in auto-tuning. A global sensitivity analysis can be used to identify valid regions in the search space and subsequently decrease its dimensionality [38], which can serve as a starting point for an efficient calibration process.

The remainder of this paper is organized as follows. In Section 2, we describe the physical model of the groundwater flow problem, the randomized spectral method used to generate hydraulic conductivity fields, and the method of manufactured solutions used to verify the accuracy of our model. In Section 3, we introduce the neural architecture search model. We subsequently present an efficient sensitivity analysis and compare several hyper-parameter optimizers to find an accurate and efficient search method. In Section 4, we briefly describe the employed finite difference method, which is used to solve several benchmark problems for comparison. Finally, conclusions are drawn in Section 5.

2 Stochastic analysis of a heterogeneous porous medium

2.1 Darcy equation for groundwater flow problem

Consider the continuity equation for steady-state aquifer flow in a porous medium governed by Darcy's law:

$$\begin{aligned} {\varvec{q}}({\varvec{x}})=-K({\varvec{x}})\nabla (h({\varvec{x}})), \end{aligned}$$
(1)

where \({\varvec{q}}\) is the Darcy velocity, K the hydraulic conductivity and h the hydraulic head, \(h=H+\delta h\), with mean H and perturbation \(\delta h\). To describe the variation of the hydraulic conductivity as a function of the position vector \({\varvec{x}}\), it is convenient to introduce the variable

$$\begin{aligned} Y({\varvec{x}})=\ln {K({\varvec{x}})}, \end{aligned}$$
(2)

where \(Y({\varvec{x}})\) is the hydraulic log-conductivity with the mean \(\langle Y \rangle \) and perturbation \(Y'({\varvec{x}})\):

$$\begin{aligned} Y({\varvec{x}})=\langle Y \rangle +Y'({\varvec{x}}), \end{aligned}$$
(3)

with \(E[Y'({\varvec{x}})]=0\), and \(Y({\varvec{x}})\) is taken to be a three-dimensional statistically homogeneous random field characterized by its correlation function

$$\begin{aligned} C_Y({\varvec{r}}) = \langle Y'({\varvec{x}} + {\varvec{r}})Y'({\varvec{x}}) \rangle , \end{aligned}$$
(4)

where \({\varvec{r}}\) is the separation vector. According to the conservation equation \(\nabla \cdot {\varvec{q}} = 0\), Equation (1) can be rewritten in the following form:

$$\begin{aligned} E(h)=\sum _{j=1}^{N}\frac{\partial }{\partial x_j}\left( K({\varvec{x}})\frac{\partial h}{\partial x_j}\right) =0, \end{aligned}$$
(5)

which is subject to the Dirichlet and Neumann boundary conditions

$$\begin{aligned} \begin{aligned} h({\varvec{x}})={\bar{h}}, {\varvec{x}} \in \tau _D,\\ q_n({\varvec{x}})={\bar{q}}_n, {\varvec{x}} \in \tau _N. \end{aligned} \end{aligned}$$
(6)

with N denoting the dimension. The groundwater flow problem thus reduces to finding a solution h such that Equations (5) and (6) hold; E is an operator that maps elements of the vector space H to the vector space V:

$$\begin{aligned} E:H\rightarrow V, \quad \text {with}\; h\in H. \end{aligned}$$
(7)

With Equation (5) and \(N=3\) in the domain \(D=[0,L_x]\times [0,L_y]\times [0,L_z]\), the Dirichlet and Neumann boundary conditions can be assumed as follows:

$$\begin{aligned} \left\{ \begin{array}{lr} h(0, y, z) = -J\cdot L_x, \quad h(L_x, y, z) = 0, &{}\forall y \in [0, L_y], z \in [0, L_z],\\ \frac{\partial h}{\partial y}(x,0,z)= \frac{\partial h}{\partial y}(x,L_y,z)=0, &{}\forall x \in [0, L_x],z \in [0, L_z],\\ \frac{\partial h}{\partial z}(x,y,0)= \frac{\partial h}{\partial z}(x,y,L_z)=0, &{} \forall x \in [0, L_x],y \in [0, L_y], \end{array} \right. \end{aligned}$$
(8)

where J is the mean slope of the hydraulic head in the x direction [9]. As suggested by Ababou [13], the scale of the fluctuations should be significantly smaller than the scale of the domain: the lengths \(L_x, L_y, L_z\) of the domain are usually set to be ten times larger than the correlation length \(\uplambda \). A reasonable mesh size \(\Delta x\) then satisfies

$$\begin{aligned} \frac{\Delta x}{\uplambda } \le \frac{1}{5}. \end{aligned}$$
(9)

As \(Y'\) is homogeneous and isotropic, we consider two correlation functions: the exponential correlation function [39],

$$\begin{aligned} C_Y({\varvec{r}})=\sigma _Y^2\exp \left( -\frac{\left|{\varvec{r}}\right|}{\uplambda }\right) , \end{aligned}$$
(10)

and the Gaussian correlation function [40],

$$\begin{aligned} C_Y({\varvec{r}})=\sigma _Y^2\exp \left( -\frac{\left|{\varvec{r}}\right|^2}{\uplambda ^2}\right) , \end{aligned}$$
(11)

where \(\uplambda \) is the log conductivity correlation length scale.

2.2 Generation of the hydraulic conductivity fields

Due to the intrinsic complexity of heterogeneous porous media, random field theory is employed to generate heterogeneous fields exhibiting a fractal behavior. By the Wiener–Khinchin theorem, the spectral density \(S({\varvec{k}})\) of the Gaussian random field [41] is the Fourier transform of the correlation function (Equation (4)):

$$\begin{aligned}&C_Y({\varvec{r}})=\int _{{\mathbb {R}}^N} e^{i2\pi {\varvec{k}}\cdot {\varvec{r}}}S({\varvec{k}})\,\mathrm {d}{\varvec{k}}, \end{aligned}$$
(12)
$$\begin{aligned}&S({\varvec{k}})=\int _{{\mathbb {R}}^N} e^{-i2\pi {\varvec{k}}\cdot {\varvec{r}}}C_{Y}({\varvec{r}})\,\mathrm {d}{\varvec{r}}, \end{aligned}$$
(13)

with \(S({\varvec{k}})\) the spectral function of the random field \(Y'({\varvec{x}})\) and

$$\begin{aligned}&{\mathscr {F}}\left( \exp \left( -\frac{\left|{\varvec{r}}\right|}{\uplambda }\right) \right) =\frac{2\uplambda }{1+4\pi ^2{\varvec{k}}^2\uplambda ^2}, \end{aligned}$$
(14)
$$\begin{aligned}&{\mathscr {F}}\left( \exp \left( -\frac{\left|{\varvec{r}}\right|^2}{\uplambda ^2}\right) \right) =\uplambda \sqrt{\pi }e^{-\pi ^2{\varvec{k}}^2\uplambda ^2}. \end{aligned}$$
(15)
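These transform pairs, here written for a single dimension, can be checked symbolically, for instance with SymPy, whose `fourier_transform` uses the same \(e^{-i2\pi kr}\) kernel as Eq. (13); the following is a minimal sanity check and the variable names are ours:

```python
# Symbolic check of the transform pairs (14)-(15); SymPy's fourier_transform
# uses the e^{-2*pi*i*k*r} kernel, matching the convention of Eq. (13).
import sympy as sp

r, k = sp.symbols('r k', real=True)
lam = sp.symbols('lambda', positive=True)

# Exponential correlation -> 2*lam / (1 + 4*pi**2*k**2*lam**2), Eq. (14)
print(sp.fourier_transform(sp.exp(-sp.Abs(r) / lam), r, k))

# Gaussian correlation -> sqrt(pi)*lam*exp(-pi**2*k**2*lam**2), Eq. (15)
print(sp.fourier_transform(sp.exp(-r**2 / lam**2), r, k))
```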

Substituting Eqs. (10), (11), (14) and (15) into Eq. (13), the spectral functions for the exponential and the Gaussian correlation can be derived, with d denoting the spatial dimension:

$$\begin{aligned}&S({\varvec{k}},\uplambda )=\sigma _Y^2 \uplambda ^d (1+(2\pi {\varvec{k}}\uplambda )^2)^{-\frac{d+1}{2}}, \end{aligned}$$
(16)
$$\begin{aligned}&S({\varvec{k}},\uplambda )=\sigma _Y^2 \pi ^{d/2}\uplambda ^d e^{-(\pi {\varvec{k}}\uplambda )^2}. \end{aligned}$$
(17)

In the general case, a Gaussian homogeneous random field can be generated as [42]:

$$\begin{aligned} Y'({\varvec{x}}) = \sqrt{\frac{2\sigma ^2}{N}}\sum _{i=1}^{N} \big (\xi _{1i} \cos (2\pi {\varvec{k}}_i\cdot {\varvec{x}})+\xi _{2i} \sin (2\pi {\varvec{k}}_i\cdot {\varvec{x}})\big ), \end{aligned}$$
(18)

where \(\xi _{1i}\) and \(\xi _{2i}\) are mutually independent standard Gaussian random variables. For the random wave vector \({\varvec{k}}_i\), the probability density function \(p({\varvec{k}})\) follows from the normalized spectral density, and its cumulative distribution function (\(\textit{cdf}\)) is \(F({\varvec{k}})=\int _{-\infty }^{{\varvec{k}}}p(x)\,\mathrm {d}x\). Sampling a uniformly distributed random variable \(\theta \) and applying the inverse function \({\varvec{k}} = F^{-1}(\theta )\) then yields samples that obey the \(p({\varvec{k}})\) distribution. More details can be found in Appendix B, and the associated python script is summarized in Appendix C. Figures 1 and 2 show two- and three-dimensional random fields with fixed \(\langle K\rangle = 15\,m/day\), \(\sigma ^2 = 0.1\) and \(N= 1000\).
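For illustration, a minimal NumPy sketch of the generator in Eq. (18) for the Gaussian correlation (11) is given below. For this spectrum the inverse-cdf sampling has a closed form, since each component of \({\varvec{k}}_i\) is normally distributed with standard deviation \(1/(\sqrt{2}\pi \uplambda )\). Function names and defaults are ours, and the small-variance correction of \(\langle Y\rangle \) discussed in Section 2.3 is omitted for brevity:

```python
# Randomized spectral (Kraichnan-type) sampling of Y'(x), Eq. (18), for the
# Gaussian correlation; the concrete fields of the paper are in Appendix C.
import numpy as np

def gaussian_log_conductivity(x, lam=1.0, sigma2=0.1, n_modes=1000, seed=0):
    """Evaluate Y'(x) at points x of shape (n_points, dim)."""
    rng = np.random.default_rng(seed)
    dim = x.shape[1]
    # Spectrum (17) normalized to a pdf: each k-component ~ N(0, 1/(2 pi^2 lam^2))
    k = rng.normal(0.0, 1.0 / (np.sqrt(2.0) * np.pi * lam), size=(n_modes, dim))
    xi1 = rng.normal(size=n_modes)
    xi2 = rng.normal(size=n_modes)
    phase = 2.0 * np.pi * x @ k.T                    # shape (n_points, n_modes)
    return np.sqrt(2.0 * sigma2 / n_modes) * (
        xi1 * np.cos(phase) + xi2 * np.sin(phase)).sum(axis=1)

# Hydraulic conductivity, Eq. (2): K(x) = exp(<Y> + Y'(x)), here with <K> = 15
pts = np.random.rand(500, 2) * 20.0                  # points in [0, 20]^2
K = 15.0 * np.exp(gaussian_log_conductivity(pts))
```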

Fig. 1 Two-dimensional hydraulic conductivity field with (a) exponential correlation and (b) Gaussian correlation

Fig. 2 Three-dimensional hydraulic conductivity field with (a) exponential correlation and (b) Gaussian correlation

2.3 Defining the numerical experimental model

After determining the expressions for the stochastic flow analysis in heterogeneous porous media, we need to set the geometric and physical parameters. The number of modes N and the variance \(\sigma ^2\) govern the generated hydraulic conductivity. For the exponential correlation, large values of N might lead to a non-differentiable K-field [43]; we set N to 500, 1000 and 2000. The variance \(\sigma ^2\) determines the heterogeneity of the hydraulic conductivity, a larger \(\sigma ^2\) indicating a larger heterogeneity. In real geological formations, \(\sigma ^2\) varies widely. As summarized in Sudicky's study [44], \(\sigma ^2 = 0.29\) in the weakly heterogeneous Canadian Forces Base Borden aquifer and 0.14 at Cape Cod, but \(\sigma ^2 = 4.5\) in the highly heterogeneous Columbus aquifer. First-order analysis [9] has proven a solid basis for predictions, and numerical simulations [11] indicate that first-order results are robust and applicable when \(\sigma ^2\) is close to and even above 1. With this approximation, we get \(e^{\langle Y\rangle }=\langle K\rangle \exp (-\sigma ^2/2)\) for the one- and two-dimensional cases [45] and \(e^{\langle Y\rangle }=\langle K\rangle \exp (-\sigma ^2/6)\) for the three-dimensional case [46]. In this paper, we set \(\sigma ^2\) to 0.1, 1 and 3, covering small, medium and large heterogeneity. The mean hydraulic conductivity is fixed to \(\langle K\rangle = 15\,m/day\), a value representative of gravel or coarse sand aquifers [47]. All correlation lengths in the one- and two-dimensional cases equal 1 m; in the three-dimensional case, we set them to \(\uplambda _1=0.5\,m\), \(\uplambda _2=0.2\,m\) and \(\uplambda _3=0.1\,m\). Based on the above settings, we finalize our test domains:

  • One-dimensional groundwater flow \(\rightarrow [0,25]\).

  • Two-dimensional groundwater flow \(\rightarrow [0,20]\times [0,20]\).

  • Three-dimensional groundwater flow \(\rightarrow [0,5]\times [0,2]\times [0,1]\).

2.4 Manufactured solutions

To verify the accuracy of our model and obtain an error estimate, we use the method of manufactured solutions (MMS), which provides a general procedure for generating analytical solutions [48]. Malaya et al. [49] discussed the method of manufactured solutions for constructing an error estimator in solution verification, where one simulates the phenomenon of interest with no a priori knowledge of the solution. An artificial solution is chosen and substituted into the equations. Since the chosen function is unlikely to be an exact solution of the original partial differential equations, a residual term remains, which can then be added as a source term. With MMS, the original problem of finding the solution of Equation (5) is thus changed to the following form:

$$\begin{aligned} E({\hat{h}})=\sum _{j=1}^{N}\left( \frac{\partial }{\partial x_j}\left( K({\varvec{x}})\frac{\partial {\hat{h}}}{\partial x_j}\right) \right) =\sum _{j=1}^{N}f_j=f. \end{aligned}$$
(19)

For the operator \(E({\hat{h}})\), we now obtain a source term f. Adding this source term to the original governing equation E yields a slightly modified governing equation:

$$\begin{aligned} E'({\hat{h}})=E({\hat{h}})-f=0, \end{aligned}$$
(20)

which is solved by the manufactured solution \({\hat{h}}\). The Neumann and Dirichlet boundary conditions are thus modified as follows:

$$\begin{aligned} \begin{aligned} {\hat{h}}({\varvec{x}})&={\hat{h}}_{MMS}({\varvec{x}}), {\varvec{x}} \in \tau _D,\\ {\hat{q}}_n({\varvec{x}})&=-K({\varvec{x}}){\hat{h}}_{MMS,n}({\varvec{x}}), {\varvec{x}} \in \tau _N. \end{aligned} \end{aligned}$$
(21)

We adopt the form of the manufactured solution mentioned in Tremblay’s study [48],

$$\begin{aligned} {\hat{h}}_{MMS}({\varvec{x}})=a_0+\sin \left( \sum _{j=1}^{N}a_j x_j\right) , \end{aligned}$$
(22)

where \(\left\{ a_i \right\} \) are arbitrary non-zero real numbers. Applying the manufactured solution (22) to the left side of Eq. (5) yields the source term f,

$$\begin{aligned} f(x_j)=a_j \frac{\partial K({\varvec{x}})}{\partial x_j} \cos \left( \sum _{i=1}^{N}a_i x_i\right) -a_j^2 K({\varvec{x}}) \sin \left( \sum _{i=1}^{N}a_i x_i\right) . \end{aligned}$$
(23)

To verify the adaptability of our model to different solutions, we also used another form of manufactured solution [49],

$$\begin{aligned} {\hat{h}}_{MMS}({\varvec{x}})=a_0+\sum _{j=1}^{N}\sin ( a_j x_j), \end{aligned}$$
(24)

where the parameter values are the same as in Equation (22). We can get the source term as follows:

$$\begin{aligned} f(x_j)=a_j \frac{\partial K({\varvec{x}})}{\partial x_j} \cos (a_j x_j)-a_j^2 K({\varvec{x}}) \sin (a_j x_j). \end{aligned}$$
(25)

This leads to the change of the boundary conditions from Equation (6) to

$$\begin{aligned} \left\{ \begin{array}{lr} {\hat{h}}(0, y, z) ={\hat{h}}_{MMS}(0,y,z), &{}\forall y \in [0, L_y], z \in [0, L_z]\\ {\hat{h}}(L_x, y, z) = {\hat{h}}_{MMS}(L_x,y,z), &{}\forall y \in [0, L_y], z \in [0, L_z]\\ \frac{\partial {\hat{h}}}{\partial y}(x,0,z)= \frac{\partial {\hat{h}}_{MMS}}{\partial y}(x,0,z), &{}\forall x \in [0, L_x],z \in [0, L_z] \\ \frac{\partial {\hat{h}}}{\partial y}(x,L_y,z)= \frac{\partial {\hat{h}}_{MMS}}{\partial y}(x,L_y,z), &{}\forall x \in [0, L_x],z \in [0, L_z] \\ \frac{\partial {\hat{h}}}{\partial z}(x,y,0)= \frac{\partial {\hat{h}}_{MMS}}{\partial z}(x,y,0), &{} \forall x \in [0, L_x],y \in [0, L_y]\\ \frac{\partial {\hat{h}}}{\partial z}(x,y,L_z)= \frac{\partial {\hat{h}}_{MMS}}{\partial z}(x,y,L_z), &{} \forall x \in [0, L_x],y \in [0, L_y] \end{array} \right. \end{aligned}$$
(26)

These source terms can be used as a physical law describing the system, and also as a basis for evaluating the neural networks. The specific forms of the constructed solutions and source terms f used in this paper are given in Appendix C.
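As a sketch of how source terms such as (23) and (25) follow mechanically from a chosen manufactured head, the derivatives in Eq. (19) can be delegated to a computer algebra system; here K is left symbolic and the 2D instance of Eq. (22) is used (the concrete fields are in Appendix C):

```python
# Symbolic derivation of the MMS source term f = sum_j d/dx_j(K dh/dx_j),
# Eq. (19), for the manufactured head of Eq. (22) in two dimensions.
import sympy as sp

x, y = sp.symbols('x y', real=True)
a0, a1, a2 = sp.symbols('a0 a1 a2', nonzero=True)
K = sp.Function('K')(x, y)                       # heterogeneous conductivity

h_mms = a0 + sp.sin(a1 * x + a2 * y)             # manufactured solution (22)
f = sum(sp.diff(K * sp.diff(h_mms, v), v) for v in (x, y))
print(sp.simplify(f))                            # reproduces the form of (23)
```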

3 Deep learning-based neural architecture search method

3.1 Modified neural architecture search (NAS) model

The conventional NAS approach has three main components [35]: a collection of candidate neural network structures called the search space, the search strategy and the performance evaluation. Inspired by Park [50], we construct the system configuration of the NAS fitted to the PINN model in Fig. 3. It consists of a sensitivity analysis (SA), search methods and a physics-informed neural network (NN) generator, and it eventually outputs the optimal neural architecture configuration with the corresponding weights and biases. A transfer learning model is then built based on these weights, biases and the selected neural network configuration.

Fig. 3 Overall methodology for NAS

3.1.1 Components of conventional NAS

As already pointed out, the main components of the conventional neural architecture search method are

  • Search Space. The search space defines which architectures can be represented. Combined with a priori knowledge of the typical properties of architectures well suited to the underlying task, this can reduce the size of the search space and simplify the search. For the model in this study, the a priori knowledge of the search space is gained from the global sensitivity analysis. Figure 4b shows a common global search space with a chain structure. The chain-structured neural network architecture can be written as a sequence of n layers, where the ith layer \(L_i\) receives its input from layer \(i-1\) and its output is used as input for layer \(i+1\):

    $$\begin{aligned} output = L_n\odot L_{n-1}\odot ... L_1\odot L_0, \end{aligned}$$
    (27)

    where \(\odot \) denotes the layer operations.

  • Search Method. The search method is an initial filtering step narrowing down the search space. In this paper, hyperparameter optimizers are used. The choice of the search space largely determines the difficulty of the optimization problem, which may remain (i) discontinuous and (ii) high-dimensional. Thus, some prior knowledge of the model features is needed.

  • Performance Estimation Strategy. The simplest option for a performance estimation strategy is standard training and validation of the data for the architecture. As pointed out in Section 2.4, we define the relative error with respect to the manufactured solution as the performance estimation strategy:

    $$\begin{aligned} \delta h=\frac{\Vert {\hat{h}}-{\hat{h}}_{MMS}\Vert _2}{\Vert {\hat{h}}_{MMS}\Vert _2}. \end{aligned}$$
    (28)
Fig. 4 (a) Abstract illustration of NAS methods and (b) search space

3.1.2 Modified NAS

For the modified model shown in Fig. 3, the NAS is divided into four main phases. First, a sensitivity analysis constructs the search space with less human expert knowledge. Second, we test several optimization strategies, including the random search method, Bayesian optimization, Hyperband optimization and Jaya optimization. The third phase is the neural network generation, i.e. the generation of physics-informed deep neural networks tailored to the mechanical model based on the information from the optimization. The final phase is training and validation of the generated architectures, which outputs the performance estimate; a suitable estimate is the relative error in Eq. (28).
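A high-level sketch of these four phases reads as follows; every function name here is a placeholder for the components described in this section:

```python
# Skeleton of the modified NAS loop of Fig. 3 (all helpers are placeholders).
def nas_for_pinn(problem, raw_space):
    space = sensitivity_analysis(raw_space)       # phase 1: shrink search space
    best = None
    for config in search_method(space):           # phase 2: e.g. Bayesian opt.
        pinn = generate_pinn(problem, config)     # phase 3: PINN generator
        error = train_and_validate(pinn)          # phase 4: estimate via Eq. (28)
        if best is None or error < best[1]:
            best = (config, error)
    return best                                   # optimal configuration + error
```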

3.2 Neural networks generator

Mathematicians have developed many tools to approximate functions, such as interpolation theory, spectral methods and finite elements. From the perspective of approximation theory, a neural network can be viewed as a nonlinear smooth function approximator: using the neural network (NN), we obtain an output value that reflects the quality, validity, etc. of the input data, adjust the configuration of the neural network based on this result, recalculate the result and repeat these steps until the target is reached. Physics-informed neural networks, on the other hand, add physical conservation laws and prior physical knowledge to the existing neural network; they require substantially less training data and can result in simpler neural network structures while achieving high accuracy. A diagram of the structure is shown in Fig. 5. In this section, we formulate the generation of physics-informed neural networks in two steps: first, the deep neural network is introduced as a universal smooth approximator; second, a simple and generalized way to incorporate the physics of flow in heterogeneous media into the deep neural network is presented.

Fig. 5 Physics-informed neural network

3.2.1 Physics-informed neural network

Physics-informed neural network generators include a neural network interpreter, which represents the configuration of the NN, and physical information checkers. The neural network interpreter consists of a deep neural network with multiple layers: the input layer, one or more hidden layers and the output layer. Each layer consists of one or more nodes called neurons, shown in Fig. 5 as small coloured circles, which are the basic units of computation. In an interconnected structure, any two neurons in neighbouring layers are connected, and each connection is represented by a weight, see Fig. 5. Mathematically, the output of a node is computed by

$$\begin{aligned} y_{i}=\sigma _i\left( \sum _{j} w_{j}^{i}z_{j}^{i}+b^{i}\right) \end{aligned}$$
(29)

with input \(z^{i}\), weight \(w^{i}\), bias \(b^{i}\) and activation function \(\sigma _i\). Now let us define:

Definition 3.1

(Feedforward Neural Network) A generalized neural network can be written in tuple form \(\left( (f_1,\sigma _1),...,(f_n,\sigma _n)\right) \), \(f_i\) being an affine-linear function \((f_i = W_i{\varvec{{x}}}+b_i)\) that maps \(R^{i-1} \rightarrow R^{i}\) and \(\sigma _i\) an activation mapping \(R^{i} \rightarrow R^{i}\). The tuple form defines a continuous bounded function mapping \(R^{d}\) to \(R^{n}\):

$$\begin{aligned} FNN: {\mathbb {R}}^d \rightarrow {\mathbb {R}}^n, \; \text {with}\; \; F^n\left( {\varvec{{x}}};\theta \right) = \sigma _n\circ f_n \circ \cdots \circ \sigma _1 \circ f_1, \end{aligned}$$
(30)

where d is the dimension of the input, n the number of field variables, \(\theta \) the set of trainable parameters such as weights and biases, and \(\circ \) denotes the element-wise composition.
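The tuple form of Definition 3.1 translates directly into code; below is a minimal NumPy sketch (names are ours), using a linear output layer as is common for regression-type networks:

```python
# Composition sigma_n o f_n o ... o sigma_1 o f_1 of Eq. (30).
import numpy as np

def fnn(x, params, activation=np.tanh):
    """params: list of (W, b) pairs; x: array of shape (batch, d)."""
    for W, b in params[:-1]:
        x = activation(x @ W + b)   # hidden layer: activation(affine map)
    W, b = params[-1]
    return x @ W + b                # linear output layer
```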

The universal approximation theorem [51, 52] states that this continuous bounded function F with nonlinear activation \(\sigma \) can capture the smooth, nonlinear behaviour of the system. Accordingly, the following theorem holds [53]:

Theorem 1

If \(\sigma ^i \in C^m(R^i)\) is nonconstant and bounded, then \(F^n\) is uniformly m-dense in \(C^m(R^n)\).

3.2.2 Deep collocation method

The collocation method is a widely used technique for the numerical solution of partial differential and integral equations [54], and a popular method for trajectory optimization in control theory. A set of randomly distributed points (also known as collocation points) represents a desired trajectory that minimizes the loss function while satisfying a set of constraints. The collocation method is relatively insensitive to instabilities (such as exploding/vanishing gradients in neural networks) and is a viable way to train deep neural networks [55].

The modified Darcy equation (19) reduces to the solution of a second-order differential equation with boundary constraints. Hence, we first discretize the physical domain with collocation points denoted by \({\varvec{{x}}}\,_\Omega =(x_1,...,x_{N_\Omega })^T\). Another set of collocation points, denoted by \({\varvec{{x}}}\,_\Gamma =(x_1,...,x_{N_\Gamma })^T\), is employed to discretize the boundary conditions. The hydraulic head \({\hat{h}}\) is then approximated with the aforementioned deep feedforward neural network \({\hat{h}}^h ({\varvec{x}};\theta )\). A loss function can thus be constructed to find the approximate solution \({\hat{h}}^h \left( {\varvec{x}};\theta \right) \) by minimizing the residual of the governing equation subject to the boundary conditions. Substituting \({\hat{h}}^h \left( {\varvec{x}}\,_\Omega ;\theta \right) \) into the governing equation, we obtain

$$\begin{aligned} E'\left( {\varvec{{x}}}\,_\Omega ;\theta \right) =K({\varvec{x}}){\hat{h}}_{,ii}^{h}\left( {\varvec{{x}}}\,_\Omega ;\theta \right) +K_{,i}({\varvec{x}}){\hat{h}}^{h}_{,i}\left( {\varvec{{x}}}\,_\Omega ;\theta \right) -f\left( {\varvec{{x}}}\,_\Omega \right) , \end{aligned}$$
(31)

which results in a physics-informed deep neural network \(E'\left( {\varvec{{x}}}\,_\Omega ;\theta \right) \). The boundary conditions illustrated in Section 2 can also be expressed by the neural network approximation \({\hat{h}}^h \left( {\varvec{{x}}}\,_\Gamma ;\theta \right) \):

On \(\Gamma _{D}\), we have

$$\begin{aligned} {\hat{h}}^h \left( {\varvec{{x}}}\,_{\Gamma _D};\theta \right) ={\hat{h}}_{MMS}\left( {\varvec{{x}}}\,_{\Gamma _D}\right) . \end{aligned}$$
(32)

On \(\Gamma _{N}\),

$$\begin{aligned} {\hat{q}}_n^h \left( {\varvec{{x}}}\,_{\Gamma _N};\theta \right) = -K\left( {\varvec{{x}}}\,_{\Gamma _N}\right) {\hat{h}}_{MMS,n}\left( {\varvec{{x}}}\,_{\Gamma _N}\right) . \end{aligned}$$
(33)

Note that the neural networks \(E'\left( {\varvec{{x}}};\theta \right) \) and \({\hat{q}}_n\left( {\varvec{{x}}};\theta \right) \) share the same parameters as \({\hat{h}}^h \left( {\varvec{{x}}};\theta \right) \). With the generated collocation points in the domain and on the boundaries as training dataset, the field function can be learned by minimizing the mean squared error loss function:

$$\begin{aligned} L\left( \theta \right) =MSE=MSE_{E'}+MSE_{\Gamma _{D}}+MSE_{\Gamma _{N}}, \end{aligned}$$
(34)

with

$$\begin{aligned} \begin{aligned}&MSE_{E'}=\frac{1}{N_d}\sum _{i=1}^{N_d}\begin{Vmatrix} E'\left( {\varvec{{x}}}\,_\Omega ;\theta \right) \end{Vmatrix}^2,\\&MSE_{\Gamma _{D}}=\frac{1}{N_{\Gamma _D}}\sum _{i=1}^{N_{\Gamma _D}}\begin{Vmatrix} {\hat{h}}^h \left( {\varvec{{x}}}\,_{\Gamma _D};\theta \right) -{\hat{h}}_{MMS}\left( {\varvec{{x}}}\,_{\Gamma _D}\right) \end{Vmatrix}^2,\\&MSE_{\Gamma _{N}}=\frac{1}{N_{\Gamma _N}}\sum _{i=1}^{N_{\Gamma _N}}\begin{Vmatrix} {\hat{q}}_n\left( {\varvec{{x}}}\,_{\Gamma _N};\theta \right) +K\left( {\varvec{{x}}}\,_{\Gamma _N}\right) {\hat{h}}_{MMS,n}\left( {\varvec{{x}}}\,_{\Gamma _N}\right) \end{Vmatrix}^2, \end{aligned} \end{aligned}$$
(35)

where \(x\,_\Omega \in {R^N} \) and \(\theta \in {R^K}\) are the collocation points and the neural network parameters, respectively. If \(L\left( \theta \right) = 0\), \({\hat{h}}^h \left( {\varvec{x}};\theta \right) \) is a solution to the hydraulic head. The defined loss function thus measures how well the approximation satisfies the physical law (governing equation) and the boundary conditions. Our goal is to find a set of parameters \(\theta \) such that the approximation \({\hat{h}}^h \left( {\varvec{x}};\theta \right) \) minimizes the loss L. If L is very small, the approximation \({\hat{h}}^h \left( {\varvec{x}};\theta \right) \) closely satisfies the governing equation and boundary conditions, namely

$$\begin{aligned} {\hat{h}}^h = \mathop {\arg \min }_{\theta \in R^K} L\left( \theta \right) . \end{aligned}$$
(36)
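Putting Eqs. (31)-(35) together, the loss assembly can be sketched with automatic differentiation as follows (TensorFlow-style, restricted to the 1D case; the tensor names and the argument layout are our simplifications):

```python
# MSE loss of Eqs. (34)-(35) for the 1D modified Darcy equation (19).
import tensorflow as tf

def dcm_loss(model, x_in, K, dK, f, x_d, h_mms_d, x_n, K_n, dh_mms_n):
    # PDE residual, Eq. (31): K h'' + K' h' - f on interior collocation points
    with tf.GradientTape() as outer:
        outer.watch(x_in)
        with tf.GradientTape() as inner:
            inner.watch(x_in)
            h = model(x_in)
        h_x = inner.gradient(h, x_in)
    h_xx = outer.gradient(h_x, x_in)
    mse_pde = tf.reduce_mean(tf.square(K * h_xx + dK * h_x - f))

    # Dirichlet term, Eq. (32)
    mse_dir = tf.reduce_mean(tf.square(model(x_d) - h_mms_d))

    # Neumann term, Eq. (33): q_n = -K dh/dn compared with -K dh_MMS/dn
    with tf.GradientTape() as tape:
        tape.watch(x_n)
        h_b = model(x_n)
    dh_n = tape.gradient(h_b, x_n)
    mse_neu = tf.reduce_mean(tf.square(-K_n * dh_n + K_n * dh_mms_n))

    return mse_pde + mse_dir + mse_neu              # Eq. (34)
```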

The solution of groundwater flow problems by the deep collocation method thus reduces to an optimization problem. To train the deep feedforward neural network, gradient-descent-based optimization algorithms such as Adam are employed. The idea is to take a descent step at collocation point \({\varvec{{x}}}_{i}\) with Adam-based learning rate \(\alpha _i\),

$$\begin{aligned} \theta _{i+1} = \theta _{i} - \alpha _i \bigtriangledown _{\theta } L \left( {\varvec{{x}}}_i;\theta _i \right) . \end{aligned}$$
(37)

The process in Eq. (37) is repeated until a convergence criterion is satisfied. A combined Adam-L-BFGS-B minimization algorithm is used to train the physics-informed neural networks: the network is first trained with the Adam algorithm, and after a defined number of iterations the loss is further minimized with the L-BFGS-B optimizer under a small limit of iterations.
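The two-stage schedule can be sketched as follows; `get_flat_params` and `set_flat_params` are hypothetical helpers that flatten the network variables into a single vector for SciPy, and all step counts are placeholders:

```python
# Stage 1: Adam steps of Eq. (37); stage 2: L-BFGS-B refinement of the loss.
import numpy as np
import scipy.optimize
import tensorflow as tf

def train(model, loss_fn, adam_steps=2000, lbfgs_maxiter=500):
    opt = tf.keras.optimizers.Adam(learning_rate=1e-3)
    for _ in range(adam_steps):
        with tf.GradientTape() as tape:
            loss = loss_fn(model)
        grads = tape.gradient(loss, model.trainable_variables)
        opt.apply_gradients(zip(grads, model.trainable_variables))

    def value_and_grad(theta):
        set_flat_params(model, theta)                 # hypothetical helper
        with tf.GradientTape() as tape:
            loss = loss_fn(model)
        grads = tape.gradient(loss, model.trainable_variables)
        return (float(loss.numpy()),
                np.concatenate([g.numpy().ravel() for g in grads]))

    scipy.optimize.minimize(value_and_grad, get_flat_params(model), jac=True,
                            method='L-BFGS-B',
                            options={'maxiter': lbfgs_maxiter})
```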

The approximation ability of neural networks for solving partial differential equations has been proven by Sirignano et al. [56]. For the stochastic porous media model, as long as the problem has a unique solution \({\hat{h}} \in C^2(\Omega )\) with uniformly bounded derivatives, and the heterogeneous hydraulic conductivity function \(K({\varvec{{x}}})\) is \(C^{1,1}\) (\(C^1\) with Lipschitz continuous derivative), we can conclude that

$$\begin{aligned} \exists \;{\hat{h}}^h \in F^n \;\; \text {s.t. as}\;\; n\rightarrow \infty ,\;\;L(\theta )\rightarrow 0,\;\;{\hat{h}}^h\rightarrow {\hat{h}}. \end{aligned}$$
(38)

More details can be found in Appendix D and [56].

3.3 Sensitivity analyses (SA)

Sensitivity analysis determines the influence of each parameter of the model on the output; only the most important parameters are then considered in the model calibration process. Parameters that have little or no effect on the model results can be disregarded, which significantly reduces the workload of model calibration [57,58,59]. In this work, the parameter sensitivity analysis contributes to the whole NAS model by offering prior knowledge of the DCM, which helps to reduce the dimension of the search space and further improves the computational efficiency of the optimization method.

Global sensitivity analysis methods can be subdivided into qualitative ones, such as the Morris method [60] and the Fourier amplitude sensitivity test (FAST) [61], and quantitative methods, including the Sobol' method [62] and extended FAST (eFAST) [63]. Scholars have conducted numerous experiments to compare the advantages and disadvantages of the different methods [64,65,66]. The results show that the Sobol' method provides quantitative results for SA but requires a large number of runs to obtain stable results; eFAST is more efficient and stable than the Sobol' method and is thus a good alternative. The method of Morris is able to correctly screen the most and least sensitive parameters of a highly parameterized model with 300 times fewer model evaluations than the Sobol' method. We therefore follow the approach proposed by Crosetto [67]: first test all hyper-parameters with the Morris method, remove the most and the least influential parameters, and then screen the remaining ones again with the eFAST method. This yields the highest accuracy in a relatively small amount of time.
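With the SALib library, this two-step screening might be sketched as follows; `evaluate` stands for one full DCM training run returning the error of Eq. (28), and the parameter names and bounds are our assumptions for the hyper-parameters of Table 8:

```python
# Morris screening followed by eFAST quantification (bounds are assumed).
import numpy as np
from SALib.sample.morris import sample as morris_sample
from SALib.analyze.morris import analyze as morris_analyze
from SALib.sample.fast_sampler import sample as fast_sample
from SALib.analyze.fast import analyze as fast_analyze

problem = {'num_vars': 5,
           'names': ['layers', 'neurons', 'iterations', 'maxls', 'lr'],
           'bounds': [[2, 10], [4, 64], [500, 5000], [10, 100], [1e-4, 1e-2]]}

X = morris_sample(problem, N=50)
Y = np.array([evaluate(x) for x in X])      # hypothetical DCM runs, Eq. (28)
morris_analyze(problem, X, Y)               # screen via mu* and sigma

# Re-screen the parameters retained after Morris with eFAST
problem2 = {'num_vars': 3, 'names': ['neurons', 'iterations', 'lr'],
            'bounds': [[4, 64], [500, 5000], [1e-4, 1e-2]]}
X2 = fast_sample(problem2, N=65)            # eFAST needs N > 4*M^2 per parameter
Y2 = np.array([evaluate(x) for x in X2])
fast_analyze(problem2, Y2)                  # first-order and total indices
```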

3.4 Search methods for NNs

After the sensitivity analysis, the search space is reduced and a suitable search method is employed to explore the space of neural architectures. The search method adopts the performance metric as a reward and learns to generate high-performance architecture candidates. We employ the classical random search method, the Bayesian optimization method [68] and two recently proposed optimization methods, the Hyperband algorithm [69] and the Jaya algorithm [70].
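As a baseline, the random search over the reduced space can be sketched in a few lines; the other optimizers plug into the same evaluate-and-compare loop, and the ranges here are our placeholders:

```python
# Minimal random-search baseline over (layers, neurons).
import random

def random_search(evaluate, n_trials=25, seed=0):
    rng = random.Random(seed)
    best = None
    for _ in range(n_trials):
        config = {'layers': rng.randint(2, 10), 'neurons': rng.randint(4, 64)}
        err = evaluate(config)              # relative error of Eq. (28)
        if best is None or err < best[1]:
            best = (config, err)
    return best
```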

3.5 Transfer learning (TL)

A combined optimizer is adopted for the model training. To improve the computational efficiency and inherit the learned knowledge from the trained model, a transfer learning algorithm is added. Transfer learning stores the knowledge gained while solving one problem and applies it to a different but related problem. The basic transfer learning architecture of this model is shown in Fig. 6; it is composed of a pre-trained model and several fine-tuned models. During the neural architecture search, the optimal neural architecture configuration is obtained through a hyperparameter optimization algorithm, and the corresponding weights and biases are saved. These weights and biases are then transferred to the fine-tuned model. As shown in the numerical example section, this inheritance greatly improves the learning efficiency: for different statistical parameters of the random log-hydraulic conductivity field, there is no need to train the whole model from scratch, and the solution of the modified Darcy equation is obtained with fewer iterations, lower learning rates and higher accuracy.
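A sketch of the weight transfer, assuming a Keras-style model API (the builder, loss functions and step counts are our placeholders):

```python
# Pre-train once, then fine-tune on a new K-field from the inherited weights.
import tensorflow as tf

pretrained = build_pinn(best_config)             # hypothetical PINN builder
train(pretrained, loss_fn_field_A)               # full Adam + L-BFGS-B training
pretrained.save_weights('pretrained.weights.h5')

finetuned = build_pinn(best_config)              # same architecture ...
finetuned.load_weights('pretrained.weights.h5')  # ... inherited weights/biases
train(finetuned, loss_fn_field_B, adam_steps=200)  # new field, fewer steps
```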

Fig. 6 Transfer learning schematic

4 Numerical examples

In this section, numerical examples in different dimensions and with various boundary conditions are studied and compared. First, the influence of the exponential and Gaussian correlation functions is discussed. Next, we filter the algorithm-specific parameters by means of a sensitivity analysis and select the parameters that have the greatest impact on the model as our search space. Then, four different hyperparameter optimization algorithms are compared in both accuracy and efficiency, identifying a trade-off search method for the NAS model. The relative error in Equation (28) between the predicted results and the manufactured solution is used to build the search strategy for the NAS model. These results are then substituted into the PINN, and the results of the PINN model are compared to results obtained with the FDM. All simulations are run on a 64-bit Windows 10 server with an Intel(R) Core(TM) i7-7700HQ CPU and 8 GB memory. The accuracy of the numerical results is compared through the relative error of the hydraulic head defined as

$$\begin{aligned} \delta {\hat{h}}=\frac{\Vert {\hat{h}}_{predict}-{\hat{h}}_{MMS}\Vert }{\Vert {\hat{h}}_{MMS}\Vert } \end{aligned}$$
(39)

with \(\Vert \cdot \Vert \) referring to the \(l^2\)-norm.

4.1 Comparison of Gaussian and exponential correlations

We first compare the two correlation functions, Gaussian and exponential, which are the most widely used correlations for random field generation. We computed the results obtained with these two correlation functions for the one-dimensional (1D), two-dimensional (2D) and three-dimensional (3D) stochastic groundwater flow cases with otherwise identical parameters. The number of hidden layers and the number of neurons per layer are uniformly set to 6 and 16, respectively.

4.1.1 One-dimensional groundwater flow with both correlations

The non-homogeneous 1D flow problem for the Darcy equation reduces to Eq (19) subject to Eq (C16). The hydraulic conductivity K is constructed from Eq (18) by the random spectral method, see Eq (C17). The source term f of the manufactured solution, Eq (C15), is obtained from Eq (C18). The detailed derivation can be found in Appendix C.

The relative errors \(\delta {\hat{h}}\) of the predicted hydraulic heads for the exponential and Gaussian correlation of the ln(K) field are shown in Tables 1 and 2. The Gaussian correlation is more accurate for all N and \(\sigma ^2\), and the accuracy improves further with the transfer learning model. The predicted hydraulic head, the velocity and the manufactured solution for both exponential and Gaussian correlations with \(\sigma ^2=0.1\) and \(N=2000\) are shown in Fig. 7. The predicted results nearly coincide with the manufactured solution Eq (C15) over the 1D domain.

Table 1 \(\delta {\hat{h}}\) for the 1D case computed with exponential correlation for different variances and numbers of modes
Table 2 \(\delta {\hat{h}}\) for the 1D case computed with Gaussian correlation for different variances and numbers of modes
Fig. 7 One-dimensional hydraulic head for \(\sigma ^2=0.1\), \(N=2000\) with (a) exponential correlation and (b) Gaussian correlation

Fig. 8 One-dimensional logarithm loss function with (a) exponential correlation and (b) Gaussian correlation

The log(Loss) vs. iteration graphs for the different parameters of the random log-hydraulic conductivity field are shown in Fig. 8 and reveal that (1) the loss value for the Gaussian correlation is much smaller than for the exponential correlation for all \(\sigma ^2\) and N values, and (2) with transfer learning, the loss function drops significantly faster and the number of required iterations is greatly reduced. In summary, the Gaussian correlation outperforms the exponential one in generating random log-hydraulic conductivity fields for the PINN.

4.1.2 Two-dimensional groundwater flow with both correlations

To solve the non-homogeneous 2D flow problem for the Darcy equation, the manufactured solution in Eq (C19) is adopted. The hydraulic conductivity K is constructed from Eq (18) by the random spectral method, see Eq (C21). The source term f is computed according to Eq (C22); see Appendix C for more details. The exponential and Gaussian correlations for the heterogeneous hydraulic conductivity are tested with varying \(\sigma ^2\) and N values. Tables 3 and 4 support the same conclusion: with increasing N, the predicted hydraulic head becomes more accurate, whereas for larger \(\sigma ^2\) the accuracy deteriorates in most cases. The PINN with Gaussian-correlation-based hydraulic conductivity outperforms the one with exponential correlation. The contour plots of the predicted hydraulic head and velocity as well as the manufactured solution for both correlations with \(\sigma ^2=0.1\) and \(N=2000\) are given in the supplementary material, Figs. S1, S2, S3 and S4. The predicted physical patterns agree well with the manufactured solution Eq (C19).

Table 3 \(\delta {\hat{h}}\) for the 2D case computed with exponential correlation for different variances and numbers of modes
Table 4 \(\delta {\hat{h}}\) for the 2D case computed with Gaussian correlation for different variances and numbers of modes

The log(Loss) vs. iteration graph for the different parameters of the random log-hydraulic conductivity field is illustrated in Fig. 9. The loss for the PINN with Gaussian correlation is much smaller and decreases faster, while the loss is not fully minimized for the exponential correlation. With transfer learning, the loss function converges in fewer iterations, which largely reduces the training time. Also for the two-dimensional groundwater flow, the Gaussian correlation shows a much better performance than the exponential correlation.

Fig. 9 Two-dimensional logarithm loss function with (a) exponential correlation and (b) Gaussian correlation

4.1.3 Three-dimensional groundwater flow with both correlations

Let us now focus on the 3D non-homogeneous Darcy equation problem [43]. The manufactured solution in Equation (C30) is adopted, the hydraulic conductivity K is constructed according to Eq (C28), and the source term f is derived from Eq (C29). The exponential and Gaussian correlations for the heterogeneous hydraulic conductivity with varying \(\sigma ^2\) and N values are tested again. Tables 5 and 6 list the relative error of the hydraulic head for the DCM with and without transfer learning. For different \(\sigma ^2\) and N values, the performance of the PINN varies considerably for both correlations. The same tendency as in 1D and 2D holds: the Gaussian correlation outperforms the exponential one, and transfer learning has a significant impact on the computational cost. The hydraulic heads predicted by both correlation functions with \(\sigma ^2=0.1\) and \(N=2000\) are shown in Figs. S5, S6, S7 and S8 (Fig. 10).

Table 5 \(\delta {\hat{h}}\) for the 3D case computed with exponential correlation for different variances and numbers of modes
Table 6 \(\delta {\hat{h}}\) for the 3D case computed with Gaussian correlation for different variances and numbers of modes
Fig. 10 Three-dimensional logarithm loss function with (a) exponential correlation and (b) Gaussian correlation

The computational cost of the DCM with both correlation functions is shown in Table 7. The Gaussian correlation function is not only more accurate but also more efficient.

Table 7 Calculation time required in different dimensions

In summary, the comparison reveals that the loss function with the Gaussian correlation tends to decrease faster than with the exponential one, and that the error using the Gaussian correlation is much smaller and more stable. The Gaussian correlation also requires less computation time. Note also that the loss function of the exponential correlation leads to exploding gradients when the number of collocation points exceeds 150, whereas this is not observed for the Gaussian correlation, even for much larger numbers of collocation points. Subsequently, we will only use the Gaussian correlation.

4.2 Sensitivity analysis results

The sensitivity analysis eliminates irrelevant variables to reduce the computational cost of the hyperparameter optimizer. The hyper-parameters of this flow problem are listed in Table 8.

Table 8 Hyper-parameters and their intervals in the groundwater flow problem

The sensitivity analysis results obtained by the hybrid Morris-eFAST method are shown as follows:

Fig. 11 Sensitivity histogram of Morris

Fig. 12 Morris \(\mu ^*\) and \(\sigma \) computed using the Morris screening algorithm

Fig. 13 Sensitivity histogram of eFAST

From Figs. 11 and 12, we conclude that the number of layers and the number of neurons have the greatest impact, while the maximum line search steps (maxls) of L-BFGS have almost no effect. We therefore remove the number of layers and maxls, and compute the sensitivity of the remaining three parameters with the eFAST method. The results are summarized in Fig. 13: the number of neurons is the second most important parameter. Hence, the layers and the neurons are chosen as the hyper-parameters spanning the search space for the automated machine learning approach.

4.3 Hyperparameter optimizations method comparison

To select the most suitable hyperparameter optimization algorithm, we use the two hyperparameters identified by the sensitivity analysis in Section 4.2 as search variables and run the four algorithms presented in the previous section under otherwise equal conditions. The horizontal coordinates in Fig. 14 represent the number of neurons per layer and the vertical coordinates the number of hidden layers.
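The Bayesian variant of this search can be sketched with scikit-optimize; `evaluate` is again one full DCM training run (hypothetical), and the bounds are our placeholders:

```python
# Bayesian (GP-based) search over the two retained hyper-parameters.
from skopt import gp_minimize
from skopt.space import Integer

space = [Integer(2, 10, name='layers'), Integer(4, 64, name='neurons')]

def objective(params):
    layers, neurons = params
    return evaluate({'layers': layers, 'neurons': neurons})  # delta_h, Eq. (28)

res = gp_minimize(objective, space, n_calls=30, random_state=0)
print('best configuration:', res.x, 'relative error:', res.fun)
```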

Fig. 14 Neural network configuration search results with (a) random search, (b) Bayesian optimization, (c) Hyperband optimization and (d) Jaya optimization

The time required for each method and the search accuracy are shown in Table 9.

Table 9 Hyper-parameters search results with different algorithms

The Bayesian method gives the best accuracy in the shortest time and is subsequently adopted. Due to the limited number of searches, the optimum found by the algorithm is not necessarily the global best one; it gradually approaches the optimal configuration as the number of searches increases. For the two- and three-dimensional cases, the optimal configurations are illustrated in Figs. 15 and 16.

Fig. 15 Neural network configuration search results of Bayesian optimization in two dimensions

Fig. 16 Neural network configuration search results of Bayesian optimization in three dimensions

The optimal configurations obtained after screening are shown in Table 10. These neural network configurations will be used as input parameters for the subsequent numerical tests.

Table 10 Neural architecture search results with Bayesian optimization

4.4 Model validation in different dimensions

Now we solve the modified Darcy Eq (19) by the NAS-based DCM with the optimized configurations from Section 3, i.e. we fix \(\sigma ^2\) to 0.1 and N to 1000. The manufactured solutions can be found in Appendix C. The results are compared to solutions obtained with the finite difference method.

4.4.1 One-dimensional case model validation

The 1D manufactured solution is Eq (C15). We validate the two methods by comparing the hydraulic head in the x-direction over the interval [0, 25], see Fig. 17.

Fig. 17 Hydraulic head calculated by the FDM and DCM in one dimension

4.4.2 Two-dimensional case model validation

The 2D manufactured solution is Equation (C23), and we focus on the hydraulic head and velocity in the x-direction along the midline \(y=10\) of the domain \([0,20]\times [0,20]\).

Fig. 18 Hydraulic head along \(y=10\) calculated by the FDM and DCM in two dimensions

Fig. 19 Velocity in x-direction along \(y=10\) calculated by the FDM and DCM in two dimensions

Figure 18 demonstrates that both methods match the exact solution for the hydraulic head well. However, as seen from Figure 19, the FDM predicts \(v_x\) poorly, while the proposed DCM still agrees well with the exact solution.

4.4.3 Three-dimensional case model validation

The manufactured solution for the 3D case is given by Equation (C30). We again compute the hydraulic head and velocity in the x-direction at \(y=1, z=0.5\) over the domain \([0,5]\times [0,2]\times [0,1]\). The results are summarized in Tables 11 and 12. While the results obtained by the FDM are rather poor, the DCM approach still provides solutions close to the exact one (Fig. 20).

Table 11 Solving Darcy equation with FDM
Table 12 Solving Darcy equation with DCM
Fig. 20 Logarithm loss function with and without transfer learning

The higher the dimensionality of the problem, the more pronounced the difference between the two methods (FDM and DCM) becomes. The FDM requires an extremely dense discretization, which in turn leads to a high computational cost, whereas the DCM yields very accurate results even for very few training points. Transfer learning further reduces the computational cost while simultaneously slightly improving the accuracy. The contour plots of the hydraulic head and velocity are visualized in Figs. 21 and 22.

Fig. 21 Three-dimensional hydraulic head for \(\sigma ^2=0.1\), \(N=1000\) with Gaussian correlation: (a) exact solution and (b) predicted solution

Fig. 22 Three-dimensional velocity for \(\sigma ^2=0.1\), \(N=1000\) with Gaussian correlation: (a) exact solution and (b) predicted solution

The isosurface diagrams of the predicted head and velocity are illustrated in Figs. 23 and 24.

Fig. 23 Three-dimensional isosurface diagram of the hydraulic head for \(\sigma ^2=0.1\), \(N=1000\) with Gaussian correlation: (a) exact solution and (b) predicted solution

Fig. 24 Three-dimensional isosurface diagram of the velocity for \(\sigma ^2=0.1\), \(N=1000\) with Gaussian correlation: (a) exact solution and (b) predicted solution

5 Conclusion

In this paper, we proposed a NAS-based stochastic DCM employing sensitivity analysis and transfer learning to reduce the computational cost and improve the accuracy. The closed-form random spectral method is adopted for the generation of log-normal hydraulic conductivity fields and calibrated to generate heterogeneous hydraulic conductivity fields with a Gaussian correlation function. Exploiting the sensitivity analysis and comparing hyperparameter selection methods, the Bayesian algorithm was identified as the most suitable optimizer for the search strategy in the NAS model. While the sensitivity analysis and the NAS incur additional cost, the overall approach still reduces the computational cost for a specified accuracy. Furthermore, for certain types of problems, it is not necessary to repeat these steps. To validate our approach, groundwater flow in highly heterogeneous aquifers was considered.

Since no feature engineering is involved in our PINN, the NAS-based DCM can be considered a truly automated "meshfree" method capable of approximating any continuous function. The presented automated DCM is simple to implement, as it requires only the definition of the underlying BVP/IBVP and its boundary conditions.

Through several numerical examples in 1D, 2D and 3D, we showed that the presented NAS-based DCM significantly outperforms the FDM in terms of computational efficiency and accuracy. The benefits become more pronounced with increasing dimension and hence 'complexity'. Note that the presented NAS-based DCM outperforms the FDM even if all the steps from sensitivity analysis, optimization and training are accounted for. Moreover, once the deep neural networks are trained, they can be used to evaluate the solution at any desired point with minimal additional computation time. Besides these advantages, the limitations of the proposed stochastic deep collocation method are the computational cost of the neural architecture search model for large multi-scale complex problems and the fact that the gradient-descent-based optimizer may get stuck in a local optimum. These topics will be further investigated in our future research towards a more generalized and improved NAS-based deep collocation method.