1 Introduction

Cut-off walls are widely used to hinder underground seepage flow and thereby reduce the risk of seepage-induced failure [34, 36, 38]. The jet-grouted cut-off wall (JGCOW) is a popular option because it is efficient, flexible and requires only light installation machines [13, 17, 31]. A JGCOW is usually installed by injecting grout at high pressure into the in situ soil strata through small-diameter rotating nozzles. The solidified columns, cast in rows, subsequently form an overlapped water-tight continuum [32, 41]. Although the JGCOW technique has proved successful in many applications, random construction errors caused by variable workmanship, uncertain geological conditions and the limited accuracy of construction machines are inevitable in practice [3]. These errors are random deviations from the ideal case, such as inclination of the column axis and variation of the column diameter. They may leave continuous untreated zones that penetrate the otherwise impermeable cement-treated slab, form concentrated seepage channels between adjacent jet-grouted columns and consequently cause damage to adjacent buildings and delays in construction [11, 15]. Quantitatively evaluating the impact of JGCOW defects on seepage discharge is therefore of great importance for project quality assessment and control.

Existing leakage evaluation approaches, such as finite element method (FEM) simulation and the three-dimensional discretized algorithm (TDA), have achieved satisfactory accuracy in estimating the seepage flow rate through geometrically imperfect cut-off walls. Wu et al. [42] adopted a three-dimensional FE model to simulate the groundwater flow field of a deep foundation pit, considering leakage through cut-off walls with wished-in-place defects of deterministic dimensions. Pan et al. [20, 22, 23] proposed an advanced evaluation approach, TDA, to quantify the probability distribution of leakage discharge through cut-off walls with given levels of construction errors (e.g., random inclination and diameter variation of jet-grouted columns). However, such approaches are still not suitable for instant on-site leakage risk assessment because their computing time is too long. Fresh-on-fresh is a prevalent construction technique in jet grouting, wherein adjacent columns are constructed sequentially without waiting for the primary column to harden. During the installation of jet-grouted columns, the jet-grouting machines customarily operate continuously. Consequently, any delay caused by time-consuming reliability computations is impractical, and the reliability computation is expected to finish within a few minutes. As prior studies have noted, the computing time for a single realization of a defective cut-off wall using TDA depends on the model size and is normally several minutes. One realization of an FEM simulation of the same scale may easily take hundreds of times longer. Since thousands of simulations are usually required for the statistical characteristics of defect occurrence to converge reasonably well, the computational cost of quantitative leakage risk evaluation is excessively high.
In this regard, a more efficient mapping between construction errors and the performance of cut-off walls is in high demand. One option for such a mapping is the artificial intelligence (AI) approach.

AI, as an emerging field in geotechnical engineering, has the potential to learn autonomously from training datasets and to make quick inferences from the information obtained [1]. Many pioneering works have exploited its remarkable computational efficiency in practical engineering cases, such as the characterization of soil constitutive relationships, parameter optimization for describing soil behavior and risk management during construction or operation [7, 12, 30, 44]. Nevertheless, several challenges remain in applying AI approaches in the geotechnical domain. First, the AI approach is bound to the inherent curse of overfitting [43] and prone to becoming too attuned to the training data [14]. An overfitted AI model is specific to its training data and gives poor outcomes when applied to new datasets [5]. Second, traditional geotechnical datasets are usually collected from laboratory and/or in situ tests, so the data available for AI model training are scarce, which renders the model less reliable [25]. Third, the AI approach offers few mechanistic explanations beyond its excellent fitting capacity, making it a “black box”. It is increasingly accepted that developing an interpretable model is much more practical than explaining black-box models [6, 29].

Considering the respective advantages of the physical model and the AI approach, a fusion of the two paradigms, namely the physics-inspired AI model, could be a rational solution [16, 26, 44, 45, 47]. Attempts to design physics-inspired AI systems have been reported in computer science, biological science, and geoscience [28, 39]. Raissi et al. [27] introduced physics-informed neural networks (NNs) for solving general nonlinear partial differential equations (PDEs) that describe the underlying physical laws. Jiang et al. [10] wrapped a conceptual hydrologic model in a recurrent neural network (RNN) layer, developing a hydrology-aware deep learning architecture for runoff modeling across the conterminous USA. Figure 1 illustrates some common expectations of geotechnical engineers for the application of a physics-inspired AI model: (i) high prediction accuracy and efficiency; (ii) good physical interpretability of the results; (iii) easy code implementation; (iv) results that engineers can easily double-check; (v) excellent generalization ability.

Fig. 1 Advantages and disadvantages of physical models and AI approaches and some expected features for a physics-inspired AI approach

This study proposes novel physics-inspired neural network (NN) architectures to evaluate the seepage discharge of JGCOWs with geometrical imperfections. The aim is to examine the performance of physics-inspired AI in such a scenario, namely its accuracy, computational cost, transferability, and result interpretability, and to provide an optimal surrogate model for the corresponding time-consuming physics-based approaches.

2 Methodology

In this section, the benchmark method, namely the three-dimensional discretized algorithm (TDA), is summarized in Sect. 2.1 to illustrate the physical meaning of each parameter. More details are given in Pan et al. [24]. Then, a series of data-driven approaches with an ascending level of physical meaning (P1–P5) are implemented and compared to find an optimal balance among accuracy, interpretability, and computational expense.

2.1 Benchmark method: three-dimensional discretized algorithm (TDA)

TDA [24] is a state-of-the-art seepage evaluation method that efficiently estimates the leakage through defective cut-off walls. Figure 2c illustrates the procedure for implementing TDA. The cut-off wall zone is represented by a finely meshed grid of nodes. The fine mesh is necessary because even a very small hole that penetrates the wall can lead to major leakage. The nodes are classified as treated or untreated, depending on their coordinates. In contrast to treated nodes, untreated nodes represent zones that have not been treated by the cement binder and are characterized by much higher permeability coefficients. Continuous seepage paths are then detected, and the seepage rate is determined using a semi-analytical solution.

Fig. 2 Illustration of geometric imperfections for jet-grouted cut-off walls (JGCOW): a from categories including diameter variability and positioning error; b from a 3D view; c Flowchart of TDA (modified from Pan et al. [21])

Specifically, Fig. 2a shows the two typical types of construction errors of a jet-grouted column, i.e., random axis orientation and random variation of the column diameter along depth. The latest field data show that there may also be a positioning error at ground level. However, this error is not considered here, because this study focuses on the development of a surrogate model. The random orientation is characterized by two independent random variables, namely the azimuth (α) and the inclination angle (β). The variation of the column diameter is characterized as a random process prescribed by random seeds, wherein the spatial correlation is quantified by the scale of fluctuation (SOF). Figure 2b shows a typical random realization of a cut-off wall affected by both types of geometric imperfections.

Figure 3a shows a typical cell of a discretized zone with a structured grid. After the random geometrical parameters (azimuth, inclination angle, diameter) are generated, the treated and untreated nodes (marked by the red and black points, respectively) can be used for penetration detection, which determines whether a continuous leakage passage exists between any two adjacent columns. This detection is critical because the flow rate through JGCOWs with continuous leakage passages is significantly higher than through those without them. If continuous untreated nodes exist, the flow rate Q is governed by the harmonic average of the cross-sectional area \(A_{Si}\) along the continuous untreated zone as:

$$Q = \frac{{k_{{\text{u}}} H}}{{\frac{t}{n}\sum\nolimits_{i = 1}^{n} {\frac{1}{{A_{Si} }}} }}$$
(1)

where ku is the permeability coefficient of untreated soil; H is the water head difference between two sides of cut-off walls; t is the nominal thickness of cut-off walls when the random imperfections are not considered; n is the number of slices; ASi is the sectional area of untreated zone, which can be easily determined by counting the number of untreated nodes in the specified cross section. Using Eq. (1), the seepage flow rate Q can be readily evaluated for different penetration situations and hydraulic conditions. The accuracy and validity of the TDA method have been verified by FEM results. In this work, the TDA method was used as the benchmark method to generate the datasets for the training and validation of NN models. The detailed algorithm flowchart is illustrated in Pan et al. [20].
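As a minimal sketch of how Eq. (1) can be evaluated once the untreated area of each slice is known, the Python snippet below may be helpful. The function name `tda_flow_rate`, the numerical values and the convention of returning zero when any slice has no untreated area are illustrative assumptions and are not taken from the original TDA implementation.

```python
import numpy as np

def tda_flow_rate(k_u, head, thickness, untreated_areas):
    """Evaluate Eq. (1): Q = k_u * H / ((t / n) * sum(1 / A_Si)).

    untreated_areas holds A_Si for the n slices along the seepage passage.
    """
    areas = np.asarray(untreated_areas, dtype=float)
    if areas.size == 0 or np.any(areas <= 0.0):
        # Assumption for this sketch: a slice without untreated area
        # interrupts the passage, so no penetrating leakage is counted.
        return 0.0
    n = areas.size
    return k_u * head / ((thickness / n) * np.sum(1.0 / areas))

# Hypothetical inputs: k_u = 1e-5 m/s, H = 10 m, t = 1 m, four slices.
q_uniform = tda_flow_rate(1e-5, 10.0, 1.0, [0.02, 0.02, 0.02, 0.02])
q_necked = tda_flow_rate(1e-5, 10.0, 1.0, [0.02, 0.02, 0.002, 0.02])
print(q_uniform, q_necked)
```

For a channel of uniform area, the expression reduces to Darcy's law, Q = k_u (H/t) A. A single narrow slice lowers the harmonic average and therefore the flow rate far more than it would lower an arithmetic average.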

Fig. 3 Illustration of the three-dimensional discretized algorithm (TDA) and the physical variables defined in P2–P5: the JGCOW average thickness for P2; the JGCOW gap distance for P3; the layered imperfection distance of JGCOW for P4; and the sliced imperfection area of JGCOW for P5 (a is referenced from Pan et al. [20])

2.2 P1: traditional neural network

The success of a neural network (NN) application relies on its network topology. A common artificial NN consisting of three layers, namely the input layer, hidden layer and output layer, was used in this study. As shown in Fig. 4a, a single hidden layer with 40 neurons was employed for the JGCOW problem. The mean squared error (MSE) between the TDA calculations and the NN-predicted values was chosen as the loss function. The Levenberg–Marquardt algorithm, a robust and widely used NN training algorithm, was adopted here [9]. Learning rate decay was used to accelerate the training of the NN, and the initial learning rate was set to 0.01. The input layer takes 84 random seeds as input variables: 20 describe the imperfection angles and 64 describe the column diameter variation. Specifically, the imperfection angles comprise the inclination angle and azimuth of each of the 10 columns, and the 64 random seeds were used to generate the random process for the diameter variation. The seepage flow rate is the output of the output layer.
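To make the size of this network concrete, the following Python sketch builds the 84-40-1 topology and evaluates the MSE loss. It is only an illustration: the tanh activation, the weight initialization and all function names are assumptions, and the Levenberg–Marquardt training step used in the study is not reproduced here.

```python
import numpy as np

def init_params(n_in=84, n_hidden=40, seed=0):
    """Random initial weights for one hidden layer (initialization is an assumption)."""
    rng = np.random.default_rng(seed)
    return {
        "W1": rng.normal(0.0, 0.1, size=(n_in, n_hidden)),
        "b1": np.zeros(n_hidden),
        "W2": rng.normal(0.0, 0.1, size=(n_hidden, 1)),
        "b2": np.zeros(1),
    }

def forward(params, x):
    """Forward pass; the tanh activation is assumed, not stated in the text."""
    hidden = np.tanh(x @ params["W1"] + params["b1"])
    return (hidden @ params["W2"] + params["b2"]).ravel()

def mse(pred, target):
    """Mean squared error between NN predictions and benchmark values."""
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    return float(np.mean((pred - target) ** 2))

params = init_params()
n_params = sum(p.size for p in params.values())
print(n_params)  # 84*40 + 40 + 40 + 1 trainable parameters
```

With 84 raw inputs, the network has 3441 trainable parameters, which is of the same order as the number of training samples used later; this gives some intuition for why a purely data-driven network of this size can overfit.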

Fig. 4 Network structure for P1 (traditional NN) and P2–P5 (physics-inspired NNs; the physical meaning of the variables in the physical layer becomes clearer from P2 to P5, as illustrated in Fig. 3)

2.3 P2: physics-inspired neural network I (by the JGCOW average thickness)

A dimensional analysis of the input variables shows that the unprocessed inputs (angles and radii) do not follow the law of dimensional homogeneity [37]. Dimensionally homogeneous NNs have been proven to possess significant advantages over dimensionally inhomogeneous NNs in interpretability and generalizability. The simplest but physically intuitive knowledge, namely the average diameter of the jet-grouted columns at each depth, was used in the physical layer to improve dimensional homogeneity while keeping the essential information. This rests on the physics-based assumption that, for a constant column spacing, the leakage risk decreases as the average diameter increases. The average diameter is evaluated at different depths because its contribution may increase with depth, as the inclination of the columns may lead to larger gaps at greater depth. To reasonably capture the characteristics of a spatially varying column diameter, evenly spaced horizontal cross sections of the JGCOW are chosen from top to bottom. Each horizontal cross section is referred to as a “horizontal layer” hereafter. The JGCOW average thickness at a given depth can then be easily evaluated as the average diameter of all columns in the corresponding horizontal layer. This average thickness was used in P2 to quantify the average defects of the cut-off wall at a given depth. As shown in Fig. 3b, the average thickness of the JGCOW at a given depth h is expressed as

$$\overline{d}_{h} = \left( {\sum\limits_{i = 1}^{n} {d_{i} } } \right)/n$$
(2)

where di is the diameter of a jet-grouted column at the specified depth and n is the number of columns at the given depth. Cross sections at 0.24 m depth interval were used to calculate the average diameter, making the physical layer a vector with 84 elements. The number of 84 was chosen to ensure that these different NNs have the same node number of physical layers (except P5). This facilitates a relatively fair comparison in efficiency. The effect of total element number is evaluated in Sect. 4.1.1.
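The P2 physical layer of Eq. (2) amounts to a row-wise mean over the column diameters of each horizontal layer. A minimal Python sketch is given below; the diameter values are random placeholders (mean 1.0 m, standard deviation 0.1 m) chosen only for illustration, and the function name is hypothetical.

```python
import numpy as np

def average_thickness_per_layer(diameters):
    """Eq. (2): mean column diameter of each horizontal layer.

    diameters has shape (n_layers, n_columns); the result has shape (n_layers,).
    """
    d = np.asarray(diameters, dtype=float)
    return d.mean(axis=1)

# Placeholder diameters for 84 layers and 10 columns (not the values of the study).
rng = np.random.default_rng(0)
diam = rng.normal(1.0, 0.1, size=(84, 10))
physical_layer = average_thickness_per_layer(diam)
print(physical_layer.shape)
```

The resulting vector of 84 values is what would be passed from the physical layer to the hidden layer in P2.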

The physical layer considering the column average thickness was introduced between the input layer and hidden layer. The average thickness of JGCOW \(\overline{d}_{h}\) is only a rough representation, and some more elaborate physical layers are designed in the following sections.

2.4 P3: physics-inspired neural network II (by the JGCOW gap distance)

A deeper piece of expert knowledge than the average diameter used in P2 is that the gaps between adjacent columns play a predominant role in the leakage flow rate. In P3, representative gap distances at different depths were chosen as the physical variable in the physical layer. Figure 3c illustrates a typical layer of a penetrated JGCOW at a certain depth. The representative gap distance of the layer is defined as the sum of the gap distances between any two adjacent columns with continuous seepage passages. According to the geometric relationship shown in Fig. 3c, the gap distance \(g_{h}\) at a given depth h can be expressed as a function of the inclination angles, azimuth angles and diameters of the columns:

$$\begin{aligned} g_{h} & = \sum\limits_{i = 1}^{n - 1} {g_{i} } \\ g_{i} & = \left\{ {\sqrt {\left( {S_{x} + h\left( {\tan \beta_{i} \cos \alpha_{i} - \tan \beta_{i + 1} \cos \alpha_{i + 1} } \right)} \right)^{2} + \left( {h\left( {\tan \beta_{i} \sin \alpha_{i} - \tan \beta_{i + 1} \sin \alpha_{i + 1} } \right)} \right)^{2} } - \frac{{D_{i} + D_{i + 1} }}{2}} \right\} \\ \end{aligned}$$
(3)

where \(\left\{ \cdot \right\}\) denotes the Macaulay brackets, which return the argument if it is positive and zero otherwise; \(S_{x}\) is the spacing between the centers of two adjacent columns; h is the depth of the corresponding layer; \(\alpha_{i}\) and \(\alpha_{i + 1}\) are the azimuth angles of the two adjacent columns; \(\beta_{i}\) and \(\beta_{i + 1}\) are their inclination angles; \(D_{i}\) and \(D_{i + 1}\) are the column diameters at the corresponding layer; and n is the number of columns. As in P2, the physical layer consists of a vector with 84 elements. This approach requires a “sharper” understanding of the problem than P2, at the price of a higher computational expense.
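Equation (3) can be evaluated for one layer in a few vectorized steps, as in the Python sketch below. The function name, the argument order and the example geometry (three columns at 0.8 m spacing with 1.0 m diameter) are assumptions made only for illustration.

```python
import numpy as np

def gap_distance(h, s_x, alpha, beta, diam):
    """Eq. (3): summed gap distance g_h between adjacent columns at depth h.

    alpha, beta (radians) and diam are arrays with one entry per column;
    diam holds the column diameters at this depth.
    """
    alpha = np.asarray(alpha, dtype=float)
    beta = np.asarray(beta, dtype=float)
    diam = np.asarray(diam, dtype=float)
    # Horizontal offset of each column axis at depth h.
    off_x = h * np.tan(beta) * np.cos(alpha)
    off_y = h * np.tan(beta) * np.sin(alpha)
    # Components of the centre-to-centre vector between columns i and i+1.
    c_x = s_x + (off_x[:-1] - off_x[1:])
    c_y = off_y[:-1] - off_y[1:]
    centre_dist = np.hypot(c_x, c_y)
    # Macaulay brackets: keep only positive gaps.
    gaps = np.maximum(centre_dist - 0.5 * (diam[:-1] + diam[1:]), 0.0)
    return float(gaps.sum())

alpha = np.array([0.0, np.pi / 2, 0.0])
diam = np.array([1.0, 1.0, 1.0])
gap_vertical = gap_distance(20.0, 0.8, alpha, np.zeros(3), diam)
gap_inclined = gap_distance(20.0, 0.8, alpha, np.deg2rad([0.0, 2.0, 0.0]), diam)
print(gap_vertical, gap_inclined)
```

In this example the perfectly vertical columns overlap and give a zero gap, whereas inclining the middle column by 2° out of the wall plane opens a gap on both of its sides at 20 m depth.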

2.5 P4: physics-inspired neural network III (by the layered imperfection distance of JGCOW)

In P4, the harmonic average length of the gap along the longitudinal direction of the wall was used instead of the gap distance as the representative value for each depth, as indicated in Fig. 3d. It is more physically meaningful than P3 in that it considers the geometrical shape of the penetrating seepage passage, though only in two dimensions. This gives P4 a more global and accurate characterization of the untreated zone in each layer than P3. Each layer at the selected depths was first discretized into n − 1 regions according to the spacing \(S_{x}\) between the centers of two adjacent columns, in which n is the number of columns. For each region i, the harmonic average gap \(l_{i}\) along the longitudinal direction of the wall can be calculated from the values of all discretized slices as \(l_{i} = m/\left( {\sum\nolimits_{j = 1}^{m} {\frac{1}{{\left\{ {l_{i,j} } \right\} + \delta }}} } \right)\), in which \(\left\{ \cdot \right\}\) denotes the Macaulay brackets, i.e., the piecewise linear function \(\max \left\{ {0,l_{i,j} } \right\}\); \(\delta\) is a very small value that avoids a zero denominator and is taken as \(10^{-6}\) here; and m is the number of discretized slices. When any \(l_{i,j}\) is negative or equal to 0, \(l_{i}\) is effectively 0. The total layered imperfection distance \(l_{h}\) at depth h can be estimated by summing all discretized segments \(l_{i}\) as \(\sum\nolimits_{i = 1}^{n - 1} {l_{i} }\), as shown in Fig. 3d. Finally, the physical layer, a vector of \(l_{h}\) with 84 elements, was constructed. The underlying NN structure is the same as before and takes the layered imperfection distance as its input.
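The behavior of the harmonic average with the small offset δ can be checked with a short Python sketch. The slice gap values below are invented for illustration, and the function names are hypothetical; how the individual \(l_{i,j}\) are measured from the geometry of Fig. 3d is not reproduced here.

```python
import numpy as np

DELTA = 1e-6  # small offset that avoids division by zero, as in the text

def harmonic_gap(l_ij, delta=DELTA):
    """Harmonic average gap l_i of one region from its m slice gaps l_ij."""
    l_pos = np.maximum(np.asarray(l_ij, dtype=float), 0.0)  # Macaulay brackets
    m = l_pos.size
    return float(m / np.sum(1.0 / (l_pos + delta)))

def layered_imperfection_distance(regions, delta=DELTA):
    """l_h: sum of the harmonic average gaps of the n - 1 regions of a layer."""
    return sum(harmonic_gap(region, delta) for region in regions)

open_region = [0.10, 0.05, 0.10]     # passage open in every slice
closed_region = [0.10, -0.02, 0.10]  # columns overlap in the middle slice
l_open = harmonic_gap(open_region)
l_closed = harmonic_gap(closed_region)
l_h = layered_imperfection_distance([open_region, closed_region])
print(l_open, l_closed, l_h)
```

A single closed slice drives the harmonic average of its region to the order of δ, so only regions that are open in every slice contribute noticeably to \(l_{h}\).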

2.6 P5: physics-inspired neural network IV (by the sliced imperfection area of JGCOW)

Discretization based on horizontal layers of the JGCOW may miss some three-dimensional seepage channels that do not appear to penetrate the wall from a two-dimensional perspective. Thus, P5 adopted a sliced discretization of the JGCOW and defined the area of untreated soil in each slice as the output of the physical layer. The whole JGCOW was discretized into slices at the prescribed interval of 0.24 m, and the node identification was performed in each slice. Classifying the discretized nodes as treated or untreated is much more difficult for a slice in P5 than for a layer in P4, because no off-the-peg reference frame (such as the local polar coordinate system of P4 in Fig. 3d) is available for a slice. Hence, a global coordinate system was set up for each slice, and a geometrical check was performed to examine whether a node lies within the range of any column. Based on the distance between the node and the column axes, each node in a slice is judged to fall into the treated or untreated zone; a node that is not contained in any adjacent column is marked as untreated. Finally, the sectional area of the untreated zone in each slice, \(A_{S}\), can be evaluated in a discretized form, as illustrated in Fig. 3e. The physical layer outputs a vector of \(A_{S}\) with 10 elements. Once the random imperfection seeds have been translated into the sliced imperfection area of the JGCOW by the physical layer, the seepage flow rate can be estimated using the underlying NN.
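The node classification in one slice can be sketched in Python as follows. This is a simplified illustration under stated assumptions: the slice is taken as a vertical plane at a fixed offset `y_s` across the wall thickness, a node is treated if its horizontal distance to any column axis is at most half the column diameter at that depth, and the node grid is uniform. All names and the example geometry are hypothetical.

```python
import numpy as np

def untreated_area_in_slice(y_s, x_nodes, z_nodes, x0, alpha, beta, diam):
    """Discretized untreated area A_S of the slice located at y = y_s.

    x0, alpha, beta: per-column head position and angles, shape (n_cols,).
    diam: column diameters at each node depth, shape (n_cols, n_z).
    """
    X, Z = np.meshgrid(x_nodes, z_nodes, indexing="ij")  # both (n_x, n_z)
    treated = np.zeros(X.shape, dtype=bool)
    for i in range(len(x0)):
        # Position of the inclined column axis at each node depth.
        axis_x = x0[i] + Z * np.tan(beta[i]) * np.cos(alpha[i])
        axis_y = Z * np.tan(beta[i]) * np.sin(alpha[i])
        dist = np.hypot(X - axis_x, y_s - axis_y)
        treated |= dist <= 0.5 * diam[i][np.newaxis, :]
    cell = (x_nodes[1] - x_nodes[0]) * (z_nodes[1] - z_nodes[0])
    return float(np.count_nonzero(~treated) * cell)

# Two vertical columns 0.8 m apart, 20 m deep; nodes span the space between them.
x_nodes = np.linspace(0.0, 0.8, 81)
z_nodes = np.linspace(0.0, 20.0, 21)
x0 = np.array([0.0, 0.8])
angles = np.zeros(2)
area_closed = untreated_area_in_slice(0.0, x_nodes, z_nodes, x0, angles, angles,
                                      np.full((2, 21), 1.0))
area_open = untreated_area_in_slice(0.0, x_nodes, z_nodes, x0, angles, angles,
                                    np.full((2, 21), 0.6))
print(area_closed, area_open)
```

With 1.0 m columns the slice is fully treated, whereas 0.6 m columns leave an untreated strip about 0.2 m wide over the full depth, so the counted area is close to 4 m² (the exact value depends on the node spacing).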

3 Illustrative example

In this work, the TDA method was used to generate the benchmark database for training and validating the traditional NN and the physics-inspired NNs. The adopted cases simulate one row of 10 columns with a depth of 20 m. The azimuths (α) were assumed to be uniformly distributed within [0, π], which indicates that the axis of a column can rotate toward any direction. The inclination angles (β) were assumed to follow a normal distribution with zero mean and a standard deviation of 0.3°, as indicated by the field measurements of Croce and Modoni [4] and Eramo et al. [8]. Pan et al. [23] also showed that this standard deviation corresponds to an inclination limit of 1:100. A negative value of β signifies the direction opposite to the prescribed inclination. These two angular variables were assumed to be independent. A total of 84 input variables were used: 64 random seeds for generating the random process of the diameter with a normal marginal distribution, 10 azimuths α and 10 inclination angles β. Table 1 summarizes the configuration of random imperfections adopted in this case study. A total of 5000 random realizations were calculated using TDA to provide the input variables and the output flow rate. To apply the proposed NNs in broader scenarios, the flow rate is given in the normalized form \(\hat{Q} = Qt/(kHA_{w} )\), where t is the nominal thickness of the JGCOW; k is the coefficient of permeability of the untreated soil; H is the water head difference between the two sides of the JGCOW; and \(A_{w}\) is the area of the JGCOW [22].
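The sampling of the 84 input variables and the normalization of the flow rate can be sketched in Python as below. Several details are assumptions for illustration only: the 64 diameter seeds are drawn as standard normal variates, the inputs are ordered as azimuths, inclinations and seeds, and the numerical values used to check the normalization are invented.

```python
import numpy as np

def sample_inputs(n_real, n_cols=10, n_seeds=64, beta_std_deg=0.3, seed=0):
    """Draw n_real realizations of the 84 random inputs (order is an assumption)."""
    rng = np.random.default_rng(seed)
    alpha = rng.uniform(0.0, np.pi, size=(n_real, n_cols))       # azimuth
    beta = rng.normal(0.0, np.deg2rad(beta_std_deg),
                      size=(n_real, n_cols))                     # inclination
    xi = rng.standard_normal(size=(n_real, n_seeds))             # diameter seeds
    return np.hstack([alpha, beta, xi])

def normalize_flow(q, t, k, head, a_w):
    """Normalized flow rate Q_hat = Q * t / (k * H * A_w)."""
    return q * t / (k * head * a_w)

X = sample_inputs(5000)
# Darcy flow through a fully untreated wall, Q = k * (H / t) * A_w, gives Q_hat = 1.
q_hat_full = normalize_flow(q=1e-5 * 10.0 / 1.0 * 200.0, t=1.0, k=1e-5,
                            head=10.0, a_w=200.0)
print(X.shape, q_hat_full)
```

Under this normalization, \(\hat{Q} = 1\) corresponds to a wall that leaks as if it were entirely untreated soil, so small values of \(\hat{Q}\) indicate an effective barrier.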

Table 1 Configuration of benchmark case for JGCOW (after Pan et al. [22])

3.1 Performance of the trained NNs

The generated datasets are fed into the NNs defined in Sects. 2.2–2.6. The trained NNs are employed as surrogate models between the seepage flow rate and the random seeds of positioning errors. To keep the models from P1 to P5 consistent, the hidden layer is set as the simplest single layer with 40 neurons. The major difference among these NNs is the depth of physical knowledge in the physical layer, which increases from P1 to P5. Figure 5 shows the NN-predicted seepage flow rates versus the TDA-predicted ones based on 5000 Monte Carlo calculations. Each NN prediction is illustrated by a scatter subplot with two clusters in different colors, namely the training dataset (in green) and the test dataset (in orange). The training dataset contains 80% of the samples of random imperfections and their benchmark seepage flow rates, while the test dataset contains the remaining 20%. The \(R^{2}\) (coefficient of determination) values are calculated separately for each dataset. As shown in Fig. 5, a large bias is observed in the traditional NN predictions by P1, especially for the test dataset. The \(R^{2}\) value for the test dataset is only 0.01, indicating that the traditional NN cannot learn useful information from the given data without a physical layer. The P2 predictions perform consistently poorly on both the training and test datasets, largely because the chosen physical layer restricts the solution space. The predictions made by the physics-inspired NNs P3–P5 agree well with the benchmark method. This shows that a carefully chosen physical layer with a rational depth of physical expertise (i.e., with sufficiently clear physical knowledge) greatly improves the prediction performance for the same depth of NN.
As the physical meaning of the physical layer becomes clearer from P3 to P5, the accuracy of the predicted seepage flow rate improves, as evidenced by the increase in the \(R^{2}\) value from 0.88 to 0.98 for the test dataset. A clearer physical meaning allows the NN structure to capture the interrelationships among the input variables more effectively, so the physics-inspired NNs predict better. Moreover, NNs with a clear physical interpretation are highly interpretable, because the connections between input features and output predictions become more evident. This transparency enables users to gain insight into the decision-making process of the model, validate the learned representations, and explain the behavior of the model. For instance, in the case of P3, a large seepage flow typically corresponds to a large gap distance.

Fig. 5 Three-dimensional discretized algorithm (TDA) versus NN predictions of normalized seepage flow rate for P1–P5

To study the influence of sample size on the accuracy of the NNs, five training sample sizes, 400, 1300, 2200, 3100, and 4000, are selected for the traditional NN (P1) and a representative physics-inspired NN (P3). Figure 6 shows the scatter charts of the normalized seepage flow rate, \(\hat{Q}\), given by the NN and the benchmark (TDA) for the different training sample sizes. As illustrated in Fig. 6a, the traditional NN is highly sensitive to the specific dataset. When the number of samples is small (i.e., 400), P1 performs relatively well on the training dataset but does not achieve the desired results on the test dataset. The \(R^{2}\) value reaches 1.00 for the training dataset but is −1.25 for the test dataset. The negative value indicates that the trained NN model does not follow the trend of the data. This is attributed to the large number of input variables of the traditional NN, which induces overfitting, such that noise and non-representative features in the training data are captured by the model. When the training sample size is increased from 400 to 4000, the overfitting is alleviated, as the \(R^{2}\) value of the test dataset increases from −1.25 to 0.01. For the training dataset, the \(R^{2}\) value decreases substantially from 1.00 to 0.74.
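A negative \(R^{2}\) is possible when the coefficient of determination is defined as one minus the ratio of the residual sum of squares to the total sum of squares, which is assumed here because the text reports negative values. The short Python sketch below, with invented numbers, shows that a model predicting the mean gives \(R^{2} = 0\) and a model that reverses the trend gives a negative value.

```python
import numpy as np

def r_squared(y_true, y_pred):
    """Coefficient of determination, R^2 = 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

y_true = np.array([0.01, 0.02, 0.03, 0.04])   # invented benchmark values
r2_mean = r_squared(y_true, np.full(4, y_true.mean()))  # always predict the mean
r2_reversed = r_squared(y_true, y_true[::-1])           # trend is reversed
print(r2_mean, r2_reversed)
```

Under this definition, a test \(R^{2}\) below zero means that the predictions are worse than simply using the mean of the benchmark values.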

Fig. 6 Influence of adopted sample size on the performance of traditional NN (P1) and representative physics-inspired NN (P3)

In contrast, after a physical layer is introduced, the disorganized input data are converted into physically meaningful and dimensionally consistent variables. Consequently, the physics-inspired NNs perform much better and more robustly on both the training and test datasets. The validity of P3 is substantiated by the \(R^{2}\) values shown in Fig. 6b. As the number of training samples increases from 400 to 4000, the minimum \(R^{2}\) values for the different datasets all remain above 0.80. This indicates that the physical layer reduces the number of training samples needed for satisfactory performance, i.e., the NN becomes a “faster” learner. Although a bias is observed in the test datasets when the normalized seepage flow rate is greater than 0.04 (especially with 400 training samples in Fig. 6b), this bias diminishes as the number of random samples increases. The supplementary data thus allow the physics-inspired NN to learn additional features of extreme defect cases.

3.2 Trade-off decisions between prediction accuracy and computational cost

Table 2 summarizes the calculation time for 5000 predictions of random imperfection cases using the NNs P1–P5, as well as the benchmark TDA method. As shown in Table 2, P1 has the lowest computational cost, taking only 3.20 s on a desktop computer with 8 GB of RAM and a four-core Intel Core i5 CPU with a clock speed of 3.2 GHz. However, the accuracy of P1 is too poor to be useful for any practical application. At the other extreme, pure TDA takes around 61,000 times as much computational time as P1, because TDA spends much CPU time on large matrix operations for penetration examination and flow rate calculation. The computational times of the physics-inspired NNs (P2, P3, P4 and P5) fall between those of the traditional NN (P1) and the TDA method. The computational cost increases with the physical expertise embedded in the physical variables (from P1 to P5). This increase is mainly attributed to the calculations performed within the physical layers. The proportion of physical-layer calculations keeps rising from P1 to P5, while the proportion of underlying NN training and prediction steadily decreases. The trade-off between accuracy and computational cost for P1–P5 is presented in Fig. 7. The accuracy plateaus at P3, where the computational cost is still acceptable. This observation suggests that, if a physical variable of suitable complexity is chosen, physics-inspired NNs (such as P3 and P4) give engineers reasonably precise and instant estimates of the seepage flow rate and the corresponding risk level.

Table 2 Performance of P1–P5 in terms of accuracy and computational efficiency

Fig. 7 Trade-off between accuracy and computational efficiency among P1–P5 and the TDA method (the training sample size for P1–P5 remains consistent and is set to 4000)

It should be noted that 5000 predictions with the trained physics-inspired NNs take only several seconds, which is negligible. This is crucial for instant on-site leakage risk assessment and construction management, especially in emergencies. For on-site risk analysis, a well-trained NN model can be generated in advance from the available training and validation datasets. This removes from the on-site workflow the time required for the physical/physical layer calculations of the training and validation datasets, as well as for NN training. The calculation time is then dominated by the physical/physical layer calculations for the practical scenarios and the corresponding NN predictions. Because risk analysis involves repetitive calculations and the physical/physical layer calculations contribute significantly to the overall computational cost, the overarching conclusion remains the same: the calculation time increases as the physical meaning becomes clearer across the physics-inspired NNs.

4 Discussions

4.1 Consistency and sensitivity analysis for the trained NNs

Owing to the stochastic nature of the NN training process, multi-source uncertainties may influence the prediction performance. It is therefore necessary to examine the consistency of the predictions made by NNs trained on different, independent datasets that contain the same number of individually generated samples of randomly imperfect JGCOWs. In the current study, 20 different sets of samples are generated and used to train the designed NNs separately. The variation in the statistical characteristics and distribution of the predicted results is tested across these 20 datasets. Such a test is necessary to avoid overfitting to underrepresented samples, which would make the model inapplicable to other datasets. In addition, the influence of the model configuration and training settings on the performance of the physics-inspired NNs is studied. The configurations of the NNs and their training sample sizes are summarized in Table 3.

Table 3 Convergence study for NN settings

4.1.1 Effects of NN model configurations

Figure 8 illustrates the variation of the traditional NN performance with NN topology. For each NN configuration, 20 independently generated datasets are used to train 20 separate NNs of the same category, and each of them is then used to evaluate the seepage flow rate for the same realization. The 20 \(R^{2}\) values are calculated from the differences between the evaluated seepage flow rates and the benchmark. Adjusting the hidden layer structure fails to resolve the overfitting problem encountered by the traditional NN (P1). The \(R^{2}\) values even drop significantly as the number of hidden layers and neurons increases. Since none of the traditional NN configurations yields a satisfactory \(R^{2}\), introducing a physical layer into the existing data-driven NN is necessary.

Fig. 8 Influence of hidden layer structure on the performance of traditional NN (P1): in the notation A@B, A represents the number of hidden layers and B represents the number of neurons in each hidden layer

In Fig. 9, the fluctuations of R2 values are recorded to assess the model robustness against the number of hidden layer neurons. The hidden layer structure is prescribed as the simplest single layer. As the number of neurons in the hidden layer increases from 5 to 80, the mean R2 values for P2 increase slightly, but the range of R2 (defined as the difference between the maximum and minimum R2) increases sharply. Too many neurons introduce additional uncertainty into the training process, which contributes to the growing variation of R2 values. In contrast, the mean and range of R2 values for P3–P5 remain almost the same as the number of neurons increases. This indicates that the performance of P3–P5 is robust and only marginally influenced by the randomness of the training process and training datasets. It is worth noting that when the number of neurons in the hidden layer is small (such as 1), the NN training may get stuck in a local minimum and yield an extremely high error on the test dataset (the red points in P2–P5 with R2 values around 0.01).
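The consistency statistics used in this subsection (the R2 of each independently trained NN, and the mean and range of R2 over the 20 runs) can be sketched as follows. This is only an illustrative sketch: the function names `r_squared` and `summarize_runs` are hypothetical, and the "predictions" are synthetic stand-ins (benchmark plus noise), not results from the NNs or the TDA of this study.

```python
import numpy as np

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def summarize_runs(y_true, predictions_per_run):
    """Mean and range (max - min) of R2 over independently trained models."""
    scores = np.array([r_squared(y_true, p) for p in predictions_per_run])
    return scores.mean(), scores.max() - scores.min()

# Synthetic stand-in data (assumption): 20 "models" whose predictions are
# the benchmark flow rates perturbed by Gaussian noise.
rng = np.random.default_rng(0)
benchmark = rng.lognormal(mean=0.0, sigma=0.5, size=200)
runs = [benchmark + rng.normal(scale=0.1, size=benchmark.size) for _ in range(20)]

mean_r2, range_r2 = summarize_runs(benchmark, runs)
print(f"mean R2 = {mean_r2:.3f}, range of R2 = {range_r2:.3f}")
```

A small range indicates that the trained models respond consistently to the randomness of the training data, which is the sense in which robustness is discussed here.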

Fig. 9 Influence of the number of neurons in hidden layer on the coefficient of determination (R2) of the physical NN-predicted seepage flow rate

Figure 10 shows the variation of R2 values with increasing resolution of the physical layers (defined as the number of discretized layers or slices for JGCOW, as shown in Fig. 3). The resolution is essentially the number of nodes of the defined physical layers. The higher the resolution, the less likely it is that critical leakage holes go undetected, but the computational cost also increases. The mean R2 values for P2–P4 show a rising trend with the increased number of partitions (from 1 to 84). When the resolution of the physical layers exceeds 40, this rise becomes marginal. This indicates that overly fine physical partitions are unnecessary and drastically increase the computational time in the physical layer. As can be observed from Fig. 10d, the fluctuation of R2 values for P5 is small when the resolution of physical layers is greater than 5. This is mainly because the interconnected imperfection area perpendicular to the flow direction controls the magnitude of seepage. This implies that only a few cross profiles are needed to reasonably characterize the water tightness of JGCOW.

Fig. 10 Influence of the resolution of physical layers on the coefficient of determination (R2) of the physical NN-predicted seepage flow rate (Resolution of physical layer is defined as the number of discretized layers or slices for JGCOW, as shown in Fig. 3)

Figure 11 illustrates the variations in R2 values for the proposed physics-inspired neural networks (NNs) with different activation functions. In this analysis, the activation functions are exclusively modified for the hidden layer, while the output layer neurons adopt a fixed linear activation function. The activation functions used in this study are summarized in Table 4. The results presented in Fig. 11 demonstrate that regardless of the chosen activation function, the performance of the physics-inspired NNs improves as the physical meaning is enhanced. Moreover, enhancing the physical meaning also enhances the robustness of the NNs in terms of activation function selection, as indicated by the decreasing fluctuations in R2 values from P2 to P5 across all the chosen activation functions. Notably, the utilization of complex activation functions, such as the TanhLU activation function, does not contribute significantly to the improvement of the NNs in this context. This observation can be attributed to the fact that complex activation functions are more suited for deep neural networks and large training datasets, which are not the characteristics of the NN applications considered in this study.

Fig. 11 Influence of activation functions on the coefficient of determination (R2) of the NN-predicted seepage flow rate. For each training sample size, the representative quantile statistics are obtained from the results of 20 predictions

Table 4 Adopted activation functions in this study

4.1.2 Effects of training sample size

Five training sample sizes, 400, 1300, 2200, 3100, and 4000, are adopted to examine the influence of sample size on the performance of NNs. Figure 12 shows the consistency of the trained NNs by plotting box charts of the R2 values of the NN-predicted seepage flow rate, in which the spacing of the quartiles reflects the robustness of NN performance. Subplots (a)–(e) represent the response of different NN structures to the randomness of training samples. An increasing trend in the mean R2 values is observed for all NN structures (P1–P5) as the number of training samples increases. This trend can be interpreted as the features of the flow field being better characterized with more input samples, especially for scenarios with extremely large seepage rates. A large seepage rate corresponds to the occurrence of drastic defects in JGCOW, which is relatively rare and needs more training data to cover. The range between the maximum and minimum values of R2 also narrows as the number of training samples grows. This convergence of NN performance indicates that, as more samples are fed into the NN training, the consistency of NN predictions improves and the uncertainty involved in the NN training decreases. This good convergence also supports the correctness of the NN training. The physics-inspired NNs outperformed the traditional NN by learning “better” (higher R2 at the same training sample size) and “faster” (reaching the highest R2 at a lower training sample size). This is partly because the introduction of physical knowledge extracts more essential information from the input data, making the NN training less dependent on the dataset size.

Fig. 12 Influence of adopted training sample size on the coefficient of determination (R2) of the NN-predicted seepage flow rate. For each training sample size, the representative quantile statistics are obtained from the results of 20 predictions

4.2 Model capacity of physics-inspired NNs

The model capacity measures the capability of the trained model to capture and represent the pattern or relationship in data. It relates to how well the model can both fit the training data and generalize to novel, unobserved data. To evaluate the applicable capacity of physics-inspired NNs, the same NN model trained on the previous data (defined in Table 1) was used to map the relationship between input and output for additional datasets with different statistical characteristics of construction errors (defined in Table 5). Specifically, scenario A adopts a smaller standard deviation of inclination than the original datasets, which simulates a case with less inclination. Scenario B chooses a shorter pile length of 10 m, which simulates a case with less embedment depth of the cut-off wall. Scenario C sets up a construction context that has more deep mixed columns. Each scenario includes 750 random samples to reach a representative scale for each dataset. Figure 13 compares the prediction results of the physics-inspired NNs against the TDA benchmarks. It can be observed that the NN-predicted values using P3–P5 agree relatively well with the TDA predictions, even though these NN models are trained with totally different datasets. This indicates that the use of physics-inspired NNs significantly improves the ability to adapt to new data with distinct boundary conditions. Figure 14 shows the cumulative distribution function (CDF) of seepage flow rate for the original training data, the verification data and the corresponding predictions using different physics-inspired NNs. It was found that even though the distribution of flow rate for the original training data is totally different from that of the verification scenarios, the NNs for P3–P5 still achieved satisfactory performance. This further validates the model capacity of physics-inspired NNs.
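The CDF comparison described above relies on empirical distribution functions built from the sampled flow rates. A minimal sketch of such a construction is given below; the function name `empirical_cdf` and the lognormal samples are assumptions introduced for illustration only and do not reproduce the data of this study.

```python
import numpy as np

def empirical_cdf(samples):
    """Return sorted sample values and their cumulative probabilities i/n."""
    x = np.sort(np.asarray(samples, dtype=float))
    p = np.arange(1, x.size + 1) / x.size
    return x, p

# Hypothetical flow-rate samples (assumption): 750 per scenario, as in the text.
rng = np.random.default_rng(3)
benchmark_flow = rng.lognormal(mean=0.0, sigma=0.5, size=750)
predicted_flow = benchmark_flow * rng.normal(loc=1.0, scale=0.05, size=750)

x_ref, p_ref = empirical_cdf(benchmark_flow)
x_nn, p_nn = empirical_cdf(predicted_flow)
print(f"median (benchmark) = {np.interp(0.5, p_ref, x_ref):.3f}")
print(f"median (predicted) = {np.interp(0.5, p_nn, x_nn):.3f}")
```

Plotting `p_ref` against `x_ref` and `p_nn` against `x_nn` on the same axes gives the kind of curve comparison that a CDF figure presents.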

Fig. 13 Validation of applicable capacity of trained physics-inspired NNs P2–P5 (Scenario A–C represent the different configurations of random verification cases and can be referenced from Table 5)

Fig. 14 Cumulative distribution function (CDF) of seepage flow rate for different verification scenarios (Scenario A–C represent the different configurations of random verification cases and can be referenced from Table 5)

Table 5 Configurations of random verification cases for the applicable capacity of physics-inspired NNs

Furthermore, a practical case study of a subway shaft was conducted to assess the model capacity of physics-inspired NNs. The study focused on the No. 3 shaft at Shifoying station in the Beijing Metro, which was situated in sandy soil containing an unconfined aquifer with a thickness of 6.5 m [40]. To prevent water leakage, jet grouting was implemented as a sealing measure. The columns in the No. 3 shaft had diameters of 0.8 m and 1.0 m, with corresponding column spacings of 0.5 m and 0.7 m, respectively. The layout plan of the jet-grouted columns in the No. 3 shaft is shown in Fig. 15a. Experimental tests determined the permeability coefficient k1 of the unconfined aquifer to be \(3.4 \times 10^{-5}\) m/s, while the permeability coefficient k2 of the treated soil was found to be \(1.3 \times 10^{-8}\) m/s [40]. The water level difference between the inside and outside of the shaft was 6.5 m. Details of the construction errors associated with the jet-grouted cut-off walls are given in Fig. 15a. Figure 15b depicts the predictions of seepage flow rate generated by various physics-inspired NNs. It should be noted that the employed physics-inspired NNs were trained using the training dataset configured as shown in Table 1. Due to significant distribution discrepancies between the seepage flow rates in the training dataset and the applied case study, predictions produced by P2 exhibited notable systematic errors, resulting in much higher seepage flow rates compared to the benchmark predictions provided by TDA. However, this error diminished as the physical meaning was improved from P3 to P5, as evidenced by the strong agreement between the predictions given by P3 to P5 and TDA. The satisfactory performance demonstrated by P3 to P5 supports the notion that NN training can be prepared in advance and directly applied in practical cases. This advantage underscores the potential of physics-inspired NNs as an alternative to TDA. 
Additionally, the predictions generated by the physics-inspired NNs were compared with the deterministic prediction derived from the equivalent wall thickness method. The daily seepage flow rate reported by Wang et al. [40] using the equivalent wall thickness method was 1.76 m3, which closely aligned with the 50% fractile (i.e., median) of the seepage flow rate predicted by TDA at 1.64 m3/day. The 95% fractile of seepage flow rate given by TDA was 4.78 m3/day, which indicates that significant construction errors in the cut-off walls may lead to substantial seepage flow rates. Neglecting potential scenarios for construction errors would underestimate the seepage risk.
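The comparison between a deterministic estimate and the fractiles of a probabilistic prediction can be sketched as follows. The function name `fractiles` and the lognormal samples are hypothetical stand-ins rather than TDA or NN output; only the deterministic value of 1.76 m3/day is taken from the text above.

```python
import numpy as np

def fractiles(samples, probs=(0.50, 0.95)):
    """Return the requested fractiles (quantiles) of a sample set."""
    samples = np.asarray(samples, dtype=float)
    return {p: float(np.quantile(samples, p)) for p in probs}

# Hypothetical daily seepage flow rates in m3/day (assumption, not study data).
rng = np.random.default_rng(1)
flow = rng.lognormal(mean=0.5, sigma=0.6, size=5000)

q = fractiles(flow)
deterministic = 1.76  # m3/day, equivalent wall thickness method as quoted above
exceedance = float(np.mean(flow > deterministic))

print(f"50% fractile = {q[0.50]:.2f} m3/day, 95% fractile = {q[0.95]:.2f} m3/day")
print(f"share of samples above the deterministic estimate = {exceedance:.2%}")
```

The exceedance share makes explicit how much of the probabilistic outcome lies above a single deterministic value, which is the risk a deterministic assessment alone cannot show.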

Fig. 15 a The layout plan of jet-grouted columns for the No. 3 shaft of Shifoying station in the Beijing Metro (revised after Wang et al. [40]), b the predictions of seepage flow rate for the No. 3 shaft given by physics-inspired NNs

4.3 Insight into the internal adjustment of neuron connections

Synaptic weights of NNs refer to the strength of the connection between two nodes, corresponding, in neuroscience terms, to the amount of impact the activation of one neuron has on another. Based on suitable synaptic weights, information is transmitted through the interconnected network of neurons. Figure 16a–e plots the heat maps of synaptic weights between incoming neurons and hidden neurons for P1–P5, which differ in the configuration of their physical layers. The numbers of incoming neurons and hidden neurons are identical for all NNs used, i.e., 84 incoming neurons and 40 hidden neurons. Note that the synaptic weights are normalized into the standard range from −1 to 1 by linear scaling, and the maximum and minimum values of the weights are also tagged. As shown in Fig. 16a–e, the pattern of neuron activation for the traditional NN differs from that of the physics-inspired NNs when using the complete datasets. The synaptic weights of the traditional NN (P1) concentrate on very few connections, while those of the physics-inspired counterparts are more evenly distributed. Within the red rectangle in each subplot, which represents the most active 15% of connections for each incoming node, this difference is more evident. The pattern of high concentration on only a few connections implies that the traditional NN is an unstable network, because any slight change in the input may result in large differences in the output. The occurrence of large weights in an NN is a sign of a tendency to overfit the training data. A large enough weight tends to drive the output of the neuron toward either the maximum or minimum side of the activation function. The higher the synaptic weights of an NN, the less room is left for deciding which features to pass on through the activation based on the inputs [18]. As shown in Fig. 16f, the traditional NN model has larger synaptic weights with a higher standard deviation compared to the physics-inspired NN models. An NN model with small weights tends to be more robust against statistical noise and specific examples in the training datasets [1]. This can be attributed to the robust regularization operation naturally carried out in the physical layer. Through the physical layer, prior knowledge from the physical perspective is added to the underlying learning of the NN. The physical information used restricts the range of the feasible solution space and, hence, inhibits overfitting under small sample conditions. When sufficient data are available, the physical layer also helps the underlying NN approach the optimal weights more efficiently. The self-regulation phenomenon for physical layer/hidden layer neuron connections may account for the improved performance of physics-inspired NNs.
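The weight post-processing described above (linear scaling to the range from −1 to 1, and marking the most active 15% of connections for each incoming node) might be sketched as below. Two points are assumptions, since the text does not specify them: the linear scaling is taken to be a min-max mapping, and "most active" is taken to mean largest absolute weight. The function names and the random 84 × 40 matrix are hypothetical.

```python
import numpy as np

def normalize_weights(w):
    """Min-max linear scaling to [-1, 1] (assumed form of the linear scaling)."""
    w = np.asarray(w, dtype=float)
    w_min, w_max = w.min(), w.max()
    return 2.0 * (w - w_min) / (w_max - w_min) - 1.0

def most_active_mask(w, fraction=0.15):
    """Mark the largest-|w| connections for each incoming neuron (one row each)."""
    w = np.asarray(w, dtype=float)
    k = max(1, int(round(fraction * w.shape[1])))
    top = np.argsort(-np.abs(w), axis=1)[:, :k]  # indices of the k largest |w| per row
    mask = np.zeros(w.shape, dtype=bool)
    np.put_along_axis(mask, top, True, axis=1)
    return mask

# Synthetic weights (assumption): 84 incoming neurons x 40 hidden neurons.
rng = np.random.default_rng(2)
weights = rng.normal(size=(84, 40))

scaled = normalize_weights(weights)
mask = most_active_mask(scaled)
print(f"scaled range: [{scaled.min():.1f}, {scaled.max():.1f}]")
print(f"connections marked per incoming neuron: {int(mask.sum(axis=1)[0])}")
print(f"mean = {scaled.mean():.3f}, standard deviation = {scaled.std():.3f}")
```

With 40 hidden neurons, 15% corresponds to 6 connections per incoming neuron; the mean and standard deviation printed at the end are the kind of summary statistics compared across models.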

Fig. 16 Neuron weight response between incoming neurons and hidden neurons. Subplots a–e plot the heat map of neuron weights for P1–P5, in which red rectangles mark the most active 15% of connections for each incoming node; Subplot f compares the statistics (mean value and standard deviation) of neuron weights for P1–P5

4.4 Information plane visualization of physics-inspired NNs

The information plane, which refers to the representation of Mutual Information (MI) values preserved by each layer of NNs between input and output variables, was employed as a visualization technique to examine the influence of the physical layer on NN training [35]. Mutual information functions as a statistical metric that quantifies the degree of dependence or shared information between two variables [35]. It evaluates the quantity of information that can be obtained from one variable when the value of another variable is known. In the context of two random variables, U and V, characterized by a joint distribution p(u, v), their mutual information is defined as follows [19]:

$$\begin{aligned} I(U;V) & = \sum\limits_{u \in U,v \in V} {p(u,v)\log \left( {\frac{p(u,v)}{{p(u)p(v)}}} \right)} \\ & = \sum\limits_{u \in U,v \in V} {p(u,v)\log \left( {\frac{p(u\left| v \right.)}{{p(u)}}} \right)} \\ \end{aligned}$$
(4)
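For discrete variables, the definition above can be evaluated directly from a joint probability table. The following is a minimal sketch under stated assumptions: the function name `mutual_information` is hypothetical, the logarithm is taken to base 2 (so the result is in bits; the text does not fix the base), and estimating the joint distribution from continuous NN layer outputs (for example by binning) is outside the scope of this sketch.

```python
import numpy as np

def mutual_information(joint):
    """I(U;V) in bits from a joint probability table p(u, v)."""
    joint = np.asarray(joint, dtype=float)
    joint = joint / joint.sum()            # normalize in case counts are given
    pu = joint.sum(axis=1, keepdims=True)  # marginal p(u), shape (n, 1)
    pv = joint.sum(axis=0, keepdims=True)  # marginal p(v), shape (1, m)
    product = pu @ pv                      # p(u) p(v), shape (n, m)
    nz = joint > 0                         # skip zero cells (0 log 0 = 0)
    return float(np.sum(joint[nz] * np.log2(joint[nz] / product[nz])))

independent = [[0.25, 0.25], [0.25, 0.25]]  # U and V share no information
identical = [[0.5, 0.0], [0.0, 0.5]]        # V fully determines U

print(mutual_information(independent))
print(mutual_information(identical))
```

The two toy tables bracket the behavior of the measure: independent variables carry no shared information, while a one-to-one relation between two equiprobable binary variables carries one bit.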

The layers of physics-inspired neural network can be conceptualized as a series of internal representations derived from the input layer X, forming a sequential relationship akin to a Markov chain X → P → T → \(\hat{Y}\) [2]. Here, X, P, T, \(\hat{Y}\) represent the values of the input layer, physical layer, hidden layer, and output layer. By considering this sequential progression, the transmission of information can be characterized through the mutual information between consecutive variables, denoted as IX = I(X;T) and IY = I(T;Y) [35]. It is important to note that Y represents the desired output rather than the predicted output \(\hat{Y}\). Consequently, the horizontal and vertical axes of the information plane correspond to the respective values of IX = I(X;T) and IY = I(T;Y). Visualizing the information plane provides a valuable means to gain profound insights into the information flow and the NN's capacity to effectively convey information across different layers.
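As a supporting remark that is not stated in the source, the Markov-chain view has a standard consequence known as the data processing inequality: information about the input cannot increase along the chain. Assuming that each layer depends on X only through the preceding layer, this reads

$$I(X;P) \ge I(X;T) \ge I(X;\hat{Y})$$

so the physical layer P bounds from above the amount of input information that the hidden layer T can retain.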

Figure 17 compares the responses of information planes under different configurations of physics-inspired NNs from P1 to P5. The results indicate that augmenting the physical clarity of NNs leads to an increase in both the values of IX = I(X;T) and IY = I(T;Y), regardless of the training algorithms employed, namely Stochastic Gradient Descent (SGD) and Levenberg–Marquardt (LM). This observation can be attributed to the fact that integrating physical meaning enhances the efficiency of information transmission across the NN layers, thereby contributing to the improved performance of the NNs.

Fig. 17 Responses of information plane for different physics-inspired NNs: a Stochastic Gradient Descent (SGD) training; b Levenberg–Marquardt (LM) training

5 Summary and conclusions

In summary, this research presents the use of physics-inspired NNs as a surrogate model to efficiently evaluate the seepage flow rate for JGCOWs with random construction errors. Several novel physics-inspired neural network (NN) models were proposed based on well-designed physical layers with varying complexity. The capacity of physical layers to extract high-level features about geometrical imperfections of cut-off walls was examined. Compared to the TDA method, physics-inspired NNs were more computationally efficient while keeping rationally good accuracy and robustness. The problem of data overfitting of traditional NNs was also mitigated by the introduction of physical layers, by which the disorganized inputs are converted to be physically significant and dimensionally consistent. Some detailed conclusions are summarized as below:

(i) The problem of data overfitting confronted by traditional NNs was solved by the introduction of physical layers. The physics-inspired NNs outperformed the traditional NNs in terms of both prediction accuracy (higher R2) and learning efficiency (fewer training samples required to reach a good prediction result).

(ii) The prediction accuracy of discharge rate can be enhanced via input of higher levels of physical expertise, though at the price of higher computation cost. One can reach optimal and practical trade-offs between prediction accuracy and calculation expense by preparing physical layers with reasonably clear physical meaning, depending on the on-site requirements for accuracy and efficiency.

(iii) Insight into the internal adjustment of neuron connections was provided for the physical layers of varying complexity. It was found that, when the physical layers were introduced, the neurons maintained a reasonable level of activation. The self-regulation phenomenon for neuron connections is captured in the physics-inspired NNs and accounts for their strong performance. Additionally, the visualization of the information plane revealed that the augmentation of physical clarity enhances the efficiency of information transmission across the layers of the NNs, thereby contributing to the improved performance of physics-inspired NNs.

However, due to the nature of the research objects and methodologies employed, this paper has certain limitations that should be acknowledged:

(i) In practical scenarios, the monitoring of jet grouting entails the collection of real-time parameters, such as jetting pressure and lifting speed. However, these detailed factors are not considered in this study.

(ii) The assumption made in this paper is that the water head difference between the outside and inside of cut-off walls remains constant along the depth. In reality, the practical water head distribution is much more complex, considering factors such as underground water migration. Therefore, the simplified assumption restricts the full representation of the actual water flow conditions.