1 Introduction

The water retention behavior of a porous material is the relationship between the matric suction and the degree of saturation. It governs the hydromechanical behavior of an unsaturated porous material since the capillarity therein induces cohesion, which, in turn, affects the effective stress and material strength. While the water retention behavior has been traditionally characterized through various devices such as pressure plate apparatus, filter paper, or tensiometer, recent advances in imaging technologies have diversified the ways to quantify it. For instance, X-ray computed tomography (CT) provides high-resolution images of the porous material of interest, from which we can detect the spatial distribution of wetting and non-wetting phases in the pores [49, 66] or conduct pore-scale flow simulations [15, 33]. Applications of nuclear magnetic resonance (NMR) enable us to directly measure the water content of soils and rocks, and they can be utilized to determine their pore size distributions and water retention curves [20, 58].

Standard water retention models in the literature (e.g., [14, 25, 39, 67]) were originally developed for unsaturated flow in porous media without deformation. While these standard models have also been employed in coupled hydromechanical modeling of deformable porous media, see, e.g., [13, 16, 17, 29, 32, 35, 38, 71], they are not ideal because they do not account for the dependence of the water retention behavior on the deformation of the solid matrix. In other words, while the standard water retention models assume that the relationship between suction and saturation is unaffected by deformation, a change in the specific volume alters the pore size distribution and the tortuosity of the pore space, thereby changing the water retention behavior.

To overcome the aforementioned limitation for deformable porous media, two types of strategies have been pursued in previous studies. The first type is to develop a physics-inspired phenomenological model that incorporates the impact of solid deformation on water retention characteristics, see, e.g., [27, 47, 62]. The second type of approach is to adopt a hierarchical multi-scale model that performs micro-scale (i.e., pore-scale) simulations on representative volume elements to replace a continuum-scale constitutive law (e.g., [1, 24, 70]). Compared to the former which still partially neglects micro-scale processes, the latter can reflect the multi-scale nature of water retention behavior. Still, however, the latter approach is impractical due to the high computational cost required for incremental constitutive updates at each material point by running sub-scale simulations.

As an alternative, data-driven approaches have recently gained popularity since they can replace micro-scale simulations in representative volume elements with machine learning models trained with experimental or simulation data. Among a variety of existing frameworks, deep neural networks have been the most popular due to their simplicity and capability of capturing complex dependencies on the micro-scale attributes without the need to determine the material parameters explicitly. For instance, previous studies have employed neural networks as universal function approximators that can reproduce nonlinear stress–strain relations [18, 37, 40], permeability [57, 65], and water retention characteristics [31, 43], demonstrating the remarkable predictive capabilities of the trained models.

Nevertheless, the existing data-driven approaches have not yet brought a practical impact on real-world applications, presumably because they often rely on black-box models that are not interpretable. Although there have been several attempts to address this issue of non-interpretability (e.g., [3, 45, 63]), the lack of an analytical expression makes them only partially interpretable while being less accurate than their black-box counterparts. Meanwhile, a symbolic regression approach can yield a completely interpretable data-driven model since it results in a mathematical expression that fits the given dataset. As pointed out in [11], however, this approach is often data-hungry to reach a desired level of accuracy, while the computational cost required to search the possible combinations of symbolic expressions proliferates with the dimensionality of the problem. To address this issue, Bahmani et al. [8] have recently proposed a neural polynomial method to perform a series of symbolic regressions in low-dimensional spaces to capture an evolving yield surface. Regardless, no attempt has been made to discover an interpretable data-driven model of the water retention behavior of a deformable porous material.

This study aims to develop an interpretable machine learning model that can replace sub-scale simulations in a hierarchical multi-scale model without compromising accuracy. Specifically, we propose a divide-and-conquer approach that leads to the discovery of an interpretable symbolic expression describing the water retention characteristics of deformable porous media. For this purpose, we consider multiple sets of points that comprise water retention curves of a certain type of porous material as a training dataset, collected from a series of pore-scale simulations through a pore-morphology-based algorithm. From the acquired dataset, we formulate a regression task by training a multi-layer perceptron in a supervised fashion that yields a black-box function exhibiting a high degree of expressivity. We then train a symbolic regression model via genetic programming to discover a mathematical expression that replicates the neural network function. The proposed framework can be integrated into multi-scale simulations by replacing pore-scale simulations with a discovered model that possesses high degrees of expressivity and interpretability at the same time. To limit the scope of the present work, we shall restrict our attention to the water retention behavior under drying (drainage), without consideration of hysteresis in the water retention behavior.

2 Methodology

In this section, we first summarize the image-based method adopted in this study, which enables us to obtain a discretized water retention curve from a digital microstructure (Sect. 2.1). We then present a divide-and-conquer machine learning approach that yields an interpretable data-driven water retention model trained with a dataset collected from a series of pore-scale simulations (Sect. 2.2). Specifically, our proposed framework trains a black-box neural network that can accurately represent the water retention behavior of a deformable porous material and then performs symbolic regression to discover the interpretable mathematical expression that best fits the learned function. The image-based simulation results and the predictive capability of the resulting data-driven model will be presented later in Sect. 3.

2.1 Image-based sphere insertion method

This study adopts an image-based method to simulate the invasion of the non-wetting fluid (e.g., air phase) into a water-saturated digital porous material to obtain water retention data. The technique is referred to as the image-based sphere insertion method or the pore-morphology-based method, since it mimics the process of fluid infiltration by inserting a set of spheres of radius r into the pore space to approximate the fluid configurations at a given suction s. For brevity, this section only provides a summary of the simulation process, referring the readers to [30, 33, 56] for details on the morphological operations and image processing algorithms.

Consider a three-dimensional binary image where 0 represents the solid while 1 represents the pore at each voxel, which serves as a platform to conduct the pore-scale simulations. We first define the inlet face, which is comprised of a set of pore voxels located on the boundary plane. Then, we apply the Euclidean distance transform on the binary image to replace the value of 1 with the minimum distance to the solid region at each pore voxel. This yields a distance map where the individual voxel values represent the radius of the maximum possible sphere that can be inserted into the center of each pore voxel, which can be converted to the minimum capillary pressure (\(p_c\)) required to invade the corresponding pore space through the Young–Laplace equation:

$$\begin{aligned} p_c = \frac{2 \gamma _\textrm{aw} \cos {\theta }}{r}, \end{aligned}$$
(1)

where \(\gamma _\textrm{aw}\) indicates the air–water interfacial tension and \(\theta\) denotes the contact angle. This implies that we can identify all the regions that can be invaded by the non-wetting fluid from a series of morphological operations. Hence, at a given suction s, we construct a binary mask by thresholding the obtained distance map by \(r_{\text {th}} = 2 \gamma _\textrm{aw} \cos {\theta } / s\), which indicates the regions where the minimum capillary pressure required for an invasion is less than or equal to s. At this point, the voxels within the mask that are disconnected from the inlet face are trimmed, while the remaining connected voxels are morphologically dilated by \(r_{\text {th}}\) to obtain the configuration of the fluids, from which we record the degree of saturation \(S_w\) by computing the volume fraction of the pore space filled by the wetting fluid. By repeating this process at increasing values of s, we can replicate a drainage experiment that yields a water retention curve formed by a set of points in a two-dimensional Euclidean space spanned by s and \(S_w\).
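The three steps described above (thresholding the distance map, trimming the parts disconnected from the inlet, and dilating the remaining centers) can be sketched on a small two-dimensional image. The following is a minimal illustration only, not the implementation used in this study: it relies on a brute-force distance map instead of an optimized Euclidean distance transform, takes the first image row as the inlet, and the function names `threshold_radius`, `distance_map`, and `saturation` are hypothetical.

```python
import math

GAMMA_AW = 0.072    # air-water interfacial tension [N/m]
THETA = 0.0         # contact angle [rad]
VOXEL_SIZE = 2e-6   # pixel size [m]; assumed here for the unit conversion


def threshold_radius(suction_pa):
    """Young-Laplace radius r_th = 2*gamma*cos(theta)/s, returned in metres."""
    if suction_pa <= 0.0:
        return math.inf  # at zero suction no finite sphere can invade
    return 2.0 * GAMMA_AW * math.cos(THETA) / suction_pa


def distance_map(image):
    """Brute-force distance from every pore pixel (1) to the nearest solid pixel (0)."""
    solid = [(i, j) for i, row in enumerate(image)
             for j, val in enumerate(row) if val == 0]
    dist = {}
    for i, row in enumerate(image):
        for j, val in enumerate(row):
            if val == 1:
                dist[(i, j)] = min(
                    (math.hypot(i - a, j - b) for a, b in solid),
                    default=math.inf,
                )
    return dist


def saturation(image, r_th):
    """Degree of saturation after one sphere-insertion step (r_th in pixels)."""
    dist = distance_map(image)
    # Step 1: pixels that can host the centre of a sphere of radius r_th.
    mask = {p for p, d in dist.items() if d >= r_th}
    # Step 2: keep only the centres connected to the inlet (row 0).
    stack = [p for p in mask if p[0] == 0]
    seeds = set(stack)
    while stack:
        i, j = stack.pop()
        for q in ((i + 1, j), (i - 1, j), (i, j + 1), (i, j - 1)):
            if q in mask and q not in seeds:
                seeds.add(q)
                stack.append(q)
    # Step 3: dilate the connected centres by r_th to get the invaded region.
    invaded = {
        p for p in dist
        if any(math.hypot(p[0] - c[0], p[1] - c[1]) < r_th for c in seeds)
    }
    return 1.0 - len(invaded) / len(dist)


# Two chambers joined by a one-pixel throat (row 4); walls on both sides.
image = [[0, 1, 1, 1, 1, 1, 0] for _ in range(9)]
image[4] = [0, 0, 0, 1, 0, 0, 0]
for suction_kpa in (10.0, 30.0, 80.0):
    r_pixels = threshold_radius(suction_kpa * 1e3) / VOXEL_SIZE
    print(suction_kpa, round(saturation(image, r_pixels), 3))
```

In this toy geometry the throat keeps the lower chamber water-filled until the suction is high enough for a sphere to pass through it, which is the trapping effect that the connectivity check in step 2 is meant to capture.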

In this study, we consider a number of representative volume elements that, respectively, mimic the digital snapshots of a certain porous material undergoing an isotropic confinement while performing pore-scale simulations therein to obtain a set of points (e.g., \(\lbrace v^i, s^i, S_w^i \rbrace _{i=1}^{N_{\text {data}}}\), where \(v = 1 + e\) is the specific volume, e is the void ratio, and \(N_{\text {data}}\) indicates the total number of points) that comprises a surface in a three-dimensional Euclidean space that describes the water retention behavior of the target material that is deformable. Digital microstructures considered in this work and their simulation results will be presented later in Sect. 3.1.

2.2 Neural network-based symbolic regression

By considering the point cloud \(\lbrace v^i, s^i, S_w^i \rbrace _{i=1}^{N_{\text {data}}}\) as a training dataset, we formulate a supervised learning problem to train a feed-forward neural network counterpart of the water retention function, parameterized by the weights \(\varvec{W}\) and biases \(\varvec{b}\). Based on this setting, the learning problem seeks to minimize the difference between the ground truth and the neural network prediction for given samples, i.e.,

$$\begin{aligned} \varvec{W}, \varvec{b} = \mathop {\mathrm {arg\,min}}\limits _{\varvec{W},\varvec{b}} (\mathcal {L}) \, \, ; \, \, \mathcal {L} = \frac{1}{N_{\text {data}}} \displaystyle \sum _{i=1}^{N_{\text {data}}} \left[ f^{\text {NN}} (v^i, s^i) - S_w^i \right] ^2, \end{aligned}$$
(2)

where \(\mathcal {L}\) indicates the mean square error loss and \(f^{\text {NN}}\) is the neural network function. Although the hyperparameters can be optimized to achieve a higher level of efficiency (e.g., [9, 10]), for simplicity, this study considers a typical fully connected neural network comprised of two hidden layers with 20 neurons each with rectified linear unit activation functions \(\text {ReLU}( \bullet ) = \max {(0, \bullet )}\) followed by an output dense layer. Here, we have chosen the ReLU activation function instead of others (e.g., sigmoid or hyperbolic tangent activation functions) to ensure the effectiveness of the backpropagation process in training. In this case, the neural network function can be expressed as

$$\begin{aligned} f^{\text {NN}}(v, s)& = \varvec{W}^{(3)} \cdot \text {ReLU} \left( \varvec{W}^{(2)} \cdot \text {ReLU}\left( \varvec{W}^{(1)} \cdot \varvec{x} + \varvec{b}^{(1)} \right)\right. \\&\quad \left.+ \varvec{b}^{(2)} \right) + \varvec{b}^{(3)} \, \,; \, \, \varvec{x} = \left[ v, s \right] ^{\text {T}}, \end{aligned}$$
(3)

where the sizes of the weight matrices \(\varvec{W}^{(1)}\), \(\varvec{W}^{(2)}\), and \(\varvec{W}^{(3)}\) are \(20 \times 2\), \(20 \times 20\), and \(1 \times 20\), while the biases \(\varvec{b}^{(1)}\), \(\varvec{b}^{(2)}\), and \(\varvec{b}^{(3)}\) are vectors of sizes 20, 20, and 1, respectively. This indicates that our neural network contains a total of 501 trainable parameters, which makes it difficult to interpret; it is hence considered a black box even though it can achieve a high level of accuracy, owing to its high degree of expressivity [7].
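As a quick check of the parameter count and of the structure of Eq. (3), the network can be written out in plain Python. This is a sketch for illustration only: the weights below are random stand-ins rather than trained values, and the helper names (`count_parameters`, `init_parameters`, `forward`, `mse`) are hypothetical and unrelated to the PyTorch implementation used in this work.

```python
import random

LAYER_SIZES = [2, 20, 20, 1]  # inputs (v, s) -> 20 -> 20 -> S_w


def count_parameters(sizes):
    """Weights (n_out x n_in) plus biases (n_out), summed over all layers."""
    return sum(n_out * n_in + n_out
               for n_in, n_out in zip(sizes[:-1], sizes[1:]))


def init_parameters(sizes, seed=0):
    """Random weights and zero biases; a stand-in for trained values."""
    rng = random.Random(seed)
    params = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        weights = [[rng.uniform(-1.0, 1.0) for _ in range(n_in)]
                   for _ in range(n_out)]
        biases = [0.0] * n_out
        params.append((weights, biases))
    return params


def forward(params, v, s):
    """Evaluate Eq. (3): ReLU after both hidden layers, linear output layer."""
    x = [v, s]
    for k, (weights, biases) in enumerate(params):
        x = [sum(w * xi for w, xi in zip(row, x)) + b
             for row, b in zip(weights, biases)]
        if k < len(params) - 1:  # no activation on the output layer
            x = [max(0.0, xi) for xi in x]
    return x[0]


def mse(params, data):
    """Mean square error loss of Eq. (2) over (v, s, S_w) samples."""
    return sum((forward(params, v, s) - sw) ** 2 for v, s, sw in data) / len(data)


print(count_parameters(LAYER_SIZES))  # 40 + 20 + 400 + 20 + 20 + 1
```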

To overcome this issue, this study performs symbolic regression via genetic programming [22, 48] to discover the mathematical expression of the learned neural network function, which is inherently human-interpretable while leveraging the expressive power of a multi-layer perceptron. The symbolic regression algorithm used in this study considers a mathematical expression as a binary tree, which can be constructed from the set of variables, constants, and binary and unary operators. Specifically, as illustrated in Fig. 1, the leaf nodes (blue) of the tree structure contain either an input variable or a constant (e.g., 0.5, 2.5, or x), while the internal nodes (red) can accommodate mathematical operators [e.g., \(+\), −, \(\times\), or \(\exp {( \bullet )}\)]. The use of genetic programming in this work is similar to [7, 55], whereby the combinatorial space of all possible mathematical expressions is searched. It first randomly generates a population of candidate binary trees and evaluates their mean square error losses to measure the fitness of each candidate solution. Then, the algorithm iteratively evolves the population based on the selection (selecting the fittest candidate from a population), crossover (exchanging randomly selected sub-trees of two candidates), and mutation (randomly replacing a sub-tree of a candidate solution with a new tree) operators. This process is repeated until the algorithm discovers a satisfactory mathematical expression (i.e., \(f^{\text {NN+SR}}\)) that best fits the black-box function \(f^{\text {NN}}\).

Fig. 1

Representation of an exemplary mathematical expression \(\exp {(0.5+x)} - 2.5x\) as a binary tree with depth 3 (color figure online)
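The tree representation in Fig. 1 can be mimicked with nested tuples. The snippet below is a minimal sketch and not the data structure used by the symbolic regression library: the tuple encoding and the functions `evaluate`, `complexity`, and `depth` are assumptions made for illustration. Complexity is taken as the number of nodes, and depth as the number of edges from the root to the deepest leaf.

```python
import math

# A node is a leaf (a float constant or a variable name) or a tuple
# (operator, child, ...) with one child for unary and two for binary operators.
UNARY = {"exp": math.exp}
BINARY = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
}

# exp(0.5 + x) - 2.5 * x, the expression drawn in Fig. 1
TREE = ("-", ("exp", ("+", 0.5, "x")), ("*", 2.5, "x"))


def evaluate(node, variables):
    """Recursively evaluate the expression tree for given variable values."""
    if isinstance(node, tuple):
        operator, children = node[0], node[1:]
        args = [evaluate(child, variables) for child in children]
        table = UNARY if operator in UNARY else BINARY
        return table[operator](*args)
    if isinstance(node, str):
        return variables[node]
    return float(node)


def complexity(node):
    """Number of nodes in the tree (internal nodes plus leaves)."""
    if isinstance(node, tuple):
        return 1 + sum(complexity(child) for child in node[1:])
    return 1


def depth(node):
    """Number of edges on the longest path from the root to a leaf."""
    if isinstance(node, tuple):
        return 1 + max(depth(child) for child in node[1:])
    return 0


print(evaluate(TREE, {"x": 1.0}), complexity(TREE), depth(TREE))
```

Crossover and mutation then amount to swapping or replacing sub-tuples of such trees, which is why the search space grows quickly with the allowed number of nodes.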

The schematic of the proposed divide-and-conquer strategy is shown in Fig. 2, where a superposed tilde indicates the degree of saturation predicted via a machine learning model (e.g., either the black-box network or the symbolic expression discovered from it). The main advantage of this framework is that the interpretability enhanced via symbolic regression enables us to better understand the water retention behavior of a target material, without sacrificing the expressive power of a black-box neural network. It also results in an enhancement in the portability of the learned function for continuum-scale partial differential equation solvers regardless of the written programming languages, such that the model can easily be incorporated into almost any kind of existing simulation code. The training results and the discovered mathematical expressions will be reported in Sect. 3.2.

Fig. 2

A schematic of the divide-and-conquer machine learning approach to discover an interpretable data-driven water retention model

Remark 1

In addition to the use of ReLU activation layers, we normalize the data before the training to avoid problems that may occur during the backpropagation process (e.g., vanishing/exploding gradients). Specifically, a sample \(X^i\) of a measure X is scaled to a unit interval as:

$$\begin{aligned} \bar{X}^i = \frac{X^i - X_{\text {min}}}{X_{\text {max}} - X_{\text {min}}}, \end{aligned}$$
(4)

where \(\bar{X}^i\) is the normalized sample point, while \(X_{\text {min}}\) and \(X_{\text {max}}\) are the minimum and the maximum values of the measure X inside the training dataset, respectively, such that all the data used in this paper are normalized within [0, 1] as a preprocessing step.
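The scaling in Eq. (4) and its inverse, which is needed to map a normalized prediction back to physical units, can be sketched as follows. The function names are hypothetical, and the sample values are simply three of the specific volumes considered in Sect. 3.1.

```python
def fit_min_max(samples):
    """Return (X_min, X_max) of a measure over the training dataset."""
    return min(samples), max(samples)


def normalize(x, x_min, x_max):
    """Eq. (4): scale a sample to the unit interval."""
    return (x - x_min) / (x_max - x_min)


def denormalize(x_bar, x_min, x_max):
    """Inverse of Eq. (4): recover the physical value from a normalized one."""
    return x_min + x_bar * (x_max - x_min)


specific_volumes = [1.8, 2.1, 2.4]
v_min, v_max = fit_min_max(specific_volumes)
print([round(normalize(v, v_min, v_max), 3) for v in specific_volumes])
```

Note that the bounds are taken from the training dataset only, so a sample outside that range (e.g., an extrapolation case) maps to a value outside [0, 1].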

3 Results and discussion

This section presents the image-based simulation results from a set of randomly generated digital pore structures that serve as a dataset for training a neural network function, from which we extract mathematical equations via symbolic regression. Specifically, Sect. 3.1 first provides details on generating a set of digital microstructures that resembles the snapshots of a porous material subjected to isotropic confinement and then presents a series of discrete water retention curves obtained from the pore-morphology-based simulations. Based on the collected water retention data, Sect. 3.2 focuses on the machine learning models trained via the divide-and-conquer approach and on demonstrating their predictive capability and interpretability. Subsequently, Sect. 3.3 showcases the potential of the interpretable data-driven water retention model by incorporating it into a mixed finite element model. In this work, we have generated the digital microstructures and performed the pore-scale simulations therein with the image analysis toolkit PoreSpy [28], and have implemented the proposed framework with the deep learning library PyTorch [51] and the symbolic regression library PySR [21].

3.1 Image-based drainage simulations in deformable porous media

In this study, we consider a set of digital microstructures represented by \(500 \times 500 \times 500\) voxels with a voxel size of 2 \(\upmu\)m, which mimic a set of representative volume elements of a highly porous and soft rock at different confinement levels. As in Ávila et al. [5], we first generate each microstructure from a \(500 \times 500 \times 500\) matrix of random noise with zeros and ones. To ensure that they share similar microstructural attributes, the images are blurred by the same Gaussian kernel by using the same blobiness parameter of 1.5 and are then binarized by applying thresholding until they reach their own target specific volume. By specifying the target values of specific volume v from 1.8 to 2.4 with an interval of 0.05, a total of 13 different binary images are obtained through this process. Figure 3 shows a subset of the obtained digital microstructures with the specific volume specified as \(v = 1.8\) and \(v = 2.4\), respectively. Although their pore structures appear similar, we further investigate their pore size distributions and orientations to confirm that they can be considered the same type of material before subjecting them to pore-scale simulations.
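Since \(v = 1 + e\) and the porosity is \(\phi = e / (1 + e)\), each target specific volume corresponds to a target porosity \(\phi = 1 - 1/v\), which is the quantity a thresholding-based generator would typically be driven by. The short sketch below lists the 13 targets under that assumption. The function names are hypothetical, and how the targets are passed to the image generator is not shown.

```python
def target_specific_volumes(v_start=1.8, step=0.05, count=13):
    """Target specific volumes from 1.8 to 2.4 at intervals of 0.05."""
    return [round(v_start + step * i, 2) for i in range(count)]


def porosity_from_specific_volume(v):
    """phi = e / (1 + e) with v = 1 + e, i.e. phi = 1 - 1 / v."""
    return 1.0 - 1.0 / v


for v in target_specific_volumes():
    print(v, round(porosity_from_specific_volume(v), 4))
```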

Fig. 3

Digital pore structures generated by applying Gaussian blur with different values of target specific volumes \(v = 1.8\) and \(v = 2.4\)

To quantify the topological characteristics of individual digital microstructures, we extract their pore networks based on the methods proposed by [59] and [61]. The obtained pore networks consist of pore chambers (i.e., pore volume segments) connected by pore throats (i.e., capillary tubes), which enables us to investigate the pore size distributions and their orientations. Figure 4a illustrates the pore size distributions obtained from the microstructures with \(v = 1.8\) (red), \(v = 2.1\) (purple), and \(v = 2.4\) (blue), respectively, while Fig. 4b shows the relationship between specific volumes of all 13 microstructures considered in this work and the corresponding mean pore diameters (circular symbols), where the error bars represent \(\pm 2.5\) % of their standard deviations. The results indicate that the pore sizes closely follow the lognormal distributions, similar to those of the typical geological materials [42], and show that the mean pore size tends to increase with increasing v accompanied by small increases in the standard deviation, exhibiting similar trends reported in Penumadu and Dean [52] using mercury intrusion porosimetry on kaolin clay under varying preconsolidation pressures. In addition, similar distributions of pore throat orientations from microstructures with different specific volumes as shown in Fig. 5 corroborate that the generated digital pore structures can be regarded as the same type of material.

Fig. 4

Pore size distributions of digital microstructures generated from different values of specified v (color figure online)

Fig. 5

Distributions of pore throat orientations within XY-, ZX-, and YZ-planes, where we set Z-, Y-, and X-axes as the \(0^{\circ }\) references, respectively

We then conduct a series of drainage simulations from the obtained microstructures using the image-based sphere insertion method. As exemplified in Fig. 6, the simulation injects the air phase into the water-saturated digital pore structure at the specified inlet face \(z = 0\), while we set the interfacial tension as \(\gamma _\textrm{aw} = 0.072\) N/m and the contact angle as \(\theta = 0^{\circ }\), following [6, 36]. Specifically, by prescribing zero pore water pressure (\(p_w = 0\)) at the top surface, the numerical simulation is performed by applying a stepwise increment of suction (\(s = p_a - p_w\)) from 0 kPa to 20 kPa, while we record the degree of saturation \(S_w\) at 1 kPa intervals to construct a discrete water retention curve.

Fig. 6

Snapshots of the drainage simulation of digital microstructure using the image-based sphere insertion method

Figure 7a illustrates the discrete water retention curves (circular symbols) obtained from 13 different microstructures considered in this study, fitted by the van Genuchten model [67]:

$$\begin{aligned} s = p_0 \left[ \left( \frac{S_w - S_\textrm{wr}}{1 - S_\textrm{wr}} \right) ^{-1/m} - 1 \right] ^{1-m}, \end{aligned}$$
(5)

where \(p_0\), m, and \(S_\textrm{wr}\) are the fitting parameters that are related to the air entry pressure, slope of the curve, and residual saturation, respectively. Although Eq. (5) does not accurately represent the suction-saturation relationship of our material of interest, as depicted in Fig. 7b–d, we can observe strong correlations between the specific volume v and the calibrated fitting parameters that are similar to those discovered by the previous studies [34, 50]: The air entry suction and residual saturation decrease with growing voids, while the water retention curve tends to exhibit steeper slopes with increasing v. The results also support the assumption that the generated digital microstructures can be considered the same type of material, but more importantly, they imply that we require a phenomenological model that describes the relation between the specific volume, suction, and degree of saturation to replicate the water retention behavior of deformable porous media, which can either be handcrafted (e.g., [27]) or discovered through a machine learning process (e.g., [31]).
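Because the fitting is done against recorded pairs of s and \(S_w\), it can be convenient to invert Eq. (5) for the saturation: raising both sides to the power \(1/(1-m)\) gives \(S_w = S_\textrm{wr} + (1 - S_\textrm{wr}) \left[ 1 + (s/p_0)^{1/(1-m)} \right] ^{-m}\). The sketch below implements both directions and checks that they are consistent. The parameter values are arbitrary placeholders, not the calibrated values reported in Fig. 7, and the function names are hypothetical.

```python
def vg_suction(S_w, p0, m, S_wr):
    """Eq. (5): suction as a function of the degree of saturation."""
    S_e = (S_w - S_wr) / (1.0 - S_wr)  # effective saturation
    return p0 * (S_e ** (-1.0 / m) - 1.0) ** (1.0 - m)


def vg_saturation(s, p0, m, S_wr):
    """Inverse of Eq. (5): degree of saturation as a function of suction."""
    S_e = (1.0 + (s / p0) ** (1.0 / (1.0 - m))) ** (-m)
    return S_wr + (1.0 - S_wr) * S_e


# Placeholder parameters for illustration only (p0 in kPa).
p0, m, S_wr = 5.0, 0.5, 0.1
for s in (0.0, 3.0, 10.0, 20.0):
    print(s, round(vg_saturation(s, p0, m, S_wr), 4))
```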

Fig. 7

a Water retention curves fitted by [67] and the correlations between specific volume v and the fitting parameters b \(p_0\), c m, and d \(S_\textrm{wr}\)

3.2 Data-driven discovery of interpretable water retention models

From the water retention data collected from a series of pore-scale simulations, our goal is to discover an interpretable data-driven water retention model for the material of interest based on the divide-and-conquer approach. Since the data recorded from the drainage simulations (i.e., circular symbols in Fig. 7a) can be considered a point cloud in a three-dimensional Euclidean space spanned by v, s, and \(S_w\), as described in Sect. 2.2, we first train a feed-forward neural network that predicts the saturation from the given specific volume and suction for 5000 epochs. The learning curve in Fig. 8a shows that the mean square error loss tends to decrease as the number of epochs increases until the model performance converges after \(\sim\)3000 epochs, while the trained neural network yields a smooth representation of a surface (Fig. 8b) that fits the data well. Even though the trained model is a black box, the results indicate that the neural network represents the water retention behavior better than Eq. (5), implying that neural networks are highly expressive models such that they can approximate almost any function of interest, which may not be an easy task for a human [23, 46, 54]. It should be noted that one may introduce additional constraints (e.g., on convexity [26] or monotonicity [4]) for training neural networks to make them physically sound. However, since this study focuses on discovering a phenomenological model, this extension will be considered in the future.

Fig. 8

a Learning curve of the neural network function that predicts the degree of saturation and b a surface represented by the learned function \(S_w = f^{\text {NN}}(v, s)\)

To enhance the interpretability of the learned function \(f^{\text {NN}}\), we now conduct symbolic regression in an offline setting to discover the mathematical expression \(f^{\text {NN+SR}}\) that replicates the neural network. As pointed out in Cogswell et al. [19], one possible way to prevent overfitting in machine learning models is to train them with more data. Since one of the upshots of the divide-and-conquer approach is that we can resample as many data points as desired from the trained neural network, we reconstruct the training dataset for the symbolic regression at this point. Following [60, 68, 69], the resampling is performed on a uniform grid of v and s, while the corresponding values for \(S_w\) are computed from the neural network function. Specifically, we sample 50 points along the v-axis (from 1.8 to 2.4) and 50 points along the s-axis (from 0 kPa to 20 kPa), such that the reconstructed dataset has \(N_{\text {data}} = 2500\). Theoretically, from the dataset generated from a known function, symbolic regression can discover the expression that is nearly identical to the true expression if the specified operators are properly chosen (see Appendix A). However, considering the case where we do not have a priori knowledge, we specify the set of unary operators as \(\lbrace \text {abs}( \bullet ), \text {exp}( \bullet ), \text {sqrt}( \bullet ), \text {inv}( \bullet ) \rbrace\) and the set of binary operators as \(\lbrace +, \times , {^{\hat{\,}}}, / \rbrace\) that can be accommodated in the internal nodes of a binary tree. Here, \(\text {abs}( \bullet ) = | \bullet |\), \(\text {sqrt}( \bullet ) = \sqrt{\bullet }\), and \(\text {inv}( \bullet ) = 1 / \bullet\), while the symbol ’\({^{\hat{\,}}}\)’ denotes the power operation (e.g., \(a {^{\hat{\,}}}b = a^b\)).
With the number of iterations of the algorithm set to 1000, the number of populations to 15, the number of individual binary trees in each population to 33, and the maximum number of nodes in individual trees (i.e., the maximum degree of complexity) to 40, the training takes 3129 s of CPU time on a laptop equipped with an Apple M2 Max processor (12 cores, 12 threads @ 3.5 GHz) with 64 GB DDR5 memory.
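The resampling step can be sketched as follows. This is an illustration under stated assumptions: `f_nn` stands for any callable that returns the predicted saturation (a trivial placeholder is used below instead of the trained network), and the `SR_SETTINGS` dictionary merely collects the hyperparameters quoted above under hypothetical key names, which are not the keyword arguments of the symbolic regression library.

```python
def linspace(start, stop, num):
    """Evenly spaced values from start to stop, both endpoints included."""
    step = (stop - start) / (num - 1)
    return [start + step * i for i in range(num)]


def resample(f_nn, v_range=(1.8, 2.4), s_range=(0.0, 20.0), num=50):
    """Evaluate a surrogate on a uniform num x num grid of (v, s) [-, kPa]."""
    return [
        (v, s, f_nn(v, s))
        for v in linspace(v_range[0], v_range[1], num)
        for s in linspace(s_range[0], s_range[1], num)
    ]


# Hyperparameters quoted in the text; the key names are illustrative only.
SR_SETTINGS = {
    "iterations": 1000,
    "populations": 15,
    "population_size": 33,
    "max_nodes": 40,
    "unary_operators": ["abs", "exp", "sqrt", "inv"],
    "binary_operators": ["+", "*", "^", "/"],
}

# Placeholder surrogate returning a constant saturation, not the trained network.
dataset = resample(lambda v, s: 0.5)
print(len(dataset))
```

Because the grid is evaluated on the surrogate rather than on new pore-scale simulations, its resolution can be refined at negligible cost compared with generating additional microstructures.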

Fig. 9

Discovered mathematical expressions with different levels of complexity that best fit the neural network function and their mean square error losses

Figure 9 shows the best-fit mathematical expressions discovered by the symbolic regression algorithm depending on the degrees of complexity, where the superposed bar indicates the normalized values that range from 0 to 1 (see Remark 1), while Fig. 10 illustrates the surface representations of the discovered expressions with Complexity = 1, 3, 16, and 38, respectively. The results reveal that the discovered mathematical expression tends to better represent the learned function as the number of nodes in the tree increases, indicating that there is a trade-off between accuracy and simplicity. For instance, the expression from a single-node tree, the constant function \(\bar{S}_w = 0.57\), is the simplest expression that the algorithm can discover; it results in an inaccurate representation of the learned function even though it can be considered the most accurate for the case where Complexity = 1. On the other hand, the best-fit solution discovered from a tree with Complexity = 38 results in a very accurate prediction with a relatively small mean square error, but it may be a very complex expression that is not very intuitive to interpret. Nevertheless, we underscore that the analytical expressions are always interpretable regardless of their complexity, which not only makes the post hoc analyses easier but also enhances the portability of the learned function, compared to the black-box neural networks. This implies that we may achieve the highest level of accuracy when we choose an expression that exhibits the lowest mean square error among all the potential candidates. Hence, based on the hyperparameters we specified, this study considers the mathematical expression with Complexity = 38 shown in Fig. 9 as the resulting symbolically regressed counterpart (i.e., \(f^{\text {NN+SR}}\)) of the neural network function \(f^{\text {NN}}\).

Fig. 10

Geometrical representations of the discovered mathematical expressions with Complexity = 1, 3, 16, and 38, respectively

As a validation example, we conduct image-based drainage simulations within two digital microstructures that are not used during the training processes and compare the results with the predictions made from the trained neural network \(f^{\text {NN}}\) and its best-fit mathematical expression \(f^{\text {NN+SR}}\). Furthermore, as a reference, we have also made an additional set of predictions from a mathematical expression with Complexity = 38, which is directly discovered from the symbolic regression (\(f^{\text {SR}}\)) without neural network pretraining or data augmentation (Fig. 11). Here, all the parameters used for generating the microstructures remain the same as those summarized in Sect. 3.1 except the threshold values, which are specified to reach specific values of v: One microstructure exhibits \(v = 1.825\) which falls into the training data range, whereas the other has \(v = 2.5\) which is outside the range of the specific volume of the training dataset. For the case where we set \(v = 1.825\), as illustrated in Fig. 11a, both the neural network-based model (orange curve) and its best-fit mathematical expression obtained from the divide-and-conquer approach (green curve) are capable of reproducing water retention curves that are very close to the image-based simulation results, whereas the model discovered directly from the symbolic regression (purple curve) fails to capture the water retention characteristics for the range of matric suction below the air entry value. More importantly, although the predictions are not as accurate as those made in the interpolation regime, the water retention curves for an unseen digital microstructure with \(v = 2.5\) can still be constructed with reasonable accuracy from the trained neural network (\(f^{\text {NN}}\)) and its best-fit symbolic expression (\(f^{\text {NN+SR}}\)), respectively (Fig. 11b).
The results not only confirm the validity of the trained models but also emphasize the predictive capability of the mathematical model discovered by the divide-and-conquer approach, which cannot be achieved directly from a symbolic regression. This demonstrates that the proposed framework enables us to achieve high levels of accuracy and interpretability at the same time.

Fig. 11

Predicted water retention curves for unseen digital microstructures: a \(v = 1.825\) and b \(v = 2.5\) (color figure online)

3.3 Implication for continuum-scale modeling

In this section, we showcase the applicability of the trained model by incorporating it into a continuum-scale simulation that may resolve the high computational cost issues of multi-scale models pointed out in Sect. 1. Specifically, we employ a mixed finite element method as a continuum-scale model, while replacing the micro-scale model at each Gauss point with a mathematical expression discovered from the divide-and-conquer approach (\(f^{\text {NN+SR}}\)) that replicates the pore-scale water retention behavior of the material of interest. The details of the continuum-scale model can be found in Appendix B.

Fig. 12 Schematic of geometry and boundary conditions for the gravity-driven seepage problem

Table 1 Material parameters for gravity-driven seepage problem

The numerical example considered herein resembles the experiment conducted by Liakopoulos [41]. As illustrated in Fig. 12, the problem domain is a 1-m-tall column with a microstructure similar to those generated in Sect. 3.1. The domain is spatially discretized into a structured mesh with an element size of \(h_e = 0.02\) m, the time step size is set to \(\varDelta t = 10\) s, and the material properties are summarized in Table 1. The column is assumed to be initially equilibrated with hydrostatic pore water pressure so that it is fully saturated (i.e., \(S_w = 1\)) at \(t = 0^{-}\). The numerical experiment begins at \(t = 0^{+}\), when water is allowed to drain through the bottom surface by prescribing the pore water pressure boundary condition \(\hat{p}_w = 0\) there, while zero-flux boundary conditions are applied at all other boundaries.
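The setup described above can be sketched in a few lines. This is only an illustration of the discretization and of the initial and boundary conditions, not the mixed finite element implementation of Appendix B. The water density, the gravitational acceleration, and the choice of zero pressure at the top surface as the hydrostatic datum are assumptions that do not come from the text; the variable and function names are likewise hypothetical.

```python
# Geometry and discretization taken from the text.
H = 1.0      # column height [m]
h_e = 0.02   # element size [m]
dt = 10.0    # time step size [s]

# Assumed values (not from the text): water density and gravity.
RHO_W = 1000.0  # [kg/m^3]
G = 9.81        # [m/s^2]

# Structured 1D mesh along the column height (y = 0 at the bottom).
n_el = round(H / h_e)
y = [H * i / n_el for i in range(n_el + 1)]

# Initial state at t = 0^-: hydrostatic pore water pressure, fully saturated.
# Assumption: the datum (p_w = 0) is at the top surface, so p_w >= 0 everywhere.
p_w0 = [RHO_W * G * (H - y_i) for y_i in y]
S_w0 = [1.0] * len(y)

def apply_boundary_conditions(p_w):
    """Return a copy of the nodal pressures with p_w = 0 at the bottom node.

    The zero-flux conditions on the other boundaries are natural boundary
    conditions in a finite element setting and need no explicit treatment.
    """
    p = list(p_w)
    p[0] = 0.0
    return p

# State at t = 0^+: drainage is enabled at the bottom surface.
p_w_start = apply_boundary_conditions(p_w0)
```

With \(h_e = 0.02\) m, the 1-m column is divided into 50 elements along its height.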

Fig. 13 Transient responses of the porous column: a pore water pressure, b degree of saturation, and c specific volume

Figure 13 shows the variations of the pore water pressure \(p_w\), the degree of saturation \(S_w\), and the specific volume \(v\) along the Y-axis. Similar to the results reported in [41], gravity gradually builds up negative pore pressure as the pore water migrates toward the bottom end over time (Fig. 13a). This is accompanied by an increase in the compressive effective stress throughout the column, i.e., consolidation (Fig. 13c). Meanwhile, as illustrated in Fig. 13b, the air phase starts to invade from the top surface once the suction \(s = p_a - p_w\) exceeds the air entry value at the current specific volume, following a path on the surface represented by the specified \(f^{\text {NN+SR}}\) (Fig. 14). In addition, the saturation path recorded at the top surface (Fig. 14) suggests that a density-independent phenomenological model may not reflect the water retention behavior of a deformable porous material. It also underscores that an interpretable data-driven model can capture the multi-scale nature of unsaturated flow at a computational cost similar to that of a single-scale finite element model.
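The difference between a density-dependent and a density-independent saturation path can be illustrated with a small sketch. Everything in it is hypothetical: the Brooks-Corey-type curve with a \(v\)-dependent air entry suction is a stand-in for \(f^{\text {NN+SR}}\), the parameter values are arbitrary, and the pressure and specific volume histories are made-up numbers rather than simulation output.

```python
def air_entry_suction(v, s_e_ref=5.0, v_ref=2.0, k=3.0):
    """Hypothetical air entry suction that increases as the material densifies."""
    return s_e_ref * (v_ref / v) ** k

def saturation(s, v, lam=0.5):
    """Hypothetical Brooks-Corey-type curve with a v-dependent air entry value."""
    s_e = air_entry_suction(v)
    if s <= s_e:
        return 1.0  # suction below the air entry value: no air invasion yet
    return (s_e / s) ** lam

def saturation_path(p_w_history, v_history, p_a=0.0):
    """Record (s, v, S_w) at one material point over time, with s = p_a - p_w."""
    path = []
    for p_w, v in zip(p_w_history, v_history):
        s = p_a - p_w
        path.append((s, v, saturation(s, v)))
    return path

# Made-up histories for illustration only: suction grows while v decreases.
p_w_hist = [0.0, -2.0, -4.0, -6.0, -8.0]
v_hist = [2.00, 1.99, 1.98, 1.97, 1.96]

# Density-dependent path versus a path with v frozen at its initial value.
path = saturation_path(p_w_hist, v_hist)
frozen = saturation_path(p_w_hist, [v_hist[0]] * len(v_hist))
```

Under these assumptions, the path that tracks the evolving specific volume retains more water at the same suction than the frozen-\(v\) path, which is the kind of discrepancy a density-independent model cannot represent.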

Fig. 14 Saturation path of the top surface recorded during the finite element simulation

4 Conclusion

In this work, we have proposed a framework for data-driven discovery of an interpretable model for the water retention behavior of deformable porous media. A divide-and-conquer approach has been developed to discover a data-driven water retention model that overcomes the accuracy-interpretability dilemma by leveraging the expressive power of a neural network and the interpretability of a symbolic expression. The proposed approach has been validated against data generated from a series of image-based pore-scale simulations, which resemble the water retention data collected from a deformable porous material. Unlike a typical neural network that yields a non-interpretable black-box function, the genetic programming algorithm can provide an interpretable symbolic expression that satisfactorily replicates the learned function. We have shown that both the trained neural network and its symbolically regressed counterpart are capable of reproducing, with reasonable accuracy, the water retention curves of digital microstructures that were not used in training. We have also demonstrated that, thanks to its inherent portability, the discovered water retention model can easily be incorporated into continuum-scale simulations, revealing the potential of capturing the multi-scale nature of unsaturated flow in a deformable porous material without significant computational cost. Future work includes extending the proposed framework to capture hysteresis in water retention and improving its trustworthiness through the use of a physics-constrained layer or loss function.