Introduction

In the development of subsurface oil and gas resources, many parameters and attributes are crucial, including porosity, permeability, relative permeability (RP), capillary pressure, and wettability (Feng et al. 2021). Among them, the RP curve (Honarpour et al. 2018), as an indicator connecting and characterizing the various fluid phases, is essential for the efficient development of these resources. The RP curve describes the relationship between the relative permeabilities of different phases (such as oil and water) and fluid saturation within a porous medium. It not only determines the flow behavior of the phases within the pore and fracture spaces of the reservoir but also significantly influences the accuracy of reservoir simulation models (Wang et al. 2023). More specifically, the RP curve is a crucial input in reservoir modeling for history matching, production scheme design and optimization, and enhanced recovery. Consequently, obtaining RP curves rapidly, precisely, and with relatively convenient methods is absolutely essential.

To date, a great number of approaches have been applied to acquire RP curves, and they can be roughly divided into two categories: direct and indirect methods. The direct approach entails laboratory experiments on rock cores, employing steady-state or unsteady-state measurement methods (Krevor et al. 2012; Alizadeh and Piri 2014; Chen et al. 2014; Kianinejad et al. 2014, 2015a, 2015b; Feng et al. 2018). In this way, the RP curve is obtained directly from measurement rather than through complex derivation and calculation. Nevertheless, limitations such as the limited quantity of core samples, costly experiments, core contamination, and measurement errors often render direct experimental methods impractical (Honarpour et al. 2018). Consequently, several indirect prediction models have been widely adopted. Some researchers have proposed methods utilizing dynamic data to compute RP curves, such as the water drive curve method (Dianfa et al. 2013) and the water saturation curve method (Wang et al. 2005). These methods provide insights into the overall dynamic changes in reservoirs but rely on extensive production data collected over extended periods and do not account for reservoir heterogeneity. Mercury injection experiments involve injecting mercury into the microscopic pores of a porous medium under specific pressure conditions, establishing a relationship between pressure and mercury volume. Both these experiments and RP curves are influenced by the complex micro-pore structure of the porous medium. Because mercury injection experimental data are easy to acquire and sample sizes are relatively large, numerous scholars have proposed RP models based on capillary pressure experiments (Purcell 1949; Burdine 1953; Corey 1954; Brooks and Corey 1966). Purcell (1949) introduced a permeability model based on the capillary pressure curve; assuming that water flows through small capillary tubes and gas through larger ones, a straightforward RP model can be derived. Following Purcell's work, RP models based on capillary pressure were developed by Burdine (1953), Corey (1954), and Brooks and Corey (1966). These models consider pore size distribution and tortuosity, although they do not account for the irreducible water film. Since Helba et al. (1992) first integrated percolation theory into RP calculations, this method has been employed and refined by researchers including Salomao (1997), Dixit et al. (1998), Phirani et al. (2009), and Kadet and Galechyan (2014). Unfortunately, precisely determining the coordination numbers and pore fractions of the networks used in these models remains challenging. Zhou et al. (2023) applied the ensemble Kalman method to predict RP curves from saturation data. Lanetc et al. (2024) proposed a novel RP curve estimation approach based on hybrid pore network and volume of fluid methods. In short, the studies discussed above have achieved significant success in acquiring RP curves through a variety of approaches, yet each method has imperfections to some extent. Therefore, establishing an efficient framework for obtaining RP curves is necessary.

With the rapid advancement of artificial intelligence technologies, optimization and machine learning methods have emerged as innovative solutions for calculating RP curves in underground reservoirs (Wang et al. 2022). Researchers have explored various approaches harnessing the power of artificial intelligence to enhance the accuracy and efficiency of RP prediction. In detail, Esmaeili (2019) pioneered a model focusing on sand-type systems, predicting two-phase oil/water RP using the least square support vector machine (LSSVM) in a supervised learning framework. Adibifard (2020) employed a genetic algorithm (GA) and an iterative ensemble Kalman filter (EnKF) to estimate optimal RP curves, enhancing Corey's three-phase RP correlation. Kalam (2020) utilized the artificial neural network (ANN) algorithm to establish empirical correlations for predicting RP profiles in oil–water two-phase flow within both sandstone and carbonate reservoirs. Zhao (2020) developed a machine-learning model based on the random forest algorithm (Breiman 2001), incorporating specific special core analysis laboratory (SCAL) data and proposing the Euler characteristic, coupled with in-situ fluid saturations, as a potential first-order predictor of RP. Liu et al. (2019) predicted capillary pressure and RP curves for an arbitrary bundle of capillary tubes with a physics-informed data-driven approach based on artificial neural networks (ANN), using area, shape factors, and testing pressure as input parameters. Seyyedattar (2022) applied advanced machine learning methods, including the adaptive network-based fuzzy inference system, a hybrid of least square support vector machine (LSSVM) with coupled simulated annealing (CSA), and extra trees (ET), to predict various relative permeabilities in complex fluid systems. Mathew (2021) developed three machine learning models for fast estimation of RP curves from steady-state core-flooding experiments. Muoghalu (2022) adopted the K-means clustering algorithm for rock typing based on RP curves. More recently, Xie et al. (2023) utilized three-dimensional digital rock images and deep learning models to predict relative permeability curves. These studies established datasets and applied artificial intelligence algorithms to existing experimental or numerical results, achieving significant success in reliably and efficiently predicting RP, particularly from three-dimensional digital core images (Rabbani et al. 2020; Kamrava et al. 2020; Rabbani and Babaei 2019; Tian et al. 2021; Tembely et al. 2020; Wang et al. 2021; Najafi et al. 2021; Sayyafzadeh and Guérillot 2022; Siavashi et al. 2022). Although the aforementioned studies have successfully applied machine learning to obtain RP curves, to the best of our knowledge, direct applications of deep learning methods for predicting RP curves from the capillary pressure curves of rock samples remain scarce.

Therefore, we propose an artificial intelligence approach, called RPCDL, based on GAF (Alsalemi et al. 2023) and deep learning frameworks to predict RP curves from rock core MICP (Pittman et al., 1992) data. To address the challenge of limited sample data, a suitable self-supervised learning paradigm that encompasses various reservoir types is introduced. The advantage of the proposed RPCDL method over traditional models is this self-supervised learning framework, which plays an important role in increasing prediction accuracy. The high accuracy of the RP curve predictions on the test samples validates the excellent generalization capability of the model constructed in this study. The remaining sections of this paper are organized as follows: In Sect. "Methodology", we introduce the GAF method for transforming MICP data into images, the establishment of mercury injection capillary pressure experimental data samples, and the AI workflow for predicting RP curves. Sect. "Results and discussion" details the training process of the ConvLSTM model, including its application on the training and test datasets; we further compare the predictions of rock core RP curves obtained with our deep learning model against those from traditional models. Finally, we summarize our findings in Sect. "Conclusions". By exploring the synergy of advanced imaging techniques, artificial intelligence, and deep learning methodologies, this research not only contributes to the field of reservoir engineering but also offers a novel and effective solution for predicting RP curves. The integration of the GAF and ConvLSTM models represents a significant step forward in harnessing the potential of artificial intelligence in the domain of petroleum reservoir characterization.

Methodology

This section presents the three main components of the methodology used to estimate RP curves. First, the specific steps for transforming MICP data into images using GAF are described. Second, the MICP–RP data samples are established. Third, the proposed RPCDL prediction framework based on deep learning methods is developed and introduced.

Transformation of MICP data into images using GAF

GAF is a novel method for representing time series data (Wang and Oates 2015). By transforming one-dimensional signals into polar coordinates and performing inner product operations, GAF ensures angular distinction between different modes while preserving the differences between multiple modes and maximizing the differentiation among features. After the inner product operation, the resulting Gram matrix (GM) encodes the one-dimensional time series. GAF images contain the numerical information of the one-dimensional time series while retaining the temporal relationships of non-stationary time-varying signals, the distinctions between multiple modes, and the correlations between latent states, all while reducing repetitive information (Damaševičius et al. 2018).

One distinction between MICP curves and regular time series data is that the abscissas of the mercury injection curve, mercury withdrawal curve, and pore throat distribution curve differ between samples. Moreover, the abscissa spacing between adjacent data points on the same curve may not be uniform. Thus, it is necessary to first interpolate and extrapolate all mercury injection, mercury withdrawal, and pore throat distribution curves using a linear interpolation algorithm, standardizing the curves to a uniform length. In this study, the raw experimental data points for each sample are linearly interpolated to 128 evenly spaced points across the defined range. The abscissa of the mercury injection and withdrawal curves is the mercury saturation percentage, which ranges from 0 to 100. The abscissa of the pore throat distribution curve is the pore throat size; considering the pore throat sizes of the rock samples in our target area, it spans 0.001 to 1000 µm, and the curve is interpolated to 128 evenly spaced points on a logarithmic scale (base 10), i.e., from −3 to 3. This process ensures that all curves are normalized to the same scale and length, facilitating direct comparison and analysis. Figure 1 displays the original and interpolated curves for mercury injection, withdrawal, and pore throat distribution for a representative rock sample in our target area.
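
The resampling step can be sketched as follows; this is a minimal NumPy illustration, and the function names, grids, and extrapolation behavior are assumptions rather than the authors' exact implementation.

```python
import numpy as np

N_POINTS = 128  # uniform length assumed for all curves

def resample_injection_curve(mercury_saturation_pct, capillary_pressure):
    """Linearly interpolate an injection/withdrawal curve onto a uniform
    mercury-saturation axis from 0 to 100 %."""
    grid = np.linspace(0.0, 100.0, N_POINTS)
    # np.interp holds the end values outside the measured range, a simple
    # stand-in for the extrapolation mentioned in the text.
    return grid, np.interp(grid, mercury_saturation_pct, capillary_pressure)

def resample_pore_throat_curve(radius_um, frequency):
    """Interpolate a pore-throat distribution onto a uniform log10 axis
    covering 0.001-1000 um (log10 range -3 to 3)."""
    log_grid = np.linspace(-3.0, 3.0, N_POINTS)
    return log_grid, np.interp(log_grid, np.log10(radius_um), frequency)
```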

Fig. 1

Schematic representation of original and interpolated curves for mercury injection, withdrawal, and pore throat distribution data

Using the same interpolation method for processing all rock sample experimental data ensures the comparability of the original sequences' trends and the transformed GAF representations for different samples. To prevent biasing the inner product toward maximum observed values, the interpolated mercury injection, withdrawal, and pore throat distribution sequences \({ {\text{X}}} = \left\{ {{ {\text{x}}}_{1} ,{ {\text{x}}}_{2} , \ldots ,{ {\text{x}}}_{128} } \right\}\) are scaled to the interval [–1,1]. The transformation formula employed is as follows:

$$ \tilde{x}_{i} = \frac{{\left( {x_{i} - \max \left( X \right)} \right) + \left( {x_{i} - \min \left( X \right)} \right)}}{{\max \left( X \right) - \min \left( X \right)}} $$
(1)

The scaled values are further converted into angles \(\left( \theta \right)\), and the abscissa is encoded as radius (r). This transformation maps the scaled sequences, as per Eq. (2), onto polar coordinates, representing the sequence within the unit circle.

$$ \left\{ {\begin{array}{*{20}c} {\theta_{i} = \arccos \left( {\tilde{x}_{i} } \right),\tilde{x}_{i} \in \tilde{X} } \\ {r_{i} = \frac{i}{128}} \\ \end{array} } \right. $$
(2)

where \(\theta_{i}\) represents the encoded angle, \(r_{i}\) is the radius, and \(\tilde{x}_{i}\) corresponds to the scaled sequence value after transformation.

This method of transforming time series data into polar coordinates provides a novel approach for reinterpreting one-dimensional signals. This encoding method possesses two crucial properties: (1) The cosine function is monotonic in the range [0, π]. Therefore, for a given sequence signal, encoding it results in a unique mapping within the polar coordinate system, ensuring a one-to-one correspondence and a unique inverse mapping. (2) Compared to the Cartesian coordinate system, the polar coordinate system better preserves the signal's correlation on the abscissa. The polar coordinate curves obtained after transformation using Eqs. (1) and (2) for the sample curves illustrated in Fig. 1 are depicted in Fig. 2.

Fig. 2

Representation of interpolated mercury injection, withdrawal, and pore throat distribution data in polar coordinates

The Gramian Angular Summation Field (GASF), defined through inner product operations, is the form of GAF adopted here. In this study, GASF is employed to transform one-dimensional sequences into two-dimensional representations. GASF involves converting a one-dimensional sequence into polar coordinates and extracting the correlations between points at different intervals by computing the cosine of the sum of their angles. The definition of GASF is as follows:

$$ {\text{GASF}} = \left[ {\begin{array}{*{20}c} {\cos \left( {\varphi_{1} + \varphi_{1} } \right)} & {\cos \left( {\varphi_{1} + \varphi_{2} } \right)} & \cdots & {\cos \left( {\varphi_{1} + \varphi_{n} } \right)} \\ {\cos \left( {\varphi_{2} + \varphi_{1} } \right)} & {\cos \left( {\varphi_{2} + \varphi_{2} } \right)} & \cdots & {\cos \left( {\varphi_{2} + \varphi_{n} } \right)} \\ \vdots & \vdots & \ddots & \vdots \\ {\cos \left( {\varphi_{n} + \varphi_{1} } \right)} & {\cos \left( {\varphi_{n} + \varphi_{2} } \right)} & \cdots & {\cos \left( {\varphi_{n} + \varphi_{n} } \right)} \\ \end{array} } \right] $$
(3)

Figure 3 shows the two-dimensional field images obtained by applying the GASF method to the imbibition, drainage, and pore throat distribution curves in polar coordinates presented in Fig. 2. Figure 4 compares various mercury injection curves from different rock samples with their corresponding GASF-transformed images. It is evident that flatter original mercury injection curves correspond to redder areas in the GASF images. The red region in the upper left corner of a GASF image corresponds to the number of large pores in the sample, while the red region in the lower right corner corresponds to the proportion of small pore throats that cannot be injected with fluid. Slowly rising mercury injection curves manifest as gradient colors in the GASF images, with the size of the gradient region depending on the steepness of the injection curve. This comparative analysis demonstrates that GASF images reflect the physical meaning represented by the curves themselves, validating that GASF encoding provides a complete mapping of the original signal.
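
For concreteness, the scaling, polar encoding, and GASF construction of Eqs. (1)–(3) can be sketched as below; the function name is illustrative, and the clipping guard is an added numerical safeguard rather than part of the original formulation.

```python
import numpy as np

def gasf(series):
    """Compute the Gramian Angular Summation Field of a 1-D sequence."""
    x = np.asarray(series, dtype=float)
    # Eq. (1): rescale to [-1, 1]
    x_scaled = ((x - x.max()) + (x - x.min())) / (x.max() - x.min())
    x_scaled = np.clip(x_scaled, -1.0, 1.0)   # guard against round-off before arccos
    # Eq. (2): encode each value as an angle (the radius i/128 only orders the points)
    phi = np.arccos(x_scaled)
    # Eq. (3): pairwise cosine of angle sums, giving an n x n matrix
    return np.cos(phi[:, None] + phi[None, :])

# e.g. a (128, 128) image for one interpolated mercury injection curve:
# gasf_injection = gasf(injection_curve_128_points)
```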

Fig. 3

Two-dimensional images obtained after GASF transformation of mercury injection, withdrawal, and pore throat data

Fig. 4

Comparison of mercury injection curves and corresponding GASF transformed images for different rock samples

After obtaining the GASF images for mercury injection curves, withdrawal curves, and pore throat distribution curves, the corresponding matrix data are utilized as the R (Red), G (Green), and B (Blue) channels of an RGB image. This approach results in a three-channel image representation of the MICP experimental samples, as illustrated in Fig. 5.
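
A minimal sketch of this channel assembly is given below; rescaling the GASF values from [−1, 1] to [0, 1] is an assumption made only for visualization, since the raw matrices could equally be fed to the network.

```python
import numpy as np

def micp_to_rgb(gasf_injection, gasf_withdrawal, gasf_pore_throat):
    """Stack three (128, 128) GASF matrices into a (128, 128, 3) image,
    using the injection, withdrawal, and pore-throat fields as R, G, B."""
    rgb = np.stack([gasf_injection, gasf_withdrawal, gasf_pore_throat], axis=-1)
    return (rgb + 1.0) / 2.0   # map [-1, 1] to [0, 1] for display
```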

Fig. 5

Three-channel RGB images of rock samples from MICP experiments

Establishment of MICP—RP data samples

Fitting RP curves

Because of frequent sample contamination and testing errors, oil fields often rely on representative sample curves or fit experimental curves with permeability models before applying them. In this study, the approach of predicting RP curves from capillary pressure curves does not predict the experimental points directly; instead, it predicts the parameters of a permeability regression model fitted to the experimental data. The Brooks–Corey model (1966) describes the constitutive relationship between RP and capillary pressure curves and is currently the most widely used RP model. The Brooks–Corey equations for the oil and water RP models are expressed as follows:

$$ {\text{k}}_{{{\text{ro}},{\text{i}}}} = \left( {\frac{{{\text{S}}_{{{\text{o}},{\text{i}}}} - {\text{S}}_{{{\text{or}},{\text{i}}}} }}{{1 - {\text{S}}_{{{\text{or}},{\text{i}}}} - {\text{S}}_{{{\text{wc}},{\text{i}}}} }}} \right)^{{{\text{n}}_{{{\text{o}},{\text{i}}}} }} $$
(4)
$$ {\text{k}}_{{{\text{rw}},{\text{i}}}} = {\text{k}}_{{{\text{rw}},{\text{i}}}}^{{\max }} \left( {\frac{{{\text{S}}_{{{\text{w}},{\text{i}}}} - {\text{S}}_{{{\text{wc}},{\text{i}}}} }}{{1 - {\text{S}}_{{{\text{or}},{\text{i}}}} - {\text{S}}_{{{\text{wc}},{\text{i}}}} }}} \right)^{{{\text{n}}_{{{\text{w}},{\text{i}}}} }} $$
(5)

where \({\text{k}}_{{{\text{ro}},{\text{i}}}}\) represents the oil phase RP for the i-th sample; \({\text{k}}_{{{\text{rw}},{\text{i}}}}\) represents the water phase RP for the i-th sample; \({\text{S}}_{{{\text{w}},{\text{i}}}}\) represents the water saturation for the i-th sample; \({\text{S}}_{{{\text{o}},{\text{i}}}}\) represents the oil saturation for the i-th sample; \({\text{S}}_{{{\text{or}},{\text{i}}}}\) represents the residual oil saturation for the i-th sample; \({\text{S}}_{{{\text{wc}},{\text{i}}}}\) represents the irreducible water saturation for the i-th sample; \({\text{n}}_{{{\text{o}},{\text{i}}}}\) represents the oil phase exponent for the i-th sample; \({\text{n}}_{{{\text{w}},{\text{i}}}}\) represents the water phase exponent for the i-th sample; \({\text{k}}_{{{\text{rw}},{\text{i}}}}^{{{\text{max}}}}\) represents the maximum water phase RP for the i-th sample.

In this study, the Brooks–Corey model is fitted to the experimental RP curves using the least squares method; the fitting parameters \({\vec{\text{p}}}_{{\text{i}}} = \left[ {{\text{k}}_{{{\text{rw}},{\text{i}}}}^{{{\text{max}}}} ,{\text{n}}_{{{\text{o}},{\text{i}}}} ,{\text{n}}_{{{\text{w}},{\text{i}}}} ,{\text{S}}_{{{\text{or}},{\text{i}}}} ,{\text{S}}_{{{\text{wc}},{\text{i}}}} } \right]\) are optimized in this way. Figure 6 displays the fitted curves (solid lines) for selected samples along with the original experimental data points (markers). The Brooks–Corey model provides good fits to the experimental RP data for the various reservoir rock samples and effectively captures the characteristics of the RP curves observed in different samples. However, for some samples with convex-shaped data points the fitting errors are relatively larger; these samples might not be representative of the reservoir's behavior, leading to larger discrepancies in the fitting results.
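
A least-squares fit of the Brooks–Corey parameters can be sketched as follows; the initial guess and parameter bounds are illustrative assumptions, not values reported in this study.

```python
import numpy as np
from scipy.optimize import least_squares

def brooks_corey(p, sw):
    """Evaluate Eqs. (4)-(5) for parameters p = [krw_max, n_o, n_w, S_or, S_wc]."""
    krw_max, n_o, n_w, s_or, s_wc = p
    s_norm = np.clip((sw - s_wc) / (1.0 - s_or - s_wc), 0.0, 1.0)
    kro = (1.0 - s_norm) ** n_o        # oil-phase RP, Eq. (4)
    krw = krw_max * s_norm ** n_w      # water-phase RP, Eq. (5)
    return kro, krw

def fit_sample(sw, kro_meas, krw_meas):
    """Fit one sample's measured RP points and return the parameter vector p_i."""
    def residuals(p):
        kro, krw = brooks_corey(p, sw)
        return np.concatenate([kro - kro_meas, krw - krw_meas])

    p0 = [0.5, 2.0, 2.0, 0.2, 0.2]                 # assumed starting point
    bounds = ([0.01, 0.5, 0.5, 0.0, 0.0],
              [1.0, 8.0, 8.0, 0.6, 0.6])           # assumed physical bounds
    return least_squares(residuals, p0, bounds=bounds).x
```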

Fig. 6

Comparison of experimental points and fitted curves for selected samples using the Corey model

Rock core data matching and selection

Obtaining both relative permeability and Mercury Injection Capillary Pressure (MICP) data for cores involves conducting laboratory experiments on core samples. In MICP testing, mercury is injected into the pore space of the rock at various pressures, and the amount of mercury intrusion is measured. Meanwhile, the RP curves are acquired by conducting fluid flow experiments on saturated core samples. This study obtained a total of 658 sets of rock core MICP experimental data and 90 sets of rock core RP experimental data from X Oilfield. Based on the well names and depths of the two sets of experimental rock cores (with an error of less than 3 m in core retrieval depth), 59 sets of matched samples for both MICP experiments and RP experiments were acquired. Details of the samples are summarized in Table 1.

Table 1 Matched rock core samples for mercury injection and RP experiments

Figure 7 compares the porosity and permeability of the 59 samples matched by well number and depth. Among these samples, 47 sets exhibited relative errors of less than 20% in both porosity and permeability, while 12 sets had errors ranging from 20 to 58%. Subsequently, the deep learning model was trained and tested using the 47 samples with errors below 20% (37 training samples and 10 testing samples). In the Results and Discussion section, the predictive performance on the testing samples and on the 12 samples with errors exceeding 20% will be analyzed separately.

Fig. 7

Comparison of porosity and permeability parameters for mercury injection and RP experiments

Construction of evolutionary sequence samples

Implementing the prediction of Brooks–Corey model parameters \(\overrightarrow{p}=[{k}_{rw}^{max},{n}_{o},{n}_{w},{S}_{or},{S}_{wc}]\) for RP curves from MICP experimental GASF images using deep neural networks faces challenges, especially in scenarios with limited data, where conventional deep learning models like CNNs struggle to achieve sufficient training due to the scarcity of data. To address this issue, sample data augmentation techniques are employed, which involve expanding the existing dataset to generate more equivalent data. Data augmentation proves effective in compensating for the inadequacy of existing training data, preventing model overfitting, and enhancing the model's generalization capabilities (Ren et al. 2018).

Sample data augmentation encompasses similar class data augmentation and mixed-class data augmentation. Similar class data augmentation involves applying transformations such as rotation and scaling to individual samples, creating new data while keeping the corresponding labels unchanged. Mixed-class sample augmentation involves randomly combining samples from different categories to create new data samples, thereby expanding the dataset. Classic algorithms for mixed-class augmentation include Mixup and random image cropping and patching (RICAP) (Takahashi et al. 2018). In this study, we develop a methodology to construct training samples for neural networks using evolutionary sequence analysis. We first establish a sequence sample representation that is derived from the difference matrix between successive MICP experimental GASF images. This framework captures not only the differences but also the evolutionary relationships among the image samples. Building upon this, our proposed methodology employs a self-supervised learning strategy to generate an extensive volume of training data from a relatively small sample set. By capitalizing on the inherent relationships within the data, our approach significantly maximizes the utility of the existing dataset. This optimized data leverage substantially enhances the effectiveness of the learning process, particularly in scenarios where data availability is limited. The innovative application of these sequential techniques not only deepens the analytical capabilities of our model but also bolsters its robustness by enabling comprehensive feature extraction and advanced learning from the enriched dataset. The process involves the following steps:

  1. 1.

    Randomly sample 8 images without replacement from the pool of 37 training samples. Arrange these 8 images in the order of sampling, denoted as \([{I}_{{n}_{1}},{I}_{{n}_{2}},\dots , {I}_{{n}_{8}}]\), where \({n}_{1}, {n}_{2},\ldots,{n}_{8}\) take distinct values from 1 to 37.

  2. 2.

    Compute seven difference matrices based on the sampled images. Taking \(B_{{n_{1} ,n_{2} }}\) as an example, its linear difference matrix is calculated as follows:

    $$ B_{{n_{1} ,n_{2} }} = I_{{n_{1} }} - I_{{n_{2} }} $$
    (6)

Here the subtraction \(I_{{n_{1} }} - I_{{n_{2} }}\) is performed element-wise on the three-dimensional matrices.

  3. 3.

    Arrange the data obtained in step (2) to form the input for the neural network. Each sample \(X_{m} = \left[ {I_{{n_{1} }} ,B_{{n1,n_{2} }} ,B_{{n2,n_{3} }} , \ldots , B_{{n7,n_{8} }} } \right]\) is represented as a tensor of size \(\left( {8 \times 128 \times 128 \times 3} \right)\). The output data corresponds to the labels \(Y_{m} = \left[ {\vec{p}_{{n_{1} }} ,\vec{p}_{{n_{2} }} ,\vec{p}_{{n_{3} }} ,\vec{p}_{{n_{4} }} ,\vec{p}_{{n_{5} }} ,\vec{p}_{{n_{6} }} ,\vec{p}_{{n_{7} }} ,\vec{p}_{{n_{8} }} } \right]\), i.e., the Brooks–Corey model parameter vectors of length 5 for the eight sampled core samples, forming a matrix of size \(\left( {8 \times 5} \right)\).

Following the aforementioned sample generation procedure, the number of training samples in this study is expanded from 37 to \({\text{C}}\left( {37,8} \right) =\) 38,608,020, i.e., the number of eight-image combinations drawn from the 37 training samples. The sample scale thus reaches the level of tens of millions, which is sufficient for training deep neural networks. A schematic diagram illustrating the sample construction is shown in Fig. 8.
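
The sequence construction can be sketched as follows; `gasf_images` and `corey_params` are assumed to hold the 37 training images (each 128 × 128 × 3) and their fitted Brooks–Corey parameter vectors.

```python
import numpy as np

def make_sequence_sample(gasf_images, corey_params, rng):
    """Build one evolutionary sequence sample (X_m, Y_m) following steps 1-3."""
    idx = rng.choice(len(gasf_images), size=8, replace=False)   # step 1: sample 8 images
    frames = [gasf_images[idx[0]]]                              # I_{n1}
    for a, b in zip(idx[:-1], idx[1:]):                         # step 2: 7 difference matrices
        frames.append(gasf_images[a] - gasf_images[b])
    X = np.stack(frames)                                        # step 3: (8, 128, 128, 3)
    Y = np.stack([corey_params[i] for i in idx])                # labels, (8, 5)
    return X, Y

rng = np.random.default_rng(42)
# X, Y = make_sequence_sample(gasf_images, corey_params, rng)
```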

Fig. 8

Schematic representation of evolutionary sequence samples

Framework based on deep learning methods

Convolutional neural network (CNN)

CNNs (Chen 2014) have become a prominent architecture in the realm of deep learning, tailored especially for handling grid-like data structures, notably images (Maji et al. 2016). At the heart of their success lies their architectural composition, which typically comprises convolutional layers, pooling layers, and fully connected layers, strategically interspersed with non-linear activation functions. The convolutional layers of a CNN hierarchically sift through the input data to extract spatial features. These layers utilize a series of trainable filters which, at the outset, detect rudimentary features such as edges and textures; deeper in the network, the filters discern more intricate, higher-order features.

Conversely, pooling layers are a fundamental component of CNN architectures, serving to spatially down-sample the feature maps (Albawi et al. 2017). By reducing the spatial dimensions, pooling layers not only ensure computational efficiency but also introduce a form of translational invariance, which helps the network generalize better to unseen data. There are various types of pooling operations, with max pooling and average pooling being the most prevalent. Max pooling selects the maximum value within a defined window as it strides across the feature map, while average pooling computes the average value. These operations, despite their simplicity, have proven highly effective in capturing the essential information while discarding redundant spatial data.

To delve into the mathematical details, consider an input image, which may extend beyond just RGB channels, with dimensions \(\left( {C_{in} ,H_{in} ,W_{in} } \right)\). After being processed by a CNN layer, the output dimensions are represented as \(\left( {C_{out} ,H_{out} ,W_{out} } \right)\). This transformation can be expressed mathematically as follows:

$$ {\text{out}}\left( {C_{out,j} } \right) = {\text{bias}}\left( {C_{out,j} } \right) + \mathop \sum \limits_{k = 1}^{{C_{in} }} {\text{weight}}\left( {C_{out,j} ,k} \right) \star {\text{input}}\left( k \right) $$
(7)

where \(j\) denotes the index of the output channel or, equivalently, the index of the convolution kernel, ranging over \(\left\{ {1,2, \ldots ,C_{out} } \right\}\); \({\text{out}}\left( {C_{out,j} } \right)\) signifies the output associated with kernel \(C_{out,j}\); \({\text{bias}}\left( {C_{out,j} } \right)\) corresponds to the bias term for kernel \(C_{out,j}\); \({\text{weight}}\left( {C_{out,j} ,k} \right)\) represents the weight of kernel \(C_{out,j}\) for input channel \(k\); \({\text{input}}\left( k \right)\) pertains to the input at channel \(k\); \(\star\) symbolizes the 2D cross-correlation operation.

The inclusion of activation layers, combined with pooling layers and the inherent modular design of CNNs, ensures that abstract and representative features of images, such as GASF images, are adeptly extracted. This further underscores the network's proficiency in hierarchical feature abstraction and learning.

ConvLSTM

ConvLSTM (Hu et al. 2020) extends the traditional LSTM model (Sundermeyer et al. 2012) to better handle sequence data with inherent spatial structures. By replacing the matrix multiplication operations present in standard LSTMs with convolutions, the ConvLSTM can capture spatial patterns in the data.

Given an input sequence \(X = \left\{ {X_{1} ,X_{2} , \ldots ,X_{T} } \right\}\), where each \(X_{t}\) is a two-dimensional spatial grid, the ConvLSTM evolves its hidden state sequence \(H = \left\{ {H_{1} ,H_{2} , \ldots ,H_{T} } \right\}\) and cell state sequence \(C = \left\{ {C_{1} ,C_{2} , \ldots ,C_{T} } \right\}\) over time.

For each time step \(t\), the update mechanisms are:

Input gate:

$$ i_{t} = \sigma \left( {W_{xi} *X_{t} + W_{hi} *H_{t - 1} + b_{i} } \right) $$
(8)

Forget gate:

$$ f_{t} = \sigma \left( {W_{xf} *X_{t} + W_{hf} *H_{t - 1} + b_{f} } \right) $$
(9)

Cell update:

$$ \tilde{C}_{t} = { {\text{tanh}}}\left( {W_{xc} *X_{t} + W_{hc} *H_{t - 1} + b_{c} } \right) $$
(10)
$$ C_{t} = f_{t} \circ C_{t - 1} + i_{t} \circ \tilde{C}_{t} $$
(11)

Output gate:

$$ o_{t} = \sigma \left( {W_{xo} *X_{t} + W_{ho} *H_{t - 1} + b_{o} } \right) $$
(12)

Hidden state:

$$ H_{t} = o_{t} \circ { {\text{tanh}}}(C_{t} ) $$
(13)

where \(*\) denotes the convolutional operation; \(\circ\) represents the Hadamard (element-wise) product; \(\sigma\) is the sigmoid activation function; \({ {\text{tanh}}}\) represents the hyperbolic tangent activation function; the subscript pairs \(xi/hi\), \(xf/hf\), \(xo/ho\), and \(xc/hc\) indicate the parameters for the input gate, forget gate, output gate, and cell update, respectively; \({W}_{.}\) and \({b}_{.}\) are the convolutional weights and biases for the respective gates.

In ConvLSTM, instead of using matrix multiplication as in traditional LSTM, convolutional operations are employed. This enables the model to capture spatial dependencies within the input data while also preserving the LSTM's capability to handle temporal sequences. The convolutional operations within the gates ensure that the model learns spatial hierarchies and captures features at different spatial scales, making it especially suitable for space-temporal tasks.
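
A compact PyTorch sketch of a ConvLSTM cell implementing Eqs. (8)–(13) is shown below; fusing the four gate convolutions into a single `nn.Conv2d` is a common implementation convenience, not a claim about the authors' exact code.

```python
import torch
import torch.nn as nn

class ConvLSTMCell(nn.Module):
    def __init__(self, in_channels, hidden_channels, kernel_size=3):
        super().__init__()
        padding = kernel_size // 2
        # One convolution produces the pre-activations of all four gates.
        self.conv = nn.Conv2d(in_channels + hidden_channels,
                              4 * hidden_channels, kernel_size, padding=padding)
        self.hidden_channels = hidden_channels

    def forward(self, x, state):
        h_prev, c_prev = state
        gates = self.conv(torch.cat([x, h_prev], dim=1))
        i, f, o, g = torch.chunk(gates, 4, dim=1)
        i, f, o = torch.sigmoid(i), torch.sigmoid(f), torch.sigmoid(o)  # Eqs. (8), (9), (12)
        g = torch.tanh(g)                                               # Eq. (10)
        c = f * c_prev + i * g                                          # Eq. (11)
        h = o * torch.tanh(c)                                           # Eq. (13)
        return h, c
```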

Workflow and neural network structure

As described in Sect. "Construction of evolutionary sequence samples", GASF images and differential images are incorporated into an evolution sequence, making them suitable for processing as spatial–temporal data. A framework was developed to construct a deep learning model that maps the GASF image evolution sequence to its associated Corey parameters. This approach yields a deep learning model capable of converting an evolutionary sequence into spatial–temporal patterns, distilling crucial features, and culminating in regression to determine specific parameters. As visualized in Fig. 9, the workflow is segmented into four main modules, each serving a distinct function:

  1. 1.

    Data preprocessing: The workflow begins with data preprocessing, where the GASF images and the differences between them are methodically prepared. The initial GASF image, coupled with seven subsequent differential GASF images, are arrayed in a sequence that is subsequently interpreted in the context of space–time series data.

  2. 2.

    Spatial–temporal feature learning: Here, the evolution from the preliminary GASF image to its terminal state is captured. With the previously processed sequence fed in, the spatial–temporal feature learning phase captures the evolution process explicitly. The sequence is processed through a ConvLSTM structure, integrating convolutional layers with long short-term memory (LSTM) units (marked as s1, s2, …, s8). This combination effectively discerns the spatial–temporal nuances intrinsic to the evolutionary data. The outputs from the ConvLSTM units, denoted \({H}_{1}, {H}_{2},\dots ,{H}_{8}\), constitute a series of consecutively refined features throughout the sequence; \({H}_{8}\) serves as the module's final output.

  3. 3.

    Abstract feature learning: The transition to this phase signifies a shift into abstract feature comprehension. The features gleaned from the preceding module undergo further refinement through a Conv2d module. This module, comprising multiple CNN layers, activation layers, and pooling layers, leverages two-dimensional convolutional neural networks. As a result, a gamut of feature maps and transformations emerges, enabling the capture of advanced representations from the evolutionary sequence.

  4. 4.

    Parameter regression: Concluding the workflow is the parameter regression phase. The refined features from the prior stages are channeled through a Fully Connected (FC) layer. The model's objective is to predict a series of Corey parameters, explicitly \({{\overrightarrow{p}}_{{n}_{1}},{\overrightarrow{p}}_{{n}_{2}},{\overrightarrow{p}}_{{n}_{3}},{\overrightarrow{p}}_{{n}_{4}},\overrightarrow{p}}_{{n}_{5}},{\overrightarrow{p}}_{{n}_{6}},{\overrightarrow{p}}_{{n}_{7}},{\overrightarrow{p}}_{{n}_{8}}\), which correspond to each of the GASF images in the sequence. While the primary intent is to procure the Corey parameters of the concluding GASF image, it's imperative for the network to preserve and consider features spanning the entirety of the GASF images sequence. This strategic approach ensures that the model is informed by the complete evolutionary progression, optimizing its predictions for the final image while maintaining awareness of the entire sequence.

Fig. 9

The schematic of the workflow: the arrow indicates the direction of data flow

The architecture illustrated in Fig. 10 provides a comprehensive layout of the neural network's structure. The hyperparameters of the deep learning model were selected according to the dimensional characteristics of the input imagery and the intricacy of the problem domain. This informed an architectural design that maintains the integrity of the output tensor's dimensions, ensuring consistent representational capacity across the model's successive layers and balancing computational efficiency with predictive performance. The input layer accepts a three-dimensional input of size 16 × 128 × 128. Beginning with the input layer, the data undergoes a series of transformations through ConvLSTM, Conv2d, ReLU, max-pooling, and fully connected (FC) layers. Each layer's configuration parameters, shown above the respective block, dictate its processing behavior. The network concludes with two FC layers, reducing to an output size of 8 × 5, which corresponds to the model's final regression output. All configuration parameters are deliberately designed so that the given inputs are processed into outputs of the expected shape. Notably, the dashed tuples accompanying ConvLSTM or Conv2d blocks detail the convolution kernel parameters, sequentially denoting kernel size, stride, padding, and output channels; for max-pooling they specify width, height, and stride, while for FC layers they give the input and output dimensions.
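
Under the assumption that the input is an evolution sequence of eight 3-channel 128 × 128 frames and the output is the 8 × 5 Corey-parameter matrix, the overall mapping can be sketched as follows, reusing the `ConvLSTMCell` from the previous subsection; channel counts and layer sizes here are illustrative and do not reproduce the exact configuration of Fig. 10.

```python
import torch
import torch.nn as nn

class RPCDLNet(nn.Module):
    """Illustrative sequence-to-parameters network: ConvLSTM -> Conv2d -> FC."""
    def __init__(self, hidden_channels=16):
        super().__init__()
        self.cell = ConvLSTMCell(3, hidden_channels)          # cell sketched above
        self.features = nn.Sequential(
            nn.Conv2d(hidden_channels, 32, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                                   # 128 -> 64
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                                   # 64 -> 32
            nn.AdaptiveAvgPool2d(4),                           # -> (64, 4, 4)
        )
        self.head = nn.Sequential(
            nn.Flatten(), nn.Linear(64 * 4 * 4, 256), nn.ReLU(),
            nn.Linear(256, 8 * 5),                             # eight 5-parameter vectors
        )

    def forward(self, seq):                                    # seq: (batch, 8, 3, 128, 128)
        b, t, c, hgt, wid = seq.shape
        h = seq.new_zeros(b, self.cell.hidden_channels, hgt, wid)
        c_state = torch.zeros_like(h)
        for step in range(t):                                  # spatial-temporal feature learning
            h, c_state = self.cell(seq[:, step], (h, c_state))
        return self.head(self.features(h)).view(b, 8, 5)
```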

Fig. 10

Schematic representation of the neural network architecture. The top labels indicate the configurations and types of each layer, while the bottom labels specify the shape of the output tensor from each layer

Results and discussion

Model train and test

Subsequent to the elaboration of the neural network architecture in the prior section, we embarked on the pivotal phase of model training, followed by rigorous evaluation. The dataset, comprising MICP data and corresponding Corey parameters derived from RP experimental results, was bifurcated into two subsets: a training subset, encompassing 37 sets of data, and a validation subset, comprising the remaining 10 sets. This division was executed to ensure the model's validation on data that remained unexposed during the training period.

The training protocol proceeded as follows:

  1. (1)

    Data Preparation: The MICP data was procured and subjected to a preprocessing routine, as delineated in Sect. "Construction of evolutionary sequence samples", which involved translating the data into GASF image representations suitable for network input.

  2. (2)

    Partitioning: A stratified division method was employed to segregate the dataset into the aforementioned training and validation subsets, aligning with the distribution of Corey parameters.

  3. (3)

    Network Initialization: Neural network parameters were initialized using proven strategies to promote efficient convergence during training.

  4. (4)

    Mini-Batch Construction: In a stochastic manner, mini-batches, each containing eight samples, were curated from the training subset. Each mini-batch sample was then transformed into a GASF image and assembled into an evolution sequence, complete with labels, ensuring diversity in the training regimen.

  5. (5)

    Iterative Learning: Each evolution sequence mini-batch was propagated through the network, generating predictions of Corey parameters; The Mean Squared Error (MSE) (Prasad et al., 1990) was calculated for each prediction to quantify the deviation from the actual Corey parameters; Network parameters were refined through backpropagation using the loss gradients obtained from the MSE calculations; This iterative process continued until the model's performance on the training data reached a state of convergence, as evidenced by a plateau in the loss trajectory.

  6. (6)

    Hyperparameter Tuning: Throughout the initial training epochs, hyperparameters such as learning rate, batch size, and the total number of epochs underwent meticulous tuning to optimize the learning progression and performance of the model.

  7. (7)

    Validation and Generalization Assessment: Upon the completion of training epochs, the model was subjected to evaluation against the untouched validation subset to assess its generalization ability and ensure that overfitting was mitigated.

Through this structured training and validation approach, the model was able to predict with high fidelity, and its performance parameters were honed to yield optimal results. The systematic process and strategic hyperparameter adjustments were pivotal in realizing a model that is robust to the nuances of the experimental data. In the practical training phase, a mini-batch approach was employed for each iteration. The term "epoch" typically signifies a complete training cycle over the training dataset. For this model, a batch size of 16 was selected. Notably, the training methodology implemented here incorporates a random sampling step, making the concept of epochs almost redundant, given that the potential training sample combinations are nearly limitless. The box plot in Fig. 11 visualizes the distribution of the logarithmic (base 2) MSE values across approximately 1.3 million mini-batches. From the plot, we observe a general trend of declining MSE values.
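
The training loop outlined above can be sketched as follows; the optimizer, learning rate, and number of iterations are assumptions for illustration, and `make_sequence_sample` is the sampling routine sketched earlier.

```python
import numpy as np
import torch

def train(model, gasf_images, corey_params, n_batches=100_000, lr=1e-4):
    rng = np.random.default_rng(0)
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = torch.nn.MSELoss()
    for _ in range(n_batches):
        # assemble a mini-batch of 16 randomly generated evolution sequences
        xs, ys = zip(*[make_sequence_sample(gasf_images, corey_params, rng)
                       for _ in range(16)])
        x = torch.tensor(np.stack(xs), dtype=torch.float32).permute(0, 1, 4, 2, 3)
        y = torch.tensor(np.stack(ys), dtype=torch.float32)
        loss = loss_fn(model(x), y)        # MSE against the fitted Corey parameters
        opt.zero_grad()
        loss.backward()
        opt.step()
```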

Fig. 11

Logarithmic MSE values across 1.3 million mini-batches during neural network training

In this study, the RP curves of the test samples were predicted by creating 500 evolutionary sequence prediction samples, combining the MICP image of each test sample with 500 permutations of the MICP images of 7 randomly selected training samples. Figure 12 illustrates the distribution of the parameters predicted by the trained model over the 500 prediction samples for sample No. 38. The red dashed line represents the parameters obtained by directly fitting the experimental data for this sample. The median values of the predicted parameters show good consistency with the directly fitted parameters.
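
A sketch of this test-time procedure is given below; for simplicity it re-draws 7 training images for every sequence rather than permuting one fixed set of 7, and the median aggregation mirrors the distribution summarized in Fig. 12.

```python
import numpy as np
import torch

def predict_test_sample(model, train_images, test_image, n_seq=500):
    """Predict Corey parameters for one test MICP image from n_seq sequences."""
    rng = np.random.default_rng(0)
    preds = []
    model.eval()
    with torch.no_grad():
        for _ in range(n_seq):
            idx = rng.choice(len(train_images), size=7, replace=False)
            chain = [train_images[i] for i in idx] + [test_image]   # test sample last
            frames = [chain[0]]
            for a, b in zip(chain[:-1], chain[1:]):
                frames.append(a - b)
            x = torch.tensor(np.stack(frames)[None], dtype=torch.float32)
            x = x.permute(0, 1, 4, 2, 3)               # (1, 8, 3, 128, 128)
            preds.append(model(x)[0, -1].numpy())      # parameters at the last position
    return np.median(np.stack(preds), axis=0)
```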

Fig. 12

Distribution of fitted parameters and predicted parameters for sample 38 in the RP test data

Figure 13 shows the RP curves predicted by the trained ConvLSTM model for samples No. 38 to No. 47 from MICP experimental data, demonstrating high prediction accuracy on the 10 test samples. The average error in predicting the water-phase RP for these samples is 4.7%, and the average error in predicting the oil-phase RP is 2.5%.

Fig. 13

Comparison of predicted experimental RP for 10 test samples

Samples No. 48 to No. 59, although from the same geological layer, exhibited relative errors in porosity and permeability exceeding 20%, indicating relatively poor predictability of the RP results from the MICP data. The same prediction method was applied to these samples, and the results are shown in Fig. 14. The relative errors for this subset are slightly larger than for samples No. 38 to No. 47, with average prediction errors of 8.2% for the water-phase RP and 7.7% for the oil-phase RP. The average prediction accuracy remains higher than 90%, demonstrating the good generalization performance of the model.

Fig. 14

Comparison of predicted and experimental RP for samples No.48 to No.59

Comparison of the prediction accuracy

To rigorously compare the predictive capabilities of various methods, we developed two distinct models: an end-to-end LSTM model that maps the three MICP curves to the Corey parameters, and a CNN model that correlates the GASF images with the Corey parameters. Notably, neither of these models utilizes data augmentation; as a consequence, the number of training samples available for both is restricted to 37. This limited dataset size hints at a potential vulnerability to overfitting, a challenge often encountered in machine learning scenarios with insufficient training data.

For the validation samples in this comparison experiment, we again used the 10 samples No. 38 to No. 47, to avoid the influence of sample selection on model performance. Using the Corey parameters predicted by each model, we derived the predicted relative permeability curves and compared them against the ground truth. A granular analysis was then performed by sampling 20 equally spaced points from both the predicted curves and the ground truth, which allowed the relative error to be calculated across the 10 test samples; the results are illustrated in Fig. 15. For the So-Error category, our model (referred to as "Ours") exhibits a narrower error distribution with a median in a lower error range, suggesting more consistent and accurate performance than both the CNN and LSTM models. For the So-Error of CNN and LSTM, there is a noticeably broader error distribution, and the LSTM shows a slightly higher median error than both the CNN and our model. Turning to the Sw-Error metric, our model continues its trend of consistency with a tightly packed distribution; its median error is marginally lower than that of the LSTM, and comparable to or slightly exceeding that of the CNN model. In terms of Sw-Error, both the CNN and LSTM models also produce more outliers than our model, hinting at scenarios where these conventional models deviate significantly from the ground truth. Essentially, although LSTM and CNN are both robust modeling methods, the limited number of training samples hinders their performance, as the outliers indicate. Moreover, the suitable self-supervised learning framework established here greatly improves the accuracy of the proposed RPCDL method by increasing the number of training samples. In this way, our model demonstrates promising consistency and accuracy, emphasizing its suitability for this specific application.
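
The relative-error computation behind Fig. 15 can be sketched as follows; it reuses the `brooks_corey` helper defined earlier, and sampling only interior saturation points plus the small epsilon are added numerical safeguards, not details reported in the text.

```python
import numpy as np

def curve_relative_error(p_pred, p_true, n_points=20):
    """Mean relative error of predicted vs. fitted RP curves at 20 saturations."""
    s_or, s_wc = p_true[3], p_true[4]
    # interior points of the mobile saturation range, so neither phase RP is exactly zero
    sw = np.linspace(s_wc, 1.0 - s_or, n_points + 2)[1:-1]
    kro_t, krw_t = brooks_corey(p_true, sw)
    kro_p, krw_p = brooks_corey(p_pred, sw)
    eps = 1e-6
    so_error = np.mean(np.abs(kro_p - kro_t) / (kro_t + eps))   # oil-phase error
    sw_error = np.mean(np.abs(krw_p - krw_t) / (krw_t + eps))   # water-phase error
    return so_error, sw_error
```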

Fig. 15

Comparative analysis of relative errors for LSTM, CNN, and the proposed model

Conclusions

This study proposes a novel artificial intelligence framework (RPCDL) that predicts the relative permeability (RP) curves of oil and water directly from mercury injection capillary pressure (MICP) data, including the drainage and imbibition curves as well as the pore-throat size distribution. For the first time, the Gramian angular summation field (GASF) method is employed to transform the MICP data into polar coordinates, effectively capturing the non-stationary transformation characteristics and the differences between multiple modes. By permuting and combining a small set of MICP and RP data pairs, the study explores the feasibility of self-supervised learning through evolutionary sequences for small-sample machine learning predictions. The integration of GASF-derived three-channel red, green, blue (RGB) images, evolutionary sequence samples, and the convolutional long short-term memory (ConvLSTM) deep learning framework enables accurate prediction of rock oil–water RP curves. The established deep learning framework provides an intelligent and efficient way to estimate RP curves in various reservoir engineering scenarios without costly experiments. Summarizing the application of the proposed method, three key points are listed below:

  1. (1)

    Using the proposed RPCDL method trained on 37 sample pairs, testing on 10 sample sets yielded an average prediction error of 4.7% for the water-phase RP and 2.5% for the oil-phase RP, validating the good generalization performance of the model with small samples.

  2. (2)

    The validation results on a distinct rock sample set of 12 samples showed average prediction errors rising slightly to 8.2% for the water phase and 7.7% for the oil phase, which still satisfies typical engineering error requirements.

  3. (3)

    The comparison experiments between our proposed RPCDL method and conventional convolutional neural network (CNN) and long short-term memory (LSTM) models quantitatively demonstrated its superior performance in predicting both the So and Sw lines of oil–water RP curves.

These findings not only qualitatively and quantitatively underline the strong performance of the RPCDL method in RP curve prediction but also hint at its potential future applications. The RPCDL approach could be applied to predicting gas–water RP curves, assisting the efficient exploitation of both conventional and unconventional gas reservoirs. In addition, the temporal dynamics of the RP curve present an intriguing avenue for future research. Furthermore, leveraging the RPCDL framework to predict other crucial reservoir development parameters, such as well testing and logging curves, is a promising direction for forthcoming work.