Introduction

Rock types are defined as units of rock that present unique reservoir properties; this uniqueness may be caused by similar geological conditions and/or diagenetic processes (Gunter et al. 1997). Rock typing is the process of categorizing a heterogeneous reservoir into different rock types, each of which presents a relatively homogeneous pore structure or a specific lithology phase (Wang et al. 2021). Reservoir rock typing is considered one of the most significant steps in reservoir characterization and modeling because it is an effective approach for unraveling reservoir heterogeneity and, further, for interpreting fluid units within the reservoir (Kadkhodaie-Ilkhchi et al. 2014). Over the past six decades, rock typing has been studied at scales ranging from the pore scale to the basin scale, based on various measurements such as image textures (Wang et al. 2021), laboratory measurements (Abedini et al. 2011; Al-Dujaili et al. 2021; Colombo et al. 2018; Dakhelpour-Ghoveifel et al. 2019; Krivoshchekov et al. 2023), geophysical properties (e.g., well logs; Aranibar et al. 2013; Ghadami et al. 2015; Manzoor et al. 2023; Safaei-Farouji et al. 2022; Shahat et al. 2023; Wang et al. 2015), seismic data (Chevitarese et al. 2018), ASTER TIR (thermal infrared radiometer) data (Watanabe and Matsuo 2003), and hyperspectral data (Owada et al. 2023), using various classification schemes (see Fig. 1). Before imaging devices such as X-ray micro-computed tomography (μ-CT), scanning electron microscopy (SEM), thin sections, and focused ion beam scanning electron microscopy (FIB-SEM) became widely used in the petroleum industry, most rock typing methods were based on petrophysical properties (e.g., porosity and permeability) and/or parameters derived from them (e.g., the flow zone indicator and the Leverett dimensionless J-function) (Maldar et al. 2022).
However, the results of conventional rock typing methods make it impractical to quantitatively estimate the porosity and permeability contribution and the geological genesis of each rock type. For example, conventional methods classify the target reservoir into different rock types but cannot explain which geological processes (e.g., erosion and clay filling) make a given rock type exhibit reservoir properties different from the others. Furthermore, it is hard to quantify the porosity and permeability contribution of a given geological process.

Fig. 1

Classification of the main rock typing methods (adapted from Fig. 4 in Rebelle and Lalanne 2014)

Currently, the most widely used imaging devices are X-ray micro-computed tomography (μ-CT) (Alhammadi et al. 2020; Ding et al. 2019; Wildenschild and Sheppard 2013), scanning electron microscopy (SEM) (Liu et al. 2022; Mangi et al. 2022; Mangi et al. 2023; Mangi et al. 2020; Scott et al. 2019), thin sections (Jobe et al. 2018), and focused ion beam scanning electron microscopy (FIB-SEM) (Kelly et al. 2016). A μ-CT image is considered the most direct way to obtain the 3D internal structure of a rock sample. However, the maximum resolution of a μ-CT image is about 2 μm per voxel, which is insufficient to resolve some small structures in the porous media of a rock sample (Okabe and Blunt 2007; Wang and Rahman 2023). SEM provides nanometer-resolution images, but only in 2D. FIB-SEM is an effective way to obtain high-resolution 3D images, but its field of view is too small to cover the heterogeneity of rock samples. Thin sections have long been used to describe the internal structure of rock samples, but they are 2D images and their resolution is limited. Although none of these imaging devices is perfect, they significantly enhance our ability to characterize the internal structure of rock samples. Thanks to their widespread application, image-based microscale rock typing (IMRT) can be conducted directly on the observed pore structures, which fundamentally determine the rock types (Shaik et al. 2019).

Rebelle and Lalanne classified current rock typing methods according to two criteria: geology versus petrophysics, and large scale versus small scale (Rebelle and Lalanne 2014). Figure 1 illustrates a classification scheme of the main rock typing methods, adapted from Rebelle and Lalanne's scheme by adding the image-based microscale rock typing method. Currently, rock typing methods can be roughly divided into seven categories according to their driving factors (see Fig. 1). From large scale to small scale, different rock typing methods can describe reservoir heterogeneity at different fields of view. These methods can also be qualitatively positioned according to how much geological information their results provide (Zhan et al. 2022). For example, sedimentary facies can be identified from the results of seismic facies-based rock typing. However, it is difficult to derive the geological genesis of a given rock type from flow zone indicator (FZI)-based rock typing because FZI mainly reflects the petrophysical features of the reservoir.

IMRT has unique significance. First, IMRT is an effective way to identify lithofacies (Su et al. 2020), carbonate pore systems (Marmo et al. 2005), and reservoir zones (A et al. 2024; Jobe et al. 2018). Second, IMRT can accurately characterize the spatial distribution of each rock type and, further, the pore structure of a given rock type. Conventional methods such as MICP only provide the statistical pore size distribution of the measured rock sample, and it remains challenging to quantitatively estimate each secondary porosity system's permeability contribution. Secondary porosity is the additional porosity created by diagenesis and fracturing after the original rock formation process. Wang and Sun processed several SEM images of North Sea sandstone with secondary porosity occurring in rock fragments, clay, and eroded feldspar, and quantitatively calculated the permeability contribution of each type of secondary porosity. The authors concluded that the permeability contribution of the secondary porosity depends highly on the content of the micropore medium and on the permeability of the macropore structure. In one of their samples, the permeability contribution of the secondary porosity was negligible, whereas in another, the permeability contributions of eroded feldspar, rock fragments, and clay were 1.38%, 0.37%, and 2.64%, respectively (Wang and Sun 2021c). Third, IMRT is one of the core steps in upscaling. Directly simulating fluid flow in the image of a sizeable rock sample is impractical because of the heavy computational burden. Upscaling is considered an effective way to reduce the computational cost by mapping the fine-scale porous medium onto a coarse-scale grid system, and its accuracy depends highly on understanding the heterogeneity of the target rock sample, which can be unraveled by IMRT.
Fourth, fluid flow in pore networks of different scales is dominated by different flow regimes, namely continuum flow, slip flow, transition flow, and Knudsen diffusion (Li et al. 2020, 2021; Wang et al. 2018d; Yuan et al. 2016, 2017). Therefore, IMRT should be a prerequisite for numerical flow simulation in multiscale pore structures (Wang et al. 2017; Yuan et al. 2015; Zhang et al. 2019).
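Which of these regimes applies to a pore is usually judged from the Knudsen number, Kn = λ/d, the ratio of the gas mean free path λ to a characteristic pore size d. The sketch below is illustrative only: it uses the hard-sphere kinetic-theory mean free path and the regime thresholds commonly quoted in the shale-gas literature (Kn < 0.001 continuum, 0.001 to 0.1 slip, 0.1 to 10 transition, above 10 Knudsen/free-molecular). These thresholds and the methane example values are assumptions, not values taken from the works cited above.

```python
import math

BOLTZMANN = 1.380649e-23  # J/K (exact SI value)


def mean_free_path(temperature_k: float, pressure_pa: float, molecule_diameter_m: float) -> float:
    """Hard-sphere kinetic-theory mean free path (ideal-gas assumption)."""
    return BOLTZMANN * temperature_k / (
        math.sqrt(2.0) * math.pi * molecule_diameter_m**2 * pressure_pa
    )


def knudsen_number(mfp_m: float, pore_diameter_m: float) -> float:
    """Kn = mean free path / characteristic pore size."""
    return mfp_m / pore_diameter_m


def flow_regime(kn: float) -> str:
    """Map Kn to a flow regime using commonly cited (assumed) thresholds."""
    if kn < 1e-3:
        return "continuum"
    if kn < 1e-1:
        return "slip"
    if kn < 10.0:
        return "transition"
    return "knudsen"


if __name__ == "__main__":
    # Hypothetical case: methane (kinetic diameter ~0.38 nm) at 350 K and 10 MPa
    lam = mean_free_path(350.0, 10e6, 0.38e-9)
    for d in (10e-6, 1e-6, 100e-9, 10e-9, 2e-9):
        kn = knudsen_number(lam, d)
        print(f"pore {d * 1e9:9.1f} nm  Kn = {kn:.2e}  regime = {flow_regime(kn)}")
```

Because Kn grows as pores shrink, a single rock sample whose rock types span micrometre to nanometre pores can contain several regimes at once, which is why the pore systems need to be separated before flow simulation.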

For convenience, the commonly used image-based microscale rock typing methods and their applications are reviewed following the structure shown in Fig. 2. Depending on the application scenario, an IMRT task can be treated as either a pattern recognition problem or a texture segmentation problem; these are reviewed in Sect. "Pattern recognition-related rock typing" and Sect. "Texture segmentation-related rock typing", respectively. Pattern recognition-related rock typing (PRRT) addresses problems such as identifying lithofacies, reservoir zones, or Dunham textures (Liu et al. 2023). In this case, the input is an image of a rock sample, and the output is a label describing the class of the sample (see Sect. "Self-defined features based PRRT"). Texture segmentation-related rock typing (TSRT) segments the target image into different areas, each presenting a homogeneous porous medium (see Sect. "Texture segmentation-related rock typing"). TSRT is a prerequisite for characterizing pore-scale heterogeneity. Reservoir rock typing is essentially a classification problem, and a typical workflow for solving a classification problem contains two steps: feature extraction and classification. The feature extraction step organizes, for each specimen, a feature vector consisting of a number of structure descriptors, and the classification step assigns the specimens to different categories using a classifier. From the perspective of feature extraction, IMRT can be divided into object-based and pixel-based rock typing. In pixel-based rock typing, each image pixel is treated as a specimen (see Sect. "Pixel-based TSRT"). In object-based rock typing, the input image is first preprocessed to extract the target objects (e.g., grains or pores), and the geometric descriptors of each object are then organized into its feature vector (see Sect. "Object-based rock typing").
Pixel-based rock typing identifies the rock type of each pixel, whereas object-based rock typing identifies the category of each object; note that an image object is a group of pixels/voxels. There are two kinds of pixel-based rock typing, local feature-based rock typing and phase field-based rock typing, which are reviewed in Sect. "Local feature-based TSRT" and Sect. "Phase field-based TSRT", respectively. Local feature-based rock typing applies conventional classifiers such as decision trees, random forests, and Gaussian mixture models, whereas phase field-based rock typing uses phase field segmentation methods. Finally, conclusions are given in Sect. "Conclusions".
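The difference between the two feature-extraction perspectives is mainly in what one row of the feature matrix represents. The sketch below is a minimal illustration under our own assumptions: the pixel-based descriptors (raw and Gaussian-smoothed intensity) and the object-based descriptors (area and centroid) are placeholders chosen for brevity, not the descriptors of any specific study, and 4-connectivity is assumed for objects.

```python
import numpy as np
from scipy import ndimage


def pixel_based_features(gray: np.ndarray, sigmas=(1.0, 3.0)) -> np.ndarray:
    """Pixel-based: every pixel is a specimen (one row per pixel)."""
    img = gray.astype(float)
    # Placeholder descriptors: raw intensity plus smoothed intensity at a few scales
    layers = [img] + [ndimage.gaussian_filter(img, s) for s in sigmas]
    stack = np.stack(layers, axis=-1)              # shape (H, W, n_features)
    return stack.reshape(-1, stack.shape[-1])      # shape (H*W, n_features)


def object_based_features(binary: np.ndarray) -> np.ndarray:
    """Object-based: every connected object (e.g. a pore) is a specimen (one row per object)."""
    labels, n_objects = ndimage.label(binary)      # default structure: 4-connectivity in 2-D
    index = np.arange(1, n_objects + 1)
    area = np.bincount(labels.ravel(), minlength=n_objects + 1)[1:]   # pixels per object
    centroids = np.array(
        ndimage.center_of_mass(binary.astype(float), labels, index)
    ).reshape(-1, 2)
    return np.column_stack([area, centroids])      # shape (n_objects, 3): area, row, col
```

Either matrix can then be passed to the same classifier; only the meaning of a "specimen" changes.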

Fig. 2

Classification of different image-based microscale rock typing methods

In summary, conventional rock typing techniques classify reservoirs based on flow characteristics such as porosity and permeability. However, they often fall short of providing insight into the geological origin of each rock type and its specific contribution to porosity and permeability. Image-based rock typing offers a solution by categorizing heterogeneous rock samples according to their pore structures. With the widespread use of imaging devices in reservoir characterization (digital rock physics), we gain access to the intricate internal structures of porous media, which enables quantitative analysis of the structural heterogeneity within rock samples. After image-based rock typing, the geological genesis of each rock type can be investigated through pore structure features such as pore-throat size distribution, pore connectivity, and pore surface curvature, and the individual influence of each rock type on the porosity and permeability of the entire sample can be estimated. Successful image-based rock typing relies on two key prerequisites: (1) the rock sample must exhibit varied pore structures, and (2) the image resolution must be adequate to observe and describe these pore structures. The suitability of an image can be tuned by balancing the field of view (FOV) and the resolution of the imaging device. For example, some SEM images have a very high resolution (say 20 nm), which is suitable for identifying mineral types but fails to characterize the pore structures. The advantages of image-based rock typing are invaluable, but it is essential to clarify that image-based rock typing does not aim to replace conventional methods, because the sample size in current image-based rock typing is limited to the millimeter-to-core scale by the field of view of imaging devices.

Pattern recognition-related rock typing

The aim of pattern recognition-related rock typing (PRRT) is to identify the rock type of a given rock image. The input to PRRT is a grayscale or color image, and the output is its corresponding rock type (e.g., Dunham class or reservoir zone). PRRT is a typical classification task that can be solved in two steps: feature extraction and classification. Depending on whether users need to extract structural features manually, PRRT can be further divided into self-defined features based PRRT and learning-based PRRT.

Self-defined features based PRRT

Self-defined features based PRRT requires users to decide which features to use for rock typing. Currently, the features extracted for PRRT fall into three classes: color features, image features, and object features. Color features (Li et al. 2017), also known as statistical features (Singh et al. 2010), are calculated from the distribution of image intensity and include 13 variables such as mean, variance, entropy, and skewness. Image features are extracted from the entire image, for example, the number of perimeter pixels, white areas, and Canny edge pixels. Object features, also known as region features (Singh et al. 2010), are calculated from a given number of the largest objects (e.g., the four largest grains (Cavalin and Oliveira 2017) or white areas (Singh et al. 2010)). Several features are calculated for each object, such as area, solidity, convex deficiency, and extent (Zhou et al. 2021). The area of an object is obtained by counting its pixels/voxels. Solidity is the area of an object divided by the area of its convex hull. Convex deficiency is closely related to solidity and equals one minus the solidity. The extent of an object is the ratio of its area to the area of its smallest bounding rectangle (see Fig. 3).
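The statistical and object features defined above can be computed with NumPy and SciPy alone. This is a sketch under stated assumptions: only four of the 13 statistical variables are shown; the convex hull is built from pixel corners (so a one-pixel-wide object still has a non-zero hull area), which may differ slightly from the pixel-counting convex hulls used by image-processing toolkits; and the extent uses the axis-aligned bounding box rather than a rotated minimum-area rectangle.

```python
import numpy as np
from scipy import ndimage
from scipy.spatial import ConvexHull


def intensity_statistics(gray: np.ndarray, bins: int = 256) -> np.ndarray:
    """Four 'color' (statistical) features: mean, variance, entropy in bits, skewness."""
    g = gray.astype(float).ravel()
    hist, _ = np.histogram(g, bins=bins)
    p = hist[hist > 0] / g.size                        # probabilities of occupied bins
    mean, var = g.mean(), g.var()
    skew = np.mean((g - mean) ** 3) / var**1.5 if var > 0 else 0.0
    return np.array([mean, var, -np.sum(p * np.log2(p)), skew])


def object_shape_features(binary: np.ndarray, n_largest: int = 4) -> np.ndarray:
    """Rows of [area, solidity, convex deficiency, extent] for the n largest objects."""
    labels, n = ndimage.label(binary)
    areas = np.bincount(labels.ravel(), minlength=n + 1)[1:]
    rows = []
    for i in np.argsort(areas)[::-1][:n_largest]:      # largest objects first
        rr, cc = np.nonzero(labels == i + 1)
        area = float(rr.size)                          # pixel count
        # Four corners of every pixel, so the hull encloses whole pixels
        corners = np.concatenate(
            [np.column_stack([rr + dr, cc + dc]) for dr in (0, 1) for dc in (0, 1)]
        )
        hull_area = ConvexHull(corners).volume         # for 2-D input, .volume is the area
        solidity = area / hull_area
        bbox_area = (rr.max() - rr.min() + 1) * (cc.max() - cc.min() + 1)
        rows.append([area, solidity, 1.0 - solidity, area / bbox_area])
    return np.array(rows, dtype=float).reshape(-1, 4)
```

For an L-shaped three-pixel object, for instance, the bounding box covers four pixels (extent 0.75) and the corner-based hull has area 3.5 (solidity 3/3.5).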

Fig. 3

Popularly applied features for image-based microscale rock typing

Following feature extraction, a classifier is selected to divide the images into different categories. Popular classifiers include K-nearest neighbors (KNN) (Coomans and Massart 1982; Wang et al. 2018c), Gaussian Mixture Model (GMM) (Day 1969), decision tree (DT) (Breiman et al. 1984), multiclass support vector machine (MSVM) (Angulo et al. 2003), random forest (Tin Kam 1998; Zhou 2020), and artificial neural network (ANN) (Crick 1989).
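Of these classifiers, KNN is the simplest to state: a specimen receives the majority label among its k nearest training specimens in feature space. The NumPy sketch below is a plain illustration of that rule, assuming Euclidean distance and non-negative integer class labels; it is not taken from any of the cited implementations.

```python
import numpy as np


def knn_predict(X_train, y_train, X_test, k=3):
    """Majority vote among the k nearest training samples (Euclidean distance)."""
    X_train = np.asarray(X_train, dtype=float)
    X_test = np.asarray(X_test, dtype=float)
    y_train = np.asarray(y_train)                      # non-negative integer labels assumed
    # Pairwise squared distances, shape (n_test, n_train)
    d2 = ((X_test[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=-1)
    nearest = np.argsort(d2, axis=1)[:, :k]            # indices of the k closest samples
    votes = y_train[nearest]                           # their labels, shape (n_test, k)
    # Ties go to the smallest label because argmax returns the first maximum
    return np.array([np.bincount(v).argmax() for v in votes])
```

In practice the feature columns should be put on comparable scales first, since the Euclidean distance otherwise favours descriptors with large numeric ranges.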

Marmo et al. introduced a machine learning approach to identify the Dunham texture of carbonate rocks (Dunham and Ham 1962; Jardine and Wilshart 1982) from thin sections (Marmo et al. 2005). In that study, 23 image features were first extracted from each thin section, and the dimensionality of these feature vectors was then reduced from 23 to 15 by principal component analysis (PCA). A three-layer artificial neural network (ANN) with 15 input nodes, 8 hidden nodes, and 4 output nodes was trained on 532 training images selected from more than 1000 thin sections; the 4 output nodes correspond to 4 Dunham classes. The trained ANN model was tested on another 268 thin sections from the same dataset and achieved 93.3% accuracy. Interestingly, an even higher accuracy of 93.5% was obtained when the test images were collected from other carbonate sequences. Although the method generalizes well, there may still be room for improvement: using other feature types in addition to image features, or a deeper and more complex neural network, might improve the classification performance.
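The structure of this workflow (23 features, PCA to 15 components, a 15-8-4 network, a 532/268 split) can be sketched with scikit-learn. Synthetic, well-separated data stand in for the thin-section features, and the standardization step, optimizer, and iteration limit are our assumptions; the sketch reproduces only the shape of the pipeline, not the published results.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Hypothetical stand-in data: 800 "thin sections" x 23 image features, 4 classes
n_samples, n_features, n_classes = 800, 23, 4
centers = rng.normal(scale=3.0, size=(n_classes, n_features))
y = rng.integers(0, n_classes, size=n_samples)
X = centers[y] + rng.normal(size=(n_samples, n_features))

# Same split sizes as the study: 532 for training, 268 for testing
X_train, y_train = X[:532], y[:532]
X_test, y_test = X[532:], y[532:]

model = make_pipeline(
    StandardScaler(),                      # assumption: features standardized before PCA
    PCA(n_components=15),                  # 23 -> 15 dimensions
    MLPClassifier(hidden_layer_sizes=(8,), max_iter=2000, random_state=0),  # 15-8-4 network
)
model.fit(X_train, y_train)
print(f"test accuracy: {model.score(X_test, y_test):.3f}")
```

Swapping the final step for another estimator (e.g., a deeper `hidden_layer_sizes`) is one way to explore the improvements suggested above.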

Singh et al. trained a four-layer neural network with 27 input features, including statistical, image, and object features, to identify the texture of basaltic rock mass from 300 thin sections collected from 140 rock samples (Singh et al. 2010). The trained model was then tested on 90 thin sections and achieved 92.22% accuracy. Similar work was presented by Patel and Chatterjee, who classified rocks from an Indian limestone mine into nine types (e.g., pink limestone, dark-gray limestone, and greenish-gray limestone) using image features and a probabilistic neural network, achieving more than 98% accuracy (Patel and Chatterjee 2016). Młynarczuk et al. used nine different rock samples to demonstrate the feasibility of automatic recognition based on thin sections. Four popular machine learning algorithms were applied: the nearest neighbor algorithm (NN), K-nearest neighbors (KNN), the nearest mode algorithm (NM), and the method of optimal spherical neighborhoods (OSN). Different color spaces (CIELab, YIQ, RGB, and HSV) were also shown to affect the prediction accuracy. They concluded that the combination of the NN method and the CIELab color space performs best, with an accuracy of 99.98% (Młynarczuk et al. 2013).

Jobe et al. demonstrated how various carbonate pore geometries can be used to identify reservoir zones from thin sections (Jobe et al. 2018). In this example, ninety-one thin sections collected from the same location and covering several lithofacies were first segmented into binary images with two phases, void and solid. A single pore is defined as a cluster of void pixels with four-neighborhood connectivity, and single-pixel pores are treated as noise and omitted. All pores are first divided into five categories according to pore size. Within each category, pores are further classified into nine categories according to three criteria: pore size (the number of pixels), axial proportion (four bins), and rugosity proportion (four bins). Axial proportion is the ratio of the minor axis length to the major axis length of the minimum ellipse that contains the pore. Rugosity proportion approximates the deviation of the pore perimeter from the perimeter of an ellipse. The frequency of pores falling within each range is then recorded, so that each image has a feature vector of 45 descriptors. Multiple machine learning algorithms, including distance to mean (DM), K-nearest neighbors (KNN), decision tree (DT), and multiclass support vector machine (MSVM), were applied for classification. They concluded that the DT and KNN methods can provide more than 80% accuracy.
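A simplified version of this pore-geometry descriptor can be sketched as a joint histogram of pore size and axial ratio. The sketch keeps the parts described above (four-connected pores, single-pixel pores discarded, per-image pore frequencies) but makes several substitutions that should be read as assumptions: the axial ratio comes from the moment-equivalent ellipse (square root of the ratio of the covariance eigenvalues) rather than the minimum enclosing ellipse, rugosity is omitted, and the bin edges are supplied by the caller instead of reproducing the 45-descriptor scheme.

```python
import numpy as np
from scipy import ndimage

FOUR_CONNECTED = np.array([[0, 1, 0],
                           [1, 1, 1],
                           [0, 1, 0]])


def pore_geometry_histogram(void, size_edges, axial_edges):
    """Relative frequency of pores in (size, axial-ratio) bins, flattened to a vector."""
    labels, n = ndimage.label(void, structure=FOUR_CONNECTED)
    sizes = np.bincount(labels.ravel(), minlength=n + 1)[1:]
    records = []
    for lab in np.flatnonzero(sizes > 1) + 1:          # skip single-pixel pores (noise)
        rr, cc = np.nonzero(labels == lab)
        coords = np.column_stack([rr, cc]).astype(float)
        cov = np.cov(coords, rowvar=False, bias=True)  # second central moments
        minor, major = np.clip(np.linalg.eigvalsh(cov), 0.0, None)  # ascending order
        axial = np.sqrt(minor / major) if major > 0 else 1.0        # ellipse axis ratio
        records.append((rr.size, axial))
    n_bins = (len(size_edges) - 1) * (len(axial_edges) - 1)
    if not records:
        return np.zeros(n_bins)
    size, axial = np.array(records).T
    hist, _, _ = np.histogram2d(size, axial, bins=[size_edges, axial_edges])
    return hist.ravel() / len(records)
```

The resulting vector plays the role of the per-image feature vector that is passed to DM, KNN, DT, or MSVM.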

Besides extracting features from the original image, image features can also be extracted from images processed by filters such as Local Binary Pattern (LBP), Gabor Filter, Discrete Fourier Transform (DFT), Coordinated Cluster Representation (CCR), and Discrete Cosine Transform (DCT) (Fernández et al. 2011). In addition, derived filters have been proposed to enrich the information that can be extracted from the target images. Both LBP- and Gabor-related features are considered very effective descriptors for texture discrimination, so combinations of LBP and Gabor filtering have been proposed to enhance the discriminative ability of LBP. The G-ALBPCSF descriptor is obtained by computing the LBP on Gabor-filtered images; similarly, the D-ALBPCSF descriptor is obtained by computing the LBP on DCT-processed images (Vangah et al. 2019).
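The basic LBP operator compares each pixel with its eight neighbours and packs the results into an 8-bit code; the histogram of codes is then used as a texture descriptor. The NumPy sketch below implements only this basic radius-1, 8-neighbour variant (no rotation invariance, uniform patterns, or interpolation), and the neighbour ordering is an arbitrary choice of ours. Descriptors such as G-ALBPCSF or D-ALBPCSF would apply an LBP of this kind to Gabor- or DCT-processed images instead of the raw image.

```python
import numpy as np

# Offsets of the 8 neighbours, clockwise from the top-left pixel (bit 0 ... bit 7)
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_codes(gray: np.ndarray) -> np.ndarray:
    """8-bit LBP code for every interior pixel (border pixels are dropped)."""
    g = gray.astype(float)
    h, w = g.shape
    center = g[1:-1, 1:-1]
    codes = np.zeros(center.shape, dtype=np.int64)
    for bit, (dr, dc) in enumerate(OFFSETS):
        neighbour = g[1 + dr:h - 1 + dr, 1 + dc:w - 1 + dc]   # shifted view, same shape
        codes |= (neighbour >= center).astype(np.int64) << bit
    return codes


def lbp_histogram(gray: np.ndarray) -> np.ndarray:
    """Normalized 256-bin histogram of LBP codes: the texture feature vector."""
    hist = np.bincount(lbp_codes(gray).ravel(), minlength=256).astype(float)
    return hist / hist.sum()
```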

All self-defined features based PRRT methods share a two-step workflow: feature extraction, by which different rock types can be evaluated quantitatively, followed by classification. Each currently popular method can be treated as a combination of specific features and a classifier. For more applications of self-defined features based PRRT, see (Chatterjee 2013; Shang and Barnes 2012; Tian et al. 2019). According to the data reported in the literature, current self-defined features based PRRT methods achieve accuracies ranging from 80 to 100%.

Learning-based PRRT

The selection of features for rock typing is a tricky process that depends highly on the material itself. Learning-based PRRT, in which the model automatically learns suitable features for the target rock samples, is increasingly used in rock fabric recognition; it reduces manual effort and can increase classification accuracy.

Cheng and Guo applied a CNN model (Lecun et al. 1998; Wang et al. 2018a) to classify 4800 thin-section images of feldspar sandstone from the Ordos Basin, China, into three types according to grain size. Each rock type had 1600 images, of which 1200 were used to train the CNN model and the remaining 400 for validation. An accuracy of 98.5% was obtained on the held-out data (Cheng and Guo 2017). Ran et al. also applied a CNN model to classify rock types from field image patches. A total of 2290 images belonging to six rock types (sandstone, limestone, mylonite, conglomerate, granite, and shale) were collected to demonstrate their classification method, and the overall classification accuracy reached 97.96% (Ran et al. 2019). Jobe et al. applied a CNN model to predict carbonate Dunham textures: a large number of images were first labeled and then used to train the CNN model to predict textures in unlabeled images. The method achieved 83% accuracy on the test dataset (Jobe et al. 2018). To recognize rock lithology in the target survey field, Fan et al. developed a model combining two lightweight CNNs, SqueezeNet and MobileNets, which can identify 28 kinds of rocks; the model can be installed on a smartphone and achieved an accuracy of over 96% (Fan et al. 2020). Zhou et al. proposed a novel CNN called RockNet for automatic rock classification. Seven common Hong Kong rock types, namely fine-grained granite, medium-grained granite, coarse-grained granite, coarse ash tuff, fine ash tuff, feldspar-phyric rhyolite, and granodiorite, were used to validate RockNet, which achieved an impressive precision (90.9%), recall (90.4%), and F1-score (90.5%) (Zhou et al. 2022).

Many publications show that fine-tuning, i.e., training that starts from the parameters of an existing model, may outperform or perform as well as training from scratch. Polat et al. classified thin sections of volcanic rocks using transfer learning networks based on DenseNet121 and ResNet50; 1200 thin sections were used for training and testing, and an accuracy of more than 98.8% was obtained (Polat et al. 2021). Zhang et al. used 2206 grayscale and color images belonging to 12 types to recognize geological structures with an Inception-v3 model; a CNN model trained using transfer learning achieved a success rate of more than 90% (ye et al. 2018). In addition, the VGG16 model (Simonyan and Zisserman 2014) has been used with a transfer learning strategy to identify sandstone, granite, limestone, basalt, peridotite, sulfate, gneiss, and tuff, with an accuracy of 95.16%.

Learning-based PRRT is becoming the mainstream image recognition approach because it selects classification features automatically. The computational cost of CNN methods is high, but this limitation has been largely relieved by the transfer learning strategy. According to the current literature, most learning-based PRRT methods report accuracies above 95%.

Texture segmentation-related rock typing

Texture segmentation-related rock typing (TSRT) can be treated as an extension of the pattern recognition-related rock typing (PRRT) discussed in the previous section. In PRRT, each image is identified as a single rock type, whereas TSRT partitions a single image into different areas, each presenting a specific rock type. TSRT is more challenging than PRRT because it requires not only identifying different rock types but also locating their spatial distribution. TSRT is a typical texture segmentation problem, which has been widely studied in many fields such as object identification in aerial images and biomedical image segmentation (Cavalin and Oliveira 2017; Lashari and Ibrahim 2013; Mushrif et al. 2006; Sali and Wolfson 1992; Shaban and Dikshit 1998; Willis et al. 2017; Xu et al. 2022; Zhang et al. 2017).

Pixel-based TSRT

The conventional way to perform texture segmentation is to identify the rock type of each pixel successively until the entire image has been processed. Depending on the classification method, pixel-based rock typing can be further divided into local feature-based rock typing and phase field-based rock typing.

Local feature-based TSRT

In this strategy, each pixel is treated as a specimen, and a number of structure descriptors are calculated from a neighborhood centered on that pixel. The goal is a set of descriptors that minimizes the contrast within the same rock type while maximizing the contrast between different rock types.
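A minimal way to build such neighbourhood descriptors from a segmented (pore = 1, solid = 0) image is to average simple per-pixel quantities over square windows of several sizes. In the sketch below, the two descriptors (local porosity and local pore/solid interface density, a rough 2D proxy for specific surface) and the window sizes are our illustrative choices, not descriptors prescribed by the studies reviewed here.

```python
import numpy as np
from scipy import ndimage


def local_window_features(pore: np.ndarray, windows=(5, 11, 21)) -> np.ndarray:
    """Per-pixel descriptors averaged over square windows centred on each pixel."""
    img = pore.astype(float)
    # Interface map: pore/solid transitions to the right of and below each pixel
    interface = np.zeros_like(img)
    interface[:, :-1] += img[:, :-1] != img[:, 1:]
    interface[:-1, :] += img[:-1, :] != img[1:, :]
    feats = []
    for w in windows:
        feats.append(ndimage.uniform_filter(img, size=w, mode="reflect"))        # local porosity
        feats.append(ndimage.uniform_filter(interface, size=w, mode="reflect"))  # interface density
    return np.stack(feats, axis=-1).reshape(-1, len(feats))   # shape (H*W, 2*len(windows))
```

Using several window sizes lets the classifier weigh fine-scale against coarse-scale context, which relates to the window-size sensitivity discussed later in this section.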

Ismail et al. applied regional Minkowski functionals as structure descriptors and a multivariate Gaussian mixture model (MGMM) as the classifier to perform sandstone pore structure rock typing (Ismail et al. 2013). The Minkowski functionals consist of volume (m0X), surface area (m1X), mean curvature (m2X), and total curvature (m3X), which are commonly treated as basic integral-geometric measures for quantitatively describing porous media (Arns et al. 2010, 2004, 2001; Wang et al. 2018b). The first functional, m0X, is simply the volume fraction of the target phase, given by:

$${m}_{0X}={M}_{0}\left(X\right)/{V}_{t}=V\left(X\right)/{V}_{t}$$
(1)

where X ⊂ Ω (Ω is the embedding space) is the space occupied by the target phase, and Vt is the total volume of the analyzed domain. The other Minkowski functionals are defined by integrals over the surface of the pores (denoted δX), which characterize the morphology of the pore structure at the given resolution. The first of these integrals measures the surface area of the target phase:

$${m}_{1X}={M}_{1}\left(X\right)/{V}_{t}=\frac{1}{6}{\int }_{\delta X}{\text{d}}s/{V}_{t}$$
(2)

where ds denotes a surface element. The second integral measures the mean curvature of the interface:

$${m}_{2X}={M}_{2}\left(X\right)/{V}_{t}=\frac{1}{3\pi }{\int }_{\delta X}\left(\frac{1}{{r}_{1}}+\frac{1}{{r}_{2}}\right){\text{d}}s/{V}_{t}$$
(3)

where r1 and r2 are the minimum and maximum principal radii of curvature of the surface element ds. A radius is positive for convex curvature and negative for concave curvature. The third integral measures the total (Gaussian) curvature:

$${m}_{3X}={M}_{3}\left(X\right)/{V}_{t}={\int }_{\delta X}\left(\frac{1}{{r}_{1}{r}_{2}}\right){\text{d}}s/{V}_{t}$$
(4)

which is related to the connectivity of the considered phase. For well-connected phases and few isolated components this measure is typically negative, and crosses zero to become positive close to the percolation threshold of the material.
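In 2D the analogous Minkowski functionals are area, perimeter, and Euler characteristic, and the last plays the role of m3X as a connectivity measure (in 3D, the integral of Gaussian curvature is proportional to the Euler characteristic by the Gauss–Bonnet theorem). The sketch below computes the 2D quantities for a binary image by treating foreground pixels as closed unit squares and counting vertices, edges, and faces (χ = V − E + F, which implies 8-connectivity for the foreground). It omits the normalization by Vt and the constants of Eqs. (1) to (4), and is a 2D illustration rather than the 3D regional estimator used by Ismail et al.

```python
import numpy as np


def minkowski_2d(binary: np.ndarray):
    """Return (area, perimeter, Euler characteristic) of the foreground of a 2-D binary image."""
    b = np.pad(np.asarray(binary, dtype=bool), 1)      # zero border so every square is closed
    faces = int(b.sum())                               # one face per foreground pixel
    # A lattice vertex exists if any of the four pixels sharing it is foreground
    vertices = int((b[:-1, :-1] | b[:-1, 1:] | b[1:, :-1] | b[1:, 1:]).sum())
    # A unit edge exists if either pixel on its two sides is foreground
    edges = int((b[:-1, :] | b[1:, :]).sum() + (b[:, :-1] | b[:, 1:]).sum())
    # Perimeter: number of unit edges separating foreground from background
    perimeter = int((b[:-1, :] ^ b[1:, :]).sum() + (b[:, :-1] ^ b[:, 1:]).sum())
    return faces, perimeter, vertices - edges + faces
```

For example, a single pixel gives (1, 4, 1), and a 3 × 3 block with its centre removed gives (8, 16, 0): one component minus one hole.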

An MGMM is then used to perform rock typing based on the extracted Minkowski functionals. An MGMM is a weighted sum of M Gaussian component densities:

$$p\left({\varvec{x}}|\lambda \right)=\sum_{i=1}^{M}{\omega }_{i} g\left({\varvec{x}}|{{\varvec{\mu}}}_{i},{{\varvec{\Sigma}}}_{i}\right)$$
(5)

where x is the D-dimensional feature vector of a given pixel; \({\omega }_{i}\) is the mixture weight of the ith (i = 1, …, M) Gaussian component, with \({\sum }_{i=1}^{M}{\omega }_{i}=1\); \({{\varvec{\mu}}}_{i}\) and \({{\varvec{\Sigma}}}_{i}\) are the mean vector and the covariance matrix of the ith Gaussian component, respectively; \(\lambda =\{{\omega }_{i},{{\varvec{\mu}}}_{i},{{\varvec{\Sigma}}}_{i}\}\) denotes the complete parameter set; and M is the total number of Gaussian components. Each Gaussian component is given by:

$$g\left({\varvec{x}}|{{\varvec{\mu}}}_{i},{{\varvec{\Sigma}}}_{i}\right)=\frac{1}{{(2\pi )}^{D/2}{\left|{{\varvec{\Sigma}}}_{i}\right|}^{1/2}}{\text{exp}}\left\{-\frac{1}{2}{({\varvec{x}}-{{\varvec{\mu}}}_{i})}^{T}{{{\varvec{\Sigma}}}_{i}}^{-1}({\varvec{x}}-{{\varvec{\mu}}}_{i})\right\}$$
(6)

To reduce the computational cost, only 10% of the pixels are randomly selected to train the GMM, and the trained model is then applied to classify all other pixels. Given a training dataset organized as a T × N matrix X (T is the number of training pixels and N is the number of features, so N = D), a GMM is trained by estimating the parameter \(\lambda\) that maximizes the likelihood of the training set, assuming the training vectors are independent:

$$p\left({\varvec{X}}|\lambda \right)=\prod_{t=1}^{T}p\left({{\varvec{x}}}_{t}|\lambda \right)$$
(7)

The expectation–maximization (EM) algorithm can then be used to estimate the parameter \(\lambda\). For more details about the EM algorithm, see (Dempster et al. 1977).
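The train-on-a-subset, label-everything procedure can be sketched with scikit-learn's `GaussianMixture`, whose `fit` runs EM to estimate the weights, means, and full covariances (the λ of Eq. 5). The input is assumed to be a per-pixel feature matrix of shape (H·W, N), for example regional Minkowski functionals; the default 10% training fraction follows the text, while the seed and the `covariance_type="full"` choice are our assumptions.

```python
import numpy as np
from sklearn.mixture import GaussianMixture


def gmm_rock_typing(features: np.ndarray, n_types: int, train_fraction: float = 0.1, seed: int = 0):
    """Fit a full-covariance GMM by EM on a random pixel subset, then label every pixel."""
    rng = np.random.default_rng(seed)
    n_pixels = features.shape[0]
    n_train = max(n_types, int(train_fraction * n_pixels))
    train_idx = rng.choice(n_pixels, size=n_train, replace=False)   # e.g. 10% of the pixels
    gmm = GaussianMixture(n_components=n_types, covariance_type="full", random_state=seed)
    gmm.fit(features[train_idx])          # EM estimates weights_, means_, covariances_
    return gmm.predict(features), gmm     # most probable component for every pixel
```

Reshaping the returned labels to (H, W) gives the rock-type map. Note that the component indices are arbitrary, so each component still has to be matched to a geological rock type afterwards.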

However, the Minkowski functionals are sensitive to the spatial support (window size). A larger support is preferred because too small a support increases the variability of the descriptors within a single rock type; however, a large regional window is computationally expensive. To calculate regional measures over relatively large windows, Jiang and Arns introduced a statistical strategy that accelerates the calculation of the Minkowski functionals via Fast Fourier Transforms (FFTs) (Jiang and Arns 2020).
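The reason FFTs help is that a windowed (regional) average of any per-pixel quantity, such as pore indicator values or per-pixel contributions to an additive functional, is a convolution with a box kernel, and an FFT-based convolution costs roughly the same regardless of window size, whereas direct summation grows with the window area. The sketch below shows only this general idea with SciPy; it is not Jiang and Arns's statistical strategy.

```python
import numpy as np
from scipy.signal import fftconvolve


def windowed_mean_fft(field: np.ndarray, window: int) -> np.ndarray:
    """Mean of `field` over a window x window square centred on each pixel, via FFT."""
    kernel = np.ones((window, window)) / window**2
    # mode="same" keeps the input shape; pixels near the border see zero padding
    return fftconvolve(field.astype(float), kernel, mode="same")
```

With an odd window size, the value at an interior pixel equals the plain mean of the surrounding window, so the result matches direct summation up to floating-point rounding.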

Wang and Sun studied the permeability contribution of different micropore structures, using a heterogeneous North Sea oil reservoir sandstone as a demonstration (Wang and Sun 2021c). The pore structure in the SEM images is characterized as a macropore system (consisting of grains and macropores) and a micropore system. In that study, three categories of image features were extracted from the target images: noise filters (e.g., Gaussian blur (Misra and Wu 2020)), edge detectors (e.g., Laplacian of Gaussian (Sotak and Boyer 1989), Difference of Gaussians (Young 1987), and Gaussian Gradient Magnitude (Acton 2009)), and texture detectors (e.g., Hessian of Gaussian Eigenvalues (Arganda-Carreras et al. 2017; Sommer et al. 2011) and Structure Tensor Eigenvalues (Sertcelik and Kafadar 2012)). The random forest method was then used as a classifier to complete the rock typing (see Fig. 4).
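A workflow of this kind (a multiscale filter bank per pixel, a random forest trained on a few user-annotated pixels, then applied to every pixel) can be sketched with SciPy and scikit-learn. The scales, the DoG scale ratio of 1.6, and the number of trees are our assumptions, structure-tensor eigenvalues are omitted for brevity, and this is not the authors' implementation. Note also that scikit-learn's random forest predicts the class with the highest average probability across trees (a soft vote), a slight variant of the hard majority vote described below.

```python
import numpy as np
from scipy import ndimage
from sklearn.ensemble import RandomForestClassifier


def filter_bank(gray: np.ndarray, sigmas=(1.0, 2.0, 4.0)) -> np.ndarray:
    """Per-pixel features: raw intensity plus six Gaussian-derivative filters per scale."""
    g = gray.astype(float)
    feats = [g]
    for s in sigmas:
        blur = ndimage.gaussian_filter(g, s)
        h_rr = ndimage.gaussian_filter(g, s, order=(2, 0))   # d2/drow2
        h_cc = ndimage.gaussian_filter(g, s, order=(0, 2))   # d2/dcol2
        h_rc = ndimage.gaussian_filter(g, s, order=(1, 1))   # mixed derivative
        root = np.sqrt(((h_rr - h_cc) / 2.0) ** 2 + h_rc**2)
        feats += [
            blur,                                            # Gaussian blur (noise filter)
            ndimage.gaussian_laplace(g, s),                  # Laplacian of Gaussian
            blur - ndimage.gaussian_filter(g, 1.6 * s),      # Difference of Gaussians
            ndimage.gaussian_gradient_magnitude(g, s),       # Gaussian gradient magnitude
            (h_rr + h_cc) / 2.0 + root,                      # larger Hessian eigenvalue
            (h_rr + h_cc) / 2.0 - root,                      # smaller Hessian eigenvalue
        ]
    stack = np.stack(feats, axis=-1)
    return stack.reshape(-1, stack.shape[-1])


def segment_with_random_forest(gray, sparse_labels, n_trees=100, seed=0):
    """sparse_labels: 0 = unlabelled pixel, 1..K = rock type annotated by the user."""
    X = filter_bank(gray)
    y = sparse_labels.ravel()
    annotated = y > 0
    rf = RandomForestClassifier(n_estimators=n_trees, random_state=seed)
    rf.fit(X[annotated], y[annotated])       # each tree is grown on a bootstrap sample
    return rf.predict(X).reshape(gray.shape)
```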

Fig. 4

An example of the local feature-based TSRT where G denotes grain, EF denotes eroded feldspar, C denotes clay, RF denotes rock fragment, Pma denotes macropore, and Pmi denotes micropore. a is an SEM image of a sandstone, b presents the rock typing result of (a), c to e present the local details of the EF, RF and C extracted from (a)

A random forest is an ensemble of a large number of decision trees. When a new sample is presented, every tree in the forest classifies it independently, and the trees' outputs are used to vote on the final class of the sample. To introduce the random forest algorithm, it is necessary to briefly introduce its fundamental element, the decision tree.

A decision tree has the structure of a binary or multi-branch tree. Each non-leaf node represents a test on a given feature, the branches of that node correspond to the outcomes of the test, and each leaf node stores a class label. The key to building a decision tree is choosing, at each level, the feature that reduces the uncertainty of the node as much as possible. Assume that we have a training dataset D with m samples (e.g., the structural feature vector of each pixel in image-based rock typing), n features (e.g., porosity and average curvature), and K classes (e.g., eroded feldspar, rock fragment, and clay). Any sample si (\(1\le i\le m\)) can be written as \((\vec{x}, y) = (x_{1}, x_{2}, \ldots, x_{n}, y)\), where \({x}_{j}\) (\(1\le j\le n\)) is the jth feature of sample si and y is its class label. If the frequency of the kth (\(1\le k\le K\)) class in D is denoted by pk, a popular measure of uncertainty is the information entropy, given by (Zhou 2020):

$${\text{Ent}}\left(D\right)=-\sum_{k=1}^{K}{p}_{k}{\text{log}}\left({p}_{k}\right)$$
(8)
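Eq. (8) translates directly into a few lines of Python. The logarithm base is not fixed in the text, so this sketch assumes base 2 (entropy in bits). The labels are hypothetical pixel classes.

```python
import math
from collections import Counter


def entropy(labels, base=2.0):
    """Ent(D) = -sum_k p_k * log(p_k), Eq. (8), for a sequence of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    ent = 0.0
    for count in Counter(labels).values():
        p_k = count / n                  # frequency of class k in D
        ent -= p_k * math.log(p_k, base)
    return ent


# Hypothetical pixel labels with two rock types in equal proportion
print(entropy(["clay", "clay", "feldspar", "feldspar"]))
```

An evenly mixed two-class set has the maximum entropy for two classes (1 bit). A pure set, in which every sample has the same label, has zero entropy.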

Let \({D}_{f}(v)\) denote the subset of the training dataset D whose samples take the value v for feature f, where f is one of the features used to classify a pixel into a specific rock type. The conditional entropy \(\text{Ent}\left(D, f\right)\) is then given by (Zhou 2020):

$${\text{Ent}}\left(D, f\right)=\sum_{v\in {\text{vals}}(f)}p({D}_{f}(v))\,{\text{Ent}}\left({D}_{f}(v)\right)$$
(9)

where \(p({D}_{f}(v))\) denotes the fraction of samples in D whose feature f equals v. Equation (9) applies when f is a discrete variable. If f is a continuous variable, a threshold t is introduced to divide D into two subsets, \({D}_{t}^{+}\) and \({D}_{t}^{-}\). Here \({D}_{t}^{+}\) consists of all samples satisfying \(f>t\), and \({D}_{t}^{-}=D\setminus {D}_{t}^{+}\). The conditional entropy is then given by (Zhou 2020):

$${\text{Ent}}\left(D, f\right)=\underset{t}{{\text{min}}}\,{\text{Ent}}\left(D, f, t\right)=\underset{t}{{\text{min}}}\sum_{\lambda \in \left\{-,+\right\}}p({D}_{t}^{\lambda })\,{\text{Ent}}\left({D}_{t}^{\lambda }\right)$$
(10)

The threshold t is found with a greedy strategy. First, all observed values of f in the training dataset are sorted in ascending order. Next, the midpoint of each pair of adjacent values is tried as the threshold t, and the conditional entropy is calculated for each candidate. The t that gives the minimum \(\text{Ent}\left(D, f\right)\) is selected (Quinlan 1993). The information gain obtained by using feature f to split D is then calculated as:

$${\text{Gain}}\left(D, f\right)={\text{Ent}}(D)-{\text{Ent}}\left(D, f\right)$$
(11)
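The greedy threshold search of Eq. (10) and the information gain of Eq. (11) can be sketched as follows. This assumes base-2 entropy. The porosity values and labels are hypothetical and serve only to illustrate the computation.

```python
import math
from collections import Counter


def entropy(labels):
    """Ent(D), Eq. (8), with a base-2 logarithm."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values()) if n else 0.0


def best_threshold(values, labels):
    """Greedy search of Eq. (10): try midpoints of adjacent sorted distinct values.

    Returns (t, Ent(D, f)); t is None when the feature takes a single value.
    """
    n = len(labels)
    distinct = sorted(set(values))
    best_t, best_ent = None, math.inf
    for lo, hi in zip(distinct, distinct[1:]):
        t = 0.5 * (lo + hi)
        lower = [y for x, y in zip(values, labels) if x <= t]   # D_t^-
        upper = [y for x, y in zip(values, labels) if x > t]    # D_t^+
        ent = len(lower) / n * entropy(lower) + len(upper) / n * entropy(upper)
        if ent < best_ent:
            best_t, best_ent = t, ent
    if best_t is None:                       # no split possible
        return None, entropy(labels)
    return best_t, best_ent


def information_gain(values, labels):
    """Gain(D, f) = Ent(D) - Ent(D, f), Eq. (11), for a continuous feature f."""
    t, cond_ent = best_threshold(values, labels)
    return t, entropy(labels) - cond_ent


# Hypothetical local porosity of six pixels and their rock-type labels
porosity = [0.05, 0.08, 0.10, 0.30, 0.35, 0.40]
labels = ["clay", "clay", "clay", "macropore", "macropore", "macropore"]
t, gain = information_gain(porosity, labels)
```

In this example t = 0.2 separates the two classes perfectly. The gain therefore equals the full entropy Ent(D) = 1 bit.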

Information gain represents the reduction in the uncertainty of D once the value of feature f is known. When building the decision tree, the feature with the largest information gain is preferred. The strategy described above is the so-called ID3 algorithm (Quinlan 1986). However, information gain as a selection criterion has a drawback: it is biased toward features that take many distinct values. The C4.5 algorithm addresses this by replacing the information gain with the information gain ratio (Quinlan 1993), which is defined as:

$${\text{Gain\_ratio}}\left(D, f\right)=\frac{{\text{Gain}}\left(D, f\right)}{{\text{IV}}\left(f\right)},$$

where

$${\text{IV}}\left(f\right)=-\sum_{v\in {\text{vals}}(f)}\frac{\left|{D}_{f}(v)\right|}{\left|D\right|}{\text{log}}\frac{\left|{D}_{f}(v)\right|}{\left|D\right|}$$
(12)

However, the information gain ratio is in turn biased toward features that take few distinct values. In practice, therefore, the features whose information gain is above average are selected first, and the one with the largest information gain ratio is chosen from among them.
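A minimal sketch of Eq. (9), Eq. (12), and the C4.5 selection heuristic for discrete features, assuming base-2 logarithms. The `>=` comparison with the average gain is our choice: it keeps the candidate set non-empty when all gains are equal. The feature names and values are hypothetical.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Ent(D), Eq. (8), base-2 logarithm."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values()) if n else 0.0


def conditional_entropy(column, labels):
    """Ent(D, f), Eq. (9), for a discrete feature column."""
    groups = defaultdict(list)
    for v, y in zip(column, labels):
        groups[v].append(y)                    # build D_f(v)
    n = len(labels)
    return sum(len(g) / n * entropy(g) for g in groups.values())


def gain_ratio(column, labels):
    """Gain_ratio(D, f); IV(f), Eq. (12), is the entropy of f's own values."""
    gain = entropy(labels) - conditional_entropy(column, labels)
    iv = entropy(list(column))
    return gain / iv if iv > 0 else 0.0


def select_feature_c45(columns, labels):
    """Among features with at least average information gain, pick the one
    with the largest gain ratio."""
    gains = {f: entropy(labels) - conditional_entropy(col, labels)
             for f, col in columns.items()}
    average = sum(gains.values()) / len(gains)
    candidates = [f for f, g in gains.items() if g >= average]
    return max(candidates, key=lambda f: gain_ratio(columns[f], labels)), gains


# Hypothetical discrete features of six pixels
labels = ["clay", "clay", "feldspar", "feldspar", "feldspar", "clay"]
columns = {
    "pixel_id": [0, 1, 2, 3, 4, 5],          # unique per pixel: maximal gain, large IV
    "texture": ["smooth", "smooth", "rough", "rough", "rough", "smooth"],
    "grey_level": ["a", "b", "a", "b", "a", "b"],
}
best, gains = select_feature_c45(columns, labels)
```

Here `pixel_id` and `texture` have the same gain (1 bit). The gain ratio picks `texture` because the IV of `pixel_id` is log2(6). This illustrates the many-valued bias that the gain ratio corrects.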

Besides information entropy, Gini impurity is another widely used measure of the impurity of a dataset. The Gini index is the uncertainty measure used in classification and regression trees (CART) and is closely related to information entropy (Breiman et al. 1984). Replace \(-\log(p_k)\) in Eq. (8) with \(1-p_k\), which is its first-order Taylor expansion at \(p_k=1\) (natural logarithm). Since \(\sum_k p_k(1-p_k)=1-\sum_k p_k^2\), this gives the Gini index:

$${\text{Gini}}\left(D\right)=1-\sum_{k=1}^{K}{{p}_{k}}^{2}$$
(13)

Analogous to the conditional entropy, the conditional Gini index is given by:

$${\text{Gini}}\left(D, f\right)=\sum_{v\in {\text{vals}}(f)}p({D}_{f}(v))\,{\text{Gini}}\left({D}_{f}(v)\right)$$
(14)
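Eqs. (13) and (14) can be sketched in Python as follows. In CART, the feature (or split) with the smallest conditional Gini index is preferred. The labels and texture values are hypothetical.

```python
from collections import Counter, defaultdict


def gini(labels):
    """Gini(D) = 1 - sum_k p_k^2, Eq. (13)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def conditional_gini(column, labels):
    """Gini(D, f) = sum_v p(D_f(v)) * Gini(D_f(v)), Eq. (14)."""
    groups = defaultdict(list)
    for v, y in zip(column, labels):
        groups[v].append(y)                   # build D_f(v)
    n = len(labels)
    return sum(len(g) / n * gini(g) for g in groups.values())


labels = ["clay", "clay", "feldspar", "feldspar"]
texture = ["smooth", "smooth", "rough", "rough"]
print(gini(labels))                       # 0.5
print(conditional_gini(texture, labels))  # 0.0: texture separates the classes perfectly
```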

The algorithm repeatedly partitions the data into smaller and smaller subsets until one of the following conditions is reached: (1) all samples in the current subset belong to the same class; (2) all samples in the current subset have identical feature values, or no features remain to be tested; (3) the current subset contains no samples. In situation (1), the current node becomes a leaf node labeled with that class. In situation (2), it becomes a leaf node labeled with the majority class of its samples. In situation (3), it becomes a leaf node labeled with the majority class of its parent node.
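A compact ID3-style sketch of this recursive partitioning on discrete features, with the three stopping conditions marked in comments. The `domains` argument lists every possible value of each feature, so a branch can receive an empty subset and trigger condition (3). The tuple node format and the use of each node's majority class as a fallback in `predict` are our own design choices. The data are hypothetical.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values()) if n else 0.0


def majority(labels):
    return Counter(labels).most_common(1)[0][0]


def build_tree(rows, labels, features, domains, parent_majority=None):
    """rows: list of dicts {feature: value}; domains: {feature: all possible values}.
    Returns a class label (leaf) or a tuple (feature, {value: subtree}, majority)."""
    if not labels:                                   # condition (3): empty subset
        return parent_majority
    if len(set(labels)) == 1:                        # condition (1): pure subset
        return labels[0]
    if not features or all(r[f] == rows[0][f] for r in rows for f in features):
        return majority(labels)                      # condition (2)

    def gain(f):                                     # Eq. (11) using Eq. (9)
        groups = defaultdict(list)
        for r, y in zip(rows, labels):
            groups[r[f]].append(y)
        cond = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
        return entropy(labels) - cond

    best = max(features, key=gain)
    here = majority(labels)
    rest = [f for f in features if f != best]
    children = {}
    for v in domains[best]:                          # one branch per possible value
        idx = [i for i, r in enumerate(rows) if r[best] == v]
        children[v] = build_tree([rows[i] for i in idx], [labels[i] for i in idx],
                                 rest, domains, here)
    return (best, children, here)


def predict(tree, row):
    while isinstance(tree, tuple):
        feature, children, default = tree
        tree = children.get(row[feature], default)
    return tree


rows = [{"texture": "smooth", "size": "fine"}, {"texture": "smooth", "size": "coarse"},
        {"texture": "smooth", "size": "fine"}, {"texture": "rough", "size": "coarse"}]
labels = ["clay", "clay", "clay", "feldspar"]
domains = {"texture": ["smooth", "rough", "banded"], "size": ["fine", "coarse"]}
tree = build_tree(rows, labels, ["texture", "size"], domains)
```

No training sample has the value `"banded"`, so that branch is empty and takes the root's majority class (`"clay"`), as condition (3) prescribes.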

Because the predictive power of a single tree is limited, the decision tree algorithm was extended to the random forest algorithm by building a 'forest' of many trees (Tin Kam 1995). The final decision is made by considering the outputs of all trees rather than that of a single tree. As a supervised machine learning algorithm, the random forest method is carried out in two steps: training and prediction. The main procedures of the random forest algorithm are given by Breiman (2001).
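The sketch below shows the two sources of randomness in a forest, bootstrap sampling of the training set and random feature subsets, together with majority voting. For brevity it uses depth-one CART trees (decision stumps) split by the Gini index, whereas a real random forest grows full trees and samples features at every node. Treat it as an illustration only, not Breiman's full algorithm. The data are synthetic.

```python
import numpy as np
from collections import Counter


def gini(y):
    """Gini(D), Eq. (13), of a label array."""
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)


def majority(y):
    return Counter(y.tolist()).most_common(1)[0][0]


def fit_stump(X, y, feature_ids):
    """Depth-one CART tree: best (feature, threshold) by weighted Gini."""
    best_score, best = np.inf, None
    for j in feature_ids:
        vals = np.unique(X[:, j])
        for t in (vals[:-1] + vals[1:]) / 2:                 # midpoint thresholds
            left, right = y[X[:, j] <= t], y[X[:, j] > t]
            score = (left.size * gini(left) + right.size * gini(right)) / y.size
            if score < best_score:
                best_score, best = score, (j, t, majority(left), majority(right))
    if best is None:                                          # no split: constant leaf
        label = majority(y)
        return (0, np.inf, label, label)
    return best


def fit_forest(X, y, n_trees=15, n_sub=None, seed=0):
    """Bootstrap samples plus random feature subsets, one stump per sample."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    n_sub = n_sub or max(1, int(np.sqrt(d)))
    forest = []
    for _ in range(n_trees):
        rows = rng.integers(0, n, size=n)                     # bootstrap sample
        feats = rng.choice(d, size=n_sub, replace=False)      # random feature subset
        forest.append(fit_stump(X[rows], y[rows], feats))
    return forest


def predict_forest(forest, X):
    """Each tree votes; the majority class is the forest's output."""
    votes = np.array([[l if x[j] <= t else r for (j, t, l, r) in forest] for x in X])
    return np.array([majority(v) for v in votes])


# Synthetic example: 3 features, only feature 0 is informative
rng = np.random.default_rng(1)
X = rng.normal(size=(120, 3))
y = (X[:, 0] > 0).astype(int)
forest = fit_forest(X, y, n_trees=15, n_sub=3)
accuracy = np.mean(predict_forest(forest, X) == y)
```

By default, `n_sub` is about the square root of the number of features, a common heuristic. The example passes all three features so that every stump can find the informative one and the result is easy to check.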

Because it is impractical to prepare a ground-truth reference image for validating the proposed rock typing method, its performance was validated by visual sensitivity analysis. Fig. 4 shows that the method identifies the macropore system well, as well as the micropore systems that occur in eroded feldspar, clay, and rock fragments. Based on the rock typing results, flow simulation was carried out to estimate the permeability contribution of each rock type. The study concluded that the permeability contribution of a micropore structure in a multiscale porous medium can vary from 0 to 100%, depending strongly on the content of the micropore medium and the permeability of the macropore structure. In the demonstrated sample, the permeability contributions of the micropore structures of eroded feldspar, rock fragment, and clay are 1.38%, 0.37%, and 2.64%, respectively, and that of all micropore structures together is 3.1%. The permeability contribution of the micropore structures in this sample is therefore negligible. The workflow presented in that study provides an effective way to analyze the genetic classification of various pore structures. The genetic classification of rock types was also discussed by Rojas et al., whose study was based on core description, MICP, and log data rather than images (Rojas et al. 2020).

Phase field-based TSRT

An obstacle of local feature-based TSRT is that it is difficult to locate the boundary between different rock types accurately. When the local features of pixels/voxels close to a boundary are calculated, the local image patch unavoidably contains part of the pore structure belonging to the adjacent rock type. Wang et al. introduced an approach for image-based rock typing based on the Chan-Vese model (Wang et al. 2021), whose energy functional is:

$$F\left({c}_{1},{c}_{2},C\right)=\mu \cdot {\text{Length}}\left(C\right)+{\lambda }_{1}{\int }_{{\Omega }_{1}}{\left|{u}_{0}\left(x,y\right)-{c}_{1}\right|}^{2}{\text{d}}x{\text{d}}y+{\lambda }_{2}{\int }_{{\Omega }_{2}}{\left|{u}_{0}\left(x,y\right)-{c}_{2}\right|}^{2}{\text{d}}x{\text{d}}y$$
(15)

where C is a contour that divides the image domain \(\Omega\) into two regions, \({\Omega }_{1}\) and \({\Omega }_{2}=\Omega \setminus {\Omega }_{1}\). \({\Omega }_{1}\) is the objective region (inside C), and \({\Omega }_{2}\) is the background region (outside C). \(\mu \ge 0\), \({\lambda }_{1}\ge 0\), and \({\lambda }_{2}\ge 0\) are fixed parameters; in that study, \({\lambda }_{1}={\lambda }_{2}=1\). \({u}_{0}\left(x,y\right)\) is the intensity value of the pixel located at (x, y), and \(c_1\) and \(c_2\) are the average image intensities inside and outside the contour C, respectively.
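To make the terms of Eq. (15) concrete, the sketch below evaluates a discrete version of the energy for a partition given as a boolean mask. The contour length is approximated by counting horizontal and vertical neighbour pairs with different labels. This grid approximation is our assumption and is not necessarily how Wang et al. discretize it. The value of `mu` is illustrative.

```python
import numpy as np


def chan_vese_energy(u0, inside, mu=0.1, lam1=1.0, lam2=1.0):
    """Discrete Eq. (15) for a boolean mask (True = Omega_1, inside C).

    Both regions must be non-empty. Length(C) is approximated by the number of
    horizontal and vertical neighbour pairs that carry different labels.
    """
    u0 = np.asarray(u0, dtype=float)
    inside = np.asarray(inside, dtype=bool)
    c1 = u0[inside].mean()                 # average intensity inside C
    c2 = u0[~inside].mean()                # average intensity outside C
    length = (np.count_nonzero(inside[1:, :] != inside[:-1, :])
              + np.count_nonzero(inside[:, 1:] != inside[:, :-1]))
    fit_inside = np.sum((u0[inside] - c1) ** 2)
    fit_outside = np.sum((u0[~inside] - c2) ** 2)
    return mu * length + lam1 * fit_inside + lam2 * fit_outside


# Synthetic 8 x 8 image: left half intensity 0.2, right half 0.8
u0 = np.full((8, 8), 0.2)
u0[:, 4:] = 0.8
right_half = np.zeros((8, 8), dtype=bool)
right_half[:, 4:] = True
top_half = np.zeros((8, 8), dtype=bool)
top_half[:4, :] = True
e_true = chan_vese_energy(u0, right_half)    # contour on the true boundary
e_wrong = chan_vese_energy(u0, top_half)     # contour cutting across both regions
```

Both contours have the same length (8 pixel edges). The wrong one, however, mixes intensities within each region, so its fitting terms make its energy much larger.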

The minimization problem of Eq. (15) can be formulated and solved with the level set method (Osher and Sethian 1988) by defining a level set function \(\phi \left(x,y\right)\). The key idea of the level set method is to represent a contour implicitly as the zero level set of a higher-dimensional function (the level set function). The evolution of the contour is then formulated through the evolution of that function (Estellers et al. 2012). The contour C is the zero level set of \(\phi \left(x,y\right)\), i.e., \(C=\left\{\left(x,y\right):\phi \left(x,y\right)=0\right\}\). Because the optimal segmentation is obtained by evolving C, the level set function is made time-dependent, \(\phi (t,x,y)\), by introducing an artificial time variable t. Usually, \(\phi (t,x,y)\) is defined as the signed minimum Euclidean distance from every point (x, y) to the contour C, with \(\phi >0\) if (x, y) belongs to \({\Omega }_{1}\) and \(\phi <0\) if (x, y) belongs to \({\Omega }_{2}\).
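A signed-distance level set function with the sign convention above can be initialized from a binary mask using Euclidean distance transforms. This is a minimal sketch. The 0.5-pixel offset is our choice: it places the zero level set midway between neighbouring pixels of opposite label.

```python
import numpy as np
from scipy import ndimage as ndi


def signed_distance(inside):
    """Initial level set function phi for a boolean mask (True = Omega_1).

    phi > 0 inside C and phi < 0 outside. Both regions must be non-empty.
    """
    inside = np.asarray(inside, dtype=bool)
    d_in = ndi.distance_transform_edt(inside)    # inside pixels: distance to nearest outside pixel
    d_out = ndi.distance_transform_edt(~inside)  # outside pixels: distance to nearest inside pixel
    return np.where(inside, d_in - 0.5, -(d_out - 0.5))


mask = np.zeros((20, 20), dtype=bool)
mask[5:15, 5:15] = True          # a square region Omega_1
phi = signed_distance(mask)
```

The sign of `phi` reproduces the mask exactly. The centre pixel (9, 9) lies 5 pixels from the nearest outside pixel, so its value is 4.5.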

Because the Chan-Vese model does not rely on image gradients for segmentation (Chan and Vese 2001), phase field-based rock typing can substantially reduce the ambiguity in the rock types assigned to pixels/voxels close to boundaries. The method is carried out in two steps, filtering and segmentation. First, the segmented image, which has two phases (pore and solid), is processed successively by a local homogeneity filter (LHF) and an average filter (AF). The local homogeneity filter increases the structural contrast between different rock types, and the average filter weakens the structural contrast within each rock type. The filtered image is then segmented with the Chan-Vese model to complete the rock typing (see Fig. 5). Extending the method to multiphase rock typing currently remains challenging. The method was validated on two synthetic images by calculating the Hamming distance between each reference image and the corresponding processed image. The Hamming distances range from 0.0017 to 0.0034 and depend strongly on the structural contrast between the rock types.
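The sketch below walks through a simplified version of this filtering, segmentation, and validation chain on a synthetic pore image. The LHF is omitted because its definition is not given here. Segmentation uses only the fitting terms of Eq. (15) (μ = 0), which reduces to alternating between the region means and nearest-mean reassignment. It is not the level set solver used by Wang et al. The Hamming distance is normalized by the number of pixels and, as our assumption, ignores which phase is labeled "inside".

```python
import numpy as np
from scipy import ndimage as ndi


def two_phase_fit(u0, n_iter=100):
    """Chan-Vese with mu = 0: alternate the region means c1, c2 and reassign
    each pixel to the closer mean (equivalent to 2-means on intensities)."""
    inside = u0 > u0.mean()                          # initial partition
    for _ in range(n_iter):
        c1, c2 = u0[inside].mean(), u0[~inside].mean()
        updated = (u0 - c1) ** 2 < (u0 - c2) ** 2    # pixel moves to the closer mean
        if np.array_equal(updated, inside):
            break
        inside = updated
    return inside


def normalized_hamming(a, b):
    """Fraction of differing pixels; min(d, 1 - d) ignores which phase is 'inside'."""
    d = np.mean(np.asarray(a, dtype=bool) != np.asarray(b, dtype=bool))
    return min(d, 1.0 - d)


# Synthetic binary pore image: porosity 0.15 on the left, 0.45 on the right
rng = np.random.default_rng(0)
reference = np.zeros((64, 64), dtype=bool)
reference[:, 32:] = True
pores = np.where(reference, rng.random(reference.shape) < 0.45,
                 rng.random(reference.shape) < 0.15)
local_porosity = ndi.uniform_filter(pores.astype(float), size=9)   # average filter
rock_types = two_phase_fit(local_porosity)
error = normalized_hamming(rock_types, reference)
```

The average filter turns the noisy binary image into a smooth local-porosity map, and the two rock types become separable by their mean values. The remaining errors concentrate near the true boundary, where the filter window straddles both rock types.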

Fig. 5

The workflow of the Chan-Vese model-based rock typing method

Object-based rock typing

Object-based rock typing is further divided into pore-based and grain-based rock typing. Unlike the methods discussed above, object-based rock typing requires the target objects to be recognized first, for example by grain partitioning (Knackstedt et al. 2005; Wang and Sun 2021a).

To identify porosity types, Ghiasi-Freez et al. proposed a semi-automatic porosity identification workflow based on thin sections (Ghiasi-Freez et al. 2012). In that study, 384 pores from 240 thin sections were selected manually; 294 were used as training data and the remaining 90 for validation. The pores are classified into five types: interparticle, intraparticle, moldic, vuggy, and biomoldic pores. Each pore is characterized by six geometrical features: elongation, roundness, rectangularity, eccentricity, solidity, and the ratio of equivalent diameter to major diameter. A Gaussian mixture model is then trained and used to classify other pores into the different pore types. The prediction accuracy varies from 66.6% (vuggy pores) to 100% (biomoldic pores), depending strongly on the pore type.
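Several of these shape descriptors can be computed from a single pore's binary mask using second-order moments. The definitions below are our assumptions and may differ from those of Ghiasi-Freez et al.:
- major and minor diameters come from the ellipse with the same second moments (the convention used, for example, by scikit-image's `regionprops`);
- elongation is taken as minor/major;
- rectangularity uses the axis-aligned bounding box.

Roundness and solidity are omitted, since they need a perimeter estimate and a convex hull.

```python
import numpy as np


def pore_shape_features(mask):
    """Moment-based descriptors of one pore given as a boolean mask."""
    rows, cols = np.nonzero(mask)
    area = rows.size
    coords = np.vstack([rows, cols]).astype(float)
    cov = np.cov(coords, bias=True)                    # central second moments
    lam_min, lam_max = np.linalg.eigvalsh(cov)         # ascending order
    major = 4.0 * np.sqrt(lam_max)                     # axes of the moment-equivalent ellipse
    minor = 4.0 * np.sqrt(max(lam_min, 0.0))
    eq_diameter = np.sqrt(4.0 * area / np.pi)          # circle with the same area
    bbox_area = (rows.max() - rows.min() + 1) * (cols.max() - cols.min() + 1)
    return {
        "eccentricity": np.sqrt(1.0 - lam_min / lam_max) if lam_max > 0 else 0.0,
        "elongation": minor / major if major > 0 else 1.0,
        "rectangularity": area / bbox_area,            # axis-aligned box (simplification)
        "eqdiam_to_major": eq_diameter / major if major > 0 else 1.0,
    }


# A thin 4 x 16 pixel pore
pore = np.zeros((10, 20), dtype=bool)
pore[3:7, 2:18] = True
features = pore_shape_features(pore)
```

Feature dictionaries of this kind, computed for many labeled pores, would form the training set for a classifier such as the Gaussian mixture model described above.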

The heterogeneity of the pore structure in a rock sample can be affected by many geological processes. In some special cases, different rock types can be identified by their grain features, such as size and sphericity. Wang and Sun proposed a grain feature-based rock typing method that can significantly relieve the boundary ambiguity issue (Wang and Sun 2021b). Grain partitioning is carried out first to separate the granular rock sample image into a large number of single grains, and the geometric features of each grain are then computed. A multiclass support vector machine (MSVM) is then applied to recognize the rock type of each grain. For validation, the method was applied to two synthetic images whose rock types had been labeled in advance. Its accuracy was estimated by computing the Hamming distance between each reference image and the corresponding processed image. The calculated Hamming distances for the two synthetic images are 0.0232 and 0.0308, respectively, which indicates that the method is effective. The detailed workflow of grain feature-based rock typing is as follows (see Fig. 6):

  1. Image preprocessing, including denoising and segmentation, is applied to classify a grayscale or color image into two phases, void and solid;
  2. Grain partitioning is carried out to separate the solid phase into single grains;
  3. A set of grain geometry features (e.g., grain size, sphericity (Wadell 1935), and relative surface area) is calculated for every single grain;
  4. The rock types of a given number of grains are labeled manually and used as training data for the classifier;
  5. The trained classifier is used to assign a rock type to every grain;
  6. Each pixel/voxel of the pore phase is assigned the rock type of its nearest solid pixel/voxel.
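Steps 2, 3, 5, and 6 of the workflow above can be sketched for a 2-D binary image as follows. This is a simplification under stated assumptions. Step 2 treats each connected solid component as one grain, whereas touching grains would need a real partitioning step such as a watershed. Step 3 computes only area and equivalent diameter. The `classify_grain` callable is a hypothetical stand-in for the trained classifier (an MSVM in the original study).

```python
import numpy as np
from scipy import ndimage as ndi


def grain_based_rock_typing(solid, classify_grain):
    """Sketch of steps 2, 3, 5 and 6 for a binary image (True = solid).

    classify_grain: hypothetical callable mapping a grain's feature dict to a
    rock-type id; it stands in for the trained classifier.
    """
    solid = np.asarray(solid, dtype=bool)
    # Step 2 (simplified): each connected solid component is taken as one grain
    grains, _ = ndi.label(solid)
    rock = np.zeros(solid.shape, dtype=int)
    for g, box in enumerate(ndi.find_objects(grains), start=1):
        region = grains[box] == g
        area = int(region.sum())
        feats = {"area": area, "eq_diameter": np.sqrt(4.0 * area / np.pi)}  # step 3
        rock[box][region] = classify_grain(feats)                            # step 5
    # Step 6: every pore pixel takes the rock type of its nearest solid pixel
    _, (ri, ci) = ndi.distance_transform_edt(~solid, return_indices=True)
    return rock[ri, ci]


# Two hypothetical grains: a small 3 x 3 grain and a large 8 x 8 grain
solid = np.zeros((20, 40), dtype=bool)
solid[8:11, 4:7] = True
solid[6:14, 24:32] = True
rock_types = grain_based_rock_typing(solid, lambda f: 1 if f["area"] < 20 else 2)
```

In step 6, `distance_transform_edt` with `return_indices=True` is applied to the pore mask. It returns, for every pixel, the coordinates of the nearest solid pixel, so the rock type can be propagated into the pore space with a single indexing operation.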

Fig. 6

The workflow of the object-based rock typing, a is an example of the grain geometry-based IMRT workflow, and b is the general flowchart of the object-based IMRT

Grain-based rock typing generally performs well for granular rock samples whose grains can be partitioned and distinguished. Its limitation is also obvious: it fails for samples whose grains are difficult to partition in images, such as highly consolidated sandstone and vuggy limestone.

The results of IMRT can then be used to estimate the permeability contribution of each rock type with the help of numerical flow simulation and geological analysis. First, the permeability of the entire rock sample is calculated and denoted as K0. The target rock type is then assumed to be either pure pore or pure solid, depending on whether it decreases or enhances the reservoir permeability. For example, the rock type related to secondary clay minerals can be treated as pure pore, while the rock type related to an erosion process can be treated as pure solid. The sample's permeability is then calculated again and denoted by K1. Finally, the permeability contribution ratio R can be calculated as (Wang and Sun 2021c):

$$R=\frac{\left|{K}_{1}-{K}_{0}\right|}{{K}_{1}}$$
(16)
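Eq. (16) is simple arithmetic once the two flow simulations have been run. The permeability values below are hypothetical and serve only to illustrate the calculation. The guard against a non-positive K1 is our addition, since R is undefined in that case.

```python
def permeability_contribution_ratio(k0, k1):
    """R = |K1 - K0| / K1, Eq. (16).

    k0: permeability of the original sample; k1: permeability recomputed after
    the target rock type is reset to pure pore or pure solid (same units).
    """
    if k1 <= 0:
        raise ValueError("K1 must be positive for R to be defined")
    return abs(k1 - k0) / k1


# Hypothetical permeabilities in mD, for illustration only
print(permeability_contribution_ratio(k0=90.0, k1=100.0))  # 0.1
```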

When the pore structure texture of a given rock type is determined by a specific geological process, the image-based rock typing results and the calculated permeability ratio can also be used to reveal the geological genesis of the rock type as well as its permeability contribution.

Conclusions

This paper reviews the currently popular image-based microscale rock typing (IMRT) methods and their applications. Based on this review, the following conclusions about IMRT can be drawn:

  1. IMRT is one of the most effective ways to quantitatively analyze reservoir heterogeneity at the pore scale. Depending on the application, an IMRT task can be classified as a pattern recognition problem or a texture segmentation problem. Pattern recognition-related rock typing (PRRT) is used to solve problems such as identifying lithofacies, reservoir zones, or Dunham textures. In this case, the input is a rock sample's image and the output is a label describing the class to which the sample belongs. Texture segmentation-related rock typing (TSRT) divides the target image into several regions, each of which is a homogeneous rock type. TSRT is a prerequisite for pore-scale heterogeneity characterization.
  2. The wide application of deep learning has significantly improved the accuracy of IMRT. Convolutional neural networks (CNN) and their various derivatives achieve impressive accuracy in IMRT. In addition, transfer learning significantly reduces the training time.
  3. Current PRRT methods based on self-defined features achieve accuracies ranging from 80 to 100%, depending strongly on the specific task, the extracted features, and the selected classifiers.
  4. Phase field-based TSRT can effectively reduce the ambiguity in classifying boundary pixels because it is insensitive to the image gradient. It currently struggles with multiphase rock typing (more than 3 rock types), and its performance depends strongly on the structural contrast among different rock types.
  5. Object-based rock typing can effectively identify the boundaries between different rock types, but it is only suitable for rocks whose grains can be partitioned and distinguished.
  6. The results of image-based rock typing can be used to quantitatively evaluate the porosity and permeability contributions of each rock type in a heterogeneous rock sample with the help of numerical flow simulation. When the pore structure texture is determined by a specific geological process, the IMRT results can also be used to reveal the geological genesis of each rock type.
  7. Image-based microscale rock typing currently faces four main challenges: (1) how to classify rock types whose pore structures have low contrast; (2) how to reduce the boundary errors between different rock types, which arise because pore structure features are extracted within a window of a given size; (3) how to reduce manual intervention in rock typing; and (4) the limited field of view of imaging devices, which constrains the sample size in current image-based rock typing to the millimeter-to-core scale. Image-based rock typing therefore serves as a complement to conventional rock typing approaches rather than a replacement.