Automated porosity estimation using CT-scans of extracted core data

Estimation of porosity at a millimeter scale would be an order of magnitude finer resolution than traditional logging techniques. This enables proper description of reservoirs with thin layers and fine scale heterogeneities. To achieve this, we propose an end-to-end convolutional neural network (CNN) regression model that automatically predicts continuous porosity at a millimeter scale resolution using two-dimensional whole core CT scan images. More specifically, a CNN regression model is trained to learn from routine core analysis (RCA) porosity measurements. To characterize the performance of such approach, we compare the performance of this model with two linear regression models trained to learn the relationship between the average attenuation and standard deviation of the same two-dimensional images and RCA porosity. Our investigations reveal that the linear models are outperformed by the CNN, indicating the capability of the CNN model in extracting textures that are important for porosity estimations. We compare the predicted porosity results against the total porosity logs calculated from the density log. The obtained results show that the predicted porosity values using the proposed CNN method are well correlated with the core plug measurements and the porosity log. More importantly, the proposed approach can provide accurate millimeter scale porosity estimations, while the total porosity log is averaged over an interval and thus do not show such fine scale variations. Thus, the proposed method can be employed to calibrate the porosity logs, thereby reducing the uncertainties associated with indirect calculations of the porosity from such logs.


Introduction
A comprehensive formation evaluation requires investigating different reservoir characteristics such as formation pressure, temperature, pore and grain size distribution, fluid saturations, permeability, and porosity. Out of these, porosity is one of the most important properties for reservoir modeling and reservoir characterization processes, as its value and variation within the reservoir determines the volume of reserves. Porosity is moreover strongly correlated to the reservoir quality with respect to the transport properties.
A common method for porosity estimation employs well logs such as density, neutron, and sonic logs, where porosity is indirectly computed from log responses by means of theoretical and empirical calculations. This method provides continuous porosity values along the well. The uncertainty in this type of porosity estimation is high since the logs are affected by other rock and fluid properties in addition to the porosity, e.g., lithology, type of fluids in the pore spaces, wellbore environment and type of drilling mud. Therefore, logderived porosity interpretations need to be corrected and calibrated against core plug porosity measurements. Moreover, log-derived porosity interpretations overlook high resolution (i.e., below log resolution) porosity variations associated with fine-scale heterogeneities and thin layers.
Laboratory measurements using core plugs (typically 1 to 1.5 in. in diameter) and/or sidewall cores (1 in. in diameter) provide local porosity estimates that represent an interval of the cores extracted from the wells. Such analysis can produce acceptable reservoir porosity estimations for relatively homogenous pore systems. However, core plugs, normally sampled once per foot, might not provide representative data in heterogeneous reservoirs with millimeter scale porosity variations. More frequent sampling is necessary in this kind of reservoirs, which is laborious and cost consuming.
X-ray computerized tomography is a non-destructive technique that was initially used in medical science, and that has now been widely applied to investigate materials in the earth sciences, including petroleum geoscience [1,2]. Whole core CT scanning, that provides two-dimensional and threedimensional digital representations of extracted cores, has a long history in assisting geologists. Utilizing this technique leads to cores representations as grayscale images, where the gray-level values represent X-ray attenuation depending on the density and atomic number of the underlying materials. Core CT scans can provide valuable information on the texture and internal structure of reservoir rocks [3]. Recent developments in artificial intelligence techniques, combined with the fact that whole core CT scans are stored digitally, open new possibilities to enhance utilization of such data in rock characterization workflows. In this regard, CT scan images can be employed for automatic estimation of porosity at the millimeter scale, thereby enhancing their value in operational settings.
Moreover, several publications have evaluated the potential of deep neural networks in predicting transport properties (e.g., porosity and permeability) based on pore-scale micro-CT and Scanning Electron Microscopy (SEM) images [21][22][23][24][25][26][27][28][29]. Hebert et al. [24], inspired by Sudakov et al. [28], used an AutoEncoder to segment three-dimensional grayscale micro-CT images of Berea samples and to estimate porosity values using expert-derived segmented images as training ground truth. Bordignon et al. [23] proposed a methodology to estimate grain size and porosity distribution using synthetic idealized rocks (sphere packs) and Convolutional Neural Networks (CNN). Alqahtani et al. [21] employed deep learning to predict porosity, coordination number, and average pore size, where the ground truth values were computed using pore networks extracted from manually segmented images. More specifically, they used a CNN regression model to estimate the mentioned properties from two-dimensional grayscale micro-CT images. In the mentioned publications, machine learning and deep learning method were applied at the pore (micrometer) scale as an extension of Digital Rock Analysis (DRA).
In this study we propose a methodology to estimate porosity at the whole core (millimeter) scale using two-dimensional core CT slices. More specifically we employ an end-to-end CNN regression scheme to estimate porosity in a well using a training dataset that includes porosity measurements derived from routine core analysis. The trained model is then used to populate the whole well with porosity values. The obtained results are compared with a total porosity log computed based on the density log. We moreover compare the deep learning results with a simple linear regression (LR) model trained on average attenuations of the same two-dimensional image slices, and a multiple linear regression (MLR) model trained on both average attenuations and standard deviation of twodimensional image slices. In fact, we discover that the CNN model outperforms the linear regression models, which indicates the optimized convolutional kernels extract the textures and patterns that are important for estimating porosity. Moreover, we quantitatively demonstrate that the proposed CNN model can be employed to provide accurate local porosity estimations.
The subsequent sections of this work are organized as follows: Section 2 presents a detailed description of the employed methodology, i.e., convolutional neural networks. Section 3 describes the image pre-processing steps, including the processes that have been applied to remove images with undesired image artifacts and perform image augmentation. Section 4 provides a brief description of the lithology of the studied formations together with the available data, including whole core CT scans and core analysis measurements. Section 5 describes the details of the model training and CNN hyperparameter selection processes. Section 6 summarizes and discusses the obtained training and prediction results. Finally, the last section provides some concluding comments, emphasizing the advantages of the proposed method for porosity estimation over other schemes.

Method
In this study we compare two approaches for automatic porosity estimation at the core scale. As shown in Fig. 1, the methodology starts with pre-processing of two-dimensional image slices, where images with undesired artifacts are flagged and removed. The pre-processing steps are explained in detail in Section 3.1.
After image pre-processing, two approaches are considered: In the first, the remaining pre-processed images are augmented to expand the training dataset, as CNN networks are proven to show better generalization capabilities with a higher number of augmented images. Then, the augmented images together with porosity values from routine core analysis are employed to train a CNN regression model. In the second approach, a simple (LR) and a multiple linear regression (MLR) model are trained to predict porosity. The LR model is trained to learn the relationship between average attenuations of the two-dimensional images and RCA-derived porosity values. To count for distribution of the gray-level attenuations, we compute the standard deviation of the twodimensional images. The computed standard deviation and average attenuations are used to train the MLR model to learn the measured RCA porosity. The reason for choosing linear models is that we expect a linear correlation between the average gray-level attenuations and porosity for a monomineralic rock type with one type of fluid [30,31]. This comparison will reveal if the CNN model extracts pattens and features rather than just learning the average attenuation of the images. The training results of these approaches are compared, and finally the best performing model, i.e., the CNN, is used to predict porosity of unseen images and to populate the whole well with millimeter scale porosity values. The theory of convolutional neural networks will be summarized in subsequent sections.

Convolutional neural networks
Convolutional neural networks, initially proposed by LeCun [32], have been one of the most popular machine learning algorithms employed especially in classification, regression, and object detection tasks that involve images. CNNs consist of convolutional and pooling layers in addition to the fully connected layers of classical neural networks. The network in the convolutional section is locally connected, in contrast with fully connected layers where each neuron is fully connected to the neurons in the previous layers, and each connection has its own weight. The convolutional section in the CNN models consists of three main layers: convolutional, activation, and pooling.
The sequence of these stacked layers can be repeated many times depending on the task at hand. The convolutional layers are responsible for feature extraction during training, where low level features are extracted in the first convolutional layer, while higher level features are extracted hierarchically as the output of a layer feeds into the next layer. The extracted features are then mapped into the output layer by fully connected layers [33]. The main elements of CNNs and their tasks are described below.

Convolutional layers
In the convolutional layers, a set of optimizable kernels are superposed on the two-dimensional or three-dimensional digital images represented as arrays of pixel numbers. Each kernel is convolved across the image using an element-wise multiplication between the kernel and the current area of the image that it covers (a.k.a. its receptive field). Then, these dot products are summed up and stored in the corresponding position of the output, i.e., the feature map. Once the convolutional operation is computed and stored for a specific position, the kernel is moved by a pre-defined offset, called stride. The convolution process is then repeated until the entire image has been traversed by the kernel. Since convolution is a linear operation, the output feature maps are passed through an activation function to introduce a nonlinearity. Currently the most common activation function to accomplish this is the rectified linear unit (ReLU). However, other activation functions, such as hyperbolic tangent or sigmoid functions, can also be employed. The convolution operation in the convolutional layers can be written as (modified from Anjos et al. [34]): where o i is the output of the i th layer, o i − 1 is the output of the previous layer, * is the convolution operation, W i is the kernel weights of the i th layer, and g is the nonlinear activation function.

Pooling layers
The feature map outputs of the convolutional layers record the exact position of the features; that said, minor spatial changes in the input image result in a different feature map. Therefore, after applying the activation function, a pooling layer is added with two purposes. First, to make the extracted feature maps invariant to local translations and spatial variations in the input image. Second, to reduce the spatial dimension of the feature maps, thereby reducing the computational time associated with model training. Pooling is analogous to the convolutional operation, where a window is sliding over the input image. However, it executes a selection of elements inside the pooling window without trainable parameters. The most common pooling operations are maximum pooling and average pooling. Maximum pooling picks the maximum value inside the window and discards the others, while average pooling computes the average of values inside the pooling window. The down-sampled feature map output of the pooling layer is called the pooled feature map, and it is expected to contain the most important structural features.

Fully connected layers
The pooled feature maps of the convolutional section are then flattened and used as input to the fully connected layers, whose purpose is to map the extracted features into the output layer. As opposed to the convolutional layers, here the input nodes are fully connected to the output nodes by learnable weights. Nonlinearity in the fully connected layers can be introduced by adding an activation function such as ReLU so that mathematically these layers can be explained as (modified from Anjos et al. [34]): where z i is the output vector, z i − 1 is the output of the previous layer in the fully connected network, W i is the weights tensor of the i th layer, b i is the bias of the i th layer and g is the activation function. For the first layer, z i − 1 is the flattened version of the last convolution layer, i.e., o i − 1 . Note that the activation function applied to the final fully connected layer is normally selected based on the type of the task, e.g., classification or regression. The softmax function is a common activation function for multi-class classification. However, linear activation function is used in case of regression.
One of the limitations of the CNN architecture described above is that it can easily be overfit to the training data, and thus diminish the generalization capabilities of the model. To overcome this limitation one may employ some regularization technique, such as dropout, where randomly selected neurons are temporarily removed together with their incoming and outgoing connections, and not used in the back-propagation phase [35,36]. Dropout is performed to optimize the biasvariance tradeoff, i.e., balancing out the bias and variance of the estimates towards better performance. Batch normalization is another common regularization technique, where the output of a convolutional layer is normalized before being used in the next one. Batch normalization is empirically known to make training less sensitive to the initialization point, plus speeding up the network training [37].

Data pre-processing
The provided CT scan data for individual cores are stored in 16bit unsigned DICOM format [38]. We stack the DICOM images of individual core scans together and store them as three-dimensional raw images using the ImageJ software [39]. Once the three-dimensional raw images are created, the next step is to remove border effects by cropping the image into rectangular crops. We used a centered crop of size 256 × 256 pixels, as shown in Fig. 2. Then a global minimum and maximum intensity value is assigned to the rectangular crops of the individual cores, i.e., the images are normalized between the global minimum and maximum intensity values. These global intensity values are selected by observing the three-dimensional histograms of rectangular crops of the entire considered core. Further, the intensity adjusted images are encoded in 8bit format, i.e., 0 − 255 grayscale values.

Removing undesired artifacts
Some of the studied CT images contain non-core regions and undesired artifacts. One type of such artifacts is missing pixels associated with poor core recovery, induced fractures, or rush plugs. Images with missing pixels show low gray-level attenuation values (Fig. 3a). Other types of undesired features are related to the intervals with drilling mud invasion, highdensity minerals (e.g., pyrite and siderite) cementation ( Fig.  3b), and core barrel couplings (Fig. 3c).
The images with artifacts can negatively affect the performance of the considered machine learning algorithms, since they contain features not related to the actual phenomena being modeled. Thus, we implemented specific type-dependent algorithms that aim at excluding images with such artifacts. More precisely, to remove missing intervals we calculate the average attenuation μ c at the center of the image using a centered square covering 40% of the total number of pixels. If the computed average attenuation is less than a pre-defined cutoff C m , the image is flagged and removed.
where f m is the flag for missing interval. The image is removed if f m is equal to 1. The cutoff C m is defined by the user based on observing the average attenuation of images with missing pixel values. In this study, we used cutoff C m = 50. Image intervals with drilling mud invasion and high-density material show high gray-level attenuation readings, and they appear very bright compared to the other intervals. To identify these intervals, we compute the average attenuation μ of the whole two-dimensional image, and if the average attenuation is greater than a pre-defined cutoff C h , the image is flagged for removal: where f h is the high density flag. The image is removed if f h is equal to 1. The cutoff C h is defined by the user based on observing the average attenuation of images with drilling mud invasion and high-density minerals. In this study, we used cutoff C h = 170.
Images from intervals with core barrel couplings show low gray-level attenuation readings in the middle, while the image edges appear brighter with higher attenuation readings (Fig.  3c). To detect these intervals, the difference in average attenuation of the center and edges of the image slices is calculated. As above, the center average attenuation μ c is computed considering 40% of the total number of pixels using a centered square. To represent edge average attenuation μ e , the outer 5% of the total number of pixels along the edges are considered. If the difference between center average attenuation and edge average attenuation is greater than a pre-defined cutoff value C b , the image interval is flagged and removed: where f b is the core barrel coupling flag. The image is removed if f b is equal to 1. It is worth to mention that the employed thresholds were computed using the global distribution of the minimum, mean, and maximum intensity values observed in the data set. Note that in the first approach (Fig. 1), the images are normalized before being used as input for the CNN training (i.e., images are divided by 255). Moreover, to reduce the computational time associated with model training, the remaining two-dimensional image slices are coarsened by a factor of four, so the final image size as used for input to the CNN is 64 × 64 pixels.

Image augmentation
Deep learning networks notoriously require large amounts of training data to achieve good performance. Aware of this, we augmented the training images used for the CNN through image augmentation, i.e., by applying different kinds of modifications (e.g., shifting, flipping, random rotation, and shearing) to the original images. In this way the model can learn from more examples during Fig. 2 As a pre-processing step, the original DICOM images are cropped into crops of size 256 × 256 pixels (blue square). This is done to remove border effects Fig. 3 Two-dimensional image slices with undesired artifacts: (a) missing pixels, (b) high density material, (c) core barrel coupling training phase, and at the same time make the model less sensitive to spatial translations. In this study we specifically considered rotation and horizontal flips of the twodimensional images. In the case of horizontal flip, we employed the "ImageDataGenerator" class, a publicly available code that can be used for image augmentation purposes on the fly, in Python using the Keras API [40].
The "ImageDataGenerator" class rotates the images randomly within a range of user-defined angles, where in case of squared images it is very likely that for some specific rotation angles the pixels will fall out of the image frame leaving some areas of the image with no pixels. There are several interpolation techniques, such as nearest neighbor, that can be used for those areas, but they can amend the key features and create dissimilar features that are then counterproductive for the training process. To avoid this problem, the images were rotated outside Keras, while the horizontal flipping was performed on the fly using the "ImageDataGenerator" class. Here we thus eventually considered rotation of the original images by 90°, 180°and 270°.  Fig. 4. Formation 1 consists of very fine-grained argillaceous sandstones and cemented sandstones; Formation 2 consists of successive layers of mudstones and fine-grained sandstones; Formation 3 consists of granule rich medium-grained sandstones and spiculites (a biogenic rock composed of sponge silica spicules); Formation 4 comprises mud and calcite rich marlstones. The overall lithology in this well is divided into 20 lithofacies by manual core descriptions [41]. A brief description of different lithofacies together with associated abbreviations is provided in Table 1.

Whole core CT scan images
As mentioned previously, in this study we employ twodimensional core CT images. The original images are stored as 16bit unsigned DICOM images slices with a vertical resolution of approximately 0.45 millimeters. The images are preprocessed, i.e., cropped and discarded if presenting undesired artifacts as explained in section 3. The remaining images are coarsened to 64 × 64 pixels and normalized before being used for further analysis.

Routine core analysis data
Porosity measurements of 388 core plugs from routine core analysis (RCA) are considered as ground truth for the considered convolutional neural network and linear regression models. The RCA porosity-permeability cross-plot is shown in Fig. 5, where different colors represent different lithofacies observed through core descriptions. According to Fig. 5, the studied well contains a whole range of porosity values, from approximately 0.03 for well cemented sandstones (WCemMSS) up to 0.40 for cross stratified sandstones (CrossFineSS). Permeability.
values range from 0.01 mD for mudstones up to 50 Darcy for granule rich medium-grained sandstone samples (GraMSSDispC).

Training
As mentioned previously, convolutional neural networks and linear regression models were employed in two separate approaches to estimate porosity at the whole core scale. This section presents the details of the training processes for both approaches.

Labeling and train-test splitting
In this study the RCA-derived core plug porosity measurements were used as the target porosity values for individual two-dimensional images. Porosity is often measured on core plugs with typical lengths of 1 to 2 inches [42], which corresponds to approximately 55 to 110 two-dimensional CT image slices with vertical resolution of 0.45 millimeters. In our approach we assign the same porosity labels to 19 successive two-dimensional images (approximately 8.5 mm vertical length) at depth intervals corresponding to the core plug depths. In other words, individual two-dimensional images are used as input to train the CNN model, where 19 successive images at each depth interval are labeled with the same porosity values coming from core plug measurements. Note that the number of images in specific intervals might be less than 19 since images with artifacts are removed by image preprocessing.
For regression modeling we use the same labeling strategy. However, in case of linear regression, the average attenuation and standard deviation of each two-dimensional image is computed. The simple LR model is trained to learn the relationship between average attenuations (as independent variables) and core analysis porosity measurements (as dependent variables), while the MLR model is trained to learn RCA measured porosity from both average attenuation and standard deviation.
Generally, a machine learning model is trained on a set of samples, i.e., training set. In a supervised learning environment, the training dataset consists of pairs of input and corresponding output values that are used as the training target. The output predictions during training are compared with the actual target values, and the model parameters are updated to minimize the error between actual target and predicted values. Once the model is fitted to the training set, its statistical performance is evaluated on a set of unseen instances, i.e., test set. In this manner, the test set provides an unbiased evaluation of the trained model.
The labeling strategy explained above resulted in 6863 labeled images (about 3% of the total number of images in the studied well). To cover the distribution of the measured porosity values, we chose~90% of the images as the training set and the remaining~10% as the test set. The train and test sets were separated manually to ensure that the test set was representative for the porosity distribution seen in the well. The test set is a continuous subset of the well, therefore we do not have images from the same plug in the train and test sets. Since the lithofacies classes are varying gradually and slowly, a random spilt would result in similar images in the training and test sets, and overestimation of the performance of the model. Moreover, 20% of the training set was employed as the validation set to provide an unbiased evaluation of the model while tuning the kernels and the weights in the network. As mentioned before, the images used for CNN training were further augmented (rotated by three angles of 90°, 180°and 270°) to expand the training dataset and to obtain a more robust model that is invariant to the image modifications. The image augmentation process resulted in 19764 training images.

CNN training and hyperparameter selection
Commonly, CNN training is performed by forward and backpropagation processes, where the kernels in the convolutional layers and weights in the fully connected layers are updated iteratively to minimize the error (cost function) between the actual and predicted values. Mean squared error (MSE) and mean absolute error (MAE) are commonly used as cost functions in CNN regression problems, and calculated as where N is the total number of samples, while y i and b y i are the actual and predicted values of sample i. The cost function is minimized using a gradient descent optimization algorithm. More precisely, the partial derivative of the cost function with respect to each learnable parameter is calculated and those parameters are updated using [33]: where ω refers to each learnable parameter (i.e., weights and kernels), α is the learning rate, and E is the cost function. In this study we considered both types of the above cost functions. However, based on our preliminary experiments the optimization of the MAE leads to a model with better performance on the validation set than a model optimized with MSE. Therefore, we used MAE as the cost function to be minimized during model training.
The learning rate determines how fast the learnable parameter should move along the direction of the gradient, and its value can significantly affect the convergence of the cost function optimization process. There are different types of optimizers and descent methods to minimize the cost function. Here, we specifically used the Adam optimizer [43] and mini-batch gradient descent to search for optimum weights and kernel parameters. Mini-batch gradient descent is the most common variation of gradient descent in deep learning [44], where the training dataset is split into small batches, and the error is calculated per batch before updating the learnable parameters. In our study the final optimal approach was to consider a batch size of 32 images. The optimization procedure is carried out using a pre-defined number of epochs, where an epoch is a period in which all the training samples have been presented at least once to the network. The number of training epochs has high impact on the performance of the CNN model too, since a too large or too small number of epochs can lead to over-and underfitting, respectively. To overcome this issue, in our study we used an early stopping regularization technique, i.e., make the model training stop once the model performance on the validation dataset starts to degrade after a certain number of epochs. Here, a total number of 300 epochs was assigned for the training process.
In addition to the learnable parameters there exist other types of parameters, so called hyperparameters, that are not learnable during the training processfor example the number of convolutional layers, the number of kernels in the convolutional layers, the kernel size, the learning rate, and similar factors.
The values of such hyperparameters are typically initially assigned by the user using domain expertise considerations and are kept constant during the training phase. Since the choice of the hyperparameters highly affects the performance of CNN networks, such hyperparameters may be adjusted through a process referred to as hyperparameter tuning. In this study we performed such a tuning on the number of convolutional layers, the number of kernels in the convolutional layers, kernel size, learning rate, number of neurons in the fully connected layer and dropout rate. The proposed CNN model was developed in Keras with the Tensorflow backend. To automatically tune the considered hyperparameters, we employed the Keras tuner library [45], where we define a search space for each hyperparameter together with a proper tuner. The tuner automates the tuning process by evaluating a certain number of configurations with different hyperparameter combinations. The considered hyperparameters and associated search spaces are presented in Table 2. There are different types of tuners such as RandomSearch, Hyperband and BayesianOptimization. More detailed information on the differences among these approaches can be found in [46][47][48][49]. In this study we used the Hyperband algorithm [48], which is a relatively new method, and that uses adaptive resource allocation and an early stopping rule to quickly converge to a high-performance model. More precisely, a large number of random configurations are run for a specific number of epochs (i.e., one or two) per configuration. Then the top-performing model configurations based on the previous results are trained for more epochs. Finally, the best model configuration trained to the assigned  maximum number of epochs is returned by the algorithm. The proposed CNN architecture, optimized using the above hyperparameter selection process, is indicated by bold numbers in Table 2, while the corresponding CNN is shown in Fig.  6. The CNN architecture is then described in detail in the next subsection.

CNN architecture
The optimal CNN architecture is presented in Fig. 6. The input images are two-dimensional grayscale images of size 64 × 64, with one channel represented by two-dimensional arrays.
To preserve the original image size, zero padding is applied in the convolutional layers, i.e., a layer of pixels with values of zero are added around the edges of the images. Here, the convolution operation is applied using a stride of 1, i.e., the convolutional kernel is moved by 1 pixel while performing convolution operations.
To introduce nonlinearity, a ReLU activation function is applied to the feature map outputs of each convolutional layer. Then, the feature maps are sent to the next layer, the pooling layer, where they are downsampled using a pooling window size of 2 × 2 and a stride of 1.
The pooled feature maps of the convolutional section are flattened into a one-dimensional vector that is fed into the fully connected layer to be mapped to the output layer. The proposed architecture consists of one hidden layer with 160 neurons optimized by the hyperparameter tuning process explained above. Moreover, a ReLU activation function followed by a dropout layer is applied to the hidden layer. As explained in subsection 2.1.3., the dropout layer is applied as a regularization technique to prevent overfitting in the proposed network. Here, a dropout rate of 0.2, optimized during hyperparameter tuning, was applied in the proposed network, meaning that one in 5 of the neurons (in the hidden layer) together with their incoming and outgoing connections were randomly ignored from each update iteration.
The output layer contains a single neuron since we are dealing with a regression problem, where a single porosity value is predicted for each input image.

Linear regression training
As indicated before, two linear regression models (LR and MLR) were trained to predict porosity directly from the average attenuation and standard deviation values calculated for the individual two-dimensional image slices. The purpose of such a step is to enable a subsequent analysis revealing if the CNN model extracts relevant features rather than just learning the average attenuation and standard deviation of the images. Figure 7 then shows the average attenuation versus measured porosity values for the training dataset, where different colors correspond to various lithofacies classes in the study well. Note that the same porosity values are assigned to 19 successive image slices. We can see that the number of images associated to some porosity values are less than 19 since some of the images with artifacts are removed by image pre-processing. This figure shows that porosity is negatively correlated with average attenuation. However, some of the datapoints behave differently and appear as outliers (see the red and blue ellipsoids in Fig. 7). Based on our investigations the datapoints within the red ellipsoid are associated with images that partly contain missing pixels, and this indicates that these images were not completely removed during the image preprocessing step. The missing pixels are characterized by lower gray-level attenuation values, resulting in lower average attenuation in these images compared to the other images with similar measured porosity values. The datapoints within the blue ellipsoid, instead, are associated with images containing Fig. 6 The proposed CNN architecture for porosity estimation using two-dimensional CT images high density material with higher average attenuation values compared to the images with similar measured porosity values. Since the linear regression training process can be affected by outliers, we included a step of detecting and removing these outliers prior to model training. To detect the outliers we employed the Isolation Forest (iForest) algorithm initially proposed by Fei Tony Liu et al. [50,51]. The iForest algorithm is a model that is based on unsupervised learning and that works using the principle of isolating anomalies. Similar to the Random Forest, the iForest method randomly splits the datapoints by building an ensemble of trees (called iTrees), where the goal here is to isolate the anomalous datapoints. Based on this algorithm, there is a tendency for anomalous instances to be isolated easier compared to the normal instances, i.e., anomalies are the datapoints with short average path lengths on the iTrees.
As expected, the datapoints related to the images with partly missing pixels and high-density material (red and blue ellipsoids in Fig. 7) were detected as outliers by the iForest algorithm. These outliers are removed from the training set before fitting the linear regression models. The fitted LR model is shown in Fig. 7.

Results and discussions
We now present the obtained training results and compare the prediction capabilities of the CNN and linear regression approaches presented above.

Training results
The CNN regression model was trained to predict RCA measured porosity values using two-dimensional whole core CT image slices. The linear regression models were instead trained to predict RCA measured porosity based on the average attenuation and standard deviation values computed on the same two-dimensional CT image slices. Note that the LR model was trained only on the average attenuations, while the MLR model was trained on both average attenuations and standard deviations with the purpose of counting for variation from the mean average attenuation. To evaluate the training results, we have shown a quantitative description of the proposed models in Table 3, where we can clearly see that the CNN outperforms the linear regression models. Moreover, we can see better performance in the linear regression models by including the standard deviation as one of the independent variables. The measured porosity is plotted versus the predicted porosity for the CNN and MLR models in the first row of Fig. 8. In addition, the corresponding residual plots are shown in the second row. We observe that the measured and CNN

CNN evaluation
As mentioned, to obtain an unbiased evaluation of the CNN model, 20% of the training set was used as validation set during training and hyperparameter tuning process. In other words, once the model is trained at the end of each epoch, its generalization capability is evaluated on the validation set using the mean absolute error as the training metric. Figure 9 shows how the error is minimized with increasing the number of epochs in both the training and the validation sets. Note that despite assigning 300 epochs, the training process stops after 221 iterations due to the early stopping regularization technique dictating so. We can see that the model shows high generalization capabilities on the validation dataset.

Porosity prediction
The proposed CNN and MLR models were further used to predict porosity on a set of unseen images, i.e., the test set, from the same well. Like the training set, 19 successive images at each depth interval were labeled with the same RCA derived porosity values. Note that in the case of MLR model, the calculated average attenuation and standard deviations of test set images were used to predict RCA porosity, while in case of CNN the image slices were directly used to predict porosity. The CNN predicted results are plotted versus the actual measured values in Fig. 10 (left). Here we show the CNN predicted porosity values with error bars, where the markers signify the mean predicted porosity of 19 successive images, while the error bars show the standard deviations, i.e., the variability of the predicted porosity for the 19 images. The mean predicted porosity and the actual measured porosity for the CNN and MLR models (Fig. 10) show an r 2 of 0.81 and 0.69, respectively. Here, we can see that the CNN predicted mean porosity values are closer to the line of equality (1:1 line) and the error bars for most of the samples cross the line of equality), showing that the correlation is balanced. These results again indicate higher performance of the CNN model in predicting porosity on the unseen images. However, there are a few samples that occur relatively far below the equality line, i.e., samples belonging to the poorly cemented granule rich medium-grained sandstone (PCemGraMSS) and muddy fine-grained sandstone (MudFineSS) classes. These are shown within the red dashed ellipsoid in Fig. 10.
Image examples of these classes are presented in Fig. 11. One can clearly see that the images in the first row are characterized by high drilling mud invasion, where mud appears with very bright to white gray-level attenuation values. Moreover, the images in the second and third rows are darker Fig. 9 The CNN model performance during training using mean squared error as the evaluation metric. The blue and red lines depict the training and validation errors, respectively in the middle and brighter at the edges. As mentioned before, this type of artifacts is characteristic of the image intervals with core barrel couplings. Apparently, these images are not completely removed during pre-processing, and the mentioned artifacts are probably the reason for CNN model deficiency in porosity prediction associated with these images. Here the model is predicting lower porosity values compared to the actual values. This is expected given the presence of pixels with high gray-level attenuations.
In addition, it is worth to mention that the proportion of different lithofacies classes is varying in the training set (as shown in Table 4), meaning that the CNN model is trained on fewer image examples for classes in minority than the more frequent ones. Given such an imbalanced training dataset, the model is expected to show lower prediction performance on the minority classes. This is clearly confirmed in Table 4, where the average prediction error for lithofacies classes in minority is higher than the one in more frequent classes. Note that high average prediction error for the PCemGraMSS class is associated with image artifacts (as explained above).
Overall, the proposed CNN model shows satisfying prediction performance on the unseen images. Therefore, we applied this model to predict porosity throughout the whole studied well, thereby populating the whole well with millimeter scale porosity values. Note that this interpolation step is performed using a model that was trained and evaluated on approximately 3% of the available images. The obtained results are presented in Fig. 12, where we can see a clear correlation between the CNN predicted porosity (light blue curve) and the RCA porosity (red circles). The available total porosity log is also plotted on the same figure, i.e., the black curve. The total porosity (φ) log is computed using density porosity equation: Fig. 11 Examples of the two-dimensional images from the test set, where there are bigger deviations between the actual measurements and the predicted porosity values. Clearly, the image artifacts due to mud invasion (first row) and core barrel couplings (second and third row) are the reason for CNN model deficiency where ρ ma indicates the matrix density index obtained from core measurements, ρ b is the bulk density index derived from the density log, and ρ fl is the measured fluid density. To  12 Populating the whole well with porosity values. The CNN predicted porosity is in line with the RCA measured porosity. A 1.5 m interval is zoomed in and shown in the plot to the right compare the predicted results with the total porosity log at the same scale, the predicted porosity was coarsened to match the resolution of the porosity log. For this purpose, we calculated a moving average of the CNN predicted porosity using a window size of 30 cm. The coarsened porosity curve is shown by green color in Fig. 12, where one can see that it is well correlated with the total porosity log. A 1.5 meter interval of the well is zoomed in and shown in Fig. 12. Here we can see the millimeter scale porosity variations predicted by the CNN model, while the other two curves are averaged and do not show finer scale variations. In fact, the predicted porosity is more accurate and more in line with the core porosity measurements compared to the total porosity log. Note that a bulk depth shift has been applied on the CNN images and RCA porosity measurements to match the depth of the total porosity log. As this is a bulk depth shift, there might be local depth-match issues.
The proposed method can thus be employed to provide accurate local porosity estimations and the obtained results can be used to calibrate porosity logs. Moreover, it can help in identifying the appropriate core plug locations.

Cross-well validation
To further evaluate the generalization capability of the proposed CNN model, we employed the model to predict porosities from core CT images coming from a second well from the same field, and thus on data that was completely unseen by the model training process. This well penetrates only two of the previously mentioned formations, i.e., Formation 3 and Formation 4 as shown in Fig. 13. The lithology of Formation 4 in this well is very similar to the first well, consisting of mud and caliche rich marlstones. However, Formation 3 is quite different than the one in the first well, as it also contains intervals with more coarsegrained lithofacies. The obtained prediction results are presented in Fig. 13, where one can see quite good correlation between the predicted (light blue curve) and measured porosity (red circles), especially for depth intervals above XX17. As mentioned, the lithology in these intervals is very similar to the lithology in the first well. This might explain the higher prediction accuracy. However, the model predictions are less accurate for porosity values in the intervals where the lithology is significantly different than the lithologies that the model was trained on.
Overall, the MAE and r 2 between the core porosity measurements and model predicted porosity is 0.032 and 0.73, respectively. These results indicate the practical applicability of the proposed CNN model on unseen images from other wells.

Conclusions
This study investigated the capability of convolutional neural networks to automatically predict RCA porosity from two-dimensional whole core CT scan images and proposed a workflow to predict millimeter scale continuous porosity values. The proposed CNN regression model, trained to learn the relationship between convolution-derived features and RCA-derived porosity values, was compared with linear regression models trained to predict RCA porosity from average gray-level attenuation and standard deviation computed for the same two-dimensional CT images.
Summarizing, the results show that the CNN model is well able to predict the porosity with a coefficient of determination of 0.81 for the unseen test set. This indicates that the optimized convolutional kernels have learned the textures and patterns that are important for estimating porosity. The capabilities of the CNN model in predicting porosity on completely unseen images, and the capabilities of cross-populating two wells with porosity values were moreover both tested quantitatively. The obtained results show that the model generalizes well on unseen images where the lithology is similar to those that the model has been trained on. However, images coming from significantly different Fig. 13 Porosity prediction in the second well employing the proposed CNN model. The model performs quite well given that the lithology in some intervals is different than the lithofacies that the model is trained on lithologies, and images with specific features/image artifacts (e.g., images with drilling mud invasion and core barrel couplings) can result in model deficiencies.
Overall, comparison of the predicted porosity with total porosity log reveals that the proposed method can provide accurate millimeter scale porosity estimations, while the porosity log has a lower resolution and therefore overlooks such high-resolution variations.
Our conclusion is thus that the proposed method can be employed to estimate continuous core scale porosity values in an automatic fashion, and at early stages of the reservoir characterization process. This method can moreover be used to calibrate the porosity logs, thereby reducing the uncertainties associated with indirect calculations of the porosity from such logs. We also remark that it may be helpful to identify proper core plug locations for an improved core analysis. However, the most significant application of the proposed high-resolution porosity log might be in reservoir characterization and modeling.
It is worth to mention that in this study the CNN model was trained only on images from a single well. Ideally, training the model on multiple wells, covering different lithofacies and a wide range of porosity values, is expected to result in higher generalization capabilities.