Paired and Unpaired Deep Learning Methods for Physically Accurate Super-Resolution Carbonate Rock Images

X-ray micro-computed tomography (micro-CT) has been widely leveraged to characterise the pore-scale geometry of subsurface porous rocks. Recent developments in super-resolution (SR) methods using deep learning allow for the digital enhancement of low-resolution (LR) images over large spatial scales, creating SR images comparable to high-resolution (HR) ground truth images. This circumvents the common trade-off between resolution and field-of-view. An outstanding issue is the use of paired LR and HR data, which is often required in the training step of such methods but is difficult to obtain. In this work, we rigorously compare two state-of-the-art SR deep learning techniques, using both paired and unpaired data, with like-for-like ground truth data. The first approach requires paired images to train a convolutional neural network (CNN), while the second approach uses unpaired images to train a generative adversarial network (GAN). The two approaches are compared using a micro-CT carbonate rock sample with complicated micro-porous textures. We implemented various image-based and numerical verifications and experimental validation to quantitatively evaluate the physical accuracy and sensitivities of the two methods. Our quantitative results show that the unpaired GAN approach can reconstruct super-resolution images as precise as those of the paired CNN method, with comparable training times and dataset requirements. This unlocks new applications for micro-CT image enhancement using unpaired deep learning methods; image registration is no longer needed during the data processing stage. Decoupled images from data storage platforms can be exploited to train networks for SR digital rock applications. This opens up a new pathway for various applications related to multi-scale flow simulations in heterogeneous porous media.


Introduction
With the aid of X-ray micro-computed tomography (micro-CT), the pore-scale structure of subsurface rock can be accurately characterised to understand how fluids flow in porous rock for enhanced oil recovery (Armstrong and Wildenschild 2012), carbon dioxide sequestration (Juanes et al. 2006), multiphase flow in fuel cells (He et al. 2000; Tang et al. 2022a, b), and various other applications. In the standard workflow (Andrä et al. 2013), digital rock grey-scale images are first obtained from micro-CT and then segmented into two phases, pore and grain, for direct numerical simulation or pore network modelling (Blunt et al. 2013). The success of these applications, however, depends heavily on how accurately the pore-scale geometries are captured in the micro-CT images. There is a trade-off between image resolution and field-of-view (FOV): high-resolution images depict pore geometries more accurately at the cost of a smaller FOV, whereas low-resolution images have a larger FOV but cannot represent the true structural details of the rock. This presents a critical challenge, since high-resolution data are needed to characterise pores precisely, while the FOV must be large enough to capture the heterogeneity of the rock.
In the past few decades, super-resolution (SR) technology has been applied to circumvent the trade-off between resolution and FOV. SR aims to reconstruct a high-resolution (HR) counterpart of a degraded low-resolution (LR) image (Park et al. 2003). Traditional SR methods that have been shown to improve image resolution include stochastic approaches (Tian and Ma 2010), Bayesian methods (Tipping and Bishop 2002), neighbour embedding (Chang et al. 2004; Gao et al. 2012), sparse representation (Yang et al. 2008, 2010), the projection onto convex sets (POCS) approach (Stark and Oskoui 1989), and example-based approaches (Freeman et al. 2002). These traditional methods, however, have their own drawbacks. For instance, neighbour embedding does not perform well on complicated images with textured regions (Gao et al. 2012). POCS and example-based methods have high computational cost (Freeman et al. 2002; Shen et al. 2014). Sparse representation faces the challenge of balancing dictionary size against computational cost (Yang et al. 2010).
Recent advances in deep learning have surpassed traditional methods for the single image super-resolution (SISR) task, using convolutional neural networks (CNN) or generative adversarial networks (GAN). Dong et al. (2014) developed a deep convolutional network, SRCNN, that learns an end-to-end mapping between bicubic-interpolated LR data and HR data. Since then, more advanced deep neural networks for SISR have been proposed, inspired by SRCNN and using various effective structures. Dong et al. (2016) introduced a fast super-resolution convolutional neural network (FSRCNN) using deconvolution layers, which reduces computational time. However, the deconvolution layer can cause redundancies during the upsampling procedure (Yang et al. 2019). Instead of a deconvolution layer, the efficient sub-pixel convolutional neural network (ESPCNN) learns the upscaling process for SISR by rearranging low-resolution feature maps into a high-resolution image (Shi et al. 2016). Further neural-network approaches for SISR followed, such as VDSR (Kim et al. 2016a), DRCN (Kim et al. 2016b), EDSR (Lim et al. 2017), SRDenseNet (Tong et al. 2017), MemNet (Tai et al. 2017), and WDSR (Yu et al. 2018). Most of these deep learning models need paired training data, which are not always available. Researchers have therefore also applied GAN approaches to SR problems, including SRGAN (Ledig et al. 2017) and, for unpaired training data, CinCGAN (Yuan et al. 2018), high-to-low GAN (Bulat et al. 2018), DSR/CSR (Lugmayr et al. 2019), and others. Peak signal-to-noise ratio (PSNR) and the Structural Similarity Index Measure (SSIM) are common metrics for assessing image quality. Paired algorithms usually achieve higher PSNR/SSIM, while unpaired algorithms are more flexible in leveraging real-world data (Bulat et al. 2018; Yuan et al. 2018).
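The sub-pixel rearrangement used by ESPCNN can be illustrated with a short NumPy sketch. It shows only the "pixel shuffle" step, not the learned convolution that precedes it, and assumes the channel ordering of PyTorch's `torch.nn.PixelShuffle`; the function name `pixel_shuffle` is our own.

```python
import numpy as np

def pixel_shuffle(x: np.ndarray, r: int) -> np.ndarray:
    """Rearrange a (C*r*r, H, W) feature map into a (C, H*r, W*r) image.

    Assumed ordering: output[c, h*r + i, w*r + j] = x[c*r*r + i*r + j, h, w].
    """
    c_r2, h, w = x.shape
    if c_r2 % (r * r) != 0:
        raise ValueError("channel count must be divisible by r*r")
    c = c_r2 // (r * r)
    # Split channels into (c, i, j), then interleave i with rows and j with columns.
    return (x.reshape(c, r, r, h, w)
             .transpose(0, 3, 1, 4, 2)
             .reshape(c, h * r, w * r))

# Four 1x1 feature maps become a single 2x2 high-resolution patch.
features = np.arange(4).reshape(4, 1, 1)
print(pixel_shuffle(features, 2)[0])
```

No interpolation is involved: every output pixel is copied from exactly one feature-map channel, so the network itself must learn the sub-pixel detail. The 3D EDSR used in this paper replaces this step with trilinear upsampling.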
In digital rock physics, SR techniques can provide large high-resolution domains for flow simulation, where the large FOV can represent heterogeneous features (Jackson et al. 2021). Recent SR studies on digital rock images demonstrated that CNN-based SR models can generate high-quality images (Wang et al. 2019a, b; Wang et al. 2019a, b). These works evaluated SR performance using grey-scale analyses, e.g., histograms and difference maps, as well as image quality metrics. These measures, however, cannot explicitly determine the physical accuracy of SR images for petrophysical analyses, such as porosity and absolute/relative permeability, which are critical parameters in digital rock physics. Wang et al. (2019a, b) demonstrated that the permeability of SR images can be consistent with that of their HR ground truth (GT) counterparts over various segmentation thresholds using paired SRCNN and SRGAN methods. Niu et al. (2020) further showed, using an unpaired CinCGAN, that the physical accuracy of SR images in terms of porosity, permeability, pore size distribution, and Euler characteristic was equivalent to that of their HR counterparts. Further validation by Jackson et al. (2021) demonstrated the reliability and efficiency of EDSR for multiphase flow simulation in large-scale heterogeneous porous media, with physical accuracy comparable to the related GT and experimental data.
The existence of both paired and unpaired SR deep learning methods raises an important question: can an unpaired method achieve physical accuracy equivalent to that of a paired method? In this paper, we examine two state-of-the-art SR deep learning models, EDSR (paired) and CinCGAN (unpaired), to enhance the image resolution of a carbonate sample that includes resolved and sub-resolution pores, which are challenging to characterise from a single-resolution image. In general, both EDSR and CinCGAN precisely capture the edge sharpness and high-frequency texture of the SR grey-scale images, which cannot be resolved in the LR images. Petrophysical properties simulated using pore network modelling show that both EDSR and CinCGAN reconstruct SR images comparable to their HR counterparts. Our results suggest that unpaired deep learning models can be an alternative way to enhance digital rock image resolution when paired data are unavailable. Image registration can then be skipped, accelerating the entire image processing workflow. In addition, unpaired approaches allow images from data archives to be exploited in new ways to provide SR solutions.

Materials
A 6-mm cylindrical core plug of heterogeneous Middle Eastern carbonate (MEC) was scanned at LR (10.72 µm) and HR (2.68 µm), giving a scale factor of 4. The imaging details are presented in Table 1. Basic settings, such as voltage, tube current, and exposure time, are the same for both scans, while the distance from the source determines the image resolution. The original 16-bit micro-CT images used in this study are available on the Digital Rocks Portal (https://www.digitalrocksportal.org/projects/362).

The original 3D 16-bit micro-CT LR/HR images were inherently paired because the position of the sample was fixed during the LR/HR scans. The images were cropped to 380 × 380 × 1025 voxels for LR and 1520 × 1520 × 4100 voxels for the corresponding HR to remove the background. Afterwards, the 16-bit images were converted to 8 bits using standard image normalisation,

$$p' = 255 \times \frac{p - p_{\min}}{p_{\max} - p_{\min}},$$

where $p$ is the grey-scale value, $p'$ is the normalised 8-bit value, and $p_{\min}$ and $p_{\max}$ are the minimum and maximum grey-scale values after eliminating the extreme values.
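A minimal NumPy sketch of this 16-bit to 8-bit conversion follows. The text does not say how the extreme values are eliminated; here we assume that p_min and p_max are taken as low and high percentiles (0.1% and 99.9% by default, hypothetical values) and that values outside that range are clipped.

```python
import numpy as np

def to_uint8(volume: np.ndarray, low_pct: float = 0.1, high_pct: float = 99.9) -> np.ndarray:
    """Min-max normalise a 16-bit volume to 8 bits.

    p_min/p_max are percentiles (an assumption) so that outlier voxels
    do not stretch the grey-scale range.
    """
    p_min, p_max = np.percentile(volume, [low_pct, high_pct])
    if p_max <= p_min:
        raise ValueError("degenerate intensity range")
    scaled = (volume.astype(np.float64) - p_min) / (p_max - p_min)
    # Values outside [p_min, p_max] are clipped to the ends of the 8-bit range.
    return np.round(255.0 * np.clip(scaled, 0.0, 1.0)).astype(np.uint8)
```

Applying the same percentile rule to both scans keeps the LR and HR grey-scale ranges comparable; in practice, fixed p_min/p_max values shared across volumes could be used instead.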

EDSR
EDSR was introduced by Lim et al. (2017) as a 2D multi-scale CNN-based deep learning framework for SISR image enhancement. Lim et al. used interpolated images as inputs, with the coupled HR images as GT. EDSR comprises two convolutional layers, a series of residual blocks (He et al. 2016), and upsampling blocks for resolution enhancement. In this paper, we extend the EDSR model to 3D, as shown in Fig. 1a. To reduce the computational burden, we reduce the number of filters in the convolutional layers from 64 to 32. Instead of inputting interpolated LR images, we use the native LR/HR images as inputs/outputs to retain the original image information. We also replace the pixel-shuffle upsampling of the original EDSR with trilinear upsampling, which brings the feature maps to the same scale as the output. The EDSR model was trained by optimising its weights and biases with the L1 loss function,

$$L_1 = \frac{1}{N}\sum_{i=1}^{N}\left|y_{gt,i} - y_{predicted,i}\right|,$$

where $y_{gt}$ is the ground truth data, $y_{predicted}$ is the data predicted by the neural network, and $N$ is the number of voxels. The detailed structure of the EDSR is shown in Figure S1 of the Supplemental Material.
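The trilinear upsampling and L1 loss can be sketched in NumPy. This is an illustration, not the authors' PyTorch code; in PyTorch the same operations would typically be `torch.nn.functional.interpolate(x, scale_factor=4, mode='trilinear')` and `torch.nn.L1Loss()`. The sketch applies linear interpolation along each axis in turn (trilinear interpolation is separable) and assumes the `align_corners=False` sampling convention.

```python
import numpy as np

def _linear_upsample_axis(x, factor, axis):
    n = x.shape[axis]
    # Map each output sample centre back to input coordinates (align_corners=False style).
    src = np.clip((np.arange(n * factor) + 0.5) / factor - 0.5, 0, n - 1)
    lo = np.floor(src).astype(int)
    hi = np.minimum(lo + 1, n - 1)
    shape = [1] * x.ndim
    shape[axis] = -1
    w = (src - lo).reshape(shape)  # interpolation weight, broadcast along the axis
    return (1.0 - w) * np.take(x, lo, axis=axis) + w * np.take(x, hi, axis=axis)

def trilinear_upsample(volume, factor=4):
    """Upsample a 3D array by linear interpolation along each of the three axes."""
    out = volume.astype(np.float64)
    for axis in range(3):
        out = _linear_upsample_axis(out, factor, axis)
    return out

def l1_loss(y_gt, y_predicted):
    """Mean absolute error between ground truth and prediction."""
    return float(np.mean(np.abs(np.asarray(y_gt, float) - np.asarray(y_predicted, float))))
```

Unlike pixel shuffle, trilinear upsampling has no learnable parameters and produces smooth intermediate values; the convolutional layers after it are responsible for restoring sharp edges.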

CinCGAN
GANs, proposed by Goodfellow et al. (2014), have been broadly applied in computer vision tasks, e.g., image segmentation (Luc et al. 2016; Souly et al. 2017), SR (Niu et al. 2020), image denoising (Liu et al. 2020; Niu et al. 2021), and image synthesis and manipulation (Nie et al. 2017; Zhang et al. 2017; Zha et al. 2020). One such GAN is the cycle-consistent GAN (CycleGAN), which was initially designed for image-to-image translation (Zhu et al. 2017). Yuan et al. (2018) presented an unpaired SR model, CinCGAN, which can generate SR images of high quality compared with paired SR methods. Figure 1b shows the architecture of CinCGAN, which is built from two CycleGANs. Training a CinCGAN requires the three datasets shown in Fig. 1b: (1) a low-resolution image (X), (2) a bicubic low-resolution image (Y) interpolated from Z, and (3) a high-resolution image (Z). The first CycleGAN mapping is denoted X → Y → X (black box in Fig. 1b). Generator G1 generates a fake image Y′ that resembles the clean bicubic LR image (Y) in order to confuse the discriminator D1. G1 thus acts as a deblurring filter that cleans the LR image (X), using the bicubic LR image (Y) as a reference, because Y is noise-free compared with the input LR image (X). Generator G2 learns the reverse mapping Y → X to constrain the otherwise under-constrained mapping X → Y. Discriminator D1 aims to distinguish the fake image Y′ generated by G1 from the real image Y. The second CycleGAN mapping is denoted X → Z → X (black dotted box in Fig. 1b). In this step, a 2D EDSR model is first pretrained on the bicubic LR image (Y) and HR image (Z). The trained G1 from the first CycleGAN and the pretrained EDSR are then combined into a new generator, G1 + EDSR, which generates a fake SR image Z′ resembling the real HR image (Z).
Similar to the first CycleGAN, generator G3 adds an inverse downscaling mapping Z → X to constrain the solution. Discriminator D2 aims to differentiate the fake SR image Z′ from the real HR image (Z). The weights and biases are optimised with a composite loss in which the total generator-adversarial loss is combined with a total variation (TV) loss and further weighted loss terms; one set of three weights balances the different losses in the first X → Y → X CycleGAN, and a second set of three weights balances the losses in the second X → Z → X CycleGAN. Details of each loss function are given in the Supplemental Material, and the detailed structure of CinCGAN is provided in Figure S2 of the Supplemental Material.
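To make the structure of a generator objective of this kind concrete, the sketch below combines an adversarial term with L1 cycle-consistency, identity, and total variation terms. The choice of terms follows the original CinCGAN formulation of Yuan et al. (2018) as we understand it; the exact losses used in this work are given in the Supplemental Material, and the weight values and the anisotropic TV form here are assumptions.

```python
import numpy as np

def l1(a, b):
    """Mean absolute difference between two arrays."""
    return float(np.mean(np.abs(np.asarray(a, float) - np.asarray(b, float))))

def tv_loss(img):
    """Anisotropic total variation of a 2D image (mean absolute neighbour difference)."""
    img = np.asarray(img, float)
    return float(np.abs(np.diff(img, axis=0)).mean() + np.abs(np.diff(img, axis=1)).mean())

def generator_objective(gan_loss, x, x_cycled, y, g1_of_y, y_fake, weights=(10.0, 5.0, 0.5)):
    """Hypothetical weighted generator loss for the X -> Y -> X cycle.

    gan_loss: adversarial loss of G1 against D1 (computed elsewhere)
    x, x_cycled: noisy LR input and its round trip G2(G1(x))
    y, g1_of_y: clean bicubic LR image and G1 applied to it (identity term)
    y_fake: G1(x), penalised by total variation to suppress noise
    """
    w_cyc, w_idt, w_tv = weights
    return (gan_loss
            + w_cyc * l1(x, x_cycled)
            + w_idt * l1(y, g1_of_y)
            + w_tv * tv_loss(y_fake))
```

The cycle term is what allows training without paired data: G1 is never compared with a matching clean image, only with its own reconstruction and with the distribution that D1 learns from Y.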

Mercury Intrusion Capillary Pressure
To quantify the porosity of the macro- and micropore systems of the tested carbonate sample, a Mercury Intrusion Capillary Pressure (MICP) test was conducted. The test was run using a POREMASTER® instrument by Quantachrome Instruments on another sample from the same block (Alqahtani et al. 2022). The results were analysed using a suite of Thomeer hyperbolas (Thomeer 1960), which can be used to decode different pore systems in porous media through type-curve matching and superposition (Clerke et al. 2008; Buiting and Clerke 2013). A Thomeer hyperbola can be expressed as

$$\frac{B_v}{B_\infty} = \exp\left(-\frac{G}{\log_{10}\left(P_c / P_d\right)}\right),$$

where $B_v$ is the volume of mercury injected, $B_\infty$ is the percentage of bulk volume intruded with mercury at infinite pressure, $G$ is a pore geometrical factor, $P_c$ is the injection pressure (capillary pressure), and $P_d$ is the displacement pressure required for mercury to intrude the largest pore throat. The Thomeer hyperbola parameters are depicted in Fig. 2a. The Thomeer hyperbolas matched to the experimental MICP data in Fig. 2b give a total porosity of 28.81%, of which macroporosity and microporosity account for 17.84% and 10.97% of the sample bulk volume, respectively.
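The superposition of Thomeer hyperbolas can be sketched in a few lines of Python. The B_inf values below are the matched macro- and microporosity from Fig. 2b; the G and P_d values are hypothetical placeholders, not the fitted parameters of this sample.

```python
import math

def thomeer_bv(pc, b_inf, g, pd):
    """Bulk volume (%) intruded at capillary pressure pc for one Thomeer pore system."""
    if pc <= pd:
        return 0.0  # no intrusion below the displacement pressure
    return b_inf * math.exp(-g / math.log10(pc / pd))

def total_bv(pc, systems):
    """Superpose several pore systems, each given as (b_inf, g, pd)."""
    return sum(thomeer_bv(pc, *s) for s in systems)

# B_inf from the matched MICP data; G and P_d (psi) are hypothetical placeholders.
systems = [(17.84, 0.2, 20.0),    # macropore system
           (10.97, 0.3, 400.0)]   # micropore system

for pc in (10.0, 100.0, 1e3, 1e4, 1e5):
    print(f"Pc = {pc:>8.0f} psi  ->  Bv = {total_bv(pc, systems):5.2f} %")
```

Because each hyperbola only approaches its B_inf asymptotically, the total intruded volume tends to 28.81% as P_c becomes very large, which is how the matched curves partition the total porosity between the two pore systems.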

Training Process
EDSR training data were extracted as image patches of 40³ voxels for LR and 160³ voxels for the corresponding HR, using a sliding window moving over the training/testing region with overlapping step sizes of 20 and 80 voxels for the LR and HR data, respectively. All LR patches were paired with the corresponding HR patches. In total, there were 2080 and 512 image patches for training and testing, respectively. Example image patches for the LR/HR ground truth images are shown in Fig. 1a. The Adam optimisation method was applied to update the weights and biases of EDSR (Kingma and Ba 2014). The learning rate was initially set to 10⁻⁴ and decreased tenfold every twenty epochs. A batch size of 6 was used to reduce computational cost, and training ran for 100 epochs. The same number of image patches was generated for CinCGAN: 40² pixels for LR and 160² pixels for HR. The LR/HR image patches were independently extracted from different FOVs with step sizes of 40 and 160 for the LR and HR data, respectively. We first trained the first X → Y → X CycleGAN for 100 epochs to restore the noisy input data to clean data. Then, an EDSR was pretrained as the upscaling model for training the second X → Z → X CycleGAN, using HR images and the corresponding bicubic-downsampled images. We then loaded the trained G1 from the first CycleGAN together with the pretrained EDSR and trained the second X → Z → X CycleGAN for another 50 epochs. All CinCGAN training used Adam optimisation (Kingma and Ba 2014) with a batch size of 8 and an initial learning rate of 10⁻⁴, halved every twenty epochs.
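The paired patch extraction and the step learning-rate schedules can be sketched as follows. The corner arithmetic assumes that the LR and HR volumes are exactly aligned with a scale factor of 4, as they are here; the function names are illustrative, not taken from the authors' code.

```python
import numpy as np

def paired_patch_corners(lr_shape, patch=40, step=20, scale=4):
    """Yield matching (LR, HR) corner indices for a sliding window over a 3D volume."""
    ranges = [range(0, n - patch + 1, step) for n in lr_shape]
    for z in ranges[0]:
        for y in ranges[1]:
            for x in ranges[2]:
                yield (z, y, x), (z * scale, y * scale, x * scale)

def extract_pair(lr, hr, lr_corner, hr_corner, patch=40, scale=4):
    """Cut a patch^3 LR cube and the co-located (patch*scale)^3 HR cube."""
    z, y, x = lr_corner
    zh, yh, xh = hr_corner
    hp = patch * scale
    return (lr[z:z + patch, y:y + patch, x:x + patch],
            hr[zh:zh + hp, yh:yh + hp, xh:xh + hp])

def step_lr(epoch, base=1e-4, factor=0.1, every=20):
    """Step decay: tenfold (factor=0.1) for EDSR, halving (factor=0.5) for CinCGAN."""
    return base * factor ** (epoch // every)
```

With a 40³ window and a step of 20, neighbouring LR patches overlap by half, so each voxel contributes to several training samples; the unpaired CinCGAN patches need no such corner bookkeeping because LR and HR patches are drawn from different FOVs.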
All training was conducted on an NVIDIA GeForce RTX 2080Ti GPU, and all code was developed using the PyTorch platform. After training, a validation 3D LR volume (380 × 380 × 512 voxels) that had never been seen by either EDSR or CinCGAN was fed into the trained models to produce a corresponding 4× SR volume (1520 × 1520 × 2048 voxels). Because of GPU memory limitations, the 3D EDSR model cannot take such a large 3D volume as a single input. We therefore split the validation volume into a series of sub-volumes (380 × 380 × 4 voxels) along the z-axis. Each LR sub-volume was reconstructed, and the results were stacked to form the full 3D SR validation image (1520 × 1520 × 2048 voxels). We did visually observe some inconsistent artefacts at the boundaries between sub-volumes in the z plane, shown in Fig. S3(a)-(b) of the Supplemental Material. These were caused by padding in the convolutional layers, since there is limited information at image boundaries. However, Fig. S3(c) of the Supplemental Material shows that these boundary artefacts do not result in segmentation errors.
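The split-reconstruct-stack procedure can be sketched as below. `nearest_neighbour_model` is a hypothetical stand-in for the trained 3D EDSR, used only so that the sketch runs; a real network would be called on each slab in the same way.

```python
import numpy as np

def super_resolve_in_slabs(lr_volume, model, slab=4, scale=4):
    """Super-resolve a large LR volume slab by slab along z, then stack the results."""
    sr_slabs = []
    for z0 in range(0, lr_volume.shape[2], slab):
        sub = lr_volume[:, :, z0:z0 + slab]      # e.g. 380 x 380 x 4 voxels
        sr = model(sub)
        expected = tuple(scale * s for s in sub.shape)
        if sr.shape != expected:
            raise ValueError(f"model returned {sr.shape}, expected {expected}")
        sr_slabs.append(sr)
    return np.concatenate(sr_slabs, axis=2)      # e.g. 1520 x 1520 x 2048 voxels

def nearest_neighbour_model(sub, scale=4):
    """Hypothetical stand-in for the trained network: nearest-neighbour upscaling."""
    return sub.repeat(scale, 0).repeat(scale, 1).repeat(scale, 2)
```

The stand-in has no receptive field beyond one voxel, so stacking its slabs is seamless. A convolutional network padded at each slab face does not have this property, which is why the seams described above can appear at slab boundaries.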

Validation
Running a 3D CinCGAN directly is time-consuming and exceeds the available GPU memory. Therefore, we followed a few simple steps, as demonstrated in our previous work, to reconstruct a 3D image using the 2D CinCGAN (Niu et al. 2020).
The procedure does not introduce coupling problems from the use of two different networks (Niu et al. 2020), because both the CinCGAN and the pretrained EDSR model are trained on the same training data. Figure S4(a), (b) in the Supplemental Material shows no visually apparent inconsistent artefacts at image boundaries in the z plane when the bicubic interpolation method is used, and the corresponding segmentation in Fig. S4(c) of the Supplemental Material also shows no boundary artefacts.
To ensure that the SR results can be fairly compared with the HR ground truth, histogram matching was applied to the SR validation images using the 'imhistmatch' function in MATLAB, which adjusts the histogram of the SR image to match that of the HR ground truth reference image.
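For readers without MATLAB, a rough NumPy analogue of histogram matching is sketched below. It maps each source grey level to the reference grey level with the same cumulative frequency, which is the approach taken by scikit-image's `match_histograms`; MATLAB's `imhistmatch` works on a binned histogram (64 bins by default), so results will differ slightly.

```python
import numpy as np

def match_histogram(source, reference):
    """Map source grey levels onto reference grey levels by matching cumulative histograms."""
    src_values, src_idx, src_counts = np.unique(
        source.ravel(), return_inverse=True, return_counts=True)
    ref_values, ref_counts = np.unique(reference.ravel(), return_counts=True)
    # Empirical CDFs of the two images.
    src_cdf = np.cumsum(src_counts) / source.size
    ref_cdf = np.cumsum(ref_counts) / reference.size
    # For each source level, take the reference level at the same cumulative frequency.
    matched_values = np.interp(src_cdf, ref_cdf, ref_values)
    return matched_values[src_idx].reshape(source.shape)
```

The mapping is monotonic, so it preserves the rank order of the SR grey levels (and hence the spatial structure) while forcing its histogram towards that of the HR-GT reference.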

Pore Network Modelling
We used the conventional PNM approach presented (and available online) in Raeini et al. (2017, 2018), which are updated versions of the original algorithms (Valvatne and Blunt 2004; Dong and Blunt 2009). Full details of the approaches can be found in the references therein. Further validations of the PNM are available in Bultreys et al. (2018), Zahasky et al. (2020), and Jackson et al. (2021).

Fig. 3 The overall workflow for validation of the reconstructed super-resolution images and petrophysical analyses
In summary, a maximal spheres algorithm is used to assign pore bodies and throats that represent the pore space. The pore bodies and throats are then assigned shape factors based on their geometry, and quasi-static, capillary-dominated drainage is simulated across the network at successive constant capillary pressures. At each capillary pressure equilibrium stage, single or multiphase transport (hydraulic or electric) can be simulated. Local conductivities are found either analytically or through empirical relationships, e.g., for corner flow. Pore-body potentials are solved for the network by enforcing conservation of flux at each pore body. These potentials can be averaged at the inlet and outlet and, combined with the corresponding fluxes, used to obtain macroscopic transport properties, e.g., permeability, relative permeability, and formation factor. The fluid properties of a water/decane system were used in our PNM simulations. Microporosity is not included in the PNM.
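The flux-conservation step can be illustrated on a toy network. This is not the Raeini et al. code: the network, its throat conductances, and the function names are hypothetical, and a dense linear solve stands in for the sparse solvers a real PNM would use.

```python
import numpy as np

def solve_network(n_pores, throats, inlet, outlet, p_in=1.0, p_out=0.0):
    """Solve pore-body potentials from flux conservation at every pore body.

    throats: iterable of (i, j, g) where g is the throat conductance
    (hydraulic for flow, electrical for formation factor).
    Returns (potentials, total flux leaving the inlet pores).
    """
    inlet, outlet = set(inlet), set(outlet)
    A = np.zeros((n_pores, n_pores))
    b = np.zeros(n_pores)
    for i, j, g in throats:
        # Flux from j into i is g * (p_j - p_i); summing over throats gives a graph Laplacian.
        A[i, i] += g
        A[j, j] += g
        A[i, j] -= g
        A[j, i] -= g
    for p in inlet | outlet:
        # Dirichlet boundary conditions at inlet and outlet pores.
        A[p, :] = 0.0
        A[p, p] = 1.0
        b[p] = p_in if p in inlet else p_out
    potential = np.linalg.solve(A, b)
    flux = 0.0
    for i, j, g in throats:
        if i in inlet and j not in inlet:
            flux += g * (potential[i] - potential[j])
        elif j in inlet and i not in inlet:
            flux += g * (potential[j] - potential[i])
    return potential, flux

def darcy_permeability(flux, viscosity, length, area, dp):
    """k = Q * mu * L / (A * dP) for a hydraulic network."""
    return flux * viscosity * length / (area * dp)
```

For example, two throats of conductance 2 in series between an inlet and an outlet pore give a middle-pore potential of 0.5 and a total flux of 1.0, i.e. an effective conductance of 1. With electrical conductances instead of hydraulic ones, the same solve yields the network resistance used for the formation factor.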

Results and Discussion
In this section, we first visually inspect the reconstructed grey-scale images, measure the resulting PSNR/SSIM, and report the computational performance of the SR algorithms. We then present an objective means of segmenting the data to determine macroporosity and microporosity. Microporosity maps are generated and quantified in terms of void fraction and heterogeneity. Lastly, typical petrophysical properties are evaluated using pore network modelling. Overall, we provide a robust quantitative assessment of the resulting SR images in comparison with HR ground truth images and MICP data. Figure 4 shows the four validation volumes: LR, HR ground truth (HR-GT), EDSR validation with histogram matching (EDSR-HM), and CinCGAN validation with histogram matching (CinCGAN-HM). The images show the finer features captured in the HR and SR images. In addition, all images are within a similar grey-scale range, which is important for image segmentation and the subsequent evaluation of physical accuracy. This is qualitatively evident in Fig. 4 and is also observed in the image histograms presented in Sect. 3.2.

Reconstructed Images
Table 2 provides a performance comparison of the EDSR and CinCGAN networks. CinCGAN needed less time to train but more time for SR reconstruction than EDSR; overall, the total computational time of the 2D CinCGAN remained lower than that of EDSR. In addition, the PSNR and SSIM of EDSR were 15.93% and 35.04% higher than those of CinCGAN, respectively. This is because EDSR, as a CNN-based method, can directly learn the mapping between LR and HR data from paired data, whereas the unpaired, GAN-based CinCGAN introduces more uncertainty when generating fake data from the learned distribution. In this study, both PSNR and SSIM are lower than commonly reported values (Lim et al. 2017). This is because reported SR models are usually trained on HR images and corresponding LR images generated by interpolating (downsampling) the HR images, whereas our training data were obtained directly from separate LR and HR micro-CT scans. Uncertainties in the micro-CT scanning and reconstruction processes cause slight differences between the image pairs, which lower the PSNR and SSIM values. In Fig. 5e-h, discrepancies between the images can be seen. The LR image in Fig. 5e does not capture the grey-scale textures of the microporous regions well, and the edges between the grain and macropores lack sharpness. The blue boxes in Fig. 5f-h show detailed differences between the HR-GT, EDSR-HM, and CinCGAN-HM images. The EDSR-HM image in Fig. 5g restores the high-frequency information when compared with the HR-GT image in Fig. 5f. In contrast, the CinCGAN-HM image in Fig. 5h contains what appear to be unrealistic grey-scale textures. The gradients of the line profiles at the interfaces between the macropores and grain were also similar for all images. A slight lag can be observed in CinCGAN-HM from the macropore phase to the grain phase in Fig. 5i. This is because GAN-based approaches can introduce brightness inconsistency (Gulrajani et al. 2017).
As a result, the histogram of CinCGAN-HM still differed slightly from that of HR-GT after histogram matching.
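For reference, PSNR on 8-bit images can be computed as below; SSIM is more involved and is available, for example, as `skimage.metrics.structural_similarity`. The `relative_gain` helper is our own and assumes that the percentage differences in Table 2 are expressed relative to the CinCGAN values.

```python
import numpy as np

def psnr(reference, test, data_range=255.0):
    """Peak signal-to-noise ratio in dB: 10 * log10(MAX^2 / MSE)."""
    mse = np.mean((reference.astype(np.float64) - test.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(data_range ** 2 / mse))

def relative_gain(a, b):
    """Percentage by which metric a exceeds metric b (e.g. EDSR relative to CinCGAN)."""
    return 100.0 * (a - b) / b
```

Because PSNR depends only on the voxel-wise squared error, even a one-voxel misregistration between separately acquired LR and HR scans lowers it noticeably, which is consistent with the lower values reported here.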

Image Segmentation
The watershed-based method was applied to segment the image volumes (Beucher 1979).
To implement watershed segmentation, we first defined two markers, for the macropore and solid phases. The morphological watershed transformation algorithm (Beucher 1992) was then applied for region growing into the interphase regions. In carbonate rock, micropores are also called sub-resolution pores (Lin et al. 2016) because they cannot be resolved in micro-CT images due to resolution limitations. Micropores are therefore defined relative to the image resolution. For example, the voxel size of our high-resolution carbonate data is 2.68 µm, so any pore with a diameter smaller than 2.68 µm is regarded as a micropore. Two regions (macropores and grains) were segmented first, and the microporosity was then defined within the 'grain' phase as a subsequent step. The main challenge for segmentation is threshold selection, which usually introduces user bias. To compare the histograms of the HR-GT, EDSR-HM, and CinCGAN-HM images with that of the LR image, we cropped a sub-volume (380 × 380 × 512 voxels) from the HR-GT, EDSR-HM, and CinCGAN-HM images with the same number of voxels as the LR image. Figure 6a shows the intensity histograms of the LR, HR-GT, EDSR-HM, and CinCGAN-HM images. The four images have similar histograms, which means they can share similar segmentation thresholds. This provides a comparative way to quantitatively appraise the physical accuracy of the images. The optimal segmentation thresholds, however, cannot be determined directly from the histograms in Fig. 6a because of the wide intensity range with relatively high frequency between the two main peaks. We therefore calculated the image gradient magnitude map versus voxel intensity for the HR-GT image, as shown in Fig. 6b, to determine the optimal thresholds. Regions of low gradient magnitude with high frequency indicate pure phases (macropores or grain), while regions of high gradient magnitude are interfacial regions.
We calculated intensity gradient magnitudes for the HR-GT image from twenty random interfacial regions. The minimum gradient magnitude among these, 10.75 (intensity variation per voxel), was then selected as the threshold for pure phases: regions with a gradient magnitude below 10.75 were considered pure phase (macropore or grain). In Fig. 6c, we extract a histogram of only the regions with gradient magnitudes between 0 and 10.75. The extracted histogram displays a clear separation between the macropores and grains. Therefore, based on Fig. 6c, we selected the optimal thresholds for watershed segmentation as 0-55 for macropore and 65-255 for grain. In addition, we generated extra segmentations of the validation data by increasing/decreasing the optimal thresholds for sensitivity analyses.

Fig. 6 Optimal threshold determination on validation images for watershed segmentation. a Image intensity histograms of LR, HR-GT, EDSR-HM, and CinCGAN-HM sub-volumes (380 × 380 × 512 voxels), b Image intensity versus gradient magnitude histogram of HR-GT, c Image intensity versus gradient magnitude histogram of HR-GT between 0 and 10.75 intensity variation per voxel
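The gradient-based threshold selection and the marker definition can be sketched as follows. The 10.75 gradient cut-off and the 55/65 thresholds come from the text; the use of `np.gradient` (central differences) for the gradient magnitude is an assumption, as the text does not specify the gradient operator. The watershed step itself could then be run, for example, with `skimage.segmentation.watershed(gradient, markers)`, which is one option rather than necessarily the implementation used here.

```python
import numpy as np

def gradient_magnitude(volume):
    """Voxel-wise intensity gradient magnitude (intensity change per voxel)."""
    gz, gy, gx = np.gradient(volume.astype(np.float64))
    return np.sqrt(gz ** 2 + gy ** 2 + gx ** 2)

def pure_phase_histogram(volume, max_grad=10.75, bins=256):
    """Histogram of intensities restricted to low-gradient (pure-phase) voxels."""
    mask = gradient_magnitude(volume) < max_grad
    hist, _ = np.histogram(volume[mask], bins=bins, range=(0, 256))
    return hist

def watershed_markers(volume, pore_max=55, grain_min=65):
    """Seed labels for watershed: 1 = macropore, 2 = grain, 0 = left for region growing."""
    markers = np.zeros(volume.shape, dtype=np.uint8)
    markers[volume <= pore_max] = 1
    markers[volume >= grain_min] = 2
    return markers
```

Removing the high-gradient interfacial voxels empties the valley between the two histogram peaks, which is what makes the 55/65 cut-offs visible in Fig. 6c; voxels between 55 and 65 are left unlabelled and assigned by the watershed.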
Image segmentation provides a quantitative way to evaluate geometrical properties of an image. The differences between the EDSR-HM and CinCGAN-HM images can be observed in the segmented data (optimal thresholds). Figure 7a-d depicts the 2D segmentations over many pores. Both the EDSR-HM and CinCGAN-HM images captured the finer micropores that were also captured in the HR-GT image. Regions of interest (ROI) images are provided in Fig. 7e-h, which demonstrate that the EDSR-HM image can recover more representative pore structures than the CinCGAN-HM image, as noted by the blue box. Also, note that the segmented data presented in Fig. 7 are taken from the same region as the grey-scale images presented in Fig. 5.

3D Local Porosity Maps
To quantify the microporosity, we first multiplied the grey-scale image by the corresponding segmented image (macropore: 0, grain: 1) to obtain a grey-scale image containing the 'grain' phase only. A local porosity map for the grain phase was then generated by

$$\phi_{micro}(x,y,z) = \frac{T_{grain} - T(x,y,z)}{T_{grain} - T_{pore}},$$

where $T(x,y,z)$ is the intensity at local position $(x, y, z)$ in the 3D image, $T_{pore}$ is the threshold of the pure pore phase, $T_{grain}$ is the threshold of the pure grain phase, and $\phi_{micro}$ is the micropore porosity, ranging between 0 and 1.
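A minimal NumPy sketch of the local porosity map and the resulting bulk porosity fractions follows. The clipping to [0, 1] reflects the stated range of micropore porosity; the threshold values in any call are placeholders, since the pure-phase thresholds used for the map are not listed in the text.

```python
import numpy as np

def local_microporosity(volume, grain_mask, t_pore, t_grain):
    """Linear sub-resolution porosity: 1 at t_pore, 0 at t_grain, clipped to [0, 1].

    grain_mask is 1 for the segmented 'grain' phase and 0 for resolved macropores,
    so macropore voxels carry no microporosity.
    """
    t = volume.astype(np.float64)
    phi = (t_grain - t) / (t_grain - t_pore)
    return np.clip(phi, 0.0, 1.0) * grain_mask

def porosity_fractions(macro_mask, phi_micro):
    """Macro- and microporosity as fractions of the bulk volume."""
    n = macro_mask.size
    return macro_mask.sum() / n, phi_micro.sum() / n
```

Summing the local map rather than counting voxels lets a partially porous grain voxel contribute a fraction of its volume, which is how image-based microporosity can be compared with the MICP microporosity.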
The generated local porosity maps for the HR-GT, EDSR-HM, and CinCGAN-HM images are depicted in Fig. 8. Firstly, Fig. 8a-d shows that both the EDSR-HM and CinCGAN-HM images accurately identify the microporosity present in the HR-GT image. However, the ROIs in Fig. 8e-h show that the EDSR-HM image (Fig. 8g) restores most of the microporosity characteristics of the HR-GT image, while the CinCGAN-HM image shows a slightly larger fraction of microporosity. Table 3 provides the macro-/microporosity values for the 3D validation volumes using the optimal segmentation thresholds. Both the micro- and macroporosity of the HR-GT image are representative of the MICP experimental data. The slight differences between the MICP data and the segmented data were caused by segmentation error as well as heterogeneity, since the validation images were not taken from the same sample as that used for MICP. In contrast, the LR image showed a large discrepancy from the MICP data. The agreement between the HR-GT image and the MICP results indicates that our selected optimal thresholds were accurate enough to represent the geometrical information of the volume.
Overall, our SR models provided relatively consistent porosity results. Compared with the HR-GT image and MICP data, the EDSR-HM image slightly overestimated the macro-/microporosity, while the CinCGAN-HM image slightly underestimated them. In terms of bulk micro- and macroporosity, the results from both the EDSR-HM and CinCGAN-HM images were close to those of the corresponding HR-GT image. Additional porosity results for other segmentations are given in S5-S6 of the Supplemental Material to account for uncertainty in the threshold values.
In addition, we investigated how the microporosity was distributed in the images. To quantify the microporosity distribution, we calculated Dykstra-Parsons coefficient curves, which measure the degree of heterogeneity (Dykstra and Parsons 1950; Tiab and Donaldson 2015). Figure 9 shows the Dykstra-Parsons coefficient curves for the HR-GT, EDSR-HM, CinCGAN-HM, and LR images. The Dykstra-Parsons coefficients of the CinCGAN-HM image were closer to those of the HR-GT image than those of the EDSR-HM image were. This indicates that CinCGAN can reproduce the heterogeneity of the microporosity distribution at a level comparable to the HR-GT image. In addition, the Dykstra-Parsons coefficients of the LR image generally deviated more from the HR-GT values than those of the EDSR-HM and CinCGAN-HM images.
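The Dykstra-Parsons coefficient is conventionally defined as V = (k50 − k84.1) / k50, where k50 is the median and k84.1 is the value exceeded by 84.1% of the samples. The sketch below applies it to a set of positive values such as local porosities; it reads the percentiles directly from the data instead of fitting a log-normal probability plot, and how the curves in Fig. 9 are assembled (e.g. over sub-volumes) is not stated in the text.

```python
import numpy as np

def dykstra_parsons(values):
    """V = (k50 - k84.1) / k50 from a set of positive values (e.g. local porosities)."""
    v = np.asarray(values, dtype=np.float64)
    k50 = np.percentile(v, 50.0)
    k841 = np.percentile(v, 15.9)  # value exceeded by 84.1% of the samples
    return float((k50 - k841) / k50)
```

V is 0 for a perfectly homogeneous distribution and approaches 1 as heterogeneity increases, so a closer match of V between the SR and HR-GT images means that the spread of microporosity, not just its mean, is reproduced.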

PNM for Petrophysical Analyses
The previous results showed that both the EDSR-HM and CinCGAN-HM images resolve the macro-/micropores accurately compared with the HR-GT results. We further applied PNM to the validation images to compute permeability, formation factor, capillary pressure, and relative permeability. The HR image was used as the ground truth, and the LR image was used as a baseline to assess the accuracy gained by using the SR algorithms. Figure 10 shows the single-phase and electrical flow PNM simulations for the LR, HR-GT, EDSR-HM, and CinCGAN-HM images. Each category contains seven segmentations with different thresholds around the predetermined optimal thresholds. The HR results are plotted on the x-axis as the benchmark. As the thresholds increase, porosity and absolute permeability increase as expected for the LR, EDSR-HM, and CinCGAN-HM images (Fig. 10a, b). In general, the porosity results of both EDSR-HM and CinCGAN-HM were consistent with the HR-GT images over the tested threshold range. However, the absolute permeability results show discrepancies: the CinCGAN-HM images deviated less from the HR-GT images than the EDSR-HM images did. Conversely, the formation factor results in Fig. 10c show that the EDSR-HM results were more precise than the CinCGAN-HM results. In addition, none of the simulation results for the LR images correspond to the HR image results, and they display high variability over the tested thresholds. This is because the LR data have ambiguous boundaries between the pore and grain phases, as demonstrated by the high number of voxels between the two main histogram peaks (Fig. 6a). Figure 10d shows the pore size distribution measured using the local distance maximum method (Shabro et al. 2012; Chung et al. 2020; Wang et al. 2020). The pore size distributions of the EDSR-HM and CinCGAN-HM images were mostly equivalent to that of the HR-GT image, with only a few smaller pores resolved in the HR-GT images.
The LR image, by comparison, resolved only the larger pores and gave a narrower range of pore sizes than the HR-GT, EDSR-HM, and CinCGAN-HM images; it also contained many more large pores than its SR and HR counterparts, suggesting that the pore space was over-segmented at the optimal threshold. Figure 11 shows the multiphase flow PNM results for the LR, HR-GT, EDSR-HM, and CinCGAN-HM images. The relative permeability curves in Fig. 11a showed that both the EDSR-HM and CinCGAN-HM images aligned closely with the HR-GT image, whereas the LR results correlated less well. In particular, in the range 0.6 < S_w < 0.8, the LR results showed discontinuous, step-like features. This effect was more prominent in the non-wetting phase relative permeability curves in Fig. 11b, while the EDSR-HM and CinCGAN-HM results remained consistent with the HR results. Because pore and throat sizes control how relative permeability varies across the water saturation range, these step-like features indicate a narrower variation of pore sizes in the LR image, consistent with the pore size distribution in Fig. 10d. In contrast, the relative permeability curves of the EDSR-HM, CinCGAN-HM, and HR-GT images showed smooth transitions across the entire saturation range. This indicates that EDSR-HM and CinCGAN-HM resolve relatively small macropores comparably to the HR-GT image, whereas the LR image resolves only the larger macropores. The capillary pressure curves are shown in Fig. 11c. The capillary pressure variations of the EDSR-HM and CinCGAN-HM images were more consistent with the HR-GT image than those of the LR image. The capillary pressure curves of the LR image generally reached the irreducible water saturation earlier than their EDSR-HM, CinCGAN-HM, and HR-GT counterparts.
This indicates that fewer of the smaller macropores were resolved in the LR image. At a given saturation, the capillary pressure of the LR image was lower than that of the EDSR-HM, CinCGAN-HM, and HR-GT images. Since the capillary entry pressure of a throat scales inversely with its radius (Young-Laplace, P_c = 2σcosθ/r), this means that the average resolved pore size of the LR image was larger than in the EDSR-HM, CinCGAN-HM, and HR-GT images. Overall, the HR-GT, EDSR-HM, and CinCGAN-HM curves agreed well with the MICP data in the range 0.5 < S_w < 1, whereas their capillary pressure was underestimated for S_w < 0.5. This can be attributed to a resolution limit: MICP detects much finer micropores than the micro-CT data can resolve. At low S_w, more of the non-wetting phase invades these micropores in the MICP experiment, so the non-wetting phase capillary pressure measured by MICP is greater than that estimated from the HR images. Note also that MICP was performed on a different core plug of the same sample, not on the plug used for imaging. In addition, the relative permeability and capillary pressure results in Fig. 11 show that both the EDSR-HM and CinCGAN-HM images were less sensitive to threshold variation than the HR-GT image. This reduces user bias when choosing the segmentation settings. The effect occurs because the SR deep learning models use a quantisation technique to reduce model size and computational cost (Hong et al. 2020). Consequently, the quantised greyscale images of EDSR-HM and CinCGAN-HM contain less noise and fewer intermediate greyscale values, which in turn reduces their sensitivity to the threshold value.
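The link between fewer intermediate greyscale values and lower threshold sensitivity can be checked on a toy example. The sketch below is hypothetical: it does not use EDSR or CinCGAN output and does not model network quantisation itself. It compares two synthetic grey-value populations that differ only in the fraction of partial-volume voxels lying between the pore and grain peaks, and sweeps seven thresholds around a nominal optimum. Peak positions, widths, the 30% pore fraction, and the threshold range are all illustrative assumptions.

```python
import numpy as np


def synthetic_grey(n_voxels, mixed_fraction, seed=0):
    """Hypothetical bimodal grey values: a pore peak near 60, a grain peak
    near 180, and a fraction of partial-volume voxels spread uniformly
    between the two peaks."""
    rng = np.random.default_rng(seed)
    n_mixed = int(mixed_fraction * n_voxels)
    n_pore = int(0.3 * (n_voxels - n_mixed))
    n_grain = n_voxels - n_mixed - n_pore
    return np.concatenate([
        rng.normal(60.0, 10.0, n_pore),
        rng.normal(180.0, 10.0, n_grain),
        rng.uniform(60.0, 180.0, n_mixed),
    ])


def porosity_curve(grey, thresholds):
    """Porosity for each threshold: the fraction of voxels darker than it,
    i.e. the empirical CDF of the grey values at those thresholds."""
    ordered = np.sort(grey)
    return np.searchsorted(ordered, thresholds) / ordered.size


def porosity_spread(grey, thresholds):
    """Range of porosity over the threshold sweep (a sensitivity proxy)."""
    curve = porosity_curve(grey, thresholds)
    return curve.max() - curve.min()


# Seven thresholds around a hypothetical optimal threshold of 120.
thresholds = np.linspace(100.0, 140.0, 7)
noisy = synthetic_grey(100_000, mixed_fraction=0.20)  # many intermediate values
clean = synthetic_grey(100_000, mixed_fraction=0.02)  # few intermediate values

spread_noisy = porosity_spread(noisy, thresholds)
spread_clean = porosity_spread(clean, thresholds)
print(f"porosity spread, 20% intermediate voxels: {spread_noisy:.4f}")
print(f"porosity spread,  2% intermediate voxels: {spread_clean:.4f}")
```

Because porosity is the cumulative grey-level histogram evaluated at the threshold, its rate of change with threshold equals the histogram density there; fewer voxels in the valley between the peaks therefore gives a flatter porosity curve.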

Conclusion
A comparative study was conducted using paired and unpaired super-resolution deep learning models to produce physically accurate digital rock images. A carbonate rock sample was scanned at low resolution and at 4× higher resolution for EDSR and CinCGAN training. We then used EDSR and CinCGAN to reconstruct an unseen low-resolution validation volume (380 × 380 × 512 voxels) into its super-resolution counterpart (1520 × 1520 × 2048 voxels). A gradient-based method was implemented to select the optimal thresholds for image segmentation, and a range of segmentations for macropores and grains was generated around these thresholds using a watershed-based method. The macroporosity and microporosity obtained from the watershed segmentations were consistent with the HR image results as well as with the MICP experimental data.
Furthermore, petrophysical properties were simulated using a PNM for water/decane drainage. Comparing the petrophysical properties of the EDSR and CinCGAN images with those of the high-resolution ground truth images showed that both the paired EDSR and the unpaired CinCGAN methods precisely restore the sharpness of pore structures that were not well resolved in the LR image. In addition, the petrophysical properties of the EDSR and CinCGAN images were equivalent to those of the HR images across the range of segmentations, whereas the LR image could not reproduce the characteristics of the HR image.
Unlike EDSR, a CNN-based approach that learns a direct mapping between LR and HR data, CinCGAN aims to generate realistic spatial features that follow the distribution of the real data. CinCGAN therefore introduces more uncertainty than EDSR, because the realistic detail it produces is generated rather than mapped directly from paired data. Table 4 provides an overall performance comparison of EDSR and CinCGAN; details of the quantitative analyses are given in the Supplemental Material. Our results showed that CinCGAN generates realistic SR images with performance equivalent to EDSR while requiring 22.5% less computational time when both training and reconstruction times are considered. The unpaired GAN-based method is therefore more flexible and less time-consuming than the paired CNN method for practical digital rock applications.
One critical challenge is how well a trained SR model generalises to other digital rock images with similar textures. By comparing the physical accuracy of SR images with HR ground truth, Jackson et al. (2021) showed that a deep learning SR model can precisely reconstruct SR images of other micro-CT rock images with similar textures. This suggests that a new model may not need to be trained for every rock sample. Domain transfer techniques (Zhu et al. 2017; Tang et al. 2022a, b) offer another possible means of integrating data sets into larger standard sets for image enhancement and quantification.
Overall, we introduced an integrated workflow to enhance digital rock image resolution and examined the physical accuracy of paired and unpaired deep learning methods. Our results showed that both paired EDSR and unpaired CinCGAN reconstruct physically accurate SR images equivalent to the HR ground truth image. This unlocks new applications of unpaired deep learning for digital rock image enhancement. The unpaired approach accelerates the application of SR methods because image registration is not required. Furthermore, decoupled digital rock data from repositories such as the Digital Rocks Portal (https://www.digitalrocksportal.org/) can be exploited more efficiently to enhance a wide range of geological images in a physically accurate way. Further studies of unpaired methods could address resolution enhancement of multimineral rock images, or of images from other modalities such as transmission electron microscopy, scanning electron microscopy, and X-ray computed tomography.

Acknowledgements
The original micro-CT carbonate data used in this paper are available at https://www.digitalrocksportal.org/projects/362.

Author contributions
All authors contributed to the study. Conceptualisation, data processing, methodology, deep learning simulation, analysis of results, and manuscript writing were carried out by Yufu Niu. Numerical simulation was performed by Samuel J. Jackson. Data acquisition was conducted by Naif Alqahtani. Supervision and manuscript review were performed by Ryan T. Armstrong and Peyman Mostaghimi.

Funding
The authors have not disclosed any funding. Open Access funding enabled and organized by CAUL and its Member Institutions.

Table 4 Overall comparison of EDSR and CinCGAN performance using the optimal segmentation. The amount of training data for both networks is similar, and reconstruction time is the time taken to reconstruct a 1520 × 1520 × 2048 voxel SR volume from the 380 × 380 × 512 voxel LR validation input. The best performance value is in bold font.