Improved Fracture Segmentation from Unwrapped Drill-Core Images Using an Innovative Two-Stage Segmentation Approach

While machine learning (ML) provides a powerful tool for image analysis, obtaining accurate fracture segmentation from high-resolution core images is challenging. A major reason is that the segmentation quality of large and detailed objects, such as fractures, is limited by the capacity of the segmentation branch. This challenge can be seen in the Mask Region-based Convolutional Neural Network (Mask R-CNN), a common and well-validated instance segmentation model. This study proposes a two-stage segmentation approach using Mask R-CNN to improve fracture segmentation from unwrapped-core images. Two CNN models are used: the first model processes full-size unwrapped-core images to detect and segment fractures; the second model performs a more detailed segmentation by processing smaller regions of the images that include the fractures detected by the first model. In addition, the procedure uses a new architecture of Mask R-CNN with a point-based rendering (PointRend) neural network module that can increase segmentation accuracy. The method is evaluated on approximately 47 m of core from four boreholes and improves on previous fracture segmentation methods, achieving an increase in the average intersection over union of approximately 27% over the baseline (one-stage segmentation with standard Mask R-CNN). The enhanced fracture segmentation provides a means of obtaining accurate fracture apertures with an average error of less than 1 mm, a reduction of 0.5 mm from the baseline method. This work presents a novel contribution towards developing an ML-based workflow for core-image analysis.


Introduction
Unwrapped-core images present a detailed view of the core surface and the structural features, such as fractures. In addition to being an alternative to physical cores, core images offer an opportunity for developing intelligent core analysis systems aided by the rapid advances in computer vision and machine learning (ML). The images provide a high-resolution three-dimensional representation of the core and its fracture characteristics, including fracture aperture, roughness, orientation, and spacing (Chakraborty and Mukherjee 2020; Tiwari et al. 2017). Studying fine features of high-resolution images is, however, still challenging. A major challenge to developing automated fracture identification is achieving the required accuracy for the task, especially in defining the fracture edges. Currently, accurate fracture segmentation for core analysis is lacking because of the limited research in this area. This study thus presents an innovative ML-based approach for accurate and high-resolution fracture segmentation from unwrapped-core images.
Automated and accurate fracture segmentation can facilitate the fracture characterization process by offering a significant reduction in the time and effort involved in the manual procedure. Fracture characterization is essential in many petroleum and mining applications, such as reservoir evaluation, reservoir modeling, and the assessment of rock mass strength (Agosta et al. 2007; Deere 1964; Nelson 2001a). Accurate and reliable fracture segmentation can be particularly beneficial for detailed fracture analyses, such as estimating fracture roughness or aperture sizes. For example, the recent work by Al-Fahmi et al. (2021) required time-consuming manual digitization of fractures in unwrapped-core images to accurately estimate fracture roughness and mismatch, which would be significantly faster with automatic identification. Moreover, automatic aperture calculations can be highly advantageous, if the fracture is segmented accurately, not only by accelerating the analysis but also by adding accuracy, as aperture sizes along the fracture can be considered instead of relying on a limited number of representative measurements. Fracture aperture has a major impact on the porosity and permeability of fractured reservoirs; for example, fracture porosity can have a significantly greater contribution to the fluid transport than the matrix porosity because of the greater interconnectedness (Nelson 2001b). Fracture aperture distribution is also important in multi-phase flow analysis and the relative permeability computation, as it can affect the capillary pressure and fluid saturation (Aljehani et al. 2018; Nelson 2001b). In this study, we develop a two-stage segmentation procedure that can enhance the segmentation of fractures from high-resolution core images and enable automatic calculation of fracture apertures.
Previous works are mainly focused on fracture detection from common fracture resources. One focus is on using core-tray images (Lemy et al. 2001; Ozturk and Saricam 2018). Unlike core-tray images, unwrapped-core images are taken by scanning the core in a 360° mode to capture the core surface from all angles while the core is rotated around its major axis (Paulsen et al. 2002; Schepers et al. 2001; Tiwari et al. 2017). The resulting images show a full view of fractures; therefore, they are more beneficial than core-tray images for fracture analysis and provide a wider range of applications. For example, fractures, especially those with high dips, are used as matching features during core reorientation by correlating them with the fractures from borehole image logs, allowing for a low-cost and indirect core orientation method (Paulsen et al. 2002). In addition, 360° fractures from unwrapped-core images allow for integration with borehole image logs (Schepers et al. 2001). Another major focus of previous studies is on fracture detection from borehole image logs (Cruz et al. 2017; Dias et al. 2020; Xavier et al. 2015). Fractures from cores are, however, essential in reducing uncertainties in log interpretation and beneficial in detecting fine fractures undetected by image logs (Fernández-Ibáñez et al. 2018; Genter et al. 1997). Modern core scanners provide high-resolution unwrapped-core images with resolutions up to 40 pixel/mm (Tiwari et al. 2017).
Moreover, previous works rely heavily on image processing methods, but ML, such as the convolutional neural network (CNN), is more robust for image analysis. CNN offers better generalization than traditional image processing algorithms, owing to its ability to learn important features and disregard irrelevant data (Goodfellow et al. 2016). Modern ML algorithms can provide high-quality image analysis, including instance and semantic segmentation (e.g., Long et al. 2015). While both are powerful ML tools, instance segmentation is distinctly different from semantic segmentation, which is more commonly used in digital core analysis and seismic data interpretation (Karimpouli and Tahmasebi 2019; Niu et al. 2020; Wang and Chen 2021). With semantic segmentation, objects of the same category are segmented as a single object, while with instance segmentation objects are detected separately. An example application of instance segmentation, using the Mask Region-based CNN (Mask R-CNN) (He et al. 2017), is provided in Fig. 1a, where features in a street-level image are identified and segmented. Such technologies are an important step towards the operation of autonomous vehicles. So can this technology be translated to core analysis applications? The aim would be to develop an instance segmentation algorithm that identifies important core structural features, such as those displayed in Fig. 1b, where Mask R-CNN is applied to core images for fracture identification. With this technology, fractures could be segmented and characterized on a per-fracture basis in an automated way. The efficacy of the method for core analysis, however, remains to be tested.
Mask R-CNN and its base object detection algorithms, Fast R-CNN and Faster R-CNN, can provide accurate fracture detection (e.g., Alzubaidi et al. 2022 and Dias et al. 2020). In our previous work (Alzubaidi et al. 2022), we used Mask R-CNN to detect and analyze fractures from unwrapped-core images. The method, which was evaluated on two boreholes, achieved a Precision of over 90% for bounding box detection. On the other hand, the standard form of the Mask R-CNN algorithm had a limited fracture segmentation accuracy due to the uncertainty of pixel classification at fracture edges. Fracture segmentation accuracy was sufficient for obtaining fracture skeletons and estimating fracture dip angle with an error of almost 2° and dip direction with less than 11° error. However, uncertain fracture boundaries introduce a challenge in obtaining more detailed fracture characteristics, such as aperture size. The limitation arises because the accuracy of object segmentation by ML models depends on the size of the object compared to the resolution of the output raw mask (Kirillov et al. 2020), which is usually limited by computational time and memory requirements (e.g., He et al. 2017). In high-resolution core images, fractures appear as large objects that span the entire width of the image, but their apertures are finer features that require high-resolution images. The segmentation of such features is directly affected by the trade-off between field of view (computational and memory requirements) and image resolution. Therefore, we propose a solution using a two-stage segmentation approach and a modified Mask R-CNN model.
Two-stage segmentation has been applied in various applications where multi-scale features are present in an image. Wang et al. (2019) developed a two-stage segmentation framework using two three-dimensional U-Nets to enhance the segmentation of organs from computed tomography images. The first U-Net segments the whole image, and the second U-Net segments a small part of the image that includes the organ of interest detected by the first model. A similar approach was used by Amiri et al. (2020) to refine the segmentation of ultrasound breast images, which improved the average Dice score, a segmentation metric (Tustison and James 2009), by up to approximately 14%. Alternatively, Kirillov et al. (2020) introduced a point-based rendering (PointRend) module to produce high-resolution masks from semantic and instance segmentation models with only a minor increase in the memory requirement. PointRend can be incorporated with Mask R-CNN to improve segmentation quality.
Herein, we develop and evaluate a two-stage segmentation approach using Mask R-CNN with the PointRend module (Mask R-CNN + PointRend) to enhance fracture segmentation from core images. The first stage detects full-size fractures in the input image. The second stage provides finer segmentation by processing small regions of the image that include detected fractures. The models are trained with images from two boreholes and evaluated on images from four boreholes, including core images from a new block. The proposed method focuses on the enhancement of pixel-wise segmentation of detected fractures and is not concerned with optimizing the detection accuracy. The segmentation results are evaluated by (1) visual inspection, (2) calculating the intersection over union (IoU) between manual and predicted segmentation of fractures in new core images, and (3) calculating fracture aperture sizes.
The article is organized into four sections. Section 2 provides a description of the unwrapped-core images used in our work and the preparation of the datasets for training and testing the segmentation models. The section also introduces the proposed two-stage procedure, the tested architectures for the instance segmentation models, the training of the models, and the evaluation metrics. Section 3 discusses results from model training and presents a comprehensive quantitative and qualitative analysis of the method performance on the test images. Lastly, conclusions of the study are provided in Sect. 4.

Methodology
The proposed procedure is demonstrated in Fig. 2. The input is an unwrapped-core image with an unrestricted core length (up to 1 m, approximately). Fractures in the image are detected and segmented using two ML models. Each model is developed using training images of different sizes according to its purpose. Consequently, Model A is trained on whole core images and is used mainly to detect fractures by capturing their shape, while Model B is trained on high-resolution fracture patches and is used to refine fracture segmentation. Finally, the refined segmentation is used to calculate fracture apertures.

Data Preparation
The core images were obtained from four boreholes located in the Barents Sea (boreholes 7220/6-1 and 7220/11-3) and North Sea (boreholes 16/2-17 B and 16/2-18 S).  The images were acquired from Lundin Energy (Lundin). Boreholes 7220/6-1 and 7220/11-3 are located on the Loppa High in the Barents Sea, whereas boreholes 16/2-17 B and 16/2-18 S are from the Johan Sverdrup field and are located on the Utsira High in the North Sea. Core intervals and net lengths of the images used for each borehole are summarized in Table 1. The sections from boreholes 7220/6-1 and 7220/11-3 were from the Ørn formation that is dominated by carbonate rock, while the sections from 16/2-17 B and 16/2-18 S were taken from the Asgard formation, which is composed mainly of claystone, marlstone, and limestone, and from the basement group that comprises highly fractured granite (The Norwegian Petroleum Directorate). The images were divided into three datasets for training, validating, and testing the CNN models. The training and validation images were selected from two boreholes, as detailed in Table 2, with an initial number of 100 images that included 204 fractures. Data augmentation was used to expand this number to 1,200 images, which were divided into 1,000 images for training and 200 images for validation. The augmentation included blurring, sharpening, altering the brightness and contrast, and horizontal/vertical flipping. The test images were selected from the four boreholes (two blocks) to include images from boreholes 16/2-17 B and 16/2-18 S obtained from new regions and formations. The test dataset was used for the assessment and comparison of the segmentation approaches; the dataset included a total of 72 images comprising 160 fractures (Table 2). Fractures in the training, validation, and test images were manually labeled using the Supervisely annotation online tool (Supervisely). 
The images were 811 to 2,905 pixels high and 356 to 8,980 pixels wide, with a resolution of 2.6 to 9.0 pixels/mm, as shown in Table 1. To build the model for the second stage of segmentation, we created three smaller images from each fracture in the training and validation datasets following the procedure explained in Sect. 2.2. The resulting images were 144 to 1,057 pixels high and 74 to 3,457 pixels wide.
Before inputting the images into the models during each stage of segmentation, the images were resized to the default size of Mask R-CNN (discussed in Sect. 2.4). After resizing, the images had different resolutions, which depended on the original resolution and on image dimensions that controlled the resizing ratio. The full-size images (first segmentation stage) had an average resolution of 2.1 pixels/mm, while the small images (second segmentation stage) had a higher average resolution of 11.2 pixels/mm.
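The effect of this resizing on the effective resolution can be sketched as follows. This is an illustrative helper, not part of the published workflow; it assumes the longest edge is scaled down to 1,333 pixels and that smaller images are not upsampled:

```python
def resized_resolution(width_px, height_px, res_px_per_mm, max_edge=1333):
    """Approximate image resolution (pixels/mm) after Mask R-CNN's default
    resize, which scales the longest edge down to max_edge pixels."""
    scale = min(1.0, max_edge / max(width_px, height_px))  # no upsampling assumed
    return res_px_per_mm * scale

# e.g., the widest full-size image (8,980 px wide at 9.0 px/mm) drops to ~1.3 px/mm
resized_resolution(8980, 2905, 9.0)
```

Cropping fractures into small patches before the second stage keeps the resizing ratio close to one, which is why the second-stage inputs retain a much higher average resolution (11.2 versus 2.1 pixels/mm).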

Two-Stage Approach
The two-stage segmentation approach is presented in Fig. 3. Model A received a full-size image and output a mask for each fracture, like the ordinary implementation of an instance segmentation model. Model B was then used to segment small regions of the image to enhance the masks produced by Model A. The enhancement was achieved by dividing each fracture in the image into three regions based on the fracture location obtained by Model A; cropping included an overlapping ratio of 5% of the fracture height. We also tested dividing the fracture into four regions instead of three, but this provided no further improvement over the three-region division. The cropped images were then re-segmented by Model B, and the resulting high-resolution masks were used to replace the original mask from Model A (Fig. 3).
In some cases, Model B yielded more than one mask for a single image when it included parts of nearby fractures. To decide on the correct mask for such images, the mask with the highest IoU with the primary mask from Model A was used. An example is provided in Fig. 4. The definition of IoU is provided in Sect. 2.5.
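The cropping and mask-selection logic described above can be sketched as follows. This is a minimal sketch under our reading of the procedure (the bounding box is split along the fracture width, the 5% overlap is applied as a margin derived from the fracture height, and candidate masks are scored against Model A's mask); the function names are illustrative:

```python
import numpy as np

def split_bbox(x0, y0, x1, y1, n=3, overlap=0.05):
    """Split a fracture bounding box into n crops along its width, with an
    overlap margin taken as a fraction of the fracture height (assumption)."""
    pad = overlap * (y1 - y0)
    edges = np.linspace(x0, x1, n + 1)
    return [(max(x0, edges[i] - pad), y0, min(x1, edges[i + 1] + pad), y1)
            for i in range(n)]

def pick_mask(candidates, primary):
    """Among Model B's candidate masks for a crop, keep the one with the
    highest IoU against the primary mask from Model A (cf. Fig. 4)."""
    def iou(a, b):
        union = np.logical_or(a, b).sum()
        return np.logical_and(a, b).sum() / union if union else 0.0
    return max(candidates, key=lambda m: iou(m, primary))
```

The `pick_mask` step is what discards masks belonging to nearby fractures that happen to fall inside the same crop.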

Experimental Investigation of Mask R-CNN Architectures
To define the architecture of Model A and Model B, two recognized instance segmentation architectures were evaluated: (1) standard Mask R-CNN and (2) Mask R-CNN + PointRend. In general, the Mask R-CNN architecture comprises a region proposal network (RPN), a deep CNN for feature extraction (backbone), and a RoIAlign layer, followed by branches for bounding box detection, classification, and mask segmentation (He et al. 2017). The purpose of the RPN is to propose candidate object regions, or anchors, of different sizes and height-to-width ratios in the input image. The RoIAlign layer crops the features extracted by the backbone for the proposed regions and resizes them to a uniform shape, for example, 7 × 7 or 14 × 14 pixels. The detection and classification branches then yield a classification score and bounding box coordinates for each region of interest (RoI). In parallel, the segmentation branch outputs a binary mask for each RoI. The main difference between standard Mask R-CNN and Mask R-CNN + PointRend lies in the segmentation branch. In standard Mask R-CNN, the segmentation branch receives RoIs of 14 × 14 pixels and yields 28 × 28 pixel masks. The branch consists of four convolutional layers, a deconvolutional layer, and an output layer. This architecture is based on using a backbone with a feature pyramid network (FPN) (Lin et al. 2017), which is a common architecture of Mask R-CNN. Due to the small size of the output masks in the standard architecture, the segmentation of large objects has limited accuracy, and thus fine details at the object boundaries are usually not segmented properly. However, increasing the capacity of the segmentation branch would require significantly more computational time and memory. Therefore, an alternative method is required, such as PointRend.
The PointRend module was developed by Kirillov et al. (2020) to refine the masks generated by instance segmentation models, such as Mask R-CNN, with only a minor increase in the memory requirement. PointRend can increase the resolution of the output masks to 224 × 224 pixels with nearly 30 times less memory and computational time than the original segmentation branch would require to yield the same resolution. The module predicts higher-resolution masks from both lower-resolution masks and the features of selected points from the masks, sampling points of low certainty. Pixel-wise prediction is performed by a multi-layer perceptron (Kirillov et al. 2020). For Mask R-CNN + PointRend, the mask head predicts 7 × 7 pixels for each RoI, then the PointRend module refines the prediction to 224 × 224 pixels through multiple iterations of point sampling and mask resolution enhancement. The mask resolution in each iteration is 7², 28², 56², 112², and 224² pixels, respectively. An illustration of the mask refinement process is provided in Appendix A.
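The core idea behind PointRend's point selection, re-predicting the pixels whose foreground probability is closest to 0.5, can be illustrated as follows. This is a simplified sketch of the sampling step only, not the Detectron2 implementation:

```python
import numpy as np

def sample_uncertain_points(prob_map, k):
    """Return (row, col) indices of the k most uncertain pixels, i.e., those
    with foreground probability closest to 0.5 (illustrative sketch)."""
    uncertainty = -np.abs(prob_map - 0.5)          # higher value = less certain
    idx = np.argpartition(uncertainty.ravel(), -k)[-k:]
    return np.unravel_index(idx, prob_map.shape)
```

At each refinement level the low-resolution mask is upsampled, points selected in this fashion are re-predicted by the small multi-layer perceptron, and the result feeds the next resolution level.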
Our configurations for both standard Mask R-CNN and Mask R-CNN + PointRend used ResNet-50-FPN as the backbone and added additional height-to-width ratios, which define the RPN anchors in Model A, to match the shapes of the fractures. We added height-to-width ratios of 4:1 and 6:1 to the original ratios of 2:1, 1:1, and 1:2. The rest of the architecture remained at the defaults used in He et al. (2017) and Kirillov et al. (2020).

Training Implementation
We used the Mask R-CNN and Mask R-CNN + PointRend implementations from Detectron2 (Wu et al. 2019). The training was initialized with models pre-trained on Microsoft COCO images (Lin et al. 2014). The input images were resized to a maximum of 1,333 pixels on the long edge, which is the default size of Mask R-CNN (Wu et al. 2019). The models were trained for 30,000 iterations using a batch size of two images for Model A and four images for Model B. The training was conducted on an NVIDIA GeForce RTX 2080 Ti GPU with 11 GB of memory. An initial learning rate of 0.01 was defined, and this rate was halved every 10,000 iterations. All other training parameters were kept at the default values used in Detectron2 (Wu et al. 2019).
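A Detectron2-style configuration corresponding to these settings might look as follows. This is a hypothetical sketch assembled from the values reported above, not the authors' actual file; the base-config path is an assumption:

```yaml
# Hypothetical Detectron2 config sketch for Model A (not the authors' exact file)
_BASE_: "../COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml"
MODEL:
  ANCHOR_GENERATOR:
    # default height-to-width ratios 1:2, 1:1, 2:1 plus the added 4:1 and 6:1
    ASPECT_RATIOS: [[0.5, 1.0, 2.0, 4.0, 6.0]]
INPUT:
  MAX_SIZE_TRAIN: 1333   # longest edge after resizing
SOLVER:
  IMS_PER_BATCH: 2       # 4 for Model B
  BASE_LR: 0.01
  GAMMA: 0.5             # halve the learning rate...
  STEPS: [10000, 20000]  # ...every 10,000 iterations
  MAX_ITER: 30000
```

The Mask R-CNN + PointRend variant would start from a PointRend base config instead, with the same anchor and solver overrides.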
The training process was assessed using the segmentation loss for the training dataset (Fig. 5) to monitor convergence during training. The segmentation loss was calculated as the average binary cross-entropy loss, given by Eq. (1). In addition, the Precision of segmentation was also used to assess the training process and to verify that the trained models did not overfit, by comparing each model's performance on the training and validation datasets (Fig. 6). Precision was calculated using Eq. (2), where true positives are determined by a specific IoU threshold between the objects segmented by the model and the true objects. Precision was reported as an average over different IoU thresholds (0.5-0.95). The training and validation performance are discussed in more detail in Sect. 3.
Loss = -(1/N) Σ_{i=1}^{N} log P(y_i), (1)

where N is the total number of pixels in the mask. For each binary mask, y_i is the ground-truth label at location i, and P(y_i) is the probability of pixel i being assigned the correct label by the model.

Precision = true positives / (true positives + false positives). (2)
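Equations (1) and (2) translate directly into a few lines of code; the following minimal sketch (pure Python, illustrative function names) computes both metrics:

```python
import math

def segmentation_loss(probs, labels):
    """Average binary cross-entropy over a mask (Eq. 1): probs are predicted
    foreground probabilities, labels are the ground-truth 0/1 pixel labels."""
    per_pixel = (math.log(p) if y == 1 else math.log(1.0 - p)
                 for p, y in zip(probs, labels))
    return -sum(per_pixel) / len(probs)

def precision(true_positives, false_positives):
    """Precision (Eq. 2); true positives are defined by an IoU threshold."""
    return true_positives / (true_positives + false_positives)

# e.g., 153 correct fractures out of 166 predictions gives Precision ~0.92
precision(153, 13)
```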

Evaluation of the Two-Stage Method
Fracture segmentation using the two-stage method and PointRend module was evaluated both qualitatively and quantitatively. Visual inspection was first performed to confirm correctly detected fractures and remove outliers from the subsequent analysis by comparing all detected fractures to the ground-truth fractures. Visual inspection was also used to assess the quality of the segmentation, including comparing the one-stage and two-stage segmentation as well as comparing standard Mask R-CNN and Mask R-CNN + PointRend results. The quantitative analysis was achieved by calculating the IoU (intersection over union) between the ground-truth and predicted fractures and measuring the effect of mask improvement on the accuracy of the fracture aperture calculation. The IoU, which is also known as the Jaccard similarity index (Jaccard 1912), is defined as the ratio of the overlap area between two objects to their combined area, expressed as a percentage, as given by Eq. (3)

IoU = (|objectA ∩ objectB| / |objectA ∪ objectB|) × 100%, (3)

where objectA and objectB are the ground-truth and predicted masks, respectively. IoU ranges from 0% (no match) to 100% (a perfect match).
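Eq. (3) can be sketched for binary masks in a few lines of NumPy (illustrative function name):

```python
import numpy as np

def iou_percent(mask_a, mask_b):
    """Intersection over union (Eq. 3) between two binary masks, in percent."""
    a = np.asarray(mask_a, dtype=bool)
    b = np.asarray(mask_b, dtype=bool)
    union = np.logical_or(a, b).sum()
    return 100.0 * np.logical_and(a, b).sum() / union if union else 100.0
```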
Fracture apertures were measured from binary masks segmented manually and by the model. Approximately 60 measurements were taken from each fracture following the procedure explained in Appendix B. To evaluate aperture calculations by the models, the absolute error was computed as the difference between aperture measurements from the ground-truth and predicted masks.
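The per-fracture aperture sampling can be approximated as follows. This is a simplified stand-in for the Appendix B procedure, under the assumption that the aperture is taken as the vertical extent of the mask at sampled columns (the actual procedure may measure normal to the fracture trace):

```python
import numpy as np

def aperture_profile(mask, px_per_mm, n_samples=60):
    """Sample the aperture (mask thickness, in mm) at ~n_samples columns
    along a binary fracture mask. Simplified sketch of Appendix B."""
    mask = np.asarray(mask, dtype=bool)
    cols = np.linspace(0, mask.shape[1] - 1, n_samples).round().astype(int)
    return mask[:, cols].sum(axis=0) / px_per_mm
```

The absolute error then follows by differencing profiles computed from the ground-truth and predicted masks.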

Results and Discussion
The analysis of the results focuses on evaluating the improvement in fracture segmentation, as this is the main purpose of the study; therefore, the results of the fracture bounding box detection are only briefly discussed in this section. The detection performance on the training and validation datasets, shown in Fig. 6a, b, indicates that the four models had excellent detection accuracy with a bounding box Precision of more than 97%. Similarly, fractures were detected accurately from the test images using the overall procedure, which was achieved primarily by Model A (see Figs. 2, 3). The test images contain a total of 160 ground-truth fractures identified from visual inspection. Fractures detected by the models were inspected and assessed against the ground-truth fractures in the images. The inspection showed that standard Mask R-CNN correctly identified 153 fractures (out of 166 total predictions) and Mask R-CNN + PointRend correctly identified 151 fractures (out of 165 predictions), yielding nearly the same Precision of almost 92% for both architectures. The main purpose of using the PointRend extension is to enhance pixel-wise fracture segmentation rather than to improve the detection accuracy of the algorithm; accordingly, the detection results are comparable to our previous work using standard Mask R-CNN (Alzubaidi et al. 2022). This fracture detection accuracy (using Mask R-CNN with or without the PointRend module) compares favorably with previous approaches. For example, laser scanning of physical cores achieved a lower detection accuracy of 90% (Harraden et al. 2019). Moreover, the Mask R-CNN-based method is trained to scan high-resolution images with a length of up to 1,333 pixels, which allows the processing of long core sections, in comparison to the work by Dias et al. (2020), which detects fractures from much smaller log images up to 300 pixels long, using Fast R-CNN (Girshick 2015).
On the other hand, the segmentation results of the training and validation datasets differed among the models. Figure 6 shows that Mask R-CNN + PointRend outperformed standard Mask R-CNN on both high-resolution and low-resolution segmentation. This also agrees with the trend of the segmentation loss during training (Fig. 5) which shows that Mask R-CNN + PointRend resulted in a smaller segmentation loss. Moreover, high-resolution segmentation by Model B had significantly higher segmentation accuracy than Model A, as evident in Fig. 6. The segmentation Precision increased from 4% for the full-size images (Model A) to 16% for the small images (Model B) in standard Mask R-CNN, and from 24 to 47% in Mask R-CNN + PointRend. Consequently, the results on the validation images indicate that similar improvement in the segmentation can be obtained for the test images using the proposed procedure. The latter is thoroughly analyzed in Sects. 3.1 to 3.3.
The processing time of 1 m of core using the proposed two-stage method was approximately 4.4 s by standard Mask R-CNN and 3.3 s by Mask R-CNN + PointRend, compared to 1.3 s by the one-stage standard Mask R-CNN. Although the focus of the study is on optimizing accuracy rather than computational efficiency, the increase in the computational requirement was only moderate.

Qualitative Analysis
Visual inspection of the predicted fracture masks suggested that the two-stage method provided more accurate fracture segmentation than the one-stage method using both Mask R-CNN architectures. The proposed method showed visual improvement in the segmentation accuracy of fractures in the test images. For example, in Fig. 7a, enhancement in the fracture segmentation by standard Mask R-CNN can be noticed in the mask obtained from the second stage of segmentation with a 38% IoU compared to the initial mask with a 15% IoU. Similarly, for Mask R-CNN + PointRend (Fig. 7b), the two-stage method yielded a more detailed segmentation for the fracture than that provided by the one-stage method, as indicated by the increase in the IoU from 36 to 62%. Overall, based on visual inspection, the results showed that IoU greater than 45% provided good segmentation accuracy, IoU of 35 to 45% provided moderate segmentation accuracy, and IoU less than 35% had limited segmentation accuracy. Although perfect segmentation requires IoU close to 100% according to Eq. (3), the interpretation of the IoU depends on the shape of the segmented objects. For example, an IoU of around 50% can indicate more accurate segmentation for a thin object, such as a fracture, than an object of the same area but with a smaller perimeter, such as a circle. This is mainly due to the fact that segmentation error mostly occurs at the object boundary. A similar observation was found by Byun et al. (2021) regarding the interpretation of the IoU for fracture segmentation.
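A small numerical example illustrates this shape dependence (illustrative geometry, not from the study): take two objects of equal area, a thin 4 × 400 px rectangle and a 40 × 40 px square, and over-segment each by a single pixel on every side to mimic a uniform boundary error:

```python
def iou_after_1px_dilation(h, w):
    """IoU (%) between an h-by-w rectangle and the same rectangle
    over-segmented by one pixel on every side."""
    return 100.0 * (h * w) / ((h + 2) * (w + 2))

iou_after_1px_dilation(4, 400)   # thin, fracture-like object: ~66%
iou_after_1px_dilation(40, 40)   # compact object: ~91%
```

The same one-pixel boundary error thus costs a thin, fracture-like object far more IoU than a compact one, which is why a moderate IoU can still indicate a good fracture segmentation.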
From the visual comparison of the results from both architectures, it was observed that Mask R-CNN + PointRend segmented the fractures more accurately than standard Mask R-CNN, in particular for fine fractures, such as that shown in Fig. 8. This is consistent with previously reported gains from PointRend-based segmentation (Kirillov et al. 2020; Suresha et al. 2021; Zhang et al. 2020). On the other hand, Mask R-CNN + PointRend introduced an issue when detecting adjacent or overlapping fractures. In some cases, Mask R-CNN + PointRend failed to distinguish between the main fracture and other fractures within the detected region, as observed in approximately 20 fractures; an example is shown in Fig. 9. This problem, however, appeared less often (by almost 50%) in the results from standard Mask R-CNN.
Another example of the results is shown in Fig. 10, which presents the segmentation results of the proposed method using test images from Block 16 and Block 7220. The figure shows that Mask R-CNN + PointRend provided better and more detailed segmentation than standard Mask R-CNN, while the latter had better detection for complicated fractures. Mask R-CNN + PointRend achieved higher IoU for all fractures in both images. Regarding the detection accuracy, both architectures detected four out of five fractures in Fig. 10 (top), missing fracture no. 2 in the input image. In Fig. 10 (bottom), all ground-truth fractures were detected by standard Mask R-CNN, while fracture no. 4 was not detected by Mask R-CNN + PointRend. The predicted fracture labeled in purple was not considered a ground-truth fracture because its uncertain edges make the segmentation evaluation infeasible. Overall, the proposed method has a high potential to provide an intelligent workflow for core analysis, as evident in Appendix C, which shows its application on extended sections of cores from the new block (16). Almost all fractures in the input images were detected (27 of 28 ground-truth fractures); the only missed fracture (no. 7) is in the right-hand image. The main reason for the misdetection is the existence of very fine fractures, given that the images were downsampled before being passed to Model A for detection. The downsampling ratio is proportional to the image length; thus, fine fractures in long core sections are less visible after downsampling. Such fractures can be detected by the model when a shorter section of the core is passed, as shown in Appendix C.

Gain in Segmentation Accuracy
Based on the calculation of IoU between predicted and ground-truth fractures in the test images, the two-stage method showed improvement over the one-stage method for both the standard Mask R-CNN and Mask R-CNN + PointRend models. The average IoU increased from 28.9% to 43.6% using standard Mask R-CNN and from 44.3% to 55.5% using Mask R-CNN + PointRend, as shown in Table 3 and Fig. 11.

[Table 3 compares the average IoU of fractures detected in the test images from blocks 7220 and 16 (see Table 2) using the one-stage and two-stage methods, per architecture and block.]

The IoU of the test images from Block 7220, which was used in training the models, was greater than that of the images from Block 16; however, the latter showed slightly more improvement by the two-stage method (Table 3). As demonstrated in Fig. 11, the distributions of the IoU obtained by the two-stage method shifted towards higher IoUs. The two-stage method provided a significant increase in the number of fractures segmented with IoU greater than 40% by standard Mask R-CNN and greater than 50% by Mask R-CNN + PointRend. The number of fractures with IoU greater than 40% increased from 38 to 97 using standard Mask R-CNN, as shown in Fig. 11 (left). Similarly, with the use of Mask R-CNN + PointRend, the number of fractures with IoU greater than 50% almost doubled using the two-stage method, increasing from 56 to 105, as shown in Fig. 11 (right).
The difference in IoU between the one-stage and two-stage segmentation methods is statistically significant for both architectures, with p values less than 0.001 for standard Mask R-CNN and for Mask R-CNN + PointRend. The p values were calculated with a paired t-test implemented in the SciPy statistical library (Virtanen et al. 2020).
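The significance test can be reproduced with SciPy's `ttest_rel`. The sketch below uses synthetic IoU values (the distributions are assumptions for illustration, not the study's measurements); the pairing reflects that the same fractures are segmented by both methods:

```python
import numpy as np
from scipy import stats

# Hypothetical per-fracture IoU values (%): paired samples, i.e. the same
# fractures segmented by the one-stage and the two-stage method.
rng = np.random.default_rng(42)
iou_one_stage = rng.normal(loc=29.0, scale=10.0, size=140).clip(0, 100)
iou_two_stage = (iou_one_stage + rng.normal(loc=15.0, scale=5.0, size=140)).clip(0, 100)

# Paired t-test on the per-fracture differences
t_stat, p_value = stats.ttest_rel(iou_two_stage, iou_one_stage)
print(f"t = {t_stat:.2f}, p = {p_value:.2e}")
```

A paired test is the appropriate choice here because each fracture contributes one IoU under each method; testing the per-fracture differences removes between-fracture variability from the comparison.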
Overall, the best segmentation, in terms of IoU, was obtained by the two-stage method using Mask R-CNN + PointRend. The IoU results agreed with the visual inspection, which indicated that Mask R-CNN + PointRend offered more accurate and detailed fracture segmentation than the standard Mask R-CNN. We attribute this to the mask refinement performed by the PointRend module, as introduced in Sect. 2.3 and Appendix A. The two-stage segmentation using Mask R-CNN + PointRend achieved an average IoU greater than 55%, with approximately 70% of the fractures having IoUs greater than 50%.
This represents a significant gain in segmentation accuracy over our previous work using one-stage segmentation with standard Mask R-CNN (Alzubaidi et al. 2022), offering a 26.6% increase in IoU. Moreover, the method has several advantages over the recent work of Byun et al. (2021), which used a U-Net-based CNN for fracture segmentation: (1) it achieved higher segmentation accuracy, represented by an approximately 17% increase in IoU; (2) it offers instance segmentation, which is more beneficial for fracture analysis in cores than the semantic segmentation used by Byun et al.; and (3) it was evaluated on a larger test set (three times more images).

Improvement in Aperture Calculation
Aperture calculation results showed an increase in aperture accuracy when using the enhanced fracture masks from the two-stage method with both tested architectures. The average aperture error for the 153 fractures detected by standard Mask R-CNN decreased from 1.4 to 1.1 mm, or by 21%. The number of fractures with an error smaller than 1 mm increased from 42 to 73 with the two-stage method, as demonstrated in Fig. 12 (left). Likewise, the average error for the 151 fractures detected by Mask R-CNN + PointRend decreased from 1.1 to 0.9 mm, or by 18%. The number of fractures with an aperture error less than 0.5 mm increased significantly, from 8 to 53, as shown in Fig. 12 (right). An example of the improvement in aperture calculation accuracy along a fracture from Block 16 is provided in Fig. 13. After the second stage of segmentation, the calculated apertures aligned better with the ground-truth-based measurements than those obtained from the first stage. For this example, the aperture error decreased from 1.1 mm to 0.5 mm using Mask R-CNN (Fig. 13a, b) and from 0.9 mm to 0.4 mm using Mask R-CNN + PointRend (Fig. 13c, d).
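The error statistics above can be tabulated in a few lines. The sketch below uses hypothetical per-fracture errors (the function name and input values are illustrative, not this study's measurements):

```python
import numpy as np

def aperture_error_summary(errors_mm) -> dict:
    """Summarize absolute aperture errors (mm) across detected fractures."""
    errors = np.asarray(errors_mm, dtype=float)
    return {
        "mean_error_mm": float(errors.mean()),
        "n_below_1mm": int((errors < 1.0).sum()),
        "n_below_0.5mm": int((errors < 0.5).sum()),
    }

# Hypothetical errors for a handful of fractures
summary = aperture_error_summary([0.3, 0.4, 0.8, 1.2, 2.0])
print(summary)  # mean = 0.94 mm; 3 fractures below 1 mm; 2 below 0.5 mm
```

Computing both the mean error and the counts below fixed thresholds (1 mm, 0.5 mm), as in Fig. 12, gives a fuller picture than the mean alone, since a few badly damaged fractures can dominate the average.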
The resulting errors were affected by other factors in addition to segmentation accuracy. A major factor was the intactness of the core around the fracture. Obtaining accurate aperture calculations from the predicted masks requires intact and clear fracture edges, such as the example fractures shown in Fig. 14 (right). Conversely, accurate aperture calculations were not feasible where fracture edges were damaged or core pieces at the fracture boundary were visibly missing, as shown in Fig. 14 (left). Such features influenced both the ground-truth measurements and the automated CNN-based measurements. The aperture error for such fractures could thus vary depending on the manual ground-truth segmentation, which can be subjective, as the true fracture edges are not always well defined.
Overall, the two-stage segmentation using Mask R-CNN + PointRend reduced the aperture calculation error by 0.5 mm relative to the previous procedure presented in Alzubaidi et al. (2022). This represents a 35% increase in aperture calculation accuracy. Accurate determination of fracture aperture is essential in reservoir engineering, as a fracture can be the main path for fluid flow in porous media. Fracture aperture has a significant impact on rock and rock/fluid properties such as porosity, permeability, relative permeability, and fluid distribution (Aljehani et al. 2018; Nelson 2001b). Thus, the determination of such fine fracture features using ML broadens the applicability of the technique, which offers low-cost and rapid fracture analysis.

Conclusions
This study focused on improving fracture segmentation from unwrapped-core images to outperform existing ML-based methods of fracture analysis. Our method produced high-resolution fracture masks using a modified Mask R-CNN architecture (with a point-based rendering extension) and a two-stage segmentation procedure. The improvement was obtained by the second stage of segmentation, while detection was performed mainly by the first stage; optimizing detection accuracy was therefore beyond the scope of this study. Nevertheless, the detection results confirmed a high precision of fracture detection (92%) by instance segmentation after testing different Mask R-CNN models on new core images (from a new region).
Fracture segmentation results from the test images were assessed quantitatively and qualitatively. The analysis showed that two-stage segmentation can increase the accuracy of segmentation for both Mask R-CNN architectures; however, the best results were obtained by Mask R-CNN + PointRend. The latter provided an IoU of 55% compared to manual segmentation. Enhanced fracture segmentation resulted in noticeable improvement in the aperture error distribution, especially for errors greater than 1 mm for standard Mask R-CNN and greater than 0.5 mm for Mask R-CNN + PointRend. Overall, a comparison between standard Mask R-CNN and Mask R-CNN + PointRend demonstrated that the latter provided more accurate and detailed segmentation than the former, but it was affected by overlapping and neighboring fractures.
The proposed method achieves fracture segmentation accuracy in good agreement with manual segmentation. The innovation of this study lies in improving fracture segmentation and providing a reliable definition of fracture edges using instance segmentation. Instance segmentation is a powerful tool for the identification and characterization of image features, and its utility has been realized in various applications. It makes a major contribution towards developing image-based fracture analysis, as it enables the analysis of fracture details such as aperture and, potentially, fracture roughness. This expands the applicability of the technology, as fracture aperture is an important feature for reservoir characterization and evaluation, influencing reservoir porosity and permeability, and is hence essential for fluid-flow studies and reservoir modeling. Other advantages of the method include the detection of fine fractures smaller than 1 mm and the ability to determine fracture dip angle and direction from fracture masks.
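As noted above, dip angle and direction can be derived from fracture masks: a planar fracture intersecting a cylindrical core traces a sinusoid in the unwrapped image. The sketch below illustrates the geometry for a vertical borehole with a known core diameter; the function name, fitting approach, and dip-direction phase convention are illustrative assumptions, not the exact procedure used in this study.

```python
import numpy as np

def dip_from_trace(x_px, z_px, core_diameter_mm, mm_per_px):
    """Estimate dip angle and dip direction from a fracture trace in an
    unwrapped-core image (vertical borehole assumed).

    A planar fracture appears as z = z0 + a*sin(theta) + b*cos(theta),
    where theta is the azimuth around the core; the sinusoid amplitude
    equals r*tan(dip) for core radius r.
    """
    circumference_px = np.pi * core_diameter_mm / mm_per_px
    theta = 2.0 * np.pi * np.asarray(x_px, dtype=float) / circumference_px
    # Linear least-squares fit of the sinusoid coefficients
    design = np.column_stack([np.ones_like(theta), np.sin(theta), np.cos(theta)])
    z0, a, b = np.linalg.lstsq(design, np.asarray(z_px, dtype=float), rcond=None)[0]
    amplitude_mm = np.hypot(a, b) * mm_per_px
    dip_deg = np.degrees(np.arctan2(amplitude_mm, core_diameter_mm / 2.0))
    dip_direction_deg = np.degrees(np.arctan2(b, a)) % 360.0  # convention-dependent
    return dip_deg, dip_direction_deg

# Synthetic check: 100 mm core at 1 mm/px, sinusoid amplitude 20 mm
# -> dip = arctan(20 / 50) ~ 21.8 degrees
x = np.linspace(0.0, np.pi * 100.0, 200)                     # one full wrap
z = 50.0 + 20.0 * np.sin(2.0 * np.pi * x / (np.pi * 100.0))
dip_deg, dip_dir_deg = dip_from_trace(x, z, core_diameter_mm=100.0, mm_per_px=1.0)
```

Fitting the trace by linear least squares (on the sine and cosine components) avoids nonlinear optimization and recovers the plane exactly for a clean, fully wrapped trace; partial or noisy traces from real masks would require the same fit applied to the mask centerline.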
The method can be further optimized in terms of computational cost, evaluation on new rock types, and the differentiation between mechanical and natural fractures. In addition, while the current study focuses mainly on fracture analysis, future work can build on this platform to provide a suite of tools for the automated analysis of core images. Overall, we envisage a "Digital Geologist" machine learning platform based on deep learning and CNNs.