Abstract
Geothermal energy is a sustainable energy source that meets the needs of the climate crisis and global warming caused by fossil fuel burning. Geothermal resources are found in complex geological settings, with faults and interconnected networks of fractures acting as pathways for fluid circulation. Identifying faults and fractures is an essential component of exploiting geothermal resources. However, accurately predicting fractures without high-resolution geophysical logs (e.g., image logs) and well-core samples is challenging. Soft computing techniques, such as machine learning, make it possible to map fracture networks at a finer resolution. This study employed four supervised machine learning techniques (multilayer perceptron (MLP), random forests (RF), extreme gradient boosting (XGBoost), and support vector regression (SVR)) to identify fractures in geothermal carbonate reservoirs in the sub-basins of East China. The models were trained and tested on a diverse well-logging dataset collected at the field scale. A comparison of the predicted results revealed that XGBoost with optimized hyperparameters and data division achieved the best performance than RF, MLP, and SVR with RMSE = 0.02 and R2 = 0.92. The Q-learning algorithm outperformed grid search, Bayesian, and ant colony optimizations. The blind well test demonstrates that it is possible to accurately identify fractures by applying machine learning algorithms to standard well logs. In addition, the comparative analysis indicates that XGBoost was able to handle the complex relationship between input parameters (e.g., DTP > RD > DEN > GR > CAL > RS > U > CNL) and fracture in geologically complex geothermal carbonate reservoirs. Furthermore, comparing the XGBoost model with previous studies proved superior in training and testing. This study suggests that XGBoost with Q-learning-based optimized hyperparameters and data division is a suitable algorithm for identifying fractures using well-log data to explore complex geothermal systems in carbonate rocks.
Graphical abstract
Article highlights
-
Machine learning provides a promising solution to predict fractures in geothermal reservoirs.
-
Multiple well-logging datasets were collected to train and test the models at the field scale.
-
XGBoost provides the most accurate prediction out of four machine learning models.
-
This study provides a template for predicting fractures in enhanced geothermal systems.
Similar content being viewed by others
Avoid common mistakes on your manuscript.
1 Introduction
Advanced horizontal drilling techniques and multistage hydraulic fracturing have revolutionized the extraction of hydrocarbons from unconventional reservoirs (Liu et al. 2019; Yasin et al. 2018b). These techniques have a significant potential for storing large volumes of carbon beneath complex subsurface geological conditions and enhancing geothermal systems (EGS) (Vo Thanh et al. 2022; Yasin et al. 2022a). Therefore, the subsurface fracture network should be identified, mapped, and characterized to optimize completion designs for enhancing hydrocarbon production, developing high permeability zones for EGS, and determining reservoirs' integrity for capturing and storing carbon. A natural fracture significantly impacts reservoir behavior, affecting fluid flow and conductivity. Identifying and characterizing these fractures will allow us to understand better the subsurface and design completion strategies that maximize production and minimize costs (Siler et al. 2019; Yasin et al. 2023a).
In recent years, the growing global energy demand has led to EGS becoming an attractive option for sustainable energy production and reducing carbon emissions (Okoroafor et al. 2022b). However, the successful management of EGS reservoirs depends heavily on reservoir simulation, which can be expensive due to factors like reservoir heterogeneity, matrix-fracture interactions, and complex physical processes (Ishitsuka and Lin 2023). The development of EGS requires improving the permeability of hot crystalline rocks. Still, significant technical and non-technical challenges must be overcome for EGS reservoirs to be economically viable. EGS requires an increase in the permeability of deep rocks as their natural permeability is usually insufficient (Jia et al. 2022). Technical barriers include the need for better stimulation technologies, while non-technical obstacles include land access, permitting, and financing issues.
Geothermal energy exploration and development demand a complete understanding of geothermal reservoirs, including lithology, porosity, permeability, reservoir temperature and pressure, stress, faults and fractures, and chemical composition. The availability of geochemical, geological, and geophysical information in oil-producing regions, along with an extensive understanding of geological models and geothermal resource assessment, are vital for evaluating and developing geothermal resources (Xiao et al. 2023; Guo et al. 2024). A recent study conducted in Nevada and Oregon (USA) showed potential geothermal reservoirs in structurally complex geological settings. Siler et al. (2019) reported that closely spaced and intersecting faults in such settings exhibit high-temperature anomalies. The study also suggests that structurally complex geological settings may be conducive to geothermal energy development. However, the complexity of the geological environment makes it challenging to identify, evaluate, and develop a fracture network. Further research in structurally complex geological settings could help unlock new renewable energy sources and contribute to a more sustainable future (Kölbel et al. 2020).
With technological advances and a better understanding of geothermal systems, detecting and extracting proper quantities of energy from geothermal systems is becoming more accessible. Energy from geothermal sources can become the world's largest energy source, displacing oil and natural gas combined in terms of energy supply potential (Okoroafor et al. 2022a). Over the past few decades, it has gradually developed into a clean, renewable form of thermal energy in urban and rural areas. Geothermal energy is widely accepted due to its reliability, low cost, ease of maintenance, and the fact that there are no ground-related restrictions regarding laws and policies. US Geological Survey report estimates that at least 70 percent of the world's geothermal resources remain unexplored (Siler et al. 2019). It is found throughout the world. Developing geothermal resources safely and responsibly can significantly benefit local communities, the economy, and the environment.
Because of recent scientific and technological breakthroughs, several authors have tried to investigate the true potential of geothermal fluid by applying the most advanced machine learning (ML) approaches, such as different inversion schemes used for seismic data (Qiang et al. 2020). Soft computing techniques including convolutional neural networks (CNN) offer solutions to seismic inversion for predicting the spatial variation of rock properties. Okoroafor et al. (2022b) used the CNN method to predict fracture development zones in the geothermal reservoir by mapping logging porosity and seismic multi-attributes. Dalgaard et al. (2019) used artificial neural networks–Hidden Markov models to classify lithofacies and estimate porosity in geothermal reservoirs. Recently, Yasin et al. (2023b) used a soft-porous petrophysics model combined with a CNN-LSTM-based deep neural network to extract porosity and fractures in geothermal reservoirs. They validated the model's fracture prediction results using formation micro-image (FMI) logging. The application of ML techniques to engineering problems has been successful due to their ability to predict outcomes and optimize processes. Nevertheless, one of the significant challenges of this application is the presence of human bias in the selection of ML algorithms and hyperparameters (James et al. 2022; Okoroafor et al. 2022b). It is important to note that this bias is highly dependent on the expertise and domain knowledge of the operator, which may result in suboptimal selections that do not accurately reflect the problem at hand. The impact of human bias can be reduced through physics-informed ML. These ML models, incorporating informed prior knowledge and constraints on the issue, can offer consistent prediction results in extrapolation tasks and remain robust to dataset limitations (Karniadakis et al. 2021). The physics-informed neural network can predict the fracture development zone, temperature, and pressure in various geothermal reservoirs in structurally complex geological settings (Ishitsuka and Lin 2023). The neural network trains the data with physics-informed knowledge to ensure that intelligent decision tools can be adopted for similar data characterization. Therefore, the physics-informed neural network has the potential to map natural fractures using well logs to optimize the fracture mapping for EGS at a low cost.
In addition to ML, thin section analysis FMI logs and scanning electron microscope (SEM) are useful for identifying natural fractures. Examining rock samples under a petrographic microscope allows one to gain insight into their microscopic structure (Ren et al. 2022; Healy et al. 2017). With specialized staining techniques, fractures become more visible, and it is possible to differentiate between primary structures, such as bedding planes, and secondary features, like fractures (Golsanami et al. 2019). These are valuable methods for understanding the characteristics and nature of rock fractures, but they are time-consuming, labor-intensive, and prone to biases.
The geometries and fracture networks of naturally fractured reservoirs play a significant role in the fluid flow of geothermal energy production. Identifying faults and fractures is, therefore, crucial to unlocking the full potential of a fault-bounded EGS (Okoroafor et al. 2022a; Zhou et al. 2021). A sonic and micro-resistivity imaging log has traditionally been used to describe natural fracture intensity (i.e., how many fractures are present per meter). However, these techniques have several significant disadvantages, including the risk of human bias during the interpretation of logs and the long interpretation time required. Although these deficiencies are well-known within the industry, no standard procedures have been developed to address them. Recent research has demonstrated that fracture zones have marked effects on the log reading (Aghli et al. 2019, 2016; Aguilera 2008; Tokhmchi et al. 2010; Tokhmechi et al. 2009). According to these studies, density, resistivity, sonic, and porosity logs, in particular, gamma-ray (uranium), and caliper logs, in general, are the best tools for mapping fractures or fracture zones (Martinez et al. 2002; Saboorian-Jooybari et al. 2015; Yasin et al. 2018a). A resistivity log can detect changes in electrical conductivity caused by fractures. Fractures are generally characterized by lower resistivity values than the surrounding rock, making identifying their location and orientation possible. Additionally, sonic and density logs may help identify fractures, as they detect changes in rock mechanical properties associated with fractures (Yasin et al. 2022b).
In this study, an analysis of fractures in geothermal carbonate reservoirs was performed using four supervised ML techniques: multilayer perceptron (MLP), random forest (RF), extreme gradient boosting (XGBoost), and support vector regression (SVR). The ML models have been trained and tested based on diverse well-logging data, including the natural fracture density from acoustic borehole image logs. It has been demonstrated that ML algorithms with optimized hyperparameters and data division can predict fractures with high accuracy from limited standard well logs through the application of ML algorithms. The application of ML algorithms in the exploration and exploitation of geothermal resources has been demonstrated to be successful in identifying fractures in heterogeneous reservoirs.
2 Contributions to the current literature
The characterization of fractures is an essential part of the geothermal reservoirs. They play an important role in increasing the efficiency of geothermal systems since they act as flow conduits and contribute significantly to reservoir permeability. Fractures are particularly important in EGS. Characterizing fractures involves determining their nature, network, density, and stress-related properties, which is relevant for drilling wells and modeling and predicting geothermal reservoir performance (Okoroafor et al. 2022b; Yasin et al. 2023a, b). Four main types of fractures are typically encountered in the reservoir rock: netted fractures, straight split fractures, oblique fractures, and horizontal fractures (Wang et al. 2023). A typical fracture density in the reservoir is 10 to 20 fractures per 10 m, with an average fracture width of 0.2 to 9 mm and an average fracture length of 0.5 to 48 cm.
It is laborious and time-consuming to manually pick the fractures using resistivity image logs spanning over 1000 m. There is no doubt that human input is required in various processes. However, the quality of such input may not always be perfect. Consequently, it is difficult to establish a ground truth for fracture density prediction, which is necessary to set a benchmark for an EGS. Moreover, this process is prone to human error, making it more challenging to achieve an accurate interpretation. As a result, manual fracture identification can take months to complete, and it is possible to consider the potential for human error in the interpretation process. Also, this process is often subject to bias due to the different levels of expertise present in the process of interpretation.
Figure 1 shows the resistivity-based image log analysis of the carbonate rock to pick faults and fractures for the EGS. The fault's strike is mainly NEE-SWW, close to the strike of NEE-SWW's major fault in the region. The occurrence of unconformity and major faulting in the formation has caused significant instability, evidenced by the substantial changes in the attitude of the strata near depths of 4300 m and 4600 m (shown with arrow). The FMI images of the faults at these depths further demonstrate the significant changes in the deposition of the strata above and below the faults. These significant disparities emphasize the need for technology that can accurately monitor and predict faults and fractures to eradicate the possibility of human bias and minimize the time and effort required for fracture analysis. Considering this critical need, Zhang et al. (2015) proposed a method for estimating fracture stiffness, in situ stresses, and elastic parameters of a hypothetical naturally fractured reservoir. The process is based on a hybrid artificial neural network (ANN) and genetic algorithm (GA) approach, commonly called displacement back analysis. ML techniques have gained popularity in geothermal exploration and production in recent years. A recent study by Okoroafor et al. (2022a) and Zhou et al. (2021) demonstrated using a deep neural network surrogate model to determine the fracture density and fractal dimension of a geothermal reservoir's discrete fracture network. Several researchers have highlighted the potential of ML techniques for improving geothermal exploration and production; however, they cannot identify fracture density in complex geothermal systems. Advanced algorithms and models can help us better understand geothermal reservoirs and improve the efficiency and sustainability of geothermal energy production. In this study, we applied advanced ML algorithms and chose the most appropriate one for accurate fracture density prediction using well logs.
3 Geothermal settings and reservoir characteristics
Geothermal energy is a promising renewable energy source that can reduce greenhouse gas emissions and dependence on fossil fuels. Northeast China has significant potential for geothermal projects. This region can greatly benefit from the use of geothermal energy. The study area's Depression is a typical geothermal field that undergoes heat conduction. The primary sources of heat are the mantle and crust (Zhou et al. 2023). Water supply comes from atmospheric precipitation (Fig. 2b). In the south Zhuangdong Depression, in the north Bozhong Depression, and to the east and west, the Chengbei low uplift and Bonan low uplift, respectively. The subsurface structure is controlled by NNE and NE near EW basement faults, with a structural pattern of 'one uplift', 'two depressions', and many 'buried hills' with high, middle, and low surrounding, EW zoning, and NS blocking (Fig. 2a).
There are mainly high-resistivity fractures and drilling-induced fractures. Both high-conductivity and high-resistivity fractures belong to structural fractures, primarily formed by releasing regional palaeotectonic stress. The high-resistivity fracture is formed by filling with high-resistivity material or closed fracture (Fig. 1). Thin section and core analysis show that microfractures are relatively developed and corroborate the presence of natural fractures (Fig. 3). According to reservoir sample analysis, the porosity of the reservoir is 2.1% to 34%, and permeability is 0.01 to 800 mD. The Paleozoic strata is the main geothermal reservoir, which belongs to the porous geothermal reservoir of high-resistivity fractures.
4 Materials and methods
The process of developing the ML model begins with collecting datasets. More than 6,000 data points were interpreted from FMI logging data in 7 wells (307, 306, 302, 313, 301, 39, 40). For the model, we used five wells (306, 302, 301, 39, 40) for training, one well (307) for validation, and one well (313) for testing. This data division allowed us to train our model and accurately assess its performance effectively. We chose this approach to ensure that our model was reliable and that the results of our tests were accurate. The model features were acquired using conventional well logs, including compressional velocity (DTP), gamma-ray (GR), shallow and deep resistivity (RS and RD), density, caliper, and neutron porosity (CNL) logs. We obtained the near-wellbore fracture density from a resistivity-based image log (FMI) for labeling and training purposes. The FMI log provides high-resolution images of the formation, which can be used to characterize fractures in the near-wellbore region.
The steps in developing ML models are data pre-processing, optimizing the model and hyperparameters, and testing the performance of the best model on a blind well.
4.1 An Overview of data pre-processing
Pre-processing is essential for obtaining accurate and reliable results from data analysis. Scanning of all the features present in the dataset is conducted as part of this process. For this purpose, the K-Nearest Neighbor (KNN) imputation was used to fill in all missing values in the dataset. It is useful when less than 20% of data is missing for a specific feature (Troyanskaya et al. 2001). This study employed the KNN imputation technique with 90% accuracy.
Additionally, control limit lines (CLLs) were used to identify outliers in the dataset. The CLL method utilized the mean of each batch of data (plus or minus) with standard deviations to establish the upper and lower control limit lines. This approach allowed the outliers for each feature to be identified and separated from the rest of the data points. The results were more accurate as the analysis did not consider the outliers. Statistical characteristics of features are often studied in data science to identify underlying patterns in the data. Nevertheless, specific ML techniques may be challenging when features present a nonstandard distribution. It is often necessary to transform the features to a normal distribution using the quantile transformation to address this issue. A normal distribution of features tends to improve the performance of ML techniques. ML algorithms are often based on the Gaussian assumption, which assumes normal distributed data. Therefore, we can improve the performance of these algorithms by transforming nonstandard distributions into normal distributions. Table 1 displays the statistical characteristics of input features. The high standard deviation of the GR and resistivity logs indicates a large variation in the data. The heterogeneous nature of the rocks and minerals in the formation or a change in lithology causes the variation.
Identifying highly correlated parameters in a dataset is an essential step in selecting and reducing features to improve the accuracy and efficiency of ML models. A highly correlated dataset contributes to multiple complexities during the training process by reducing the variability of the input dataset. When two or more features are highly correlated, their effects on the target variable may be redundant, resulting in redundant inferences during model training. In this case, the model may overfit, resulting in perfect results on the training dataset but poor results on the test dataset. The Pearson correlation coefficient and Spearman rank correlation coefficient are two methods for detecting positively correlated features. The Pearson correlation coefficient (r) is a statistical measure used to determine the strength and direction of the linear relationship between two variables. It has a range from −1 to + 1. When 'the r-value is 1, the two components being compared are perfectly linearly related. For this study, those features with a 90% or more correlation were considered “strong” correlations and removed from the dataset. Removing highly correlated features from the dataset helped improve the model's accuracy and avoid overfitting. The number of features has been reduced from 12 to 8 after feature reductions (Fig. 4). This reduction was made possible through collinearity detection and feature transformation techniques.
4.2 Model selection and hyperparameter tuning
The dataset collected from the geothermal well logs consists of 13 training features and 6029 rows of records. The sampling interval was set at one record per 0.05 microsecond, significantly improving the ML algorithm's data analysis and prediction capabilities. With this level of precision in the data, the ML algorithm can better identify patterns and trends, leading to more accurate predictions. The target values for natural fractures obtained from the FMI log interpreted by individuals have the same number of rows. These values range from 0.0 to 2.2 natural fractures per meter. All features were carefully selected to ensure accuracy and effectiveness. In addition, min–max normalization was used to normalize all features. This normalization ensures that all features are represented on a common scale to prevent one particular feature from dominating the analysis due to its relative importance.
The process of partitioning data is essential to the development of ML models. This step separated the dataset into two groups: testing and training. Training groups develop training models while testing groups evaluate the performance of predictive models. The ratio of partitioning data into training and testing has not been established according to any specific principle. Earlier research has shown that the percentage of training data affects the accuracy of ML models (Vo Thanh et al. 2022). To improve the accuracy of ML models' outputs, the data proportions for the training model need to be considered in several ways. We partitioned the entire database into four segments (training and test datasets): 60–40%, 70–30%, 80–20%, and 90–10%, where the first partition represents the training group and the second part describes the testing group. Table 2 illustrates various regression algorithms, their corresponding hyperparameters, the models evaluated, and the parameters used to optimize their performance. The pre-processing of the data, selection of the ML model, and optimization of the hyperparameters were significantly faster than expected; the entire process took approximately an hour. This workflow is much more efficient than manual pre-processing, yielding excellent results. This workflow illustrates the power of automation and its ability to accelerate workflows.
4.3 Workflow of developing machine learning models for fracture identification
Several steps were involved in the ML model for predicting fractures in carbonate geothermal reservoirs. The workflow is summarized into five steps, as shown in Fig. 5.
-
1.
The conventional logging data along with FMI and thin sections were collected and pre-processed to develop ML models.
-
2.
Cross-plot of logging curves were drawn for the fracture labels and non-fracture labels, the response characteristics of conventional logging of fractures were defined, and sensitivity curves were selected. Then one or more wells are randomly selected as blind wells, and the rest of the data is used for training, validation, and testing the model.
-
3.
Due to the huge difference in the proportion of fractured and unfractured label data, it is easy to ignore the fractured label data with a small proportion when it is directly used for model training, resulting in a poor fracture identification effect. The data balancing method used in this study was under-sampling. After under-sampling, the unfractured label data that was originally too high is relatively balanced.
-
4.
Supervised ML techniques for target data were proposed using the four scenarios of database segments. Then, the models with good recognition effects in both the training set and test set were selected as the optimal model for fracture recognition.
-
5.
From four ML algorithms developed for predicting fractures in carbonate rocks, the best model was determined based on the performance of each model.
-
6.
The final step was a blind well test to evaluate the validity of the best model. In detail, one or more wells not used for training are used as inputs, and the output of the model was compared with the FMI and thin sections description results. If the fourth and fifth steps are completed, it is considered that the established model has high accuracy and can be applied to fracture identification.
4.4 Evaluation metrics
The fracture prediction model performance was evaluated using R2, RMSE, and mean absolute error (MAE) metrics (Eqs. 1–3).
where fracture prediction is estimated using \({\widehat{{\varvec{Y}}}}_{{\varvec{i}}}\), while \({{\varvec{Y}}}_{{\varvec{i}}}\) represents the measured fracture value, \(\overline{{\varvec{Y}} }\) is the average value of measured fracture, and N indicates the total number of samples.
5 Results and discussion
5.1 Sensitivity analysis of well logs for fractures
Sensitivity analysis of logging curves is fundamental to fracture identification and is crucial in ensuring the accuracy of fracture. By analyzing log response characteristics using well logs labeled by fracture descriptions from thin sections and FMI logging fracture interpretations, the fracture logging response characteristics of the reservoirs can be determined. Figure 6 shows the cross plots of the fracture and non-fracture development segment. The data distribution represents the fracture development section and the non-fractured section of the curve, respectively. The more obvious the differentiation between the fracture and non-fracture curves, the better the differentiation effect of the curve. There is a difference in the peak values of the curves between fractured and non-fractured samples.
It is important to note that there are specific physical property changes along the fracture development, such as low density, high neutron porosity, differentiation of shallow and deep resistivity, and variations in borehole diameter. These features are often used to identify and evaluate the development of fractures. However, each logging curve is a result of the combined effects of lithology, fractures, fluids, boreholes, and other factors, making the logging response characteristics very complex.
5.2 Performance of ML models on validation and test dataset
When selecting the most appropriate ML method, there is no one-size-fits-all solution. A careful consideration of the characteristics of the data is essential to choosing the most appropriate method for the particular scenario (Vo Thanh et al. 2022). Four scenarios were examined to determine the most suitable ML method for a given database segment. Table 3 summarizes the predictions made by four ML models. We observe that the training and testing phases of the developed ML models are accurate to 70 to 90% in all scenarios of data proportions. XGBoost and RF models were found to predict fractures based on 6029 data samples accurately.
The overall performance of the ML model on each data division shows that the XGBoost method is more accurate with high fracture prediction accuracy than RF, MLP, and SVR. XGBoost achieves stable prediction results both in the training dataset and testing dataset. Also, XGBoost achieves the highest prediction accuracy for all data portions, ranging from 80 to 20%. According to the results, the XGBoost model reached the highest level of accuracy out of all four ML models, both on the training and testing data portions. The SVR model, on the other hand, has the lowest accuracy among the four models. As a result, XGBoost, RF, MLP, and SVR were ranked in order of overall performance. The models were further evaluated by dividing the data into 80%–20%. Table 3 compares the prediction performance achieved for different regression problems based on the percentage of data division. It is important to note that the optimal ratio of data division may vary depending on the specific regression problem. Therefore, it is recommended to carefully consider this factor when developing regression models and selecting the appropriate percentage of data division. The effectiveness of ML methods depends significantly on the division of the data during training and testing. Carefully selecting the data division can enhance prediction performance and lead to successful ML results.
A cross plot of testing performance based on data division 80–20% by four ML models for fracture prediction is shown in Fig. 7. Specifically, XGBoost and RF models have proven to be highly effective in fitting FMI log data to predictive results (Fig. 7a, b). In addition to their accuracy, both models are well known for their ability to handle large datasets. The MLP and SVR models show overfitting and fail to establish a good correlation between the FMI logs data and predicted fracture density (Fig. 7c, d).
Figure 8 compares fracture density prediction for four ML models in the blind test well. The XGBoost predicts relatively close results to the original fractures, indicating good prediction accuracy. In contrast, the MLP and SVR models have many predicting points spreading far from the original fractures, indicating a lower prediction accuracy (Fig. 8c, d). This finding emphasizes the importance of selecting the appropriate ML algorithms for a given problem to achieve accurate predictions. Thus, the results presented in a blind test well demonstrate that XGBoost is the most reliable model with optimized hyperparameters among all the tested models for fracture density prediction. XGBoost outperforms all other models in accuracy and robustness. The testing points of the XGBoost distribution in the blind well are consistent with the FMI data (Fig. 8a). The study's findings are expected to significantly contribute to the development of more accurate and reliable models for fracture density prediction.
5.3 Relative importance of influence predictors
It is important to note that a variety of critical factors influence fractures. To understand how each input feature affects the fracture prediction for geothermal reservoirs, it is essential to analyze the crucial variables in the XGBoost model. Figure 9 demonstrates the results of ranking important features for fracture prediction in the present study. Figure 9 shows that DTP is an essential feature of fracture identification in a geological formation. A fracture in a rock formation, which consists of a break or crack, can significantly affect transmission speed. This is because fractures create spaces within a formation that are less dense than the surrounding rocks. A lower density may reduce the ability of the formation to transmit signals. Furthermore, open fractures allow fluid to flow more quickly through the formation. This phenomenon can further reduce the density of the formation, making it even more difficult for signals to penetrate. Thus, fractured zones will likely experience slower transmission speeds (Yasin et al. 2018a, 2022b). Therefore, sonic logs are the best fracture indicator.
Identifying fractures and fluid content in a reservoir using deep and shallow resistivity is critical in the oil and gas industry. The deep resistivity (RD) values are typically lower in fracture zones containing fluid invasion, typically salt water. As a result, the fluid in fractures can significantly decrease RD. The lower the RD value, the greater the probability of fractures in the reservoir (Saboorian-Jooybari et al. 2015). According to previous research, it has been observed that density (DEN) tends to decrease notably at fracture zones (Yasin et al. 2022b). Kölbel et al. (2020) reported that it is caused by an increase in fluid volume in these areas, as the density of fluids is lower than that of rocks (Aghli et al. 2019). Therefore, a developed model incorporating appropriate input features can simplify utilizing ML models to map fracture networks in EGS. Streamlining this process will enable the researchers to map fracture networks more quickly and easily.
5.4 Error analysis and comparison of XGBoost model optimization
The XGBoost model is evaluated for its applicability and performance based on William's plot. These plots allow researchers to gain insight into the most significant variables in their data. The plot is drawn by considering the standardized residual (SR) and leverage values (Vo Thanh et al. 2022). The calculation of leverage parameters involves several statistical steps. These steps are described in detail in Hemmati-Sarapardeh et al. (Hemmati-Sarapardeh et al. 2016). Figure 10 shows William's plot of the XGBoost regression model used for data applicability and error analysis. It is evident from the figure that the majority of data points are bounded close to the zero line, indicating that the prediction of fracture density is statistically valid and the proposed models are reliable. The training and testing results analysis reveals that a significant portion of the data falls within the 3 ≤ SR ≤ 3 range. Distribution of this kind is desirable in many modeling and prediction tasks because it allows for greater accuracy and precision. However, few data points deviate from the zero line, which can be attributed to outliers in the dataset. The data analysis suggests that the XGBoost model has high reliability and statistical trueness and is suitable for predicting fracture density.
A comparative analysis was performed to assess the performance of the XGBoost model using several optimization algorithms such as Q-learning, grid search, Bayesian optimization, and ant colony optimization. Table 4 presents a comparison of the prediction accuracy and computational time for each algorithm. It is important to note that all the algorithms were executed on the same computer configuration.
Figure 11 compares the fracture prediction performance of XGBoost models optimized with different algorithms. The XGBoost model using Q-learning achieves an R2 of 0.95 and an RMSE of 0.0018%.
5.5 Comparison of the performance of XGBoost with the previous study
The XGBoost model in this study has been found to have the highest accuracy compared to other popular ML models like RF, MLP, and SVR. However, it is essential to assess the stability and robustness of the XGBoost model to ensure its reliability. We compare the performance of the XGBoost model with previously developed ML models proposed by Fathi et al. (2022). The comparison confirms that the XGBoost model performs consistently and provides accurate predictions. Table 5 summarizes the statistical indicators for the XGBoost model with the proposed features and hyperparameters tuning based on an improved Q-learning algorithm.
XGBoost is superior to the previous KNeighbors model in producing the highest R2 and lowest RMSE values throughout the training and testing phases. The performance metrics indicate that the XGBoost model makes accurate and consistent predictions.
5.6 XGBoost model verification
5.6.1 Fracture porosity and permeability
XGBoost's application perspective is verified by comparing its predictions to actual measurements of reservoir fracture porosity. The results can help determine the effectiveness of the XGBoost model in predicting porosity and identifying potential reservoirs. Figure 12 shows the regression analysis for XGBoost prediction and reservoir fracture porosity in a blind test well of carbonate rocks. The results shown in the figure are well-fitting, with R2 values of 0.72. On comparing, it can be observed that the prediction performance of XGBoost closely corresponds with that of reservoir fracture porosity. In addition, Table 6 compares the fracture density measured from FMI logs and the XGBoost model. The comparative analysis conducted in a blind test well indicated positive results regarding the efficacy of the developed XGBoost model. The results confirm that the model is suitable for exploring EGS. The model's success in the blind test well further solidifies its potential for future EGS exploration applications.
Permeability is a crucial property of carbonate rocks, and its influence on fracture density distribution is essential. We conducted sensitivity analyses to determine the effect of permeability on fracture density distribution.
Figure 13 presents the results of a sensitivity analysis of permeability and fracture density distribution derived from the XGBoost model. The results show higher permeability values are associated with higher fracture density distribution values. The analysis suggests that the XGBoost model can be used to explore EGS accurately. The model can capture the relationships between permeability and fracture density distribution, providing a reliable and efficient tool for analyzing EGS.
5.6.2 Thin section analysis
Understanding the distribution, orientation, and connectivity of fractures in geothermal reservoirs through thin-section analysis is crucial for assessing reservoir permeability and fluid flow pathways and ultimately optimizing geothermal energy extraction strategies (Kölbel et al. 2020). Figure 14 corroborates the presence of natural and healed fractures, supporting the accuracy of identifying natural fractures from the XGBoost model. The agreement between the ML predictions and thin section analysis strengthens the reliability of the results and provides valuable insights for geothermal reservoir characterization. The research contributes to understanding fracture networks in geothermal systems and can aid in optimizing geothermal energy extraction strategies. Further studies could explore the integration of ML with other geophysical and geological data for a more comprehensive analysis of fracture systems in geothermal reservoirs.
5.7 XGBoost model’s application in spatial fracture distribution
We employed the XGBoost model to determine fracture density spatial distribution (Fig. 15a, b). The fracture density from the XGBoost model in the study area is high along the highly fractured wells (302, 307, 313), evidenced by the thin sections of these wells (Fig. 14). Furthermore, promising areas for EGS based on high fracture density can be selected for hydraulic fracturing operations. Therefore, comprehending the characteristics and distribution of these fractures is imperative for optimizing hydraulic fracturing design. The red color in the seismic profile shows the areas of high fracture density where fluid flow and heat transfer are expected to be most efficient. These locations optimize geothermal wells and maximize energy extraction from the subsurface.
6 Conclusions
-
ML workflows have remarkable potential for reducing the time and cost of fracture mapping in EGS while producing accurate, robust, reproducible, and assessable results. ML algorithms facilitate data analysis more quickly, accurately, and highly reliably. Furthermore, selecting ML models with optimizing hyperparameters and data division can eliminate human bias from the decision-making process, leading to more accurate and impartial results.
-
For predicting fractures in carbonate geothermal reservoirs, the XGBoost model was found to be the most reliable model. XGBoost performs better in both phases (training and testing) for a blind test well than other ML algorithms: RF, MLP, and SVR. XGBoost is faster and more accurate, achieving the highest correlation factor (R2 = 0.92 and RMSE = 0.02). Optimization of the XGboost model using Q-learning outperformed the grid search, Bayesian optimization, and ant colony optimization.
-
Fracture network mapping plays a vital role in understanding the behavior of EGS reservoirs and optimizing their performance. An effective fracture network mapping can be achieved with the appropriate input features, resulting in improved reservoir management and optimization.
-
The XGBoost model selected in this study provides a reliable and accurate method for predicting fractures in any geological formation for EGS. Analysis, evaluation, and validation demonstrate its utility and potential for widespread application.
-
An analysis of permeability and thin-section confirmed the presence of natural fractures that may affect the reservoir's potential as a source of high-temperature fluid.
7 Limitations
ML models exhibit strong stability and scalability, as they can handle both large-scale and small-sample data and are capable of efficiently processing data of varying sizes. However, ML models have certain limitations.
-
The feature representation process needs to be strengthened. Since the original feature vectors are high-dimensional, the enhanced feature vectors used in the XGBoost model can easily be overwhelmed.
-
As a supervised learning method, these models place high demands on the quality of the labels in the original data. Therefore, for the problem of intelligent identification of fractures in conventional well logs using core and FMI logs, it is crucial to select fracture label data located in the middle of a continuous fracture development segment and discard marginal label data to ensure label quality.
-
The issue of data imbalance in the original dataset should not be ignored. Choosing an appropriate data balancing method to improve the distribution proportion of different labels in the dataset is crucial for ensuring the training effectiveness of the fracture identification model.
Availability of data and materials
The data and materials used for this study are confidential. Readers may have access to the Python code.
Abbreviations
- FMI:
-
Formation micro-image
- φF:
-
Fracture porosity (%)
- SVR:
-
Support vector regressor
- RF:
-
Random forest regressor
- MLP:
-
Multi-layer perceptron
- XGBoost:
-
Extreme gradient boosting
- GR:
-
Gamma-ray
- RS:
-
Shallow resistivity
- RD:
-
Deep resistivity
- DTP:
-
P-wave sonic (μs)
- DTS:
-
S-wave sonic (μs)
- DEN:
-
Density (g/cm3)
- CNL:
-
Neutron porosity
- CAL:
-
Caliper
- U:
-
Uranium
References
Aghli G, Soleimani B, Moussavi-Harami R, Mohammadian R (2016) Fractured zones detection using conventional petrophysical logs by differentiation method and its correlation with image logs. J Petrol Sci Eng 142:152–162
Aghli G, Moussavi-Harami R, Mortazavi S, Mohammadian R (2019) Evaluation of new method for estimation of fracture parameters using conventional petrophysical logs and ANFIS in the carbonate heterogeneous reservoirs. J Petrol Sci Eng 172:1092–1102
Aguilera R (2008) Effect of fracture dip and fracture tortuosity on petrophysical evaluation of naturally fractured reservoirs, Canadian International Petroleum Conference, pp PETSOC-2008-110
Dalgaard E, Bredesen K, Mathiesen A, Balling N (2019) De-Risking Geothermal Plays by Seismic Reservoir Characterisation 2019(1):1–5
Fathi E et al (2022) High-quality fracture network mapping using high frequency logging while drilling (LWD) data: MSEEL case study. Machine Learning with Applications 10:100421
Golsanami N et al (2019) Distinguishing fractures from matrix pores based on the practical application of rock physics inversion and NMR data: A case study from an unconventional coal reservoir in China. J Nat Gas Sci Eng 65:145–167
Guo X, Ding C, Wei P, Yang R (2024) Theoretical analysis of the interaction between blasting stress wave and linear interface crack under high in-situ stress in deep rock mass. Int J Rock Mech Mining Sci 176:105723. https://doi.org/10.1016/j.ijrmms.2024.105723
Healy D et al (2017) FracPaQ: A MATLAB™ toolbox for the quantification of fracture patterns. J Struct Geol 95:1–16
Hemmati-Sarapardeh A, Ameli F, Dabir B, Ahmadi M, Mohammadi AH (2016) On the evaluation of asphaltene precipitation titration data: modeling and data assessment. Fluid Phase Equilib 415:88–100
Ishitsuka K, Lin W (2023) Physics-informed neural network for inverse modeling of natural-state geothermal systems. Appl Energy 337:120855
James B, Daniel Y, David C (2022) Making a science of model search: hyperparameter optimization in hundreds of dimensions for vision architectures. PMLR, pp 115–123
Jia Y, Tsang C-F, Hammar A, Niemi A (2022) Hydraulic stimulation strategies in enhanced geothermal systems (EGS): a review. Geomech Geophys Geo-Energy Geo-Resour 8(6):211
Karniadakis GE et al (2021) Physics-informed machine learning. Nat Rev Phys 3(6):422–440
Kölbel L et al (2020) Identification of fracture zones in geothermal reservoirs in sedimentary basins: a radionuclide-based approach. Geothermics 85:101764
Liu BO, Jin L, Hu C (2019) Fractal characterization of silty beds/laminae and its implications for the prediction of shale oil reservoirs in Qingshankou formation of northern Songliao basin, Northeast China. Fractals 27(01):1940009
Liu B et al (2023) Seismic characterization of fault and fractures in deep buried carbonate reservoirs using CNN-LSTM based deep neural networks. Geoenergy Sci Eng 229:212126
Martinez L, Hughes RG, Wiggins ML (2002) Identification and characterization of naturally fractured reservoirs using conventional well logs. https://api.semanticscholar.org/CorpusID:17723264.
Okoroafor ER, Offor CP, Prince EI (2022a) Mapping relevant petroleum engineering skillsets for the transition to renewable energy and sustainable energy, SPE Nigeria Annual International Conference and Exhibition, pp D031S017R005
Okoroafor ER et al (2022b) Machine learning in subsurface geothermal energy: two decades in review. Geothermics 102:102401
Qiang Z, Yasin Q, Golsanami N, Du Q (2020) Prediction of reservoir quality from log-core and seismic inversion analysis with an artificial neural network: a case study from the Sawan Gas Field, Pakistan, Energies
Ren C, Yu J, Liu X, Zhang Z, Cai Y (2022) Cyclic constitutive equations of rock with coupled damage induced by compaction and cracking. Int J Mining Sci Technol 32(5):1153–1165. https://doi.org/10.1016/j.ijmst.2022.06.010
Saboorian-Jooybari H, Dejam M, Chen Z, Pourafshary P (2015) Fracture identification and comprehensive evaluation of the parameters by Dual Laterolog Data, SPE Middle East Unconventional Resources Conference and Exhibition, pp D021S005R004
Siler DL et al (2019) Three-dimensional geologic mapping to assess geothermal potential: examples from Nevada and Oregon. Geotherm Energy 7(1):2
Tokhmchi B, Memarian H, Rezaee MR (2010) Estimation of the fracture density in fractured zones using petrophysical logs. J Petrol Sci Eng 72(1):206–213
Tokhmechi B, Memarian H, Noubari HA, Moshiri B (2009) A novel approach proposed for fractured zone detection using petrophysical logs. J Geophys Eng 6(4):365–373
Troyanskaya O et al (2001) Missing value estimation methods for DNA microarrays. Bioinformatics 17(6):520–525
Vo Thanh H, Yasin Q, Al-Mudhafar WJ, Lee K-K (2022) Knowledge-based machine learning techniques for accurate prediction of CO2 storage performance in underground saline aquifers. Appl Energy 314:118985
Wang Y, Peng J, Wang L, Xu C, Dai B (2023) Micro-macro evolution of mechanical behaviors of thermally damaged rock: a state-of-the-art review. J Rock Mech Geotech Eng. https://doi.org/10.1016/j.jrmge.2023.11.012.
Xiao D, Liu M, Li L, Cai X, Qin S, Gao R, Li G (2023) Model for economic evaluation of closed-loop geothermal systems based on net present value. Appl Thermal Eng 231:121008. https://doi.org/10.1016/j.applthermaleng.2023.121008
Yasin Q, Du Q, Ismail A, Ding Y (2018a) Identification and characterization of natural fractures in gas shale reservoir using conventional and specialized logging tools, SEG Technical Program Expanded Abstracts 2018. SEG Technical Program Expanded Abstracts. Society of Exploration Geophysicists, pp 809–813
Yasin Q, Du Q, Sohail GM, Ismail A (2018b) Fracturing index-based brittleness prediction from geophysical logging data: application to Longmaxi shale. Geomech Geophys Geo-Energy Eo-Resour 4(4):301–325
Yasin Q, Majdański M, Awan RS, Golsanami N (2022a) An analytical hierarchy-based method for quantifying hydraulic fracturing stimulation to improve geothermal well productivity, energies
Yasin Q, Majdański M, Sohail GM, Vo Thanh H (2022b) Fault and fracture network characterization using seismic data: a study based on neural network models assessment. Geomech Geophys Geo-Energy Geo-Resour 8(2):41
Yasin Q, Gholami A, Majdański M, Liu B, Golsanami N (2023a) Seismic characterization of geologically complex geothermal reservoirs by combining structure-oriented filtering and attributes analysis. Geothermics 112:102749
Yasin Q, Liu B, Majdański M, Golsanami N (2023b) Fracture density prediction using CNN-LSTM deep neural network for geologically complex geothermal reservoirs, vol 2023, no 1, pp 1–5
Zhang S, Yin S, Yuan Y (2015) Estimation of fracture stiffness, in situ stresses, and elastic parameters of naturally fractured geothermal reservoirs. Int J Geomech 15(1):04014033
Zhou G, Lin G, Liu Z, Zhou X, Li W, Li X, Deng R (2023) An optical system for suppression of laser echo energy from the water surface on single-band bathymetric LiDAR. Optics Lasers Eng 163:107468. https://doi.org/10.1016/j.optlaseng.2022.107468
Zhou Z, Roubinet D, Tartakovsky DM (2021) Thermal experiments for fractured rock characterization: theoretical analysis and inverse modeling. Water Resour Res 57(12):e2021WR030608
Acknowledgements
This work was financially supported by the Hainan Province Science and Technology Special Fund (ZDYF2023GXJS009).
Author information
Authors and Affiliations
Contributions
QY and BL conceptualized the study and secured funding for it. YD provided the necessary data and shared their expertise on the EGS. HVT and QD conducted laboratory work and developed ML models. QY wrote the initial draft, and all authors approved the final manuscript.
Corresponding authors
Ethics declarations
Competing interests
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Informed consent
Not available.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Yasin, Q., Ding, Y., Du, Q. et al. Fault and fracture network characterization using soft computing techniques: application to geologically complex and deeply-buried geothermal reservoirs. Geomech. Geophys. Geo-energ. Geo-resour. 10, 83 (2024). https://doi.org/10.1007/s40948-024-00792-8
Received:
Accepted:
Published:
DOI: https://doi.org/10.1007/s40948-024-00792-8