1 Introduction

Mapping natural fracture networks plays a significant role in predicting hydrocarbon production from a reservoir and in solving flow and transport problems in oil and gas production. Because data on subsurface characteristics are insufficient, predicting the fracture network remains a challenge.

The literature contains numerous publications that model subsurface fractures using geomechanical conditions, an approach that demands significant computational power. In recent years, several works have documented fracture models in porous media. These works analyze fracture behavior in the coupled system and give insight into the complex mechanics [1, 2, 19, 24]. However, the complexity of geomechanical models makes simulation results hard to verify, because the model’s inputs, including physical properties and boundary conditions, exhibit significant uncertainty and spatial variation.

From a geomechanical point of view, the direction of fractures in a mapped network depends on the properties of the subsoil, such as geomechanical features and fracture intersections [6]. The stress field near fracture nodes is critical in modeling fracture propagation. The studies [19, 24] showed that energy minimization is important for systematically modeling the fracture path with the phase field model. Therefore, the influence of adjacent fractures should be considered when modeling the direction of a fracture.

To overcome these challenges, recent research has focused on developing fracture models that can effectively represent complex fracture networks. These models aim to provide a faster and more realistic simulation process that can support real-time decision making in reservoir management. Stochastic simulation algorithms are one of the effective tools for reducing uncertainty in porous media [8]. These methods incorporate geological data on the fracture network, including fracture azimuth distributions, locations, and lengths. While the fracture network model may not always reproduce the exact topology of the porous media, it still provides a suitable configuration for flow and transport processes [3, 4, 12]. Such models can be useful for ground remediation, CO2 sequestration, hydrocarbon production, and other applications. Data collected near a wellbore, or from preliminary seismic processing at a specific location, are good candidates for training geostatistical models that simulate the geological configuration at nearby unknown locations. These stochastic approaches have gained considerable attention from researchers for their ability to reduce uncertainty in geological modeling.

The Fracture Network model is a powerful tool for simulating fluid flow in fractured porous media. The accuracy of the model depends on the ability to represent the geometry of fractures in the subsurface accurately. There are several approaches that can be used to create the geometry of fractures, including geostatistical analysis, numerical modeling, and field observations. A hybrid approach that combines these methods can provide the most accurate representation of the subsurface while minimizing uncertainties.

Machine learning covers the development of learning algorithms, including unsupervised learning, supervised learning, dimensionality reduction methods, and feature evaluation. Unsupervised learning allows the identification of stable and rare states of systems; regression and classification are used to predict and identify the state of objects. Whether classical or modern machine learning methods are used is determined by the amount of data available to the researchers [10, 20].

Some works in the literature have applied machine learning methods to reduce uncertainty in geoscience applications [5, 16, 17]. In [18], well log data are used with machine learning algorithms such as KNN, Decision Tree, Random Forest, XGBoost, and LightGBM for multilabel lithofacies classification.

The authors of [15] proposed geostatistical modeling of the fracture system using pattern statistics. They applied multiple-point statistics (MPS) to images to model the fracture network and its propagation. The authors of [9] also used MPS to simulate a fracture network; they trained the model on satellite images to build fractures on the surface and deep into the earth. This approach is the closest to machine learning algorithms. MPS is a popular method for constructing a fracture network when fracture images are available, and it is used in studies [7, 13] to model the fracture network system. The MPS method considers the available data around fractures: it builds statistical histograms of known zones to model a network of fractures for unknown zones while preserving the properties of the known zone. For the problem under consideration, building a network of fractures underground, this method has limitations. Such tasks contain a significant amount of uncertainty, images are difficult to obtain, and the MPS method requires a large number of images for quality training. For these reasons, we do not consider this approach for constructing a subsurface fracture network.

In this paper, we extend the fracture network model [3] by incorporating a machine learning algorithm in 2D to predict the parameters of fractures in porous media. We also used machine learning algorithms to generate the parameters of the fracture topology, including the azimuth angle, and compared them with the proposed fracture network model. Natural fault data from Kazakhstan (north of Lake Balkhash) were used to verify the proposed model.

2 Methodology

We take the fractures for the algorithm from digitized data of the geological fault zone. In the fracture network model, a fracture is a collection of segments, where each segment is the line between two nodes. In other words, a fracture is a graph containing nodes and branches (fracture segments). The midpoint of a segment defines the location of the segment in the domain. In Fig. 1, nodes are shown as black circles and segment midpoints as blue circles. Based on the defined segment locations, we calculate the azimuth angle of each segment. The azimuth angle of a segment is measured between the fracture segment direction and the north vector, see Fig. 1.
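The midpoint and azimuth of a segment can be sketched as follows. This is a minimal illustration rather than the code used in the study; it assumes that north is the +Y axis, that the azimuth is measured clockwise from north, and that the segment direction runs from the first node to the second. The function names are hypothetical.

```python
import math


def segment_midpoint(p, q):
    """Midpoint of the segment between nodes p = (x, y) and q = (x, y)."""
    return ((p[0] + q[0]) / 2.0, (p[1] + q[1]) / 2.0)


def segment_azimuth(p, q):
    """Azimuth of the direction p -> q in degrees, in [0, 360).

    Assumption: north is +Y and the angle grows clockwise (towards +X).
    """
    dx = q[0] - p[0]  # east component
    dy = q[1] - p[1]  # north component
    az = math.degrees(math.atan2(dx, dy)) % 360.0
    # Guard against floating-point rounding that can return exactly 360.0.
    return 0.0 if az >= 360.0 else az
```

Under these assumptions a segment pointing due east has an azimuth of 90 degrees and one pointing due west has 270 degrees. If the segment orientation is not meaningful, the azimuth could instead be folded into [0, 180); the source does not state which convention applies beyond the 0 to 360 range.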

2.1 Fracture Network Algorithm

We use machine learning approaches for the probabilistic classification of fracture azimuths. The input data is the 2D fracture network, which contains the coordinates of each fracture segment. Based on the coordinates, we calculate the azimuth of each fracture segment and the distances to the closest fracture segments, see Fig. 2.

Features generated from the fracture networks of the known region, describing fracture geometries and positions, are used to predict the azimuth of the fracture network in the unknown region. The key goal of the classification model is to predict which of the 8 azimuth classes a fracture segment belongs to; the classification of the azimuth angle is detailed below. The features generated from the fracture network, shown here for one initial fracture segment, are:

  • The azimuths of the 6 closest fracture segments in the fracture network (6 features);

  • The coordinates of the 6 closest fracture segments: 6 features for the X coordinates and 6 features for the Y coordinates;

  • The distances from the initial fracture segment to the 6 closest fracture segments (polar coordinates, 6 features);

  • The azimuths from the initial fracture segment to the 6 closest fracture segments (polar coordinates, 6 features).
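The feature list above can be sketched as a single function that builds the 30-value row for one segment. This is an illustration under stated assumptions, not the authors' implementation: neighbors are ranked by Euclidean distance between midpoints using a brute-force search, north is taken as +Y, and the names `neighbor_features` and `azimuth_deg` are hypothetical.

```python
import math


def azimuth_deg(dx, dy):
    """Clockwise angle from north (+Y) in degrees, folded into [0, 360)."""
    az = math.degrees(math.atan2(dx, dy)) % 360.0
    return 0.0 if az >= 360.0 else az


def neighbor_features(i, midpoints, azimuths, k=6):
    """Feature row for segment i built from its k closest segments.

    midpoints: list of (x, y) segment midpoints
    azimuths:  list of segment azimuths in degrees, in the same order
    """
    x0, y0 = midpoints[i]

    def dist(j):
        return math.hypot(midpoints[j][0] - x0, midpoints[j][1] - y0)

    # Rank every other segment by midpoint distance and keep the k nearest.
    others = sorted((j for j in range(len(midpoints)) if j != i), key=dist)
    nearest = others[:k]

    row = []
    row += [azimuths[j] for j in nearest]      # azimuth of each neighbor
    row += [midpoints[j][0] for j in nearest]  # X coordinates
    row += [midpoints[j][1] for j in nearest]  # Y coordinates
    row += [dist(j) for j in nearest]          # polar distance to neighbor
    row += [azimuth_deg(midpoints[j][0] - x0, midpoints[j][1] - y0)
            for j in nearest]                  # polar azimuth to neighbor
    return row
```

With k = 6 the row has 6 + 12 + 6 + 6 = 30 values, which matches the 30 features reported for the 6-closest-fractures dataset. The brute-force search is quadratic in the number of segments; a spatial index would be preferable for large networks.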

Fig. 3 shows a flowchart of the proposed steps of the algorithm. We prepared a training and a validation dataset from the fracture network. The target of the model is the azimuth angle. In the algorithm, the prediction of segment azimuth angles with the machine learning model proceeds as follows:

  1. We define the 6 neighboring fracture segments for each fracture segment. The true azimuth of each fracture segment serves as the target for training the model.

  2. We calculate two types of azimuth: first, the azimuth angles of the 6 closest fracture segments, and second, the azimuth angles from the initial fracture segment to the 6 closest fracture segments. We also compute the distances from the initial fracture segment to the 6 closest fracture segments.

  3. The fracture azimuth angles range from 0 to 360 degrees and are divided into 8 equal sectors of 45 degrees centered at the fracture. The angle ranges of each azimuth class are provided in Table 2.

  4. The dataset is prepared for training and validation in two ways:

    (a) split the dataset randomly;

    (b) split the dataset by known and hidden areas of interest.

  5. We train the LightGBM models and validate their results on the validation dataset.

After training, the ML model can forecast the azimuth class of a fracture network for the location where the model has been trained.
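Step 3 above, the division of azimuths into 8 sectors of 45 degrees, can be sketched as follows. The class numbering is an assumption: class k is taken to cover [45k, 45(k+1)) degrees, whereas the actual labels are those given in Table 2, which may be numbered differently. The function name is hypothetical.

```python
def azimuth_class(azimuth_deg, n_classes=8):
    """Map an azimuth in degrees to a sector index 0 .. n_classes - 1.

    Assumption: sector k covers [k * width, (k + 1) * width) degrees,
    with width = 360 / n_classes (45 degrees for 8 classes).
    """
    width = 360.0 / n_classes
    # The final modulo also folds a rounded-up 360.0 back into class 0.
    return int((azimuth_deg % 360.0) // width) % n_classes
```

For example, under this numbering an azimuth of 100 degrees falls in sector 2, which covers [90, 135).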

Fig. 1. The fracture network, illustrating the different node and segment types.

Fig. 2. Searching for the closest fracture segments.

Fig. 3. Flowchart of the fracture algorithm.

2.2 Machine Learning Algorithm

Many machine learning algorithms are available today, each with pros and cons for different problems. In this study, we chose one machine learning algorithm, LightGBM, to verify our hypothesis and classify the fracture characteristics. LightGBM is an open-source framework used mostly for tabular data, and our dataset is also tabular. The approach is based on decision trees and is an improved variant of the gradient boosting decision tree (GBDT) algorithm [14].

Another reason is that boosting ensemble algorithms can perform well on multi-class imbalanced data. Various problems with multi-class imbalanced data have been handled by applying ensemble learning techniques [23]. This is one of the main reasons why we chose LightGBM for this task.

The aim of the LightGBM algorithm is to obtain an estimate \(\widehat{F(X)}\) of the function F(X), which maps X to Y, by minimizing the loss function L(Y, F(X)).

In gradient boosting, each new \(b_i\) algorithm (tree) is added to the already built composition:

$$\begin{aligned} a_i(x)=a_{i-1}(x)+b_i(x) \end{aligned}$$
(1)

Such an algorithm corrects the answers of the previous composition \(a_{i-1}(x)\) toward the correct answers on the training set. After n such steps, the composition is:

$$\begin{aligned} a_n(x)=\sum _{i=1}^{n}b_i(x) \end{aligned}$$
(2)

For the classification task, the loss function has several options; one option is:

$$\begin{aligned} L(Y,F(x))=\log (1+\exp (-YF(x))) \end{aligned}$$
(3)

where \(F(x_i)=a_n(x_i)+s_i\) and \(s=(s_1,\ldots ,s_l)\) is the vector of shifts (corrections). Our loss function is:

$$\begin{aligned} L(Y,F(x))=\log (1+\exp (-YF(x))) \end{aligned}$$
(4)
$$\begin{aligned} \sum _{i=1}^{l}\log (1+\exp (-y_i(a_n(x_i)+s_i)))\rightarrow \min _s \end{aligned}$$
(5)
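As a worked step that is not part of the original derivation, the loss above can be differentiated with respect to F(x). This assumes the binary labels Y take values in {-1, +1}, which the form of the loss suggests but the text does not state. In standard gradient boosting, each new tree is fitted to the negative gradient of the loss:

$$\begin{aligned} -\frac{\partial L(Y,F(x))}{\partial F(x)}=\frac{Y\exp (-YF(x))}{1+\exp (-YF(x))}=\frac{Y}{1+\exp (YF(x))} \end{aligned}$$

The magnitude of this quantity is close to 1 for objects that are badly misclassified (YF(x) strongly negative) and close to 0 for objects that are confidently correct, so later trees concentrate on the objects that the current composition still gets wrong.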

The forecasting performance of the machine learning models is estimated with one statistical indicator, the f1 score [11]. The f1 score balances precision and recall and is used when the class distribution is imbalanced. It is a good scoring metric for imbalanced data when a model needs to classify the positives [21].

$$\begin{aligned} F1 = 2 \cdot \frac{\text {precision} \cdot \text {recall}}{\text {precision} + \text {recall}} \end{aligned}$$
(6)
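The per-class f1 score reported in the classification tables can be computed directly from the formula above. The sketch below is a plain-Python illustration with a hypothetical function name; it treats each class in turn as the positive class (one-vs-rest) and returns 0 when precision and recall are both undefined or zero, which is a convention chosen here rather than one stated in the source.

```python
def f1_per_class(y_true, y_pred):
    """Return {class: f1} using one-vs-rest precision and recall."""
    scores = {}
    for c in sorted(set(y_true) | set(y_pred)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == c and p == c)  # true positives
        fp = sum(1 for t, p in pairs if t != c and p == c)  # false positives
        fn = sum(1 for t, p in pairs if t == c and p != c)  # false negatives
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        scores[c] = 2 * precision * recall / denom if denom else 0.0
    return scores
```

For instance, with true labels [0, 0, 1, 1] and predictions [0, 1, 1, 1], class 0 has precision 1 and recall 0.5, giving an f1 score of about 0.67, while class 1 has precision 2/3 and recall 1, giving 0.8.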

2.3 Data Analysis

We train and validate the algorithm on a 31701 km\(^2\) area north of Lake Balkhash, Kazakhstan. The area of interest is near several gold, silver, and copper mines [22]. In addition, there are several existing and potential mines in the area, see Fig. 4 and Fig. 5. We digitized the geology and fault data of the study area to train and evaluate the fracture characterization.

Fig. 4. Area of interest in Kazakhstan [22].

Fig. 5. The geological faults data of Central Kazakhstan.

Figure 6 shows the histogram of the azimuths of the geological faults. The azimuth angle ranges from 0 to 360 degrees. The distribution is not normal because the azimuths fall into several groups. Table 1 gives descriptive statistics of the azimuth.

Fig. 6. Histograms of azimuth.

Table 1. The descriptions of azimuth.

The histogram shows that some azimuth sectors contain only a small number of fracture segments. We therefore classified the azimuths by sector, see Table 2. The 4 sectors [0,45), [45,90), [135,180), and [180,225) do not contain enough azimuth values for training. We therefore exclude these sectors from training, because a model cannot be trained on them.
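The exclusion of under-populated sectors can be sketched as a simple count-and-filter step. The threshold `min_count` and the function name are hypothetical: the source does not state how many samples were considered enough, only which sectors were dropped.

```python
from collections import Counter


def drop_sparse_classes(rows, labels, min_count):
    """Remove samples whose class has fewer than min_count members.

    rows:   list of feature rows
    labels: list of class labels, same length as rows
    Returns (kept_rows, kept_labels, counts), where counts holds the
    number of samples per class before filtering.
    """
    counts = Counter(labels)
    keep = {c for c, n in counts.items() if n >= min_count}
    kept_rows = [r for r, c in zip(rows, labels) if c in keep]
    kept_labels = [c for c in labels if c in keep]
    return kept_rows, kept_labels, counts
```

Returning the counts alongside the filtered data makes it easy to report which classes were removed and how small they were.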

Table 2. The angle ranges related to each azimuth class for Balkhash dataset.

3 Numerical Results

We applied LightGBM to real geological fracture networks to classify the azimuth of fractures. We compare the LightGBM results for two ways of splitting the dataset and for two input datasets, one containing only the X and Y locations and the other containing all information about the 6 closest fractures:

  • The training and validation datasets are split randomly, with 80% and 20% of the data, respectively;

  • The training and validation datasets are split by known and hidden areas, see Fig. 7.

For each prepared dataset, we trained a LightGBM model, as shown in the flowchart of the fracture algorithm in Fig. 3. Models 1 and 2 used the data for the 6 closest fractures, with random splitting and known/hidden-area splitting, respectively. Models 3 and 4 used only the X and Y location features, with random splitting and known/hidden-area splitting, respectively.

3.1 Case with Random Selection

The total dataset for the 6 closest fractures has 535 rows and 30 features. We randomly split it into a training dataset with 428 rows and 30 features and a validation dataset with 117 rows and 30 features. The total dataset for the X and Y locations has 2961 rows and 2 features. We randomly split it into a training dataset with 2368 rows and 2 features and a validation dataset with 593 rows and 2 features.

The results for the X and Y dataset and for the 6-closest-fractures dataset were compared using the f1 metric. The classification reports of the models are provided in Tables 3 and 4. Based on the scores in the tables, model 3 showed better results on the X and Y dataset, with each of the 4 classes having an f1 score above 0.64. For the 6 closest fractures, model 1 obtained lower f1 scores overall: class 0 reached 0.87, but the f1 scores of the other classes were lower.

Table 3. Classification report of LightGBM result for X and Y.
Table 4. Classification report of LightGBM result for 6 closest fractures.

3.2 Case with Known and Hidden Areas

To validate the model for this case, we hide part of the fracture network in the central area (cropped from the original domain). In Fig. 7, the red line marks the boundary of the domain cropped from the original fractures. The black lines are the original fractures, that is, the fractures of the known area.

Fig. 7. Hidden zone of fractures from the geological faults data from Kazakhstan.

The total dataset for the 6 closest fractures has 553 rows and 30 features. The training data from the known area contains 461 rows and 30 features, and the validation dataset, the hidden area, contains 92 rows and 30 features. The total dataset for the X and Y locations has 2961 rows and 2 features. The training data from the known area contains 2434 rows and 2 features, and the validation dataset, the hidden area, contains 442 rows and 2 features.
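The split into known and hidden areas can be sketched as below. The sketch assumes that the hidden zone is an axis-aligned rectangle, which is how the cropped domain appears to be drawn in Fig. 7 but is not stated explicitly, and that a segment belongs to the hidden zone when its midpoint lies inside the rectangle. The function name and the `box` parameter are hypothetical.

```python
def split_known_hidden(midpoints, rows, labels, box):
    """Split samples by whether the segment midpoint lies in the hidden box.

    midpoints: list of (x, y) segment midpoints
    rows:      list of feature rows, same order as midpoints
    labels:    list of class labels, same order as midpoints
    box:       (xmin, ymin, xmax, ymax) of the hidden (validation) area
    Returns ((known_rows, known_labels), (hidden_rows, hidden_labels)).
    """
    xmin, ymin, xmax, ymax = box
    known = ([], [])
    hidden = ([], [])
    for (x, y), row, label in zip(midpoints, rows, labels):
        inside = xmin <= x <= xmax and ymin <= y <= ymax
        target = hidden if inside else known
        target[0].append(row)
        target[1].append(label)
    return known, hidden
```

The known part is then used for training and the hidden part for validation. Whether the neighbor features of a known segment may be computed from segments inside the hidden zone is a separate design decision that this sketch does not address.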

Table 5. Classification report of LightGBM result for X and Y.

The classification reports of the models for the case with known and hidden areas are provided in Tables 5 and 6. Model 2 showed better results on the dataset with the 6 closest fractures, with each of the 4 classes having an f1 score above 0.46, which is better than the result on the X and Y dataset (model 4).

Table 6. Classification report of LightGBM result for 6 closest fractures.

4 Discussion

Using natural fault data from Kazakhstan, we established that a machine learning algorithm can be used to recreate a fracture network in a zone with uncertainty. We considered machine learning approaches for the probabilistic classification of fracture azimuths. This approach has two limitations.

Firstly, machine learning approaches require a lot of data to give a reasonable result. In our case, the dataset is imbalanced, and some classes do not have enough data to train a model. We excluded these classes because they contain too little data to capture a fracture pattern. Therefore, the selected dataset should contain enough data to train and validate a machine learning model.

Secondly, we used a fixed length for the fracture segment. The length of the fracture segment defines the fracture propagation from the initial point to the neighboring fracture segment. We set the segment length to the average of the fracture length distribution of the known fracture network. In further research, we will study the length, anisotropy, and connectivity of fractures, which should enable better prediction of the fracture network.

5 Conclusion

This paper analyzes numerical models integrated with LightGBM to classify fracture network azimuths from Kazakhstan geological data in different scenarios. The findings suggest that the fracture network model with LightGBM gives better results in generating fracture geometry parameters for an unknown area based on the features of a known area. Real fault data from Kazakhstan were applied to the different models. The direct model, which uses only the coordinates to predict azimuth angles, achieves a good f1 score on the randomly selected subset of the data.

Comparing the classification results of the machine learning algorithm on the two datasets, one with only the fracture segment coordinates as features and one with the 6 nearest neighbors, we observed that model 3 performs well on the coordinate dataset when the data are split randomly into training and validation datasets. In the hidden zone problem, model 2 predicts better on the dataset containing the features of the 6 neighbors. This suggests that model 2 captures the key knowledge of fault patterns in the known zone and applies it successfully to the hidden zone.

In further research, we intend to concentrate on regression of the fracture azimuth; we will also apply deep learning algorithms such as LSTM to predict the azimuth.