Introduction

Background

Lost circulation is a costly and serious issue at any stage of the drilling process. According to the United States Department of Energy in 2010, mud losses account for 10 to 20% of the cost of drilling under extreme pressure and temperature conditions (Alkinani et al. 2020a, b). Lost circulation also accounts for about 12% of all non-productive time (NPT) in the oil and gas industry, making it a significant contributor to NPT. According to Arshad et al. (2015), the industry spends about 2 billion dollars a year to reduce circulation loss. According to J. Yang et al. (2022a, b), the China National Petroleum Corporation (CNPC) loses about 4000 days annually as a result of lost circulation. Lost circulation problems can develop when the applied pressure exceeds the formation breakdown pressure, and they can occur in a variety of formation types, including extremely permeable and unconsolidated formations (Elmousalami et al. 2021; Fidan et al. 2004). Another issue is the need to abandon some incomplete wells in depleted formations because complete losses occur frequently. Due to frequent complete losses, especially in formations above the cap rock, some wells in Middle Eastern oilfields have been suspended for years (Moazzeni et al. 2012; Shadravan et al. 2009).

Adding lost circulation material (LCM) to the drilling fluid, lowering mud weight, and zonal cementing are common remedies for lost circulation. Interdisciplinary approaches incorporating geomechanics analysis, optimized well trajectories, specialized drilling mud additives, and efficient drilling hydraulics control are crucial for preventing and reducing such losses (Lavrov 2016; Whitfill and Hemphill 2003). Based on the extent of the loss rate, lost circulation during drilling is divided into seepage, moderate, and severe or total losses (CP 2014). Bridging materials must be added to control significant losses that reach up to 470 barrels or last longer than 48 h, whereas minor losses can be handled within 48 h by modifying mud viscosity or adding blocking materials (Moazzeni et al. 2011). Pore-fluid pressure and fracture gradients establish the mud-gradient window, a critical range of mud-pressure gradients for safe drilling. This range may be constrained in difficult drilling situations, such as deep-water or deviated wells. Different kinds of LCMs are added to drilling fluids to seal induced or natural fractures and maintain wellbore integrity in order to strengthen the wellbore and reduce circulation losses (Mehrabian et al. 2015).

Lost circulation is typically attributed to four formation types: vugular or cavernous formations, extremely permeable formations, unconsolidated formations, and naturally or artificially induced fractures (Pilehvari & Nyshadham 2002). Both formation characteristics (such as pressure and gradient) and operational parameters (such as pump pressure and flow rate) have an impact on lost circulation. Wellbore geometry, drilling mud characteristics, and hole-cleaning effectiveness are further considerations (Burgoyne et al. 1991; Lavrov 2016).

Problems and gaps

According to Deng et al. (2023), scholars from China and other countries have proposed numerous intelligent strategies for forecasting lost circulation incidents in recent years, owing to the ongoing development of intelligent algorithms. These techniques predict lost circulation accidents more precisely and more scientifically. While there is valuable research on mitigating lost circulation during drilling, the focus on prediction prior to drilling remains relatively limited. Several issues persist, including insufficient data utilization, limited applicability of models to specific areas, and poorly explained methodologies in certain studies (Al-Hameedi et al. 2018a, b; Alkinani et al. 2019; Li et al. 2018). Furthermore, most articles address a specific region (Sabah et al. 2019; Alkinani et al. 2020a, b; Hou et al. 2020; Aljubran et al. 2021; Mardanirad et al. 2021; Wood et al. 2022; Olukoga and Feng 2022; Sabah et al. 2021; Toreifi and Rostami 2014; Otchere et al. 2022; Salih and Abdul Hussein 2023; Yang et al. 2022a, b; Jafarizadeh et al. 2022; Tootkaboni and Ibrahim 2021; Su et al. 2021), unlike some articles, such as Alkinani et al. (2020a, b), that could be applied globally. Table 1 compares the proposed model with existing models.

Table 1 A comparison between this proposed model and existing past models

Expected objectives and contribution

This research aims to develop an explainable integrated system, called the Automated Lost Circulation Severity Classification and Mitigation System (ALCSCMS), that can accurately predict Lost Circulation Severity (LCS) at the early stage of drilling operations based only on conceptual data of the operations. The objective of this study is to create and compare optimized machine learning algorithms for lost circulation prediction prior to drilling using key drilling data comprising 65,377 records from 20 wells drilled into the giant Azadegan oil field in Iran. This study's insights will assist the drilling staff in planning solutions before hitting the loss zone and in proactively modifying the primary drilling parameters to prevent or at least lessen losses. Moreover, a mitigation optimization model based on a genetic algorithm has been incorporated to convert highly severe lost circulation classes into acceptable classes.

Literature review

An artificial neural network-based technique for calculating lost circulation during underbalanced drilling was proposed by Behnoud and Hosseini (2017). The study emphasizes the significance of quantifying and decreasing lost circulation in order to solve operational issues. However, the features are insufficient, only one data source is used, and the study applies to underbalanced drilling techniques, which are difficult to adopt in several countries due to financial and technological limitations (Elmousalami et al. 2024; Elmousalami and Sakr 2024). In drilling operations, the use of artificial intelligence (AI) techniques, such as fuzzy logic (FL), functional networks (FN), and artificial neural networks (ANN), has shown promise in locating lost circulation zones. The ANN model with five neurons in a single hidden layer demonstrated exceptional accuracy in forecasting lost circulation zones, as evidenced by its correlation coefficient (R) of 0.987 and root mean square error (RMSE) of 0.081. The ANN model accurately predicted 99.1% of the lost circulation zones when the models were trained and tested using real-time surface drilling parameters from multiple wells.

Manshad et al. (2017) highlighted the significance of mud rheology by using support vector machines (SVM) and ANN to predict loss of circulation in the fractured Maroun oilfield. Improved classification accuracy was attained, and the research offered qualitative findings on lost circulation volume. The authors used both operational and geological data; however, the methodology involved the SVM model only. Al-Hameedi et al. (2017b) developed a model that emphasizes the importance of equivalent circulation density (ECD), mud weight (MW), and yield point (Yp) in order to estimate mud loss in the Shuaiba formation in the South Rumaila Field in Iraq. Al-Hameedi et al. (2017a) investigated mud loss in the Dammam Formation and found that MW, ECD, and rate of penetration (ROP) all had significant effects. Both studies provide models for mud loss estimation and reduction during drilling operations.

Based on variables such as depth, lithology, drilling mud properties, and operational variables, Al-Hameedi et al. (2017) created mathematical models for mud loss. The study showed that their volume loss model successfully forecast circulation loss events in the Hartha formation in the South Rumaila Field in Iraq, while highlighting the major influence of ECD, MW, and Yp on mud loss. Agin et al. (2020) predicted lost circulation in oil well drilling using an adaptive neuro-fuzzy inference system (ANFIS), design of experiments (DOE), and data mining. In comparison to data mining, ANFIS demonstrated greater prediction ability, highlighting the significance of proactive circulation loss forecasting: proactive forecasting techniques are preferable to reactive approaches to problem-solving. Although this study used a large dataset, three models, and 17 variables, it predicts the severity of lost circulation only once it happens. Therefore, there is a need to predict this problem before it happens and provide a solution for it.

Abbas et al. (2019a, b) proposed a new model for predicting circulation loss in an Iraqi oilfield using support vector machines (SVMs) and artificial neural networks (ANNs). SVMs show more promising results, but caution is advised when applying the models to data beyond the training datasets. Abbas et al. (2019b) analyzed drilling data from Southern Iraq, including 795 datasets from 385 wells. Their models accurately identified solutions for lost circulation (ANN: R2 of 0.88 for training and 0.84 for testing; SVM: R2 of 0.97 for training and 0.95 for testing), emphasizing the importance of ML in enabling intelligent decisions by drilling engineers. Abbas et al. (2019a, b) also used drilling data from southern Iraq to create an accurate model for predicting circulation loss during drilling. The model, based on an artificial neural network (ANN) approach, considered operational and geological variables, and the results showed high accuracy. Using information from more than 1500 wells worldwide, Alkinani et al. (2019) created an artificial neural network (ANN) model to forecast mud losses in fractured formations before drilling. Based on drilling parameters, the model estimated lost circulation appropriately. Although this is a considerable achievement, a single artificial intelligence technique is insufficient. The problem of forecasting lost circulation risk in an Iraqi carbonate reservoir using seismic data is examined by Geng et al. (2019).

The study established a correlation between seismic characteristics and lost circulation events and developed a prediction model. The problem is that dealing with this type of dataset is very difficult and requires a lot of work. Shi et al. (2019) explored drilling technologies and proposed a machine learning approach using historical and recent data to train SVM and random forest models for precise influx and loss prediction. The study also utilizes the classification and regression tree (CART) approach to create a decision tree for classification and regression. The classification results indicate whether no accident, an influx, or a loss occurs, but there is no estimation of the severity.

Other studies combined evolutionary algorithms with machine learning models to predict lost circulation in the Marun oil field; these hybrid intelligence models outperform standalone ML methods, highlighting the effectiveness of this integrated approach in accurately anticipating circulation loss. Related work developed prediction models for lost circulation in the Marun oil field using techniques such as genetic algorithm-multi-layer perceptron (GA-MLP), decision trees (DT), ANFIS, and ANN, and demonstrated high accuracy. Alkinani et al. (2020a, b) investigated machine learning methods for lost circulation treatments. The study found that quadratic SVM exhibited the best accuracy among the tested models. Hou et al. (2020) examined circulation loss incidents in the high-temperature and high-pressure Yingqiong Basin. They constructed an ANN prediction model with strong performance, offering valuable insights and accurate risk assessment for circulation loss in drilling operations. A deep learning model was created by Aljubran et al. (2021) to recognize loss circulation incidents (LCIs) during drilling operations. On test data, their one-dimensional convolutional neural network models demonstrated good levels of precision, recall, and F1 scores.

Alsaihati et al. (2021) used SVMs, random forests, and K-NN to detect lost circulation events during drilling based on surface characteristics. The K-NN classifier achieved the highest F1-score, followed closely by random forests. Alsaihati et al. (2022) developed predictive models using surface characteristics and active pit volume interpretation to estimate the loss of circulation rate (LCR) during drilling. Both studies have drawbacks, such as a small dataset, the use of drilling surface parameters only, and a lack of generalization. Magzoub et al. (2021) developed an ML approach to select drilling fluid compositions for preventing circulation loss. Gradient boosting exhibited the highest accuracy, up to 91%, in predicting rheological features. The issue here is the limited dataset, and lost circulation prediction is indirect, via the attainment of appropriate rheological characteristics. Mardanirad et al. (2021) developed deep learning (DL) models to forecast fluid loss classes using a dataset from 20 wells in Iran's Azadegan petroleum sector. The convolutional neural network (CNN) model demonstrated the highest accuracy rate (98%) compared to LSTM and GRU models.

Wood et al. (2022) compared ML and DL methods for predicting drilling fluid loss classes using data from Mardanirad et al. (2021). The random forest (RF) model outperformed the other models, achieving the highest overall performance. AdaBoost (ADA) and decision tree (DT) models performed well for specific fluid-loss classes. DL models showed lower accuracy and longer execution times; compared to the ML models, they failed to predict classes 4 and 5 and required much longer execution times. Olukoga and Feng (2022) employed ML methods to classify drilling mud circulation loss occurrences based on data provided by Mardanirad et al. (2021). The RF ensemble achieved a flawless F1 score of 1, while the CART model demonstrated superior performance with a high weighted F1 score of 0.9904.

Using the datasets supplied by Sabah et al. (2021) and Toreifi and Rostami (2014), Otchere et al. (2022) constructed a model employing seven ML algorithms to estimate the loss of circulation rate (LCR) in the Marun oil field. The Extra Trees (ET) regressor model produced the best results with low error metrics. Using data from 75 oil wells in the Rumaila oilfield, Salih and Abdul Hussein (2023) used three machine learning models to forecast lost circulation: DT, RF, and extra trees. The extra trees model showed the highest prediction accuracy. Yang et al. (2022a, b) developed an ANN model for forecasting fracture width in fractured rocks using data from oil and gas drilling operations. The model had strong R2 values, a low root-mean-square error (RMSE), and good prediction accuracy. Although the idea of this article is very good, using an artificial neural network model alone is not enough and lacks good generalization.

Pang et al. (2022) created a method to estimate lost circulation in the Mishrif reservoir in the Middle East Gulf. The mixture density network, feature selection, and real-time evaluation are all part of their methodology. Although the method of lost circulation prediction is new, interpreting the relationship between mud-logging parameters and the mud loss rate requires considerable experience in well log interpretation. Data from the Marun oilfield in Iran were used by Jafarizadeh et al. (2022) to create a forecasting model for the mud circulation loss rate. Prior to implementing least squares support vector machine (LSSVM), CNN, and hybrid modelling methods, they used noise attenuation and feature selection strategies. Tootkaboni and Ibrahim (2021) created models for forecasting circulation loss using a vast database from the Maroon oilfield. The clean and prepared dataset with eighteen parameters highlighted the validity and accuracy of the suggested model in predicting circulation loss. In order to create a real-time model for identifying leakage layer locations during drilling operations, Su et al. (2021) made use of a large dataset. The suggested strategy displayed consistent prediction results and good forecasting accuracy for leakage layers. However, the model suffers from a lack of generalization, even within the same study area.

In order to forecast API and HPHT filtrate loss parameters in drilling operations, Gul and Oort (2020) employed three field datasets. To estimate the parameters with high accuracy, machine learning and deep learning models such as MLP, RF, XGB, SVM, and multilinear regression were used. However, the study did not explicitly consider formation properties such as porosity and permeability. Al-Hameedi et al. (2019) built predictive models for circulation loss amount, ECD, and ROP using data from more than 500 wells in the Rumaila field, Iraq. The models' accuracy in predicting circulation loss in fractured formations was confirmed using additional data from 30 wells, revealing strong agreement with the measured data. For the purpose of forecasting the rheology and filtration properties of drilling mud containing silica oxide (SiO2) nanoparticles, Ning et al. (2023) created machine learning models. Parizad et al. (2018) provided inputs for their work, and Mahmoud et al. (2016, 2018) and Vryzas et al. (2015) provided experimental data. The models demonstrated good prediction accuracy for both shear stress and filtration volume, with LSSVM outperforming ANN for shear stress. Formation characteristics were not directly considered in the study.

ALCSCMS system methodology

Data collection on Lost Circulation Severity (LCS) is the first step in the conceptual framework, which contains six basic steps, as shown in Fig. 1. The second phase involves correcting missing values, outliers, and inconsistencies to prepare the acquired data for computer processing. In the third stage, the most important parameters for the suggested model are ranked and selected through factor analysis in feature engineering. The fourth phase, which follows the selection of the important features, uses single ML algorithms (such as MLR and SVM) and EML techniques (such as RF and GBM) to forecast LCS. In the fifth step, all utilized machine learning algorithms are assessed and ranked, with the objective of selecting the algorithm that demonstrates the highest performance according to the relevant evaluation metrics. In the sixth and final step, a thorough evaluation of the chosen algorithm's performance is conducted, accompanied by a comprehensive discussion that offers insights and analysis. In addition, an LCS mitigation optimization model based on a genetic algorithm has been incorporated to convert highly severe lost circulation classes into acceptable classes.

Fig. 1
figure 1

The proposed methodology for ALCSCMS system

Data engineering

The effectiveness of the proposed model is influenced by the volume and quality of the data gathered, as referenced by Elmousalami (2020a, b) and Bode (2000). To achieve optimal performance, machine learning models require a considerable volume of data. In constructing the model, a total of 65,377 observations were gathered. Prior to the application of machine learning algorithms, the data underwent cleaning and preprocessing to eliminate any missing or redundant information. Subsequently, the remaining dataset was normalized (min-max scaling), which scaled the data values to fall within the range [0.0, 1.0].
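To make the preprocessing concrete, the following minimal Python sketch shows one way to perform the cleaning and min-max scaling described above. The file name, column names, and label column are illustrative assumptions, since the paper does not publish them.

```python
# Minimal preprocessing sketch (assumed file and column names; the paper's
# exact cleaning rules are not published in full).
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

df = pd.read_csv("lcs_records.csv")             # 65,377 raw records (hypothetical file name)

# Drop duplicate and incomplete rows, as described in the text
df = df.drop_duplicates().dropna()

feature_cols = [f"P{i}" for i in range(1, 18)]  # P1..P17 drilling parameters
X = df[feature_cols]
y = df["LCS_class"]                             # severity labels 0..4 (assumed column name)

# Scale features to the [0.0, 1.0] range
X_scaled = pd.DataFrame(MinMaxScaler().fit_transform(X), columns=feature_cols)
```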

The model parameters were derived from twenty wells drilled into the Azadegan oilfield in Iran, the world's largest unexploited oil field, which provided a significant subsurface dataset for this study. The field comprises three separate structural blocks: I, II, and III. The field's wells provided the data for the dataset, and the majority of reported lost circulation instances took place in block III. The loss rate for every meter of drilled depth was extracted from the daily drilling reports and transformed into five classifications, ranging from "no loss" to "complete loss" (Mardanirad et al. 2021; Wood et al. 2022).

Table 2 lists all collected parameters and categorizes each variable into four main groups: drilling parameters, fluid flow, mud properties, and output. The first type of loss, coded "Class 0", indicates no significant rate of loss; the second type, coded "Class 1", indicates seepage losses of less than 10 barrels per hour (bbl/hr); the third type, coded "Class 2", indicates a fluid loss in the range of 10 to less than 100 bbl/hr; the fourth type, coded "Class 3", involves fluid loss exceeding 100 bbl/hr with slight returns of lost fluid; and the fifth type, coded "Class 4", corresponds to fluid loss without return.

Table 2 The collected parameters

As illustrated in Fig. 2, the correlation matrix shows several strong correlations between the variables in this dataset. For example, there is a strong positive correlation between Mud Weight (P11) and Solid (P17) (R = 0.95), meaning that as the Mud Weight increases, the Solid content also increases. Similarly, Flow In (P8) and Pump Stroke (P10) are highly correlated (R = 0.98): as the Flow In increases, the Pump Stroke also increases. On the other hand, there are also several weak correlations between the variables. For example, Rotation (P5) and Plastic Viscosity (P13) have a weak positive correlation (R = 0.19), indicating that as the Rotation increases, so does the Plastic Viscosity, though only slightly. Additionally, there is a weak negative correlation (R = − 0.04) between Rotation (P5) and Rate of Penetration (P3), meaning that as the Rotation increases, the Rate of Penetration decreases slightly. The correlation heat map is therefore very useful for illustrating and understanding the relations between the parameters.

Fig. 2
figure 2

Heat map correlation for LCS parameters
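Continuing the preprocessing sketch above, the snippet below shows how a Pearson correlation heat map such as Fig. 2 can be produced and how individual coefficients (e.g., P11 vs P17, P8 vs P10) can be inspected; it assumes the scaled feature DataFrame X_scaled from the earlier sketch.

```python
import matplotlib.pyplot as plt
import seaborn as sns

# Pearson correlation matrix of the 17 input parameters (Fig. 2 analogue);
# X_scaled is the DataFrame built in the preprocessing sketch above
corr = X_scaled.corr(method="pearson")

plt.figure(figsize=(10, 8))
sns.heatmap(corr, annot=True, fmt=".2f", cmap="coolwarm", vmin=-1.0, vmax=1.0)
plt.title("Correlation heat map for LCS parameters")
plt.tight_layout()
plt.show()

# Example lookups matching the pairs discussed in the text
print(corr.loc["P11", "P17"])   # Mud Weight vs Solid (reported R of about 0.95)
print(corr.loc["P8", "P10"])    # Flow In vs Pump Stroke (reported R of about 0.98)
```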

Principal component analysis (PCA)

PCA is a method widely employed in statistics, machine learning, and data analysis to decrease the dimensionality of a dataset that may consist of numerous interrelated variables. Its core objective is to convert this high-dimensional data into a smaller collection of uncorrelated variables known as principal components. These principal components are designed to retain the majority of the original data’s variance while simplifying its structure (Kurita 2019).

Figure 3 illustrates the results of PCA on the collected LCS parameters, where the parameters are ranked by their absolute PCA loading and presented as a descending bar chart. As a result, P1, P2, P8, and P10 were the most influential parameters on LCS, whereas P6, P9, P14, and P11 were the least influential. Consequently, the number of parameters could be reduced based on the absolute PCA loadings. However, this study preferred to use all available parameters to obtain the highest possible accuracy. Moreover, all of these parameters are available before the drilling operations; therefore, there is no need for dimensionality reduction.

Fig. 3
figure 3

Ranking parameters based on PCA absolute loading
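A hedged sketch of the PCA loading ranking follows; it interprets "PCA absolute loading" as the absolute loading on the first principal component, which is an assumption since the paper does not state its exact aggregation. It reuses X_scaled and feature_cols from the preprocessing sketch.

```python
import numpy as np
from sklearn.decomposition import PCA

# Rank parameters by the absolute loading on the first principal component
# (one reasonable reading of "PCA absolute loading")
pca = PCA(n_components=X_scaled.shape[1])
pca.fit(X_scaled)

loadings = np.abs(pca.components_[0])            # |loading| on PC1 for each parameter
ranking = sorted(zip(feature_cols, loadings), key=lambda t: t[1], reverse=True)
for name, value in ranking:
    print(f"{name}: {value:.3f}")                # descending bar-chart data (Fig. 3 analogue)
```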

Machine learning models development

After the data engineering and analysis process, the obtained data were divided into three separate sets: a training set containing 70% of the data, a validation set containing 15%, and a testing set containing the remaining 15%. To minimize bias in the data, the validation set (15%) was used for hyperparameter optimization with a tenfold cross-validation technique. Individual learning algorithms and ensemble learning algorithms are the two types of machine learning algorithms used in this study. Five individual learning algorithms, namely Artificial Neural Networks, Multiple Linear Regression, Polynomial Regression, Support Vector Machine, and Decision Tree, were developed for the study. Additionally, the study used six ensemble learning (EL) algorithms, including Gradient Boosting, Extreme Gradient Boosting, Light Gradient Boosting Machine, Extremely Randomized Trees, and Random Forest. Consequently, a total of 11 algorithms were created and used for the analysis of Lost Circulation Severity (LCS).
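The split and cross-validation procedure described above can be sketched as follows; the random seed, stratification, and the use of a default random forest inside the cross-validation loop are illustrative assumptions.

```python
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.ensemble import RandomForestClassifier

# 70 / 15 / 15 split (two-step split; the exact random seed is an assumption)
X_train, X_tmp, y_train, y_tmp = train_test_split(
    X_scaled, y, test_size=0.30, random_state=42, stratify=y)
X_val, X_test, y_val, y_test = train_test_split(
    X_tmp, y_tmp, test_size=0.50, random_state=42, stratify=y_tmp)

# Tenfold cross-validation on the validation data, as described for
# hyperparameter tuning (illustrated here with a default random forest)
rf = RandomForestClassifier(random_state=42)
scores = cross_val_score(rf, X_val, y_val, cv=10, scoring="accuracy")
print(scores.mean())
```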

Evaluation metrics

Evaluation metrics for both hyperparameters and model performance were computed based on the validation and testing datasets, as described in Bishop and Nasrabadi (2006). Accuracy measures the proportion of correctly classified observations out of the total observations: (Number of Correct Predictions) / (Total Number of Predictions), as in Eq. (1):

$${\text{Accuracy}} = \frac{{{\text{TP}} + {\text{TN}}}}{{{\text{TP}} + {\text{TN}} + {\text{FP}} + {\text{FN}}}}$$
(1)

where TP, TN, FP, and FN denote true positives, true negatives, false positives, and false negatives, respectively. Precision measures the ratio of correctly predicted positive instances to all predicted positive instances: (True Positives) / (True Positives + False Positives), as in Eq. (2).

$${\text{Precision }} = { }\frac{{{\text{TP}}}}{{{\text{TP}} + {\text{FP}}}}$$
(2)

Recall (sensitivity or true positive rate) measures the ratio of correctly predicted positive observations to the actual positive observations: (True Positives) / (True Positives + False Negatives), as in Eq. (3). In the ROC curve, the true positive rate (TPR) is plotted on the y-axis and the false positive rate (FPR) on the x-axis (Sokolova et al. 2006).

$${\text{Recall}} = {\text{TPR}} = \frac{{{\text{TP}}}}{{{\text{TP}} + {\text{FN}}}}$$
(3)

The F1 score is the harmonic mean of precision and recall, providing a balance between the two: 2 * (Precision * Recall) / (Precision + Recall), as in Eq. (4).

$${\text{F}}1{\text{ score}} = \frac{{{\text{TP}}}}{{{\text{TP}} + 0.5{*}\left( {{\text{FP}} + {\text{FN}}} \right)}}$$
(4)

The F2 score is a variation of the F1 score that places more weight on recall. It is useful when recall is more important than precision: (1 + 2^2) * (Precision * Recall) / (2^2 * Precision + Recall), as in Eq. (5).

$${\text{F}}2{\text{ score}} = \left( {1 + 2^{2} } \right){*}\frac{precision*recall}{{(2^{2} *precision) + recall}}$$
(5)

The F-beta score is a generalized form of the F1 score that allows different weights to be assigned to precision and recall using the parameter β: (1 + β^2) * (Precision * Recall) / (β^2 * Precision + Recall), as in Eq. (6).

$${\text{F}} - {\text{Beta Score}} = \left( {1 + \beta^{2} } \right){*}\frac{precision*recall}{{(\beta^{2} *precision) + recall}}$$
(6)

The receiver operating characteristic (ROC) curve plots the true positive rate (recall) against the false positive rate at different thresholds, and the area under this curve (ROC AUC) summarizes the classifier's performance. For multiclass classification, these evaluation metrics are calculated for each class individually and then aggregated using methods such as micro-averaging, macro-averaging, or weighted averaging, depending on the specific use case and evaluation requirements (Elmousalami and Elaskary 2020).
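The metrics in Eqs. (1) to (6) and the ROC AUC can be computed for the multiclass case as sketched below; weighted averaging is shown as one option, since the paper does not state which averaging scheme it used. The sketch assumes the split and the fitted classifier from the earlier snippets.

```python
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, fbeta_score, roc_auc_score)

# Multiclass metrics aggregated with weighted averaging (macro/micro are
# alternative choices)
rf.fit(X_train, y_train)
y_pred = rf.predict(X_test)
y_prob = rf.predict_proba(X_test)

print("Accuracy :", accuracy_score(y_test, y_pred))
print("Precision:", precision_score(y_test, y_pred, average="weighted"))
print("Recall   :", recall_score(y_test, y_pred, average="weighted"))
print("F1 score :", f1_score(y_test, y_pred, average="weighted"))
print("F2 score :", fbeta_score(y_test, y_pred, beta=2, average="weighted"))
print("ROC AUC  :", roc_auc_score(y_test, y_prob, multi_class="ovr",
                                  average="weighted"))
```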

Bayesian hyperparameter optimization

According to Hammam et al. (2020), the goal of hyperparameter optimization is to maximize predictive accuracy by determining the optimal hyperparameters for every learning algorithm. According to Wu et al. (2019), common techniques for hyperparameter optimization include manual search, random search, grid search, Bayesian optimization, and evolutionary optimization. According to Feurer and Hutter (2019), manual search, random search, and grid search require extensive testing to uncover optimal hyperparameters, whereas Bayesian and evolutionary methods automate the optimization process with little to no human interaction. The latter techniques are also useful for addressing the high dimensionality of the hyperparameter search space.

In this work, globally optimal model settings are established using Bayesian algorithms before training, with the aim of maximizing the accuracy of each method (expressed by Eq. 1), i.e., minimizing its classification error. The maximum number of iterations has been set at 10,000, with a predetermined upper limit, as shown in Fig. 4. Following the work of Feurer et al. (2015), the basic idea is to use Bayesian learning theory and Gaussian stochastic processes to generate a targeted Gaussian distribution inside the hyperparameter space of the applied machine learning method. After initializing the model's hyperparameters, the algorithm iteratively performs the actions listed in Algorithm 1.

Algorithm 1
figure a

Bayesian optimization

Fig. 4
figure 4

Incorporating Bayesian optimization into ML algorithms (Feurer et al. 2015)
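A minimal sketch of the Bayesian search loop described in Algorithm 1 is shown below using scikit-optimize's BayesSearchCV; the search ranges and iteration budget are illustrative assumptions, not the paper's actual settings.

```python
from skopt import BayesSearchCV
from skopt.space import Integer, Real
from sklearn.ensemble import RandomForestClassifier

# Illustrative hyperparameter space for the random forest (bounds assumed)
search_spaces = {
    "n_estimators":      Integer(10, 200),
    "max_depth":         Integer(2, 20),
    "max_features":      Real(0.1, 1.0),
    "min_samples_leaf":  Integer(1, 10),
    "min_samples_split": Integer(2, 10),
}

opt = BayesSearchCV(
    RandomForestClassifier(random_state=42),
    search_spaces,
    n_iter=50,              # number of Bayesian iterations (illustrative)
    cv=10,                  # tenfold cross-validation on the validation set
    scoring="accuracy",     # maximizing accuracy, i.e., minimizing error
    random_state=42,
)
opt.fit(X_val, y_val)
print(opt.best_params_, opt.best_score_)
```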

The effectiveness of the models created depends on critical hyperparameters, including the learning rate, number of iterations, depth, the count of estimators, and the L2 regularization term. It’s crucial to predefine these hyperparameters because determining them during the training process is impractical. The chosen hyperparameters have a significant impact on various aspects, such as the model’s complexity, training duration, susceptibility to overfitting, and the speed of convergence, as discussed in Snoek et al. (2012).


LCS classification model results and discussion

The eleven classifiers shown in Table 3 were validated on the testing dataset using assessment metrics including accuracy, precision, recall, F1 score, F2 score, F-beta, and ROC AUC. As indicated in Table 3, the classifiers are arranged in descending order according to the evaluation metrics. According to this study, random forest (RF) provides the best accuracy for LCS. RF produced an overall correct classification rate of 99%, meaning that the model identified the correct LCS class 99% of the time. Decision Tree and XGB ranked second and third, respectively.

Table 3 Performance of Bayesian optimized ML algorithms for LCS

Choosing the appropriate machine learning algorithm should consider more than just predictive accuracy. Factors such as computational resources, which encompass memory usage and processing time, play a vital role in the decision-making process. Table 4 shows that RF processed the data in just 1.79 s using 164.11 megabytes (MB) of memory. Nonetheless, it is crucial to recognize that computational expenses, in terms of both time and memory usage, can surge notably when working with extensive datasets, including those featuring numerous attributes or a substantial volume of data.

Table 4 Optimal hyperparameters of each ML algorithm

The trade-off between the true positive rate (TPR) and the false positive rate (FPR) across different threshold values is illustrated by the receiver operating characteristic (ROC) curve, as shown in Fig. 5. The classifier's TPR is the percentage of correctly identified positive cases, while the FPR is the percentage of negative cases incorrectly categorized as positive. An optimal classifier is located in the upper-left corner of the ROC plot, with a TPR of 1.00 and an FPR of 0.00; the more closely the ROC curve approaches this corner, the better the classifier performs. The area under the curve (AUC) provides a single-number summary of the classifier's performance across all practical threshold values, and an AUC of 1.00 denotes perfect performance.

Fig. 5
figure 5

Average ROC curve and the confusion matrix for RF for different LCS classes

Figure 5 shows that all AUC values were 1.00. This means that the developed RF classifier can separate all five classes from each other almost perfectly, with very few false positives or false negatives. This is a very promising result, and it suggests that the random forest classifier is well suited for this LCS classification task. However, it is important to note that the ROC curve is only one metric for evaluating the performance of a classifier; other metrics, such as accuracy and precision, may also be important to consider, depending on the specific application.

The confusion matrix shown in Fig. 5 covers the five classes in the classification test. The matrix displays the number of data points that the random forest classifier classifies correctly and incorrectly: the diagonal elements show how many data points were correctly classified, while the off-diagonal elements represent the number of misclassified data points. For instance, the top-left element of the confusion matrix is 12,317.

This means that 12,317 data points with the true label "0" were also predicted to have the label "0". The other elements of the confusion matrix can be interpreted in a similar way. For example, the element in the first row and second column of the confusion matrix is 17, meaning that 17 data points with the true label "1" were incorrectly predicted to have the label "0". Overall, the confusion matrix shows that the random forest classifier correctly classified the large majority of the data points, with only a small number of misclassifications.

Ensemble machine learning techniques, as proposed by Breiman (1998) and Schapire et al. (1998), offer an effective approach for managing high-dimensional data, overcoming challenges related to limited sample sizes, and dealing with intricate data structures. Nonetheless, it’s crucial to recognize that the adoption of ensemble methods can lead to increased model complexity, as highlighted by Kuncheva (2014). In cases involving noisy data, the random forests algorithm has demonstrated superior performance when compared to the decision tree algorithm, as emphasized by Breiman (1996) and Dietterich (2000). However, it’s essential to acknowledge that random forests may lack the capacity to provide insights into the importance of features or the internal mechanisms behind their results.

Table 4 presents the optimal hyperparameters of each ML algorithm obtained with the Bayesian algorithm. The RF's hyperparameters were as follows: max_depth controls the maximum depth of each decision tree in the random forest; here, each tree can have a maximum depth of 10 levels. max_features determines the maximum number of features that the algorithm considers when looking for the best split at each node; a value of 0.780 indicates that each node is split by taking into account roughly 78% of the available attributes. min_samples_leaf specifies the minimum number of samples that must be in a leaf node; as it is set to 5 here, a leaf node must contain at least 5 samples. min_samples_split determines the minimum number of samples required to split an internal decision tree node; with a value of 5, a node must contain at least 5 samples to qualify for splitting. Finally, n_estimators indicates the number of decision trees in the random forest; in this instance the forest contains 55 trees that work together to forecast the outcome.
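The reported hyperparameters translate directly into a scikit-learn configuration, as in the sketch below; the random seed is an assumption.

```python
from sklearn.ensemble import RandomForestClassifier

# Random forest configured with the Bayesian-optimized hyperparameters
# reported in the text (Table 4): 55 trees, depth 10, ~78% of features
# per split, and leaf/split minimums of 5 samples
rf_opt = RandomForestClassifier(
    n_estimators=55,
    max_depth=10,
    max_features=0.780,
    min_samples_leaf=5,
    min_samples_split=5,
    random_state=42,          # seed is an assumption, not reported
)
rf_opt.fit(X_train, y_train)
print(rf_opt.score(X_test, y_test))
```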

As shown in Fig. 6, learning curves are used to assess the effectiveness of an ML model and to determine the quantity of training data required to attain a desired performance level. Additionally, learning curves aid in detecting overfitting. Various methods exist for illustrating learning curves; the prevalent approach involves plotting MSE values for both the training and cross-validation datasets against the number of training instances. The red line in the graph represents the training score, which is the MSE on the training data. The cross-validation score, the MSE on a held-out set of data that the model has never seen before, is represented by the green line. According to the learning curve, the training score drops as the volume of training data rises, because the model can identify patterns in the data more precisely as it sees more data. The cross-validation score likewise drops as the quantity of training data rises, though more slowly than the training score. This suggests that the model is approaching the point at which additional training data will not significantly alter its performance.

Fig. 6
figure 6

Learning curve for the RF algorithm
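A hedged sketch of the learning-curve computation follows, plotting training and cross-validation MSE against the number of training instances as described above; scoring MSE on class labels mirrors the paper's learning-curve description rather than a standard classification score, and rf_opt is the optimized forest from the previous sketch.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import learning_curve

# Learning curve for the optimized RF (Fig. 6 analogue)
sizes, train_scores, cv_scores = learning_curve(
    rf_opt, X_train, y_train, cv=10, scoring="neg_mean_squared_error",
    train_sizes=np.linspace(0.1, 1.0, 10))

plt.plot(sizes, -train_scores.mean(axis=1), "r-", label="Training MSE")
plt.plot(sizes, -cv_scores.mean(axis=1), "g-", label="Cross-validation MSE")
plt.xlabel("Number of training instances")
plt.ylabel("MSE")
plt.legend()
plt.show()
```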

Explainable artificial intelligence with SHAP

The Shapley value, a game-theoretic method, is used to calculate SHAP values. These values can be used to explain any machine learning model, regardless of its category, because they are model-agnostic. According to Mangalathu et al. (2020), SHAP values are used to clarify the relative importance of each feature as well as the ways in which various features interact. Figure 7 shows the average impact of each parameter on the model output magnitude by class, providing a valuable tool for understanding how the model behaves. It was found that P2, P1, P5, and P11 had the most significant influence on Lost Circulation Severity (LCS), whereas P3, P9, P4, and P16 had the least influence.

Fig. 7
figure 7

SHAP model for Bayesian optimized ET model

SHAP analysis can also be used to identify potential biases in the model and to take steps to improve the model's performance. The graph shows that the model has the greatest impact on the output magnitude for class 0, followed by class 2, class 1, class 3, and class 4. This means that the model is most likely to produce a large output value for classes 0 and 2 and least likely to produce a large output value for class 4. However, random forests can still make accurate predictions for the minority class, as the aggregation of multiple trees can capture some of the minority class instances. Moreover, a cost-sensitive learning technique has been used to specify the costs associated with misclassifying different classes.
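The per-class mean |SHAP| view in Fig. 7 can be reproduced with the shap library roughly as follows, assuming the optimized random forest rf_opt and the feature list from the earlier sketches.

```python
import shap

# TreeExplainer handles tree ensembles such as the optimized random forest;
# the bar summary plot gives the "mean |SHAP| per class" view in Fig. 7
explainer = shap.TreeExplainer(rf_opt)
shap_values = explainer.shap_values(X_test)      # one array per LCS class (older shap versions)

shap.summary_plot(shap_values, X_test, feature_names=feature_cols,
                  plot_type="bar")
```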

LCS mitigation optimization model

Evolutionary computing (EC) is rooted in the concept of "survival of the fittest," introduced by Charles Darwin in 1859. Genetic algorithms (GAs), a subset of EC first proposed by John Holland in 1975, are commonly employed for optimization and search tasks (Siddique and Adeli 2013). A chromosome is represented in this context as a vector (C) with 'n' genes, denoted 'ci'. Elmousalami (2020b) states that every chromosome (C) is a point in an n-dimensional search space. To represent the 17 input parameters in the current case study, each chromosome has seventeen genes, which correspond to the well drilling parameters (P1, P2, P3, P4, P5, P6,…, P17) listed in Table 2. Each gene is linked to one of the membership functions (MFi), where 'i' spans the boundary conditions for every variable (P1 through P17).

To initiate the process, an initial population of 10 chromosomes is established, and the genetic algorithm is run for a total of 10,000 generations. The crossover probability is set at 0.7, and the mutation probability is configured to be 0.03. Subsequently, a subset of the initial population of chromosomes is selected for evaluation using a fitness function (Fig. 8).

Fig. 8
figure 8

LCS mitigation system

The fitness function (F) serves as a tool for assessing the quality of potential solutions. In the genetic algorithm (GA) process, crossover and mutation operations are employed to generate new generations of offspring. The primary goal here is to minimize the probability of a well falling into a high lost circulation severity class. Consequently, the GA's objective function revolves around minimizing the LCS class by optimizing the seventeen input parameters to achieve acceptable lost circulation characteristics. The fitness function can be represented by Eq. (7), where the primary aim is to minimize the fitness function, denoted as:

$${\text{F}} = {\text{ Minimization }}\left( {{\hat{\text{y}}}_{{\text{i}}} } \right)$$
(7)

Within this formula, the projected classification based on the Random Forest (RF) model is represented by \({\widehat{\text{y}}}_{\text{i}}\), and 'F' stands for the fitness function in Eq. (7). Each of the seventeen input parameters has lower and upper bounds that must be met in order to keep the input variables within acceptable ranges. In addition, functional restrictions have been added in compliance with design standards, such as ensuring that the total percentage of solids and water does not exceed 100%. It is crucial to remember, nevertheless, that choosing a sensible combination of these seventeen factors for the drilling procedure frequently requires a high level of judgment.
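A minimal sketch of this GA loop is given below, using the stated population size, generation count, and crossover/mutation probabilities; the bounds, the uniform-crossover and random-reset operators, and the handling of fixed versus adjustable parameters are simplifying assumptions, since the paper's exact encoding is not published. It reuses the trained rf_opt classifier as the fitness evaluator (Eq. 7).

```python
import numpy as np

# Minimal GA sketch: minimize the RF-predicted LCS class over the 17 parameters
rng = np.random.default_rng(0)
n_genes, pop_size, generations = 17, 10, 10_000
p_cross, p_mut = 0.7, 0.03
lower = np.zeros(n_genes)          # per-parameter lower bounds (scaled space, assumed)
upper = np.ones(n_genes)           # per-parameter upper bounds (scaled space, assumed)

def fitness(c):
    # F = predicted severity class from the trained RF (Eq. 7)
    return rf_opt.predict(c.reshape(1, -1))[0]

pop = rng.uniform(lower, upper, size=(pop_size, n_genes))
for _ in range(generations):
    scores = np.array([fitness(c) for c in pop])
    parents = pop[np.argsort(scores)[: pop_size // 2]]           # keep the fittest half
    children = []
    while len(children) < pop_size - len(parents):
        a, b = parents[rng.integers(len(parents), size=2)]
        child = np.where(rng.random(n_genes) < p_cross, a, b)    # uniform crossover
        mutate = rng.random(n_genes) < p_mut
        child[mutate] = rng.uniform(lower[mutate], upper[mutate])  # random-reset mutation
        children.append(np.clip(child, lower, upper))
    pop = np.vstack([parents, children])

best = pop[np.argmin([fitness(c) for c in pop])]
print("Mitigated parameter set:", best, "predicted class:", fitness(best))
```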

However, the key limitation of the mitigation module is that the model cannot mitigate lost circulation in many cases, especially severe or natural ones. Lost circulation may be induced or natural; a mitigation plan can be provided if the losses were induced, for example by a high ROP or high pump pressure. If induced losses escalate to severe levels, however, reversing them becomes exceedingly difficult. Therefore, mitigation efforts primarily target partial losses and aim to transition them to seepage losses or, ideally, to eliminate losses altogether.

Practical applications of ALCSCMS

ALCSCMS serves as a valuable tool in the oilfield, particularly before drilling operations commence. Its application lies in several key areas:

  1. 1.

    Pre-drilling risk assessment By inputting geological data, wellbore characteristics, and drilling parameters, the model evaluates the likelihood of circulation loss occurrences in different formations and scenarios. This information allows operators to proactively identify high-risk areas and implement preemptive measures to mitigate potential circulation loss issues.

  2. 2.

    Optimized well planning Utilizing the predictive model during the well planning phase allows operators to optimize well design and trajectory to minimize the risk of circulation loss. By simulating various drilling scenarios and assessing their associated risks, operators can make informed decisions regarding casing design, drilling fluid selection, and wellbore strengthening strategies.

  3. 3.

    Mitigation strategy development The model assists in the development of targeted mitigation strategies customized to specific drilling environments and challenges. By analyzing the factors contributing to circulation loss risks, such as formation properties, pore pressure, and drilling parameters, operators can develop effective mitigation plans. These plans can include the pre-treatment of formations, the use of specialized drilling fluids, and the implementation of contingency measures to address potential circulation loss events.

  4. 4.

    Cost reduction and efficiency improvement By accurately predicting circulation loss risks and implementing proactive mitigation measures, operators can reduce drilling downtime, minimize costly remediation efforts, and enhance overall drilling efficiency. The application of the predictive model leads to optimized drilling operations, resulting in significant cost savings and improved project outcomes.

Conclusions

Lost circulation can lead to considerable financial setbacks due to expenses related to drilling fluids, equipment, and the operational downtime required for remediation. This can have a significant adverse impact on the overall financial planning of drilling projects. In addition, extended drilling time, wellbore instability, environmental impact, and safety hazards are direct consequences of lost circulation. To tackle this problem, the study introduces the Automated Lost Circulation Severity Classification and Mitigation System (ALCSCMS) to enable decision makers to reliably predict lost circulation severity (LCS) using a few key drilling parameters before commencing drilling operations. The paper concludes the following:

  • Effectiveness of ALCSCMS The study demonstrates the effectiveness of the Automated Lost Circulation Severity Classification and Mitigation System (ALCSCMS) in accurately predicting lost circulation severity using key drilling parameters.

  • Optimal model performance The Random Forests (RF) model, optimized through Bayesian optimization, achieves a remarkable 100% classification accuracy in predicting lost circulation severity, making it the preferred choice for classification tasks.

  • Mitigation strategy integration Incorporating a mitigation optimization model based on genetic algorithms allows for the transformation of highly severe lost circulation situations into more manageable categories, enhancing overall drilling efficiency and cost-effectiveness.

  • Insights from SHAP analysis The use of SHapley Additive exPlanations (SHAP) provides valuable insights into the key parameters influencing lost circulation severity predictions, aiding decision-making processes in drilling operations.

Additionally, the research incorporated a mitigation optimization model based on a genetic algorithm. This model’s role is to transform highly severe lost circulation situations into more manageable categories. Furthermore, the SHapley Additive exPlanations (SHAP) technique was used to offer insights and explanations for the key parameters that served as inputs for predicting lost circulation severity.

While the research on the Automated Lost Circulation Severity Classification and Mitigation System (ALCSCMS) is promising, there are several limitations such as real-time data gathering, cost–benefit analysis of the mitigation procedures, and environmental considerations. Therefore, the future research directions for the study on an integrated system for automated lost circulation severity classification and mitigation (ALCSCMS) could encompass several areas:

  1. 1.

    Real-world implementation and validation To assess the practical applicability of ALCSCMS, further research may involve field trials and validation on actual drilling operations for several sites and fields. This will confirm the system’s effectiveness under real-world conditions.

  2. 2.

    Data enhancement Expanding the dataset used in the research or continuously updating it with more recent information can improve the accuracy of the models and their adaptability to evolving drilling scenarios. Integration with the internet of things (IoT) and sensor data: incorporating real-time data from sensors and IoT devices on drilling rigs can enable the system to adapt and respond dynamically to changing drilling conditions. This can help in real-time decision-making and mitigation strategies.

  3. 3.

    Advanced machine learning techniques Future research can explore the application of more advanced machine learning techniques, such as deep learning, reinforcement learning, or hybrid models, to further enhance the predictive capabilities of ALCSCMS.