Introduction

Drilling for oil and gas is one of the riskiest activities on Earth. The stuck drill pipe issue is one of the most critical drilling problems, costing more than $250 million per year. Complications related to stuck pipe can account for nearly half of total well cost, making stuck pipe one of the most expensive problems that can occur during a drilling operation. This problem may lead to the loss of the drill string or the complete loss of the well (Shadizadeh et al. 2010; Siruvuri et al. 2006). Pipe sticking occurs through several mechanisms, including improper hole cleaning, wellbore instability, poor well trajectory, improper drilling fluid, bottom hole assembly design, and differential sticking forces. The risk of mechanically or differentially stuck pipe can be minimized by adjusting the drilling variables. Higher pore pressure raises the probability of stuck pipe, and lower mud densities can increase the risk of wellbore instability and mechanical sticking.

Pipe stuck risk can be effectively managed and mitigated based on a reliable stuck pipe model. Model types can be divided into three main categories: empirical, physical, and mathematical (Noshi and Schubert 2018). However, empirical and physical models cannot achieve high predictive accuracy and generalization. On the other hand, mathematical models require a dataset to be developed statistically. In the oil and gas industry, large volumes of hourly real-time production data such as pressure, flow rate, and temperature profiles can be measured using sensors and internet of things (IoT) devices at the surface or downhole. Such observed data are known as big data, characterized by volume, velocity, and variety (Mishra and Datta-Gupta 2017). The main motivations to automate stuck classification and mitigation are as follows:

  1. To provide a proactive prediction tool that can predict stuck occurrence early, based on the key drilling stuck predictors.

  2. To provide a reliable tool that can avoid stuck cases and optimize the drilling parameters.

  3. To present a comprehensive comparison among ML algorithms for stuck pipe prediction.

  4. To identify the importance of each predictor of drilling pipe sticking using sensitivity analysis.

  5. To present a novel dataset for drilling pipe stuck classification and mitigation in the Gulf of Suez (GOS).

Love (1983) was the first to use historical data to develop a predictive model for the success rate of freeing stuck drill pipe, using a trial-and-error method to select the key predictors. ML can be applied to identify stuck pipe incidents where the predictors have been collected from historical data, stuck pipe reports, and published literature. The collected predictors were ranked to identify the key predictors. After validation and testing, the model showed promising results and enhanced the description and monitoring of drilling data streams (Alshaikh et al. 2019). Using real-time drilling operations, a framework for the early and accurate detection of stuck pipe has been developed based on random forests. The model has an automated data extraction module and a reliable prediction classifier that helps drilling engineers and the rig crew predict stuck pipe risk (Magana-Mora et al. 2019). Natural language processing and ML can also be applied to the analysis of drilling data, with the objectives of improving reservoir management, determining non-productive time, and extracting crucial information. The model showed successful performance in fields in North and South America and fields located in the Middle East (Castiñeira et al. 2018).

ANNs have been used for stuck drill pipe prediction in the Maroon field, where the model was capable of producing reliable results (MoradiNezhad et al. 2012). Chamkalani et al. (2013) proposed a new methodology based on SVM for stuck pipe prediction. ANNs and SVM have been implemented for stuck pipe prediction, where both models achieved an accuracy of 83% based on binary classification (Albaiyat 2012). Based on 40 oil wells, a multivariate statistical analysis consisting of regression analysis and discriminant analysis has been conducted for stuck pipe prediction, with a success rate of up to 86% (Shoraka et al. 2011). A convolutional neural network (CNN) approach has been used to predict stuck occurrence in the Gulf of Mexico; the model was developed with the back-propagation learning rule and sigmoid-type nonlinear activation functions and produced reliable results for stuck prediction based on the collected data (Siruvuri et al. 2006). This literature review did not reveal any comparison of different ML techniques designed to prevent sticking of the drill pipe.

Based on the literature survey, there is no comprehensive comparison of different AI algorithms for drilling stuck pipe prediction. The key objective of this research is to evaluate the classification accuracy of different AI models and to identify the most accurate classification model. Moreover, this research aims to present a comprehensive performance comparison of AI models to guide researchers and practitioners in drilling stuck classification modeling. This research consists of six steps, as illustrated in Fig. 1:

  1. The past literature has been reviewed to identify past practices in drilling stuck modeling.

  2. Real data on drilling pipe stuck cases have been gathered. The data were collected quantitatively from the site records of each drilled well.

  3. The third step is model development, where a total of 12 AI-based predictive models have been built.

  4. The fourth step is model validation to select the most accurate model.

  5. The fifth step is to analyze the results and conduct a sensitivity analysis to identify the contribution of each parameter to pipe sticking.

  6. Finally, an optimization system has been incorporated into the prediction model to optimize the drilling parameters and mitigate stuck and partially stuck cases.

Fig. 1 Research methodology

Application to drilled wells in the Gulf of Suez

The process of data acquisition is the most difficult and critical part of any statistical learning task (Elmousalami et al. 2018a; Elmousalami 2020). As shown in Fig. 2, a total of 103 wells were drilled offshore and onshore during the period from 2010 to 2015 by the General Petroleum Company (GPC) and the petroleum sector in Egypt. These data were collected using sensors and measuring devices on the drilling rig, where the sensors are validated through quality control and safety procedures before and during the drilling process. Moreover, the data were handed to drilling experts and engineers to check their quality and reliability, and all outliers and missing data were removed. The data contained 26 stuck and 77 non-stuck and partially stuck cases. The type of stuck pipe is mechanical pipe sticking due to poor hole cleaning, wellbore collapse, and key-seating. The parameter set includes a total of seven drilling parameters recorded on a daily basis, as illustrated in Table 1.

Fig. 2 Oil fields map in the Gulf of Suez

Table 1 The drilling pipe stuck parameters

The drilling pipe stuck issue is a dynamic problem that can arise at different time periods of a drilling project. Thus, a binary label cannot effectively represent the whole problem. Therefore, the output is divided into three general groups: stuck, partially stuck, and non-stuck. The output probability ranges from 0 to 1, where the range from 0 to 0.4 represents a non-stuck case, the range from 0.4 to 0.7 represents a partially stuck case, and the range from 0.7 to 1 represents a stuck case.
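A minimal sketch of this thresholding step is given below. The helper name and the example probabilities are illustrative assumptions; only the 0.4 and 0.7 cut-offs come from the description above.

```python
# Hypothetical helper mapping the model's output probability to the three
# stuck classes using the thresholds stated in the text (0.4 and 0.7).
def stuck_class(probability: float) -> str:
    """Map a stuck probability in [0, 1] to non-stuck / partially stuck / stuck."""
    if probability < 0.4:
        return "non-stuck"
    elif probability < 0.7:
        return "partially stuck"
    return "stuck"

print(stuck_class(0.25))  # non-stuck
print(stuck_class(0.55))  # partially stuck
print(stuck_class(0.85))  # stuck
```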

Correlation analysis has been conducted to identify the key performance indicators (KPIs), as shown in Fig. 3. Scatterplots of all the independent variables against each other were drawn to check for collinearity among the variables. Of any two variables that showed collinearity, the one with the weaker correlation with the outcome was dropped; this deletion was also based on discussions with the experts, common wisdom, and knowledge about the subject and statistics. The characteristics of the formation along the drilling trajectory have been excluded from the collected features because the formations in the collected dataset have the same characteristics across the Gulf of Suez fields. Moreover, the proposed classification model aims to classify the stuck case based on the least number of input parameters.
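The sketch below illustrates the collinearity-screening rule described above (drop, from each highly correlated pair, the predictor less correlated with the outcome). The function name, column layout, and the 0.8 threshold are assumptions for illustration, not values from the study.

```python
import pandas as pd

def drop_collinear(df: pd.DataFrame, target: str, threshold: float = 0.8) -> pd.DataFrame:
    """Drop one predictor of each highly correlated pair (hypothetical helper)."""
    predictors = [c for c in df.columns if c != target]
    corr = df[predictors].corr().abs()                    # predictor-predictor correlations
    target_corr = df[predictors].corrwith(df[target]).abs()  # correlation with the outcome
    to_drop = set()
    for i, a in enumerate(predictors):
        for b in predictors[i + 1:]:
            if corr.loc[a, b] > threshold:
                # keep the predictor more correlated with the stuck outcome
                to_drop.add(a if target_corr[a] < target_corr[b] else b)
    return df.drop(columns=sorted(to_drop))
```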

Fig. 3 The predictors' correlation heat map

Machine learning methods

ML algorithms are scalable algorithms used for pattern recognition and for obtaining useful insight from collected data (LeCun et al. 2015; Bishop 2006). AI and ML are general-purpose techniques that can be applied to several applications (Elmousalami 2020; Witten et al. 2016). The ML models in this study can be applied across the broad area of the oil and gas industry, where the modeling methodology is valid for different project types. ML models can be of single or ensemble type. Single ML models include SVM, DT, and ANN, whereas ensemble ML models include bagging, boosting, XGBoost, and random forest. Before training the ML algorithms, the input values have been normalized using min–max feature scaling (Dodge and Commenges 2006). The normalization process improves the computation for each classifier.
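A minimal sketch of the min–max scaling step, assuming scikit-learn's MinMaxScaler; the small array of well measurements is a made-up stand-in for the seven drilling parameters.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Illustrative rows of drilling measurements (values are not from the dataset).
X = np.array([[1200.0, 55.0, 8.5],
              [3400.0, 120.0, 10.2],
              [2100.0, 90.0, 9.1]])
scaler = MinMaxScaler()                 # rescales each feature to the [0, 1] range
X_scaled = scaler.fit_transform(X)
print(X_scaled.min(axis=0), X_scaled.max(axis=0))  # per-feature 0s and 1s
```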

Single AI models

Support vector machines (SVM)

SVMs are supervised learning algorithms that can be used for both classification and regression applications (Elmousalami 2019a, 2020). SVM finds the hyperplane that maximizes the margin between the two classes, as shown in Fig. 4. The separating hyperplane can be defined based on the two class boundaries using the following equation (Vapnik 1979):

$$ {\text{Linear}}\,{\text{SVM}} = \left\{ {\begin{array}{*{20}l} {W \cdot X_{i} + b \ge 1 ,} \hfill & {{\text{if}}\;y_{i} \ge 0} \hfill \\ {W \cdot X_{i} + b < - 1 ,} \hfill & {{\text{if}}\;y_{i} < 0 } \hfill \\ \end{array} } \right. $$
(1)

For i = 1, 2, 3, …, m, a positive slack variable (\( \xi_{i} \)) is added to handle cases that are not perfectly separable, as displayed in Eq. (2):

$$ y_{i} \left( {W \cdot X_{i} + b} \right) \ge 1 - \xi_{i} ,\quad i = 1,2,3, \ldots ,m $$
(2)
Fig. 4 Linear support vector machine

Accordingly, the objective function will be as shown in Eq. (3):

$$ {\text{Min}}\;\frac{1}{2} w \cdot w^{T} + C \mathop \sum \limits_{i = 1}^{m} \xi_{i} $$
(3)
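A hedged sketch of a soft-margin SVM classifier using scikit-learn, where the parameter C plays the role of the slack penalty in Eq. (3). The synthetic data stand in for the (non-public) GOS dataset.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic stand-in: 103 wells, 7 scaled drilling parameters (illustrative).
X, y = make_classification(n_samples=103, n_features=7, random_state=0)
X_train, X_valid, y_train, y_valid = train_test_split(X, y, test_size=0.2, random_state=0)

clf = SVC(kernel="linear", C=1.0)   # C penalizes the slack variables ξ_i
clf.fit(X_train, y_train)
print(clf.score(X_valid, y_valid))  # validation accuracy
```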

Decision trees (DTs)

A decision tree (DT) is a statistical learning algorithm that hierarchically divides the collected data into logical rules (Elmousalami 2019b; Breiman et al. 1984), as shown in Fig. 5. A splitting algorithm is applied repetitively to formulate each node of the tree. Classification and regression trees (CART) and the C4.5/C5.0 algorithms are the most common tree models used by researchers and practitioners. This model is applied to both classification and continuous (regression) prediction applications (Curram and Mingers 1994). The DT algorithm can interpret data and feature importance based on the logical statements generated at each tree node. However, DT is not a robust and stable algorithm against noisy and missing data (Perner et al. 2001).
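The sketch below shows a CART-style tree and the logical rules it generates, which is what makes DTs interpretable. The feature names P1–P7 mirror Table 1, but the data and tree depth are illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = make_classification(n_samples=103, n_features=7, random_state=0)
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
# Print the hierarchical if/else rules learned at each node.
print(export_text(tree, feature_names=[f"P{i}" for i in range(1, 8)]))
```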

Fig. 5 Additive function concept

Logistic regression

Logistic regression (logit regression) is a predictive regression analysis which is appropriate for a dichotomous (binary) dependent variable (Hosmer et al. 2013). Logistic regression is used to explain data and to describe the relationship between one dependent binary variable and one or more independent variables. The model assumes that no outliers exist in the data and that there are no high correlations (multicollinearity) among the predictors (Tabachnick and Fidell 2013). Mathematically, logistic regression can be defined as follows:

$$ P = \frac{1}{{1 + e^{{ - \left( {a + bX} \right)}} }} $$
(4)

where P is the classification probability, e is the base of the natural logarithm, and a and b are the parameters of the model. Adding more predictors to the model can result in overfitting, which reduces the model's generalizability and increases its complexity.
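A minimal illustration of Eq. (4); the coefficient values a and b below are made-up placeholders, not fitted parameters from the study.

```python
import numpy as np

def logistic_probability(x: float, a: float = -2.0, b: float = 0.5) -> float:
    """Eq. (4): P = 1 / (1 + e^{-(a + b*x)}) for a single predictor x."""
    return 1.0 / (1.0 + np.exp(-(a + b * x)))

print(logistic_probability(3.0))   # classification probability for x = 3
```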

K-nearest neighbor classifier (KNN)

The KNN algorithm builds a nonparametric classifier (Altman 1992; Weinberger et al. 2006). KNN is an instance-based learner used for classification or regression applications. An object is classified by a majority vote of its neighbors in the training set; if K = 1, the case is simply assigned to the class of its nearest neighbor. Many distance functions can be applied to measure the similarity among instances, such as the Euclidean, Manhattan, and Minkowski distances (Singh et al. 2013).
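A brief sketch of a KNN classifier with scikit-learn; the Minkowski metric with p = 2 reduces to the Euclidean distance and with p = 1 to the Manhattan distance. The data and K = 5 are illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=103, n_features=7, random_state=0)
knn = KNeighborsClassifier(n_neighbors=5, metric="minkowski", p=2)
knn.fit(X, y)
print(knn.predict(X[:3]))   # majority vote among the 5 nearest neighbors
```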

Gaussian Naive Bayes algorithm

The Gaussian Naive Bayes classifier is a classification algorithm which assumes independence among predictors (Patil and Sherekar 2013). Naive Bayes is useful for very large datasets and can outperform even highly sophisticated classification methods. Bayes' theorem computes the posterior probability P(c|x) from P(c), P(x), and P(x|c) as shown in Eq. (5):

$$ P(c|x) = \frac{{P\left( {x|c} \right)P\left( c \right)}}{P\left( x \right)} $$
(5)

where P(c|x) represents the posterior probability of the target class (c, target) given the input predictors (x, attributes); P(c) represents the prior probability of the target class; P(x|c) is the likelihood, i.e., the probability of the predictors given the class; and P(x) is the prior probability of the predictors. The Naive Bayes algorithm works by computing the likelihood and probabilities for each class. The Naive Bayesian formula computes the posterior probability for each class, and the class with the highest posterior probability is the prediction outcome (Kohavi 1996).
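A small sketch of Eq. (5) in practice, assuming scikit-learn's GaussianNB: the class posteriors P(c|x) are computed and the argmax is returned. Data are synthetic placeholders.

```python
from sklearn.datasets import make_classification
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=103, n_features=7, random_state=0)
gnb = GaussianNB().fit(X, y)
print(gnb.predict_proba(X[:2]))  # posterior P(c|x) for each class
print(gnb.predict(X[:2]))        # class with the highest posterior
```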

Artificial neural networks (ANNs)

ANNs are computational systems biologically inspired by the design of natural neural networks. Key abilities of ANNs are generalization, categorization, prediction, and association (LeCun et al. 2015). ANNs have a high ability to dynamically capture the relationships and patterns between the objects and subjects of knowledge based on nonlinear functions (Elmousalami et al. 2018b). A feedforward network such as the multilayer perceptron (MLP) applies an input vector (x), a weight matrix (W), an output vector (Y), and a bias vector (b). It can be formulated as Eq. (6) and Fig. 6.

$$ Y = f\left({W \cdot x + b} \right) $$
(6)

where f(·) is a nonlinear activation function.
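A minimal numpy illustration of Eq. (6), Y = f(W·x + b), with a sigmoid activation; the random weights, bias, and input are placeholders, not trained values.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.random(7)            # seven scaled drilling parameters (illustrative)
W = rng.random((1, 7))       # weight matrix of a single output neuron
b = np.zeros(1)              # bias vector
Y = 1.0 / (1.0 + np.exp(-(W @ x + b)))   # f(.) = sigmoid activation
print(Y)                     # output between 0 and 1
```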

Fig. 6 Multilayer perceptron network (MLP)

Ensemble methods and fusion learning

Ensemble methods and fusion learning are data mining techniques that fuse several ML algorithms such as ANNs, DT, and SVM to boost the overall performance and accuracy (Hansen and Salamon 1990). Each single ML model used in the ensemble method is called a base learner, and the final decision is taken by the ensemble model. K additive functions are used to predict the final output, as given in Eq. (7):

$$ \hat{y}_{i} = \mathop \sum \limits_{k = 1}^{K} f_{k} \left( {X_{i} } \right),\quad f_{k} \in F $$
(7)

where \( \hat{y}_{i} \) represents the predicted dependent variable, each f_k is an independent tree structure with leaf weights w, X_i are the independent variables, and F is the space of regression trees. Ensemble methods include several schemes such as bagging, voting, stacking, and boosting (Elmousalami 2019c, 2020). Ensemble learning models deal effectively with the issues of complex data structures, high-dimensional data, and small sample sizes (Breiman 1996; Dietterich 2000; Kuncheva 2004).

Breiman (1996) proposed the bagging technique shown in Fig. 7a. Bagging applies bootstrap aggregating to train several base learners for variance reduction (Breiman 1996); it draws groups of training data with replacement to train each base learner. Random forest (RF) is a special case of the bagging ensemble learning techniques; RF draws bootstrap subsamples to randomly create a forest of trees, as shown in Fig. 7b (Breiman 2001). Using adaptive resampling, the boosting method enhances the performance of weak base learners (Schapire 1990), as shown in Fig. 7c. The adaptive boosting algorithm (AdaBoost) was proposed by Schapire et al. (1998). AdaBoost serially draws the data for each base learner using adaptive weights for all instances; these adaptive weights guide the algorithm to minimize the prediction error and the misclassified cases (Bauer and Kohavi 1999).
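A hedged sketch of the three ensemble schemes in Fig. 7, each built on a decision tree base learner as in this study; the estimator counts and synthetic data are illustrative, not the tuned settings.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier, RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=103, n_features=7, random_state=0)

bagging = BaggingClassifier(DecisionTreeClassifier(), n_estimators=50, random_state=0)
forest = RandomForestClassifier(n_estimators=50, random_state=0)          # bagging + random feature subsets
adaboost = AdaBoostClassifier(DecisionTreeClassifier(max_depth=1), n_estimators=50, random_state=0)

for name, model in [("bagging", bagging), ("RF", forest), ("AdaBoost", adaboost)]:
    print(name, model.fit(X, y).score(X, y))   # training accuracy of each ensemble
```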

Fig. 7 a Bagging, b RF, and c boosting

Extreme gradient boosting (XGBoost) is a gradient boosting tree algorithm. XGBoost uses parallel computing to learn faster and diminish computational complexity (Chen and Guestrin 2016). As shown in Eq. (8), a regularization term is added to the additive tree model to avoid overfitting:

$$ L\left( \phi \right) = \mathop \sum \limits_{i} l\left( {\hat{y}_{i} ,y_{i} } \right) + \mathop \sum \limits_{k = 1}^{K} \varOmega \left( {f_{k} } \right),\quad {\text{where}}\;\varOmega \left( f \right) = \gamma T + \frac{1}{2}\lambda \left\| w \right\|^{2} $$
(8)

where L represents a differentiable convex cost function (Friedman 2001). Moreover, XGBoost assigns a default direction to its tree branches to handle missing data in the training dataset; therefore, no additional effort is required to clean the training data. Stochastic gradient boosting (SGB) is a boosting–bagging hybrid model (Breiman 1996). SGB iteratively improves the model's performance by injecting randomization into the selected data subsets to enhance fitting accuracy and reduce computational cost (Schapire et al. 1998).
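A sketch of an XGBoost classifier using the xgboost package, where gamma corresponds to the γT penalty and reg_lambda to the λ‖w‖² leaf-weight penalty of Eq. (8). The parameter values and data are illustrative, not the tuned hyperparameters of the study.

```python
import xgboost as xgb
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=103, n_features=7, random_state=0)
model = xgb.XGBClassifier(n_estimators=100, max_depth=3,
                          gamma=1.0, reg_lambda=1.0,   # regularization terms of Eq. (8)
                          n_jobs=-1)                    # parallel tree construction
model.fit(X, y)
print(model.predict(X[:3]))
```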

The extremely randomized trees algorithm (extra trees) is a tree-based ensemble method which can be applied to both supervised classification and regression cases (Vert 2004). The extra trees algorithm essentially randomizes both the cut-point choice and the attribute during tree node splitting. The key advantage of the extra trees algorithm is the tree structure randomization, which allows the algorithm to be tuned toward an optimal parameter selection. Moreover, extra trees have high computational efficiency based on a bias/variance analysis (Vert 2004).

In ML, many parameters are assessed and improved during the learning process. By contrast, a hyperparameter is a variable whose value is set before training. The performance of ML algorithms depends on hyperparameter tuning. The objective of hyperparameter optimization is to maximize the predictive accuracy by finding the optimal hyperparameters for each ML algorithm. Manual search, random search, grid search, Bayesian optimization, and evolutionary optimization are the most common techniques used for ML hyperparameter optimization. However, manual search, random search, and grid search are brute-force techniques which need a very large number of trials to cover all possible combinations and reach the optimal hyperparameters (Bergstra et al. 2011). On the other hand, Bayesian optimization and evolutionary optimization are automatic hyperparameter optimization techniques which select the optimal parameters with less human intervention (Shahriari et al. 2015); moreover, these techniques can mitigate the curse of dimensionality. Therefore, this study used genetic algorithms to select the globally optimal setting for each model before the training stage. Starting with a random population, the iterative process of selecting the fittest individuals and producing the next generation stops once the best-known solution is satisfactory for the user. The objective function is defined as the maximization of the classification accuracy (Acc, Eq. 9) of each classifier. Classification accuracy (Acc) computes the ratio between the correctly classified instances and the total number of samples, as in Eq. (9):

$$ {\text{Acc}} = \frac{{{\text{TP}} + {\text{TN}}}}{{{\text{TP}} + {\text{TN}} + {\text{FP}} + {\text{FN}}}} $$
(9)

where TP is the true positive; FP the false positive; TN the true negative; and FN the false negative. The domain space is defined as the range of all possible hyperparameters for each algorithm, as shown in Table 2. This study applied a decision tree algorithm as the base learner for all ensemble methods; accordingly, the proposed ensemble models and the decision tree are classified as tree-based models which share the same parameters, as shown in Table 2. The maximum number of iterations is set to 10,000.
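The following is a heavily simplified sketch of GA-based hyperparameter tuning for one tree-based model, assuming a small search space over max_depth and n_estimators, accuracy as the fitness, and far fewer generations than the 10,000 used in the study; all values are illustrative assumptions.

```python
import random
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=103, n_features=7, random_state=0)
DEPTHS, ESTIMATORS = range(2, 11), range(10, 201, 10)   # assumed domain space

def fitness(chrom):
    """Cross-validated accuracy (Eq. 9) of one hyperparameter chromosome."""
    depth, n_est = chrom
    model = ExtraTreesClassifier(max_depth=depth, n_estimators=n_est, random_state=0)
    return cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()

random.seed(0)
population = [(random.choice(DEPTHS), random.choice(ESTIMATORS)) for _ in range(10)]
for _ in range(20):                       # generations (kept small; 10,000 in the study)
    population.sort(key=fitness, reverse=True)   # maximize accuracy
    parents = population[:4]                     # selection of the fittest
    children = []
    while len(children) < len(population) - len(parents):
        a, b = random.sample(parents, 2)
        child = [a[0], b[1]]                     # one-point crossover
        if random.random() < 0.03:               # mutation probability
            child[0] = random.choice(DEPTHS)
        children.append(tuple(child))
    population = parents + children
print("best hyperparameters:", max(population, key=fitness))
```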

Table 2 Optimal hyperparameters settings

To compare the machine learning algorithms, identical blind validation cases are used to test the algorithms' performance. The dataset has been divided into a training set (80%) and a validation set (20%), where the validation cases are excluded from the training data to ensure generalization capability. This study applied a tenfold cross-validation (10 CV) approach on the validation data set (20% of the whole data set). K-fold cross-validation improves the reliability of the validation process when only a limited dataset is available.
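A sketch of the 80/20 split and tenfold cross-validation described above, using synthetic data as a stand-in for the 103-well dataset; the stratified split is an added assumption so that each fold contains both classes.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.model_selection import cross_val_score, train_test_split

X, y = make_classification(n_samples=103, n_features=7, flip_y=0.0, random_state=0)
X_train, X_valid, y_train, y_valid = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)   # 80% training, 20% validation

clf = ExtraTreesClassifier(random_state=0)
clf.fit(X_train, y_train)
scores = cross_val_score(clf, X_valid, y_valid, cv=10, scoring="accuracy")  # 10 CV on the held-out 20%
print(scores.mean())
```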

Classification accuracy (Acc), specificity, and sensitivity are scalar measures of classification performance. Moreover, the receiver operating characteristic (ROC) is a graphical measure of a classification algorithm (Tharwat 2018). The ROC curve is a two-dimensional graph in which the true positive rate (TPR) is represented on the y-axis and the false positive rate (FPR) on the x-axis (Sokolova et al. 2006a, b; Zou 2002):

$$ {\text{TPR}} = \frac{\text{TP}}{{{\text{TP}} + {\text{FN}}}} $$
(10)
$$ {\text{FPR}} = \frac{\text{FP}}{{{\text{TN}} + {\text{FP}}}} $$
(11)
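A minimal sketch of computing the TPR/FPR pairs of Eqs. (10)–(11) across thresholds and the resulting AUC, assuming scikit-learn's roc_curve and auc; data and model settings are illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.metrics import auc, roc_curve
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=103, n_features=7, random_state=0)
X_train, X_valid, y_train, y_valid = train_test_split(X, y, test_size=0.2, random_state=0)

clf = ExtraTreesClassifier(random_state=0).fit(X_train, y_train)
probs = clf.predict_proba(X_valid)[:, 1]          # predicted probability of the stuck class
fpr, tpr, thresholds = roc_curve(y_valid, probs)  # one (FPR, TPR) point per threshold
print("AUC =", auc(fpr, tpr))
```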

Based on the ROC, perfect classification happens when the classifier curve passes through the upper left corner of the graph; at that corner point, all positive and negative samples are correctly classified. Therefore, the steeper curve has the better performance. The area under the ROC curve (AUC) is used to compare different classifiers in the ROC space based on a scalar value. The AUC score ranges between zero and one, and no realistic classifier has an AUC score lower than 0.5 (Metz 1978; Bradley 1997). The ROC curve of each classifier must be plotted to show its performance against different thresholds. In addition, the cost function is represented in the following equation:

$$ {\text{Error}} = \frac{1}{N}\mathop \sum \limits_{i = 1}^{N} L\left\{ {\hat{Y}^{\left( i \right)} \ne Y^{\left( i \right)} } \right\} $$
(12)

where N is the number of cases, \( \hat{Y}^{\left( i \right)} \) is the predicted value, \( Y^{\left( i \right)} \) is the actual value, and L is the 0–1 loss function. In the current study, weights are added to the error formula (Eq. 12) to emphasize the cost of false negative cases, i.e., cases that are stuck in reality but that the model predicts as non-stuck. To handle such cases, the weights of Eq. (13) are incorporated into Eq. (12) to formulate Eq. (14):

$$ W^{\left( i \right)} = \left\{ {\begin{array}{*{20}l} 1 \hfill & { {\text{if}}\quad X^{\left( i \right) } {\text{is}}\,{\text{nonstuck}}\,{\text{case}}} \hfill \\ {10} \hfill & {{\text{if}}\quad X^{\left( i \right)} \,{\text{is}}\,{\text{stuck}}\,{\text{case}}} \hfill \\ \end{array} } \right. $$
(13)

where \( X^{\left( i \right) } \) is the actual classification of the oil well stuck case.

$$ {\text{Modified}}\,{\text{Error}} = \frac{1}{{\sum W^{\left( i \right)} }}\mathop \sum \limits_{i = 1}^{N} W^{\left( i \right)} L\left\{ {\hat{Y}^{\left( i \right)} \ne Y^{\left( i \right)} } \right\} $$
(14)
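A minimal numerical illustration of Eqs. (13)–(14): misclassifying a stuck well is weighted ten times more heavily than misclassifying a non-stuck well. The label vectors below are made-up examples.

```python
import numpy as np

y_true = np.array([1, 0, 1, 0, 0, 1])             # 1 = stuck, 0 = non-stuck (illustrative)
y_pred = np.array([0, 0, 1, 1, 0, 1])

weights = np.where(y_true == 1, 10, 1)            # Eq. (13): weight 10 for actual stuck cases
modified_error = np.sum(weights * (y_true != y_pred)) / np.sum(weights)  # Eq. (14)
print(modified_error)                             # 11 / 33 ≈ 0.33 for this example
```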

Results and discussion

In engineering practice, operators and decision-makers have to select a mathematical model with regard to accuracy, ease of implementation, generalization, and uncertainty. The scope of the current study focuses on the accuracy and the generalization ability of the developed algorithms. Based on the validation dataset, accuracy (Acc), and AUC, 12 classifiers were validated as displayed in Table 3. The classifiers have been sorted in descending order from C1 to C12 based on AUC, as shown in Table 3. The results show that the extra trees classifier (C1) is the most accurate for pipe stuck classification, and it also ranked first in the ROC comparison shown in Fig. 8. Based on the test data set, the extra trees classifier (C1) yielded an overall correct classification of 100%, which means that the model was always able to correctly identify the class of the wells from the given predictors. Among DT, RF, XGBoost, and AdaBoost, RF produced 0.83 and 0.74 for AUC and accuracy, respectively. Ensemble methods such as extra trees (C1), bagging (C8), RF (C3), AdaBoost (C5), and SGB (C9) produced a highly acceptable performance.

Table 3 The classifiers’ accuracy
Fig. 8 Average ROC curve for different classifiers

High-dimensional data can be effectively handled using ensemble machine learning. In addition, ensemble machine learning addresses small sample sizes and complex data structures (Breiman 1996; Schapire et al. 1998). On the other hand, ensemble ML increases the model complexity (Kuncheva 2004). Noisy data can be handled more effectively by the random forest algorithm than by the decision tree algorithm (Breiman 1996; Dietterich 2000). However, the RF algorithm cannot easily interpret the importance of features or explain the mechanism by which the results are produced. In contrast, ANNs, KNN, and logistic regression produced the lowest performance, with AUCs of 0.592, 0.575, and 0.500, respectively.

DT presents an alternative to the black box of ANNs by formulating logical statements (Perner et al. 2001). Furthermore, the splitting procedure of DT can handle high-dimensional data (Prasad et al. 2006). On the other hand, DT produces poor performance for noisy, nonlinear, or time series data (Curram and Mingers 1994). Overall, tree-based models and ensemble models produce superior performance compared with single algorithms, and DT (CART) is inherently used as the base learner for the ensemble methods. Naive Bayes, SVM, bagging, and SGB produced moderate performance, where AUC ranged from 0.808 to 0.817 and accuracy ranged from 0.501 to 0.61. Table 4 summarizes the limitations and strengths of each classifier and guides researchers and drilling engineers in selecting the appropriate ML model based on the algorithms' characteristics.

Table 4 Algorithms comparison

Classifiers computational cost

Prediction accuracy should not be the only evaluation criterion for selecting the optimal ML algorithm. The computational costs (e.g., memory usage and computational time) of the algorithms are also significant criteria during data processing. Figure 9 illustrates the computational time of the twelve developed algorithms. All models showed acceptable computational times, where the longest times were consumed by the logistic regression and KNN algorithms, at 192 s and 184 s, respectively. Conversely, XGBoost was the fastest algorithm. On the other hand, Fig. 9 shows that extra trees and DT consumed the most memory, 205 and 197 MB, respectively, whereas ANNs and bagging DT consumed the least memory for classification.

Fig. 9 Computational speed and memory for each classifier

Accordingly, ensemble algorithms such as extra trees, AdaBoost, and bagging require more computational resources. Nevertheless, the memory usage of all the ML algorithms was acceptable for a machine with at least 4 GB of RAM. As a result, XGBoost, RF, and ANNs were the most efficient algorithms based on the computational cost criterion. However, the computational cost (time and memory consumed) of the ML algorithms would increase exponentially with increasing data dimensions, such as the number of data features or the data size.

A sensitivity analysis of the predictors was conducted to evaluate the impact of each predictor on the model's performance. The F-score is the harmonic mean of precision and recall, where an F-score reaches its best value at 1 (perfect precision and recall) and its worst at 0 (Sokolova et al. 2006a). In tree-based models, the feature importance score also counts how many times each variable is used for splitting. Different ML models would give different interpretations of the input parameters' sensitivity; accordingly, the sensitivity analysis has been conducted for the most accurate classifier, extra trees (C1).

As illustrated in Fig. 10, the sensitivity analysis indicated that drilled depth (P7) had the highest impact on the output (drilling pipe stuck). String rotation (P6) and maximum inclination (P5) had approximately the same impact on the output. Similarly, rate of penetration, total drilling time, and mud type had roughly equal impacts. The engineering and scientific insights that can be drawn from the sensitivity analysis are as follows:

  1. All seven input parameters listed in Table 1 have a significant impact on the pipe stuck classification.

  2. Drilled depth (P7) is the key predictor for identifying stuck cases, where a greater drilled depth means a higher stuck probability.

  3. String rotation (P6) has the second largest impact on the stuck probability. Therefore, drilling engineers must accurately calculate the suitable string rotation.

  4. Maximum inclination (P5), rate of penetration, total drilling time, and mud type have approximately the same impact on pipe stuck classification.

  5. Mud pump circulation rate had the least impact on the output.

Fig. 10 Sensitivity analysis

Drilling stuck pipe mitigation module

The drilling pipe stuck issue can easily arise for various reasons in field applications. Unless the model can provide an effective way to design the drilling project and avoid the issue, predicting whether pipe sticking will happen has very little value for field operations. Therefore, once a well condition has been classified as a stuck or partially stuck case, an optimization system is needed to determine the optimal values of the seven input parameters. As a result, an optimization system has been incorporated into the best classification algorithm [the extra trees model (C1)] to convert the seven input parameters from a stuck or partially stuck case into a non-stuck case. The optimization system uses a genetic algorithm (GA) to optimize the seven input parameters, as shown in Fig. 11.

Fig. 11 Stuck mitigation system

The concept of evolutionary computing (EC) is based on Darwin's theory of survival of the fittest (Darwin 1859). The genetic algorithm (GA) is a branch of EC applied to optimization and searching applications (Holland 1975; Siddique and Adeli 2013). A chromosome can be represented as a vector (C) consisting of (n) genes denoted by (ci) as follows: C = {c1, c2, c3, …, cn}. Each chromosome (C) represents a point in the n-dimensional search space (Elmousalami 2020). In the current case study, each chromosome represents the seven input parameters: it consists of seven genes representing the well drilling parameters (P1, P2, P3, P4, P5, P6, P7), respectively, as shown in Table 1. Each gene consists of one of the membership functions (MFi), where i ranges over the boundary conditions of each variable (P1–P7). The number of chromosomes (initial population) is set to 10, and the number of generations is set to 10,000. The crossover probability and mutation probability are set to 0.7 and 0.03, respectively. Accordingly, an initial population of chromosomes is generated and evaluated through the fitness function.

The fitness function (F) evaluates the quality of the possible solutions, and crossover and mutation processes are used to develop new offspring generations. The objective is to minimize the stuck probability so that it falls in the non-stuck range [0, 0.4]. Therefore, the objective of the GA is to minimize the stuck probability by optimizing the seven input parameters until they reach the characteristics of a non-stuck well. The fitness function to be minimized can be formulated as Eq. (15):

$$ F = {\text{Minimization}}\left( {\hat{y}_{i} } \right) $$
(15)

where (F) is a fitness function and \( \hat{y}_{i} \) is the predicted classification based on extra trees model (stuck probability).

To keep the variables within reasonable limits, the seven input parameters have been constrained to defined boundaries. The boundary constraints cover the minimum and maximum values of each parameter. Moreover, functional constraints have been added based on design criteria, such as the summation of the solids % and water % not exceeding 100%. Nevertheless, a relatively high degree of engineering judgment is still required to logically select any combination of the seven parameters for the drilling process.
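The following is a simplified sketch of the mitigation module: a small GA searches the bounded space of the seven drilling parameters for a combination that a trained classifier scores as non-stuck. The bounds, generation count, synthetic training data, and the way constraints are enforced are illustrative assumptions, not the exact setup of the study.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=103, n_features=7, random_state=0)
clf = ExtraTreesClassifier(random_state=0).fit(X, y)          # stand-in for model C1
bounds = np.array([X.min(axis=0), X.max(axis=0)])             # per-parameter boundary constraints

def fitness(chromosome):
    """Predicted stuck probability for one candidate parameter set (to be minimized)."""
    return clf.predict_proba(chromosome.reshape(1, -1))[0, 1]

population = rng.uniform(bounds[0], bounds[1], size=(10, 7))  # 10 chromosomes of 7 genes
for _ in range(200):                                          # generations (10,000 in the study)
    population = population[np.argsort([fitness(c) for c in population])]
    parents = population[:4]                                  # survival of the fittest
    children = []
    for _ in range(6):
        a, b = parents[rng.integers(4)], parents[rng.integers(4)]
        cut = rng.integers(1, 7)
        child = np.concatenate([a[:cut], b[cut:]])            # one-point crossover
        if rng.random() < 0.03:                               # mutation probability
            i = rng.integers(7)
            child[i] = rng.uniform(bounds[0, i], bounds[1, i])
        children.append(child)
    population = np.vstack([parents, children])

best = population[np.argmin([fitness(c) for c in population])]
print("optimized parameters:", best)
print("predicted stuck probability:", fitness(best))          # target: below 0.4 (non-stuck)
```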

Conclusion

Complications of stuck pipe can account for approximately half of total well cost, making stuck pipe one of the most expensive problems that can occur during a drilling operation (Muqeem et al. 2012). Therefore, the key contribution of this study is to automate the classification and the mitigation of drilling pipe sticking for the drilled wells in the Gulf of Suez (GOS). Out of 12 machine learning algorithms, the results show that the most reliable algorithm was the extremely randomized trees (extra trees) classifier, with 100% classification accuracy on the testing dataset. In addition, a genetic algorithm can optimize the drilling parameters to mitigate the risk of drilling pipe sticking.

The methodology addressed in this study enables the oil and gas drilling industry in the GOS to evaluate the risk of stuck pipe occurrence before the well drilling procedure. A comprehensive comparison of ML algorithms has been provided for drilling pipe stuck prediction. More data mean better generalization of the trained algorithms, and the key limitation of this study is the size of the collected dataset; however, the collected dataset is sufficient to train the classifiers and to avoid the overfitting problem. Therefore, future research will apply this framework to different datasets from other oil fields. Future work will also rely on deep learning, a powerful tool for pattern recognition, where the big data of drilling projects will be modeled using deep learning algorithms such as deep neural networks and convolutional neural networks.