Introduction

Drilling is the operation that aims to reach porous and permeable rocks at depth that may contain liquid or gaseous hydrocarbons. Sedimentary basins are studied geologically and geophysically to determine where to drill for oil. These studies allow researchers to characterize subsurface compositions and potential deposits, but they cannot confirm the presence or absence of hydrocarbon accumulations (Hassan et al. 2022; Ayoub Mohammed et al. 2022; Alakbari et al. 2022). Drilling is the only method to test these hypotheses and reveal the fluid content of the rocks. The uncertainty about the nature of the fluids trapped in the subsurface and the complexity of the sedimentary deposits explain the still high number of unsuccessful exploration boreholes and the significant share of drilling in exploration costs (Chen and Guestrin 2016; Bailey et al. 2018; Zhong et al. 2022; Ayoub et al. 2022). The drilling objectives are always the same: to reach the depth targeted by the geologists and to allow the reservoir to be tested or put into production. Because drilling operations are expensive and unsuccessful exploration wells are unavoidable, the cost of each operation must be optimized. Among the performance indicators of a borehole is its rate of penetration (ROP), which more generally reflects the efficiency and speed of an operation. Successfully optimizing the ROP will drastically reduce the costs of the drilling operation. Lateral, axial, and torsional vibrations directly impact this factor. These vibrations have a destructive effect on the drilling rig; in addition, they reduce the ROP and therefore increase the well's maintenance time and cost (Baarimah et al. 2022; Runia et al. 2013; Schwefe et al. 2014; Efteland et al. 2015).

To understand the source of these vibrations, note that the drilling rig is, in principle, a motor at the surface. This motor rotates a drill string that can reach up to 5000 m in length, which gives these elements a behavior similar to that of a torsion spring. As the formations being drilled differ, the torque applied by each formation also differs. This creates a discrepancy between the drill bit speed at the end of the string and the motor speed at the surface, which produces stick and slip. Stick and slip is the number one cause of drill string failure and wear. It arises from the aggressive torsional dynamics experienced by the drilling system, which behaves like a huge torsion spring (Nautiyal and Mishra 2023; Alakbari et al. 2016). These vibrations decrease efficiency, reliability, performance, and safety, all of which are significant aspects of deep well drilling, and in drilling engineering operations they are driven primarily by the system's inherent torsional dynamics.

The drilling rig functions as a motor at the surface, rotating a drill string extending several kilometers in length. Because the drilled formations vary, each formation applies a different torque to the drill string, producing a mismatch between the angular velocity at the drill bit and the motor speed at the surface. This disparity between rotational speeds leads to stick and slip vibrations: stick refers to the phase in which the drill bit stops completely, while slip refers to the stage in which the bit rotates several times faster than the desired angular velocity. A typical simulation of a drill pipe with a desired angular velocity of 50 RPM illustrates this behavior. These vibrations not only diminish drilling efficiency, reliability, and performance but also pose safety risks: they cause premature fatigue and wear of the drill string, failures due to the high slip speeds, and increased maintenance costs. Understanding the causes of stick and slip is therefore crucial for mitigating these vibrations and optimizing drilling performance. Drill string vibration can cost companies millions through decreased performance and efficiency, so vibration mitigation and improved drilling performance are of substantial economic interest to the petroleum industry, while safety remains the top priority. In other words, improving performance by reducing vibrations also reduces costs (Elkatatny et al. 2019; Zakuan et al. 2011; Kyllingstad and Nessjøen 2009; Craig et al. 2009). Stick and slip vibrations can be recognized in both downhole and surface measurements, and modeling and analyzing these vibrations in order to attenuate them is an important aspect of drilling.

Approaches to vibration mitigation are classified into two major categories: passive and active. Passive techniques are grouped into three sub-categories: bottomhole assembly (BHA) optimization, bit selection, and the use of downhole equipment. For example, anti-stick and slip tools are mechanical devices intended to adjust the drilling torque automatically and reduce the oscillations, increasing accuracy and efficiency while drilling (Gupta et al. 2019; Srivastava et al. 2022). Active methods are based on control techniques. The best-known approach is "Soft Torque", whose idea is to drive the surface rotary system smoothly. For the synthesis of this regulator, the dynamics of the system are represented as a torsion pendulum with two degrees of freedom (Wu et al. 2012; Forster 2011; Bailey et al. 2008). A PI regulator based on surface data can damp this torsional mode, but such regulators cannot always eliminate stick and slip vibrations, especially in deep and inclined wells. Other reasons for these failures are the uncertainty (non-linearity) of the bit-rock interaction and the difference between the types of measurements used (downhole or surface). Artificial intelligence (AI) techniques are increasingly being employed worldwide, including in the oil and gas sector, resulting in substantial advances in their use to increase accuracy (Otchere et al. 2021a, b; Shen et al. 2017; Vogel and Creegan 2016; Greenwood 2016). Several research papers have utilized automated prediction models to forecast stick and slip vibrations, demonstrating their efficacy in capturing patterns among complicated drilling parameter interactions (Zha and Pham 2018; Gupta et al. 2019; Saadeldin et al. 2022).
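To make the two-degree-of-freedom torsion-pendulum picture concrete, the following minimal sketch simulates a lumped surface drive and bit connected by a torsional spring, with a velocity-weakening bit friction law and a PI "soft torque" speed controller. All parameter values, gains, and the friction law are illustrative assumptions, not values from this study, and whether stick and slip limit cycles actually appear depends on the chosen parameters.

```python
import numpy as np

# Illustrative (hypothetical) parameters for a lumped two-degree-of-freedom torsional model
J_t, J_b = 500.0, 100.0          # rotary table and bit inertias [kg*m^2]
k, c = 500.0, 20.0               # pipe torsional stiffness [N*m/rad] and damping [N*m*s/rad]
T_s, T_d = 8000.0, 4000.0        # static and dynamic bit/rock friction torque [N*m]
omega_ref = 1.0                  # velocity-weakening reference speed [rad/s]
omega_set = 50 * 2 * np.pi / 60  # desired speed: 50 RPM in rad/s
Kp, Ki = 400.0, 60.0             # PI gains of the "soft torque" surface speed controller

def bit_friction(omega_b):
    """Simplified Stribeck-type, velocity-weakening friction torque at the bit."""
    return np.sign(omega_b) * (T_d + (T_s - T_d) * np.exp(-abs(omega_b) / omega_ref))

dt, t_end = 1e-3, 60.0
theta_t = theta_b = omega_t = omega_b = 0.0
err_int = 0.0
history = []

for step in range(int(t_end / dt)):
    # PI controller acting on the surface drive speed error
    err = omega_set - omega_t
    err_int += err * dt
    T_motor = Kp * err + Ki * err_int

    # Torque transmitted through the drill pipe (torsion spring + damping)
    T_pipe = k * (theta_t - theta_b) + c * (omega_t - omega_b)

    # Lumped dynamics of the rotary table and the bit (explicit Euler step)
    omega_t += dt * (T_motor - T_pipe) / J_t
    omega_b += dt * (T_pipe - bit_friction(omega_b)) / J_b
    omega_b = max(omega_b, 0.0)  # crude stick handling: no backward rotation of the bit
    theta_t += dt * omega_t
    theta_b += dt * omega_b
    history.append((step * dt, omega_t, omega_b))

# With these illustrative values the bit speed tends to alternate between near zero
# (stick) and well above omega_set (slip), the qualitative behavior described above.
```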

Srivastava (2022) proposed a hybrid machine learning model to enhance stick and slip prediction in drilling operations. Two machine learning models were used: the least squares support vector machine (LSSVM) and the multilayer perceptron neural network (MLP-NN). Feature selection was applied to improve prediction accuracy by identifying relevant features, and a Savitzky–Golay (SG) filter was used to reduce noise in the data. The researchers employed the wrapper method in conjunction with evolutionary algorithms to enhance the performance of their models; the LSSVM-Cuckoo Optimization Algorithm achieved the highest prediction accuracy of 0.94 among the hybrid models developed. Additionally, Srivastava et al. (2022) employed an artificial neural network to predict stick and slip problems in offshore oil fields. They divided the datasets into training (60%), validation (20%), and testing (20%) sets and built normalized databases of input features for bit bounce and bending vibrations. After feature selection, the Levenberg–Marquardt function was adopted for training, resulting in high accuracy: 0.96 for bit bounce and 0.93 for bending vibrations. Saadeldin et al. (2022) conducted a study to explore the potential of machine learning in predicting stick and slip occurrences during drilling operations, focusing on mining and analyzing drilling data to develop accurate prediction models. To reduce the dimensionality of the dataset, they employed a feature ranking method and selected 18 of the 23 studied parameters as input variables for stick and slip prediction, which improved computational processing time and the efficiency of the dataset. The results showed that the Gaussian kernel support vector machine (SVM) achieved a high level of accuracy, with a maximum value of 0.92. The findings of this research paved the way for further investigations, including the use of different machine learning models that incorporate geological and operational parameters to improve stick and slip predictions.

In their study, Zhong et al. (2022) proposed a new technique for predicting stick and slip vibrations by combining modular neural networks (MNN) with particle swarm optimization (PSO). To improve the model's efficiency, they normalized the data into the range 0 to +1 and divided the dataset into 60% for training, 20% for validation, and 20% for testing. The results showed that incorporating PSO led to more accurate predictions by optimizing parameter variations. It is notable that the choice of machine learning algorithm and input variables varies among the studies focused on predicting drilling stick and slip vibrations. Supervised machine learning models are only as good as the information they are trained with; when irrelevant data is included as input, the model's performance suffers (Otchere et al. 2021a, b). There is therefore a need for a workflow capable of determining the most appropriate features, which involves engineering new features from existing input variables (Otchere et al. 2021a, b; Nautiyal and Kumar Mishra 2023). These new features should improve model performance compared to using the original variables individually. Moreover, the choice of machine learning model need not be a limitation if several robust models are combined into a single, more powerful ensemble; such a combination tends to mitigate each model's inherent weaknesses while exploiting their complementary strengths.

This study uses data recorded from measurements while drilling (MWD) in offshore oil fields on the Norwegian continental shelf. The input features and their importance are assessed with model-agnostic metrics for predicting stick and slip. Once a suitable set of relevant features is established, several machine learning models are developed to predict drill string vibrations, and the model with the most accurate results is optimized using the Bayesian Optimization (BO) algorithm. Our primary objective is to examine and analyze the impact and influence of individual features, generating novel insight into how well they generalize in explaining the target. A major contribution of this research is the global explanation of variables in stick and slip prediction using explainable AI and the development of a machine learning approach that enhances prediction accuracy for drill string vibrations. The study aims to develop a new approach for real-time prediction of stick-slip vibrations in drilling operations, optimizing drilling parameters, improving drilling performance, and enhancing productivity and safety in the oil and gas industry. We achieve this by accurately identifying stick and slip occurrences using machine learning techniques and incorporating downhole sensor data and a drill string model. The proposed method holds significant potential for integration into automatic driller systems on offshore drilling rigs, leading to more efficient and sustainable drilling practices. The approach combines model-agnostic regression models and Bayesian Optimized Extra Trees, improving accuracy with the lowest error metrics. By applying this new model to automatic driller systems in offshore drilling rigs on the Norwegian continental shelf, we can effectively reduce stick and slip severity and optimize the penetration rate. This innovation can revolutionize drilling practices and enhance the industry's efficiency and sustainability.

Strengths of the proposed method

This article introduces a new method for predicting stick and slip vibrations in drilling operations using machine learning models. The combination of model-agnostic regression models and Bayesian Optimized Extra Trees sets this research apart, providing a novel and powerful approach to this critical aspect of drilling operations. The strengths of this work lie in several key areas:

Firstly, the robustness of the dataset used in this research, comprising over 78,000 data points from an oil field on the Norwegian continental shelf, provides a strong foundation for validating the effectiveness of the proposed method. The sheer volume of data ensures that the model is well-trained and tested, increasing its reliability and applicability in real-world scenarios. Secondly, the accuracy of the Extra Trees (ET) model, demonstrated by its low MAE and RMSE values (expressed in r/min), highlights the potential of this method to outperform existing techniques in predicting stick and slip vibrations. This improved accuracy is crucial in enabling more efficient and cost-effective drilling operations, ultimately benefiting the oil and gas industry as a whole.

Additionally, the practical implications of the proposed method cannot be overstated. By integrating this model into automatic driller systems in offshore drilling rigs, operators can optimize the penetration rate, reduce stick and slip severity, and increase overall drilling efficiency. These benefits will contribute to reduced non-productive time, lower operational costs, and a decreased environmental impact, showcasing the immense value of this research. Lastly, the article presents a clear and concise explanation of the methodology, making it accessible to a wide range of readers, from industry professionals to academic researchers. As a result, it establishes a solid foundation for further exploration and development of this innovative approach to predicting stick and slip vibrations in drilling operations.

In conclusion, the proposed method's strengths are its accuracy, applicability, and potential impact on drilling optimization and automation. This research marks a significant milestone in enhancing the understanding and management of stick and slip vibrations, paving the way for continued advancements in offshore drilling technology.

Background of machine learning regression models

This research builds on earlier studies in which several workflows and techniques were used to predict stick and slip from available input variables. As researchers increasingly move away from empirical correlations, machine learning has become entrenched. Individual drilling parameters offer critical information concerning drill string vibrations, but certain subsets of them predict stick and slip more accurately. Some researchers employ the Pearson correlation coefficient to identify significant variables, while others rely on wrapper approaches, embedded techniques, or metaheuristic algorithms to identify powerful features. This indicates that no definitive set of features exists for making this prediction, and a robust workflow that applies model-agnostic techniques to this prediction problem is needed. This study investigates the relationships among the most relevant input variables for predicting stick and slip vibrations in oil well drill strings. The features selected by the model-agnostic techniques then serve as input features. Several machine learning models, chosen for their learning theory and their ability to handle high-dimensional and complex data, are used to predict stick and slip. The reviewed models perform differently depending on learning theory, dimensionality, data volume, and data distribution. Identifying a suitable model that can handle all of these issues is necessary because drilling data are commonly high-dimensional, small or large, and variously distributed.

Based on these considerations, this study establishes relevant features that lead to lower prediction errors. Improved accuracy in estimating stick and slip vibration will have a major influence on field operations and the overall integrity of the well. Hence, although performance depends on the data, machine learning models can be applied to drilling operations with similar input features. The most common issue with machine learning models and their varying performance centers on the data. Explaining the input variables and assessing causation is a central aim of this study as more innovative approaches are developed to address this issue. Even a slight gain in accuracy is critical to improving decision-making in the petroleum business, underscoring the importance of this field for advancing research. A summary of the reviewed models used in this study is presented in Table 1.

Table 1 An overview of the algorithms described in this research and the corresponding authors

Data collection and data description

This research focuses on improving the prediction of stick and slip using data recorded during drilling operations in an offshore oil field on the Norwegian continental shelf. The fields, located in the North Sea near the southern end of the Norwegian sector, are described as fault-block structures with approximately 173 million barrels of original oil in place (OOIP). The reservoirs are small dome-shaped structures formed when adjacent salt deposits collapsed during the Middle Jurassic (Geurts et al. 2006). The data consist of 78,000 data points with 23 features compiled from resources such as measurement while drilling (MWD), daily drilling reports (DDRs), final drilling well reports, and the literature. After removing missing data, 49,000 points remained. Table 2 describes the real-time MWD measurement records used in this study, and Table 3 shows descriptive statistics for some selected features. The main input features for this research are:

a. Real-time downhole drilling data: stick and slip indicator, collar rotational speed, turbine revolution per minute, transverse shock risk, ARC shock level, Isonic shock, power drive shock risk, equivalent circulating density, annular pressure, and annular temperature.

b. Real-time surface drilling data: surface weight on bit, torque, total flow rate of all active pumps, and pump pressure.

c. Formation characteristics: gamma-ray, formation type, bit depth, and penetration rate.

Table 2 Description of real-time MWD measurement records used in this study
Table 3 Summary statistics of input and target characteristics

One of the main advantages of applying data analytics and machine learning is the ability to find patterns and hidden information in high-dimensional data. As such, some input variables will prove irrelevant for predicting stick and slip vibrations. Table 4 presents the typical lithology observed in the wells of the oil fields on the Norwegian continental shelf. For this study, the lithology was encoded into numeric variables ranging from 1 to 19, as illustrated in Table 5. The coordinates of each well were also removed.
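As a minimal illustration of this encoding step, the sketch below maps lithology names to integer codes and drops well coordinates. The formation names and column names used here are placeholders, not the actual field lithologies or dataset headers.

```python
import pandas as pd

# Hypothetical stand-in data: formation and column names are placeholders only.
df = pd.DataFrame({
    "FORMATION": ["Shale", "Sandstone", "Limestone", "Shale", "Claystone"],
    "X_COORD": [1.0, 1.1, 1.2, 1.3, 1.4],
    "Y_COORD": [2.0, 2.1, 2.2, 2.3, 2.4],
})

# Assign each distinct lithology an integer code (cf. Table 5, codes 1-19 in the study)
codes = {name: i + 1 for i, name in enumerate(sorted(df["FORMATION"].unique()))}
df["FORMATION_CODE"] = df["FORMATION"].map(codes)

# Drop the well coordinates, as done in the study
df = df.drop(columns=["X_COORD", "Y_COORD"])
print(codes)
```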

Table 4 General lithology in oil field wells of the Norwegian continental shelf (Frankiewicz 2019)
Table 5 Numeric codes assigned to formations in the studied oil field on the Norwegian continental shelf

Methodology

Data analysis and visualization

To aid understanding of how the input features relate to the output, data analysis and visualization were used. The degree of relationship between the input and output variables was quantified using rank and linear correlation matrices. To study linear and nonlinear relationships, we used Pearson's coefficient, Kendall's tau, and Spearman's rank coefficient, given by Eqs. (1), (2), and (3), respectively.

$$r = \frac{{\mathop \sum \nolimits_{i = 1}^{n} \left( {x_{i} - \overline{x}} \right)\left( {y_{i} - \overline{y}} \right)}}{{\sqrt {\mathop \sum \nolimits_{i = 1}^{n} \left( {x_{i} - \overline{x}} \right)^{2} } \sqrt {\mathop \sum \nolimits_{i = 1}^{n} \left( {y_{i} - \overline{y}} \right)^{2} } }}$$
(1)
$$\tau = \frac{{n_{c} - n_{d} }}{{\frac{1}{2}n(n - 1)}}$$
(2)
$$r_{s} = 1 - \frac{{6\mathop \sum \nolimits_{i = 1}^{n} D^{2} }}{{n\left( {n^{2} - 1} \right)}}$$
(3)

In Eq. (1), \(x\) and \(y\) are the samples, \(\overline{x}\) and \(\overline{y}\) are the sample means, and \(n\) is the sample size. In Eq. (2), \({n}_{c}\) and \({n}_{d}\) denote the numbers of concordant and discordant pairs. In Eq. (3), \(D\) is the difference between the ranks of paired observations and \(n\) is the number of pairs. The Pearson, Kendall, and Spearman coefficients indicate stronger correlation the closer they are to \(+1\) or \(-1\); in contrast, input parameters are independent if these coefficients are close to zero. Selecting mutually independent input parameters allows more information to be incorporated into the models. It is also important to remember that correlation does not imply causation. The heatmap in Fig. 1 shows the features that have a moderate correlation to the target, and Fig. 2 shows stacked horizontal bar charts for the different correlation coefficients. Some features correlate with torsional vibration, including shock peak, equivalent circulating density, and Isonic shock. In addition, total pump flow, annular temperature, and bottom turbine revolutions show a slight correlation with stick and slip. The pair plot also depicts the nonlinear relationships between some of the inputs and the target: Fig. 3 shows that none of the features correlates linearly with the target. Therefore, nonlinear models were deemed appropriate for this research.
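A minimal sketch of this correlation screening is shown below, computed here on synthetic stand-in data; "STICK_SLIP" and the feature names are hypothetical column names, not the actual dataset headers.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in data; column names are hypothetical placeholders.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "COLLAR_RPM": rng.normal(120, 15, 500),
    "TORQUE": rng.normal(8, 2, 500),
    "GAMMA_RAY": rng.normal(60, 20, 500),
})
df["STICK_SLIP"] = 0.5 * df["COLLAR_RPM"] - 2.0 * df["TORQUE"] + rng.normal(0, 5, 500)

# Pearson (Eq. 1), Kendall (Eq. 2), and Spearman (Eq. 3) coefficients against the target
for method in ("pearson", "kendall", "spearman"):
    corr = df.corr(method=method)["STICK_SLIP"].drop("STICK_SLIP")
    print(method, corr.round(3).to_dict())

# A heatmap such as Fig. 1 could then be drawn, e.g. with seaborn's heatmap on df.corr().
```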

Fig. 1
figure 1

Heatmap to assess the relationships between the input variables and targets; a Pearson correlation, b Spearman correlation, c Kendall correlation

Fig. 2
figure 2

Stacked bar chart for the different correlation coefficients

Fig. 3
figure 3

Pair plot distribution of some variables color-coded against the formation type

Application of machine learning models

The cross-validation approach involves resampling the input data into multiple groups to evaluate predictive performance and minimize prediction errors on unknown data (Otchere et al. 2021a, b). This study employed the holdout cross-validation approach, in which the data were randomly partitioned into 85% for training and 15% for testing, with 7,350 data points dedicated to testing. This ensures that data points not seen by the trained model are used to validate it, preventing overfitting and selection bias. The aim was to identify a model capable of predicting stick and slip vibrations with minimal errors. Consistency was maintained by using the same out-of-sample data throughout the study, so that the same data points were used across all models.
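A minimal sketch of this 85/15 holdout split is given below, continuing the synthetic DataFrame from the earlier correlation sketch; in the actual study the split would be applied to the cleaned MWD dataset.

```python
from sklearn.model_selection import train_test_split

# Assumes "df" from the earlier synthetic sketch, with "STICK_SLIP" as the target column.
X = df.drop(columns=["STICK_SLIP"])
y = df["STICK_SLIP"]

# 85% of the rows train the models; 15% are held back as out-of-sample test data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.15, random_state=42, shuffle=True
)

# The same (X_test, y_test) partition is reused for every model so that all algorithms
# are compared on identical out-of-sample points.
```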

Hyper-parameter tuning was carried out using the Bayesian Optimization (BO) algorithm to improve the performance of the final model. BO is a sequential, model-based optimization technique that uses Bayes' theorem to explore the extrema of objective functions efficiently. It iteratively trains a probabilistic surrogate of the target model based on prior outcomes. This probabilistic model, which relates hyper-parameters to a probability score of the objective function, is then evaluated with an acquisition function to select the next hyper-parameter values to assess. Hyper-parameter tuning updates the model parameters that control accuracy and performance during training, thereby optimizing the model's performance by minimizing the cost function (Otchere et al. 2022a, b, c).
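The study does not name a specific BO implementation, so the sketch below uses scikit-optimize's BayesSearchCV as one possible way to tune an Extra Trees model; the search ranges shown are illustrative assumptions, not the values reported in Table 9.

```python
from skopt import BayesSearchCV              # scikit-optimize; one possible BO implementation
from skopt.space import Integer, Real
from sklearn.ensemble import ExtraTreesRegressor

# Illustrative search space; the actual tuned ranges and values are listed in Table 9.
search = BayesSearchCV(
    estimator=ExtraTreesRegressor(random_state=42),
    search_spaces={
        "n_estimators": Integer(100, 1000),
        "max_depth": Integer(5, 50),
        "min_samples_split": Integer(2, 20),
        "max_features": Real(0.3, 1.0),
    },
    n_iter=40,                               # number of surrogate-model update iterations
    cv=3,
    scoring="neg_mean_absolute_error",
    random_state=42,
)
search.fit(X_train, y_train)                 # X_train/y_train from the holdout split above
print(search.best_params_, -search.best_score_)
```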

Explainable AI

This research will employ model-agnostic methods to assess the significance of input features and provide explanations independent of the machine learning model used. The objective is to identify features that contribute meaningfully to the model's predictions, regardless of their type or nature. Two techniques, namely permutation feature importance (PFI) and Shapley values, will be utilized to analyze the results generated by the models. These techniques require the model, feature vectors, target variable, and error metrics as inputs. By applying PFI and Shapley values, the features that fail to adequately explain the target variable will be identified and removed from the input feature vector.

Permutation feature importance

The Permutation Feature Importance (PFI) approach, as described by Otchere et al. (2022a, b, c), is a model-agnostic method used to assess the significance of features. It works by shuffling a feature's values, breaking the association between that variable and the model outcome. The importance of a feature is determined by measuring the increase in estimation error after permuting the feature multiple times: if shuffling leads to higher errors, the feature is considered important, whereas if the error remains constant, the feature is deemed unimportant. The PFI method offers several advantages: it is straightforward to interpret, provides a global overview of a predictive model, and does not require retraining. However, whether training or test data should be used to measure PFI is unclear. The permutation importance function in Scikit-Learn (Pedregosa et al. 2011) can be used for PFI analysis, calculating feature importance scores based on a specified criterion, with higher values indicating better predictive power.
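A minimal PFI sketch is shown below; it scores the tuned estimator from the previous sketch on the held-out data and is an illustrative continuation of the earlier hypothetical snippets rather than the study's exact setup.

```python
from sklearn.inspection import permutation_importance

# Shuffle each feature column several times and record the drop in the chosen score;
# a larger drop means the model relies more heavily on that feature.
result = permutation_importance(
    search.best_estimator_, X_test, y_test,
    n_repeats=10, scoring="neg_root_mean_squared_error", random_state=42,
)
for name, mean_imp in sorted(zip(X_test.columns, result.importances_mean),
                             key=lambda p: p[1], reverse=True):
    print(f"{name:>15s}  {mean_imp: .4f}")
```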

Shapley values

The Shapley value, introduced by Shapley (1997) in the theory of cooperative games, serves as a critical means of measuring the importance of each feature and its contribution to a model's performance. Shapley values work by allocating among the participants the entire surplus value created through the coalition of all participants in a cooperative game. According to this theory, four properties must be met: efficiency, symmetry, dummy, and additivity. In machine learning interpretability, the players represent the input features, and the Shapley value determines how each one influences the outcome. Knowing how the features contribute to the model output helps identify the essential factors behind the model's results. The Shapley value does not depend on any particular kind of model; hence, it applies to any model type and structure.

In contrast to other methods of interpreting model results, Shapley values rest on a rigorous conceptual framework; while intuitive reasoning is essential for interpretability, other methods lack the same rigor. Shapley values have the advantage of being fairly distributed among an instance's feature values. It has even been suggested that Shapley-based explanations may be the only way to provide detailed and complete clarifications in cases where the law mandates them (the right to explanation). Like any other approach, Shapley values have inherent drawbacks. They are computationally expensive, so in most real-world scenarios only an approximate solution can be calculated. They are also easy to misinterpret: the Shapley value is not the difference between the model's prediction with and without the feature in training; rather, it answers the question of how much a feature value contributes to the gap between the actual prediction and the average prediction.

The open-source SHAP library provides efficient implementations of SHAP values and related diagnostics. SHAP establishes a link between optimal credit allocation and local explanations, making it model-agnostic. The use of game theory's classical Shapley values and their extensions has been the focus of numerous recent works; researchers have expanded on this notion by developing approaches that incorporate SHAP values into deep learning models and gradient explainers, as well as kernel-based approaches that can estimate SHAP values for any model using specially weighted local linear regression. The library also provides various plots to help visualize the data and interpret the model. This study does not include the mathematical foundations of the models utilized because a vast collection of published material already details them. An overview of the methodology and workflow applied to this research is shown in Fig. 4.
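For a tree-based model such as the tuned Extra Trees estimator from the earlier sketches, SHAP values can be computed and summarized as shown below; this is a minimal illustration, not the study's exact plotting code.

```python
import shap

# TreeExplainer computes per-row, per-feature Shapley value estimates efficiently
# for tree ensembles such as Extra Trees.
explainer = shap.TreeExplainer(search.best_estimator_)
shap_values = explainer.shap_values(X_test)

# Global importance as the mean absolute Shapley value per feature (cf. Fig. 11)
shap.summary_plot(shap_values, X_test, plot_type="bar")

# Bee swarm summary of per-instance effects (cf. Figs. 12 and 13)
shap.summary_plot(shap_values, X_test)
```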

Fig. 4
figure 4

A flow chart summarizes the procedure and methodology used in the research

Results and discussion

Model performance evaluation

The models employed in this study were evaluated using the Akaike Information Criterion (AIC) to establish which model provided the best fit to the data. Lower AIC values indicate a more plausible description of the data than higher values. The following thresholds were used when comparing and selecting the most suitable model: a difference in AIC across models is considered insignificant if it is less than 10, moderate between 10 and 50, significant between 50 and 100, and strong if above 100. The results show that the Extra Tree algorithm was the most accurate for stick and slip vibration prediction. There was a difference of ~932 AIC units between the Extra Tree model and the Random Forest model, the two models with the lowest AIC (Fig. 5); hence, the Extra Tree model showed a strong disparity and an improved likelihood of fit compared to the other models. Even though AIC quantifies the relative quality of a model in terms of its agreement with the actual data, it does not by itself guard against overfitting or under-fitting.
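The study does not state which AIC formulation it uses; the helper below shows one common least-squares form, AIC = n ln(RSS/n) + 2k, as an illustrative assumption.

```python
import numpy as np

def aic_least_squares(y_true, y_pred, n_params):
    """One common AIC formulation for least-squares fits: n*ln(RSS/n) + 2k.
    This is an illustrative assumption; the paper does not state its exact AIC form."""
    resid = np.asarray(y_true) - np.asarray(y_pred)
    n = resid.size
    rss = float(np.sum(resid ** 2))
    return n * np.log(rss / n) + 2 * n_params

# Lower AIC indicates a better fit/complexity trade-off; per the thresholds above,
# a difference greater than ~100 between two models is treated as strong evidence.
```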

Fig. 5
figure 5

AIC results comparing all the models on the test data

The models' performance was assessed on the out-of-sample data. Since the models have been trained with the training data, any attempt to reproduce those values will likely result in high accuracy. High training accuracy, however, is not always desirable, since it may indicate overfitting: the model captures intrinsic noise and does not generalize. Test accuracy on out-of-sample data reflects the algorithm's ability to make accurate predictions from variables it has not yet seen. Since the goal is to perform well on unseen data, models must have high test accuracy. Coefficient of determination (R2) values greater than 0.85 on both the training and test scores were deemed an acceptable trade-off between bias and variance for this work. Table 6 shows that the Extra Tree model had the highest R2 on the test data for predicting stick and slip. Figure 6 compares all of the model estimates to the actual results.

Table 6 Training and testing R2 scores for each model used in this study
Fig. 6
figure 6

Cross-plot of predicted vs. actual stick and slip for all models based on test data

Any meaningful gain in accuracy, and hence reduction in error, achieved by machine learning models has a major influence on decision-making. The models employed in the study produced mixed performance results, heavily influenced by their theoretical and statistical underpinnings. Figure 7 depicts the models' performance in the cross-validation evaluation used to analyze their accuracy, efficiency, and reliability. Based on each model's root mean square error (RMSE), the Extra Tree model is the most reliable relative to the real stick and slip values, and the mean absolute error (MAE) likewise suggests that the Extra Tree model is the most accurate for predicting stick and slip vibrations. The ranking criteria used in selecting the most suitable model are MAE, RMSE, AIC, and R2. The performance of the Extra Tree model is attributed to the bias-variance trade-off exploited in building it: when the cut-points and features are explicitly randomized and combined with ensemble averaging, the algorithm can reduce variance significantly more than the weaker randomization techniques employed by several other algorithms. Aside from sharing the consistency, generalization, and approximation advantages of Decision Trees, Extra Trees are also resilient to gross model errors because outliers affect their predictions only slightly and locally. When the number of input features is significantly larger than the number of random splits, Extra Trees can outperform Decision Trees in computational efficiency while resisting irrelevant input features.
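As a quick illustration of this comparison, the sketch below fits Extra Trees and Random Forest on the same holdout data used in the earlier snippets and reports their RMSE and MAE; default-style hyper-parameters are used here, not the tuned values from the study.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor, RandomForestRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error

# Compare the two strongest ensembles on identical train/test partitions (cf. Fig. 7).
for Model in (ExtraTreesRegressor, RandomForestRegressor):
    model = Model(n_estimators=300, random_state=42).fit(X_train, y_train)
    pred = model.predict(X_test)
    rmse = np.sqrt(mean_squared_error(y_test, pred))
    mae = mean_absolute_error(y_test, pred)
    print(f"{Model.__name__:>22s}  RMSE={rmse:.3f}  MAE={mae:.3f}")
```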

Fig. 7
figure 7

Comparison of test data RMSE and MAE prediction errors for all models

Model agnostic results

Permutation importance based on XGBoost was computed to estimate the variables' significance and to analyze the influence of multicollinearity among the input parameters. The results in Fig. 8 show that collar rotational speed, annular temperature, and gamma-ray are highly relevant for predicting stick and slip vibrations. Permuting an input feature reduces the model's accuracy by at most 0.125, suggesting that some of the inputs are relevant, whereas Isonic shock, transverse shock risk, ARC shock level, rate of penetration, and pump pressure do not improve the model's predictions. This result is compared with the XGBoost feature importance presented in Fig. 9. Despite the discrepancy between the two results, a model-agnostic approach can be used to approach causality. This is necessary because correlation does not mean causation; hence, feature selection techniques based solely on correlation may not fully improve model performance.
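The sketch below contrasts XGBoost's built-in (gain-based) importances with permutation importances computed on the same model, mirroring the comparison between Figs. 8 and 9; it reuses the hypothetical train/test data from the earlier snippets.

```python
from xgboost import XGBRegressor
from sklearn.inspection import permutation_importance

# Built-in importances reflect how strongly features are used inside the trees,
# while permutation importances measure the score drop when a feature is shuffled.
xgb = XGBRegressor(n_estimators=300, random_state=42).fit(X_train, y_train)
builtin = dict(zip(X_train.columns, xgb.feature_importances_))

perm = permutation_importance(xgb, X_test, y_test, n_repeats=10, random_state=42)
permuted = dict(zip(X_test.columns, perm.importances_mean))

for name in sorted(builtin, key=builtin.get, reverse=True):
    print(f"{name:>15s}  built-in={builtin[name]:.3f}  permutation={permuted[name]:.3f}")
```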

Fig. 8
figure 8

Permutation importance plot

Fig. 9
figure 9

XGBoost feature importance with respect to the target feature

Analysis of features using agnostic model metrics

An illustration of the PFI results for the input features on the entire dataset is shown in Fig. 10. The final model prediction used the same train/test split as the previous analysis to maintain consistency. In the initial testing, R2 for the Random Forest (RF) model reached 0.94. The importance score is calculated such that higher values indicate higher predictive power, and a significant part of the 0.94 accuracy score is attributed to the most relevant features. According to the results, some input variables carry very little weight; only a limited number of variables have substantial predictive power, indicating that some features are more relevant than others. This is consistent with the high test accuracy obtained.

Fig. 10
figure 10

Permutation feature importance of the input variables

The initial permutation importance calculated on the training data shows the model's dependency on each feature during training. However, it is important to note that this analysis was done for both the training and testing datasets, to account for features that may contribute to the generalization power of the model. If a feature is considered important for the training set but not for the test set, the model is likely to overfit. A sensitivity analysis was performed to select the relevant features for further prediction: the RF model was used to predict stick and slip vibrations using the top 24 down to the top 18 ranked features, and the results are shown in Table 7. They indicate that the optimal number of features for predicting stick and slip is 22.
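A minimal version of this sensitivity analysis is sketched below, refitting a Random Forest on progressively smaller feature subsets ranked by the permutation importances computed earlier; it continues the hypothetical snippets above rather than reproducing the study's exact runs.

```python
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score

# Rank features by the permutation importances from the earlier "result" object.
ranked = [name for name, _ in sorted(zip(X_test.columns, result.importances_mean),
                                     key=lambda p: p[1], reverse=True)]

# Refit on the top-k subsets (e.g. top 24 down to top 18) and track test R2 (cf. Table 7).
for k in range(len(ranked), max(len(ranked) - 7, 0), -1):
    cols = ranked[:k]
    rf = RandomForestRegressor(n_estimators=300, random_state=42).fit(X_train[cols], y_train)
    print(f"top {k:2d} features -> R2 = {r2_score(y_test, rf.predict(X_test[cols])):.4f}")
```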

Table 7 Selection of input features based on accuracy performance

Analysis of features using shapley values model agnostic metrics

The Extra Tree model was used to predict drill string vibrations using all the features, with the same training and testing datasets as in the PFI analysis. Since machine learning models should be interpretable so that one can understand how a particular prediction was made, model-agnostic metrics are helpful. Shapley values originate from game theory, so it is necessary to clarify their application to supervised machine learning interpretability. A game represents the prediction task for a specific instance in the dataset. The gain is calculated by comparing the actual forecast for that instance to the average estimate over all instances, and the instance's feature values collaborate to achieve this gain in the stick and slip forecast. Figure 11 visualizes the Shapley values as absolute values in a feature importance plot; in Shapley feature importance, the most important features are those with the largest absolute values. The mean of the absolute Shapley values for each feature across the data is computed to evaluate global importance. Collar rotational speed, annular temperature, and gamma-ray ranked among the most important variables, with gamma-ray playing the most significant role and affecting the target by an average of 34.06 r/min on the x-axis. The plot also indicated that formation type, bottom turbine revolutions, and ARC shock level contribute little to the target; the inclusion of ARC shock level in this group disagrees with the PFI results.

Fig. 11
figure 11

SHAP feature importance measured as the mean absolute Shapley value

Based on this result, further investigation was necessary to determine which features were irrelevant to the target. Shapley-value importance is an alternative to permutation feature importance, and the two measures differ significantly: permutation importance is determined by the decline in model performance, whereas Shapley importance is determined by the magnitude of the feature attributions. The feature importance plot is informative, but it provides few details beyond the ranking itself. The bee swarm plot, a more informative visualization, is therefore used for further analysis.

Figures 12 and 13 visualize the Shapley values in bee swarm summary plots for the train and test data, respectively. The y-axis lists the features, and the x-axis shows the Shapley values. Feature importance and the associated effects are combined into global explanations, with a matrix of Shapley values obtained for every instance. The summary plot gives a first indication of positive or negative relationships between a feature's value and its impact on the target. All the data points overlap and are spread along the y-axis, showing how the Shapley values are distributed across features. Features are arranged in descending order of significance. In the plots, low feature values are shown in blue and high feature values in red. Analyzing the influence of gamma-ray and annular temperature, low values predict high stick and slip values, whereas high values predict low stick and slip values. The distribution of the gamma-ray data points also supports a global explanation of the entire model's behavior. This analysis confirms that correlation alone cannot be taken as causality; hence, a model-agnostic approach is needed to fully understand the influence of the other variables on stick and slip prediction. From the bee swarm plot, it can be observed that about 22 features globally explain how the predictions were made.

Fig. 12
figure 12

A bee swarm summary plot of the effect of SHAP values on a model target using the train data

Fig. 13
figure 13

A bee swarm summary plot of the effect of SHAP values on the model target using the test data

The results of the bee swarm plot confirmed that features that have high importance in both the train and test results exhibit their significance in the global explanation of the target. From the demonstrated results, the following analysis was derived:

1. Feature importance: the features are ranked in descending order, with torque and bit depth ranked first in both the train and test data plots. The formation type feature has an importance close to zero because it has no causal effect on predicting stick and slip.

2. Impact: torque and bit depth are negatively correlated with the target and generally have a strong effect on the prediction, as shown by the horizontal location of the data points.

Shapley values rest on a solid theoretical basis in game theory and distribute feature contributions to the target in a principled way. The method also contributes to a unified approach to model interpretation. Fast computation allows many Shapley values to be calculated for global model interpretation, and because the global interpretation is built from the individual Shapley values, it remains consistent with the local explanations.

Evaluation of top features

Further analysis was performed using the top five performing models with the top 22 features selected by PFI and by Shapley values, to see whether they would outperform the initial model predictions. PFI led to the removal of formation type and Isonic shock, while the Shapley values led to the removal of formation type and bottom turbine revolutions. Table 8 summarizes the results. The top-performing approaches all improved when the new feature subsets based on PFI and Shapley values were used, with the largest improvements in model accuracy observed for the features selected by the Shapley values, indicating that they identified the most relevant inputs.

Table 8 Test-data accuracy computed for the top five algorithms

Model optimization

This study employed the Bayesian Optimization (BO) technique to tune specific model parameters and improve model performance. In developing our Bayesian Optimized Extra Trees (ET) model, a range of hyper-parameters was carefully selected and tuned to optimize the algorithm's performance. A table detailing the chosen hyper-parameters, their descriptions, and their respective values is given in Table 9. This table provides a clear understanding of the algorithm's configuration and allows easy replication of our methodology; by referring to it, readers can see how the selected hyper-parameters contribute to the prediction of stick and slip vibrations in drilling operations.

Table 9 Hyper-parameters of Bayesian Optimized Extra Trees (ET) Algorithm

The results in Fig. 14 show the performance of the Extra Tree model using the top 22 features identified by the Shapley values and of the BO_ET model. The results were compared with those of Srivastava et al. (2022), who used the same data as this study but employed different techniques and models for predicting stick and slip vibrations. Notably, although machine learning model performance is data-dependent, the underlying data are largely identical. Consequently, this comparison focuses on the input features and how they explain the target rather than on diminishing the other researchers' results.

Fig. 14
figure 14

Comparing the estimation errors of top-performing models

The error analysis shows that the features selected based on the Shapley values greatly reduced the model's error. The initial Extra Tree model recorded an MAE of 5.50 r/min, while the MAE from the Shapley-selected features was 1.E-06 r/min, a reduction of more than 99% in MAE. Similarly, the RMSE for the Extra Tree model using all 24 features was 9.9672 r/min, whereas the RMSE from the Shapley-selected features was 4.4616 r/min, a reduction of about 55% in this error metric. The Extra Tree model was further improved by hyper-parameter tuning, with BO_ET recording MAE and RMSE values of 0.002156 r/min and 0.024495 r/min, respectively. The superiority of the Extra Tree model can largely be attributed to the bias-variance concept used to build it, which makes it resilient to outliers. Based on all evaluation criteria, it can be concluded that the Shapley-selected parameters are highly relevant, offer a global generalization of the target, and improve model efficiency.

Sensitivity analysis

Figure 15 shows the kernel density estimates for the predicted and actual stick and slip results, with the blue line indicating predicted values and the purple line indicating actual values. On the testing data, the BO_ET model shows a much higher degree of agreement with the real data. In particular, this investigation indicates that BO_ET can capture a wider range of values than the other algorithms used in this study, making it suitable for evaluating stick and slip vibrations in other wells from the relevant drilling parameters. The sensitivity analysis thus corroborates the assessment metrics outlined above.
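The density comparison in Fig. 15 can be reproduced in outline as below; this sketch uses SciPy's Gaussian KDE on the hypothetical test data and tuned model from the earlier snippets.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import gaussian_kde

# Kernel density estimates of actual vs. predicted stick and slip on the test set
pred = search.best_estimator_.predict(X_test)
grid = np.linspace(min(y_test.min(), pred.min()), max(y_test.max(), pred.max()), 200)

plt.plot(grid, gaussian_kde(y_test)(grid), label="actual")
plt.plot(grid, gaussian_kde(pred)(grid), label="predicted (tuned model)")
plt.xlabel("Stick and slip (r/min)")
plt.ylabel("Density")
plt.legend()
plt.show()
```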

Fig. 15
figure 15

Kernel density estimation for BO_ET model stick and slip prediction demonstrating the closeness of projected values to real values

In this study, explainable AI was demonstrated using model-agnostic metrics. Since correlation does not imply causation, simple statistical techniques may not reveal the causal effect of each input on the target. The findings show that gradually increasing the number of input variables beyond 22 degrades the model's accuracy. Based on the Shapley values, formation type and bottom turbine revolutions are among the variables with no direct influence on stick and slip vibrations; hence, they should not be used to train the model. By using relevant causal features, model-agnostic approaches have provided trustworthy and useful solutions for drill string vibration prediction. The ability to adequately predict stick and slip before drilling provides vital information for choosing the optimal RPM and WOB to suppress the phenomenon. As a result, forecasting stick and slip can reduce costly downtime, including downhole motor failures, twist-offs, and damage to drill bits (Fig. 16), and it can give valuable insight into the type of interventions required.

Fig. 16
figure 16

BO_ET versus actual stick and slip per formation type (R2 = 0.9999)

Assessing results validity in stick–slip prediction models for oil fields

Regarding the validity of the results, we note that the test set was kept separate from both the training and validation sets throughout the entire procedure. This was essential for determining the models' performance on unseen data and comparing them to one another, as discussed in Sect. 4.2. However, it is important to note that test accuracy varies significantly between the individual wells of the test set, despite our efforts to test the model as accurately as possible; the findings might have differed had the data been split differently. Additionally, the data itself had a significant impact on this paper: the datasets for the target oil field were carefully selected to support accurate generalization, as discussed in section "Data analysis and visualization".

Because the model was not trained on the test set, its ability to predict torsional vibrations there relies entirely on the stick and slip behavior actually present in that data. As stated in the paper, we used data from the Norwegian Petroleum Directorate (NPD), which can be regarded as a valid source of information. The data were reported to the NPD from various sources and over various periods, which may result in differing quality of the reported data; since the equipment used for data collection has evolved and improved over the years, the time aspect of data collection is relevant, and differences in data collection processes between companies may also have affected the results. The artificial intelligence (AI) learning algorithms presented in this paper can only predict stick and slip from the data given to them, whereas humans can identify stick and slip conditions using their experience and possibly other measurements. A machine learning algorithm fed erroneous data will detect false patterns.

Recognizing and overcoming limitations

Although our research demonstrates promising results regarding predicting stick–slip vibrations in drilling operations using machine learning, it is important to acknowledge some potential limitations of this study. By addressing these limitations, we can improve the validity of our findings and identify areas for future research.

1. Limited data: The model's performance depends on the quality and quantity of downhole sensor data and drill string model information. The availability of comprehensive and accurate data is essential for the model to make accurate predictions. However, if the data is insufficient or of low quality, it may hinder the model's performance.

2. Generalizability: Our model may be limited to the specific drilling operations or conditions it was trained on, reducing its applicability across different geological formations or drilling operations. Further testing and adaptation may be necessary to ensure the model's effectiveness in diverse scenarios.

3. Model complexity: Our study's Bayesian Optimized Extra Trees approach may be computationally expensive. This could limit the model's applicability in real-time drilling operations or where computational resources are constrained. Future research could explore optimizing the computational efficiency of the model.

4. Comparison with other models: While our model outperformed several reported models, a more extensive comparison with a wider range of existing models would provide greater insight into its true value and relative performance. This could also highlight potential areas of improvement for our model.

5. Implementation challenges: Integrating our model into automatic driller systems could encounter hardware limitations, integration difficulties, and resistance from industry practitioners reluctant to adopt new technologies. Future work should address these challenges and facilitate the model's adoption in practical settings.

In conclusion, recognizing and addressing these limitations can further refine our model and enhance its applicability in predicting stick–slip vibrations in drilling operations, ultimately contributing to more efficient and cost-effective drilling processes.

Conclusions

1. This research introduces a novel method for predicting stick and slip vibrations in drilling operations using machine learning techniques, specifically model-agnostic regression models combined with Bayesian Optimized Extra Trees. The approach demonstrates significant innovation in addressing downhole vibrations and holds the potential to enhance drilling efficiency and safety in the oil and gas industry.

2. Various data analysis and visualization techniques were employed to quantify the relationships between input and output variables, revealing both linear and nonlinear correlations. The selection of input features based on metrics such as MAE, RMSE, AIC, and R2 aided in identifying the most relevant variables for accurate stick and slip predictions.

3. The Extra Tree model outperformed the other models, achieving superior accuracy in stick and slip vibration prediction as indicated by lower AIC values and higher R2 scores. The model's performance was validated using out-of-sample data, emphasizing its reliability and ability to generalize to unseen data.

4. Integration of downhole sensor data and a drill string model enabled the model to accurately identify stick and slip occurrences in real time, providing opportunities to optimize drilling parameters such as revolutions per minute (RPM) and weight on bit (WOB). By mitigating stick and slip severity, the proposed method improves the rate of penetration (ROP) and contributes to enhanced productivity and safety in the oil and gas industry.

5. The research highlights the significance of feature selection and the importance of relevant input features. Collar rotational speed, annular temperature, and gamma-ray show strong associations with the stick and slip prediction, whereas variables such as Isonic shock, transverse shock risk, ARC shock level, rate of penetration, and pump pressure have limited impact on the model's prediction.

6. The methodology employed in this research, including permutation feature importance and Shapley values, provides insights into the relevance and impact of input variables on stick and slip predictions. The use of Bayesian Optimization for hyper-parameter tuning enhances the performance of the Extra Tree model, resulting in even lower MAE and RMSE values.

7. The proposed method demonstrates its superiority over existing models reported in the literature through rigorous testing on data from a Norwegian oil field. The potential integration of this method into automatic driller systems on offshore drilling rigs offers real-time identification and management of stick and slip vibrations, contributing to improved drilling efficiency and safety in the oil and gas industry.

8. The adoption of machine learning techniques, as demonstrated in this research, presents the potential for solving complex problems in the oil and gas sector. The reduction in the need for human operators to identify and manage stick and slip vibrations saves time and resources while minimizing drill string failures and associated costs.

9. By embracing data-driven technologies and integrating this approach into existing drilling systems, the oil and gas industry can transform its practices, leading to improved performance, cost savings, and positive environmental impacts. The proposed method's ability to optimize the penetration rate contributes to more sustainable drilling operations by reducing energy requirements.

10. This research provides a comprehensive analysis of the proposed method, its advantages over existing models, and the potential practical implications for the oil and gas industry. The findings emphasize the innovative nature of the approach and offer valuable insights for decision-making and future advancements in machine learning in the oil and gas sector.