Machine learning-based rock characterisation models for rotary-percussive drilling

Vibro-impact drilling has shown huge potential of delivering better rate of penetration, improved tools lifespan and better borehole stability. However, being resonantly instigated, the technique requires a continuous and quantitative characterisation of drill-bit encountered rock materials in order to maintain optimal drilling performance. The present paper introduces a non-conventional method for downhole rock characterisation using measurable impact dynamics and machine learning algorithms. An impacting system that mimics bit-rock impact actions is employed in this present study, and various multistable responses of the system have been simulated and investigated. Features from measurable drill-bit acceleration signals were integrated with operated system parameters and machine learning methods to develop intelligent models capable of quantitatively characterising downhole rock strength. Multilayer perceptron, support vector regression and Gaussian process regression networks have been explored. Based on the performance analysis, the multilayer perceptron networks showed the highest potential for the real-time quantitative rock characterisation using considered acceleration features.


Introduction
Drilling and exploration accounts for more than 50% of the total expenditures of the oil and gas industry [1]. The advantages presented by rotary-percussive drilling in regards to reducing the expenditures have made it of high research interest, in order to further improve and optimise it. The benefits of rotary-percussive drilling in terms of increased rate of penetration (ROP) and increased tools lifespan are well documented in literatures [2,3]. The percussive impacts exhibited by the technique exert greater load on the rock material, hence, better fracturing compared to the conventional rotary drilling. The intermittency of the percussions reduces the bit-rock contact time to about 2% of the entire drilling time [4] and also produces a reduced weight on bit (WOB) compared to conventional rotary drilling. The reduced contact time and WOB help to reduce the tearing and wearing of the drill-bit, thus prolonging its lifespan. The rotary action of the technique on the other hand supports its usage for deep well drilling and also increases its material removal rate compared to the ordinary percussive drilling which often re-compacts previously fractured materials that are not removed on time.
The development of rotary-percussive drilling has progressed from those that use eccentric weights [5], pressurised fluid from a multivalve or servovalve [6], friction-induced vibrations [7] and piezo-electric vibrations [8] to those that use resonance [9,10] to impart percussions on the drilling system. A recent development is the vibro-impact drilling (VID), see Fig. 1a, also referred to as the resonance enhanced drilling [11]. The system uses the resonance between the drill-bit assemblage and the drilled rock formation to generate high frequency, low amplitude periodic impacts at the drillbit. The impacts which are axial to the drilled rock formation alongside existing drill-string rotation form the rotary-percussion system. A major concern at the initial stage of the VID development was the issue of how to instigate vibrations on the drill-bit at depths downhole without affecting the rest of the drill string. As a solution, an electrically controlled piezo-electric or magneto-restrictive displacement transducer was used to generate high-frequency axial vibrations of about 1 kHz and 1 mm maximum amplitude [9,12]. A vibrotransmission connects the transducer to the bit, while a vibro-isolator unit connects the whole drilling module to the drill-string. The vibro-transmission effectively transmits the vibrations from the transducer to the drillbit while the vibro-isolator helps to isolate the drilling module from the drill-string, allowing the vibration energy to be concentrated on the bit with minimal effect on the drill-string.
Derivable benefits of VID have been highlighted to include increased ROP and borehole stability, reduced tool wearing and non-productive time, reduced hazards and increased personnel safety due to controlled fracturing, and reduced environmental footprint and emissions due to the reduced time on site [13]. However, an issue of interest is the inhomogeneity of downhole rock layers which makes it difficult to maintain resonance during the entire drilling procedure. As the drilling progresses, the rock material as well as the required resonance conditions continue to vary. Nonetheless, while investigating strategies for maintaining resonance in the VID system as it cuts through inhomogeneous downhole rock materials, Wiercigroch [12] proposed a rock strength estimation strategy. Estimating the rock strength is useful for defining the ranges of parameters on which the VID system can be operated to ensure resonance with its surrounding rock material. Defining the ranges of parameters is also helpful in ensuring that generated radial cracks propagate just ahead of the bit and that they do not extend more than necessary such as can compromise the stability of the drilled hole. Different methods are available for rock strength characterisation; however, most of them do not meet the requirements needed for the VID system. In this present study, a non-conventional method of rock characterisation that uses drill-bit vibration dynamics and machine learning will be investigated to quantitatively estimate drilled rock strength. Regression networks including multilayer perceptron (MLP), support vector regression (SVR) and Gaussian process regression (GPR) will be adopted for the machine learning.
Before now, no real-time adaptable method has been developed to quantitatively characterise the continuously changing downhole rock layers for the VID system. However, our previous studies [14][15][16] have attempted online characterisation of its resulting impact responses with experimental validation. The studies were carried out to enable the selection of the best performing impact motion in terms of ROP when multistability (co-existing impact motions) is encountered while using the VID system. To further optimise the VID system, the present study makes the following contributions: -A non-conventional method of rock characterisation that can be adapted for online usage has been demonstrated using drill-bit vibrations and machine learning. -The proposed method is less computational, not restricted to a particular impact category, and it is autonomous, thus requiring minimal human interaction. -Unlike most downhole rock characterisation methods which are often qualitative, the present method is capable of quantitative rock evaluation. -Extraction of rock strength-dependent features from drill-bit vibration data with little or no manual interaction has also been demonstrated. -Various regression networks have also been compared to arrive at the most preferable network for rock strength prediction using the proposed method.
The rest of the paper is arranged as follows. Section 2 discusses rock characterisation and their usage for drilling optimisation. Section 3 introduces the  1 a Rock fragmentation mechanism of a vibro-impact drilling system, and b physical model of the impact oscillator representing the bit-rock impact actions of the VID system impact oscillator as a representative model of downhole bit-rock interaction. Estimation and comparison of stiffness-related features from measured acceleration signals are investigated in Sect. 4. The utilised regression networks and evaluation metrics are discussed in Sect. 5, and developed stiffness prediction networks and their performances are reported in Sect. 6. Finally, conclusion and further works are stated in Sect. 7.

Formation characterisation for drilling optimisation
Drilling optimisation for the purpose of increased ROP, minimised drilling cost and improved wellbore quality begins with an accurate estimation of the lithological properties of the drilled rock formations. Studies such as [17] have shown close relationship between rock properties and drilling performance. Hareland et al. [18] proved a 30% reduction in drilling cost using optimal parameters defined from the lithological data of adjacent wells. Hankins et al. [19] achieved a reduction in drilling cost and time using drilling parameters that were based on the rock strength interpreted from previously drilled reference wells. Rampersad et al. [20], also demonstrated how geological drilling logs obtained while drilling were used to optimise the drilling of upcoming wells in the same field. These studies have established that once the drilled rock strength is known, the drilling parameters can be tuned to optimise the drilling process.
Available means of rock strength characterisation include cored rock samples testing [21,22], testing of drilled rock cuttings [23], petrophysical evaluation of wireline logs [24] and rock strength evaluation from modelled ROP analyses [25]. Amongst these methods, the most commonly used is the petrophysical evaluation of wireline logs which are series of continuous measurements versus depth or time of formation properties using electrically powered sondes [26]. The process of taking the measurements in real-time as the drilling progresses is referred to as logging while drilling (LWD). Despite being the standard method for rock characterisation during oil and gas drilling, LWD is of limited usage when it comes to rock characterisation for the VID system. The limitations arises from the facts that: (i) Sensors are located at distances away from the drill-bit, hence, measurements do not represent the immediate rock material surrounding the drillbit [27,28]; (ii) LWD sensors being electrical are likely to fail at great depths from the effects of high temperature and pressure [29]; (iii) Transmitted data are often high dimensional, nonlinear and noisy, thus making them difficult to be analysed [30]; and (iv) The multiple sensors housings attached to the bottom hole assembly occasionally get it stuck [27,31].
In this present study, a non-conventional method adaptable for characterising downhole rock layers in real-time using readily measurable drill-bit vibration dynamics and machine learning is proposed. The bitrock impact dynamics are envisaged to carry informa-tion that can be extracted and quantitatively mapped into rock strength using regression networks. Nonconventional methods of rock characterisation have in recent time gained ground in the oil and gas industry due to improvements in downhole-to-ground surface data transmission [32,33]. Oloruntobi and Butt [34], based on the concept that the total energy required to break and remove a unit volume of rock is a function of lithology, developed a method for characterising subsurface lithologies in real-time using drilling specific energy. Their defined lithologies were in excellent agreement with those from traditional lithology identifiers like gamma ray and sonic velocity ratio. Akbari et al. [35], in their study, established a relationship between mechanical specific energy, uniaxial compressive strength, differential pressure, and confining pressure. Some studies [36,37] have also investigated the use of drilling load parameters, such as ROP, WOB and rotary speed, for downhole lithology identification.
As regarding the use of drilling vibration for downhole rock characterisation, some studies although not many have been carried out. Myers et al. [38] developed a proxy for identifying lithologic boundaries based on the amplitudes of measured drill string acceleration signals. Esmaeili et al. [39] experimentally evaluated the classification of mechanical formations based on the variation of higher order frequency moments calculated for their drill-string vibration measurements. Most of these previous methods have considered qualitative evaluation rather than the quantitative evaluation required for the VID system. In an attempt to carry out quantitative evaluation, Liao et al. investigated the use of bifurcation (a change in dynamical stability) analysis [40] and estimated impact duration [41] as nonconventional methods for downhole lithological characterisation. Their results showed that bifurcation analysis was only applicable to period-doubling bifurcation scenarios while impact duration was only applicable to stable period-one one impact (P-1-1) motions. These limitations coupled with their associated complex analysis made these methods unsuitable for on-line rock characterisation. Also, since large volume of data are produced downhole and are required to be transferred to the surface in real-time, a possible draw back to the use of vibration data for lithological characterisation would have been the limitation of data transmission. However, the development of reliable and high-speed telemetry techniques has made large data transmission possible [42,43].
In this present study, drill-bit vibration dynamics in the form of acceleration measurements have been used alongside notable regression networks to quantitatively characterise downhole rock lithologies. Acceleration measurements were chosen amidst other drill-bit vibration dynamics, such as displacement and velocity, because of their high sensitivity to impact motions. The use of artificial networks was based on their ability to learn complex nonlinear relationships [44][45][46], as may exist between the dynamics of an impacting drillbit system and the stiffness of its impacted rock formation. The use of artificial networks for lithological evaluation, though qualitatively, has been carried out in the past. Al-Khdheeawi et al. [37] and Li [36] developed neural network models for identifying drilled rock lithology using drilling data, such as ROP, WOB and rotary speed. Chen et al. [47] developed a convolutional neural network algorithm for lithological classification of downhole rock layers utilising drilling string vibration data. Sun et al. [48] explored a datadriven approach for lithology identification based on parameter-optimised ensemble learning using well log data, while Esmaeili et al. [49] also investigated the use linear and nonlinear models for formation lithology prediction using drill-string vibration measurements from a laboratory scale rig. Unlike the traditional LWD, where rocks are qualitatively evaluated and at a lagging distance from the actual drilling depth, the proposed method in this study quantitatively evaluates the rock at the drilling depth. Also, compared to LWD, the new method is less cost intensive as few sensors are required to measure the drill-bit vibrations. The use of few sensors on the bottom hole assembly also helps to prevent stuck-pipe issues.
To harness these benefits from the proposed method, a comprehensive understanding of bit-rock impact dynamics is essential; hence, the next section describes the dynamic impact oscillator as a useful system for investigating the proposed method.

Dynamic impact oscillator as a bit-rock impact system
The impact oscillator driven by external excitation and undergoing intermittent contacts with motion limiting constraints is the simplest mechanical system representing impacting engineering systems. The rich nonlinear dynamics resulting from sudden change of stiff- ness at the point of contact of its impacting bodies makes it useful for mimicking the intermittent impact actions of the drill-bit on downhole rock. Studies such as [50][51][52][53][54][55][56][57][58], have investigated the nonlinear dynamics of similar systems based on the stability and bifurcation phenomena that are associated with its long-term behaviour during operation. Liu and Páez in their study [59] adopted the impact oscillator to investigate the control of co-existing attractors in an impacting system. Páez et al. [60] investigated the dynamic responses and the complex bifurcation scenarios associated with similar impact oscillator with drift. Liao et al. [40,41] also used it to carry out stiffness estimation for the impacted constraint, while Afebu et al. [15] employed it to investigate a machine learning-based categorisation of downhole impact responses. Figure 1b shows the physical model of the impact oscillator representing the bit-rock interaction. As a piecewise-smooth system, the impact oscillator consists of a mass m connected to a rigid frame via a linear spring with the stiffness k 1 and a damping coefficient c representing the drill-bit. A secondary linear spring with the stiffness k 2 acting as the impacted rock media is attached to the opposite side of the frame and kept at a distance of g away from the equilibrium position of the mass m. The variation in the strength of the encountered downhole rock is modelled by varying the stiffness of the secondary linear spring. The system is operated by subjecting the rigid frame to a harmonic excitation of amplitude A and frequency Ω. When the displacement of the mass y exceeds g (i.e. y > g), impact occurs between the mass and the secondary spring. After the impact, the mass is restored back with a force that is dependent on the resultant stiffness of the two springs, i.e. k 1 + k 2 . The general equation of motion of the system in a compact form according to [55] can be written as where y ′′ and y ′ are the acceleration and velocity of the mass, respectively, and H (·) stands for the Heaviside step function. By introducing the following variables and parameters, where y 0 > 0 is an arbitrary reference distance, Eq. (1) can be rewritten in a nondimensional form as where Γ is the nondimensional forcing amplitude. For simulation purposes, the natural frequency ω n was taken as 1, and the fourth-order Runge-Kutta method was coded in MATLAB to iteratively solve Eq. (2) at a fixed time step. Categories of the resulting impact response were designated as P-n 1 -n 2 with n 1 representing the period(s) required for the system to make a complete cycle back to its original position and n 2 the number of impact(s) made within the n 1 period(s). Figure 2 shows typical system variables The impacting case shows 10 periods of the resulting P-1-1 signal, and it reveals that acceleration time histories are more indicative of impact motions compared to the time histories of displacement and velocity. For this reason, the dynamic acceleration signals were simulated by pair-wisely sweeping a range of excitation frequency ω and amplitude a over a range of β values representing the stiffnesses of downhole rock formations. Figure 3 shows the plots of the pair-wise parame-ter sweeps and their resulting impact motion categories. In all, 2999 samples of dynamic acceleration signals were simulated. Impact duration detection from the second derivation of the acceleration signal, where S pk and E pk indicate the start and end of each impact motion, respectively compared to displacement and velocity. At impact, the drill-bit acceleration starts to decrease rapidly due to the restraining elastic force from the impacted rock.
Depending on the stiffness of the rock, the deceleration of the bit reaches its maximum, and it returns back to its initial position. This effect results in the occurrence of high amplitude jumps along the acceleration signal of an impacting drill-bit. The acceleration jumps thus appear as peaks whose durations are commensurate to the stiffness of the rock. Based on this phenomenon, impact duration-related features will be extracted from acceleration signals and used alongside operated system parameters to estimate impacted constraint stiffness. To implement the aforementioned as an online non-conventional LWD technique, attention must be paid to the method used in extracting the impact durations as well as the method used in mapping them to stiffness values. The method must require little or no human interaction, less computative and easily adaptable for real-time application. In an attempt to extract impact durations from acceleration signals for a similar purpose, Liao et al. [41] made use of nonlinear time series and tangent vector analyses. Aside being  computationally intensive and only applicable to P-1-1 acceleration signals, the procedures require time-totime manual interaction, thus making them unfit for an on-line application.
In the case of this study, a self-implementable technique is adopted for extracting impact duration records from the acceleration signals. The difference between the start and end of each impact motion represented as peaks on the acceleration signals is deduced from the second derivative of the signal as illustrated in Fig. 4 via a find-peak analysis. With the simulation signals being in a nondimensional time domain, the calculated difference is converted to a nondimensional duration using the nondimensional time-step that is specific to each signal. For each signal, the impact durations are calculated as where τ i is the impact duration of the i th peak in the signal and τ s is the nondimensional time interval. Figure 5 shows the variation of estimated τ i values for signals of different β values. The variation is compared between signals of same impact category and for signals of different impact categories. It can be observed that for some impact categories (e.g. P-2-3 and P-5-4), the estimated τ i values tend to overlap and interwove into each other, thereby not showing a clear distinction between different β values. This effect may later impair the training and performance of the networks that will be developed. To neutralise the effect, the average value of the resulting τ i values were calculated for each signal. The averages of the τ i values represented in Fig. 5 are shown in Fig. 6, where the signals can be seen to be well separated with respect to their β values. The average impact duration τ i is seen to decrease with increase in β values. The same applies even when the impact motions are of different categories (see Fig. 7); however, Fig. 8 shows that there exist a very small difference (mean absolute difference of 0.0103) in the resulting τ i values for signals of same β but different Γ . Figure 9 shows the variation of estimated τ i values with their corresponding β, a, and ω values alongside their impact motion categories. It shows that no linear relation exists between the operating system parameters and the estimated τ i . This observation alongside the difference in estimated τ i for signals of same β values but different Γ values formed the basis for using neural networks and for including forcing parameters in the network input features. Figure 10 compares β values with their differences (△β) and the difference between their correspondingly estimated average impact duration (△τ i ). It can be observed that the ranges of τ i values estimated for the smaller β values are greater than those estimated for the larger β values.
Aside using the average impact durations, certain statistical measures, as shown in Table 1, were also computed for impact durations extracted along each acceleration signal and also for normalised raw acceleration data. These statistical measures are to be used alongside the system's forcing parameters as input features for the regression networks. The Pearson correlation coefficient of estimated impact durations statistics and the raw data statistics with the actual β values is shown in Fig. 11. In both cases, the plots show some statistical measures to either have high negative or positive correlations which are consistent for both the training and testing data. However, this consistency needs to be further confirmed especially with experimental data. Every regression model thus create continuous-valued multivariate function which tries to estimate the dependent variables y from independent variables x while ensuring minimal mean-squared error. For this study, supervised regression networks, including MLP, SVR and GPR, were utilised. The developed networks were evaluated using coefficient of determination (R 2 ), normalised mean absolute error (E nmae ) as and normalised mean square error (E nmse ) which are defined as: where y n is the actual value of n th target variable,ŷ n is the predicted value of n th target variable, y = 1 (y n − y) 2 is the mean variance of actual target variables. It should be noted that R 2 values vary between 0 and 1, and values closer to 1 are considered to be better. For E nmae , the closer the value is to zero, the better the prediction. A perfect model tends to have E nmse equal to zero; however, experiment and real-life are rarely perfect. For an acceptable and a reliable model, Kumar et al. [61] suggested E nmse ≤ 0.5.

Multilayer perceptron networks
MLP networks have been used as universal approximators for nonlinear problems modelling including classification [62] and regression [63]. They are referred to as feedforward networks due to their forward flow of information and are typically made up of an input layer, one or more hidden layers, and an output layer. The number of hidden layers and their neurons can be varied to make the network deeper; however, the number of neurons in the input and output layers are initially set to zeros but adjusted to the number of elements in the input and output data, respectively, during training. Assuming an input data x i , where i = 1, 2, 3, ..., N , the network output y out can be expressed as [64]     where N is the number of input data, M is the number of hidden neurons, x i is the ith input data, W ij is the weighting parameter between the i th input data and j th hidden neuron and W j is the weight parameter between the j th hidden neuron and the output neuron. For regression problems, the activation function is given as a linear activation function as while the hidden function is a hyperbolic tangent function written as f hidden (r ) = tanh(r ) = 1 − e 2r 1 + e 2r .
To maximise performance, the network weights and biases are iteratively adjusted based on the mean square error between its predicted output (ŷ) and the actual target (y) during training.
Inadequate connections due to small number of hidden layers and neurons can prevent the network from sufficiently adjusting its parameters, while excess connections may cause it to overfit the training data [65]. In this study, the input data varied between 2 and 26 elements and were mapped to a single output, thus justifying the architectures shown in Fig. 12. Best performances were achieved using a single hidden layer with 3 or 5 neurons (depending on the number of inputs) and a Levenberg-Marquardt optimisation function [66] to update the weights and biases.

Support vector regression
SVRs try to fix a function f (x) in a high dimensional feature space such that its outputs have at most, ε deviation from the actual targets of the training data, and also at the same time as flat as possible. In other words, attempt is not made at reducing the training errors in SVR as long as they are less than ε; however, values larger than ε will not be accepted. SVR thus permits the flexibility of defining an acceptable model error while finding an appropriate line (or hyperplane in higher dimensions) to fit the data.
Consider a training data set D TR = (x i , y i ) , where x i ∈ R m represent the input vectors with m dimension and y i ∈ R represent the target vectors with i = 1, 2, 3, ..., n o , and n o is the number of observed samples. In an attempt to find a model that properly describe this data, SVR initially proposes a linear function as Since real-world problems usually require a nonlinear formulation, a nonlinear mapping function is used to transfer the input values to a higher dimension feature space as where f (x) represents the general nonlinear regression function, and ·, · is the inner product of two vectors. (x) is a nonlinear mapping with the given space x, b and w are scalar and vector weights, respectively.
The flatness of the latter function and the SVR training requirement necessitate minimising w and b via nominalisation of regularised risk, and this results into a convex optimisation given as where C is a penalty factor for the error term and determines the trade-off between flatness and larger deviation of tolerance. The ξ i and ξ * i represent loose variables used to define the allowable soft margin of tolerance for the model, while the constant ε referred to as the tube size represents the maximum allowable error, and it defines the performance of the optimisation process [67,68].
The resulting dual convex optimisation problem shown in Eq. (13) can be transformed into a dual Lagrangian problem as which can be solved into a regression function as by replacing the dot product (x i ), (x) with a nonlinear kernel function κ(x i , x). The kernel function helps to map the nonlinear separable feature space to a linear separable feature space [69], and in SVR, the aim is to minimise the function represented by the first part of Eq. (13). Different types of kernel functions exist, however, in this study the Gaussian (or sometimes called radial basis) kernel function where x i − x is the Euclidean distance between x i and x, with an automatic kernel scale as available on MATLAB was used on the z-score normalised input data (with a zero mean and a standard deviation of 1). The parameters ϒ i and ϒ * i are the Lagrange multipliers, while γ is the kernel function parameter. During SVR training C, ε and γ are the parameters that are usually optimised [70].

Gaussian process regression
GPR is a network that tries to interpolate between given data by proposing multiple probabilistic prediction functions and validating them over the data. They are referred to as nonparametric models as they assume that the information from a data can only be captured by an infinite dimensional parameters which are thought of as functions rather than a finite set of parameters [71]. The functions constantly update themselves as more data are added. A typical GPR is defined by a mean function and a covariance function which is usually defined with a kernel function. The kernel function helps to describe by how much a point influences Fig. 13 Graphical model for a typical GPR [73] another, thus controlling the smoothness of the functions in the distribution [72]. Assuming a set of training data set (X, Y ) = (x i , y i ) , and testing data set , where x i and x * i represent the input vectors for the training and testing data, y i and y * i represent the target labels for the training and testing data, respectively, with i = 1, 2, 3, ..., n o . The main task of GPR is to find the best set of latent functions represented as F = { f 1 , f 2 , ..., f n o } whose joint distribution best match the data. Figure 13 represents a Gaussian process model, and the latent functions are seen to be inter-connected with each other by a covariance function that is defined based on their impact on each other. The prediction of y * is dependent on its corresponding single latent function f * while an individual latent function f i corresponds to an input data x i .
To describe and derive the unknown latent functions, GPR first assigns a prior distribution to the functions even before observing any data and continues to update as data are observed. The updated distribution referred to as posterior distribution is computed using Bayes' theorem on the prior distribution and the observed data distribution. With the fitted distribution of functions, y * i is predicted for each test data point x * i alongside some probabilistic bands on the predictions which serve as the confidence interval.
For a set of latent function F = { f 1 , f 2 , ..., f n o }, the distribution over them is defined as where M is the mean function representing the expected function value at input x i and K is the covariance func-tion matrix that defines the relations among inputs and the characteristics of the predicting functions. The prior mean function is often set to zero (i.e. M = 0) in order to avoid expensive posterior computations, allowing inferences to be made using the covariance function only. The zero prior mean function is achieved by subtracting the (prior) mean from all observations, while the covariance matrix or function K is calculated using different kernel functions. This study utilised the automatic relevance determination (ARD) squared exponential kernel function given as to show the covariance between two points x i and x j , where q is the characteristic length scale defining how far apart the input values in X can be for their response values to become uncorrelated. The ARD Squared Exponential kernel uses separate length scale (σ q ) for each predictor q (where q = 1, 2, ..., Q) while σ f on the other hand represents the signal variance and with q, they form the hyperparameters of the kernel covariance function. For widely separated data points, increasing q increases the impacts they have on each other. Going by the kernel function, close data points are considered and modelled as highly covariant while data points that are further apart are considered as low covariant. In supervised learning, it will thus be expected that points with close predictor values x i , will naturally have close target values y i . Also, a training data point that is close to a test data point should be informative enough about the prediction at that point, and these will all be expressed by the kernel covariance function.
To link the testing target labels Y * to the training target labels Y , GPR assumes a joint distribution given as wherẽ K (X, X ) and K (X * , X * ) represent the training and testing data points covariance matrix, respectively, while K (X, X * ) is a covariance matrix that represents the correlation between them.
The covariance matrix between all inputs in X can be computed as The test target labels Y * conditioned on the training labels are now then predicted using the probability distribution of Y * |Y ∼ N (µ, ), where represent the mean and the variance, respectively. Predicted Y * is estimated as the mean of the distribution µ while the uncertainty of the estimate is captured in the variance . To tune the hyperparameters σ f and σ q in the chosen covariance kernel function, the log marginal likelihood given as It thus shows that GPR do not only yield the mean prediction (µ) of the latent functions ( f * ) as output , but do so with a measure of uncertainty variance ( ) which is reported as confidence interval [74]. The narrower the confidence interval plots, the more precise the predictions are, and the wider the confidence interval, the less precise the predictions are. However, the presence of in-process noises can result in large fluctuation of the confidence interval, which reduces the reliability of the predicted results [75].

Stiffness prediction networks and performances
For the purpose of developing networks that map extracted information to the stiffness values of impacted constraint, the signal properties as compared in Fig. 9 except for the impact motion categories were utilised. Impact motions categories are exempted because compare to previous studies [40,41], we intend to develop models that are independent of impact motions categories. Also, it will be very difficult to replicate impact motion categories that are consistent for both simulation and experiment. The notation of the networks and their composed parameters are as presented in Table 2.
In each notation, "Featur es" represents the information extracted from the acceleration signal including the average of the impact durations from each acceleration signal (SimAvID), the statistics of the impact durations from each acceleration signal (SimStaID) and the statistics of each raw acceleration signal (SimStaRaw), while N et represents the network being used including MLP, SVR or GPR. Using the trial-and-error approach, some of the networks hyperparameters were tuned to values shown in Tables 3, 4, and 5 for the MLP, SVM and GPR models. For the remaining parameters, their default values as defined in MATLAB were utilised. Out of the 2999 simulated acceleration signals, 1569 were used for network training, while 1430 were used as out-of-sample data for cross-validation. The performances of the networks developed from different combinations of "N et" and "Featur es" on the testing data sets are presented in Tables 6,7,8,9,10,11,12,13,and 14 and Figs. 14,15,16. The networks input data had each row feature already standardised to have zero mean and a unit variance; hence, normalisation was set to "none" in the MLP models, while standardisation was set to zero in the GPR and SVM models.
The reported evaluations were carried out on the out-of-sample testing data set, and all the networks showed R 2 values not less than 0.9, while the NMAE and NMSE ranged between 0.006-0.232 and 0.0002-0.148, respectively. For the average impact durationbased networks, the MLP-2-SimAvID, SVR-2-SimAvID and GPR-2-SimAvID networks were better in performances, while for the impact durations statisticsbased networks, the MLP-3-SimStaID, SVR-2-Sim StaID and GPR-3-SimStaID networks showed better        performances. For the raw data statistics-based networks, the MLP and GPR network models were of exceptional performances with the MLP-1-SimStaRaw and GPR-2-SimStaRaw being the best. The actual-vs-  Fig. 20. The red and green circles represent the predicted and actual β values, respectively, and the dotted black lines in the GPR plots represent the 95% confidence interval. The initial high performances of the 23 features statistical measurements made it not necessary to implement a further feature selection on them; however, their variation with β values needs to be further investigated and established. Going by the metric values, relative distribution of errors around zero mean, shorter training time, lower memory usage and consistency, the MLP networks are most likely to be preferred for the quantitative on-line rock characterisation.

Conclusions
A quantitative and real-time characterisation of downhole rocks has been established as an integral requirement of the VID system in order to instigate resonance and controlled fracturing during operation. Before now, no generalised method has been developed for this as existing ones [40,41] were impact category dependent, computationally intensive and non-adaptable for online purpose since they require time-to-time manual interaction. In this study, an unconventional and quantitative characterisation of downhole rock layers using resulting drill-bit vibrations and machine learning has been carried out for the VID system. This is useful for selecting and tuning the operating parameters of the VID sys-   tem including the excitation amplitude and frequency for optimal performance. As an impacting dynamic system, a complex nonlinearity is expected to exist between the operating parameters and the resulting variables of the system [57]. For this reason, regression networks which are capable of learning nonlinear relationships have been used to force learn the nonlinear relationship between the stiffness of impacted rock layers and extractable features from impact acceleration signals. Acceleration signals have been selected due to their sensitivity to impact activities and their easy measurability. Explored networks include the MLP, SVR and GPR networks while extracted impact acceleration features include average impact duration, statistics of impact durations and statistics of raw acceleration data. Simulation data from a mathematical impact oscillator model representing the bit-rock impacts of the VID system have been considered. The networks evaluation results showed R 2 values between 0.900-1.000, NMAE values between 0.006-0.232 and NMSE values between 0.0002-0.148. Considering the performances of the networks, their relative distribution of errors around zero mean, their training time, memory requirement and consistency in their performances, the MLP networks are most likely to be preferred for downhole rock characterisation using any of the acceleration signal features. However, experimental validation is necessary to conclude on the best acceleration signal feature for the task.
Future works in this line of study will include further investigation into other rock stiffness-dependent features from impact acceleration signals and experimental verification of proposed methods.

Data accessibility
The data sets generated and analysed during the current study are available from the corresponding author on reasonable request.

Conflicts of interest
The authors declare that they have no conflict of interest concerning the publication of this manuscript.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/ by/4.0/.