
1 Introduction

1.1 Background

Electrical discharge machining (EDM) is a non-traditional process widely used for machining cooling holes in jet engine turbine airfoils. One of the key challenges is to properly detect the moment when the machine should stop the drilling process, i.e. when the electrode reaches the opposite side of the metal wall. This moment is called the breakthrough. Correct detection [1] is critical for part quality. Literature examples in which pulse current and voltage were evaluated showed that it is possible to identify that a breakthrough has taken place [2, 3]. Such models make it possible to analytically indicate in the process data record the moment when surface damage occurred [4, 5]. This has allowed the use of machine learning tools in breakthrough detection [6]. However, in each of the mentioned studies, the breakthrough was detected either after or during its occurrence. The purpose of this study was to verify whether it is possible to predict the breakthrough moment, and to analyse how far in advance such a prediction is possible. The CNC EDM machine evaluated here records data and recognizes whether a breakthrough has occurred using analogue algorithms. However, as indicated, a higher-accuracy alternative solution based on AI models might be considered. It may be possible to develop an additional sub-system that allows more accurate evaluation of the drilling process in the future. The development of such an AI-based system requires properly labelled learning data, so that an algorithm trained on that data set can indicate the need to complete drilling before a breakthrough occurs.

1.2 Process Description

The EDM drilling process is widely used for manufacturing small holes. Such holes are made in fuel nozzles of diesel engines, in medical equipment and in cooled high-pressure turbine components of jet engines. Once the process is based on a set of parameters and numerical control (NC) code, parts can be manufactured. There are separate setups for the electrical parameters, the electrode movement and the breakthrough detection. The part is placed in the EDM machine, where its position is verified. After that the automatic drilling process starts. All elements of the hole drilling process (surface opening, hole drilling, breakthrough) are controlled by the NC code. For complex parts such as jet engine components the drilling itself is important, but the breakthrough is critical. Once the part is complete, it is withdrawn from the machine for subsequent operations. The test coupon operation is shown in Fig. 1.

Fig. 1

Photo from the process of making test drills

2 Methods of Feature Extraction from Time Series Data

2.1 Methodology of Analysis

The data acquired in the EDM process have time-series characteristics, which means that they are time-indexed variables. This raises the necessity of using models that can cope with the autocorrelation of parameters, rather than just a cross-sectional analysis of the relationships between them [7, 8]. Nevertheless, the methods used in time series modelling are often the same as in cross-sectional analysis. The simpler the methods, the faster the recognition. Hence the conclusion that it is more reasonable to use the fastest possible algorithms in a situation where fractions of seconds are decisive. Although artificial neural networks and support vector machines are a fairly common methodology [6], it was proposed to assess whether traditional methods based on decision trees, which are fast and produce easily interpretable models, could provide satisfactory results. It was therefore decided to focus on three methods based on decision trees: DecisionTree, RandomForest and XGBoost from the sklearn and xgboost libraries in Python.
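For reference, the sketch below shows how the three classifiers can be instantiated from the sklearn and xgboost libraries; the parameter values are only illustrative placeholders, not the tuned settings reported later.

```python
# Minimal sketch of the three tree-based classifiers considered in this study;
# hyperparameter values here are illustrative, not the tuned ones.
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from xgboost import XGBClassifier

models = {
    "DecisionTree": DecisionTreeClassifier(max_depth=5, class_weight="balanced"),
    "RandomForest": RandomForestClassifier(n_estimators=100, max_depth=5,
                                           class_weight="balanced"),
    "XGBoost": XGBClassifier(max_depth=3, learning_rate=0.1),
}
```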

2.2 Decision Trees

In the discussed study, among other techniques, decision trees were used. Using the sklearn library (Scikit-learn [9]), trees were created with an algorithm based on Classification and Regression Trees (CART). The CART algorithm is one of many decision tree induction algorithms. It is based on hierarchical, binary divisions of a data set aimed at better segregation of cases representative of the values of the dependent variable. The algorithm strives for an ideal situation in which each created division (leaf) contains cases with the same value of the dependent variable. However, these methods are not as effective in prediction as neural networks or support vector machines. They do not achieve equally precise results, mainly because of the discretization of quantitative variables and the forced generalization this implies. Decision trees are graphical representations of rules obtained from the analysis of the data structure. The undoubted advantages of tree-based classifiers are their graphical representation, which is clear, easy to interpret and verifiable with domain knowledge; the ability to determine the significance of predictors; insensitivity to noise and outliers; and a result in the form of a set of rules that can be used in other applications. A decision tree is a graphical method of decision support: a tree structure in which the internal nodes contain tests of attribute values and the leaves describe decisions on object classification. A decision tree is essentially a structure composed of a series of conditional instructions. Decision trees are an advanced form of knowledge representation that provides a wide range of interpretation possibilities, both at the stage of knowledge acquisition itself (data mining) and at the stage of its application in the decision-making process. Decision trees are a popular method in industrial and research applications [10].
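As an illustration of the rule-based output mentioned above, the sketch below fits a small CART tree on synthetic placeholder data and prints it as a set of if/else rules together with the predictor importances; the feature names are hypothetical.

```python
# Illustrative sketch: extracting the rule set and predictor importances
# from a fitted CART tree (synthetic placeholder data, hypothetical feature names).
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 3))
y_train = (X_train[:, 0] > 0.5).astype(int)

clf = DecisionTreeClassifier(max_depth=3, class_weight="balanced")
clf.fit(X_train, y_train)

# The tree as a human-readable set of conditional rules.
print(export_text(clf, feature_names=["Z_diff_300", "V_mean_200", "V_range"]))
# Significance of each predictor.
print(clf.feature_importances_)
```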

2.3 Random Forest

Random Forest (RF) is an ensemble method based on the principle of decision tree induction combined with bagging [9, 11]. It involves creating complex models consisting of multiple decision trees combined into a single classification model. The individual trees each produce a value for every input, and the results are then averaged. With this approach the model becomes largely independent of outliers, which was previously a major drawback of individual decision trees. A random forest is a set of weak learners that together can solve more complex classification problems. The random forest classifier starts by randomly resampling the training data into several different data sets. Each training data set is then used to create a separate classification tree model. Within the individual trees, the usefulness of individual attributes as predictors is evaluated by measuring data impurity, e.g. entropy. The attributes with the highest value are used to split the data, and a random forest is built by combining multiple decision trees.
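A minimal sketch of this averaging and of the impurity-based attribute evaluation, again on synthetic placeholder data, could look as follows.

```python
# Sketch of a random forest on synthetic data: predictions are averaged over
# the individual trees, and attribute importance is based on impurity decrease.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 3))                                   # placeholder features
y = (X[:, 0] + 0.1 * rng.normal(size=500) > 0.8).astype(int)    # placeholder label

rf = RandomForestClassifier(n_estimators=50, max_depth=5, class_weight="balanced")
rf.fit(X, y)

print(rf.predict_proba(X[:5]))      # class probability averaged over the trees
print(rf.feature_importances_)      # impurity-based importance of each attribute
```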

2.4 XGBoost Classifier

The XGBoost model is based on the gradient boosted decision tree (GBDT) algorithm, in which the construction of the individual trees is parallelised. This is possible thanks to a novel distributed weighted quantile sketch algorithm that works on large datasets [12]. That algorithm helps to find proper split points for a given dataset, which are then used to create the basic decision tree models. Thanks to that, it is possible to combine the power of simple decision tree algorithms with the power of gradient descent learning in a faster way. The implementation of the error function includes a term that adds a penalty for extra leaves in a tree [13]. The penalty is proportional to the size of the leaf weights. Thanks to that, the model is less prone to overfitting and is more stable with respect to local anomalies triggered by measurement errors. An important practical feature is the possibility of running the XGBoost algorithm on a graphics processing unit (GPU), which is enabled by parallelising the construction of individual trees.
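The sketch below shows how the leaf-weight penalties and the optional GPU tree construction are exposed in the xgboost Python API; the values are illustrative, and GPU support depends on the installed xgboost build and version.

```python
# Sketch of an XGBoost classifier with leaf-weight penalties; values are
# illustrative. GPU tree construction depends on the xgboost build/version.
from xgboost import XGBClassifier

xgb = XGBClassifier(
    max_depth=3,
    learning_rate=0.1,
    reg_lambda=1.0,       # L2 penalty on leaf weights
    reg_alpha=0.5,        # L1 penalty on leaf weights
    colsample_bytree=0.8,
    # tree_method="gpu_hist",  # uncomment to build trees on a GPU, if available
)
```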

3 Data Description

The data was prepared on a single piece of material in which a number of holes were drilled. The pulse parameters were identical for all holes used in this study. The process data was saved in comma-separated values (CSV) files; every file represents the data from a single hole drill. At first a simple visualization method was used (Figs. 2, 3 and 4). The breakthrough moment is clearly visible in the CNC machine data. First of all, there is a significant change in the Z_location data. A change in the voltage value is also visible, and it occurs in the same place where the Z_location disturbance was spotted. Some stationary disturbances occur at the beginning of the drilling process; however, they are triggered by the machine's initiation of the process. That anomaly should not be used in further data evaluation. In some drills BT_DETECT was set to true only some time after a local disturbance. After consultation with an expert, it was concluded that this must be a delay of the machine in detecting the breakthrough moment.
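A hypothetical loading and plotting sketch for a single hole is shown below; the file name is a placeholder, while the column names follow the variables discussed in the text.

```python
# Hypothetical sketch: load one hole's CSV record and plot the three signals
# discussed in the text (file name is a placeholder).
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("hole_001.csv")   # one CSV file per drilled hole

fig, axes = plt.subplots(3, 1, sharex=True)
df["Z_location"].plot(ax=axes[0], title="Z_location")
df["Analogin_Voltage"].plot(ax=axes[1], title="Analogin_Voltage")
df["BT_DETECT"].plot(ax=axes[2], title="BT_DETECT")
plt.show()
```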

Fig. 2

Plot showing changes in Z_location. Z_location represents the position of the machine head that holds the electrode

Fig. 3

Plot showing the variability of the Analogin_Voltage variable

Fig. 4

Plot showing the moment where the breakthrough is detected (BT_Detect; a value of 1 means that the machine has detected a breakthrough)

3.1 Feature Extraction

Visualization of the data (Figs. 2, 3 and 4) showed that there is a need to extract information about the stability of a variable over time. Before a breakthrough, the Z_location data is similar to a linear function. It was therefore decided to create a feature that monitors the change of that parameter within a time window. This was achieved by using the property of the simple linear function y = x:

$$X_{n+\text{step}} - X_{n} = \text{step}$$
(1)

where step is the distance between points on the X-axis.

For Z_location a step of 300 was used. This value was chosen after an initial trial. For the selected value, the output had two clearly visible states: low change and high change. Using a larger step value makes the feature less sensitive to variation, while a lower value makes it too sensitive (Fig. 5).
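A minimal sketch of this step-difference feature, assuming the dataframe layout from the loading sketch above, is given below.

```python
# Sketch of the step-difference feature: for a steady linear feed the
# difference between values `step` samples apart stays close to a constant.
import pandas as pd

def step_difference(series: pd.Series, step: int = 300) -> pd.Series:
    """Difference between each value and the value `step` samples earlier."""
    return series.diff(periods=step)

# Example usage on the per-hole dataframe from the earlier sketch:
# df["Z_diff_300"] = step_difference(df["Z_location"], step=300)
```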

Fig. 5

Plot showing the difference between values 300 steps apart

Extracting information from Analogin_Voltage is quite different, because the frequency of the voltage changes is much higher than that of the visible process. It was nevertheless decided to select a metric that describes the change of the data in time, because it is clearly visible that, in the same place as in Z_location, the Analogin_Voltage value becomes more "stable" (the character of the changes in the data is the same). The first step was to concentrate on a single window, calculate a metric for that window and see how it changes in time. The first basic metric is the mean value in a given window. After calculating the mean for a single window, it is visible that at the critical moment there is a place where that value is larger (Figs. 6 and 7).
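A sketch of this windowed mean, under the same dataframe assumptions as above, is shown below.

```python
# Sketch of the windowed (rolling) mean used for Analogin_Voltage.
import pandas as pd

def window_mean(series: pd.Series, window: int = 200) -> pd.Series:
    """Mean value of the signal within a sliding window."""
    return series.rolling(window=window).mean()

# Example usage:
# df["V_mean_200"] = window_mean(df["Analogin_Voltage"], window=200)
```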

Fig. 6

Plot showing the mean value for a window of 200 steps

Fig. 7

Plot showing the variance metric for a window of 100 steps

Another metric that captures the change of values is the variance, so it was also used as an additional metric; however, the change it reveals is not as significant as in the data based on the mean value.

The last approach was to calculate the difference between the maximum value in a window and the minimum value in a window (Fig. 8). A major advantage of this method is that the window sizes used for the maximum and for the minimum values can be controlled independently.

Fig. 8

Plot showing the difference between the maximum value in a window of 100 and the minimum value in a window of 50

The method is quite sensitive in this example and the moment of the disturbance is clearly visible, but only for a short time. At the end of this step, the dataset has three extra columns with metrics describing the change of a value over time at the current moment.
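A sketch of the two remaining window metrics (the rolling variance and the max-min range with independently chosen windows), under the same assumptions as the earlier sketches, is shown below.

```python
# Sketch of the remaining two window metrics: rolling variance and the
# difference between a windowed maximum and a windowed minimum.
import pandas as pd

def window_variance(series: pd.Series, window: int = 100) -> pd.Series:
    """Variance of the signal within a sliding window."""
    return series.rolling(window=window).var()

def window_range(series: pd.Series, max_window: int = 100,
                 min_window: int = 50) -> pd.Series:
    """Difference between the windowed maximum and the windowed minimum;
    the two window lengths can be controlled independently."""
    return (series.rolling(window=max_window).max()
            - series.rolling(window=min_window).min())

# Example usage:
# df["V_var_100"] = window_variance(df["Analogin_Voltage"], window=100)
# df["V_range"]   = window_range(df["Analogin_Voltage"], 100, 50)
```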

3.2 Data Preparation

Data preparation was focused on cleaning the data and preparing it so that it can be easily used by the machine learning algorithm implementations. The most important task was to separate the data into train, validation and test sets. To do that, three different groups of drills were chosen (one for training, one for the validation set and one for the test set used to calculate the final metrics). The data from the machine does not contain not-a-number (NaN) values and is fairly well prepared (no visible missing records, etc.). Therefore it was decided to delete only the beginning of each series, where there is a disturbance in the plot triggered by the initiation of the process. The value of BT_DETECT after detection is always set to true. It was therefore decided to keep 100 steps after detection and delete the rest of the tail. Thanks to that, the machine learning algorithms would not learn wrong patterns (situations where the anomaly no longer exists but the predicted value is still set to true).
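A sketch of this trimming step is given below; the number of samples removed at the start is a placeholder, since the exact cut-off is not stated in the text.

```python
# Sketch of trimming a single hole's record: drop the initial disturbance and
# keep only 100 samples after the machine's breakthrough detection.
import pandas as pd

def trim_record(df: pd.DataFrame, start_skip: int = 500,
                tail_keep: int = 100) -> pd.DataFrame:
    df = df.iloc[start_skip:].reset_index(drop=True)   # start_skip is a placeholder
    detected = df.index[df["BT_DETECT"] == 1]
    if len(detected) > 0:
        df = df.iloc[: detected[0] + tail_keep]        # keep 100 steps after detection
    return df
```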

The data from the individual drills needed special preprocessing that allowed them to be combined into one dataframe. Thanks to that it is possible to train models on more than a single hole. The whole preprocessing was run on each hole separately; the main target was to create, for each hole, as many vectors representing each state of BT_DETECT over time as possible. After that, all holes were joined, so that the final training set alone has 26,000 samples.
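The sketch below illustrates how the per-hole records could be combined while keeping the three drill groups strictly separate; the file lists are placeholders and trim_record refers to the trimming sketch above.

```python
# Sketch of combining per-hole records into train / validation / test sets,
# with each drill group kept strictly separate (file lists are placeholders).
import pandas as pd

train_files = ["hole_001.csv", "hole_002.csv"]   # drills used for training
valid_files = ["hole_010.csv"]                   # drills used for model selection
test_files  = ["hole_020.csv"]                   # drills used for final metrics

def build_set(files):
    frames = [trim_record(pd.read_csv(f)) for f in files]
    return pd.concat(frames, ignore_index=True)

# train_df = build_set(train_files)
# valid_df = build_set(valid_files)
# test_df  = build_set(test_files)
```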

4 Results

4.1 Hypertuning

To prepare the data for the machine learning algorithms, it was decided to use variable values from previous steps, so that the algorithm is shown a history of the changes. The extra features described in Sect. 3 were added in the preprocessing step. Because this history is contained within a single feature vector, there is no need to use specialised algorithms such as recurrent neural networks.
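A sketch of how such a history can be packed into a single feature vector using lagged columns is shown below; the lag depths are illustrative.

```python
# Sketch of adding lagged values so that a single feature vector carries the
# recent history of each signal (lag depths are illustrative).
import pandas as pd

def add_lags(df: pd.DataFrame, columns, lags=(1, 5, 10, 50)) -> pd.DataFrame:
    df = df.copy()
    for col in columns:
        for lag in lags:
            df[f"{col}_lag{lag}"] = df[col].shift(lag)
    return df.dropna()

# Example usage on the combined training dataframe:
# train_df = add_lags(train_df, ["Z_diff_300", "V_mean_200", "V_var_100", "V_range"])
```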

The sets of hyperparameters used are listed below; only these parameters were changed in the models:

DecisionTree:

  • Max_depth [1;2;3;4;5;6;7;8;9;10]

  • Splitter [‘best’, ‘random’]

  • Class_weight [“balanced”]

RandomForest:

  • Max_depth [1;2;3;4;5;6;7]

  • N_estimators [10, 30, 50, 100]

  • Max_features [‘auto’, ‘sqrt’, ‘log2’]

  • Class_weight [“balanced”]

XGBoost:

  • Max_depth [1;2;3;4]

  • Learning_rate [0.01, 0.05, 0.1, 0.15, 0.2, 0.3, 0.5, 0.75, 0.9, 1]

  • Colsample_bytree [0.3, 0.5, 0.7, 0.8, 1]

  • Reg_lambda [0, 0.1, 0.5, 1, 10, 20, 50, 100]

  • Reg_alpha [0, 0.1, 0.5, 1, 2, 5, 10, 20, 50]

To optimize the hypertuning, a random search algorithm was used. The best models were sought in the parameter space given above. The accuracy score [14] was used as the metric. The training set was used only for training, and the validation set was used to select the best algorithm. The final result was calculated on the test data (Table 1).
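A sketch of such a random search over the XGBoost grid listed above, with model selection on the validation set as described, could look as follows; the X_*/y_* arrays are placeholders.

```python
# Sketch of a random search over the XGBoost hyperparameter grid: candidates
# are sampled, fitted on the training set and selected by validation accuracy.
from sklearn.model_selection import ParameterSampler
from sklearn.metrics import accuracy_score
from xgboost import XGBClassifier

param_grid = {
    "max_depth": [1, 2, 3, 4],
    "learning_rate": [0.01, 0.05, 0.1, 0.15, 0.2, 0.3, 0.5, 0.75, 0.9, 1],
    "colsample_bytree": [0.3, 0.5, 0.7, 0.8, 1],
    "reg_lambda": [0, 0.1, 0.5, 1, 10, 20, 50, 100],
    "reg_alpha": [0, 0.1, 0.5, 1, 2, 5, 10, 20, 50],
}

best_score, best_model = -1.0, None
for params in ParameterSampler(param_grid, n_iter=50, random_state=0):
    model = XGBClassifier(**params).fit(X_train, y_train)       # placeholders
    score = accuracy_score(y_valid, model.predict(X_valid))     # placeholders
    if score > best_score:
        best_score, best_model = score, model

# best_model is then evaluated once on the held-out test data.
```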

Table 1 Metrics of model accuracy

4.2 Models Comparison

The models achieved the best accuracy that could be obtained within the assumed range of parameters. The simplest models were created by induction with the basic CART algorithm; random forests and XGBoost resulted in much more elaborate trees. Examples of the graphs can be seen in Fig. 9. A classic problem with publishing large graphs is the difficulty of maintaining readability when transferring the model to paper, so we present only an illustrative fragment of the XGBoost model. As can also be seen, plain decision trees did not perform very well in mapping complex patterns.

Fig. 9

Sample fragments of models: XGBoost and Classification Tree

The accuracy metrics and the measures of false positives and false negatives are summarized in Table 1. The results of the BT_Detect prediction are shown in Fig. 10.

Fig. 10

The results of the BT_Detect prediction

From the point of view of the usefulness of the models, it is most important that breakthrough detection occurs as early as possible before the actual breakthrough, so that the prediction allows the device to respond. As can be seen in Figs. 5, 6 and 7, the current breakthrough detection (by the CNC device) occurs with some delay, so the prediction should offset this delay and even get ahead of the event. All models succeeded in getting ahead of the event, so they can be considered to have made the prediction correctly, but they are not free of flaws. A problem can be the false-positive error, i.e. detecting an event even though it did not happen. The accuracy of the models (Table 1) indicates that this error is negligible, but in further work we will strive to eliminate it completely.

5 Summary

The final result shows that the XGBoost algorithm is the best of the evaluated algorithms for detecting the anomalies in the data that provide information about the breakthrough moment. The main observation is that the decision tree and the random forest appear to be too sensitive, which is why there are many FALSE detections at the beginning of the process. The XGBoost algorithm, in contrast, focused only on the specific changes that occur right before or at the moment of the breakthrough.

That information is very useful for future work and gives a chance for better detection of the breakthrough moment in real time, or it can be used to improve the algorithm currently implemented in the machine. This could be very useful, because better breakthrough detection means better and faster production with a reduction in wasted material (more components will pass the quality test the first time). For that purpose it is important to detect the proper anomaly within a smaller time window, which can be adjusted to account for the algorithm delay.