Quality prediction for milling processes: automated parametrization of an end-to-end machine learning pipeline

The application of modern edge computing solutions within machine tools increasingly empowers the recording and further processing of internal data streams. The datasets derived by contextualized data acquisition form the basis for the development of novel data-driven approaches for quality monitoring. Nevertheless, the desired data-driven modeling and data handling require highly specialized human resources. Additionally, domain experts are indispensable for adequate data preparation. To reduce the manual effort regarding data analysis and modeling, this paper presents a new approach for the automated parametrization of an end-to-end machine learning pipeline (MLPL) to develop and select the best-performing quality prediction models for use in machining production. This supports domain experts who lack specific data science knowledge in developing well-performing models for machine learning-based quality prediction of milled workpieces. The results show that the presented algorithm enables the automated generation of data-driven models with high prediction performance for use in quality monitoring systems. The algorithm's performance is tested and evaluated on four real-world datasets to ensure transferability.


Introduction
In production with machine tools, it is increasingly possible to record and process the internally processed data at high frequencies. By means of context-sensitive data acquisition [11], it is becoming feasible to automatically evaluate the obtained high-quality datasets and to utilize them for developing new data-driven approaches to process optimization and quality monitoring. Nevertheless, the desired data-driven modeling and data handling require highly specialized human resources [24]. Therefore, it is important to use the domain knowledge of experts for adequate data preparation and to automate the subsequent development of ML-based predictive models as far as possible.
Building on the results of Fertig et al. [12], this paper presents a new approach for the automated parametrization of an end-to-end machine learning pipeline (MLPL) to develop quality prediction models for use in machining production. The presented algorithm provides individual identification and parameterization of appropriate methods for feature extraction and selection. The obtained findings are used to build models for predicting the quality of the manufactured workpiece. This enables domain experts who lack specific data science knowledge to automatically develop predictive models based on an acquired production dataset. Furthermore, these models can be used in machine learning-based quality monitoring systems.

State of the art
In the field of quality prediction of machined workpieces, there are several research publications which, as shown in Fertig et al. [12], can be divided into the three categories machining theory-based approaches, designed-experiments approaches, and artificial intelligence approaches according to Benardos and Vosniakos [4]. The artificial intelligence-based approaches consider, on the one hand, the possibility of using varying technology parameters to train models for the prediction of surface qualities. To improve the model predictions, these approaches are extended by applying signals from accelerometers as model inputs [2,14,20,21,28,37]. In addition to the prediction of surface quality, Schuh, Schorr, Ziegenbein, and Brecher present studies for predicting quality parameters such as diameter, roundness, and concentricity of drilled and reamed holes as well as straightness of milled surfaces, using internal machine tool data as input for the models' predictions [5,6,34,35,38].

Christoph Preis and Matthias Weigold contributed equally to this work.
However, these approaches require manual and process-specific data analysis. A high level of expertise in the area of data analysis and modeling is required for feature extraction and selection, as well as for the subsequent development of prediction models. In addition, these solutions are individually created for the respective process, which requires the manual steps to be repeated when transferring them to further processes. For this reason, methods from the field of automated machine learning (AutoML) are increasingly being used in production engineering, which enable prediction models to be created automatically based on a specific input dataset. However, existing approaches usually provide domain-unspecific solutions [24]. Furthermore, applying AutoML to time series classification requires prior domain-specific feature extraction, since the methods are designed to automatically select and train models by optimizing their hyperparameters based on a given feature-based dataset [8,18,22]. To overcome these issues, this paper introduces an algorithm that considers domain-specific requirements to automate the parametrization of an implemented machine learning pipeline for developing quality prediction models.

Machine learning pipeline
To develop the algorithm for automated pipeline parameterization, the contextualized machine tool data collected according to Fertig et al. [11] was used. An MLPL implemented in Python serves as the basis; it takes segmented time series data according to Fertig et al. [12] for each geometric element located on the workpiece, along with the associated quality data. The module-based MLPL consists of submodules for feature extraction, feature selection, and machine learning, which can be parameterized via a main PipelineConfig.json file.
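To illustrate the module-based design described above, the following sketch chains the three submodules for each geometric element, driven by a JSON configuration. The function names, config keys, and placeholder submodules are assumptions for illustration, not the paper's actual implementation.

```python
import json

# Illustrative PipelineConfig content; the keys are assumed, not the paper's schema.
SAMPLE_CONFIG = """
{
  "feature_extraction": {"domains": ["temporal", "spectral", "temporal-spectral"]},
  "feature_selection": {"algorithms": ["uniStat", "LogisRe"], "sw_fs_prop": 2},
  "machine_learning": {"cv_folds": 3, "scoring": "precision"}
}
"""

# Placeholder submodules so the sketch runs end to end.
def extract_features(signal, cfg):
    return {"mean": sum(signal) / len(signal)}

def select_features(feats, cfg):
    return feats

def train_model(feats, label, cfg):
    return ("trained", label)

def run_mlpl(cfg, segments, labels):
    """Chain feature extraction, selection, and training per geometric element."""
    models = {}
    for elem, signal in segments.items():
        feats = extract_features(signal, cfg["feature_extraction"])
        feats = select_features(feats, cfg["feature_selection"])
        models[elem] = train_model(feats, labels[elem], cfg["machine_learning"])
    return models

cfg = json.loads(SAMPLE_CONFIG)
models = run_mlpl(cfg, {"pocket_wall": [0.1, 0.2, 0.3]},
                  {"pocket_wall": "in_tolerance"})
```

Because every submodule reads only its own section of the config, the optimization algorithm described later can reparameterize one stage without touching the others.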

Feature extraction
In signal processing, the techniques for extracting relevant features to perform process monitoring tasks can be divided into time domain, frequency domain, and time-frequency domain methods [1,26,27,36]. Typically, it is necessary to manually identify and select the appropriate features with respect to the underlying process. To reduce the manual effort and maximize the automation of the feature extraction process, the Python library TSFEL [3] is used to compute the process describing features. The extracted features are defined using a features.json file, in which the available features are categorized by domain into statistical, temporal, and spectral features.
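The three per-domain feature categories can be sketched with a few hand-rolled examples; TSFEL computes many more features per domain, configured via features.json, but the miniature below (pure NumPy, with illustrative feature choices) shows the idea:

```python
import numpy as np

def statistical_features(x):
    return {"mean": float(np.mean(x)), "std": float(np.std(x))}

def temporal_features(x):
    # Absolute energy is the sum of squared samples.
    return {"abs_energy": float(np.sum(x ** 2))}

def spectral_features(x, fs):
    # Spectral centroid: magnitude-weighted mean frequency of the spectrum.
    spectrum = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    centroid = float(np.sum(freqs * spectrum) / np.sum(spectrum))
    return {"spectral_centroid": centroid}

fs = 500                              # drive-signal sampling rate used in the paper
t = np.arange(0, 1, 1 / fs)
x = np.sin(2 * np.pi * 50 * t)        # 50 Hz test tone
features = {**statistical_features(x),
            **temporal_features(x),
            **spectral_features(x, fs)}
```

For the pure 50 Hz tone, the spectral centroid lands at 50 Hz, the mean is near zero, and the absolute energy equals half the sample count.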
To additionally consider information from the time-frequency domain, an extension of TSFEL by the domain temporal-spectral is implemented. The extension applies the discrete wavelet transform (DWT) and the Hilbert-Huang transform (HHT) to the input time series data. The DWT decomposes the input signal into individual frequency bands by repeated high-pass and low-pass filtering. The decomposition level determines the number of transformation steps. In each decomposition level, the high-pass filtered signal components are coded as wavelet coefficients. The low-pass filtered signal components finally serve as the basis for the subsequent decomposition step. The calculation of the wavelet coefficients c(τ, s) within the DWT is done for discrete values of the scaling parameter s and the shifting parameter τ according to Eq. (1),

c(τ, s) = 1/√|s| ∫ x(t) ψ*((t − τ)/s) dt,  (1)

whereby x(t) corresponds to the time series under investigation and ψ represents the selected wavelet basis function [10,23,29]. For the determination of characteristic features, the mean, root mean square, standard deviation, kurtosis, crest factor, and peak-to-peak distance are calculated for the coefficient profiles of each decomposition level according to [36]. Additionally, the residual is taken into account during feature extraction.
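The repeated high-/low-pass filtering and the per-level statistics can be sketched with the Haar wavelet, whose filter pair reduces to pairwise sums and differences. This is a simplified stand-in for the wavelet families actually searched by the algorithm (kurtosis is omitted for brevity):

```python
import numpy as np

def haar_dwt_level(x):
    """One Haar decomposition step: low-pass (approximation) and high-pass (detail)."""
    x = x[: len(x) // 2 * 2]                      # truncate to even length
    approx = (x[0::2] + x[1::2]) / np.sqrt(2)     # low-pass filtered components
    detail = (x[0::2] - x[1::2]) / np.sqrt(2)     # high-pass -> wavelet coefficients
    return approx, detail

def dwt_features(x, levels=3):
    """Repeat the filtering and compute per-level coefficient statistics."""
    feats = {}
    approx = np.asarray(x, dtype=float)
    for lvl in range(1, levels + 1):
        approx, detail = haar_dwt_level(approx)   # approx feeds the next level
        rms = float(np.sqrt(np.mean(detail ** 2)))
        feats[f"lvl{lvl}_mean"] = float(np.mean(detail))
        feats[f"lvl{lvl}_rms"] = rms
        feats[f"lvl{lvl}_std"] = float(np.std(detail))
        feats[f"lvl{lvl}_crest"] = float(np.max(np.abs(detail)) / rms) if rms > 0 else 0.0
        feats[f"lvl{lvl}_p2p"] = float(np.ptp(detail))
    # The residual (final approximation) is characterized as well.
    feats["residual_rms"] = float(np.sqrt(np.mean(approx ** 2)))
    return feats
```

For a constant signal, all detail coefficients vanish while the residual carries the scaled signal energy, which makes the level-wise separation easy to verify.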
The empirical mode decomposition (EMD), as the initial step of the HHT, decomposes the signal into a finite number of intrinsic mode functions (IMF) based on the sifting algorithm. To identify an IMF, the local maxima are connected via a cubic spline to create the upper envelope. Similarly, the lower envelope is obtained using the local minima. By averaging the two envelopes, the mean envelope m_1 is obtained. The subtraction of m_1 from the original signal x(t) results in the first IMF component h_1, which serves as the input signal for the following iteration. These steps are repeated until a previously defined stopping criterion is reached. The second step of the HHT is the application of the Hilbert transform to the IMFs, which uses the resulting instantaneous frequencies and instantaneous amplitudes of the signal to form the energy-frequency-time spectrum. This is known as the Hilbert-Huang transform [15-17,33]. Due to the high output dimension of the HHT matrix, the characteristic features mean, root mean square, standard deviation, crest factor, peak-to-peak distance, and absolute energy are calculated in the same manner as described above for the DWT [7,31].
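The sifting step described above can be sketched as follows: cubic splines through the local extrema form the two envelopes, their mean m_1 is subtracted, and the iteration stops once the candidate barely changes. This is a minimal sketch with a simple mean-square stopping criterion, not the paper's exact EMD implementation:

```python
import numpy as np
from scipy.interpolate import CubicSpline

def sift_once(x, t):
    """One sifting step: subtract the mean envelope m_1 from the signal."""
    maxima = [i for i in range(1, len(x) - 1) if x[i] > x[i - 1] and x[i] > x[i + 1]]
    minima = [i for i in range(1, len(x) - 1) if x[i] < x[i - 1] and x[i] < x[i + 1]]
    if len(maxima) < 2 or len(minima) < 2:
        return x, True                               # too few extrema: residual reached
    upper = CubicSpline(t[maxima], x[maxima])(t)     # upper envelope through maxima
    lower = CubicSpline(t[minima], x[minima])(t)     # lower envelope through minima
    m1 = (upper + lower) / 2                         # mean envelope
    return x - m1, False

def first_imf(x, t, max_iter=10, tol=1e-3):
    """Iterate sifting until a stopping criterion (relative change) is met."""
    h = x.copy()
    for _ in range(max_iter):
        h_new, done = sift_once(h, t)
        if done or np.mean((h_new - h) ** 2) < tol * np.mean(h ** 2):
            return h_new
        h = h_new
    return h
```

For a pure sine the envelopes are nearly constant at plus/minus one, so the first IMF reproduces the input away from the boundary regions where the splines extrapolate.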

Feature selection
A majority of the features obtained from the automated feature extraction based on Sect. 3.1 typically yield irrelevant information regarding the prediction task or exhibit indifferent behavior under changing process conditions. Additionally, an enlarged number of input dimensions leads to an increased demand for training data, and the risk of overfitting rises. Therefore, in order to achieve the highest prediction performance with correspondingly high generalization capability of the predictive models, the designed feature selection algorithm (illustrated in Fig. 1) is applied subsequent to feature extraction. It is based on the feature selection method presented in Fertig et al. [12], which has been extended within this work to ensure a more general application. Initially, all features exhibiting a variance of 0 across the analyzed dataset are removed by means of a variance threshold filter. The next step applies a set S of feature selection algorithms to the reduced feature set. S represents a subset of the implemented feature selection algorithms (e.g. S = {uniStat, LogisRe}, cf. Sect. 3.4). A final proprietary feature set (propFeatSet) is obtained for each geometric element (geomElem) located on a workpiece (cf. the workpiece shown in Fig. 3, which consists of seven quality-relevant geomElem): it contains the features which are selected by at least the previously defined number SW_fs,prop of feature selection algorithms. Thereby, SW_fs,prop (e.g. SW_fs,prop = 2, cf. Sect. 3.4) can be varied from 1 to the number of implemented and thus available feature selection algorithms.
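The voting scheme behind propFeatSet can be sketched compactly: each selection algorithm in S emits a set of feature names, and a feature survives if at least SW_fs,prop algorithms chose it. The feature names and selector outputs below are illustrative:

```python
def proprietary_feature_set(selected_by_algorithm, sw_fs_prop):
    """Keep features chosen by at least sw_fs_prop of the applied selectors."""
    votes = {}
    for chosen in selected_by_algorithm.values():
        for feat in chosen:
            votes[feat] = votes.get(feat, 0) + 1
    return {feat for feat, n in votes.items() if n >= sw_fs_prop}

# e.g. S = {uniStat, LogisRe} with hypothetical selector outputs
S = {
    "uniStat": {"rms_x", "kurtosis_z", "dwt_lvl3_rms"},
    "LogisRe": {"rms_x", "dwt_lvl3_rms", "spectral_centroid"},
}
prop_feat_set = proprietary_feature_set(S, sw_fs_prop=2)
```

With SW_fs,prop = 2 only the features both selectors agree on remain; lowering the threshold to 1 yields the union of all selector outputs.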
The implementation provides four selection algorithms, which can be combined arbitrarily. The first one consists of a univariate feature selection (uniStat) based on the determination of the mutual information between the feature vector and the target variable. In addition, logistic regression (LogisRe) with elastic net regularization is utilized for feature selection. This regularization technique is particularly suited for a large number of features and a small number of training samples [39]. Lasso regression (Lasso), a shrinkage method, uses an L1 regression penalty term to shrink the coefficients of irrelevant features to 0. This model-based selection method allows a straightforward selection of the influential features by analyzing the model coefficients [13,19]. For Lasso feature selection, the Least Angle Regression (LARS) algorithm developed by Efron et al. [9] is used to compute the coefficients; it calculates all Lasso estimates at high computational efficiency and shows its advantages in particular on high-dimensional datasets [9]. The fourth method for feature selection consists of the efficient wrapper approach Boruta. It aims at the identification of all features relevant for the prediction task. For this purpose, shadow features exhibiting random values are taken into account in addition to the real features. Finally, the feature selection is performed by comparing the feature importance, given by the used random forest, between the real and the shadow features [25].

ML algorithms and optimization
Adapted from Fertig et al. [12], the six ML algorithms Support Vector Machine (SVM), k-Nearest Neighbors (KNN), Ridge Regression classifier (RidgeRe), Gaussian Naive Bayes classifier (GNB), Decision Tree (DT), and Multilayer Perceptron (MLP) as well as the three ensemble algorithms Random Forest (RF), Extra Trees classifier (XT), and AdaBoost classifier were implemented within the MLPL. The hyperparameters of each algorithm are optimized using a grid search combined with stratified 3-fold cross-validation, with precision as the scoring function. For implementing the algorithms as well as the feature selection, the Python library scikit-learn was used [30].
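The tuning scheme described above maps directly onto scikit-learn's GridSearchCV; shown here for the SVM with an illustrative parameter grid and synthetic data, not the paper's actual search space:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.svm import SVC

# Synthetic stand-in for a feature-based training set of one geomElem.
X, y = make_classification(n_samples=120, n_features=10, random_state=0)

# Grid search with stratified 3-fold CV and precision scoring, as in the MLPL.
grid = GridSearchCV(
    estimator=SVC(),
    param_grid={"C": [0.1, 1, 10], "kernel": ["rbf", "linear"]},
    scoring="precision",
    cv=StratifiedKFold(n_splits=3),
)
grid.fit(X, y)
best_model = grid.best_estimator_
```

The same pattern is repeated per algorithm; only the estimator and its parameter grid change, which is what makes the nine-model training step easy to automate.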

Interim conclusion
By using the underlying pre-processed internal data from the machine tool and quality assurance, the developed MLPL enables automated, individualized modeling for each quality-relevant geomElem on a workpiece. The promising prediction results show the potential of the obtained models for quality prediction. Nevertheless, the numerous parameterization options of the MLPL lead to the assumption that the performance of the models can be further increased by suitable parameter combinations. After each run, nine models are available for each geomElem. Thus, the model suitable for the quality prediction must be selected to be used in the application. The algorithm presented in the following sections provides the automated parameterization of the MLPL via the PipelineConfig.json to obtain and select the best-suited models for the quality prediction task.

Algorithm for automated parametrization of MLPL
The objective of the automated parameterization is to determine the configuration that leads to the best-performing models using the MLPL for the given use case. In addition, for each geomElem, the appropriate model intended for use in data-based quality prediction is selected based on a domain-specifically elaborated scoring approach. This enables domain experts to efficiently create quality prediction models for individual workpieces and machine tools without additional manual intervention.

Algorithm description
The developed algorithm consists of multiple consecutive optimization steps (optSteps), which are executed sequentially to identify the optimal parameters gradually. The decision to perform a stepwise parameter optimization was motivated by the consideration that a full factorial implementation of the selected parameter combinations would require more than 600,000 MLPL runs, which leads to an unacceptable computational effort. After performing an optimization step, the corresponding identified parameters are set, and the optimized configuration is applied to perform the next optimization step. Fig. 2 shows the operating procedure of the algorithm, which consists of running the MLPL within the consecutive optSteps. These are defined in a configTable_optSteps. For each optStep, a separate configTable_optStep_runs provides the specified parameter values run_param to be set for the MLPL runs. Accordingly, the MLPL is iterated within an optStep according to the number of parameter combinations of the configTable_optStep_runs, with individually adjusted parameter values. The results of each run are stored as a metricsReport, which contains the resulting prediction metrics. Additionally, the MLPL provides the trained models for the corresponding run. Basically, the main concept consists of a modular and extensible design, which allows the algorithm to run through additional optimization steps by extending the configTable_optSteps with the corresponding configTable_optStep_runs.
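The stepwise procedure can be sketched as a loop over config tables: each optStep lists its runs, every run adjusts the pipeline configuration, and the best-scoring parameter values are fixed before the next step starts. The table contents, function names, and scoring stub below are illustrative assumptions:

```python
# Illustrative configTable_optSteps with per-step run tables.
config_table_opt_steps = [
    {"name": "DWT", "runs": [{"wavelet": w, "level": l}
                             for w in ("db4", "sym5") for l in (3, 4, 5)]},
    {"name": "HHT", "runs": [{"n_imfs": n} for n in range(1, 8)]},
]

def optimize(pipeline_config, run_mlpl, score):
    """Execute the optSteps sequentially, fixing the best parameters after each."""
    for opt_step in config_table_opt_steps:
        results = []
        for run_params in opt_step["runs"]:
            cfg = {**pipeline_config, **run_params}   # adjust this run's parameters
            metrics_report = run_mlpl(cfg)            # one MLPL run -> metricsReport
            results.append((score(metrics_report), run_params))
        best_params = max(results, key=lambda r: r[0])[1]
        pipeline_config.update(best_params)           # reparameterize for next optStep
    return pipeline_config
```

Extending the search space then only means appending an entry to config_table_opt_steps, mirroring the extensibility claim above.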
After performing an optimization iteration, the metricsReports are read for each run to rank and sort the results per geomElem following the developed scoring approach (cf. Sect. 4.2). These results are subsequently used to determine which parameter values lead to the highest performance metrics. Finally, the PipelineConfig is reparameterized for the next optStep based on the identified parameters.
Both the DWT and the HHT exhibit hyperparameters which need to be adapted to the characteristics of the available data for adequate feature extraction. For this purpose, the optStep DWT envisages runs with varied basis wavelets as well as decomposition levels. Included are the wavelet families Daubechies, Coiflet, Symlet, and biorthogonal with 9 shapes and the decomposition levels 3-7. Owing to the different processes resulting from the individual geometry of each geomElem, individual DWT hyperparameters are selected for each geomElem, which leads to the parameterization of the created individual features.json configuration files. The same applies to the HHT, in which the number of IMFs (in this case 1-7) is selected individually for each geomElem. Following the determination of the individual hyperparameters, a subsequent decision is required on whether features extracted using the HHT should be considered in addition to DWT-based features. Preliminary tests showed that the DWT is considerably more powerful than the HHT in terms of prediction performance.
The combination of both time-frequency methods yielded no improvement in the results. Since it cannot be excluded that in particular cases the additional use of HHT features may achieve better performance, the described optStep 2 is included as well. The final step of optimizing the feature extraction is to select which domains should be included in the model building process. For the spectral domain, the Hanning, Hamming, and Blackman window functions are additionally examined to improve the quality of the spectral analysis [32]. The selection of domains to be considered is based on different subsets of the available domains. The subsets consist of each domain individually (4 subsets), the combinations of temporal-spectral with the other domains (3 subsets) as well as all domains together (1 subset).
The feature selection is optimized based on the feasible combinations of the four implemented algorithms. SW_fs,prop is adjusted according to the number of algorithms used per run. When using one feature selection method, a threshold of SW_fs,prop = 1 follows. For combinations of two to four methods, SW_fs,prop is varied from 2 up to the number of methods used. Each of the resulting 21 parameter combinations is executed once with each of the scaling methods standardization and normalization, requiring 42 runs for optimizing the feature selection. The final run performs the MLPL using the optimal configuration identified by the algorithm to build and select the final prediction models for each geomElem.
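The run count follows directly from enumerating the combinations, which the following sketch makes explicit (the selector and scaler names echo Sects. 3.2 and 3.3; the run dictionaries themselves are illustrative):

```python
from itertools import combinations

selectors = ("uniStat", "LogisRe", "Lasso", "Boruta")
runs = []
for k in range(1, len(selectors) + 1):
    for combo in combinations(selectors, k):
        # One selector: threshold 1; k selectors: thresholds 2..k.
        thresholds = (1,) if k == 1 else range(2, k + 1)
        for sw in thresholds:
            for scaler in ("standardization", "normalization"):
                runs.append({"S": combo, "sw_fs_prop": sw, "scaler": scaler})
```

Counting per combination size gives 4 + 6 + 8 + 3 = 21 parameter combinations, and doubling for the two scaling methods yields the 42 runs stated above.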
The modular architecture and the various configuration files as well as tables enable a flexible extension of the algorithm by simple adaptation. This ensures broad applicability by allowing additional values to be easily added to the verification procedure if desired.

Scoring and parameter value selection
At the completion of each optStep, the results need to be analyzed, and the parameter values that lead to the best prediction performance need to be determined based on the metricsReports created for each geomElem, which contain the classification performance measures accuracy (ACC), precision (PREC), recall (REC), and specificity (SPEC) [12], as well as the ROC AUC and the number of false positive (FP) predictions. The metrics within the metricsReports are determined on the validation set. To achieve the identification of the parameters, a rank is assigned to each model. The ranks are determined using the number of FP predictions, the values for specificity and accuracy, as well as the ROC AUC. A lower number of FP as well as higher values of the remaining metrics lead to a better rank and thus to a preferred selection as the final model of a geomElem. Due to the desired application in quality prediction, FP predictions are considered particularly critical in a production environment. They lead to further processing and assembly of a workpiece that has been manufactured in violation of its tolerances. As a result, it may not fulfill its function and may not be able to withstand the operating loads acting on it. For this reason, the ranks belonging to the number of FP, sorted in ascending order, form the first basis for selecting the best-performing models. If the number of FP is equal for several models, the subsequent sorting considers the sum of ranks across all metrics. If this is still insufficient for unambiguous identification, the sorting procedure takes into account the ranks of ROC AUC, specificity, and accuracy for decision-making. This individual analysis for a specific geomElem allows the hyperparameters of the time-frequency feature extraction methods to be determined and adjusted.
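The cascading sort can be sketched as a lexicographic key per model: FP rank first, then summed metric ranks, then the individual tie-breakers. This is a simplified sketch that ranks only a subset of the metrics named above; the metric dictionaries are illustrative:

```python
def rank(values, ascending=True):
    """Tie-aware dense rank: equal values share the same rank, 0 is best."""
    distinct = sorted(set(values), reverse=not ascending)
    position = {v: r for r, v in enumerate(distinct)}
    return [position[v] for v in values]

def select_best(models):
    """Fewest FP first, then summed metric ranks, then ROC AUC, SPEC, ACC."""
    fp_ranks = rank([m["FP"] for m in models], ascending=True)
    rank_sums = [0] * len(models)
    for key in ("ROC_AUC", "SPEC", "ACC"):
        for i, r in enumerate(rank([m[key] for m in models], ascending=False)):
            rank_sums[i] += r
    order = sorted(
        range(len(models)),
        key=lambda i: (fp_ranks[i], rank_sums[i],
                       -models[i]["ROC_AUC"], -models[i]["SPEC"], -models[i]["ACC"]),
    )
    return models[order[0]]
```

With this key, two models tied on FP are separated by their summed metric ranks, exactly the fall-through order described above.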
Preliminary tests have shown that this individualized consideration yields significant improvements in model performance. However, no improvements and thus no advantages were obtained by an individualized evaluation of the other optSteps. To reduce complexity and preserve comprehensibility, the subsequent optSteps are evaluated globally across all geometric elements. For each parameter combination, the resulting sum of FPs predicted on the validation set by the previously determined best-performing models per run is calculated across the entire workpiece, i.e. all geometric elements. The parameter combination which leads to the lowest number of FPs is finally selected.
This domain-adapted scoring approach allows the identification of the best-performing models per geomElem within each optimization step. The underlying configuration parameters are finally set in the corresponding config files to be considered during the next optStep.

Results
In this section, the effectiveness of the presented optimization algorithm is demonstrated by applying it to the different available datasets.

Datasets
The datasets were generated in the TEC-Lab of PTW using the 3-axis DMC 850 V machining center from the manufacturer DMG MORI (DMG) and the 5-axis GROB G350 2nd generation (GROB) machining center, both of which are equipped with a Sinumerik 840D control system. For data acquisition, an edge computing solution installed on each machine was used, which provides the internal drive signals from the controller at a sampling frequency of 500 Hz. The respectively recorded signals and the experimental design, as well as the data acquisition and matching with the quality measurements, can be obtained from Fertig et al. [12]. In addition to the dataset DS_DMG1 presented in Fertig et al. [12], two further datasets were generated using the given experimental design and the considered reference geometry. DS_DMG2 was produced approximately 1.5 years after DS_DMG1 on the same machine tool, and DS_GROB accordingly using the GROB machining center. Table 1 summarizes additional information about the datasets.
The fourth dataset is applied for validation of the presented algorithm using a different workpiece geometry. For this purpose, the pocket geometry made from the material 42CrMo4V, which is shown in Fig. 3, is considered. It consists of 7 quality-relevant geometric elements and was manufactured using the DMG machine tool. Based on the previously mentioned experimental setup, the pockets were arranged in a 7 × 7 grid on 8 cuboidal plates made of 42CrMo4V with the dimensions l × w × h = 305 × 305 × 30 mm³ for an optimized experimental procedure. This results in a total of 392 pockets for the subsequent analysis. For finishing the pockets, a carbide end milling tool (article number: 203089 10) made by the Hoffmann Group was used. The tool features N_z = 5 teeth at a nominal diameter of D_tool = 10 mm. The technology parameters are summarized in Fig. 3.

Table 2 summarizes the percentage improvement of the average values relative to the default configuration. Except for the recall on DS_DMG1 and DS_DMG2, the metrics in the overall view across all models and geomElems show a partially significant improvement by the optimization algorithm. The improvements can be observed particularly in the scores for specificity and ROC AUC. This reflects the optimization target of minimizing the FP predictions, since the number of FP in this case impacts the specificity values most. After optimizing the parameters, it is shown that for each of the 4 datasets the temporal-spectral domain without HHT is solely considered for feature extraction during prediction model development. The basis wavelets and decomposition levels selected individually for each geomElem thus provide the highest information density for quality prediction regarding the analyzed datasets. These best-performing hyperparameter values used for the DWT furthermore differ among the geomElems and datasets. This supports the design of the algorithm toward individual parameter identification.
In addition, the optimization algorithm selects different model algorithms for each geometric element, which leads to the conclusion that multiple model algorithms are required for the most accurate prediction. Nevertheless, these results only provide an overall impression of the algorithm's performance. When considering Fig. 4, several cases can be identified where models show poor scores, especially for the specificity. For this reason, it is necessary to select the best-performing model for each geomElem based on the presented scoring approach (cf. Sect. 4.2). Table 3 summarizes the results of the final model selection, showing the mean values of the achieved metrics on the test dataset across all geomElems. Additionally, the achieved metrics of the non-optimized pipeline are displayed. This clearly shows that for each dataset the optimized configuration outperforms the non-optimized pipeline.

Conclusion
In this paper, a new optimization algorithm for parametrization of an end-to-end machine learning pipeline (MLPL) for the development of an artificial intelligence-based quality monitoring system for milling processes is developed.
The basis consists of a domain-specifically developed MLPL, which automates feature engineering, feature selection, and model training. The module-based implementation allows wrapping the presented optimization algorithm around the MLPL. Using the preprocessed machine tool and quality data as described in Fertig et al. [11] and Fertig et al. [12], the algorithm is able to optimize the hyperparameters of the different steps in the MLPL to train quality prediction models without the investment of manual effort by a domain expert. In particular, methods for feature engineering based on the time-frequency domain, which usually require elaborate pre-analyses of the data by experts, are parameterized and adapted to the specific task automatically by the optimization algorithm. Furthermore, the modular implementation via appropriate configuration files allows an effortless extension of the search space regarding the parameters to be optimized. The results on the four analyzed datasets demonstrate that the algorithm automatically trains and selects models with high prediction capabilities. The successful application to four different datasets further highlights the broad applicability of the developed approach and its transferability to different machine tools and workpieces.
To improve the models' scores and thus their deployability in production, more samples of well-contextualized data have to be acquired to increase the training datasets. The size of the training datasets used within this study seems too small to represent the highly complex interrelationships between internal machine tool data and the resulting workpiece quality and thus to obtain even better prediction capabilities. In addition, it is necessary to examine the robustness of the models' predictions to obtain an estimation of how confident a particular model is with respect to its prediction.