The machining process of a machine tool can also largely be described by the variables defined in its controller. For a milling machine, these include, for example, the position of the milling head or tool center point (TCP), the spindle current, and the feed rate. Figure 1 displays such a signal, extracted from a machining center performing different milling tasks.
By comparing the target values for these parameters with the actual measured values, it is possible to detect anomalies. This can be achieved, for example, by checking the measured values at certain time indices against a predefined range of admissible actual values for a process. The disadvantage of this, however, is that the permitted thresholds must be defined for each process individually. Moreover, a single machining process often consists of different sub-processes, some of which recur in other processes or other parts. The presented approach exploits this transferability and therefore consists of two steps:
1. Extraction of process-describing patterns from available time series and retrieval during machine operation.
2. Anomaly detection for indirect tool condition monitoring.
This has the advantage that patterns can be extracted automatically from available time-series data, thus bypassing the process-specific manual matching of target and actual values. In this way, the system detects anomalies based on the machining sequence actually being run rather than on generally applicable intervention limits. It should be noted, however, that process differentiation does not necessarily group all processes of a similar nature into one cluster. For example, sub-processes that differ only in feed velocity are assigned to different process clusters. This leads to a much more granular distinction of processes, which helps anomaly detection due to lower intra-cluster variance, but comes at the disadvantage of requiring more process cycles for training data collection.
Based on these properties, the approach is applicable to heterogeneous machines. If additional sensor data is available, it can also be integrated to enable more precise detections.
For the application of the approach, it is necessary to enable standardized access to the data in the controller, which can be achieved by implementing an OPC UA server in combination with intelligent parameter identification [23].
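The following is a minimal sketch of such standardized data access using the python-opcua package. The server endpoint, the node ids, and the sampling interval are hypothetical placeholders; in practice they depend on the specific controller and the parameter identification described in [23].

```python
# Minimal sketch: cyclically reading controller variables via OPC UA.
# Server URL and node ids are placeholders, not values from the paper.
import time
from opcua import Client

client = Client("opc.tcp://machine-controller:4840")  # hypothetical endpoint
client.connect()
try:
    # Hypothetical node ids for axis position, spindle current and feed rate
    nodes = {
        "x_position": client.get_node("ns=2;s=Axis.X.ActPos"),
        "spindle_current": client.get_node("ns=2;s=Spindle.Current"),
        "feed_rate": client.get_node("ns=2;s=Path.FeedRate"),
    }
    while True:
        sample = {name: node.get_value() for name, node in nodes.items()}
        # hand the sample over to the sliding buffer used for pattern matching
        print(sample)
        time.sleep(0.01)  # sampling interval depends on the controller
finally:
    client.disconnect()
```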
Pattern recognition
To recognize recurring sub-processes, it is necessary to divide existing time series into sub-sequences. Utilizing these sub-sequences, reference sub-signals can be generated, which can later be searched for in the on-line data. The splitting of the machine tool position time series into sub-sequences works by segmenting the provided input time series based on the detection of local minima. Using this criterion, sub-sequences as shown in Fig. 2 are generated.
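A possible implementation of this splitting step is sketched below: local minima of the position signal are detected and used as segment boundaries. The smoothing-free detection via peak prominence and the chosen prominence value are illustrative assumptions, not parameters from the paper.

```python
# Sketch: segmenting a position time series at detected local minima.
import numpy as np
from scipy.signal import find_peaks

def split_at_local_minima(position, prominence=0.1):
    """Return a list of (start, end) index pairs between local minima."""
    # local minima of `position` are peaks of the negated signal
    minima, _ = find_peaks(-np.asarray(position), prominence=prominence)
    boundaries = np.concatenate(([0], minima, [len(position) - 1]))
    return [(int(s), int(e)) for s, e in zip(boundaries[:-1], boundaries[1:]) if e - s > 1]

# Example: segment a synthetic axis-position signal into sub-sequences
t = np.linspace(0, 10, 2000)
position = np.abs(np.sin(1.5 * t)) + 0.01 * np.random.randn(t.size)
segments = split_at_local_minima(position)
sub_sequences = [position[s:e] for s, e in segments]
```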
These sub-sequences do not necessarily represent entire processes, but rather processing segments (sub-processes) that can reappear across different processing procedures. Afterward, the sub-sequences of the position signal are grouped with the corresponding signals of the other channels (such as currents, torque, etc.), based on the timestamps at which the splits in the position signal occur. If a position segment appears again, clusters can be generated across these other channels. An approach for clustering these signals using Mean-Shift Clustering was previously presented in [24]; subsequently, the approach was extended by extensive data pre-processing, involving smoothing of the positional time series and offset corrections, to further improve the matching with previously observed positional signals of the same type.
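A hedged sketch of this grouping and clustering step is given below. Mean-Shift Clustering is taken from the reference above; resampling every sub-sequence to a fixed length as the feature representation is an assumption made here for illustration only.

```python
# Sketch: cut all channels at the position split points and cluster the
# position sub-sequences with Mean-Shift.
import numpy as np
from sklearn.cluster import MeanShift

def slice_channels(channels, segments):
    """Cut every channel (dict name -> 1D array) at the position split points."""
    return [
        {name: np.asarray(sig)[s:e] for name, sig in channels.items()}
        for s, e in segments
    ]

def resample(x, n=128):
    """Linearly interpolate a sub-sequence to a fixed length (assumed feature representation)."""
    x = np.asarray(x, dtype=float)
    return np.interp(np.linspace(0, len(x) - 1, n), np.arange(len(x)), x)

def cluster_position_subsequences(position_subseqs, bandwidth=None):
    """Assign each position sub-sequence to a process cluster."""
    features = np.vstack([resample(s) for s in position_subseqs])
    return MeanShift(bandwidth=bandwidth).fit_predict(features)
```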
To enable re-detection of these positional signals in on-line data, a sliding buffer on the on-line data stream is used, matching positional signals from the offline database against signals appearing in the stream. To reduce matching time, the patterns in the offline database are compared at different positions in the signal buffer using the mean absolute error [23]. Iteratively calculating the distance between the offline patterns and the signal in the buffer, and stopping the distance calculation for an individual offline pattern early once the distance exceeds a specified threshold, yields a pattern matching that can be applied to streaming data. Once a pattern is re-detected, the associated clustering of the other channels (torque, current, ...) can then be used for comparison with the respective signal that appears in parallel to the on-line position signal.
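The buffer matching with early abandonment can be sketched as follows; the threshold value is illustrative, and the brute-force scan over buffer offsets is a simplification of the streaming implementation described above.

```python
# Sketch: match offline position patterns against a sliding buffer using the
# mean absolute error, abandoning a pattern early once it exceeds a threshold.
import numpy as np

def mae_with_early_stop(pattern, window, threshold):
    """Cumulative MAE that stops as soon as it can no longer fall below the threshold."""
    n = len(pattern)
    acc = 0.0
    for i in range(n):
        acc += abs(pattern[i] - window[i])
        if acc / n > threshold:  # partial sum already exceeds the limit
            return None
    return acc / n

def match_buffer(buffer, patterns, threshold=0.05):
    """Return (pattern_id, offset, mae) of the best match in the buffer, if any."""
    buf = np.asarray(buffer)
    best = None
    for pid, pattern in patterns.items():
        n = len(pattern)
        for offset in range(len(buf) - n + 1):
            d = mae_with_early_stop(pattern, buf[offset:offset + n], threshold)
            if d is not None and (best is None or d < best[2]):
                best = (pid, offset, d)
    return best
```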
Anomaly detection
The anomaly detection now employed is based on a comparison of the identified pattern references (e.g. torque) with the signals during the operation of the machine.
The reference patterns identified in the pattern recognition are generated by a modification of the arithmetic mean adapted to the different pattern lengths. To quantify how strongly the clustered signals scatter around this reference, the within-cluster variance is defined as
$$\begin{aligned} \sigma _{C_{k}}^{2} = \frac{1}{T_{k}}\sum _{t=1}^{T_{k}}\sum _{x_{i,t}\in C_{k,t}}^{}\left\| x_{i,k,t} - \bar{x}_{k,t} \right\| _{2}^{2} \end{aligned}$$
(1)
where \(C_{k}\) is the cluster for pattern k, \(\bar{x}_{k,t}\) is the cluster mean of cluster k at timestep t, \(x_{i,k,t}\) is data point i of cluster k at timestep t, and \(T_{k}\) is the number of timesteps available for cluster k.
Thus the within-cluster variance (Eq. 1) can be calculated from the individual data points of the respective cluster and then used to derive tolerance limits for this reference pattern. The limits can, for example, be calculated using the standard deviation of the cluster members (Eq. 2), which is a simple but extensible approach for identifying deviations:
$$\begin{aligned} \begin{aligned} Tr_{k}^{U} = \{\bar{x}_{k,t} + \epsilon \sqrt{\sigma _{C_{k}}^{2}}, \quad 0\le t \le T_{k}\} \\ Tr_{k}^{L} = \{\bar{x}_{k,t} - \epsilon \sqrt{\sigma _{C_{k}}^{2}}, \quad 0\le t \le T_{k}\} \end{aligned} \end{aligned}$$
(2)
The parameter \(\epsilon\) can be used to manually weight the standard deviation of a sample.
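A minimal sketch of Eqs. (1) and (2) is given below. It assumes that all cluster members have been brought to the common length \(T_{k}\) (e.g. by the resampling used during clustering); \(\epsilon\) is the user-chosen weighting of the standard deviation.

```python
# Sketch: within-cluster variance (Eq. 1) and basic tolerance band (Eq. 2).
import numpy as np

def reference_and_variance(members):
    """members: array of shape (n_members, T_k) for one cluster k."""
    members = np.asarray(members, dtype=float)
    mean = members.mean(axis=0)  # cluster mean \bar{x}_{k,t}
    # within-cluster variance per Eq. (1): sum over members and timesteps,
    # normalised by the number of timesteps T_k
    var = ((members - mean) ** 2).sum() / members.shape[1]
    return mean, var

def tolerance_band(mean, var, epsilon=1.0):
    """Upper and lower tolerance limits per Eq. (2)."""
    margin = epsilon * np.sqrt(var)
    return mean + margin, mean - margin
```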
The disadvantage of this generic approach is that in areas of low signal variance, the threshold values lie very close to the actually observed (and thus clustered) signal values. An example of this problem is shown in Fig. 2, where regions of low and high variance exist within one time series. This complicates the choice of the parameter \(\epsilon\). In addition, it is visible that in areas with large gradients, the signal trajectories are close to the boundaries in the time direction, making the anomaly detection susceptible to temporal shifts.
To overcome these problems, a modified approach for the calculation of the tolerance range is used.
To increase the spacing of the tolerance range in areas with high signal gradients along the time axis, the upper and lower threshold values are calculated as moving maxima and minima. Two parameters \(t_{b}\) and \(t_{f}\) define the window size for this moving calculation over the signal values of all cluster members \(i \in I_{k}\), which results in a hull curve of the signal as follows:
$$\begin{aligned} \begin{aligned} x_{k,t}^{U} = \max _{t-t_{b} \le \tau \le t + t_{f},\, i \in I_{k}} x_{i,k,\tau }, \quad 0 \le t \le T_{k}, \\ x_{k,t}^{L} = \min _{t-t_{b} \le \tau \le t + t_{f},\, i \in I_{k}} x_{i,k,\tau }, \quad 0 \le t \le T_{k} \end{aligned} \end{aligned}$$
(3)
Based on these upper and lower thresholds, the adjusted tolerance range, which takes the waveform into account, is calculated according to the following rule:
$$\begin{aligned} \begin{aligned} Tr_{k}^{U} = \{ x_{k,t}^{U} + \epsilon _{1}(P_{99}^{k}) + \epsilon _{2}(x_{k,t}^{U}-x_{k,t}^{L}) + \epsilon _{3}\sigma _{C_{k}} ^{2}, \quad 0 \le t \le T_{k}\} \\ Tr_{k}^{L} = \{ x_{k,t}^{L} - \epsilon _{1}(P_{99}^{k}) - \epsilon _{2}(x_{k,t}^{U}-x_{k,t}^{L}) - \epsilon _{3}\sigma _{C_{k}} ^{2}, \quad 0 \le t \le T_{k}\} \end{aligned} \end{aligned}$$
(4)
\(x_{k,t}^{U}\) and \(x_{k,t}^{L}\) denote the previously formed hull curves, which result from the sliding maxima and minima over the respective clustered signal values. \(\epsilon _{1}, \epsilon _{2}, \epsilon _{3}\) denote hyperparameters that determine the influence of the corresponding terms. The value \(P_{99}^{k}\), weighted by \(\epsilon _{1}\), corresponds to the 99th percentile of the difference between the hull curve and the representative of cluster k. It is used to form a wider tolerance range for noisy signals, since \(P_{99}^{k}\) is larger in that case. Using \(\epsilon _{2}\), the distance between the upper and lower values of the envelope is weighted and included in the calculation. Finally, the intra-cluster variance \(\sigma _{C_{k}} ^{2}\) is included in the tolerance band via the weighting parameter \(\epsilon _{3}\), which can be used as an alternative to or in combination with \(P_{99}^{k}\) to account for the noise.
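The sketch below follows Eqs. (3) and (4). The window parameters and weights are placeholders, and computing \(P_{99}^{k}\) over the differences of both hull curves to the cluster mean is our reading of the definition above, stated here as an assumption.

```python
# Sketch: hull curves (Eq. 3) and adjusted tolerance band (Eq. 4).
import numpy as np

def hull_curves(members, t_b, t_f):
    """Moving max/min over all cluster members within [t - t_b, t + t_f]."""
    members = np.asarray(members, dtype=float)
    T = members.shape[1]
    upper, lower = np.empty(T), np.empty(T)
    for t in range(T):
        lo, hi = max(0, t - t_b), min(T, t + t_f + 1)
        window = members[:, lo:hi]
        upper[t], lower[t] = window.max(), window.min()
    return upper, lower

def adjusted_band(members, mean, var, t_b=5, t_f=5, eps1=1.0, eps2=0.1, eps3=0.0):
    upper, lower = hull_curves(members, t_b, t_f)
    # P99: 99th percentile of the hull-to-representative difference
    # (taken over both hull curves here -- an assumption)
    p99 = np.percentile(np.concatenate([upper - mean, mean - lower]), 99)
    margin = eps1 * p99 + eps2 * (upper - lower) + eps3 * var
    return upper + margin, lower - margin
```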
Figure 3 shows the formation of the tolerance band according to these rules for two different parameter combinations; in addition, the tolerance range for a full signal that has not been divided into individual sub-sequences is shown to emphasize local changes caused by stronger noise.
If the system detects a previously occurring pattern as described at the beginning of the chapter, a comparison is made with the previously stored reference. It is therefore checked whether the observed value lies outside the tolerance range formed for the associated cluster (Fig. 4).
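This check can be sketched as a simple sample-wise comparison against the stored band of the associated cluster:

```python
# Sketch: flag samples of the parallel channel signal (e.g. torque) that
# leave the tolerance band of the re-detected pattern's cluster.
import numpy as np

def detect_anomalies(signal, band_upper, band_lower):
    """Return the indices at which the observed value lies outside the band."""
    signal = np.asarray(signal, dtype=float)
    outside = (signal > band_upper) | (signal < band_lower)
    return np.flatnonzero(outside)
```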
For certain anomalies, the drift away from normal values develops over a prolonged period. This requires additional care, as the extracted sub-sequences are usually short and such anomalies may therefore not be detected immediately. To address this issue, time series are marked specifically during training, which helps with detecting such slowly developing anomalies later on.
Human-in-the-loop
After an anomaly has been detected, labeling is carried out by integrating a human component so that, upon re-occurrence, anomalies can be classified cluster-dependently and thus across processes. This is possible because anomalies are considered on a sub-sequence basis, independently of the higher-level signal and the associated processing procedure; we assume that anomalies such as blowholes that appear in specific sub-processes present similarly regardless of the individual machining process. This labeling extends the unsupervised system so that classical supervised machine learning approaches can be used. Moreover, such labeling, combined with the prior partitioning of signals into sequences independent of the originally performed machining process, circumvents a common problem in error detection for production scenarios: training a classifier usually requires similar training data, which is costly to generate. In addition, a classifier trained in this way can only be applied to new data with a structure similar to the training data, which means that many conventional systems can only be used in a very process-specific way. By separating the signal from the actual process, however, it becomes possible to reuse similar patterns across processes for anomaly detection, which simplifies data acquisition.
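As an illustration of how the operator labels could feed a supervised stage, a hedged sketch is given below. The per-cluster organization follows the description above, while the feature representation and the choice of a random forest classifier are assumptions made for this example.

```python
# Sketch: train one supervised classifier per process cluster on
# operator-labeled anomalous sub-sequences.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def train_cluster_classifiers(labeled_subsequences):
    """labeled_subsequences: dict cluster_id -> list of (feature_vector, label)."""
    classifiers = {}
    for cluster_id, samples in labeled_subsequences.items():
        X = np.vstack([features for features, _ in samples])
        y = np.array([label for _, label in samples])
        classifiers[cluster_id] = RandomForestClassifier(
            n_estimators=100, random_state=0
        ).fit(X, y)
    return classifiers

def classify_anomaly(classifiers, cluster_id, feature_vector):
    """Classify a newly detected anomalous sub-sequence of a known cluster."""
    clf = classifiers.get(cluster_id)
    return None if clf is None else clf.predict(feature_vector.reshape(1, -1))[0]
```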