Continuous modelling of machine tool failure durations for improved production scheduling

Unforeseen machine tool failures due to technical issues can cause downtimes leading to delays during production. To reduce delays, rescheduling of the production is, in most cases, necessary. However, warranting such a change requires reliable knowledge about the duration of the failure. This article presents a method to provide this knowledge by estimating the duration of a machine tool failure based on previous failure durations. Using the cross-industry standard process for data mining (CRISP-DM) and statistical methods, the embedded model for failure classification and duration is continuously improved. The method is thoroughly tested using multiple distributions, parameters and a practical use case. The results show high potential for predicting the duration of machine tool failures, which consequently could lead to improved quality of rescheduling.


Introduction
Growing customer expectations and technological developments have led to an increased complexity of manufacturing systems. Companies face stochastic disturbances and cyclic demands and consequently they struggle to achieve their full manufacturing capacity [1]. Additionally, they have to deal with shorter product and technology lifecycles, increasing number of variants, customized products, short delivery times and competitive pricing [2][3][4]. Thus, efficient production planning and control (PPC) has become a crucial competitive advantage for many companies [1].
However, stochastic influences on the production time make exact planning difficult. For example, machine tool failures (MTF), defined by Iserman as "a permanent interruption of a system's ability to perform a required function under specified operating conditions" [5], are very common. These are often caused by technical issues. Hence, different approaches exist in the area of maintenance (preventive or predictive maintenance to avoid them) [6]. Nevertheless, the implementation of these approaches is not always possible. Therefore, the quality of production control depends on the ability to forecast the time to restoration (TTR) as accurately as possible and to use this information in a targeted manner when rescheduling orders.
In practice, a forecast of MTF durations is often difficult to obtain due to the poor quality of the operating data used for the forecast. These are usually recorded manually and are therefore often faulty. Hence a methodology for predicting MTF durations is required. This must ensure the quality of manually recorded failures by using suitable routines. At the same time, the quality of predicted failure durations by means of a structured and as complete as possible classification of the recorded MTF must also be guaranteed.

State of the art
In common methods for the modelling of MTF events, an exponential distribution [7][8][9] or an Erlang distribution [10,11] is assumed for the entire failure duration of a machine tool. The Weibull distribution or the logarithmic normal distribution are also often applied [12]. To model these distributions, the mean value of the TTR (MTTR) is used. Due to the complexity of disturbances like MTF, such a description is a simplification [13] because a distinction between different MTF classes is often not made. As a result, the MTTR is obtained on the basis of data from many different failure events. Thus, in practice, the data is enhanced by the expert knowledge of the production planner or maintenance engineer. However, there is no systematic analysis, which may lead to sub-optimal planning results.
An approach with automatic and structured failure classification and individualized forecast values for each failure class is presented by Oladokun. Variables that affect the duration of an MTF are identified as input for an artificial neural network. The output variable is a failure duration class which divides the duration into three possible time intervals, depending on how strongly the duration influences production planning. Even though the accuracy of the forecast model is reported as 70% [14], it only uses a fixed and small number of failure duration classes. Hence, the output cannot serve as a basis for high-quality rescheduling decisions.
Research in the area of maintenance as well as condition and health monitoring focuses on the diagnosis of faults and the prognosis of wear on tools and components [15,16]. For example, event-related or condition-related data is used for information retrieval in condition-based maintenance [17]. The event-related data include machine failures, maintenance measures or information on the tool, the machining object or the process [17][18][19]. In contrast, condition-based data are measured values that reflect the condition of the unit under investigation. For this purpose, the term sensor fusion refers to various theoretical and practical approaches to combine several or different signals of machine tools to obtain more desirable results [20,21]. In this context, this means more precise, complete, reliable or robust results than a single signal or one type of signal [22]. Neither summaries of the state of the art [20][21][22][23][24][25] nor the individual publications considered in this article provided results for the prognosis of MTF durations.
Other research areas investigate the suitability of different methods to determine an exact prognosis of accident durations [26], the occupancy time of hospital beds [27,28] and the duration of surgeries [29]. Although these cases are similar to the problem of MTF duration prognosis discussed in this article, the solutions cannot simply be transferred because extensive data sets are needed to train the underlying models. Especially, small and medium-sized enterprises (SMEs) often do not have the resources or the incentive for a detailed and continuous data recording of their production processes [30].

Methodology for predicting MTF durations
The review of the state of the art shows that there is a lack of research with regard to the prognosis of MTF durations in the area of PPC. Established indicators such as the unclassified MTTR are too imprecise for the prognosis of individual MTF. Therefore, the methodology presented in Fig. 1 was developed according to the concept of the "Cross-industry standard process for data mining" (CRISP-DM). CRISP-DM defines the requirements and work steps shown in Fig. 2 for the creation of models with a high forecast quality. Its four core steps are described in Sects. 3.1-3.4 and are included in the methodology for MTF prognosis. As a result, the methodology enables systematic processing of MTF events and provides the production planner with a statistically validated prognosis of MTF durations.

Data comprehension
A prediction model must first be trained to forecast MTF durations. This can be done by either using historical data or information about MTF that occur after implementation.
In the latter case, the use of the method must ensure that a forecast is not made until sufficient data for statistical validation is available. For data recording, tuples consisting of the MTF duration and the MTF class (e.g. tool breakage) are entered into the model via a user interface. If there are at least n = 3 + l observations for the current MTF class, an adjusted box plot is calculated for determining the position parameters. This enables a descriptive characterization and evaluation of the observations. The variable l represents the threshold value for manual intervention (cf. Sect. 3.2). The adjusted box plot is used because it takes into account the skewness of a distribution when calculating the whiskers by using the Medcouple (MC) [31]. This is done to avoid the incorrect identification of many potential outliers in the case of skewed distributions. The MC is calculated according to Brys [32] from the sorted sample X with n independent observations.
The whisker interval is calculated separately for the cases MC ≥ 0 and MC ≤ 0 from the interquartile range IQR, the lower quartile Q1 and the upper quartile Q3.
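The adjusted box plot described above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the implementation used in this article: the function names are hypothetical, the medcouple is computed naively over all pairs (pairs tied at the median are skipped), and the whisker factors exp(−4·MC) and exp(3·MC) for MC ≥ 0 (exp(−3·MC) and exp(4·MC) for MC < 0) are the ones commonly stated for the adjusted box plot in the literature, which is an assumption about the variant meant here.

```python
import math
import statistics


def medcouple(data):
    """Naive O(n^2) medcouple; assumes few ties at the median (simplification)."""
    x = sorted(data)
    med = statistics.median(x)
    lower = [v for v in x if v <= med]
    upper = [v for v in x if v >= med]
    # Kernel h(x_i, x_j) = ((x_j - med) - (med - x_i)) / (x_j - x_i)
    kernel = [
        ((xj - med) - (med - xi)) / (xj - xi)
        for xi in lower
        for xj in upper
        if xj != xi  # skip degenerate pairs (both equal to the median)
    ]
    return statistics.median(kernel) if kernel else 0.0


def adjusted_whiskers(data):
    """Return (lower, upper) whisker limits of the adjusted box plot."""
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    mc = medcouple(data)
    if mc >= 0:
        # Right-skewed or symmetric: stretch the upper whisker, shrink the lower one.
        return (q1 - 1.5 * math.exp(-4 * mc) * iqr,
                q3 + 1.5 * math.exp(3 * mc) * iqr)
    # Left-skewed: mirror-image factors.
    return (q1 - 1.5 * math.exp(-3 * mc) * iqr,
            q3 + 1.5 * math.exp(4 * mc) * iqr)


if __name__ == "__main__":
    durations_h = [1, 2, 3, 4, 5, 7, 10, 15, 30]  # made-up, right-skewed sample
    print(medcouple(durations_h), adjusted_whiskers(durations_h))
```

For a symmetric sample the medcouple is zero and the limits reduce to the classical 1.5·IQR whiskers; for a right-skewed sample the upper whisker is extended, so that long but plausible failure durations are not flagged prematurely.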

Data preparation
For data preparation, the observations are compared with the previously determined limit values. Observations that lie outside the whiskers represent extreme values and, thus, could have a negative influence on the prediction quality of the model. Possible causes for such extreme values are:
1. The observations represent correct MTF durations of the MTF class considered, they are real extreme values. An annotation is internally made in the model, so that these points are no longer considered when comparing with the threshold value l.
2. The observations represent the correct MTF duration for the failure class considered but a modification (e.g. maintenance measure, improvement) has taken place on the machine, which has caused a significant change in the failure distribution. The data collected up to the time of modification are excluded from further prognosis.
3. The observations are outliers which do not reflect the distribution of the failure duration as expected but are influenced by further effects (e.g. input errors). These values can be removed or corrected by the employee from the dataset.
4. The observations are classified as extreme values because they belong to a different MTF class. In this case, it is possible for the employee to transfer the observations to another or a new class.
According to the recommendations in the literature, cases 2 to 4 lead to manual interventions by the machine operator or production planner. To keep the interventions at an appropriate level, the number of potential outliers u is compared with the threshold value l. If there are enough observations for the failure class considered and, at the same time, at least l observations outside the whiskers, the employee is made aware of this fact. For this purpose, the adjusted box plots of the failure classes are displayed and the employee can qualitatively evaluate the extreme values. After a manual intervention in the data sets, a new check is initiated for the affected failure classes, provided that sufficient observations are available.
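The trigger for a manual intervention can be illustrated as follows. The sketch assumes that the whisker limits have already been computed and that observations annotated as real extreme values (case 1) are passed in by index. The function and parameter names are hypothetical and the sample durations are made up.

```python
def needs_manual_review(durations, lower, upper, l, confirmed_extremes=frozenset()):
    """Return True if the employee should be asked to inspect the failure class.

    durations          -- MTF durations of one failure class, in recording order
    lower, upper       -- whisker limits of the adjusted box plot
    l                  -- threshold value for manual intervention
    confirmed_extremes -- indices already annotated as real extreme values (case 1)
    """
    # A check is only made once at least n = 3 + l observations are available.
    if len(durations) < 3 + l:
        return False
    # u = number of potential outliers that have not been annotated yet.
    u = sum(
        1
        for index, value in enumerate(durations)
        if (value < lower or value > upper) and index not in confirmed_extremes
    )
    return u >= l


if __name__ == "__main__":
    sample = [1.0, 1.2, 0.9, 1.1, 8.0, 1.0, 9.5, 7.9]  # hours, illustrative only
    print(needs_manual_review(sample, lower=0.5, upper=2.0, l=3))
```

Once the employee confirms one of the three long durations as a real extreme value, only two unannotated candidates remain and no further intervention is requested for l = 3.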

Modelling
If the considered data set passes the test for extreme values, data modelling follows. For individual MTF classes, it can be assumed on the basis of the central limit theorem that a production system consisting of overlapping stochastic input variables results in normally distributed output variables [33]. This assumption is tested during the modelling step using the Shapiro-Wilk test (SW test) [34]. Depending on the results, either the median or the mean value of the data set under consideration is used for MTF duration prognosis. The tested data set deviates significantly from a normal distribution if the null hypothesis of the test is rejected and the alternative hypothesis is assumed. In this case, the robust prediction of the MTF duration is based on the median because of its high breakdown point compared to the mean value. This prevents individual atypical MTF durations from influencing the forecast and can have a positive influence on the prognosis quality [35]. In addition, the median represents the entry value with the highest probability in unimodal distributions that are not normal or are skewed, and is therefore usually preferred over the mean value [36].
Fig. 2 Cross-industry standard process for data mining [39]
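The choice between mean and median can be sketched as a small decision function. This is an illustrative sketch only: the significance level of 0.05 for the SW test is an assumption (the text does not state one), the function names are hypothetical, and the p-value is obtained from `scipy.stats.shapiro`, which is imported lazily so that another normality test can be substituted.

```python
import statistics


def shapiro_wilk_p_value(durations):
    """p-value of the Shapiro-Wilk test (requires the third-party SciPy package)."""
    from scipy.stats import shapiro

    return float(shapiro(durations)[1])


def position_parameter(durations, sw_alpha=0.05, p_value_of=shapiro_wilk_p_value):
    """Pick the forecast value for one MTF class.

    Returns ("median", value) if normality is rejected, else ("mean", value).
    sw_alpha is an assumed significance level for the normality test.
    """
    p_value = p_value_of(durations)
    if p_value < sw_alpha:
        # Null hypothesis (normal distribution) rejected: use the robust median.
        return "median", statistics.median(durations)
    return "mean", statistics.fmean(durations)


if __name__ == "__main__":
    sample = [0.8, 1.1, 0.9, 1.3, 1.0, 1.2, 6.5]  # hours, illustrative only
    print(position_parameter(sample))
```

Passing the test as a parameter keeps the decision rule separate from the statistics library, which also makes the rule easy to check with fixed p-values.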

Evaluation and interpretation
Lastly, the statistical significance of the position parameter is examined. For normally distributed data, Kröning's method for accuracy prediction is used to calculate the accuracy e [37]. For non-normally distributed data, the observations j and k are identified which correspond to α/2 = α_j and 1 − α/2 = α_k of the cumulative binomial distribution B(n, 0.5) [38]. Since the binomial distribution is a discrete distribution, the limits of the confidence interval for the median are additionally linearly interpolated. In this way, the required confidence probability of 1 − α is achieved. However, if the values at the ranks neighboring j and k are identical to the values at the respective ranks themselves, an interpolation may lead to false results. In this case, the confidence interval without interpolation is used.
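The rank-based confidence interval around the median can be illustrated for the case without interpolation. The sketch below uses the usual order-statistic construction with B(n, 0.5): the lower rank j is the largest rank whose cumulative probability does not exceed α/2, and the upper rank is k = n − j + 1. The function names are hypothetical and the linear interpolation mentioned above is deliberately left out.

```python
from math import comb


def binom_cdf_half(k, n):
    """P(B <= k) for B ~ Binomial(n, 0.5)."""
    if k < 0:
        return 0.0
    return sum(comb(n, i) for i in range(min(k, n) + 1)) / 2 ** n


def median_confidence_interval(durations, alpha):
    """Return (lower, upper, actual_confidence) or None if n is too small.

    The interval is [x_(j), x_(k)] on the sorted sample, without interpolation,
    so the actual confidence probability is at least 1 - alpha.
    """
    x = sorted(durations)
    n = len(x)
    j = 0
    # Largest 1-based rank j with P(B <= j - 1) <= alpha / 2.
    while j < n and binom_cdf_half(j, n) <= alpha / 2:
        j += 1
    if j == 0:
        return None  # not enough observations for the required confidence
    k = n - j + 1
    actual_confidence = 1.0 - 2.0 * binom_cdf_half(j - 1, n)
    return x[j - 1], x[k - 1], actual_confidence


if __name__ == "__main__":
    sample = [3, 1, 4, 1.5, 9]  # hours, illustrative only
    print(median_confidence_interval(sample, alpha=0.0625))
```

With five observations and α = 0.0625 the interval spans the whole sample and the actual confidence probability is 0.9375; with only four observations no interval satisfying the requirement exists.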
If the minimum of one is defined as the intervention limit for the threshold value l, the iteration ends at this point. If the confidence interval was able to maintain the required accuracy, a reliable forecast value is available. If the intervention limit is set higher than one, this leads to a further loop-like check. From all permissible observations of the current MTF class, the most recent observation is removed in l iterations and the verification of the confidence interval for the required accuracy is repeated. The purpose of this is to secure the forecast against potential outliers whose number u is below the threshold value l. At the same time, the volatility of the confidence interval in stochastic observations is compensated because the randomness of MTF durations does not permit a strictly monotonically decreasing confidence interval with an increasing number of observations. Thus, even with a high intervention limit and the ranking-based confidence interval around the median, the quality of prognosis is assured.
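The loop-like check can be sketched as follows. The sketch reflects one reading of the description, namely that the accuracy check is run l times in total: once on all permissible observations and then with the most recent one, two, ..., l − 1 observations removed. The accuracy check itself is passed in as a callable; `interval_is_accurate` and the function name are hypothetical.

```python
def forecast_is_secured(durations, l, interval_is_accurate):
    """Repeat the confidence-interval check while dropping the latest observations.

    durations            -- permissible observations in recording order
    l                    -- intervention threshold (l >= 1)
    interval_is_accurate -- callable(list) -> bool, the accuracy check for one subset
    """
    if l < 1:
        raise ValueError("l must be at least 1")
    for removed in range(l):
        subset = durations[: len(durations) - removed]
        # Every subset must meet the required accuracy on its own.
        if not interval_is_accurate(subset):
            return False
    return True


if __name__ == "__main__":
    # Stand-in check: pretend the interval is accurate once 5 observations exist.
    enough = lambda obs: len(obs) >= 5
    print(forecast_is_secured([1.0] * 7, l=3, interval_is_accurate=enough))
```

Under this reading, the forecast is only released if up to l − 1 of the latest observations could turn out to be undetected outliers without the accuracy requirement being violated.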

Experimental evaluation
First, the functionality of the developed methodology for prognosis of MTF durations and, in particular, its practical applicability is tested. For this purpose, it is examined whether the method fulfils the following characteristics:
1. Quantity of data: The prediction of MTF duration should be able to deliver trustworthy results even with small quantities of data, so that a fast adoption for planning and scheduling processes is made possible.
2. Data quality: Against the background of stochastic effects of a real production, the forecast should be able to identify and process incorrect values. At the same time, the system should be able to adequately process atypical behavior.
3. Interaction: Interaction with the user should be reduced to a minimum to save resources. However, this must not contradict data quality. Atypical behavior must trigger an interaction quickly.
Subsequently, the effect of the more precise prognosis options on corporate key figures is examined using an exemplary production scenario.

Model functionality and practical applicability
For the first part of the evaluation, different configurations of the model are considered as part of sensitivity analyses. In this way, the direction of effect of parameter changes becomes apparent and problematic configurations are identified. In addition, various generated distributions are used.

Model behavior with small amounts of data
Particularly with small amounts of data, the choice of an appropriate error probability α is subject to a conflict of objectives. On the one hand, a high probability of error makes it possible to obtain a forecast value for small amounts of data. On the other hand, an error probability that is too high contradicts the objective of obtaining reliable values for the forecast. The aim of first investigations is therefore to clarify whether a situation that influences the choice of the confidence probability 1 − α exists or enables specific recommendations for action for its choice. Therefore, the mean and median value of a normal distribution as a function of standard deviation σ, error probability α and number of observations n are investigated. The detailed test configurations are shown in Table 1.
The investigations of the confidence interval around the mean value for normal distributions with different scattering initially show only trivial correlations, which confirm the basic functionality of the method.
For the confidence interval around the median, the case without interpolation was focused on because here the actual confidence probability does not necessarily correspond to the required one. The test results presented in Table 2 show the actual confidence probability α_k − α_j, with the lower rank j and the upper rank k of x(i), which represent the limits of the confidence interval. For each of the four error probabilities examined, at least n_α observations must be available to ensure that the actual confidence probability does not fall below the required confidence probability 1 − α. These are indicated by the dotted lines in Table 2. To ensure the functionality of the prognosis model during the evaluation part (cf. Fig. 1 and Sect. 3.4), n_α has to be corrected by the expression (l − 1). As a result, the required minimum number of observations n_min is calculated as follows:
$$n_{\min} = n_{\alpha} + l - 1 \qquad (9)$$
where n_min is the minimum number of observations and n_α is the number of observations required to ensure the confidence probability without interpolation.
The graphical comparison of confidence and error probability as well as the number of observations in Fig. 3 demonstrates that this problem can be disregarded from α = 0.125 upwards, since at least four observations are required for a forecast in any case.
However, the facts presented do not restrict the use of the method because the method also ensures compliance with the confidence probability for α < 0.125. It is only necessary to examine individually whether a deviation from the standard value α = 0.05 to α = 0.0625 is reasonable, as the minimum number of observations can then be reduced by one.

Interaction effort with different data quality
The choice of the intervention threshold l and the distribution function of the MTF have a decisive influence on the interaction effort. The latter depends on the considered machine tool as well as other framework conditions of a company (e.g. capacities for maintenance) and can therefore not be directly influenced. On the other hand, the threshold value l represents a configuration variable of the forecast model. Therefore, the effect of different threshold values on the resulting interaction effort is investigated. Detailed information on the experimental settings is shown in Table 3.
The main experimental results are summarized in Fig. 4. The probability for none, one and two manual interventions is shown as a function of the threshold value l. The probability of no necessary manual intervention for all three distributions examined is highest from l = 5. This also implies that, in the worst case, outliers are only detected if five extreme values are already present. The choice of l = 3 as threshold value is therefore more practical. Here, the relative frequency for a single manual intervention is highest for the first time.
The model behavior for Weibull distributed input values is similar to that for normally distributed values. However, for the logarithmic normal distribution, the limit value for no manual intervention is l = 6 and the threshold value for one manual intervention is l = 4. Since the adjusted box plot takes into account the skewness of the distribution, this cannot be the only cause for the slightly deviating model behavior. The extent to which the scattering behavior could be the cause is still to be examined.
Lastly, it should be noted that it is possible to design data sets that lead to undesired but mathematically correct model behavior. For example, the median for a data set with two perfectly alternating growing centers, which differ exclusively in the mean value, lies exactly in the middle of the centers at every second iteration. This corresponds to the mean value between the two centers and does not allow adequate differentiation. In addition to the accuracy e, the threshold value l also has a significant influence on the prevention of this problem. For l > 3, the two distributions accumulate enough observations to cancel out the detection of skewness with further uniform growth. In this case, the box plot is not able to provide sufficient differentiation via the parameters used.
To investigate the extent to which these theoretical limitations influence the practicability of the methodology and to estimate its practical potential, the model behavior is analyzed in a use case in the next step. Based on the results above, α is set to 0.0625 and l is set to 3 for this use case.
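For these settings, the minimum number of observations per MTF class can be worked out directly. The sketch assumes that the number of observations needed without interpolation is the smallest n for which the interval between the smallest and the largest observation reaches the required confidence, i.e. 0.5^n ≤ α/2, and then adds the correction l − 1 described in Sect. 4.1. The function names are hypothetical.

```python
def n_alpha(alpha):
    """Smallest n with 0.5**n <= alpha / 2 (rank-based interval, no interpolation)."""
    if not 0 < alpha < 1:
        raise ValueError("alpha must lie in (0, 1)")
    n = 1
    while 0.5 ** n > alpha / 2:
        n += 1
    return n


def n_min(alpha, l):
    """Minimum number of observations including the correction l - 1."""
    return n_alpha(alpha) + l - 1


if __name__ == "__main__":
    for alpha in (0.125, 0.0625, 0.05):
        print(alpha, n_alpha(alpha), n_min(alpha, l=3))
```

Under this assumption, α = 0.125 requires four observations, α = 0.0625 five and α = 0.05 six, so that the settings α = 0.0625 and l = 3 lead to a minimum of seven observations per class.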

Exemplary application of the model
The added value of a more accurate prediction possibility of MTF durations is investigated on the basis of the production of a sample component. Details on the use case are given in Fig. 5. Oladokun's classification into the three fault classes "short term" (≤ 0.5 h), "medium term" (0.5-2 h) and "long term" (≥ 2 h), which has been available so far as prediction model, is used as reference for evaluating the newly introduced methodology [14].
To evaluate the planning quality, a labelled training data set of MTF reports for six exemplary MTF classes (cf. Table 4) with 1000 fault events each is generated and randomly mixed. This data set is manipulated in such a way that the number of correct messages (assignment of MTF class to MTF duration) is approximately 50%. This simulates the poor quality of manual production data collection (PDC) inputs which frequently occurs in practice. The prognosis model is then trained with the aid of the manipulated data set by entering the disturbance events one by one. The class assignment is corrected in the case of an automatic error detection. In the next step, the resulting forecast values x_prog are stored for planning in a Manufacturing Execution System (MES) as a one-time MTF for the second milling machine. An exemplary production program with 40 orders of different batch sizes is created and the resulting lead times (LT) are calculated. In addition, the mean values of the disturbance classes according to Oladokun are used for three further planning scenarios.
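The reference planning values can be illustrated with a small sketch that assigns durations to the three duration classes named above and averages them per class. The durations in the example are made up, the function names are hypothetical, and the handling of the class boundaries follows the limits given in the text (≤ 0.5 h, 0.5-2 h, ≥ 2 h).

```python
import statistics


def oladokun_class(duration_h):
    """Assign a failure duration in hours to one of the three reference classes."""
    if duration_h <= 0.5:
        return "short term"
    if duration_h < 2.0:
        return "medium term"
    return "long term"


def class_means(durations_h):
    """Mean duration per reference class, used as the planning value of that class."""
    groups = {}
    for duration in durations_h:
        groups.setdefault(oladokun_class(duration), []).append(duration)
    return {name: statistics.fmean(values) for name, values in groups.items()}


if __name__ == "__main__":
    print(class_means([0.25, 0.5, 1.0, 1.5, 2.0, 4.0]))  # illustrative durations
```

Because every failure within a class receives the same planning value, failures of very different lengths are planned identically, which is the coarseness the class-specific forecast is meant to avoid.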
To analyze the planning quality of all considered scenarios, production is simulated with the help of a material flow simulation model in Tecnomatix Plant Simulation. This digital twin of the considered production system takes into account the correct distribution function s(x) for the six disturbance scenarios (cf. Table 4). To statistically secure the simulated LT, five simulation runs per scenario are performed.
The results show that an underestimated MTF duration due to a lack of accuracy of prognosis can quickly lead to large deviations between planned and actual LT. The analyses indicate that, with the new methodology, the disturbance durations used for forecasting are a good approximation of the real disturbance distribution, even if the quality of the manual PDC messages is medium. The real mean values are predicted with a quality of at least 97.3%. In comparison, the Oladokun model only achieves a quality of 70% with an ideal input database. This aspect was neglected in the comparison in Table 4, which is why the gained accuracy represents a minimum value.
Due to the very good disturbance duration prognosis, the predicted LT deviates from the real one by a maximum of 0.12% (absolute 0.4 h). With the old Oladokun model, the maximum deviation is 1.69% (absolute 5.97 h). As more than one single fault event usually occurs during the production of an order in the production system, this reduction can already have a strong influence on the PPC performance of a company.

Conclusion
The model developed within the scope of this work on the basis of the CRISP-DM concept is used for the reliable prognosis of MTF durations. Depending on the underlying MTF duration distribution, the median or mean value is used for the prediction. An input-parallel check of PDC inputs helps to avoid incorrect reporting and allows a forecast accuracy of over 97%. For practical validation and research, the prognosis model was tested with different configurations and different data sets. The tests showed that the model can handle small amounts of data as well as poor data quality. Additionally, it has been shown that α = 0.0625 and l = 3 are suitable as default settings for the forecast model. An additional case study showed that the presented methodology is applicable for LT prediction, as the maximum deviation between target and actual LT was 0.12% and thus clearly below the value of an alternative MTF forecast model.
The aim of further research is to examine the performance of the approach in the context of a more complex failure class distribution. The ability of the prognosis to cope with a higher number and closely arranged MTF classes is of significant interest. In addition, the required amount of input data should be an object of investigation. Moreover, the extent to which the model correctly recognizes error classes that occur in rare cases must be examined. Finally, the transferability of the method should be investigated. The universal approach should be able to handle any kind of interruption (e.g. shortages of material) in the production.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.