The goal of this evaluation was to gather new, useful knowledge about the assembly line using the proposed data analytics method and to identify the best technique for each individual area. In this study, an exploratory validation approach was used to find the best ML model.
In Fig. 3, different areas of data analytics are described, and an evaluation is presented based on these different areas.
Area A
Experts from the manufacturing company provided a set of the most relevant measurements corresponding to faults. In Phase 1, the objective was to find the correlation coefficients between each of the 42 measurements and STATION. However, this method was found to be time-consuming: the MATLAB command ‘corrplot’ for finding correlations produced a 42×42 matrix that was difficult to interpret. Another way to implement the Phase 1 analysis is analysis of variance (ANOVA), in which p-values are used to select the most informative measurements [28]. The authors of [28] discarded measurements based on their p-values. However, this work does not use ANOVA because the dataset was not normally distributed in certain cases.
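As an illustration of this Phase 1 step, the following minimal Python sketch (on synthetic stand-in data; the study itself used MATLAB’s ‘corrplot’) contrasts the full, hard-to-read 42×42 matrix with a per-measurement correlation against the station column:

```python
import numpy as np
import pandas as pd

# Synthetic stand-in: 42 measurement columns plus a numeric 'STATION' column.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(500, 42)),
                  columns=[f"m{i}" for i in range(42)])
df["STATION"] = (df["m0"] + rng.normal(scale=0.5, size=500) > 0).astype(int)

# Full 42x42 correlation matrix (the hard-to-interpret 'corrplot' analogue).
corr_matrix = df.drop(columns=["STATION"]).corr()

# Correlation of each measurement with STATION, sorted by absolute strength.
corr_with_station = (
    df.corr()["STATION"].drop("STATION").sort_values(key=abs, ascending=False)
)
print(corr_with_station.head(10))
```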
Phase 1 analysis could also be implemented following the method of Andrew and Srinivas [29], who deleted one measurement at a time to find the most important measurements; however, this method is also time-consuming. Due to these problems, we did not consider Phase 1 a suitable analysis method for this area.
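For completeness, a sketch of this delete-one-measurement-at-a-time scheme [29] is given below, again on synthetic stand-in data; it makes the time cost explicit, since every measurement requires a full re-training and cross-validation:

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Synthetic stand-in for the real data: 42 measurements, binary fault label.
X_arr, y = make_classification(n_samples=500, n_features=42, random_state=0)
X = pd.DataFrame(X_arr, columns=[f"m{i}" for i in range(42)])

baseline = cross_val_score(SVC(), X, y, cv=5).mean()

# Delete one measurement at a time [29]: one full cross-validation per
# measurement, which is why this is slow for 42 measurements.
importance = {}
for col in X.columns:
    score = cross_val_score(SVC(), X.drop(columns=[col]), y, cv=5).mean()
    importance[col] = baseline - score  # accuracy drop = importance

print(sorted(importance.items(), key=lambda kv: kv[1], reverse=True)[:5])
```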
In the next step, a set of relevant measurements was found in Phase 2 (ML algorithms). Two SVM classifiers were created: one with default hyperparameter values and another with optimized hyperparameters. SVM classification was used to classify the samples into two groups, ‘faulty’ and ‘nonfaulty’, and the linear coefficients associated with the predictors (measurements) were then compared. Both classifiers identified the same relevant measurements, and the 18 most relevant measurements are listed in Table 2. A comparison between the list of 18 measurements provided by the manufacturer and those uncovered using SVM showed that the lists agree, i.e. there is a large overlap between the measurements provided by the experts and those identified by the ML algorithm SVM. After discussion with the experts, it was confirmed that whenever a fault takes place, technicians can check the measurements in Table 2 for possible faults.
Table 2 (Area A) Relevant measurements found with the help of SVM
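A minimal sketch of how such a relevance ranking can be obtained from a linear-kernel SVM is shown below (Python/scikit-learn with synthetic data and placeholder measurement names m0–m41; this is an illustration, not the study’s implementation):

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in: 42 standardized measurements, 'faulty' vs 'nonfaulty'.
X_arr, y = make_classification(n_samples=500, n_features=42, random_state=0)
X = StandardScaler().fit_transform(X_arr)

# Linear-kernel SVM: the learned coefficients (one per measurement) can be
# ranked by absolute value to obtain the most relevant measurements.
clf = SVC(kernel="linear").fit(X, y)
coef = clf.coef_.ravel()
top18 = np.argsort(np.abs(coef))[::-1][:18]
print(pd.Series(coef[top18], index=[f"m{i}" for i in top18]))
```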
The classification results on the test dataset for the classifiers with default and optimized hyperparameters are shown in Table 3. Based on these measurements, the classifiers are useful: none of the samples were misclassified as faulty or nonfaulty, and both classifiers achieved 100% accuracy, specificity, and sensitivity. The motivation for creating a hyperparameter-optimized model was to determine whether optimization changed the performance.
Table 3 (Area A) Faulty and nonfaulty data classification using SVM on the test dataset
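The reported metrics can be computed from the confusion matrix, as in the following sketch (hypothetical test labels; 1 = faulty, 0 = nonfaulty):

```python
from sklearn.metrics import confusion_matrix

# Hypothetical test-set labels (1 = faulty, 0 = nonfaulty).
y_true = [0, 0, 1, 1, 0, 1]
y_pred = [0, 0, 1, 1, 0, 1]

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
accuracy = (tp + tn) / (tp + tn + fp + fn)
sensitivity = tp / (tp + fn)   # true-positive rate: faulty units caught
specificity = tn / (tn + fp)   # true-negative rate: nonfaulty units passed
print(accuracy, sensitivity, specificity)  # 1.0 1.0 1.0 when nothing is misclassified
```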
Phase 1 analysis also proved unsuitable for Area A: as the number of measurements grows, the difficulty of implementing Phase 1 increases exponentially. Thus, Phase 2 is best suited for this area in terms of implementation time and difficulty.
Area B
Both Phase 1 and Phase 2 analyses were implemented. Three measurements—‘Gear (Pinion) height’, ‘PTU housing measurement’, and ‘Manual adjustment’—were analysed for correlations with the shim dimension. In Phase 1, the correlation coefficients of these measurements with the shim dimension were calculated and are shown in Table 4.
Table 4 (Area B) Correlation between the shim dimension and its determining measurements. The ‘Measurements’ column lists the measurements that determine the shim dimension
As shown in Table 4, ‘PTU housing measurement’ has the highest correlation with the shim dimension, and this result also aligns with the experts’ opinions.
In Phase 2, the relative importance (i.e. the linear coefficients of the measurements associated with the shim dimension) was found using the ML algorithms LR, SVR, and RFR with default and optimized hyperparameters (Table 5). These ML algorithms predicted the shim dimension using regression models.
Table 5 (Area B) Linear coefficients associated with the measurements that determine the shim dimension. A negative value indicates that if the measurement changes in the positive direction, the shim dimension changes in the negative direction
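The following sketch illustrates, on synthetic stand-in data with default hyperparameters, how such signed coefficients (LR and linear-kernel SVR) and unsigned importances (RFR) can be extracted and compared; note that RFR importances carry no sign, unlike the linear coefficients described in the caption:

```python
import pandas as pd
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.svm import SVR

# Synthetic stand-in for the three predictors of the shim dimension.
cols = ["Gear (Pinion) height", "PTU housing measurement", "Manual adjustment"]
X, y = make_regression(n_samples=500, n_features=3, random_state=0)
X = pd.DataFrame(X, columns=cols)

lr = LinearRegression().fit(X, y)
svr = SVR(kernel="linear").fit(X, y)   # the linear kernel exposes coef_
rfr = RandomForestRegressor(random_state=0).fit(X, y)

print(pd.DataFrame({
    "LR coef": lr.coef_,                        # signed linear coefficients
    "SVR coef": svr.coef_.ravel(),
    "RFR importance": rfr.feature_importances_, # unsigned importances
}, index=cols))
```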
From the table, it can be concluded that if there is any fault in the shim dimension, it is highly probable that ‘PTU housing measurement’ has a problem; a technician can check this measurement for probable adjustment. All of the default and optimized hyperparameter models provided the same result except the default hyperparameter RFR model, for which ‘Gear (Pinion) height’ has the highest importance with regard to the shim dimension. However, this result does not align with those of the remaining models. Because the hyperparameter-optimized SVR and LR have higher accuracies (Table 8), we considered ‘PTU housing measurement’ the most important measurement. Additionally, a comparison of the default and optimized hyperparameter SVR models showed that the overall relative importance of the predictors, i.e. their effect on the shim dimension, is lower in the optimized hyperparameter model than in the default hyperparameter model.
Although both Phase 1 and Phase 2 analyses were implemented in this area, Phase 1 was easier to use than Phase 2, which involved creating regression models with hyperparameter tuning and requires knowledge of ML. The application of ML is not necessary when the target problem can easily be solved with traditional mathematics or statistics. Therefore, Phase 1 is the most suitable method of analysis for this area.
Area C
The correlation (Phase 1) between different station codes for the ‘PTU housing measurement’ was calculated, and the most highly correlated station codes are shown in Table 6 (i.e. those with a correlation coefficient higher than 0.80). The remaining station codes appeared random because their correlation coefficients were comparatively low; they are thus not listed in Table 6.
Table 6 (Area C) Correlation table of station codes; e.g. the column ‘88’ gives the correlation coefficients of station code ‘88’ with the other stations
In Phase 2 analysis, association rules were mined using the Weka platform, and the results are shown in Table 7. All of the rules have confidence levels higher than 90%. For example, the first row can be interpreted as follows: if Station 114 has no fault, then there is a 100% chance that Station 140 has no fault, corresponding to a confidence of 1. A lift value greater than 1 indicates that the rule body and rule head occur together more often than expected. A conviction value of 1 indicates that the rule body and rule head are independent, whereas a conviction value greater than 1 indicates a stronger rule. A high leverage value indicates that the rule head and rule body occur together more often than expected under independence. All of these measures, as shown in Table 7, indicate that the rules are reliable.
Table 7 (Area C) Rules mined for station codes
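The study mined these rules in Weka; as an illustration only, the sketch below performs equivalent mining in Python with the mlxtend library on an invented one-hot table of station-level events, reporting the same quality measures (confidence, lift, leverage, conviction):

```python
import pandas as pd
from mlxtend.frequent_patterns import apriori, association_rules

# Invented one-hot table: one row per PTU, one boolean column per
# station-level event (e.g. 'Station 114 no fault').
df = pd.DataFrame({
    "Station 114 no fault": [True, True, True, False, True],
    "Station 140 no fault": [True, True, True, False, True],
    "Station 90 no fault":  [True, False, True, True, True],
})

itemsets = apriori(df, min_support=0.5, use_colnames=True)
rules = association_rules(itemsets, metric="confidence", min_threshold=0.9)
# mlxtend reports confidence, lift, leverage and conviction for each rule,
# the same quality measures discussed for Table 7.
print(rules[["antecedents", "consequents", "confidence", "lift",
             "leverage", "conviction"]])
```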
However, the highly associated stations found in Phase 2 do not align with the highly correlated stations found in Phase 1. Manual checking of the stations suggests that Phase 2 is more accurate: the statistical analysis measured correlation based only on the number of faults and ignored the relationships when faults were absent, whereas the ML analysis considered the relationships between stations for both faults and non-faults. Therefore, Phase 2 is the most suitable method for this area of analysis.
Area D
To use Phase 1 in Area D, we reviewed 50 peer-reviewed papers published in 2019–2020 and considered certain statistical techniques. For example, we attempted to use spatial statistics [30]; however, this method is primarily applied to feature extraction, not prediction. Similarly, Cox proportional hazards regression [31] is used to predict the next occurrence of an event, but predicting the shim dimension was not possible with this algorithm. The accelerated failure time (AFT) model was also considered; however, it follows the same approach as Cox proportional hazards regression. Logistic regression was considered as a statistical method in one study [32]; however, logistic regression is a classifier and cannot be used to predict a continuous value. Thus, we could not find any statistical technique that could be implemented in Area D, and Phase 1 was therefore not implemented.
In Phase 2, both the LR and SVR algorithms (default and optimized hyperparameters) predicted the shim dimension with an accuracy near 100%. A small deviation of the predicted values from the real values was observed for RFR (both default and optimized hyperparameters) compared to LR and SVR. All eligible hyperparameters were optimized in one of the RFR models; however, the deviation remained the same for that model. Figure 5 shows the parity plot for the shim-dimension prediction using the test dataset and the optimized hyperparameter RFR algorithm; the deviated values were within 10% of the real values.
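A parity plot of the kind shown in Fig. 5 can be produced as in the following sketch (synthetic stand-in data and a default-settings RFR model; not the study’s code):

```python
import matplotlib.pyplot as plt
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the shim-dimension data.
X, y = make_regression(n_samples=500, n_features=3, noise=5.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

y_pred = RandomForestRegressor(random_state=0).fit(X_tr, y_tr).predict(X_te)

# Parity plot: predicted vs. real values; points on the diagonal are exact.
plt.scatter(y_te, y_pred, s=10)
plt.plot([y_te.min(), y_te.max()], [y_te.min(), y_te.max()], "k--")
plt.xlabel("Real shim dimension")
plt.ylabel("Predicted shim dimension")
plt.show()
```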
Table 8 lists the coefficient of determination (R2), root mean square error (RMSE), mean absolute error (MAE), and mean square error (MSE) values of the regression models (default and optimized hyperparameters). In the hyperparameter-optimized models, the R2, RMSE, MAE, and MSE values were marginally improved compared to the default hyperparameter models; however, for the RFR model, there was no improvement. In Table 8, the low RMSE values indicate a good fit, i.e. the observed data points are near the model’s predicted values. Likewise, the R2 values are 1 or near 1, indicating that the models can accurately predict the shim dimension.
Table 8 (Area D) Error rates using regression models on the test dataset
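The four error measures in Table 8 can be computed as in this sketch (hypothetical predicted and real shim dimensions):

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Hypothetical predicted vs. real shim dimensions on the test set.
y_true = np.array([103.60, 103.62, 103.58, 103.65])
y_pred = np.array([103.60, 103.61, 103.58, 103.66])

mse = mean_squared_error(y_true, y_pred)
rmse = np.sqrt(mse)
mae = mean_absolute_error(y_true, y_pred)
r2 = r2_score(y_true, y_pred)
print(f"R2={r2:.3f}  RMSE={rmse:.4f}  MAE={mae:.4f}  MSE={mse:.6f}")
```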
Additionally, the MAE and MSE of the models are near zero, indicating that the models predict with almost no error. However, the dataset against which the results are compared was labelled by technicians and may therefore contain labelling errors; thus, the models may inherit these faults.
Table 9 shows the estimated coefficients of the linear regression model, where ‘Gear (Pinion) height’, ‘PTU housing measurement’, and ‘Manual adjustment’ are the predictors. The term ‘Estimate’ indicates the relative importance (coefficient value) of the predictors in the model. The predictor ‘PTU housing measurement’ is the most important of the three predictors.
Table 9 (Area D) Estimated coefficients of linear regression model
‘SE’ is the standard error of each coefficient estimate and represents the model’s precision in estimating the coefficient values; a lower SE indicates a better estimate. In Table 9, the SE values are small, meaning that the model estimated the coefficients accurately.
‘tStat’ is the ratio of each estimate to its standard error (tStat = Estimate/SE) and is used to determine whether the null hypothesis should be accepted or rejected. Here, the null hypothesis states that there is no relationship between the input and the output, i.e. that the coefficient is zero. The higher the tStat value, the more significant the estimate is in the regression model; thus, the null hypothesis can be rejected because the tStat values are high.
The ‘P-value’ in the linear regression analysis indicates whether the null hypothesis can be rejected: a low p-value allows the null hypothesis to be rejected and indicates a strong relationship between the input and the output. In Table 9, all p-values are 0, indicating that the predictors are highly correlated with the response.
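A coefficient table of the form of Table 9 (Estimate, SE, tStat, p-value) can be reproduced with an ordinary-least-squares fit, as in the following sketch on synthetic stand-in data (the predictor names follow Table 9; the coefficient values are invented):

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

# Synthetic stand-in for the three predictors and the shim dimension.
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(500, 3)),
                 columns=["Gear (Pinion) height",
                          "PTU housing measurement",
                          "Manual adjustment"])
y = (0.2 * X.iloc[:, 0] + 0.9 * X.iloc[:, 1] + 0.1 * X.iloc[:, 2]
     + rng.normal(scale=0.05, size=500))

# OLS reports Estimate (coef), SE (std err), tStat (t = coef / SE) and
# p-value for every predictor, i.e. the quantities listed in Table 9.
model = sm.OLS(y, sm.add_constant(X)).fit()
print(model.summary())
```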
For Area D, Phase 2 is the most suitable method because Phase 1 could not be implemented.
Area E
The ‘Serial number’ column was checked for duplicate instances of a PTU unit; a duplicate instance is created when a faulty item is repaired and given the same ‘Serial number’. In Phase 1, the analysis was performed on faults with station codes 90 and 110. A total of 3,930 items with station codes 90 and 110 were found to be faulty. Of these 3,930 faulty items, only 360 items with the same ‘Serial number’ were repaired. According to discussions with experts in this field, PTUs with faults can also be assigned new ‘Serial numbers’ or be considered scrap.
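The duplicate-instance check itself is straightforward; a minimal pandas sketch on invented records (hypothetical column names matching the description above) is:

```python
import pandas as pd

# Invented fault records: 'Serial number' repeats when a unit is repaired.
df = pd.DataFrame({
    "Serial number": ["A1", "A2", "A2", "A3", "A4", "A4"],
    "Station code": [90, 110, 110, 90, 110, 110],
})

# Faults at station codes 90 and 110; repeated serial numbers mark repairs.
faulty = df[df["Station code"].isin([90, 110])]
repaired = faulty[faulty.duplicated("Serial number", keep=False)]
print(repaired["Serial number"].nunique(), "repaired unit(s)")
```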
Phase 2 was not implemented in this area because it is not necessary to use ML to find duplicate instances within a given set of numbers; traditional statistics are sufficient for this purpose. ML is necessary in the following cases [33]:

- A task that is too complex for a human to solve
- A task requiring large amounts of memory
- A task requiring adaptivity
Therefore, for Area E, Phase 1 is the best-suited method.
Area F
Phase 1 was implemented to find the error distribution. The relationship between faults and measurements follows a Gaussian distribution, except for ‘housing measurement from loading house/measuring house’, which has a large bar at 59. We assumed that these data were genuinely equal to 59 and not the result of a programming error, and double-checking confirmed that the data were correct. The error distribution of the ‘PTU housing measurement’ is shown in Fig. 6: the error rate is high at a threshold of 103.58, whereas it decreases below a threshold of 103.68.
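An error-distribution plot of the kind shown in Fig. 6 can be produced as in the following sketch (synthetic values standing in for the real measurements):

```python
import matplotlib.pyplot as plt
import numpy as np

# Synthetic stand-in for the 'PTU housing measurement' values.
rng = np.random.default_rng(0)
errors = rng.normal(loc=103.63, scale=0.03, size=1000)

# Histogram of the error distribution (the Phase 1 analysis behind Fig. 6).
plt.hist(errors, bins=40)
plt.xlabel("PTU housing measurement")
plt.ylabel("Number of faults")
plt.show()
```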
Phase 2 was not implemented for the same reason stated in Area E; therefore, Phase 1 is the most suitable method for Area F.