
1 Milling Tool Assessment in the Machining Industry

In the manufacturing industry, product quality must be optimized and production costs minimized in order to compete with other enterprises. While the use of worn tools decreases product quality, underusing a tool's remaining lifespan increases production costs [1]. To maintain sufficient product quality, generally only 50%–80% of the mean tool life is used [2].

Effective assistance systems for determining tool wear are therefore needed; such systems can reduce production costs by as much as 10%–40% [3].

In small and medium-sized enterprises, the decision whether or not a milling tool can still be used is often made by the machine operators. Although they have tools such as magnifying glasses or microscopes at their disposal when handling the milling tools, the classification remains subjective and carries an individual bias, so different operators may classify the same milling tool differently.

To classify tool wear on a more accurate, deterministic basis, an automated assistance system is necessary. Such a system could use an industrial robot to remove milling tools from a predefined buffer and feed them to a camera, which takes several images of each milling tool. The tool wear could then be examined using these images. Following the image-based classification, the milling tools could be sorted into separate, predefined output buffers according to the classification results. Within such an assistance system, processing the collected image data is a key component. Machine learning methods can be used to classify the images, thereby enabling the system described above.
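The final sorting step of such a system can be sketched as follows. This is a minimal, hypothetical illustration: the `ToolInspection` record, the buffer names and the way per-image probabilities are aggregated into one decision per tool (mean or maximum over the captured views) are assumptions made for this example, not part of the described system.

```python
from __future__ import annotations

from dataclasses import dataclass
from statistics import mean


@dataclass
class ToolInspection:
    """Hypothetical record for one inspected tool."""
    tool_id: str
    worn_probabilities: list[float]  # classifier output P(worn), one value per captured image


def route_tool(inspection: ToolInspection, threshold: float = 0.5,
               strategy: str = "mean") -> str:
    """Choose a (hypothetical) output buffer from the per-image wear probabilities."""
    probs = inspection.worn_probabilities
    if not probs:
        raise ValueError(f"no images captured for tool {inspection.tool_id}")
    if strategy == "mean":
        score = mean(probs)   # every view contributes equally
    elif strategy == "max":
        score = max(probs)    # conservative: one clearly worn view is enough
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return "buffer_worn" if score >= threshold else "buffer_reusable"


# Usage: three views of one tool, scored by an image classifier
print(route_tool(ToolInspection("T-017", [0.2, 0.3, 0.9])))                  # mean 0.467 -> buffer_reusable
print(route_tool(ToolInspection("T-017", [0.2, 0.3, 0.9]), strategy="max"))  # max 0.9 -> buffer_worn
```

Taking the maximum is the more conservative choice, since wear visible on only one cutting edge would otherwise be diluted by the other views; which aggregation suits a real system is an open design decision.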

2 Classification of Tool Wear Using Methods of Machine Learning

Tool wear can be classified using either an indirect or a direct approach. The indirect approach uses process signals such as cutting force, vibration, acoustic emission or the measured power consumption of the CNC machine [4, 5, 6, 7]. Since these signals can be measured during the milling process, no intervention in the process is necessary to draw conclusions about tool wear [5, 6]. Using statistical methods, the indirect approach establishes a correlation between tool wear and the recorded sensor signals as a basis for classifying tool wear [7].

In the direct approach, tool wear is measured via the geometric properties of the tool using optical sensors [4, 6, 7]. Optical measurement of tool wear generally requires the milling tool to be removed from the machine [6], which has the disadvantage of causing machine downtime [7]. Under ideal conditions, direct measurement of tool wear offers a higher recognition accuracy than the indirect approach [4, 6]. However, uncertainties may arise when human operators interpret the image data [4]. The presence of chips or cutting fluids in the image data also affects the recognition accuracy [4, 6].

Classical computer vision methods such as the Sobel and Canny edge detectors, as well as the active contour method, have been applied in the literature to detect tool wear [4]. Deep learning approaches outperform classical approaches in image classification [8]. Additionally, the classification accuracy of machine learning methods is more robust to changing lighting conditions [4].
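To make the classical baseline concrete, the core of an edge-based method fits in a few lines. The following NumPy sketch computes the Sobel gradient magnitude of a grayscale image; it illustrates the operator only and is not the pipeline used in [4]. A Canny detector would add smoothing, non-maximum suppression and hysteresis thresholding on top of such gradients.

```python
import numpy as np

# Horizontal Sobel kernel; its transpose responds to vertical intensity changes.
SOBEL_X = np.array([[-1.0, 0.0, 1.0],
                    [-2.0, 0.0, 2.0],
                    [-1.0, 0.0, 1.0]])
SOBEL_Y = SOBEL_X.T


def filter3x3(img: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """3x3 cross-correlation with edge padding; output has the same shape as img.

    Correlation vs. convolution only flips the gradient sign, not its magnitude.
    """
    padded = np.pad(img.astype(float), 1, mode="edge")
    h, w = img.shape
    out = np.zeros((h, w))
    for dy in range(3):
        for dx in range(3):
            out += kernel[dy, dx] * padded[dy:dy + h, dx:dx + w]
    return out


def sobel_magnitude(img: np.ndarray) -> np.ndarray:
    """Gradient magnitude sqrt(gx^2 + gy^2) of a 2-D grayscale image."""
    gx = filter3x3(img, SOBEL_X)
    gy = filter3x3(img, SOBEL_Y)
    return np.hypot(gx, gy)


# Usage: a vertical step edge yields a strong response only next to the step.
step = np.zeros((5, 6))
step[:, 3:] = 1.0
print(sobel_magnitude(step)[2])   # [0. 0. 4. 4. 0. 0.]
```

Such hand-designed operators respond to any intensity edge, including chips, reflections or lighting changes, which is one reason learned features are preferred in the approaches discussed in this section.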

Machine learning methods can be used to classify tool wear with both the indirect and the direct approach. Neural networks are the most commonly used method for the indirect classification of tool wear [6]. Machine learning approaches such as neural networks can extract knowledge from large amounts of data and represent it in a model, which can then apply the learned knowledge to the specific application. Deep learning methods are particularly suitable for classifying tool wear, since they detect patterns in the input data on their own, which makes separate feature extraction unnecessary [9]. However, these methods require large amounts of data, which are not always available [6, 10]. This is particularly true for use cases in which expert knowledge is required to label the data. For such cases, which include the classification of tool wear, approaches such as ensemble learning or transfer learning look promising [6].

Since deep learning methods can detect features in datasets without separate feature extraction algorithms, they can be used to find correlations in sensor data recorded during milling processes. Such data can be processed with deep learning methods such as convolutional neural networks (CNN) [10]. To do so, the time series data are encoded as images, which can then be processed by a CNN [11, 12].
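The text does not specify which image encoding [11, 12] use. One widely used option is the Gramian Angular Summation Field (GASF); the NumPy sketch below shows it as an example of how a sensor time series can be turned into an image for a CNN. The choice of GASF and the window length in the usage example are assumptions of this illustration.

```python
import numpy as np


def gramian_angular_summation_field(series) -> np.ndarray:
    """Encode a 1-D time series as an N x N image (Gramian Angular Summation Field)."""
    x = np.asarray(series, dtype=float)
    lo, hi = x.min(), x.max()
    if hi == lo:
        raise ValueError("a constant series cannot be rescaled to [-1, 1]")
    # Min-max rescaling to [-1, 1] so that every value is a valid cosine.
    x_scaled = np.clip((2.0 * x - hi - lo) / (hi - lo), -1.0, 1.0)
    # Polar encoding: each sample becomes an angle phi = arccos(x).
    phi = np.arccos(x_scaled)
    # Pixel (i, j) holds cos(phi_i + phi_j), relating time steps i and j.
    return np.cos(phi[:, None] + phi[None, :])


# Usage: e.g., a window of 64 vibration samples becomes a 64 x 64 single-channel image.
signal = np.sin(np.linspace(0.0, 4.0 * np.pi, 64))
image = gramian_angular_summation_field(signal)
print(image.shape)  # (64, 64)
```

The resulting matrix is symmetric and bounded to [-1, 1], so it can be fed to a CNN like a grayscale image.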

Table 1 gives an overview of the presented literature and its key parameters. The indirect approaches, which classify tool wear using sensor data, reach accuracies of 86% to 90%. These approaches use sensor data covering the entire lifespan of the milling tools to classify the wear. The predicted classes range from no wear through steady-state wear to tool failure. The direct approaches use image data to classify tool wear. The approach proposed by Wu et al. does not classify whether an image depicts a worn or an unworn tool, but rather different kinds of wear phenomena [13]. Bergs et al. use image segmentation instead of image classification to detect tool wear; therefore, they evaluate their results with the Intersection over Union (IoU) metric instead of accuracy. Ambadekar et al. classify images of the surface quality of workpieces in order to infer the wear of the cutting tool used. With this approach, they classify the wear state of the tool with an accuracy of 87.26% [9].

Table 1 Comparable approaches for the classification of tool wear
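The IoU metric used by Bergs et al. compares a predicted segmentation mask with a reference mask: the number of pixels in their intersection divided by the number of pixels in their union. A minimal NumPy sketch for binary masks follows; returning 1.0 when both masks are empty is a convention chosen for this example.

```python
import numpy as np


def intersection_over_union(pred_mask, true_mask) -> float:
    """IoU = |pred AND true| / |pred OR true| for two binary masks of equal shape."""
    pred = np.asarray(pred_mask, dtype=bool)
    true = np.asarray(true_mask, dtype=bool)
    if pred.shape != true.shape:
        raise ValueError("masks must have the same shape")
    union = np.logical_or(pred, true).sum()
    if union == 0:
        return 1.0  # both masks empty: treated as a perfect match (convention)
    return float(np.logical_and(pred, true).sum() / union)


# Usage: predicted wear region vs. reference annotation (toy 2 x 4 masks)
pred = [[0, 1, 1, 0],
        [0, 1, 1, 0]]
true = [[0, 0, 1, 1],
        [0, 0, 1, 1]]
print(intersection_over_union(pred, true))  # 2 / 6 = 0.333...
```

Unlike accuracy, IoU ignores correctly classified background pixels, which is why it suits segmentation tasks where the wear region covers only a small part of the image.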

Many state-of-the-art CNN architectures are evaluated by training them on publicly available datasets such as ImageNet. CNNs trained on ImageNet have been observed to be biased towards detecting textures rather than object shapes [14]. This property is beneficial for tool wear detection, since detecting tool wear is a texture recognition problem [4]. Network architectures that classify ImageNet images well should therefore also be able to classify tool wear reliably. CNN architectures such as VGG [15] and ResNet50 [16] have successfully been used to classify tool wear [9, 13].

3 Approach to Aligning the Classification Accuracy of a Machine Learning Algorithm With Expert Knowledge

3.1 Image Acquisition Device

The images used to train the neural network are taken with a Nikon D5600 camera and a Sigma 150 mm lens. The camera is mounted on a special fixture that prevents relative movement between the camera and the tool holding fixture for the milling tools. This ensures that all images are taken under identical initial conditions. A remote control is used to trigger the shutter, which prevents the image blur that pressing the shutter button would cause. The tool holding fixture, into which the milling tools are inserted, can be rotated 360 degrees around its axis. Its edges allow it to be turned manually in intervals of 45 degrees, so that the entire circumference of a milling tool can be photographed in eight individual images. To capture images of the front side of the milling tools, the tool holding fixture can be attached at a different angle. When the images are taken, the entire image acquisition device is placed inside a photo box to ensure constant illumination, as shown in Fig. 1. The photo box contains LEDs that illuminate it with diffuse light. Tool holding fixtures of different diameters can be used for different milling tools.

Fig. 1 Image acquisition device within a photo box

3.2 Dataset and Preprocessing

The dataset acquired with this device consists of 328 images of 41 different milling tools. An expert classified these milling tools into the categories worn or not worn using magnifying glasses or microscopes. This classification is taken as the ground truth for the images in the dataset; therefore, some uncertainty in the labels can be expected. The dataset is split into training, validation and test subsets at a ratio of 60:20:20. To allow the machine learning model to process the image data more efficiently, the images are preprocessed.
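With eight views per tool (Sect. 3.1), 41 tools × 8 views gives exactly the 328 images stated here. The text does not state whether the 60:20:20 split is made per image or per tool; a per-image split could place views of the same tool in both the training and the test set. The following Python sketch is an assumption, not the authors' procedure: it splits by tool so that all views of a tool (and any filtered copies derived from them) stay in one subset. The identifier scheme in the usage example is hypothetical.

```python
import random


def split_by_tool(image_ids, tool_of, ratios=(0.6, 0.2, 0.2), seed=0):
    """Assign whole tools (not single images) to train/val/test.

    image_ids: list of image identifiers
    tool_of:   dict mapping each image identifier to its tool identifier
    ratios:    train and val shares are used; the remaining tools go to test
    """
    tools = sorted({tool_of[i] for i in image_ids})
    random.Random(seed).shuffle(tools)            # reproducible shuffle of tool IDs
    n_train = round(ratios[0] * len(tools))
    n_val = round(ratios[1] * len(tools))
    subset_of_tool = {}
    for k, tool in enumerate(tools):
        if k < n_train:
            subset_of_tool[tool] = "train"
        elif k < n_train + n_val:
            subset_of_tool[tool] = "val"
        else:
            subset_of_tool[tool] = "test"
    split = {"train": [], "val": [], "test": []}
    for img in image_ids:
        split[subset_of_tool[tool_of[img]]].append(img)
    return split


# Usage: 41 hypothetical tools x 8 views = 328 image IDs
ids = [f"tool{t:02d}_view{v}" for t in range(41) for v in range(8)]
tool_of = {i: i.split("_")[0] for i in ids}
parts = split_by_tool(ids, tool_of)
print({name: len(imgs) for name, imgs in parts.items()})  # {'train': 200, 'val': 64, 'test': 64}
```

With 41 tools, the 60:20:20 ratio can only be met approximately (25/8/8 tools here), which is the price of keeping each tool in a single subset.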

First, the images are cropped to remove most of the background, which contains no information about tool wear, thereby reducing the image size. To further enlarge the dataset, different filters are applied to the individual images, which increases the robustness of the trained model [4]. These filters increase the contrast, increase the illumination, or sharpen or soften the image. Image augmentation techniques such as translation and rotation are not used, since the position of the milling tools relative to the camera is fixed; these techniques therefore offer no benefit. Applying these filters enlarges the dataset to 1640 images. Fig. 2 shows four images of the same milling tool with the different filters.
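The four filters can be sketched with NumPy as below. The paper names the filter types but not their parameters, so the contrast and brightness factors and the 3 × 3 sharpening and box-blur kernels are assumptions of this example. Keeping the original alongside four filtered copies gives 328 × 5 = 1640 images, which matches the dataset size stated above.

```python
import numpy as np

SHARPEN = np.array([[0.0, -1.0, 0.0],
                    [-1.0, 5.0, -1.0],
                    [0.0, -1.0, 0.0]])   # weights sum to 1: flat regions are unchanged
SOFTEN = np.full((3, 3), 1.0 / 9.0)       # 3x3 box blur


def filter3x3_rgb(img: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Per-channel 3x3 cross-correlation of an H x W x C image with edge padding."""
    padded = np.pad(img, ((1, 1), (1, 1), (0, 0)), mode="edge")
    h, w = img.shape[:2]
    out = np.zeros(img.shape)
    for dy in range(3):
        for dx in range(3):
            out += kernel[dy, dx] * padded[dy:dy + h, dx:dx + w, :]
    return out


def filtered_variants(img: np.ndarray, contrast: float = 1.5,
                      brightness: float = 1.2) -> dict:
    """Return the original uint8 image plus four filtered copies (all uint8)."""
    x = img.astype(float)
    mean = x.mean()
    variants = {
        "original": x,
        "contrast": (x - mean) * contrast + mean,   # stretch values away from the mean
        "illumination": x * brightness,              # brighten all pixels
        "sharpening": filter3x3_rgb(x, SHARPEN),
        "softening": filter3x3_rgb(x, SOFTEN),
    }
    # Round and clip back to the valid 8-bit range.
    return {name: np.clip(np.rint(v), 0, 255).astype(np.uint8)
            for name, v in variants.items()}


# Usage: a random stand-in for one cropped RGB photo (H x W assumed as 150 x 405)
rng = np.random.default_rng(0)
crop = rng.integers(0, 256, size=(150, 405, 3)).astype(np.uint8)
variants = filtered_variants(crop)
print(list(variants))  # ['original', 'contrast', 'illumination', 'sharpening', 'softening']
```

Photometric filters like these change pixel values but not geometry, which fits the fixed camera setup in which translations and rotations would add no realistic variation.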

Fig. 2 Images of the same milling tool with different filters. From left to right: contrast, illumination, sharpening, softening

3.3 Convolutional Neural Network Implementation

State-of-the-art CNN architectures such as VGG and ResNet50 are capable of classifying tool wear, since, as described in Sect. 2, recognizing tool wear is a texture recognition problem rather than an object detection problem. To classify the wear on milling tools based on the dataset, several training runs are conducted using the VGG [15], ResNet50 [16] and EfficientNet_b0 [17] architectures. EfficientNet achieves a higher accuracy than VGG and ResNet50 on the ImageNet dataset while using fewer parameters and training faster. Since a large number of parameters is one factor that contributes to overfitting, which previous papers observed when classifying tool wear with VGG and ResNet50, EfficientNet is employed as well.

Transfer learning is one way to reduce overfitting, especially for small datasets. To evaluate the classification results of the different architectures and the influence of transfer learning based on the ImageNet dataset, the VGG-16, VGG-19, ResNet50 and EfficientNet_b0 models are trained both with and without weights pretrained on ImageNet. To adapt each base model to tool wear detection, it is extended by the following layers. The base model is followed by a global average pooling layer and then by a fully connected layer (FC layer) with 128 neurons and the ReLU activation function. The last layer is another fully connected layer with two neurons, representing the two possible classification results; it uses the SoftMax activation function. The architecture is depicted in Fig. 3.
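The added classification head can be sketched in NumPy to make the data flow explicit. This is a forward pass only, with randomly initialized weights, and the class order [not worn, worn] is an assumption. In TensorFlow/Keras, the same head would typically be built from `tf.keras.layers.GlobalAveragePooling2D`, `tf.keras.layers.Dense(128, activation="relu")` and `tf.keras.layers.Dense(2, activation="softmax")` on top of a base model created with `include_top=False` (for example `tf.keras.applications.EfficientNetB0` with `weights="imagenet"` for transfer learning or `weights=None` for training from scratch). The sketch only needs the channel count C of the base model's last feature map: 512 for VGG-16/VGG-19, 2048 for ResNet50 and 1280 for EfficientNet-B0. The feature-map size in the usage example is arbitrary.

```python
import numpy as np


def global_average_pool(feature_maps: np.ndarray) -> np.ndarray:
    """(batch, H, W, C) -> (batch, C): average each channel over the spatial axes."""
    return feature_maps.mean(axis=(1, 2))


def softmax(z: np.ndarray) -> np.ndarray:
    z = z - z.max(axis=1, keepdims=True)   # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def init_head(channels: int, hidden: int = 128, classes: int = 2, seed: int = 0) -> dict:
    """Random weights for FC(channels -> 128) and FC(128 -> 2); He-style scaling assumed."""
    rng = np.random.default_rng(seed)
    return {
        "W1": rng.normal(0.0, np.sqrt(2.0 / channels), size=(channels, hidden)),
        "b1": np.zeros(hidden),
        "W2": rng.normal(0.0, np.sqrt(2.0 / hidden), size=(hidden, classes)),
        "b2": np.zeros(classes),
    }


def head_forward(feature_maps: np.ndarray, params: dict) -> np.ndarray:
    """Base-model features -> class probabilities [P(not worn), P(worn)] per image."""
    pooled = global_average_pool(feature_maps)
    hidden = np.maximum(0.0, pooled @ params["W1"] + params["b1"])   # FC-128 + ReLU
    return softmax(hidden @ params["W2"] + params["b2"])             # FC-2 + SoftMax


# Usage: a batch of 8 feature maps with C = 1280 channels (EfficientNet-B0)
params = init_head(channels=1280)
features = np.random.default_rng(1).normal(size=(8, 5, 13, 1280))
probs = head_forward(features, params)
print(probs.shape)                              # (8, 2)
print(sum(v.size for v in params.values()))     # 164226 trainable head parameters
```

Because global average pooling collapses the spatial axes, the number of head parameters depends only on C and not on the input image resolution.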

Fig. 3 Architecture of the CNN

An NVIDIA RTX 2060 graphics card is used to train the models. The code is implemented in Python using the TensorFlow 2.6 framework [18].

The models are trained with the Adam optimizer, and the loss function is categorical cross-entropy. The training images are passed to the models at a resolution of 405 × 150 pixels in batches of eight images. The images are downscaled to reduce computational cost and memory usage and to increase training speed. The models are trained for 50 epochs; since their accuracy no longer increases after a certain number of epochs, no training runs beyond 50 epochs are conducted.
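Categorical cross-entropy averages, over a batch, the negative sum of y_k log p_k, where y is the one-hot label and p the SoftMax output. The NumPy sketch below computes it and also shows the batch arithmetic. Clipping the probabilities away from 0 is a common safeguard against log(0); the 984-image training set in the usage example is a hypothetical figure (60% of 1640), since the text does not state whether the split is applied before or after filtering.

```python
import math

import numpy as np


def categorical_cross_entropy(y_true: np.ndarray, y_prob: np.ndarray,
                              eps: float = 1e-7) -> float:
    """Mean over the batch of -sum_k y_true[k] * log(y_prob[k])."""
    p = np.clip(y_prob, eps, 1.0 - eps)   # avoid log(0)
    return float(-(y_true * np.log(p)).sum(axis=1).mean())


def steps_per_epoch(n_images: int, batch_size: int = 8) -> int:
    """Number of batches needed to pass every training image once."""
    return math.ceil(n_images / batch_size)


# Usage: two images with one-hot labels [not worn, worn] and SoftMax outputs
y = np.array([[1.0, 0.0], [0.0, 1.0]])
p = np.array([[0.9, 0.1], [0.2, 0.8]])
print(round(categorical_cross_entropy(y, p), 4))   # (-ln 0.9 - ln 0.8) / 2, about 0.1643
print(steps_per_epoch(984))                        # 123 batches of 8 (hypothetical 984 images)
```

Only the probability assigned to the true class enters the loss, so confident wrong predictions are penalized heavily.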

4 Results

The results of the training runs are shown in Table 2. Transfer learning significantly improves the accuracy and reduces the loss of every model. The models perform in accordance with their performance on the ImageNet dataset. The VGG-16 model scores an accuracy of 50% without transfer learning and 67.65% with transfer learning. The VGG-19 model scores an accuracy of 50% without transfer learning and 72.06% with transfer learning. The second-best model is ResNet50, which scores 76.76% without transfer learning and 89.71% with transfer learning.

Table 2 Results of the training runs

The best overall accuracy is achieved by the EfficientNet_b0 model with transfer learning, which scores an accuracy of 91.47%. This model also achieves a significantly lower loss on the test dataset.

Fig. 4 shows the accuracy and loss curves of the EfficientNet_b0 model without transfer learning. The training accuracy increases linearly over the first 20 epochs before rising sharply and converging to one. The training loss drops after the first epoch and then remains almost constant for another twenty epochs, after which it decreases in a volatile manner to values between 0.2 and zero. The validation accuracy barely increases at all. The validation loss is highly volatile and does not fall below 0.7.

Fig. 4 Accuracy and loss of the EfficientNet_b0 model without transfer learning

Figure 5 shows the accuracy and loss curves of the EfficientNet_b0 model with transfer learning. As with the model without transfer learning, the training accuracy and training loss converge to one and zero, respectively.

Fig. 5 Accuracy and loss of the EfficientNet_b0 model with transfer learning

The key difference between the two models is that convergence is reached substantially faster with transfer learning. The validation accuracy increases sharply in the first few epochs and converges to around 0.9. The validation loss decreases sharply in the first few epochs, down to about 0.2. Although the validation loss of the model with transfer learning is still volatile, it is considerably steadier than that of the model without transfer learning.

The confusion matrix of model eight, the EfficientNet_b0 with transfer learning, is shown in Fig. 6. The correct classifications on the test dataset lie on the main diagonal of the matrix. Tools without wear that are correctly classified as not worn account for 46.7% of the test samples, and worn tools that are correctly classified as worn account for 44.7%. The model misclassifies worn tools as not worn in 5.29% of the samples and unworn tools as worn in 3.26% of the samples. This results in an accuracy of 91.4%, which is higher than most accuracies reported in the literature.
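The accuracy follows directly from the confusion matrix: the sum of the main diagonal divided by the sum of all entries. The sketch below uses the four shares quoted above, arranged as rows = true class and columns = predicted class, with 5.29% read as worn tools predicted as not worn and 3.26% as unworn tools predicted as worn. Because the quoted shares are rounded and sum to 99.95%, the sketch normalizes by the total. Recall and precision for the class worn are additional metrics derived from the same matrix, not values reported in the text.

```python
import numpy as np


def confusion_metrics(cm) -> dict:
    """cm[i][j]: share or count of samples with true class i predicted as class j.

    Class 0 = not worn, class 1 = worn (assumed ordering).
    """
    cm = np.asarray(cm, dtype=float)
    return {
        "accuracy": np.trace(cm) / cm.sum(),             # correct predictions / all
        "recall_worn": cm[1, 1] / cm[1, :].sum(),        # worn tools that are detected
        "precision_worn": cm[1, 1] / cm[:, 1].sum(),     # "worn" predictions that are right
    }


# Shares of the test set in percent, as quoted in the text
cm = [[46.7, 3.26],    # true not worn: predicted not worn, predicted worn
      [5.29, 44.7]]    # true worn:     predicted not worn, predicted worn
for name, value in confusion_metrics(cm).items():
    print(f"{name}: {value:.3f}")
# accuracy: 0.914, recall_worn: 0.894, precision_worn: 0.932
```

For an assistance system, recall on the class worn is the more safety-relevant figure, since a missed worn tool degrades product quality while a false alarm only wastes remaining tool life.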

Fig. 6 Confusion matrix of the EfficientNet_b0 with transfer learning

The dataset used to train, validate and test the models consists of images of milling tools that a human expert classified into the categories wear or no wear, as described in Sect. 3.2. The labels can therefore be expected to contain an individual bias, which introduces some uncertainty into the models' classification of tool wear. It is thus unlikely that 100% accuracy can be achieved. Against this background, the accuracy achieved by EfficientNet_b0 is all the more remarkable. The results show that CNNs can reproduce human expert knowledge without metrological evaluations being required to annotate the data. Unlike the existing literature, this work does not investigate whether or which wear can be detected, but whether a human expert would classify the tool as worn or not yet worn. Compared with classifying individual wear features, this type of classification is particularly challenging, since the number and extent of the wear features in the image influence the wear condition of the depicted tool in a non-trivial way. Unlike the approaches in the literature, all potential wear features have to be considered at the same time.

To evaluate how well the model matches the knowledge of machine operators in classifying milling tools, 14 different machine operators classified ten different milling tools, using magnifying glasses to assist them. These milling tools are a subset of the tools used to create the dataset, and their wear condition was classified by the same expert. The results are shown in the right matrix of Fig. 6. The 14 operators achieve an average accuracy of 68.6% on these ten milling tools, which is significantly lower than the model's accuracy on the test dataset. Thus, the CNN reproduces the expert's classification considerably more closely than the average machine operator does.

An assistance system that classifies the wear of milling tools based on images can therefore effectively support humans in making these decisions when handling milling tools, thus reducing production costs and conserving resources.

5 Conclusion and Outlook

Dealing with tool wear is a challenge faced by every company in the machining industry. The decision whether or not a tool can still be used is often made by human machine operators, particularly in small and medium-sized enterprises. Tool condition monitoring systems can support objective decision-making and thereby help reduce costs. Image processing via machine learning is an enabler for the development of such an assistance system, which stores, handles and classifies milling tools according to their state of wear.

In recent years, deep learning methods have proven able to detect tool wear with both indirect and direct approaches. For the direct approach, which classifies tool wear using images of the tools, state-of-the-art CNN architectures that perform well on the ImageNet dataset, such as VGG and ResNet50, have proven able to detect tool wear. The VGG-16, VGG-19, ResNet50 and EfficientNet_b0 models were trained on the created dataset, which consists of 1640 images of 41 different milling tools that an expert classified as worn or not worn. Using weights pretrained on ImageNet significantly boosted the performance of every model. The EfficientNet_b0 model with ImageNet weights performed best, with an accuracy of 91.47%. The model outperforms human machine operators in classifying the wear of milling tools by 22.87 percentage points.

An image-based assistance system that helps machine operators classify the wear of milling tools could decrease production costs, since a larger proportion of the available tool life could be used. Furthermore, such an assistance system would make the use of worn tools less likely, which would increase product quality.

To further improve detection performance, the dataset should first be enlarged. Furthermore, the dataset contains an inherent bias, since the data were labeled by a human expert. This bias could be removed by labeling the samples based on measured geometric wear features.

A comparable approach could be used to assess the wear of turning tools or in optical quality assurance. The approach is particularly suitable for applications in which there are no clearly defined boundaries between the classes, so that classical analytical approaches cannot be used.