Introduction

Metal cutting is a significant production operation that consumes a large amount of energy during the machining of varied materials (Demirsöz & Boy, 2022). Because of their superior mechanical properties, Ni-based alloys present machinability issues during the cutting process (Yurtkuran, 2021). Concerns about tool failure, tool life, and machinability have sparked a great deal of scientific interest, the majority of which has been directed towards conventional machining processes (Waydande et al., 2016).

Traditionally, several mineral oil-based coolants were utilized to curtail cutter wear. Under traditional cooling, these cutting fluids provided longer tool life, well-controlled tolerances, and flatness (Çakır Şencan et al., 2021). Dry cutting, on the other hand, creates high temperatures at the tool-material contact, resulting in fast tool wear (TW). At elevated temperatures, tools are particularly sensitive to adhesion, resulting in poor surface finish and tool failure (Pekşen & Kalyon, 2021). Because of the poor machinability of superalloys, several researchers are interested in their sustainable machining. The use of minimum quantity lubrication (MQL) is a viable technique for providing a lubricating effect at the interface (Gupta et al., 2023), and it also helps in achieving outstanding surface smoothness. Race et al. (2021) checked tool wear performance under MQL using standard tribological high-frequency reciprocating tests prior to the cutting tests. When dry and MQL machining were compared against standard flood coolant, improvements in surface condition and cutter wear were recorded with MQL. Measured test findings demonstrate that MQL has a substantial impact on the surface; it leads to a significant reduction in tool wear and increases the machining precision of the machined product (Duc et al., 2021).

Cryogenic (cryo) gases, such as nitrogen and carbon dioxide (CO2), are used as the coolant in cryogenic machining, an emerging technology (Krolczyk et al., 2019). Cryo CO2 has also been favored because it leaves no hazardous residue. In addition, cryo CO2 has an excellent chilling capability that can lengthen tool life when working with refractory materials (Kim et al., 2016). When machining Ni alloy with CO2, abrasion and attrition mechanisms are reduced, which results in significantly less flank wear than with traditional flooding (Khan & Ahmed, 2008). As a result, appropriate control of these factors is critical. In a production environment, data-driven intelligent manufacturing can be used for the cutting process. Researchers have used varied techniques, such as Grey and TOPSIS analysis (Gok, 2015) and Taguchi (Gok et al., 2013; Yaka & Demir, 2017), to find the optimal cutting variables, but these techniques can be used only for limited data.

Monitoring the machining process provides insight through descriptive, analytical, or predictive analytics and supports smart decision-making (Balazinski et al., 2002). The impact of in-process-created flaws is undetectable by the numerical and analytical models available in the literature. Real-time data is utilized in realistic models to capture the process variables (Jemielniak, 2019). Monitoring is categorized into online/indirect and offline/direct approaches. The direct approach includes vision-based methods that use computer vision techniques, scanning electron microscopy, and so on. It is often used to investigate defects of an unpredictable nature. The indirect technique, on the other hand, entails collecting information on cutting forces, acoustic emissions, vibration, etc. via a variety of sensors and transducers (Patange & Jegadeeshwaran, 2021). Tool fault monitoring (TFM), which includes TW detection and forecasting, is crucial in machining monitoring and decision-making (Benkedjouh et al., 2015; Madhusudana et al., 2017). The flank wear (Vb) that a cutting tool experiences is often the most noticeable form of degradation that it undergoes, in contrast to adhesion and abrasion.

Direct digital image processing-based techniques have been extensively employed in prior research to track tool faults and breakage due to their reliability and low cost. Because of the geometry of cutting, the unpredictability of the wear nature, and the lack of knowledge about how wear can alter the measured signals, indirect approaches are exceedingly difficult to design and implement. Additionally, there are some limitations to the use of these approaches, and the sensors are still very expensive (Sortino, 2003). Modern manufacturing and process monitoring systems have undergone a full transformation thanks to machine learning (ML). Artificial neural networks (ANN) (Ross et al., 2022), hidden Markov models (HMM) (Li & Liu, 2019), support vector machines (SVM) (Lu et al., 2013), and other techniques have been used for feature identification in TW monitoring and prediction. Dutta et al. (2016) presented a method that uses machine vision during cutting to predict progressive tool flank wear. They developed a technique to extract information on feed marks and waviness from the machined surface using distinct approaches. The decision-making approach of SVM was applied to accurately describe the tool state. Li & An (2016) established a novel micro-vision system for TW monitoring, which is a crucial facet of intelligent manufacturing. To segment each region of TW, an adaptive version of the Markov Random Field (MRF) technique was designed. According to the findings, automatic focusing and region-wise segmentation of the TW area are likely to improve precision and robustness, in addition to enabling the collection of TW images in real time. Although monitoring and predicting tool wear have seen significant progress, the methods described above have major limitations. In order to monitor and predict TW using typical ML techniques, features must first be extracted.
The particular choice of feature extraction and method had a significant impact on how well various ML approaches performed. Deep belief networks (DBN), convolutional neural networks (CNN), and other deep learning (DL) models have been developed in the last ten years as solutions to these issues. DL can address the aforementioned problems, as it refers to a class of ML approaches in which several layers of data processing steps in hierarchical architectures are used for pattern classification and prediction (Wang et al., 2021). In order to forecast surface unevenness and precise energy usage during 5-axis milling, Serin et al. (2017) used DMLP neural networks. To identify TW conditions, Ou et al. (2021) proposed online sequential learning by means of a stacked denoising autoencoder to extract abstract features. Cao et al. (2019) created a 2D CNN for TW monitoring. The input (i/p) to the CNN comprised vibration signals with a high signal-to-noise ratio. For TW estimation, Aghazadeh et al. (2018) utilized a CNN with a mixed feature extraction strategy to estimate the amount of TW. This method employed wavelet time-frequency transformation and spectral subtraction methods. Martínez-Arellano et al. (2019) developed a CNN model in which the raw i/p data were converted using time-series imaging. An LSTM network was developed by Sun et al. (2020) to forecast several flank wear metrics based on raw data. Bidirectional LSTM networks were used by Zhao et al. (2017) to monitor faults in milling tools after machining.

However, to avoid overfitting and achieve higher prediction accuracy, a large amount of annotated data is needed to train a DL model. Unfortunately, obtaining adequate labelled data is expensive and time-consuming, and owing to the complexity of real engineering situations, a wear forecasting model is not universal even for the same tool under diverse working conditions. As a result, the transfer learning (TL) method has gained importance in recent years as an active research area in which target prediction with inadequate data sets is explored. To support learning in the target domain, many labelled samples from the established domain (i.e., the source domain) might be used (Wang et al., 2022). Learning a discriminant model that lessens the disparity in distribution between the two domains is essential for TL. By reweighting the source domain samples, the conventional TL method merges the samples from the source and the target domain into a single feature space (Naveen Venkatesh et al., 2022).

The TL uses TW pictures taken after machining to recognize and classify TW in the absence of experimental observations. According to a thorough examination of the related literature, the evaluation of TW classification using deep learning algorithms has not received considerable attention. As far as we are aware, TL has not yet been applied to measuring TW while milling under distinct cooling environments. The study included 24 trials to explore the effects of i/p process parameters on reducing flank wear (Vb). The authors suggest augmentation to overcome the problem of Vb pictures not being easily available in industrial applications and to build a strong ML model that yields high forecasting accuracy. Comparative and in-depth analyses have been carried out with five TL models and four wear classes (class A-D) to assess the efficacy of the proposed methodology. To identify Vb pictures when Nimonic 80 A is machined, the results from all models may be relied upon to be accurate enough. According to the authors, the method established will be very beneficial for manufacturing applications. The objective of this work is to speed up automation and estimation in various production systems when there is a lack of data.

Materials and methods

Experimental setup

To analyse the behaviour of the tool wear, experiments were performed on Nimonic 80 A workpieces measuring 100 mm x 100 mm x 10 mm under distinct cooling conditions (dry, flood, MQL, and cryo). The experiments were carried out on a YCM-EV1020A milling machine, which has a top speed of 8100 rpm. For the investigation, PVD-TiAlN-coated inserts with a 4 micron coating thickness were used. The TiAlN layer deflects heat away from the tool and the workpiece, sending it back into the chip where it originated. Because it has greater ductility, it is an excellent option for interrupted cuts. The key advantages are higher production levels achieved at higher feed-speed combinations as well as prolonged tool life in situations involving high heat. The coated insert was attached to a TaeguTec tool holder, type BAP 300R C12-12-130-1T. For all the trials under cryo and MQL conditions, a 45° nozzle angle and a 30 mm nozzle distance were maintained. The experimental arrangement for the study is represented in Fig. 1. The machining parameters were chosen in accordance with the manufacturer’s specifications and results from earlier investigations, as shown in Table 1. Figure 2 presents the proposed technique of the current investigation in the form of a flowchart.

Fig. 1 Experimental methodology

Table 1 Machining parameters adopted in current work

Fig. 2 Workflow of the process

Measurement of tool wear values

After every trial, the insert was removed to evaluate the flank wear. To assess the level of cutter wear, a video measurement device (make: Hexagon) was employed. The measured Vb pictures are presented in Fig. 3. In this investigation, the mean roughness value was utilised because its acceptance in industrial settings is above 50%. The SE-3500 roughness tester was employed to quantify the roughness of the machined surface under distinct environmental conditions. The instrument was calibrated with slip gauges before each test to ensure that the results were accurate.

Fig. 3 Flank wear pictures under diverse speed-feed combinations and cutting environments

Deep learning models

Deep learning is a subfield of ML that gives computers the capacity to learn from experience and to understand the world in terms of a hierarchy of concepts. DL techniques are developing quickly, and some of them have become specialised in specific fields. The most extensively used DL techniques include LSTM, CNN, and recurrent neural networks (RNN). One of the most well-known DL methods is CNN. It is used for image processing and has convolutional, pooling, and fully connected layers. CNN training has two stages: feed-forward and back-propagation. The most common CNN architectures are GoogLeNet (Ashraf et al., 2020), VGGNet (Muhammad et al., 2018), AlexNet (Mahdianpari et al., 2018), ResNet (Wei et al., 2022), and MobileNet (Howard et al., 2017). As is evident from the literature review (Table 2), many works have been reported on tool condition monitoring with various DL models. However, no work was found on the machinability of Nimonic 80 A under varied environmental conditions (dry, flood, MQL, and CO2) using different algorithms, which represents an important gap in the field of sustainable manufacturing.

Table 2 Pioneering work in the area of DL models

Transfer learning

Typical deep learning models have proved quite effective and are widely used in various practical applications (Zhang et al., 2020), but they still have some limits in specific real-world settings. DL performs best when there are numerous labeled training examples that match the pattern of the testing data. However, it can be prohibitively expensive, time-consuming, or otherwise infeasible to collect adequate training data in many scenarios. Knowledge transfer between domains is a viable DL strategy for dealing with this problem. In TL, the model is not trained from scratch; instead, it utilizes the features from a previously trained model. Pre-trained CNN models are typically trained on vast datasets that serve as a common standard in the field of computer vision. Weights derived from these models can be applied to other computer vision applications. CNN models like AlexNet, VGGNet, ResNet, MobileNet, and Inception were pre-trained on the ImageNet dataset before they were used. ImageNet is a large database with more than 20,000 categories and 14 million images. These pre-trained models can therefore be trained on a new set of data while retaining their knowledge of how to classify. Improving model generalisation is a difficult ML task. Limited data leads to overfitting and poor generalizability. Employing TL helps to prevent overfitting. Due to the restricted datasets, AlexNet, VGG-16, ResNet, MobileNet, and Inception-V3 models were used to train on the machining datasets. After training and testing, the best-performing prediction model was selected.

AlexNet

Figure 4 shows that AlexNet comprises a total of eight layers: five convolutional layers and three fully-connected layers. It operates on the same fundamentals as CNN, but it covers a far more extensive network. By employing ReLU as the activation function, AlexNet can circumvent the issue of the gradient gradually vanishing as the network becomes deeper. Dropout is a method that AlexNet employs during training to avoid overfitting by randomly ignoring some of the network’s neurons. Dropout is applied almost exclusively in the last few fully-connected layers. Stochastic gradient descent is used as the optimization function within the model.
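The two mechanisms named above, ReLU activation and dropout, can be illustrated with a minimal pure-Python sketch. It is not AlexNet's implementation: the function names, the sample activations, the 0.5 rate, and the use of the "inverted" dropout variant (which rescales the kept values during training) are all assumptions made for illustration.

```python
import random

def relu(values):
    """ReLU: negative activations become zero, positive ones pass through."""
    return [max(0.0, v) for v in values]

def dropout(values, rate, rng):
    """Inverted dropout (assumed variant): zero each value with probability
    `rate` and rescale the kept values by 1 / (1 - rate)."""
    keep = 1.0 - rate
    return [v / keep if rng.random() >= rate else 0.0 for v in values]

# Hypothetical activations of one fully-connected layer
activations = relu([-1.5, 0.0, 2.0, 3.5])

# A seeded generator keeps the sketch reproducible
rng = random.Random(0)
dropped = dropout(activations, rate=0.5, rng=rng)

print(activations)
print(dropped)
```

Dropout is only active during training; at inference time the layer output is used unchanged in this variant.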

Fig. 4 AlexNet Architecture

VGGNet

In the course of the competition, the VGG-VD group presented a total of six deep CNNs; however, only two of these CNNs, namely VGG-16 and VGG-19, were able to achieve the desired results. VGG-16 and VGG-19 have 13 and 16 convolutional layers, respectively, and three fully connected layers each. Both networks make use of a stack of small 3 × 3 convolutional filters with a stride of 1, followed by non-linearity layers, as shown in Fig. 5. This helps the network to learn more complex features while also increasing its depth. The remarkable outcomes of the VGG experiment demonstrated that the depth of the network is a critical component in achieving a high level of classification accuracy.
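One way to see why stacks of small filters are attractive is to compare weight counts. The sketch below assumes equal input and output channel counts and ignores biases; the channel count of 64 and the function names are illustrative. Three stacked 3 × 3 stride-1 convolutions cover the same 7 × 7 receptive field as one 7 × 7 convolution, with fewer weights and more non-linearities in between.

```python
def conv_weights(kernel_size, channels, layers=1):
    """Weights in `layers` stacked square convolutions that each map
    `channels` input channels to `channels` output channels (no biases)."""
    return layers * kernel_size * kernel_size * channels * channels

def receptive_field(kernel_size, layers):
    """Receptive field of `layers` stacked stride-1 convolutions."""
    return 1 + layers * (kernel_size - 1)

channels = 64  # illustrative channel count, not taken from the paper

stacked = conv_weights(3, channels, layers=3)  # three 3x3 layers
single = conv_weights(7, channels)             # one 7x7 layer

print(receptive_field(3, 3), receptive_field(7, 1))
print(stacked, single)
```

For any channel count C the comparison is 27·C² weights against 49·C², so the stacked version needs roughly 45% fewer weights.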

Fig. 5 VGGNet Architecture

ResNet

In deep neural networks, the vanishing gradient and the high training error render learning ineffective in the early layers during the backpropagation step. These are the primary challenges presented by deep neural networks. The ResNet architecture overcomes the problem of vanishing gradients by employing additive identity transformations and a deep residual module, as shown in Fig. 6. Because the residual module uses a direct link between the i/p and o/p, each stacked layer fits a residual mapping rather than the desired underlying mapping. Optimizing the residual mapping is considerably simpler than optimizing the original, unreferenced mapping.
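The residual idea can be written compactly. The symbols below are illustrative and not taken from this paper: \(x\) is the i/p of a residual module, \(H(x)\) the desired underlying mapping, and \(F(x)\) the residual that the stacked layers are asked to fit.

$$y=F(x)+x,\qquad F(x)=H(x)-x$$

Differentiating the o/p with respect to the i/p shows why the direct link helps against vanishing gradients: the identity term \(I\) passes the gradient through even when the contribution of \(F\) is small.

$$\frac{\partial y}{\partial x}=\frac{\partial F(x)}{\partial x}+I$$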

Fig. 6 ResNet Architecture

MobileNet

MobileNet v1 makes use of depth-wise separable convolutions, each of which breaks down into a depth-wise convolution and a pointwise (1 × 1) convolution, as seen in Fig. 7. To be more specific, the conventional convolution process involves applying each kernel to all of the input channels. In contrast, depth-wise convolution applies each kernel to only one channel of the i/p data, and a 1 × 1 convolution then combines the results of the depth-wise convolution.
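The saving from this factorisation can be estimated by counting multiply-add operations. The sketch below is a back-of-the-envelope comparison under assumed layer sizes (3 × 3 kernel, 64 input channels, 128 output channels, 56 × 56 feature map); none of these numbers come from the paper, and the function names are illustrative.

```python
def standard_conv_cost(kernel, in_ch, out_ch, fmap):
    """Multiply-adds of a standard convolution: every kernel sees all input channels."""
    return kernel * kernel * in_ch * out_ch * fmap * fmap

def separable_conv_cost(kernel, in_ch, out_ch, fmap):
    """Multiply-adds of a depth-wise convolution followed by a 1x1 pointwise one."""
    depthwise = kernel * kernel * in_ch * fmap * fmap  # one kernel per input channel
    pointwise = in_ch * out_ch * fmap * fmap           # 1x1 convolution mixes channels
    return depthwise + pointwise

# Illustrative layer sizes (assumptions)
kernel, in_ch, out_ch, fmap = 3, 64, 128, 56

ratio = (separable_conv_cost(kernel, in_ch, out_ch, fmap)
         / standard_conv_cost(kernel, in_ch, out_ch, fmap))
print(f"separable / standard cost: {ratio:.3f}")
```

The ratio simplifies to 1/out_ch + 1/kernel², so with 3 × 3 kernels the separable form needs roughly one eighth to one ninth of the operations.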

Fig. 7 MobileNet Architecture

Inception-V3

The Inception-V3 architecture (Fig. 8) and TL have both been applied in this work. Owing to the remarkable performance of such network structures on a range of small datasets, researchers have become interested in approaches that are centered on Inception-V3 and integrated with TL. Dong et al. (2020) successfully classified five representative snakes with excellent precision, and Lin et al. (2019) applied the approach to the categorization of the German Traffic Sign Recognition Benchmark. Xia et al. (2017) obtained accurate results for the classification of flowers from the Oxford-102 and Oxford-17 flower datasets.

Fig. 8 Architecture of Inception-V3 in Transfer Learning

Comparison of Inception-V3 with other models

The technique suggested in this research uses an Inception-V3 model that was previously trained on ImageNet as the foundation dataset and is now used to transfer the learned features to training on a new dataset. Compared to alternative architectures like AlexNet, ResNet, VGG, and MobileNet, Inception-based networks like Inception-V3 have been demonstrated to be more computationally intensive, both in terms of the number of features generated by the network and the financial cost incurred.

The Inception-V3 levels are depicted in Fig. 8. As presented in the architecture, the Inception-V3 model has three distinct forms of modules. Convolutional and pooling layers run in parallel in each Inception module. To lessen the number of learning parameters, the Inception modules use small convolutional layers such as 3 × 3, 1 × 3, 3 × 1, and 1 × 1 layers. The picture size in the dataset was 224 × 224, whereas the Inception-V3 i/p size is 299 × 299 pixels. When developing and testing Inception-V3, the pictures were not resized to 299 × 299 pixels. Because this only altered the spatial dimensions of the feature maps and not the number of channels, the outcome was sufficient. The feature map ended up with 5 × 5 spatial dimensions and 2,048 channels following the application of the Inception modules and convolutional layers. Then, at the end of the Inception modules, three fully connected layers were added to exploit the pre-trained model and adapt the parameters for our specific purpose. In the last step, a softmax layer was added as a classifier that produces probabilities for each class. The predicted class was the one with the highest probability. The original Inception-V3 network outputs 1,000 classes; however, it has been limited to four classes here: Dry, Flood, MQL, and Cryo. Therefore, the last layer’s output channel count was reduced from 1,000 to 4. Dropout with a 50% rate was employed throughout the training phase. Dropout, which randomly discards some i/p’s to a layer, is a common method to prevent over-fitting. New machining pictures were used to fine-tune the TensorFlow pre-trained model, which is contained in the TensorFlow-Slim image classification package and was trained using the ImageNet dataset. Since ImageNet has over 14,000,000 pictures, the parameters were initialized from the pretrained model. As a result, it performs better with the training pictures utilized in the current investigation.
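The final classification step described above (softmax probabilities over four classes, followed by picking the most likely class) can be sketched in plain Python. This is only an illustration of the idea, not the TensorFlow model used in the study; the logit values and function names are hypothetical.

```python
import math

CLASSES = ["Dry", "Flood", "MQL", "Cryo"]

def softmax(logits):
    """Convert raw scores into probabilities that sum to one."""
    peak = max(logits)                          # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predict(logits):
    """Return the most probable class label together with all probabilities."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return CLASSES[best], probs

# Hypothetical outputs of the last 4-unit layer for one wear picture
label, probs = predict([0.2, 1.1, 0.3, 3.0])
print(label, [round(p, 3) for p in probs])
```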

Results and discussions

Machining parameters on flank wear

Flank wear is created by friction between the tool’s flank face and the milled workpiece surface, resulting in the loss of the sharp edge. As a result, Vb has an impact on tool geometry and surface quality (Maruda et al., 2018). In practice, Vb is commonly utilized as the TW criterion. For dry cutting, the wear varies from 0.215 to 0.261 mm; for the flood condition, from 0.158 to 0.185 mm; for MQL, from 0.121 to 0.145 mm; and for cryo, from 0.056 to 0.074 mm. At a Vc of 75 m/min and an f of 0.08 mm/rev, the Vb values were 0.261 mm, 0.185 mm, 0.145 mm, and 0.074 mm for dry, flood, MQL and cryo, respectively. The cryo condition decreases the Vb by 71% over dry cutting, 60% over flood cooling and 48% over the MQL approach. Cryo cooling at the cutting region reduces the Vb drastically by lowering the cutting temperature (Ramoni et al., 2021). An increase in Vb was observed as a result of the increase in feed (Ciftci et al., 2004). It is clear from previous studies that a rise in feed increases the wear on the flank side (Çakıroğlu, 2021). This is because the cutter advances faster. Cutting speed during metal cutting is proportional to feed.
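The percentage reductions can be reproduced from the Vb values reported at Vc = 75 m/min and f = 0.08 mm/rev. The short sketch below is only an arithmetic check; the function and variable names are illustrative.

```python
def percent_reduction(reference_vb, cryo_vb):
    """Percentage by which cryo cooling lowers Vb relative to a reference condition."""
    return (reference_vb - cryo_vb) / reference_vb * 100.0

# Vb values (mm) reported at Vc = 75 m/min and f = 0.08 mm/rev
vb = {"dry": 0.261, "flood": 0.185, "MQL": 0.145, "cryo": 0.074}

reductions = {
    name: percent_reduction(value, vb["cryo"])
    for name, value in vb.items()
    if name != "cryo"
}

for name, value in reductions.items():
    print(f"cryo vs {name}: {value:.1f}% lower Vb")
```

The computed values are roughly 71.6%, 60.0% and 49.0%, which is close to the 71%, 60% and 48% quoted in the text.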

To check the linearity of the data in this investigation and to separate the classes, a few well-known ML prediction models, namely Ridge regression (Ye et al., 2014), Random Forest (Methkal et al., 2022) and the J48 decision tree (Madhusudana et al., 2018), were used. ML is frequently used in a variety of disciplines to resolve challenging issues that are not easily addressed using conventional methods (Bustillo et al., 2018). The actual Vb and the values predicted by the distinct ML algorithms are presented in Fig. 9. Ridge regression produced an R² of 97.10%, Random Forest 98.05%, and the decision tree 99.3%. From the approaches employed, it is clear that the data are linear and that the actual and predicted values are close to one another. Based on the results, the classes were segmented according to the various environmental strategies.
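For readers unfamiliar with the R² score quoted above, the sketch below shows how the coefficient of determination is computed from actual and predicted values. The predicted values are hypothetical and chosen only for illustration; they are not the outputs of the Ridge, Random Forest or J48 models in this study.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - residual sum of squares / total sum of squares."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot

# Vb values (mm) reported in the text for dry, flood, MQL and cryo at one setting
actual_vb = [0.261, 0.185, 0.145, 0.074]
# Hypothetical model predictions, for illustration only
predicted_vb = [0.255, 0.190, 0.140, 0.080]

score = r_squared(actual_vb, predicted_vb)
print(f"R^2 = {score:.4f}")
```

A score of 1 means the predictions match the measurements exactly, while a model that always predicts the mean of the measurements scores 0.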

Dry condition-class A

Class A was defined as the range of wear between 0.2 and 0.3 mm. When Nimonic 80 A was machined without any coolant, the heat produced at the tool-work material contact directly attacked the flank side and increased the wear. In this class, a blunt edge was found. Milling is an intermittent cutting process; the time taken to complete a slot was around 21–23 s with the distinct speed-feed combinations.

Flood condition-class B

The flood cooling environment curtails the wear on the flank side to some extent by providing cooling and lubrication (C/L). The range of class B was between 0.15 and 0.2 mm. In this class, the TW is reduced, which increases the life of the tool. With the distinct combinations of speed and feed, the time needed to finish a slot was approximately 17–19 s.

Fig. 9 Actual and prediction with distinct ML techniques

MQL condition-class C

The MQL environment provides C/L to the cutting area, which helps to reduce the amount of wear that occurs on the flank area. Class C ranged between 0.1 and 0.15 mm. In this class, there is less TW than under flood and dry-cutting conditions. The time necessary to complete a slot was around 15–16 s.

CO2 condition-class D

The cryo CO2 cutting strategy provides a cooling effect to the cutting region, which helps to limit the wear on the flank region relative to all other cutting strategies. The wear range for class D is from 0.05 to 0.1 mm. From Fig. 9, it is clear that the wear was greatly reduced in comparison to the other cutting strategies used in this study. The time required to finish the cut was approximately 13–15 s.
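The four wear classes defined in the preceding subsections can be summarised as a simple lookup. The sketch below is illustrative: the text does not state which class owns a boundary value such as 0.15 mm, so half-open intervals (lower bound included, upper bound excluded) are an assumption, as are the function and variable names.

```python
# Wear classes from the text: (label, cooling condition, lower Vb, upper Vb) in mm
WEAR_CLASSES = [
    ("D", "cryo CO2", 0.05, 0.10),
    ("C", "MQL", 0.10, 0.15),
    ("B", "flood", 0.15, 0.20),
    ("A", "dry", 0.20, 0.30),
]

def classify_wear(vb_mm):
    """Return (class label, condition) for a flank wear value, or None if out of range.

    Assumption: each interval includes its lower bound and excludes its upper bound.
    """
    for label, condition, lower, upper in WEAR_CLASSES:
        if lower <= vb_mm < upper:
            return label, condition
    return None

# Vb values reported at Vc = 75 m/min and f = 0.08 mm/rev
for vb in (0.261, 0.185, 0.145, 0.074):
    print(vb, classify_wear(vb))
```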

Machining parameters on surface morphology

Machining was carried out at a depth of 1 mm; after machining with distinct variables, a small portion was cut out of the machined specimen with the help of wire EDM. This was done to examine the machined surface. The video measurement device (VMD) was used to take pictures after milling. To check the Vb against surface quality, the highest speed-feed combination from this investigation was selected, i.e., Vc = 75 m/min and f = 0.08 mm/rev. The milling experiments were carried out under distinct environmental conditions. Figure 10 presents the 2D and 3D surface profiles of the milled surfaces under distinct environmental conditions. Under a dry-cutting environment, a rough surface was produced as a result of the high TW (Nimel Sworna Ross and Ganesh 2019), which was discussed in Sect. 3.1. High peaks and valleys were seen under the dry-cutting environment. The Vb generated was in the range of 0.2–0.3 mm. The 2D roughness profile also reflects these variations. The profile revealed that the machined surfaces have surface flaws like micro gaps or scratches. The flood condition has a positive impact on flank wear. It reduces the Vb to some extent and decreases the height of the peaks and valleys relative to the dry-cutting environment. The 2D roughness profile displays low height variation under the flood-cutting strategy, as shown in the picture. The reason behind this was the lubrication behaviour of the water-soluble mineral-based coolant. The good lubrication and cooling effect under the MQL condition reduces the friction and diminishes the wear on the flank side (Maruda et al., 2016, 2021). The reduced Vb improves the surface quality, and the range of wear for this cutting strategy was 0.1–0.15 mm. The 2D profile decreased drastically as a result of the formation of a thin layer on the contact area under the MQL cutting strategy. Low peaks and valleys were seen under the MQL condition; they are drastically reduced when compared to the flood and dry-cutting strategies.
Then comes the cryo condition, which produces better surface traits by lowering the peaks and valleys. The 2D profiles show less deviation in relation to the dry, flood and MQL cutting strategies. The sub-zero CO2 gas was the reason for the reduced Vb, which in turn created a good surface finish. The maximum Ra values produced were 1.95 μm, 1.57 μm, 1.19 μm and 0.75 μm under dry, flood, MQL and cryo cutting conditions, respectively. Similarly, the Sa values produced were 1.02 μm, 0.91 μm, 0.78 μm and 0.64 μm.

Fig. 10 Surface profiles at a Vc of 75 m/min and f of 0.08 mm/rev under (a) dry, (b) flood, (c) MQL and (d) cryo

Relationship of flank wear and surface morphology

The plots of Ra as a function of Vb are shown in Fig. 11 (a-d). The graphs demonstrate that no consistent and stable roughness value was established during machining; rather, as the Vb advanced, an increase in roughness was seen. These plots also suggest that the quality of the surface was significantly impacted by the wear land. The same was evident under all cutting conditions.

Fig. 11 (a-d) Effect of flank wear on surface roughness with (a) dry, (b) flood, (c) MQL and (d) cryo

This work aims to predict the surface quality with the help of Vb. The examinations were carried out with distinct speed-feed combinations and cutting environments. The 2D roughness values were considered because they are widely accepted. The tests were repeated three times, and each time the values were taken at three distinct places. The Ra ranges from 1.6 to 2.0 μm for the dry cutting environment; for the flood condition, the range was 1.2–1.6 μm; for the MQL and cryo cutting environments, the ranges were 0.8–1.2 μm and 0.6–0.8 μm, respectively. The Vb condition has been categorized into four classes (dry, flood, MQL and cryo). The most important problem with utilizing CNN-based DL applications is the insufficient size of the dataset. Data augmentation can efficiently increase the number of samples in a dataset (Molitor et al., 2022) using traditional image augmentation methods such as flip, rotation, shearing, and contrast adjustment. Data augmentation was therefore employed to solve the problem of insufficient data. The augmented pictures of the current investigation are presented in Fig. 12.
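A few of the augmentation operations named above can be sketched on a tiny grayscale image stored as nested lists. This is not the pipeline used in the study; the pixel values, the contrast factor and the function names are hypothetical, and shearing is omitted to keep the sketch short.

```python
def flip_horizontal(image):
    """Mirror every row (left-right flip)."""
    return [row[::-1] for row in image]

def flip_vertical(image):
    """Reverse the row order (top-bottom flip)."""
    return [row[:] for row in image[::-1]]

def rotate_90_clockwise(image):
    """Rotate by 90 degrees clockwise: reversed rows become columns."""
    return [list(column) for column in zip(*image[::-1])]

def adjust_contrast(image, factor):
    """Scale each pixel's deviation from the mean intensity, clipped to 0..255."""
    pixels = [p for row in image for p in row]
    mean = sum(pixels) / len(pixels)
    return [[max(0, min(255, round(mean + factor * (p - mean)))) for p in row]
            for row in image]

def augment(image):
    """Return several augmented copies of one picture."""
    return [
        flip_horizontal(image),
        flip_vertical(image),
        rotate_90_clockwise(image),
        adjust_contrast(image, 2.0),
    ]

# Tiny 2 x 3 grayscale stand-in for a wear picture (hypothetical pixel values)
sample = [[10, 20, 30],
          [40, 50, 60]]
augmented = augment(sample)
print(len(augmented), "augmented pictures from one original")
```

Each original picture yields several variants, which is how a small set of experimental pictures can be expanded into a training set of usable size.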

Fig. 12 Generated pictures with augmentation

Classification of tool wear conditions

In this study, 24 trials were carried out, and an initial set of pictures was collected for four different conditions (dry, flood, MQL, and cryo) during machining. The aim is therefore to design a machine learning model that classifies a given i/p picture into one of four classes. Thus, the Inception-V3 model has been adopted and modified to classify the given i/p pictures into four different classes, namely dry, flood, MQL, and cryo. To design an efficient model with good generalizability, an adequate number of pictures is required, and the pictures collected during experimentation are insufficient. Thus, to increase the number of pictures for training, augmentation was performed on the experimental pictures, yielding a total of 3836 pictures for feeding into the machine learning models. The dataset was then divided in a 75:25 ratio, where 75% of the data was utilized for training and the rest for validation and testing. Thus, a total of 2868 pictures were utilized for training, 775 pictures for validation and 193 for testing.
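The reported picture counts can be checked against the stated 75:25 split with a few lines of arithmetic. The sketch below uses only the numbers given in the text; the variable names are illustrative.

```python
# Picture counts reported in the text
counts = {"training": 2868, "validation": 775, "testing": 193}

total = sum(counts.values())
shares = {name: 100.0 * n / total for name, n in counts.items()}

print("total pictures:", total)
for name, share in shares.items():
    print(f"{name}: {share:.1f}%")
```

The counts add up to 3836, with about 74.8% used for training, 20.2% for validation and 5.0% for testing, which is consistent with the approximate 75:25 division.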

Various DL models, such as AlexNet, VGG-16, ResNet, MobileNet, and Inception-V3, were investigated. The performance of a machine learning model can be measured using distinct metrics like Accuracy, Recall, Precision, and F1-Score, which are derived from a confusion matrix. The confusion matrix gives the number of correct and incorrect predictions for each class. From the confusion matrix, measures like True Positives (\({t}_{pos}\)), True Negatives (\({t}_{neg}\)), False Positives (\({f}_{pos}\)) and False Negatives (\({f}_{neg}\)) are calculated. Accuracy, Precision, Recall, and F1-Score are then evaluated using Eqs. 1–4, respectively.

$$\text{Accuracy}=\frac{{t}_{pos}+{t}_{neg}}{{t}_{pos}+{t}_{neg}+{f}_{pos}+{f}_{neg}}$$
(1)
$$\text{Precision}=\frac{{t}_{pos}}{{t}_{pos}+{f}_{pos}}$$
(2)
$$\text{Recall}=\frac{{t}_{pos}}{{t}_{pos}+{f}_{neg}}$$
(3)
$$\text{F1 Score}=2\times \frac{\text{Precision}\times \text{Recall}}{\text{Precision}+\text{Recall}}$$
(4)

Recall is a measure of how many of the actual relevant results were returned. When the cost of false negatives is significant, a model’s recall becomes extremely important. Recall is also known as sensitivity. The F1 Score is the harmonic mean of precision and recall. A high F1 Score indicates that the model produces few false positives and false negatives.
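Eqs. 1–4 can be applied to a multi-class confusion matrix by treating one class at a time as "positive" and the remaining classes as "negative". The sketch below assumes that rows hold the actual classes and columns the predicted classes; the matrix values are hypothetical and are not the results of this study, and the function name is illustrative.

```python
CLASSES = ["Dry", "Flood", "MQL", "Cryo"]

def per_class_metrics(matrix, index):
    """One-vs-rest metrics for class `index` (rows = actual, columns = predicted).

    Assumes the class has at least one actual and one predicted sample,
    so that no denominator is zero.
    """
    total = sum(sum(row) for row in matrix)
    t_pos = matrix[index][index]
    f_neg = sum(matrix[index]) - t_pos                 # actual class, predicted as another
    f_pos = sum(row[index] for row in matrix) - t_pos  # other class, predicted as this one
    t_neg = total - t_pos - f_neg - f_pos
    precision = t_pos / (t_pos + f_pos)
    recall = t_pos / (t_pos + f_neg)
    return {
        "accuracy": (t_pos + t_neg) / total,
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
    }

# Hypothetical confusion matrix for illustration only (not the paper's results)
example = [
    [48, 2, 0, 0],
    [1, 47, 2, 0],
    [0, 1, 49, 0],
    [0, 0, 0, 50],
]

dry = per_class_metrics(example, 0)
cryo = per_class_metrics(example, 3)
for name, value in dry.items():
    print(f"Dry {name}: {value:.3f}")
```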

Comparison of prediction results with AlexNet, VGG-16, ResNet, MobileNet, and Inception-V3

In this section, the evaluation parameters, such as Accuracy, Recall, Precision, and F1-Score, for classifying TW under different machining conditions are analyzed. The augmented pictures were used to train different models, namely AlexNet, VGG-16, ResNet, MobileNet, and Inception-V3, and the predicted results were taken for analysis. Figures 13 and 14 present the accuracy and loss with respect to epochs. A summary of the prediction results with the selected architectures is represented in the form of confusion matrices in Fig. 15. The confusion matrix depicts the various instances in which the classifier is confused when making predictions (Kothuru et al., 2019). The values on the diagonal are the correctly predicted values for the different conditions. The effectiveness of the proposed approach with the Inception-V3 model is compared with the other methods, namely AlexNet, VGG-16, ResNet, and MobileNet.

The experimental outcomes for the different measures are given in Fig. 16, where the proposed Inception-V3 method is compared with AlexNet, VGG-16, ResNet, and MobileNet for the Dry, Flood, MQL, and Cryo conditions. Figure 16 (a) shows the accuracy of all the models under the different machining conditions. The proposed Inception-V3 method performs best in all four conditions, with an overall accuracy of 99.4%. The precision values of the models are shown in Fig. 16 (b), where the Inception-V3 model has the highest precision of 99%. Figure 16 (c) shows the recall values of all the models under the distinct environmental conditions, where Inception-V3 again performs best with 99.8%. Figure 16 (d) compares the F1-Scores, where the proposed Inception-V3 leads with 98.9%. Thus, the Inception-V3 model gives the highest overall performance.
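The per-condition measures discussed above can be read from a multi-class confusion matrix by treating each condition in turn as the positive class. The sketch below assumes that rows hold the actual class and columns hold the predicted class. The matrix values are hypothetical placeholders, not the results of Fig. 15, and the function names are illustrative.

```python
def per_class_metrics(cm, labels):
    """One-vs-rest precision, recall, and F1 for each class.

    Assumption: cm[i][j] counts samples of actual class i predicted as class j.
    """
    n = len(cm)
    results = {}
    for k, label in enumerate(labels):
        t_pos = cm[k][k]                                   # diagonal entry
        f_neg = sum(cm[k]) - t_pos                         # rest of row k
        f_pos = sum(cm[r][k] for r in range(n)) - t_pos    # rest of column k
        precision = t_pos / (t_pos + f_pos) if (t_pos + f_pos) else 0.0
        recall = t_pos / (t_pos + f_neg) if (t_pos + f_neg) else 0.0
        pr_sum = precision + recall
        f1 = 2 * precision * recall / pr_sum if pr_sum else 0.0
        results[label] = {"precision": precision, "recall": recall, "f1": f1}
    return results


def overall_accuracy(cm):
    """Share of all samples that lie on the diagonal."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / total if total else 0.0


if __name__ == "__main__":
    labels = ["Dry", "Flood", "MQL", "Cryo"]
    # Hypothetical confusion matrix, for illustration only.
    cm = [
        [48, 1, 1, 0],
        [2, 47, 1, 0],
        [0, 1, 49, 0],
        [0, 0, 0, 50],
    ]
    print("overall accuracy:", overall_accuracy(cm))
    for label, scores in per_class_metrics(cm, labels).items():
        print(label, scores)
```

Under this convention, off-diagonal entries in a row are missed detections for that condition (false negatives), and off-diagonal entries in a column are false alarms (false positives).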

Fig. 13

Accuracy of training and validation with distinct models (a) AlexNet, (b) VGG-16, (c) ResNet, (d) MobileNet, and (e) Inception-V3

Fig. 14

Loss of training and validation with distinct models (a) AlexNet, (b) VGG-16, (c) ResNet, (d) MobileNet, and (e) Inception-V3

Fig. 15

Prediction Results as Confusion Matrix for distinct models

Fig. 16

Assessment of Results (a) Accuracy, (b) Precision, (c) Recall and (d) F1-Score

Conclusion

Tool wear is a constant focus of research in industrial applications. Predicting the flank wear of the tool across diverse environmental conditions with a limited number of pictures is a difficult task. In this paper, Inception-V3 was adopted for TL because it performs well compared with other models such as AlexNet, VGG-16, ResNet, and MobileNet. A total of 3836 images were taken into account, of which 2868 images were utilized to train the ML models and 193 images to test them. The following observations were made using the suggested methodology:

  • VMD was employed to take pictures of Vb under distinct environmental conditions. The cryo condition shows the lowest wear value and the dry condition the highest. The TW at the cutting insert is attributed to poor lubrication and cooling in the cutting area.

  • Surface finish analysis showed that the best surface quality was produced under the cryo condition, owing to lower flank wear. The ranges of surface quality produced under the different classes of flank wear were proposed. The study showed that TW has a direct influence on surface waviness.

  • The effectiveness of the proposed TL approach using the Inception-V3 model to predict TW in machining was analyzed with distinct performance measures, namely Accuracy, Recall, Precision, and F1-Score, which reach 99.4%, 99.8%, 99%, and 98.9%, respectively.

  • A comparison of results shows that Inception-V3 is the most suitable transfer learning model for classifying the TW pictures, with 99.4% accuracy, whereas AlexNet, ResNet, VGG-16, and MobileNet reach accuracies of 93.6%, 94.03%, 95.1%, and 94.2%, respectively.

  • A transfer learning model based on the Inception-V3 architecture and trained on augmented pictures has not been reported in the literature and is extremely useful for intelligent machining applications.

Funding/acknowledgement

The research leading to these results has received funding from the Norwegian Financial Mechanism 2014–2021, Project Contract No 2020/37/K/ST8/02795. The authors also acknowledge the “Polish National Agency for Academic Exchange (NAWA) No. PPN/ULM/2020/1/00121” for financial support.