1 Introduction

Stainless steel alloys are widely used in many industrial fields such as automotive, aerospace, and medical. However, machining stainless steel poses many challenges because of its tendency to work harden, its low thermal conductivity, and its high strength. These properties make it difficult to machine and lead to poor surface finish, tool failure, and irregular wear. The presence of a built-up edge increases the tool wear rate and deteriorates the surface integrity of the workpiece. Face milling of stainless steel can help address these difficulties. Milling is a common and efficient machining operation employed in modern industrial manufacturing for fabricating various mechanical parts and features, such as flat surfaces, grooves, threads, and other complex geometric shapes [1]. Cutting tools are key components in milling operations; they are inevitably subject to wear during milling, so their condition varies over their effective lifetimes. Machining is a vital material removal technique in manufacturing that requires significant attention owing to the time and money involved. As the applications for these machines expand, a system for monitoring tool wear becomes important. Tool condition monitoring (TCM) is an important technology in automated manufacturing processes because it increases productivity by reducing downtime, reduces damage to the cutting tool and workpiece, and ensures product quality [2, 3]; a broken tool, for example, causes permanent damage to the workpiece’s surface. Tool wear affects the surface quality and dimensional accuracy of the machined part as well as the operating cost [4, 5]. It is therefore vital to monitor tool wear, which can be done using direct or indirect methods.
The direct method requires the cutting tool to be removed from the machine, and an optical microscope is generally used to measure wear, whereas the indirect method uses sensor signals such as acoustic emission (AE), milling force, tool/workpiece vibration, and spindle motor current to estimate tool wear [6]. Under typical machining conditions, flank wear is the most prevalent wear mode, and the width of flank wear (VB) is the most commonly used measure for determining cutting tool life [7]. VB may be quantified directly [6] or indirectly (using sensors). Direct inspection, such as examining the cutting tool edges under an optical microscope, introduces undesirable interruptions that slow the machining process and increase manufacturing costs. The indirect approach, on the other hand, uses sensors that monitor tool wear without interrupting the machining process, thereby reducing machining time and improving productivity. Sensor-based tool condition monitoring is one of the indirect methods used for determining the value of tool wear without stopping the machining operation [8]. Recent studies have shown an interest in tool wear prediction since it would bring significant benefits to industry in terms of waste reduction, production cost, and accuracy [9]. Typical machine downtime due to tool wear is between 7 and 20% [3]. Numerous methods have been developed over the past two decades to quantify tool wear, including the use of outputs from acoustic emission sensors, vibration sensors, and current sensors [10, 11]. From the data collected by these sensors, it is possible to estimate tool wear, allowing for a more efficient machining process. Several studies have sought to establish a link between machining parameters and tool wear using the indirect approach of tool condition monitoring [12, 13]. Vakharia et al. [14], for instance, employed Symlet wavelets to extract statistical features from vibration and acoustic emission signals to estimate tool wear. To characterize the AE signals captured during cutting, Liang and Dornfeld [15] utilized an autoregressive time series model. Their results suggested that tool wear could be detected during machining by monitoring the evolution of the model parameter vector. Mohanraj et al. [16] predicted tool wear in the end milling process using wavelet features and Holder’s coefficient. The authors analyzed flank wear from vibration signals using various machine learning (ML) algorithms. The confusion matrix was used to assess the accuracy of the ML algorithms, which was later verified using benchmark datasets. The reported accuracies were 100% and 99.86% for the support vector machine and decision trees, respectively. Tool life has been estimated using various approaches, ranging from mathematical formulations [17] and stochastic modeling [18] to more complex statistical models [19] and, recently, various AI algorithms. ML algorithms, which are a subset of AI techniques, can automate a variety of machining tasks with little human involvement. In ML, models such as support vector machines (SVM), artificial neural networks (ANN), and k-nearest neighbors (KNN) are first trained on input and output data and can later be used for prediction on unseen data [20, 21, 22].

Data-driven TCM has grown significantly as the need to incorporate automation and Industry 4.0 in manufacturing industries increases. Furthermore, with the growing use of multiple sensors, sensor networks, and complex, unstructured data, big data poses a challenge for developing robust models. At present, deep learning (DL) serves as a bridge that efficiently connects big data coming from machinery with intelligent machine condition monitoring techniques. DL is a branch of ML that has been applied in a wide variety of applications since 2006. It builds on artificial neural networks and consists of multiple information processing layers that learn hierarchical representations of data. Recently, Serin et al. [23] carried out a comprehensive study of the applications of various deep learning algorithms for TCM. A methodology has been developed for the automatic detection of tool wear in the face milling process using a convolutional neural network (CNN) that is capable of identifying wear rates with minimal error [24]. Kothuru et al. [25] investigated hyperparameter tuning to improve the accuracy of CNN-based tool condition monitoring in the face milling process. Recently, Dzulfikri et al. [26] proposed a deep metric learning (DML) approach for stamping tool condition diagnosis. Several DML approaches were examined to see which one was best for determining the state of stamping tools, and the authors concluded that the triplet network provided the most favorable results.

Numerous researchers have identified the lack of adequate experimental data for developing machine learning models as a barrier to effective tool condition monitoring and tool wear prediction. The task becomes more challenging when DL models need to be developed, as they require large amounts of experimental data for training. To overcome this obstacle and to enable effective automation, the authors investigated the utility of SinGAN for generating additional scalograms. A thorough review of the literature also indicates that prior research has paid little attention to predicting tool wear with a combined approach of wavelet scalograms, SinGAN, and DL models such as long short-term memory (LSTM) networks. Furthermore, the authors applied relative wavelet energy (RWE) as the mother wavelet selection criterion to choose the base wavelet, generated scalograms from the acoustic and vibration signals, extracted relevant information in the form of image quality parameters, and, finally, predicted tool wear with various types of LSTM models.

A methodology for generating additional data is presented. The advantage of the proposed method is verified through experiments on the publicly available milling dataset from the Prognostics Centre of Excellence-Data Repository, NASA [27]. Various LSTM models are compared, and the findings are evaluated with standard performance metrics to determine the efficacy of the models for tool wear prediction. The remainder of the article is structured as follows. Section 2 describes the experimentation, the use of SinGAN, and the LSTM architecture. Section 3 presents and discusses the results, and Sect. 4 highlights the concluding remarks. The methodology of the proposed work is shown in Fig. 1.

Fig. 1 TCM approach using SinGAN and LSTM

2 Materials and methods

2.1 Workpiece setup and cutting parameters

The experiments were conducted on a milling machine under varying machining conditions in order to predict tool wear [27]. The cutting speed was set to 200 m/min, the feed rate was varied between 0.25 and 0.5 mm/rev, and the cutting depth was varied between 0.75 and 1.5 mm, as shown in Table 1. Two workpiece materials, cast iron and stainless steel J45, were used with inserts of type KC710. The workpieces measured 483 mm × 178 mm × 51 mm.

Table 1 Cutting parameters for end milling process

A 70-mm face mill with six KC710 inserts, coated with TiC, TiC-N, and TiN for toughness, was used in the milling tests. Tool wear was analyzed for three distinct cuts: entrance cut, standard cut, and exit cut. Signals were collected at two locations, the table and the spindle, using three types of sensors: acoustic emission, vibration, and current. Of these data, this study concentrates on the spindle vibration and acoustic emission data recorded while milling the steel workpiece. The experiment was performed 58 times with different cutting parameters, yielding 58 AE and 58 vibration signals. The milling machine’s schematic diagram is shown in Fig. 2. Table 1 shows the cutting parameters for the end milling process used in the current study.

Fig. 2 Schematic diagram of milling machine

2.2 Vibration and acoustic signals

Depending on the condition of the cutting tool, vibration levels may vary greatly throughout the machining process. The vibration is concentrated mainly at the work-holding or tool-holding devices, in one or more directions. Vibration data for this experiment were collected using a vibration sensor attached to the spindle of a Matsuura MC-510V milling center. An accelerometer (model 7201–50, ENDEVCO) was used to detect the vibrations. The signal was routed via a Phoenix Contact cable connection, amplified, and filtered using LP/HP filters before being processed and entered into the computer for data collection. Acoustic emission (AE) is the spontaneous release of transient elastic stress energy during the deformation of a material. AE data were collected for this experiment using an acoustic emission sensor (model WD 925, Physical Acoustic Group) with a frequency range of up to 2 MHz, secured with a clamping support. The signal was fed to a preamplifier (model 1801, Dunegan/Endevco) fitted with a high-pass filter set to 50 kHz and then amplified by a dual amplifier (model DE 302A). It was then routed through a custom-designed RMS meter and via a cable to a high-speed data collection board (MIO-16). Figure 3 shows the vibration and acoustic signals captured by the sensors.

Fig. 3 a Acoustic and b vibration signal at feed = 0.5 mm/rev and depth of cut = 1.5 mm

2.3 Selection of mother wavelet

To effectively predict tool wear from the signals, the wavelet transform (WT) was used to pre-process the signals and to generate scalograms. A wavelet is a short wave that is symmetrical and has a mean value of 0. The WT was formulated to extract useful information from non-stationary signals, a task for which Fourier-based analysis is poorly suited because of its fixed window size [28]. Since then, it has been used in a variety of signal processing applications [29, 30]. In contrast to the short-time Fourier transform’s windowed representation, the WT produces a smooth representation. As a result, sudden changes and abnormalities, as well as similarities, can be effectively analyzed in the captured signals. The WT is analogous to a mathematical microscope in that it can analyze signals at a variety of scales [31].

Based on a comprehensive literature review, several mother wavelets were chosen for comparison, and the wavelet giving the maximum relative wavelet energy (RWE) was used to generate the scalograms. RWE is the energy of a distinct frequency band relative to the total energy of the signal, and it can be used to characterize and identify specific phenomena in both the time and frequency domains [32, 33]. For a given signal, RWE is calculated as follows:

$$R=\frac{{Y}_{i}}{{Y}_{t}}$$
(1)

Here, \({Y}_{i}\) is the energy content of the \(i\)th frequency band and \({Y}_{t}\) is the total energy content of the signal.
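
As a minimal sketch of Eq. (1), the snippet below computes the relative energy of each scale of a wavelet coefficient matrix, which is the same matrix that a scalogram visualizes. The naive Morlet transform, the function names `morlet_cwt` and `relative_wavelet_energy`, the centre frequency `w0 = 6`, the chosen scales, and the synthetic test signal are all assumptions made for illustration; they are not taken from the experiments described here.

```python
import numpy as np

def morlet_cwt(signal, scales, w0=6.0):
    """Naive continuous wavelet transform with a real Morlet wavelet (illustrative only)."""
    signal = np.asarray(signal, dtype=float)
    coeffs = np.empty((len(scales), signal.size))
    for row, s in enumerate(scales):
        # Wavelet support of +/- 4 scale units, kept shorter than the signal
        half = int(min(4 * s, (signal.size - 1) // 2))
        t = np.arange(-half, half + 1) / s
        wavelet = np.exp(-0.5 * t**2) * np.cos(w0 * t) / np.sqrt(s)
        coeffs[row] = np.convolve(signal, wavelet, mode="same")
    return coeffs

def relative_wavelet_energy(coeffs):
    """Eq. (1): energy of each scale (Y_i) divided by the total energy (Y_t)."""
    band_energy = np.sum(np.abs(coeffs) ** 2, axis=1)
    return band_energy / band_energy.sum()

# Synthetic stand-in for a sensor signal: a sinusoid with a 20-sample period
n = np.arange(1000)
signal = np.sin(2 * np.pi * n / 20)
scales = np.array([2.0, 5.0, 10.0, 19.0, 40.0, 80.0])

coeffs = morlet_cwt(signal, scales)   # rows = scales, columns = time (the scalogram data)
rwe = relative_wavelet_energy(coeffs)
print(np.round(rwe, 3))
```

For this test signal, most of the relative energy falls in the scale whose centre frequency is closest to the signal frequency. Comparing candidate mother wavelets by maximum RWE would repeat the same computation with each candidate in place of the Morlet wavelet.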

2.4 SinGAN

When ML or DL models need to be constructed for either classification or regression, the primary issue is the availability of experimental data. The number of experiments conducted for any manufacturing operation is limited, which results in data scarcity. Several factors limit the amount of data that can be collected, including the associated cost, restrictions on the upper and lower limits of operating parameters, and the duration of experiments. To address this issue, generative adversarial networks (GANs), a type of unsupervised learning, are attracting growing interest in the research community. As recently published work shows, GANs can create new instances from the original data in most applications [34, 35], and they have made significant advances in the modelling of high-dimensional distributions of visual data [36]. A GAN lets two neural networks, a generator G and a discriminator D, compete against each other. The generator creates new data instances, while the discriminator examines them for authenticity. The two networks are trained alternately: the generator’s parameters are held constant while the discriminator is trained, and the discriminator’s parameters are held constant while the generator is trained, so each network always learns against a static opponent. Following the framework of a minimax game, the model can be represented mathematically as [37, 38]:

$$\begin{aligned}\min_{G} \max_{D} V(D, G) &= \mathbb{E}_{x\sim p_{\mathrm{data}}(x)} [\log D(x)] \\&\quad+ \mathbb{E}_{z\sim p_{z}(z)} [\log(1-D(G(z)))]\end{aligned}$$
(2)

Here, x represents a real input image, E denotes the expectation, z is a sample from the latent space from which G generates images, and \(p_{\mathrm{data}}\) and \(p_{z}\) are the respective probability distributions.
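
To make Eq. (2) concrete, the following sketch estimates the value function from batches of discriminator outputs. The function name `gan_value`, the small constant `eps`, and the example probabilities are hypothetical and chosen only for illustration; no actual network is trained here.

```python
import numpy as np

def gan_value(d_real, d_fake, eps=1e-12):
    """Sample estimate of V(D, G) in Eq. (2).

    d_real: discriminator outputs D(x) on real images, each in (0, 1)
    d_fake: discriminator outputs D(G(z)) on generated images, each in (0, 1)
    eps:    small constant (an assumption) that guards against log(0)
    """
    d_real = np.asarray(d_real, dtype=float)
    d_fake = np.asarray(d_fake, dtype=float)
    return np.mean(np.log(d_real + eps)) + np.mean(np.log(1.0 - d_fake + eps))

# A discriminator that separates real from generated samples well
confident = gan_value(d_real=[0.90, 0.95], d_fake=[0.10, 0.05])

# A discriminator that cannot tell them apart and outputs 0.5 everywhere
fooled = gan_value(d_real=[0.5, 0.5], d_fake=[0.5, 0.5])

print(confident, fooled)
```

The discriminator is trained to increase this value and the generator to decrease it. When the discriminator outputs 0.5 for every sample, the value equals −log 4, which is lower than the value obtained by the confident discriminator above.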

In the current study, the authors employed SinGAN, an unconditional image generation approach that can learn from a single natural image [39, 40]. Once trained, SinGAN can generate a wide range of high-quality image samples of different sizes and aspect ratios that are semantically similar to the training image, from the coarse scale “0” to the fine scale “4”.

2.5 Long short-term memory network

Deep learning approaches were initially used for video and image analysis. However, vibration and acoustic signals are usually one-dimensional sequential data. The recurrent neural network (RNN), a type of DL algorithm, suffers from short-term memory and the vanishing gradient problem: layers that receive small gradients stop learning and, as a consequence, RNNs forget what they have seen earlier in long sequences. LSTM is a kind of RNN that is generally considered more appropriate for sequential data and is therefore frequently used in natural language processing and speech recognition [41]. Furthermore, LSTMs have been observed to perform better than conventional RNNs in time series prediction [42]. To circumvent short-term memory loss, an LSTM consists of cells and gates that regulate the flow of information. The gates determine which information should be kept and which should be discarded, whereas the cell carries relevant information throughout the processing of the sequence. The LSTM cell is controlled by three gates: a forget gate \(({F}_{a})\), an input gate \(({I}_{a})\), and an output gate \(({O}_{a})\), whereas \({K}_{a}, {K}_{a-1}\), and \(\widehat{{K}_{a}}\) denote the current, previous, and temporary cell states, respectively, as shown in Fig. 4. A sigmoid activation function is used inside the gates, which keeps their values between 0 and 1; a value close to 0 means that the information is forgotten, and a value close to 1 means that it is kept. The forget gate \(({F}_{a})\), the input gate \(({I}_{a})\), and the output gate \(({O}_{a})\) decide which information is discarded from the previous cell state, added to the current cell state, and exported from the current cell, respectively, with the help of the following equations:

Fig. 4 LSTM cell structure

$${F}_{a}= \sigma ({W}_{f}\cdot\left[{H}_{a-1}, {x}_{a}\right] + {B}_{f})$$
(3)
$${I}_{a}= \sigma ({W}_{i}\cdot \left[{H}_{a-1}, {x}_{a}\right] + {B}_{i})$$
(4)
$${O}_{a}= \sigma ({W}_{o}\cdot \left[{H}_{a-1}, {x}_{a}\right] + {B}_{o})$$
(5)

Here, \({W}_{f}\), \({W}_{i}\), and \({W}_{o}\) are the weight matrices of the forget, input, and output gates, respectively. Similarly, \({B}_{f}\), \({B}_{i}\), and \({B}_{o}\) are the offset (bias) vectors of the forget gate, the input gate, and the output gate, \({H}_{a-1}\) is the previous hidden state, and \({x}_{a}\) is the current input. The hidden state \({H}_{a}\) of an LSTM cell is computed as:

$${H}_{a}= {O}_{a}*\mathrm{tanh}({K}_{a})$$
(6)

whereas \(\sigma\) and \(tanh\) activation functions are computed as follows:

$$\sigma(x) = \frac{1}{1+{e}^{-x}}$$
(7)
$$\mathrm{tanh}(x)= \frac{{e}^{x}-{e}^{-x}}{{e}^{x}+{e}^{-x}}$$
(8)

In our study, we have considered three LSTM models: vanilla LSTM, stacked LSTM, and bidirectional LSTM [43] for in-depth analysis of tool wear prediction for end milling.
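
Equations (3) to (6) can be sketched as a single forward step of an LSTM cell. The gate and hidden-state computations follow the equations above. The candidate cell state and the cell-state update are not written out in this section, so the widely used standard LSTM formulation is assumed for those two lines, together with a hypothetical weight matrix `W["k"]` and bias `B["k"]`. The function name `lstm_step` and the dictionary layout are likewise illustrative.

```python
import numpy as np

def sigmoid(x):
    """Eq. (7)."""
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_a, h_prev, k_prev, W, B):
    """One forward step of an LSTM cell.

    x_a:    current input vector
    h_prev: previous hidden state H_{a-1}
    k_prev: previous cell state K_{a-1}
    W, B:   dicts of weight matrices and bias vectors keyed by "f", "i", "o", "k"
    """
    z = np.concatenate([h_prev, x_a])        # [H_{a-1}, x_a]
    f = sigmoid(W["f"] @ z + B["f"])         # Eq. (3), forget gate
    i = sigmoid(W["i"] @ z + B["i"])         # Eq. (4), input gate
    o = sigmoid(W["o"] @ z + B["o"])         # Eq. (5), output gate
    k_tmp = np.tanh(W["k"] @ z + B["k"])     # temporary cell state (assumed standard form)
    k = f * k_prev + i * k_tmp               # cell-state update (assumed standard form)
    h = o * np.tanh(k)                       # Eq. (6), hidden state
    return h, k

# Tiny example: 3 input features, 2 hidden units, all parameters set to zero
n_in, n_hidden = 3, 2
W = {g: np.zeros((n_hidden, n_hidden + n_in)) for g in "fiok"}
B = {g: np.zeros(n_hidden) for g in "fiok"}
h, k = lstm_step(np.ones(n_in), np.zeros(n_hidden), np.ones(n_hidden), W, B)
print(h, k)
```

With all parameters at zero, every gate equals 0.5, so half of the previous cell state is retained; non-zero trained weights would move each gate toward 0 (discard) or 1 (keep).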

3 Results and discussion

To predict tool wear in the end milling process, the authors computed the RWE for different mother wavelets. Table 2 shows that the Morlet wavelet exhibits the maximum RWE for both signals. Fifty-eight scalograms each were therefore generated from the Morlet wavelet coefficients of the AE and vibration signals under different operating conditions [44]; these Morlet scalograms are shown in Fig. 5. From them, the authors created 29,000 images (14,500 each for the AE and vibration signals) using SinGAN. Figure 6 illustrates the scalograms generated at various scales.

Table 2 Maximum relative wavelet energy of different wavelets
Fig. 5 Morlet scalograms

Feature extraction is a process to extract useful information from images. To predict the tool wear with a deep learning network from generated images as seen in Fig. 6, fourteen image quality parameters (IQP) were extracted according to Table 3. Figure 7 shows the feature extraction procedure from scalograms. A feature vector of size 14,500 × 14 each was constructed from both AE and vibration scalograms. This feature vector is then fed to deep learning models for training and prediction of tool wear. Long short-term memory network (LSTM), which is a type of deep learning model, is utilized for prediction.

Table 3 Image quality parameters
Fig. 6 a–j Generated scalograms from SinGAN at DOC (0.75 mm) and feed (0.5 mm/rev)

Fig. 7 Feature extraction from scalograms

Sample feature vectors constructed from the scalograms are given in Tables 4 and 5. Both tables show large variations in the extracted features. Therefore, a robust transformation of the feature vector is needed to reduce bias and to train the LSTM model effectively. The robust transformation rescales every feature to a comparable range: the median of a feature column is subtracted from each of its values, the result is divided by the interquartile range of that column, and the procedure is applied to every column of the feature vector. The robust transformation was chosen since it is a widely used approach in prediction; Tables 6 and 7 show sample robust transformed feature vectors. These transformed feature vectors are fed into three different LSTM models, namely vanilla LSTM, stacked LSTM, and bidirectional LSTM, for prediction of flank wear. To assess the prediction capability, five performance parameters, R2, adjusted R2, MAE, RMSE, and MSE, were calculated; their formulas are listed in Table 8.
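
The robust transformation described above can be sketched in a few lines. The function name `robust_scale`, the guard for constant columns, and the example values are assumptions for illustration and are not the feature values of Tables 4 to 7.

```python
import numpy as np

def robust_scale(features):
    """Subtract the column median and divide by the column interquartile range."""
    features = np.asarray(features, dtype=float)
    median = np.median(features, axis=0)
    q1, q3 = np.percentile(features, [25, 75], axis=0)
    iqr = q3 - q1
    iqr = np.where(iqr == 0, 1.0, iqr)   # assumption: leave constant columns unscaled
    return (features - median) / iqr

# One hypothetical feature column containing a single outlier
column = np.array([[1.0], [2.0], [3.0], [4.0], [100.0]])
scaled = robust_scale(column)
print(scaled.ravel())
```

Because the median and interquartile range are barely affected by the outlier, the four ordinary values keep a sensible spread after scaling. The transformed values are centred on zero rather than confined to a fixed interval, and the outlier remains identifiable.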

Table 4 Sample features from AE signals
Table 5 Sample features from vibration signals
Table 6 Sample robust transformed AE features
Table 7 Sample robust transformed vibration features
Table 8 Performance parameters

Here, \({{\varvec{y}}}_{{\varvec{r}}}\) is actual tool wear, \({{\varvec{y}}}_{{\varvec{p}}}\) is predicted tool wear, \(\overline{{\varvec{y}} }\) is mean of actual tool wear, and \({\varvec{N}}\) is the total number of observations.
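
The five performance parameters can be computed as in the sketch below, which assumes the standard textbook definitions of R2, adjusted R2, MAE, RMSE, and MSE. The function name `regression_metrics`, the argument `n_features` (the number of predictors used in the adjusted R2), and the example values are hypothetical.

```python
import numpy as np

def regression_metrics(y_r, y_p, n_features):
    """Standard regression metrics for actual (y_r) and predicted (y_p) tool wear."""
    y_r = np.asarray(y_r, dtype=float)
    y_p = np.asarray(y_p, dtype=float)
    n = y_r.size                                   # N, number of observations
    residual = y_r - y_p
    mae = np.mean(np.abs(residual))
    mse = np.mean(residual ** 2)
    rmse = np.sqrt(mse)
    ss_res = np.sum(residual ** 2)
    ss_tot = np.sum((y_r - y_r.mean()) ** 2)       # uses the mean of actual tool wear
    r2 = 1.0 - ss_res / ss_tot
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - n_features - 1)
    return {"R2": r2, "Adj_R2": adj_r2, "MAE": mae, "RMSE": rmse, "MSE": mse}

# Made-up values, not results from this study
scores = regression_metrics([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8], n_features=1)
print(scores)
```

A perfect prediction gives R2 and adjusted R2 of 1 and error metrics of 0, so higher R2 values and lower error values indicate a better model.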

In the current study, 70% of the tool wear features were used for training and the remaining 30% for testing the models. The resulting performance metrics are shown in Fig. 8a–d; they vary significantly across the LSTM models. Training results are shown in Fig. 8a, c, and, to further validate the prediction capability of the trained LSTM models, testing results are plotted separately in Fig. 8b, d. For the AE feature vector, the stacked LSTM gave the lowest training errors of the three LSTM models, with MAE, RMSE, and MSE of 0.006, 0.012, and 0.0001, respectively, and R2 and Adj. R2 values of 0.998 (Fig. 8a). In testing, the stacked LSTM gave MAE, RMSE, and MSE of 0.008, 0.023, and 0.0005, respectively, and R2 and Adj. R2 values of 0.995 (Fig. 8b), which is close to the ideal value given in Table 8. Therefore, the stacked LSTM performs better than the bidirectional LSTM and vanilla LSTM models when AE features are used for predicting tool wear. The performance metrics obtained from the vibration features are shown in Fig. 8c, d. During training, the lowest MAE, RMSE, and MSE were 0.005, 0.008, and 0.00005, again with the stacked LSTM model, with R2 and Adj. R2 values of 0.999 (Fig. 8c). In testing, the lowest MAE, RMSE, and MSE were 0.005, 0.016, and 0.0002 with the stacked LSTM model, with R2 and Adj. R2 values of 0.997 (Fig. 8d).
As the results show, the stacked LSTM model predicts tool wear better than the bidirectional LSTM and vanilla LSTM models in both training and testing and with both AE and vibration features. A probable reason why the stacked LSTM model works so well with robust-scaler-transformed features is that the robust scaler is resilient to outliers in the feature vector extracted from the wavelet scalograms: because it centers each feature on its median and scales it by its interquartile range, outliers have little negative influence when the prediction model is built. Furthermore, the vanilla LSTM uses only one LSTM layer, whereas the stacked LSTM uses several layers connected in sequence, each passing its output sequence to the next, which boosts the model’s effectiveness in predicting the experimentally measured tool wear values. To highlight and justify the utility of the proposed methodology, Table 9 compares it with studies in the literature that used the same dataset.
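
The structural difference between a vanilla and a stacked LSTM can be illustrated with a small forward-pass sketch in which each layer consumes the hidden-state sequence produced by the layer below. The layer sizes, the random untrained weights, the standard cell-state update, and the names `init_layer`, `run_layer`, and `stacked_lstm` are assumptions for illustration; this is not the architecture or the training procedure used in this study.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def init_layer(n_in, n_hidden, rng):
    """Random parameters for one LSTM layer (forget, input, output, candidate blocks)."""
    return {
        "W": rng.normal(0.0, 0.1, size=(4 * n_hidden, n_hidden + n_in)),
        "B": np.zeros(4 * n_hidden),
    }

def run_layer(sequence, layer):
    """Run one LSTM layer over a sequence and return the hidden state at every step."""
    n_hidden = layer["B"].size // 4
    h = np.zeros(n_hidden)
    k = np.zeros(n_hidden)
    hidden_states = []
    for x in sequence:
        z = layer["W"] @ np.concatenate([h, x]) + layer["B"]
        f, i, o = (sigmoid(z[j * n_hidden:(j + 1) * n_hidden]) for j in range(3))
        k = f * k + i * np.tanh(z[3 * n_hidden:])   # standard cell-state update (assumed)
        h = o * np.tanh(k)
        hidden_states.append(h)
    return np.array(hidden_states)

def stacked_lstm(sequence, layers):
    """Feed the output sequence of each layer into the next one."""
    for layer in layers:
        sequence = run_layer(sequence, layer)
    return sequence

rng = np.random.default_rng(0)
sequence = rng.normal(size=(5, 14))   # 5 steps of 14 features (hypothetical input shape)

vanilla = stacked_lstm(sequence, [init_layer(14, 8, rng)])
stacked = stacked_lstm(sequence, [init_layer(14, 8, rng), init_layer(8, 4, rng)])
print(vanilla.shape, stacked.shape)
```

In a complete model, a final dense layer would map the last hidden state to the predicted flank wear, and the weights would be learned from the training split rather than drawn at random.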

Table 9 Comparison table with existing literature to highlight the utility of proposed methodology
Fig. 8 a–d Performance metric values

4 Conclusion

In the present paper, a methodology is presented to predict tool wear based on wavelet scalograms, SinGAN, and deep learning models. Initially, 58 scalograms each were generated from the AE and vibration signals using the Morlet wavelet; afterward, SinGAN was applied to generate additional images, which are extremely useful for training the LSTM models. Fourteen IQP were extracted to form the feature vector, which was split into training and testing sets. Three models, vanilla LSTM, stacked LSTM, and bidirectional LSTM, were explored for efficient prediction of tool wear. To analyze the performance of the models, five performance metrics were used, and the outcomes are summarized as follows:

  • Tool wear was predicted extremely well from both the AE and vibration feature vectors.

  • The lowest MAE, RMSE, and MSE values (testing) observed from AE feature vector are 0.008, 0.023, and 0.0005, respectively, whereas from vibration signals 0.005, 0.016, and 0.0002 values (testing) are observed.

  • Significantly high R2 and Adj. R2 values of 0.997 are observed from the vibration feature vector as compared to 0.995 with the AE feature vector.

  • Stacked LSTM predicted tool wear much better than the bidirectional LSTM and vanilla LSTM models for both the AE and vibration feature vectors.

  • Superior prediction of tool wear is achieved with the proposed methodology, specifically when little experimental data is available to train the model.