Introduction

Milling machines play a crucial role in various industry sectors, enabling the cutting or shaping of raw materials to meet specific requirements. This research focuses on a peripheral milling machine used in the marine industry, where continuous production is the norm and conventional methods for investigating milling blade wear progress are often not feasible. Therefore, the primary objective of this study is to increase the understanding of the wear phenomena in milling machine spindle cutting blades by combining different methods to accurately evaluate the wear progress. To achieve this, the study analyses in-use data and physical characteristics associated with the milling process and combines them with wear data measured from used milling blades. The main physical components of the peripheral milling machine, illustrated in Fig. 1, are an alternating-current electric motor, a gearbox, a shaft, and a spindle with cutting blades. The cutting blades of the test-case machine are arranged on the spindle in four rows, with 18 blades per row around its circumference. The machine operates in a down-feed direction.

Fig. 1 The case peripheral milling machine’s main components

Ongoing research in this field places a growing emphasis on integrating diverse prediction methodologies to overcome their respective limitations. One of the most critical aspects is integrating only data that is relevant to the observed phenomena, because inadequate data may contain information that degrades the expected output. Dimensionality reduction methods, which are based on machine learning, play a crucial role in unveiling hidden patterns and relationships within complex machining processes. The attractiveness of dimensionality reduction methods lies in their non-parametric nature, their computational efficiency, and their straightforward implementation (Sarmadi et al., 2022). Wu et al. (2022) demonstrated that incorporating a physics-informed model into a long short-term memory (LSTM) prediction model can achieve accurate and reliable surface roughness prognostics in real-life milling operations. However, concerns remain about the model having been operated only on limited datasets and about potential shortcomings in the time-varying black-box feature extraction process. In addition to the uncertainties created by the black-box process, feeding the input signals directly to the LSTM may increase the risk of training on poor-quality data: processing raw time series data directly with an LSTM network might lack robustness due to noise in the sensor data (An et al., 2020). To address this issue, an integrated and transparent feature selection is needed to perform local feature extraction from the original signal sequence data.

Other components frequently incorporated into hybrid models include physics- and data-based models. Physics-based models provide essential data for prediction models that cannot be obtained through machine or sensor data, including information on structural integrity, material behavior, and machining dynamics (Elsayed, 2012). In general, physics-based methodologies can assess the health status of a specific system by using a set of equations derived from foundational principles in physics and engineering (Sikorska et al., 2011). Their drawback is that they often become excessively complex and require a deep understanding of the physical dynamics within the system of interest (Wu et al., 2017). Therefore, despite academic progress in optimizing complex systems with multiple conflicting objectives, such as the data-driven sequential learning framework proposed by Khosravi et al. (2024), this complexity may still hinder the widespread implementation of a created model in other applications. In contrast, data-driven models rely on historical data and big data rather than a comprehensive understanding of the system's physics (Heng et al., 2009), and they are constrained by the extent of their training datasets (Arias Chao et al., 2022). This constraint also limits them when evaluating the predictive uncertainties linked to the observed data, model parameters, and structures (Tian et al., 2023).

Hybrid techniques may offer more in-depth information on asset behaviour than physics-based or data-based models used alone, as both model types often lack comprehensive applicability to complex real-world domains (Arias Chao et al., 2022). As such, hybrid approaches are continuously explored across research fields to leverage the strengths of both methods. Sahoo et al. (2019) proposed a hybrid model that merges the cutting force coefficient derived from finite element method (FEM) simulations with a revised undeformed chip thickness (UCT) algorithm to predict cutting forces in micro end milling. The comparison between the forecasted and actual results showed a significant correlation, with the average peak force error ranging between 8.1 and 10.21% in the x- and y-directions, respectively. Despite the encouraging outcomes in predicting cutting forces using hybrid models, their study did not explore the relationship between cutting force and Vb predictions. Yang et al. (2022) proposed a novel hybrid method that merges data-driven strategies with insights from models for real-time wear detection in face milling machines, using power or force measurements. This model was tested with synthetic data created from simulations of a physics-based model, considering a variety of operational conditions, levels of measurement noise, and tool wear degrees. The model achieved an accuracy rate of 92% in data where 1% noise was artificially introduced. Importantly, this hybrid model significantly reduced the number of false alarms compared to using either data-driven or physics-based models on their own, demonstrating its effectiveness in accurately detecting tool wear and anomalies in real time. Zhang et al. (2021) combined the digital representation of data with the physical inputs through a digital twin. They proposed a digital twin-enhanced dynamic scheduling methodology, which is based on physical machine and virtual machine inputs. Their model outputs are used to enhance machine availability prediction, disturbance detection, and performance evaluation. The highlighted limitation of the study is the time-consuming and costly work of building the digital twins that are required for the efficient implementation of the model. To overcome this challenge, Zhang et al. (2021) propose using a partial digital twin, comprising solely the relevant objects and essential model types (e.g. geometry models, physics models, or behaviour models) based on specific requirements.

In much of the research, the developed methods are implemented in controlled circumstances, yet predictions in the real-world domain must account for memory effects caused by environmental noise and other natural disturbances in production. Li et al. (2022) developed a hybrid method to predict the Remaining Useful Life (RUL) of cutting tools by considering their wear state. They used support vector regression to map the relationship between sensor signals and tool wear in a controlled test setup. The findings indicated that this approach achieved better prediction accuracy than exclusively physics-based or data-driven methods. However, although the original support vector regression is known for its broad applicability (Santos et al., 2021), it is not widely acknowledged to accommodate large datasets (Rivas-Perea et al., 2013) or to effectively handle long-term dependencies in data (Bathla, 2020).

To capture memory effects more effectively from past occurrences, a version of the Recurrent Neural Network (RNN) known as LSTM has demonstrated its potential in predicting RUL. Zhou et al. (2019) proposed a method involving the creation of a unified representation of working conditions and the extraction of wear characteristics from the processing signal. These extracted wear features, along with the corresponding working conditions, are combined into an input matrix for predicting tool wear. They utilize an LSTM model to capture the complex spatio-temporal relationships under variable working conditions and establish a model for predicting the remaining useful life of the tool. In another study, Nie et al. adopted an alternative approach by integrating a convolutional neural network (CNN), bidirectional long short-term memory (BiLSTM), and an attention mechanism to predict RUL of milling cutters. The CNN in their approach is responsible for handling sensor-monitored data, extracting crucial local feature information. Simultaneously, the BiLSTM neural network adaptively extracts temporal features, while the attention mechanism processes critical degradation features and extracts information related to the tool wear status. Their study demonstrated promising results compared to traditional approaches in terms of predictive accuracy (Nie et al., 2022).

In the context of milling blade wear prediction, there is a noticeable paucity of attainable, easy-to-implement hybrid models suitable for deployment in ongoing production assets. Furthermore, prior studies have overlooked prediction models that combine historical knowledge of past occurrences with physics-based simulation and a transparent feature extraction process. To fill this gap, this research introduces a novel Fused Data Prediction Model (FDPM) approach that combines advanced simulation model physics, rake wear results, and a transparent feature extraction process with a recurrent neural network for RUL prediction.

The main contributions of this research are:

  1. A novel data simulation model is established to emulate machine behaviour based on cumulative trend behaviour in terms of average cutting force (Fc), torque (Mc), and material removal rate (Q).

  2. A recurrent neural network called VbRNN is established to predict rake surface wear in the milling machine context.

  3. A novel FDPM model is developed, which combines simulated wear trend behaviour, real sensor data, a transparent feature extraction process, VbRNN and the Exponential Triple Smoothing (ETS) to extrapolate offset in the Vb predictions.

Methodology

A programmable logic controller (PLC) collects operational data from the peripheral milling machine, as presented in Fig. 2a. The collected dynamic inputs consist of the following variables: table feed speed (Vf), cutting speed (Vc), radial depth of cut (ae), and axial depth of cut (ap). The online data collection process is described in Mäkiaho et al. (2023). The static inputs required for constructing the physics-based simulation model are summarized in Table 3. The machine static parameters (Fig. 2b) and dynamic variables (Fig. 2d) are used as inputs to the physics-based simulation model, which creates simulated cumulative trend behaviour in terms of Fc, Mc, and Q. The blade wear laboratory measurements are performed for four milling cycles (T1C4, T1C5, T2C4, and T2C5), presented in Fig. 2c. These results are connected to the simulation model, which then creates a time-integrated wear progress trend signal imitating temporal rake wear in the spindle blades. Due to the ongoing machine production, these measurements were taken only once, at the end of the pre-determined milling meter targets. The results of the simulation model are evaluated by comparing them to the measured wear levels of individual cutting blades on the rake surface. In the experiments, the best-performing simulation model variable was found to be the cumulative rake wear associated with the average cutting force calculations (VbFc). This dynamic trend is then combined with real machine data to create a hybrid dataset for the feature reduction phase.

Fig. 2 Overall description of the data inputs and the FDPM model

The hybrid dataset is formed in Fig. 2e, where the dynamic variable data obtained from the PLC and the selected simulation model output VbFc are merged. This merged dataset is used as input for the Pearson Correlation Coefficient (PCC) algorithm to ensure that only meaningful features are selected for the neural network training. The PCC is a statistical measure of the linear correlation between two variables (Zhang et al., 2016), providing insight into the strength and direction of their relationship. It has been used to select optimal degradation features and linear correlations for wear models (Cheng et al., 2019; Jiang et al., 2021). Here, the PCC is used to identify features that have a positive correlation with the VbFc, giving the RNN more informative and robust features to learn from and improving its accuracy.
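For reference, the standard sample form of the PCC between a candidate feature \(x\) and a target signal \(y\) (here, the VbFc) over \(n\) paired observations can be written as below. The symbols \(x_i\), \(y_i\), \(\bar{x}\), \(\bar{y}\), and \(n\) are notation introduced for this illustration and do not appear in the source datasets.

$$ r_{xy} = \frac{\sum_{i=1}^{n} \left( x_{i} - \bar{x} \right)\left( y_{i} - \bar{y} \right)}{\sqrt{\sum_{i=1}^{n} \left( x_{i} - \bar{x} \right)^{2}} \, \sqrt{\sum_{i=1}^{n} \left( y_{i} - \bar{y} \right)^{2}}} $$

The value of \(r_{xy}\) lies between −1 and 1; a positive value indicates that the feature tends to increase together with the target.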

The features selected by the PCC are used as inputs (X1,…, X6) to train the LSTM neural network, named VbRNN because it aims to predict accumulated rake wear. A generalized principle of a neural network is illustrated in Fig. 2f. The output layer of the neural network supplies a probability value of a selected parameter to forecast the RUL of the system (Li et al., 2018; Zhang et al., 2016), which is detailed in Chapter 5.2. The Exponential Triple Smoothing (ETS) method is used in Fig. 2g to visualize the compensation needed due to inaccuracies in the VbRNN predictions. The prediction model developed in this research includes the process steps depicted in Fig. 2c–h, utilizing aggregated simulation model data, blade wear measurements, and real-life observations. The model is herein referred to as the Fused Data Prediction Model (FDPM).

Physics-based simulation model construction

To construct a physics-based simulation model, understanding the forces affecting the milling blade is essential. Cutting forces are generated when a cutting tool is in contact with the milled material, due to the interaction between the tool's cutting edge and the workpiece, as illustrated in Fig. 3. The cutting forces can be divided into three components: feed force, radial force, and tangential force (Wayal et al., 2015). The feed force (dFt) is the force that pushes the tool into the workpiece (Z-direction), while the radial force (dFr) is the force that acts perpendicular to the cutting direction (Y-direction) (Moufki et al., 2015). The tangential force (dFa) is the force that acts in the direction of the cutting edge (X-direction) (Moufki et al., 2015). These forces can vary depending on the machined material, Vc, the depth-of-cut dimensions (ae, ap), and Vf, as annotated in Fig. 3.

Fig. 3 Blade contact parameters and affected forces

The forces affecting the blade will wear the surfaces in contact with the milled material. The maximum surface wear (Vbmax) describes the maximum width of the surface wear land (Uhlmann et al., 2014) on each side of the contact flanks. These flanks are annotated as a rake surface (f1) and flank surface (f2) (Mia et al., 2017; Xie et al., 2012). The following subchapters present the physics-based equations, cutting blade wear measurements, and simulation model creation and validation.

Physics-based equations

The physics-based equations in the simulation model use the input parameters Vf, ae, ap, and the spindle rotational speed (\(n\)) derived from the Vc. These inputs are chosen so that the model output varies as realistically as possible, imitating the physical phenomena that occur while the milling machine is in operation. The input parameters were collected from the peripheral milling machine data provided by the embedded PLC. The physical phenomena constructed in the simulation model are tested and compared using the Fc, Mc, and Q calculations. The Fc formula is presented in Aaltonen et al. (1997):

$$ F_{c} = h_{m} \times k_{c} \times b \times \left( {\frac{360}{{z \times \alpha }}} \right) $$
(1)

where the average chip thickness (\(h_{m}\)), specific cutting force (\(k_{c}\)), and length of cut (\(b\)) are multiplied together. This product is further multiplied by 360 (degrees) divided by the product of the total number of cutting elements in the spindle (\(z\)) and the calculated contact angle (\(\alpha\)). The Mc is constructed according to Sandvik (2017):

$$ M_{c} = \frac{{P_{c} \times 30 \times 10^{3} }}{\pi \times n} $$
(2)

where the net power (\(P_{c}\)) and \(n\) are dynamic parameters. Lastly, the Q is calculated directly from the simulation model inputs, as presented in Ersvik and Khalid (2015) and Nee (2015):

$$ Q = \frac{a_{p} \times a_{e} \times V_{f}}{1000} $$
(3)

The detailed treatment of the dynamic aspects in Eqs. (1) and (2) is described in Mäkiaho et al. (2022), where the simulation model architecture was preliminarily prepared and used for vibration excitation and torque imitation to obtain additional information for pay-per-x (PPX) business model (Schroderus et al., 2022; Uuskoski et al., 2020) lifecycle calculations.
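Equations (1) to (3) can be sketched as plain functions, as below. This is a minimal illustration and not the authors' MATLAB-Simulink implementation. The function names, the example input values, and the units noted in the comments (kW and rpm for Eq. 2, mm and mm/min for Eq. 3) are assumptions made for this example; only z = 72 follows from the four rows of 18 blades described in the Introduction.

```python
import math


def average_cutting_force(h_m, k_c, b, z, alpha_deg):
    """Eq. (1): Fc = hm * kc * b * (360 / (z * alpha)), alpha in degrees."""
    return h_m * k_c * b * (360.0 / (z * alpha_deg))


def cutting_torque(p_c, n):
    """Eq. (2): Mc = Pc * 30 * 10^3 / (pi * n).

    Assumed units for this sketch: Pc in kW and n in rpm.
    """
    return p_c * 30.0e3 / (math.pi * n)


def material_removal_rate(a_p, a_e, v_f):
    """Eq. (3): Q = ap * ae * Vf / 1000.

    Assumed units for this sketch: ap and ae in mm, Vf in mm/min.
    """
    return a_p * a_e * v_f / 1000.0


# Hypothetical operating point; none of these values come from the case machine
# except z, which is 4 rows x 18 blades.
fc = average_cutting_force(h_m=0.1, k_c=2000.0, b=10.0, z=72, alpha_deg=10.0)
mc = cutting_torque(p_c=10.0, n=1000.0)
q = material_removal_rate(a_p=2.0, a_e=50.0, v_f=1000.0)
```

In the simulation model these quantities would be evaluated for every recorded data point, since Vf, ae, ap, and n vary during operation.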

Cutting blade wear measurements

The blade surface wear measurements were performed for four operational cycles at the customer facility. The operational cycles generate the research datasets that are used in different phases of the FDPM. Table 1 presents the operational cycles along with their respective cumulative meters milled and the number of recorded data points for each cycle. The number of data points recorded during the milling process depends on two variables: the duration of the milling operation and the specific profile type being manufactured. This relationship exists because material feed speeds, which are controlled automatically, vary depending on the type of profile being processed. The operational data is collected with an online data procurement system connected to the machine’s PLC system. While the machine was in operation, the data points were recorded at a frequency of 5 Hz, resulting in 200 ms between individual data points.
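The relationship between the sampling frequency, the sampling interval, and the number of recorded data points can be checked with a short calculation. The sketch below uses the 5 Hz rate stated above; the function names and the 60-second example duration are assumptions for illustration only.

```python
SAMPLE_RATE_HZ = 5  # recording frequency stated in the text


def sample_interval_ms(rate_hz):
    """Time between two consecutive data points in milliseconds."""
    return 1000.0 / rate_hz


def expected_data_points(duration_s, rate_hz=SAMPLE_RATE_HZ):
    """Number of data points recorded during duration_s seconds of operation."""
    return int(duration_s * rate_hz)


interval = sample_interval_ms(SAMPLE_RATE_HZ)  # 200.0 ms
points_per_minute = expected_data_points(60)   # hypothetical 60 s of milling
```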

Table 1 Recorded operational cycles with cumulative milled meter values and recorded data points

The designations T1 and T2 in the operational cycles identify two separate spindles: T1 is spindle number 1, and T2 is spindle number 2. Two spindles are used to prevent unnecessary production stoppages caused by the time required for blade changes. During the blade wear measurement tests, the spindle in the peripheral milling machine is swapped once the predetermined stage of milled meters is reached. The predetermined stages were 2000 m and 2500 m, which were met in close approximation. In the designations C4 and C5, the letter ‘C’ refers to the blade rows on a spindle, which are positioned at a 90-degree cutting angle perpendicular to the material being milled. During the normal milling process, only one blade row is in contact with the material. The numbers ‘4’ and ‘5’ indicate the specific blade edge used in a given operational cycle.

Laboratory measurement setup and wear limit criteria

Optical microscopic measurements were made with a 1200–2400% zoom range from the original frame of 2560 × 2048 pixels. The Vbmax measurements were performed with calibrated Carl Zeiss Jena optics connected to in-house digital photo measurement software based on NI LabVIEW engineering software. The microscopic measurement setup, excluding the computer interface with the measurement software, is illustrated in Fig. 4. Due to the confidential nature of the manufacturing equipment, a more detailed illustration of the milling machine setup is not provided.

Fig. 4 Carl Zeiss Jena optical microscope setup with the digital connection to the NI LabVIEW engineering software-based digital photo measurement tool

The sides of the cutting blades were marked with carved numbering to reduce the risk of confusion after the target milled meters were reached and the blade edge was turned or changed in the spindle, as illustrated in Fig. 5. In addition to the blade numbering, each edge was marked for the same reason, as shown with numbers 1–4 in Fig. 5a. Similar identification for edges 5–8 exists on the reverse side of the f2. The f1 surface side of the blade is presented in Fig. 5b.

Fig. 5 Numbered individual cutting blade #46: flank surface edges 1–4 in (a), rake surface edges 4 (upper left) and 8 (lower right) in (b)

In this study, the blade-specific wear measurements are presented by the maximum values of flank wear (Vbmax) for individual flank surfaces, where the maximum peak land width is measured (Siddhpura & Paurobally, 2013; The American Society of Mechanical Engineers, 1985), as annotated in Fig. 6.

Fig. 6 Principle of the wear measurement

The Vbmax tool life criterion can be used as a wear criterion when the wear pattern in the measurement area is relatively uneven (Siddhpura & Paurobally, 2013), which is the case in the results of this research. The Vbmax is also referred to as the critical flank wear at which the tool reaches its end of life and requires replacement (Traini et al., 2019). In previous studies, different Vbmax values were used: 0.24 mm in Panda et al. (2008), 0.3 mm in Lin et al. (2020), and 0.7 mm in Caldeirani Filho and Diniz (2002). In this study, a blade change threshold of 0.4 mm was established for the prediction phase. The determination of this threshold was based on expert opinions regarding blade condition, which involved visually inspecting and comparing the blades after their operational cycle. In addition, the effect of blade condition on milled material quality in the specific machine construction was evaluated in collaboration with end users and machine manufacturer experts. This evaluation centred on the critical threshold of 0.4 mm, measured from the rake surface. Therefore, the normal operational rake wear and milling blade change limit used in this study can be summarized by the following criterion:

Normal operation: 0 mm ≤ Vbmax ≤ 0.4 mm.

The physical dimensions of an individual cutting blade in the X-, Y-, and Z-directions, according to Fig. 7a, are 19.1 mm, 8.0 mm, and 19.1 mm, respectively. Consequently, the selected Vbmax value corresponds to approximately 5% of the flank’s physical dimensions in the f1 direction and 2.1% in the f2 direction.
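The wear limit and the quoted percentages can be reproduced with a short calculation. The sketch below reads the normal-operation criterion as the closed interval from 0 mm to 0.4 mm and relates the 0.4 mm limit to the 8.0 mm and 19.1 mm blade dimensions given above; the function names and the sample wear values are assumptions for illustration.

```python
WEAR_LIMIT_MM = 0.4  # blade change threshold used in this study


def is_normal_operation(vb_max_mm, limit_mm=WEAR_LIMIT_MM):
    """True while the measured rake wear is within the normal operating range."""
    return 0.0 <= vb_max_mm <= limit_mm


def wear_share_percent(vb_max_mm, dimension_mm):
    """Wear land width as a percentage of a blade dimension."""
    return 100.0 * vb_max_mm / dimension_mm


share_8mm = wear_share_percent(WEAR_LIMIT_MM, 8.0)    # about 5 %
share_19mm = wear_share_percent(WEAR_LIMIT_MM, 19.1)  # about 2.1 %

# Hypothetical measurements: one blade within the limit, one past it.
blade_ok = is_normal_operation(0.25)
blade_worn = is_normal_operation(0.41)
```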

Fig. 7 Microscopic view of milling blade Vbmax indicating observed blade #44 flank #4 in (a), close view in (b), and Laplacian transformed view in (c)

The f1 surface side of an individual cutting blade is presented in Fig. 7a, where the observed cutting edge is marked with a red rectangle. Each measured cutting edge surface was analyzed with the software-based measuring tool to accurately observe the maximum peak land width, as in Fig. 7b. A Laplacian of Gaussian filter was occasionally employed in the Vbmax pattern recognition, as the filter helps to estimate the scales, shapes, and orientations of an object (Siddhpura & Paurobally, 2013). An example of using the Laplacian of Gaussian filter to determine the maximum f1 surface wear value is presented in Fig. 7c.

Analysis of the wear measurements

The wear measurement results were analyzed to quantify the amount of wear experienced by the milling blades during their designated milling cycle. As previously discussed, the cutting forces act in three dimensions, with the primary effect being wear on the two blade surfaces that come into direct contact with the milled material. Therefore, both the f1 surface and the f2 surface of the milling blades were measured. An inconsistency in relation to the f1 surface on test run T1C5 can be observed in Fig. 8, where the majority of the measured data points are scattered towards the f2 surface side of the blade, which can be interpreted as anomalous behaviour.

Fig. 8 Wear pattern comparison between rake surface (f1) and flank surface (f2) wear named as Vbf1 and Vbf2, respectively

Consequently, the T1C5 flank surface wear results are left out of the wear prediction part of the research, because their wear patterns are inconsistent with any of the examined time or cycle constraints. Despite this, the rake surface measurements on the T1C5 remain consistent with the other test runs in relation to the number of milled meters, which supports the use of the f1 surface measurements in the simulation model creation. The scatter plot in Fig. 8 also presents the natural variation in rake surface dimension between the individual blades. Some degree of variation in the f1 surface dimension is anticipated in the measured results. After analyzing the measurements, the recorded variation is deemed to fall within reasonable limits.

Due to the natural variation in the results, the measured mean value (Vbmean) of the blade wear is calculated for simulation model validation purposes. Minimum and maximum values are also recorded to obtain the range within which the Vbmax values lie around the circumference of the blades in a specific row. The calculated mean values used in the simulation model construction (T2C4) and validation phase (T1C4 and T2C5) are found in Table 2 below. A complete list of the blade-specific f1 surface and f2 surface Vbmax values of each test run is recorded in Appendix 1, yet only the f1 surface results are used in the simulation model construction.
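The per-row summary described above (mean, minimum, and maximum of the blade-specific Vbmax values) is a simple aggregation, sketched below. The function name and the sample values are hypothetical; the actual blade-specific measurements are those listed in Appendix 1.

```python
from statistics import mean


def summarize_wear(vb_values_mm):
    """Return the mean, minimum, and maximum of blade-specific Vbmax values (mm)."""
    return {
        "mean": mean(vb_values_mm),
        "min": min(vb_values_mm),
        "max": max(vb_values_mm),
    }


# Hypothetical f1 surface measurements for a few blades of one row (mm).
sample_row = [0.20, 0.30, 0.40]
summary = summarize_wear(sample_row)
```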

Table 2 The measured mean, minimum, and maximum f1 wear values in millimetres

Physics-based simulation model creation and validation

The objective of the simulation model is to reconstruct the continuous-time wear pattern so that it approximates the known measured Vbmean value as closely as possible. All the simulations are performed in MATLAB Simulink, a widely used design and programming platform for dynamic and embedded systems. Simulation inputs are divided into static parameters and dynamic input variables. The static input parameters used in the simulation model construction are listed in Table 3, containing physical dimensions of the milling machine components, milling lead angle, nominal component values, and feed material-related static properties. All the static data presented in Table 3, including the material-specific hardness factor Kc1.1, was received from the case machine’s original equipment manufacturer. The chip thickness compensation factor mc is obtained from Sandvik (2017).

Table 3 Simulation model static input parameters

Equations (1), (2), and (3) incorporate both the static parameters and the dynamic input variables of the simulation model in Fig. 9. Base calculations for the simulated formulas are presented in Mäkiaho et al. (2022). Because the equations produce outputs in different units, a unique external factor (EF) is needed to bring the iterated simulation results as close as possible to the Vbmean target value measured from the T2C4 dataset. The purpose of the EF is to correct any offset or discrepancy in the signal behaviour that may arise from differences in the units of the equation outputs. By applying the EF, the simulation results can be calibrated to better match the measured target value, improving the accuracy of the model's predictions. In addition, the discrete values provided by the equations require continuous-time integration (1/t) to obtain the cumulative trend behaviour of each parameter's progress in the time domain. Consequently, the simulation model provides cumulative trends of the individual equation outputs to be validated, named VbFc, VbMc, and VbQ, respectively.
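The interplay between time integration and the EF can be illustrated with a small sketch. It assumes, purely for illustration, that the EF acts as a single multiplicative scale on a rectangle-rule running integral and that it is chosen so that the final cumulative value equals the measured Vbmean. The source does not specify how the EF is applied inside the Simulink model, so the function names, the scaling rule, and all numeric values below are assumptions.

```python
def cumulative_trend(signal, dt_s, ef=1.0):
    """Running rectangle-rule integral of a discrete signal, scaled by EF."""
    total = 0.0
    trend = []
    for value in signal:
        total += value * dt_s
        trend.append(ef * total)
    return trend


def calibrate_ef(signal, dt_s, target_vb_mean):
    """Choose EF so that the last cumulative value equals the measured mean wear."""
    raw_final = cumulative_trend(signal, dt_s)[-1]
    return target_vb_mean / raw_final


# Hypothetical Fc samples (N) at 200 ms spacing and a hypothetical Vbmean (mm).
fc_samples = [100.0, 120.0, 80.0, 100.0]
dt = 0.2
ef = calibrate_ef(fc_samples, dt, target_vb_mean=0.3)
vb_fc_trend = cumulative_trend(fc_samples, dt, ef)
```

With an EF calibrated on one cycle, the same factor can then be applied to the signal of another cycle, which is how a construction dataset and separate validation datasets would be kept apart in such a scheme.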

Fig. 9 Structure of the simulation model

The VbFc, VbQ, and VbMc values presented in Table 4 are the final results obtained from the physics-based simulation model. The results are validated by comparing them to the measured dataset-specific mean values presented in Table 2.

Table 4 Simulated Vb results in comparison to measured mean values in Table 2

The T1C4 dataset was primarily used in the simulation model validation because its milled meters value (2009 m) is close to that of the T2C4 dataset (2010 m) used for the model construction. All the simulated results are cumulative values that represent wear accumulation in the blades, which are subjected to stress when in physical contact with the milled material. For T1C4, the relative error % of the simulated results compared to the measured mean values is negative for VbMc and VbQ and positive for VbFc. The scale of the variations in the T1C4 results is reasonable, but the variation prompted further testing of the constructed simulation model with another validation test run, T2C5. Although the model was calibrated using data for a shorter milling cycle (T2C4, 2010 m), its capability to extrapolate accurate rake wear simulation results to a longer data cycle (T2C5, 2518 m) exceeded expectations. The relative error % values obtained when compared to the measured Vbmean at the end of that data cycle stayed within a relatively small range, indicating much better accuracy than the T1C4 results. The average of the dataset relative errors also indicates that the VbFc method outperforms the VbMc and VbQ in accuracy.

In conclusion, the VbFc produces the values closest to the measured Vbmean target value. The simulated VbFc average error value of 2.38% is therefore bolded in Table 4 to highlight the most applicable simulated signal for the prediction algorithm. The accumulated VbFc trend behaviour is visualized in Fig. 10 together with the measured minimum and maximum values for each test run. All the simulated results are located inside the MinMax boundaries of each dataset. The red asterisk ‘*’ symbols in the figure mark the measured Vbmean value of each test run at the end of the milling cycle.
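The relative error comparison used in the validation can be sketched as follows. The sign convention (positive when the simulated value exceeds the measured mean) and the averaging of absolute errors across datasets are assumptions of this example, since the source does not state how the average error was formed. The function names and numeric pairs are hypothetical and are not the values in Table 4.

```python
def relative_error_percent(simulated, measured):
    """Signed relative error of a simulated value against the measured mean."""
    return 100.0 * (simulated - measured) / measured


def mean_absolute_relative_error(pairs):
    """Average of absolute relative errors over (simulated, measured) pairs."""
    errors = [abs(relative_error_percent(s, m)) for s, m in pairs]
    return sum(errors) / len(errors)


# Hypothetical (simulated, measured Vbmean) pairs in mm for two validation runs.
validation_pairs = [(0.33, 0.30), (0.285, 0.30)]
average_error = mean_absolute_relative_error(validation_pairs)
```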

Fig. 10 Simulated VbFc trends for the test runs T2C4 (blue), T1C4 (orange), and T2C5 (grey), with the measured MinMax boundaries for each test run and the Vbmean rake surface value marked with ‘*’

To conclude, the results of the two validation rounds presented give good confidence in using the VbFc in the data merging phase together with the operational data obtained from the PLC.

Feature reduction process with Pearson correlation

This stage of the FDPM model integrates the operational data and physics-based simulation model data into one dataset. The data collection was designed to gather data from the PLC as well as external vibration measurements during the blade wear test duration (54 calendar days). The collected vibration data contained raw vibration signal data from three directions (axial, horizontal, and radial) as well as calculated Root Mean Square (RMS), zero-to-peak (0-P), and peak-to-peak (P-P) amplitude values. However, a malfunction in the vibration collection setup at the start of the wear measurements made it impossible to use the vibration excitation data in conjunction with the other process-related data collected. As a result, it was decided to use only the PLC data to obtain usable parameters for the RNN, thereby creating a consistent stream of data for the algorithm's training. Although some methods for selecting features on inconsistent data are recognized in the literature, such as the feature selection approach on inconsistent data (FRIEND) in Qi et al. (2020) or mean acceptable error (MACE) in Kim et al. (2017), such additional methods are excluded from this research because other operational data adequate for the purpose exists. The collected data, excluding the vibration data, consisted of operational data from eleven (11) variables, listed in Table 5. Variables 1–10 are obtained from the PLC and variable 11 from the simulation model.

Table 5 Input variables to Pearson correlation coefficient algorithm

Two well-known correlation methods, Pearson’s and Spearman’s (Myers & Sirois, 2006), were initially tested on the T2C4 dataset. Because the two methods gave very similar correlations between VbFc and the other variables, Pearson was selected over Spearman for its well-established ability to detect linear relationships between variables measured on continuous scales (Obilor & Amadi, 2018). Consequently, the Pearson correlation coefficient (PCC) was deemed more appropriate for the analysis.

The selection of input variables for the VbRNN is based on two criteria: an overall positive average score and a positive correlation to the simulated VbFc in at least two out of three data cycles. Applying these criteria resulted in the selection of six variables, denoted as X1 to X6, as presented in Table 6. After the input variables were passed to the PCC algorithm, the highest average correlations to VbFc were obtained for ‘Milled meters’, ‘Radial depth of cut’, ‘Spindle motor torque’, ‘Profile length’, and ‘Table feed’, with correlations of 0.834, 0.176, 0.166, 0.076, and 0.035, respectively. All the selected variables were found to be statistically significant with both tested methods at p < 0.01, which is below the 0.05 level at which correlations are deemed significant (Obilor & Amadi, 2018).

Table 6 PCC scores for the datasets, the highest denoted as X1 to X6 inputs to the VbRNN algorithm based on their average correlation to the VbFc variable
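The two selection criteria can be illustrated with a minimal sketch. It assumes that a PCC score against VbFc is available per variable and per data cycle. The function names `pearson` and `select_inputs`, the variable names, and the example scores are hypothetical and are not the values of Table 6.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def select_inputs(scores_per_cycle):
    """Keep variables with a positive average score and a positive
    correlation in at least two of the data cycles."""
    selected = {}
    for name, scores in scores_per_cycle.items():
        average = sum(scores) / len(scores)
        positive_cycles = sum(1 for s in scores if s > 0)
        if average > 0 and positive_cycles >= 2:
            selected[name] = average
    # Rank the retained variables by average correlation, highest first
    return sorted(selected, key=selected.get, reverse=True)

# Hypothetical per-cycle scores for three data cycles (illustration only)
example_scores = {
    "var_a": [0.80, 0.85, 0.82],    # positive in all cycles: kept
    "var_b": [0.20, -0.05, 0.10],   # positive average, two positive cycles: kept
    "var_c": [-0.30, -0.10, 0.05],  # negative average: dropped
    "var_d": [0.50, -0.40, -0.20],  # only one positive cycle: dropped
}
selected = select_inputs(example_scores)
```

In this sketch a variable with a single strongly positive cycle is still rejected, which reflects the intent of requiring agreement across cycles rather than relying on one dataset.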

Remaining useful lifetime prediction with a recurrent neural network

RNNs are commonly used in real-life applications due to their proven ability to detect dependencies in the data and to solve several types of time series forecasting problems in high-dimensional data structures (Sagheer & Kotb, 2019). A recurrent neural network learns not only from the current time series input but can also accommodate relevant information from past states of a neuron in the network (De Beaulieu et al., 2022). A widespread RNN variant is the LSTM, which was introduced by Hochreiter and Schmidhuber in 1997 (De Beaulieu et al., 2022; Samek et al., 2019).

The cell architecture of the LSTM algorithm provides the specialized capability to learn and remember long-term dependencies with the help of dedicated backward flow. Therefore, the LSTM algorithm is selected as part of the network, trained in a supervised manner, to perform the forecasting for the FDPM model dataset, which contains multidimensional time series data. The predictions are performed with training shares of 50%, 70%, and 90% to obtain information on the model’s prediction accuracy towards the end of the spindle blade set life cycle. The operational data from the test run ‘T2C5’ is selected for testing the RNN algorithm due to its length in data points and because it was not used in the construction of the physics-based simulation model. The computational evaluation was performed using an Intel(R) Core(TM) i5-8365U CPU with 1.60 GHz, 1896 MHz, and 4 Core(s), along with 16 GB of physical memory. The Python algorithm was executed using the Jupyter Notebook computing platform.

Architecture and training

The deep neural network developed consists of an input layer, an LSTM layer, a dense layer, and an output layer, as presented in Fig. 11. For simplicity, the complete deep neural network architecture developed in this research is hereafter referred to as VbRNN, reflecting its function of predicting blade surface wear with the help of a recurrent neural network.

Fig. 11
figure 11

The VbRNN architecture used for RUL prediction contains an input, an LSTM layer, a dense layer, and an output layer

The input layer consists of six variables based on the PCC results: ‘VbFc’, ‘Milled meters’, ‘Radial depth of cut’, ‘Spindle motor torque’, ‘Profile length’, and ‘Table feed’, the first being the forecasted variable. The input neurons are depicted as X1…6, respectively. To enhance network performance, sample values are normalized to the range [0, 1] in the model training phase. For predicting the next time step, a sliding window look-back of 10 is employed, which sets the number of previous time steps used as input features. To prevent overfitting and conserve computational resources, early stopping is implemented to monitor the validation loss with a patience of 3. This means that if no improvement is observed in the validation loss for three consecutive epochs, the training process stops automatically. The combined use of the sliding window and early stopping helps to maintain the model's performance on the validation set and avoids unnecessary training iterations.
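The preprocessing and stopping logic described above can be sketched in plain Python. This is an illustration under stated assumptions, not the implementation used in the study: the helper names `min_max_scale`, `make_windows`, and `stopping_epoch` are hypothetical, and the stopping rule is a simplified stand-in for the early stopping callback of the deep learning library.

```python
def min_max_scale(values):
    """Scale a sequence linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def make_windows(rows, look_back=10, target_col=0):
    """Build (inputs, targets) pairs: each input holds `look_back`
    consecutive rows and the target is `target_col` of the next row."""
    inputs, targets = [], []
    for start in range(len(rows) - look_back):
        inputs.append(rows[start:start + look_back])
        targets.append(rows[start + look_back][target_col])
    return inputs, targets

def stopping_epoch(val_losses, patience=3):
    """Return the 1-based epoch at which training would stop, i.e. after
    `patience` consecutive epochs without a new best validation loss."""
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return len(val_losses)

# Hypothetical data: 15 time steps with two features per step
rows = [[float(t), float(2 * t)] for t in range(15)]
windows, targets = make_windows(rows, look_back=10, target_col=0)
```

With a look-back of 10, a series of 15 time steps yields five training samples, and the first target is the forecasted variable at the eleventh time step.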

After the input layer, the network has a single LSTM layer with 64 hidden units and the hyperbolic tangent (Tanh) as the activation function, chosen for its robustness and its ability to introduce non-linearity into neural networks (Sartin & da Silva, 2013). The Tanh function compresses its input to the range between -1 and +1, X \(\in \left(-1,1\right)\) (Herawan et al., 2016; Zeng & Long, 2022). The rectified linear unit (ReLU) and logistic sigmoid (Sigmoid) activation functions were also tested on the dataset. ReLU is commonly known for enabling complex nonlinear relationships to be found in the data; it retains positive values and sets negative values to zero (Nanni et al., 2022). Sigmoid is known for keeping the output of the network layer between 0 and 1 (Xu et al., 2021), X \(\in \left(0,1\right)\) (Zeng & Long, 2022). However, a comparison of the Keras activation functions with 50% of the data used for training indicates that Tanh gives the best prediction capability for this dataset. Tanh comes closest to the T2C5 test run Vb value, where the probability of converging to the actual Vb value is stated as \(0.402\left(\left.0.378\right|{\text{tanh}}\right)\).
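For reference, the output ranges of the three activation functions compared above can be checked with a short sketch using the standard mathematical definitions. The function names are chosen here for illustration; the study itself relies on the Keras implementations.

```python
import math

def tanh(x):
    """Hyperbolic tangent: output lies in (-1, 1)."""
    return math.tanh(x)

def relu(x):
    """Rectified linear unit: keeps positive values, sets negatives to zero."""
    return max(0.0, x)

def sigmoid(x):
    """Logistic sigmoid: output lies in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

# Evaluate each function on a few sample inputs
samples = [-3.0, -1.0, 0.0, 1.0, 3.0]
tanh_out = [tanh(x) for x in samples]
relu_out = [relu(x) for x in samples]
sigmoid_out = [sigmoid(x) for x in samples]
```

Only Tanh produces outputs that are symmetric around zero, which is one commonly cited difference between it and the other two functions.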

The influence of the activation function on the prediction accuracy, the related Mean Absolute Error (MAE), and the associated computational effort in terms of training time are given in Table 7. The training times for the Tanh and Sigmoid activation functions are similar, while the ReLU activation function requires significantly more training time. The MAE indicates the model's learning capability by estimating performance degradation through a comparison of the estimated trend with the actual performance (Pecht & Kang, 2018), with all individual differences weighted equally (Sagheer & Kotb, 2019). The MAE is calculated as in Chicco et al. (2021), Peng et al. (2010), Tong et al. (2022):

$$MAE\left(x,\widehat{x}\right)=\frac{1}{N}\sum_{i=1}^{N}\left|{x}_{i}-{\widehat{x}}_{i}\right|$$
(4)

Table 7 Activation function Tanh, ReLU, and Sigmoid impact on the VbRNN prediction capability and associated training times

Because each node is connected to the following layer, the architecture is called a fully connected network, with a dependency between each active layer. Another term for a fully connected layer is a dense layer (Zeng & Long, 2022). The dense layer used here consists of 32 nodes to reduce the feature dimensionality from the LSTM layer. The dense layer is further connected to a single predicted parameter \({\widehat{X}}_{1}\)(VbFc) in the output layer.

Dropout with a rate of 0.2 is applied to the LSTM and dense layers to improve model generalization in training (Cheng et al., 2017). Bayesian optimization with a log-uniform distribution was tested with the limits low = 0.00001 and high = 0.001, so that values are sampled uniformly in the logarithmic space between low and high. Adam optimization with a learning rate of 1.585E-5 was selected for the model based on the result of the Bayesian optimization. The batch size was iterated manually, with the best scores obtained at a batch size of 30. The MAE loss function is then used to evaluate the model performance with the different tested training shares. The MAE loss behaviour is illustrated in Fig. 12, where both the train and test trends are presented for a 50% training share, using the Tanh activation function and the aforementioned optimization characteristics.

Fig. 12
figure 12

The mean absolute error with the data set training size of 50%
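The log-uniform sampling of the learning rate mentioned above can be illustrated with a minimal sketch. It shows only how values are drawn uniformly in logarithmic space between the stated limits; it is not a Bayesian optimizer, and the helper name `sample_log_uniform`, the seed, and the sample count are assumptions made for illustration.

```python
import math
import random

def sample_log_uniform(low, high, rng):
    """Draw a value whose logarithm is uniform on [log(low), log(high)]."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))

LOW, HIGH = 1e-5, 1e-3       # limits stated in the text
rng = random.Random(0)       # fixed seed, chosen only for repeatability

candidates = [sample_log_uniform(LOW, HIGH, rng) for _ in range(1000)]

# In logarithmic space the midpoint of the interval is 1e-4, so about
# half of the samples are expected to fall below it.
share_below_midpoint = sum(1 for c in candidates if c < 1e-4) / len(candidates)

# The selected learning rate corresponds to an exponent of about -4.8
selected_exponent = math.log10(1.585e-5)
```

Sampling in logarithmic space gives each order of magnitude between the limits the same weight, which is why it is a common choice for learning rate searches.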

The number of learning iterations of a model is controlled by the number of epochs. The results show that the model's learning capability saturates after approximately 15 epochs; however, the network was trained further to achieve good accuracy, learning rate, and loss minimization, as proposed in Poornima and Pushpalatha (2019). Another widely used prediction error metric, the root mean square error (RMSE), was also applied to the training data. The RMSE measures the standard deviation of the errors that the RNN architecture yields in its predictions (Géron, 2017; Lughofer & Sayed-Mouchaweh, 2019) by squaring the errors before averaging, which makes it more sensitive to larger errors (Wang & Lu, 2018). The RMSE is calculated as in Chicco et al. (2021), Peng et al. (2010), Tong et al. (2022):

$$RMSE\left(x,\widehat{x}\right)=\sqrt{\frac{1}{N}\sum_{i=1}^{N}{\left|{x}_{i}-{\widehat{x}}_{i}\right|}^{2}}$$
(5)
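Equations (4) and (5) can be written out directly, as in the following sketch. The function names and the example values are hypothetical and are chosen so that a single larger error shows the different sensitivity of the two metrics.

```python
import math

def mae(actual, predicted):
    """Mean absolute error, Eq. (4)."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root mean square error, Eq. (5)."""
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))

# Hypothetical values: three exact predictions and one error of 0.04
actual = [0.10, 0.20, 0.30, 0.40]
predicted = [0.10, 0.20, 0.30, 0.36]

mae_value = mae(actual, predicted)    # 0.04 / 4, approximately 0.01
rmse_value = rmse(actual, predicted)  # sqrt(0.0016 / 4), approximately 0.02
```

Because the error is concentrated in one sample, the RMSE is about twice the MAE in this example; for errors of equal size across all samples the two metrics would coincide.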

The lowest scores for RMSE and MAE are presented in Table 8. As seen, both the RMSE and MAE scores decrease as the amount of training data increases. The low RMSE and MAE scores support the model performance, as it is generally acknowledged that the lower the RMSE and MAE evaluation indexes are, the better the model performance is considered to be (He et al., 2021; Wang et al., 2017).

Table 8 RMSE and MAE scoring with 50%, 70%, and 90% train set with the T2C5

Remaining Useful Lifetime (RUL) prediction

The training set size and data quality directly affect the prediction accuracy of the model, because the model retrieves dependencies from previously learned data. As described earlier, the LSTM layer architecture captures long-term dependencies with the help of dedicated backward flow, which ultimately results in a more accurate prognosis as more data is fed to the model. The numeric results for the prediction accuracy are given in Table 9, which shows the effect of progressive learning on the Vb prediction accuracy for the simulated T2C5 dataset. The prediction accuracy, expressed as \({\widehat{x}}_{1}\)/Vbmean, increases from 93.6% with a 50% training set size to 94.7% with 70% and 97.6% with 90%. Overall, the prediction accuracy results indicate that the VbRNN algorithm can learn the rake wear phenomenon aggregated with the Fc calculations relatively well.

Table 9 VbRNN prognostic capability for the simulated T2C5 dataset, evaluated by absolute error, related % -value, and the model prediction accuracy %
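As a plausibility check of the 50% case, the accuracy ratio can be recomputed from the two Vb values reported later in this section: the predicted Vb of 0.3761 mm at the last data point and the end-of-cycle Vb of 0.4019 mm. That Table 9 is computed from exactly these two values is an assumption made here; the function names are hypothetical.

```python
def absolute_error(predicted, reference):
    """Absolute difference between predicted and reference wear, in mm."""
    return abs(reference - predicted)

def prediction_accuracy(predicted, reference):
    """Accuracy expressed as the ratio of predicted to reference wear."""
    return predicted / reference

VB_PREDICTED_MM = 0.3761  # VbRNN prediction at the last data point (50% share)
VB_REFERENCE_MM = 0.4019  # end-of-cycle Vb for T2C5

error_mm = absolute_error(VB_PREDICTED_MM, VB_REFERENCE_MM)
accuracy_percent = 100.0 * prediction_accuracy(VB_PREDICTED_MM, VB_REFERENCE_MM)
error_percent = 100.0 - accuracy_percent
```

Under this assumption the ratio rounds to the reported 93.6%, with an absolute shortfall of about 0.026 mm.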

The trend behaviour of the simulated Vb progression and the VbRNN prediction is visualized in Fig. 13. The prediction accuracy clearly improves as the training data set grows. The ‘blue’ trend indicates the training share, ‘green’ is the simulated Vb progression, and ‘red’ is the VbRNN model’s prediction of future trend behaviour. The horizontal line ‘Blade change limit’ is set to the selected rake surface blade change interval target of 0.4 mm to indicate the appointed blade change threshold. The VbRNN’s prediction capability is visualized with a training data share of 50%.

Fig. 13
figure 13

VbRNN prediction capability visualized with training data share of 50%

The trend of the remaining useful lifetime is further illustrated in Fig. 14 for a training share of 50%. The figure presents the RUL in minutes, as well as the normalized wear value of Vb for T2C5 (0.4019 mm), expressed as a percentage ranging from 100 to 0%. The normalized RUL for the system degradation is expressed as:

$$RUL=\frac{{T}_{end}-{T}_{i}}{{T}_{end}}$$
(6)

where Tend corresponds to the time at the selected Vb value of 0.4019 mm at RUL [0%/0 min], and Ti is the current time in operation (Feng et al., 2023).

Fig. 14
figure 14

Remaining useful lifetime in minutes and % with VbRNN prediction and ETS forecast trends

Because the prediction with a 50% training dataset reached a Vb of only 0.3761 mm at the last data point (17,748), the trend was extrapolated to the targeted Vb of 0.4019 mm to visualize the model inaccuracy in terms of RUL minutes and percentage values. The ETS is a generally acknowledged method for time-series forecasting. It uses three components with different weights: level, trend, and seasonal. The mathematical basis of the ETS is given in Airlangga et al. (2019) and Chen and Ho (2020), where the ETS forecast (Yt) is described by the following equation as the sum of the level (Ft), trend (Tt), and seasonal (St) forecasts over time (t).

$${Y}_{t}={F}_{t-1}+{T}_{t-1}+{S}_{t}$$
(7)

The ETS method was applied to 2000 historical data points (15,748–17,748) of the predictions made by the VbRNN. Its primary purpose here was to show by how many time steps the VbRNN predictions fell short of the simulated wear Vb = 0.4019 mm. Thus, RUL = 0%/0 min is illustrated with the extrapolated forecast targeted at the final Vb value. The extrapolated trend, based on the ETS calculation performed in the simulation model, is shown as a dashed line in Fig. 14.
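The idea of extrapolating a prediction trend until it reaches a target wear value can be sketched with additive level-plus-trend exponential smoothing (Holt's linear method). This is a simplified stand-in, not the ETS calculation used in the study: the seasonal term St of Eq. (7) is omitted, and the smoothing weights `alpha` and `beta`, the function names, and the example series are assumptions.

```python
def holt_forecast(series, alpha=0.5, beta=0.5, horizon=1):
    """Additive level-plus-trend exponential smoothing (no seasonal term).
    Requires at least two observations. Returns the forecasts for
    `horizon` steps past the end of `series`."""
    level = series[0]
    trend = series[1] - series[0]
    for value in series[1:]:
        previous_level = level
        level = alpha * value + (1 - alpha) * (level + trend)
        trend = beta * (level - previous_level) + (1 - beta) * trend
    return [level + step * trend for step in range(1, horizon + 1)]

def steps_to_reach(series, target, max_steps=10_000, **smoothing):
    """Number of extrapolated steps until the forecast first reaches
    `target`, or None if it is not reached within `max_steps`."""
    forecast = holt_forecast(series, horizon=max_steps, **smoothing)
    for step, value in enumerate(forecast, start=1):
        if value >= target:
            return step
    return None

# Hypothetical wear predictions in mm, rising by 0.01 per time step
predicted_vb = [0.30 + 0.01 * i for i in range(8)]   # last value is 0.37
next_values = holt_forecast(predicted_vb, horizon=3)  # about 0.38, 0.39, 0.40
shortfall_steps = steps_to_reach(predicted_vb, target=0.395)
```

The number of extrapolated steps can be converted to minutes once the sampling interval of the data is known, which is how a prediction shortfall can be expressed in terms of RUL.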

The prediction results indicate that the model trained with a 50% training share deviates by approximately 4 min of machine operation, corresponding to about 6% of the normalized RUL remaining, in comparison to the simulated values of the original T2C5 dataset.

Additionally, Fig. 14 contains proposed threshold limits, which are illustrated in traffic light colours. Such limits can be used to indicate forthcoming milling blade change operations with predetermined warning thresholds. The first notification is set at 27%/15 min and the second at 17%/10 min of the remaining lifetime. The last threshold is set at 8%/5 min before the predicted end of life, giving the operator sufficient time to react and prepare for the spindle change operation.
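The normalized RUL of Eq. (6) and the three warning thresholds can be combined in a small sketch. The percentage limits are those stated above; the function names, the threshold labels, and the example cycle length of 60 min are hypothetical.

```python
def normalized_rul(t_end, t_i):
    """Normalized RUL from Eq. (6): 1.0 at the start, 0.0 at T_end."""
    return (t_end - t_i) / t_end

# Warning thresholds in % of remaining lifetime, most urgent first
THRESHOLDS = [
    (8.0, "final warning"),
    (17.0, "second notification"),
    (27.0, "first notification"),
]

def warning_level(rul_percent):
    """Return the most urgent threshold that has been reached, or None."""
    for limit, label in THRESHOLDS:
        if rul_percent <= limit:
            return label
    return None

# Hypothetical cycle of 60 min, evaluated after 45 min of operation
rul_percent = 100.0 * normalized_rul(t_end=60.0, t_i=45.0)  # 25.0
level = warning_level(rul_percent)
```

Checking the thresholds from the most urgent to the least urgent ensures that a single label is returned even when several limits have already been passed.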

Conclusions

The assessment and measurement of blade wear in real-life applications present a significant challenge, primarily because real-life manufacturing processes are reluctant to undergo operational stoppages unrelated to planned or forced downtime. This limitation hampers the ability to comprehensively investigate the wear progression of milling blades, a process that often requires repeated measurements at different stages of the milling cycle and throughout the predicted lifetime of the blades. In this research, a novel FDPM model is proposed that enables accurate blade wear predictions while minimizing the need for the additional production downtime often required for empirical studies. While previous studies consider physical dynamics with the inclusion of machining data, they often lack attainability and efficient implementation for in-production industrial assets. The FDPM's open architecture and demonstrated capability to predict availability with limited training data address these gaps left by previous research.

The validation of the physics-based simulation model demonstrates its excellent predictive capability for the VbFc, yielding a low average relative error of 2.38% when compared to the measured mean of the milling cycle rake wear. This finding underscores the model’s accuracy and reliability in simulating real-world scenarios. The FDPM model relies heavily on this physics-based simulation model, which incorporates the physical behaviour occurring during milling machine operations. As a consequence, constructing the simulation model requires only a limited amount of operational data. At the same time, the reliance on specific physical effects in the simulation model's architecture limits its direct generalizability to other applications.

The PCC stage of the FDPM process is incorporated to retain the variables that have a positive correlation to Vb, which enables faster training times and improves the performance of the VbRNN prediction algorithm by reducing the dimensional space. The VbRNN prediction method is considered suitable for the peripheral milling machine setup due to its capability to retain information on past occurrences, as well as its capability to adapt to changes in the production setup while offering a transparent implementation of the selected features.

The prediction accuracy of 93.6% with the limited 50% training data provides a firm basis for further developing the FDPM model’s RUL prediction for onsite use. The prediction accuracy increases incrementally from 93.6% with a 50% training set size to 94.7% with 70% and 97.6% with 90%. These results highlight the capability of the FDPM model and lay a solid groundwork for the potential development of semi-autonomous or fully autonomous variable selection, prediction offset analysis, and RUL estimation systems, thereby enhancing productivity and enabling on-site utilization of FDPM in milling machines.

The constrained availability of comprehensive machine-related datasets limited the opportunity to conduct a more extensive and thorough investigation of the FDPM model over a prolonged time frame. In future research, it is recommended to give priority to incorporating the FDPM model into diverse applications. This will allow for an assessment of the model's generalizability across various configurations, considering the physical differences present in different applications. By testing the FDPM model in different contexts, researchers can gain valuable insights into its adaptability and robustness, ultimately enhancing its practical utility and widening its scope of applicability. Particular attention needs to be given to the accuracy of predictions made at the beginning of the milling cycle with a low training share, where data may be scarce, resulting in higher uncertainty in predictions. Furthermore, future research should address any anomalous behaviour that could affect the prediction accuracy of the FDPM model, and how effectively the ETS can be used to extrapolate the offset in predictions when affected by anomalous instances. It is essential to note that the FDPM model in this research is limited to detecting blade wear exclusively during normal operation, and any potential anomalous occurrences should be thoroughly investigated to understand their impact on the model's performance in such circumstances. Nevertheless, this study has identified a significant avenue for improving availability in high-energy milling applications, especially where other wear measurement methods pose implementation challenges.