Face milling is widely used today to obtain flat surfaces. One of the main precision criteria in face milling is surface roughness, Ra. Various papers (Fernández-Valdivielso et al. 2016; Pimenov 2013a, 2014) have shown that surface roughness increases with the flank wear of face mill teeth. Workpiece specifications usually state the maximum admissible roughness of the machined surface. As the worn surface of the cutting edge increases, the cutting edge of the replaceable indexable inserts (Gong et al. 2017; Mikołajczyk et al. 2017, 2018) must therefore be changed before the maximum surface roughness is reached. Such an approach permits optimum use of the cutting edges of the replaceable indexable inserts (Grigoriev et al. 2015; Machado and Diniz 2017; Niaki and Mears 2017), while ensuring the specified surface roughness. Thus, the most relevant task is to guarantee a given surface roughness as the wear on the face mill surface increases (Pimenov 2013b). As we approach the sixth technological revolution, the analysis of large volumes of data with artificial intelligence (AI) and the integration of AI algorithms in computer-aided production are becoming increasingly relevant.

Many researchers have studied surface roughness in face milling (Baek et al. 1997; Adamczak et al. 2009; Miko and Nowakowski 2012; Arizmendi et al. 2009; Rosales et al. 2010; Muñoz-Escalona and Maropoulos 2015; Felho et al. 2015; Zhenyu et al. 2015; Moghaddam and Kolahan 2016; Popov and Schindelarz 2017; Jersák and Simon 2017). Baek et al. (1997) presented a mathematical model for surface roughness prediction taking into account the dynamic characteristics of the face-milling operation. Adamczak et al. (2009) and Miko and Nowakowski (2012) developed a generalized mathematical model of surface roughness formation for surfaces shaped with round-nose tools. Arizmendi et al. (2009) presented a model for the prediction of surface topography in peripheral milling operations by processing signals that capture tool vibration during the cutting process. However, tool wear is not a parameter of that model. Rosales et al. (2010) investigated the dependence of rotation speed, feed rate, cutting depth, tool geometry, and runout errors in face milling. Muñoz-Escalona and Maropoulos (2015) designed a geometrical model for the prediction of surface roughness in face milling with square insert tools. Felho et al. (2015) presented the estimated relations between the calculated theoretical and measured real roughness values, allowing researchers to predict the machined surface roughness. Zhenyu et al. (2015) suggested an algorithm for predicting surface roughness that takes into account the influence of both static and dynamic factors on the roughness of the face milled surface. Moghaddam and Kolahan (2016) studied the influence of face milling parameters [cutting speed (V), feed rate (\(f_{z}\)), and depth of cut (\(a_{p}\))] on surface roughness, Ra, in the face milling of AISI 1045 steel. Popov and Schindelarz (2017) investigated the effect of hydraulic oil entering the cutting fluid on tool life and roughness when milling stainless steel. Jersák and Simon (2017) studied the influence of cooling lubricants on surface roughness and the energy efficiency of cutting machine tools for the three following techniques: lubricant-free cutting, cutting with the use of a lubricant with the MQL technique, and only utilizing finish-turning and finish-face milling. However, no account of the impact of face mill wear on surface roughness as it forms has been found in these references (Baek et al. 1997; Adamczak et al. 2009; Miko and Nowakowski 2012; Arizmendi et al. 2009; Rosales et al. 2010; Muñoz-Escalona and Maropoulos 2015; Felho et al. 2015; Zhenyu et al. 2015; Moghaddam and Kolahan 2016; Popov and Schindelarz 2017; Jersák and Simon 2017).

Much of the research devoted to the study of surface roughness in face milling (Diniz and Filho 1999; Caldeirani Filho and Diniz 2002; De Souza et al. 2003, 2005; De Escalona and Maropoulos 2010; Prasad et al. 2011; Houchuan et al. 2015; Liu et al. 2016; Shi et al. 2016; Werda et al. 2017) takes tool wear into account. Diniz and Filho (1999) studied the influence of the relative positions of tool and workpiece on tool life, tool wear, and surface finish in the face milling process, taking tool wear into account. Caldeirani Filho and Diniz (2002) investigated the influence of cutting speed and feed per tooth on tool life and the surface roughness of machined surfaces during mechanical milling, taking into account tool wear. De Souza et al. (2003) defined the parameters of surface roughness, Ra, tool life, and burr formation and used them when comparing the performances of two face-milling cutter systems with PCBN (polycrystalline cubic boron nitride) tools. De Souza et al. (2005) defined the parameters of surface roughness, waviness, tool life (based on flank wear), and burr formation and used them to compare the performance of the A system with 24 Si\(_{3}\)N\(_{4}\) ceramic inserts and the B system in face milling. De Escalona and Maropoulos (2010) studied the influence of feed per tooth and cutting speed on tool wear and surface roughness in the face milling of martensitic stainless steel. Prasad et al. (2011) studied the impact of offsets due to vibration during face milling and the correlation of surface roughness with tool wear under different cutting conditions while monitoring the tool state. Houchuan et al. (2015) investigated surface roughness, machining defects, microhardness, and microstructure variations at different cutting speeds and tool average flank wear values in the face milling of a titanium alloy, Ti–10V–2Fe–3Al (Ti-1023). Liu et al. (2016) investigated the process of tool damage and its effect on machined surface roughness in high-speed face milling of 17-4PH stainless steel. Shi et al. (2016) investigated tool wear behaviors and their effect on machinability in dry high-speed milling of magnesium alloy at high cutting speeds. Werda et al. (2017) compared tool life and surface roughness in the milling of X100CrMoV5 mold steel under different lubrication conditions: dry machining and minimum quantity lubrication (MQL). Even though many authors (Diniz and Filho 1999; Caldeirani Filho and Diniz 2002; De Souza et al. 2003, 2005; De Escalona and Maropoulos 2010; Prasad et al. 2011; Houchuan et al. 2015; Liu et al. 2016; Shi et al. 2016; Werda et al. 2017) have presented the results of surface roughness studies that take into account tool wear, no solutions to the problem of how to establish a comprehensive correlation of surface roughness, tool wear, and processing time have been advanced from the perspective of managing these settings in automated production. This task is, however, made possible with AI.

Many researchers have developed surface roughness prediction models in face milling using AI (Srinivasa Pai et al. 2002; Benardos and Vosniakos 2002, 2003; Saglam and Unuvar 2003; Bruni et al. 2008; El-Sonbaty et al. 2008; Lela et al. 2009; Muñoz-Escalona and Maropoulos 2010; Razfar et al. 2011; Bharathi Raja and Baskar 2012; Grzenda et al. 2012; Bajić et al. 2012; Kovac et al. 2013; Simunovic et al. 2013; Grzenda and Bustillo 2013; Elhami et al. 2013; Saric et al. 2013; Rodríguez et al. 2017; Simunovic et al. 2016; Selaimia et al. 2017; Svalina et al. 2017). Srinivasa Pai et al. (2002) presented an estimation of flank wear in face milling based on the radial basis function (RBF) of neural networks using acoustic emission signals, surface roughness, and cutting conditions (cutting speed and feed). Benardos and Vosniakos (2002) introduced the neural network modeling approach to predict surface roughness, Ra, in Computer Numerical Control (CNC) machine face milling. The paper takes into account depth of cut, the feed rate per tooth, the cutting speed, the use of cutting fluid, and cutting tool wear. Saglam and Unuvar (2003) suggested the use of a multilayer neural network for status monitoring and the estimation of flank wear and surface roughness in face milling. Benardos and Vosniakos (2003) aimed to present the various methodologies and practices used in surface roughness prediction. Bruni et al. (2008) offered analytic and Artificial Neural Network (ANN) models for surface roughness prediction at various cutting speeds and under various cooling conditions in face milling finishing processes on AISI 420 B stainless steel. El-Sonbaty et al. (2008) used ANNs to develop models for predicting the correlation between the cutting conditions and the corresponding fractal parameters of machined surfaces in face-milling operations, including surface roughness, without taking into account the cutting tool wear. Lela et al. (2009) studied the influence of cutting speed, feed, and depth of cut on surface roughness in face milling. Three different modeling methodologies, namely regression analysis (RA), support vector machine (SVM), and Bayesian neural network (BNN), have been applied to data determined in the process of experimental design. Muñoz-Escalona and Maropoulos (2010) developed various ANNs for predicting the surface roughness (Ra) of 7075-T7351 aluminum alloys in face milling. Razfar et al. (2011) presented an approach that defines the optimum cutting parameters that provide minimum surface roughness in face milling of X20Cr13 steel by combining an ANN and the harmony search algorithm. Bharathi Raja and Baskar (2012) conducted experimental studies of the influence of machining parameters such as the cutting speed, feed rate, and depth of cut on the surface roughness of aluminum and the provision of design surface roughness in face milling using Particle Swarm Optimization (PSO). Grzenda et al. (2012) presented a new strategy for improving the AI models for predicting surface roughness using small datasets tested in high-torque milling operations. Bajić et al. (2012) examined the impact of various cutting speeds, cutting depth, and tool wear on surface roughness and the cutting force components. Kovac et al. (2013) studied the influence of machining parameters on surface roughness in face milling, comparing prediction models based on fuzzy logic and RA. Simunovic et al. (2013) presented a study of surface roughness in the face machining of an aluminum alloy at low cutting speed using neural networks; however no changes in the degree of tool wear were monitored. Grzenda and Bustillo (2013) focused on the initial data transformation and its effect on the prediction of surface roughness in high-torque face milling operations. Elhami et al. (2013) suggested a genetically optimized neural network system for the prediction of constrained optimal cutting conditions in the face milling of a high-silicon austenitic stainless steel, in order to minimize surface roughness. Saric et al. (2013) presented a study of the prediction of machined surface roughness in the face milling of steel at various numbers of revolutions, cutting speeds, feeds, and depths. Rodríguez et al. (2017) presented an AI-based decision-making tool for selection of the right cutting tools for face milling, one of the criteria for which is the roughness of the machined surface. Simunovic et al. (2016) presented a machined surface roughness investigation based on the features of a digital image such as spindle speed, feed per tooth, and cutting depth, but without considering tool wear; the digital image was produced following a milling operation of an aluminum alloy Al6060. Selaimia et al. (2017) modeled the output responses, namely: surface roughness (Ra), cutting force (FC), cutting power (PC), specific cutting force (KS) and metal removal rate (MRR) during the face milling of the austenitic stainless steel X2CrNi18-9 with coated carbide inserts (GC4040). ANOVA was used for evaluating the influence of the cutting parameters: cutting speed (VC), feed per tooth and depth of cut (aP) on the output responses. Svalina et al. (2017) presented an evolutionary neuro-fuzzy system for recommending optimal cutting parameters and controlling surface roughness during mechanical milling without taking account of tool wear. A further development is a solution to the problem of managing surface roughness in a complex correlation of tool wear, time of machining, and cutting force.

Besides these well-established AI techniques, ensemble methods (Kuncheva 2014) simultaneously use several AI models, where each model provides its own prediction and all the predictions are combined. The high accuracy of ensemble predictions has been demonstrated in many milling processes: Bustillo et al. (2011a) proposed the use of ensembles to predict surface roughness in ball-end milling operations. Maudes et al. (2017) used random forest (RF) ensembles for the prediction of dimensional parameters in laser micro-manufacturing of stents, and Ferreiro and Sierra (2012) used different kinds of ensembles for burr detection in a dry-drilling process on aluminum Al 7075-T6.

Finally, although the application of decision or regression trees to roughness prediction in face-milling operations is not very common, it is necessary to stress that decision trees are one of the most common machine-learning techniques, due to their ability to generate clear rules that the final user of the model can use immediately. Some complex manufacturing problems like laser polishing (Bustillo et al. 2011b), maintenance planning of five-axis milling (Freiburg et al. 2014), laser micromachining of cavities (Teixidor et al. 2015), and deep drilling (Bustillo et al. 2016) have been successfully modeled using regression trees. In all these cases, the decision trees create models that are as accurate as other standard machine-learning techniques, such as ANNs, and are suitable for both continuous and discrete outputs.

As the worn surface of the face mill tooth increases, the cutting force also increases. López De Lacalle et al. (2006) proposed an online correlation, taking account of the geometry of the machined surface and the cutting forces generated during surface milling, by means of a monitoring system capable of detecting cutting forces and cutter tool position. Rivero et al. (2008) evaluated the suitability of a tool wear monitoring system based on machine tool internal cutting force signals in dry high-speed milling of aluminium alloys. Guzeev and Pimenov (2011) presented a mathematical model of cutting forces using various cutting conditions, material properties, and the flank worn surface of face mills as variables. Compeán et al. (2012) described a three-stage multivariable tool used for high-performance milling and the modelling of its dynamic behaviour to predict stability against chatter. However, tool wear was not considered in the paper. Dugin and Popov (2013) investigated ways of increasing the accuracy of the effect of processing materials and cutting tool wear on the plowing force values. Pimenov and Guzeev (2017) developed a mathematical model of plowing force to account for flank wear. Artetxe et al. (2017) described the implementation of a cutting force prediction model for milling that introduces radial engagement reduction caused by tool runout and workpiece flexibility, although tool wear is not considered.

An increase in the worn surface of face mill teeth also leads to an increase in cutting power. Shao et al. (2004) described a cutting power model for face milling using a cutting power threshold-updating strategy for tool-wear monitoring. Bhattacharyya et al. (2008) suggested a method for continuous online evaluation of tool wear in face milling based on inexpensive measurements of current and voltage of the spindle motor. Bhattacharyya and Sengupta (2009) used a combination of signal processing techniques to obtain improved and robust estimates of tool wear. da Silva et al. (2016) demonstrated the use of a probabilistic neural network in monitoring tool wear in the end-milling operation via acoustic emission and cutting power signals. Pimenov (2015) suggested a mathematical model of main drive power that can be used for controlling face milling conditions in the process of cutting tool wear. Niaki et al. (2015) used two methods of stochastic filtering, Kalman and particle filter, to predict flank tool wear when machining difficult-to-machine materials through spindle power consumption measurements, but without considering the correlation of these parameters with the roughness of the machined surface. Urbikain et al. (2017) proposed a methodology to reduce machining problems by means of a simulation utility, which uses the main variables of the system and process as input data, and generates results that help in the proper decision-making and machining plan. Direct benefits can be found in (a) the fixture/clamping optimal design; (b) the machine tool configuration; (c) the definition of chatter-free optimum cutting conditions and (d) the right programming of cutting toolpaths at the Computer Aided Manufacturing (CAM) stage. Many modern machines have such systems, which means that they are low-cost and can be used for online monitoring (Niaki and Mears 2017; Niaki et al. 2015).

It is possible to manage the process of changing surface roughness directly by controlling the amount of tool wear through dimensional wear. To do so, the CNC machine needs to have a contact control sensor, for instance a Renishaw. In this case, wear is monitored between tool replacements. Whenever the tool wear reaches a point where the design surface roughness cannot be attained or the tool wear approaches maximum values, forced cutting tool replacement is necessary. In this case, the direct method of controlling tool wear will be used (Niaki and Mears 2017).

However, not all machines are equipped with such sensors. Many CNC machines, though, are equipped with feedback sensors that, for instance, control the drive power and in particular the main drive power. As soon as the correlation between the tool wear and the main drive power has been established, the specified surface roughness can be attained by monitoring the current power value during the machining process.

The first two variants are impossible to implement with universal manual machine tools. However, by establishing a correlation between the worn surface of the tool and the processing time, the specified surface roughness can be obtained by monitoring the current processing time, for example, through the number of processed machine parts. Cutting tool replacement is forced whenever the current processing time reaches the point where the design surface roughness can no longer be attained or the time value approaches the maximum. In this case, an indirect method of controlling tool wear will be used employing the current machining time.

The disadvantage of the first and third methods is that they cannot be implemented in the pause during tool replacement. If, for instance, the tool is damaged during machining, real-time tool replacement is not possible with these methods, a drawback that is not shared by the second method. It may be avoided by combining the first method with the third method, provided that the machine has such a capability.

However, to implement the above methods, it is necessary to have the relevant models of surface roughness and main drive power depending on tool wear.

Thus, the purpose of this work is to create models for predicting the roughness of machined surfaces in a complex correlation between tool wear, machining time, and cutting power using AI with the aim of integrating AI algorithms in online monitoring of automated manufacturing. Furthermore, the analysis of how these models can provide useful and immediate information for the process engineer is considered: in some cases, such as regression or decision trees, this information can be directly extracted from the model structure, but in other cases, such as ANN models, interpretation of the black-box structure is not easy and the development of 3D charts becomes necessary (Bustillo et al. 2016).

Materials and method

Experimental face milling was performed, in order to establish a complex correlation between the machined surface roughness Ra, size of the flank surface \(V_{B}\), input cutting power N, and machining time, t. To do so, a machine part manufactured from 45 steel with the dimensions \(L = 200\,\hbox {mm} \times B = 75\,\hbox {mm} \times H = 100\,\hbox {mm}\) was machined on a Mori Seiki NMV 5000 CNC machining center for drilling, milling, and boring. The experiments were conducted in the Engineering Scientific and Educational Center of South Ural State University.

The composition of carbon-quality structural steel 45 (similar to AISI 1045) according to GOST 1050-88 is listed in Table 1.

Table 1 Chemical composition and hardness of 45 steels

The hardness of the workpiece was HB190 when tested with a Brinell indenter TB 5004-03. The machining process included cooling and was carried out using a Pramet 8230 insert (the manufacturer recommends using the hard 8230 alloy in milling toughened corrosion-resistant alloy steels and dedicated alloys). The main cutting parameters of the tool are specified in Table 2.

Table 2 Tool parameters

All cutting parameters employed for the face milling operations are specified in Table 3. The geometry is acceptable for mild and mid-carbon steels. These cutting conditions were selected following the cutting tool provider specifications and the machine-tool capabilities. Cutting conditions are regularly varied in experiments that focus on roughness quality prediction, in order to generate extensive datasets (Bustillo et al. 2011a). When tool wear is included, however, the number of different cutting conditions is strongly reduced (Prasad et al. 2011) or limited to a single cutting condition (Rivero et al. 2008), as in this research.

Table 3 Specifications of cutting conditions

The main drive power was established in accordance with the online readings made using a Mori Seiki NMV 5000 during machining. The minimum and maximum readings were recorded at the beginning, in the middle, and at the end of the mill operating cycle as percentages of the machine power equal to 22 kW. There are therefore six cycles in each experiment (\(k = 6\)).

After each cycle, a macrograph of the flank surface of the face mill tooth was taken. The images were processed on a personal computer and the flank wear land of the mill tooth was measured by comparing the image with a dimensional ruler (see Fig. 1).

Surface roughness measurements, Ra, were taken from Abris-PM7.0 profilometer readings. The measurements were recorded for the basic length \(L = 0.4\hbox { mm}\) at the beginning (point 1), in the middle (point 2), and at the end (point 3) of the working stroke of the mill. Points 1 and 3 are located at a distance of 90 mm from the center of the part (see Fig. 2). The processing time of each working stroke (cutting pass), t, is calculated as \(t = L/(fzn)\) (see Table 3).

Thus, the experimental points were the main drive power, N, size of flank wear-land of the mill tooth for various worn surfaces, \(V_{B}\), processing time, t, and roughness of the machined surface, Ra, and the experimental results for each point are shown in Table 4.

Graphs (a–d) of the average experimental values (see Table 4) are shown in Fig. 3: (a) drive power of the primary motion, N, versus processing time, t; (b) surface roughness, Ra, versus processing time, t; (c) drive power of the primary motion, N, versus flank wear, \(V_{B}\); (d) surface roughness, Ra, versus flank wear, \(V_{B}\).


The dataset was generated from the experimental results listed in Table 4. It included five inputs: the first two were the processing time (t) and flank wear (\(V_{B}\)), and the last three were the minimum (\(N_{min}\)), maximum (\(N_{max}\)), and mean (\(N_{mean}\)) drive power. The output is the measured surface roughness, Ra. All the inputs and the output are continuous numeric variables. As described in “Modeling” section, for each of the 35 experiments, the roughness and the drive power were evaluated in three positions or ranges: at the beginning, in the middle, and at the end. Each experimental test therefore generated three instances of the dataset, one for each of these positions; the processing time and the flank wear were extrapolated considering each experiment and the previous experiment. The dataset therefore includes 105 instances (35 experiments × 3 positions). Table 5 summarizes the inputs and outputs, their units, and the range of values presented in the dataset; the output variable is shown in bold.

Fig. 1 Wear of changeable indexable insert (CII) on the plowing surface: flank wear \(V_{B}\)

Fig. 2 Scheme of control points of surface roughness Ra

Table 4 Experimental results of the drive power primary motion, N; processing time, t; flank wear, \(V_{B}\); and, surface roughness, Ra

Surface roughness prediction is a regression problem from the AI point of view and the machine-learning techniques are known as regressors. The dataset is randomly divided into two sets: the first one with the instances used to train the model is called the training set, while the second one, used to measure accuracy, is called the validation set. Therefore the validation set includes only instances that have not been used during the training stage, limiting the overfitting tendency of the prediction models; i.e. their natural tendency to fit the training dataset perfectly and their loss of accurate predictive capability under new conditions. In this way it is possible to evaluate the generalization capabilities of the prediction model to deal with new instances.
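The random division into a training set and a validation set can be sketched as follows. This is a minimal illustration only: the function name `train_validation_split`, the 25% validation fraction, and the placeholder instances are assumptions made for the example, not values taken from the experiments.

```python
import random


def train_validation_split(instances, validation_fraction=0.25, seed=0):
    """Randomly split instances into disjoint training and validation lists."""
    shuffled = list(instances)
    random.Random(seed).shuffle(shuffled)
    n_validation = max(1, round(len(shuffled) * validation_fraction))
    # The first n_validation shuffled instances are held out for validation.
    return shuffled[n_validation:], shuffled[:n_validation]


# Hypothetical placeholder instances (t, VB, N_mean, Ra); not measured data.
instances = [(i, 0.01 * i, 10.0 + 0.1 * i, 1.0 + 0.005 * i) for i in range(105)]
train, validation = train_validation_split(instances)
```

Because the two lists are disjoint, any error measured on `validation` reflects instances the model has never seen during training.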

When talking about roughness prediction (Benardos and Vosniakos 2003), the quality of the predictions is usually measured with the Root Mean Square Error (RMSE), calculated with Eq. 1. This indicator is the square root of the sum of squares of the prediction errors for each instance divided by the number of instances. Although the RMSE does not provide a physical measure of the error variable such as, for example, the mean error, it has the advantage of penalizing the models with very wrong predictions for some instances, as it considers the squares of the errors and not the errors themselves. Therefore the RMSE is more suitable than the mean error for this study, because a roughness prediction that is far from the real value will mean the production of a workpiece that does not meet the end-user’s requirements.

$$\begin{aligned} { RMSE}=\sqrt{\frac{\sum _{t=1}^{n} (\hat{y}_{t} -y_{t})^{2}}{n}} \end{aligned}$$
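The penalizing effect of the RMSE can be illustrated with a short sketch. The two error profiles below are hypothetical numbers chosen for the example, and the mean absolute error is used here as one possible reading of the "mean error" mentioned in the text.

```python
import math


def rmse(predicted, measured):
    """Root mean square error over paired predictions and measurements."""
    if len(predicted) != len(measured) or not measured:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - m) ** 2 for p, m in zip(predicted, measured)]
    return math.sqrt(sum(squared) / len(squared))


def mean_absolute_error(predicted, measured):
    """Mean of the absolute prediction errors (assumed comparison metric)."""
    return sum(abs(p - m) for p, m in zip(predicted, measured)) / len(measured)


# Hypothetical roughness values in micrometres; not experimental data.
measured = [1.0, 1.0, 1.0, 1.0]
evenly_wrong = [1.1, 1.1, 1.1, 1.1]   # four small errors of 0.1
one_outlier = [1.0, 1.0, 1.0, 1.4]    # one large error of 0.4

rmse_even = rmse(evenly_wrong, measured)
rmse_outlier = rmse(one_outlier, measured)
mae_even = mean_absolute_error(evenly_wrong, measured)
mae_outlier = mean_absolute_error(one_outlier, measured)
```

Both profiles have the same mean absolute error, yet the RMSE of the profile with a single large miss is twice as high, which is the behavior the text relies on.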

The following regressors have been tested using Weka machine learning software (Hall et al. 2009): multilayer perceptrons (MLP), radial basis functions (RBFs), regression trees and random forest (RF). In addition, ZeroR and linear regression were tested as baseline methods.

MLPs and RBFs, the most common ANN typologies, are considered as standards in the prediction of surface roughness in machining processes (Benardos and Vosniakos 2003). MLPs use a back-propagation algorithm to calculate the weights for each connection between neurons (Bishop 1995), while RBFs use a radial basis function as the activation function in the neurons of the hidden layer (Leonard and Kramer 1991). While MLPs can have more than one hidden layer of neurons, there is only one layer in an RBF that is able to perform non-linear calculations directly.

ANNs are considered black-box models, because the end-user cannot extract direct knowledge from them. In contrast, regression trees provide visual models of special interest for the process engineers, who are in this case the end-users. The engineer should begin to read a regression tree from the top (root node) and follow the branch that fits the process conditions, defined by a certain combination of model inputs, until reaching the bottom of the tree: the final leaf, which provides a linear regression model of the surface roughness. The attributes and numeric thresholds in each decision node are fixed by the M5P algorithm (Muñoz-Escalona and Maropoulos 2010) when it subdivides the training instances into pure subsets (i.e., subsets where almost all training instances are prone to fit the final linear model included in the leaf). The M5P criterion for obtaining pure subsets is to maximize the expected error reduction (Quinlan 1992). Figure 5 in “Results and discussion” section shows an example of the regression trees built in this research.
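How such a tree is read from the root to a leaf can be sketched in a few lines. The structure below is only an illustration: the class names, the single threshold on \(V_{B}\), and the leaf coefficients are hypothetical and do not reproduce the tree obtained in this research.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass
class Leaf:
    """Leaf holding a linear model: Ra = intercept + sum(coef * input)."""
    intercept: float
    coefficients: Dict[str, float]

    def predict(self, inputs: Dict[str, float]) -> float:
        return self.intercept + sum(
            weight * inputs[name] for name, weight in self.coefficients.items()
        )


@dataclass
class Node:
    """Decision node that compares one attribute with a numeric threshold."""
    attribute: str
    threshold: float
    below: Union["Node", Leaf]   # branch followed when value <= threshold
    above: Union["Node", Leaf]   # branch followed when value > threshold


def predict_ra(tree, inputs):
    """Walk from the root node down to a leaf and apply its linear model."""
    node = tree
    while isinstance(node, Node):
        node = node.below if inputs[node.attribute] <= node.threshold else node.above
    return node.predict(inputs)


# Hypothetical tree: threshold and coefficients are invented for illustration.
tree = Node("VB", 0.2, below=Leaf(0.8, {"VB": 1.0}), above=Leaf(0.5, {"VB": 2.5}))
ra_low_wear = predict_ra(tree, {"VB": 0.1})
ra_high_wear = predict_ra(tree, {"VB": 0.4})
```

Each prediction can be traced back to one threshold comparison and one linear formula, which is what makes the model directly readable by a process engineer.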

Fig. 3 Graphs a–d of the average experimental values: a drive power primary motion, N, versus processing time, t; b surface roughness, Ra, versus processing time, t; c drive power primary motion, N, versus flank wear, \(V_{B}\); d surface roughness, Ra, versus flank wear, \(V_{B}\)

Regression ensembles (Mendes-Moreira et al. 2012) were also tested to improve the accuracy of regression trees. The ensembles are combinations of base regressors, referred to as regression trees in this study, that combine their individual predictions in a final result. In this research, the base regressors are Random Trees, a variant of regression trees where a random subset of the attributes is considered when selecting the decision for each node. RF (Breiman 1996) is an ensemble built with Random Trees that are trained with different sub-datasets of the training dataset (Breiman 2001) to make different predictions and the final result is the average of the predictions of each tree.
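The idea of training base regressors on different sub-datasets and averaging their predictions can be sketched as below. This is a simplified bagging example and not the Weka RF implementation: the base regressor is a one-split tree (a stump) on a single input, so the random selection of attributes used by Random Trees is omitted, and the data are hypothetical.

```python
import random


def fit_stump(xs, ys):
    """Fit a one-split regression tree on a single numeric input."""
    best = None
    for threshold in sorted(set(xs))[:-1]:
        left = [y for x, y in zip(xs, ys) if x <= threshold]
        right = [y for x, y in zip(xs, ys) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = sum((y - left_mean) ** 2 for y in left)
        sse += sum((y - right_mean) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    if best is None:
        # All inputs are identical, so fall back to the mean output.
        mean = sum(ys) / len(ys)
        return lambda x: mean
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean


def fit_bagged_ensemble(xs, ys, n_models=25, seed=0):
    """Train each stump on a bootstrap sample and average the predictions."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        sample = [rng.randrange(len(xs)) for _ in xs]
        models.append(fit_stump([xs[i] for i in sample], [ys[i] for i in sample]))
    return lambda x: sum(model(x) for model in models) / len(models)


# Hypothetical flank wear (mm) and roughness (micrometres); not measured data.
xs = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5]
ys = [1.0, 1.0, 1.0, 2.0, 2.0, 2.0]
predict = fit_bagged_ensemble(xs, ys)
```

Because every stump sees a slightly different resampling of the data, the averaged prediction is smoother than that of any single tree.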

The last two methods, ZeroR and linear regression, are often used as baselines that represent the naïve approach and a simple regression model, as stated by previous works on surface roughness (Maudes et al. 2017; Teixidor et al. 2015). The naïve approach uses the mean value of the output as a prediction that is independent of the input values; in this study, the naïve approach will always predict a surface roughness of 1.20 \(\upmu \)m (the mean roughness value considering the whole dataset). As a consequence, the error of the prediction model should be significantly lower than the error for the baseline method to assure the ability of the machine-learning model to predict new situations in terms of drive power and tool wear. In the Weka software tool, the naïve approach is called ZeroR. The second baseline, a linear regression model, was selected because it is usually simple enough to fit the main relationship between the process inputs and the output, at least in some input ranges, besides having extensive applications in real workshops.
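The naïve baseline is simple enough to be written out in full. The sketch below mimics the behavior described for ZeroR with a small hypothetical class; it is not the Weka code, and the training values are invented for the example.

```python
class ZeroR:
    """Naive baseline: always predicts the mean of the training outputs."""

    def fit(self, inputs, outputs):
        self.mean_ = sum(outputs) / len(outputs)
        return self

    def predict(self, inputs):
        # The inputs are ignored; only their count determines the output size.
        return [self.mean_ for _ in inputs]


# Hypothetical roughness values in micrometres; not experimental data.
training_inputs = [[10.0, 0.05], [20.0, 0.10], [30.0, 0.15]]
training_outputs = [1.0, 1.2, 1.4]
baseline = ZeroR().fit(training_inputs, training_outputs)
predictions = baseline.predict([[15.0, 0.07], [40.0, 0.30]])
```

Any useful regressor should obtain a clearly lower error than this constant prediction on the same validation instances.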

The technique selected to train the models and for their validation was tenfold cross-validation repeated 10 times (Kohavi 1995). In this technique, the original dataset is, first of all, randomly divided into 10 equal-size datasets called folds; then, nine folds are used to train the regressor and the last fold is used for its validation. As previously mentioned, regressor accuracy is thereby measured on instances that have not been used to train the model. This process is repeated 10 times, each time using a different fold for validation and the other nine for training, and the RMSE of the validation folds in the 10 repetitions is averaged. Hence, the variance of the regressor accuracy is reduced and its prediction results can be generalized (Kohavi 1995).
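A minimal sketch of repeated k-fold cross-validation is given below. It assumes that the dataset is reshuffled before each repetition, which is one common reading of the procedure; the function names and the mean-predicting placeholder regressor are hypothetical, and the data are not the experimental dataset.

```python
import math
import random


def k_fold_indices(n, k, rng):
    """Shuffle the indices 0..n-1 and deal them into k near-equal folds."""
    indices = list(range(n))
    rng.shuffle(indices)
    return [indices[i::k] for i in range(k)]


def repeated_cv_rmse(xs, ys, fit, k=10, repetitions=10, seed=0):
    """Average validation RMSE over k folds and several reshuffled repetitions."""
    rng = random.Random(seed)
    scores = []
    for _ in range(repetitions):
        for fold in k_fold_indices(len(ys), k, rng):
            held_out = set(fold)
            train = [i for i in range(len(ys)) if i not in held_out]
            model = fit([xs[i] for i in train], [ys[i] for i in train])
            errors = [(model(xs[i]) - ys[i]) ** 2 for i in fold]
            scores.append(math.sqrt(sum(errors) / len(errors)))
    return sum(scores) / len(scores)


def fit_mean(train_x, train_y):
    """Placeholder regressor that always predicts the training mean."""
    mean = sum(train_y) / len(train_y)
    return lambda x: mean


# Hypothetical data with 105 instances and a constant output.
xs = list(range(105))
ys = [1.2] * 105
score = repeated_cv_rmse(xs, ys, fit_mean)
```

With 105 instances the folds cannot be exactly equal in size, so this sketch produces folds of 10 or 11 instances.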

Table 5 Dataset variables and their variation range

All the machine-learning algorithms have different parameters that should be tuned to find the optimal values for each dataset. A grid search was performed on the main parameters of each regressor, to provide a general overview of the effect of each tuning parameter; the change in accuracy of the prediction model was used to fix the number of steps and the increment in the search grid: if the parameter has an influence of more than 50%, the tuning range is divided into four steps; if the accuracy of the model changes due to a parameter with an extreme value in the tuning range that is higher than 50%, the parameter will be tuned in steps of the same size as the lowest parameter value, while if the change is lower than 50%, each step will double the value of the parameter until the upper extreme is achieved. In this grid search, all the values considered for any tunable parameter are tested against the other parameters, as in a full-factorial experiment; this extensive build-up of prediction models can only be done because the execution time for a small size dataset is not significant. Besides, this factor also allows the grid test to be performed on the tenfold cross-validation scheme.
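The full-factorial character of the grid search, together with the doubling rule for the step size, can be sketched as follows. The parameter names, ranges, and the scoring function are hypothetical placeholders: in practice the score of each configuration would be the cross-validated RMSE of the corresponding model, and the actual ranges are those given in Table 6.

```python
from itertools import product


def doubling_steps(lowest, highest):
    """Values obtained by doubling the lowest value up to the upper extreme."""
    values = [lowest]
    while values[-1] * 2 <= highest:
        values.append(values[-1] * 2)
    return values


def grid_configurations(grid):
    """Every combination of the listed parameter values (full factorial)."""
    names = list(grid)
    return [dict(zip(names, values)) for values in product(*(grid[n] for n in names))]


def grid_search(grid, score):
    """Return the configuration with the lowest score."""
    return min(grid_configurations(grid), key=score)


# Hypothetical grid for an MLP; not the ranges used in the study.
grid = {
    "momentum": [0.1, 0.2, 0.3],
    "learning_rate": [0.05, 0.1],
    "hidden_neurons": doubling_steps(1, 16),
}


def placeholder_score(config):
    """Stand-in for a cross-validated RMSE; lowest at an arbitrary point."""
    return (
        abs(config["momentum"] - 0.3)
        + abs(config["learning_rate"] - 0.05)
        + abs(config["hidden_neurons"] - 4)
    )


configurations = grid_configurations(grid)
best = grid_search(grid, placeholder_score)
```

The number of models to build is the product of the number of values per parameter (here 3 × 2 × 5 = 30), which explains why a regressor with three tunable parameters requires far more configurations than one with a single parameter.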

Table 6 Parameters of each regressor that are tuned, variation range, and steps
Table 7 Precision test results of the machine-learning methods

Table 6 summarizes the tuned parameters, their tuning range, and the number of steps in which the parameters are varied. In the case of MLPs, three parameters are tuned: momentum, learning rate, and number of neurons in the hidden layer; for RBFs, regression trees, and RF, only one parameter was tuned: ridge, the minimum number of instances per leaf, or the number of iterations, respectively. Moreover, RF techniques are easily tuned and generate visual information for direct use by the process engineer, such as the linear relationships between process parameters and roughness, and thresholds for avoiding rapid tool wear. Therefore, the number of different configurations that were tested amounted to 336 for MLPs, 11 for RBFs, four for the regression trees, and five for RF. Because MLPs have three parameters to tune while the other models have only one, longer tuning times and an expert to perform the tuning process will be needed for MLPs compared with the other techniques.

Results and discussion

Table 7 shows both the RMSE and the computational time for the best configuration of each tested machine-learning model and for the two baseline methods, on two datasets: the one presented in the “Modeling” section and a simplified version of it without the minimum and maximum drive power. The best MLP configuration for the complete dataset (columns under the heading “With \(\hbox {N}_{\mathrm{min}}\) and \(\hbox {N}_{\mathrm{max}}\)” in Table 7) is two hidden layers, a learning rate of 0.05, and a momentum of 0.3, while the most accurate models for RBFs, regression trees, and random forest (RF) are achieved for a ridge of 0.1, a minimum of eight instances per leaf, and 400 iterations, respectively. For the incomplete dataset (columns under the heading “Without \(\hbox {N}_{\mathrm{min}}\) and \(\hbox {N}_{\mathrm{max}}\)” in Table 7), the best MLP configuration is one hidden layer (as expected due to the reduction in the number of attributes), a learning rate of 0.05, and a momentum of 0.2, while the parameters of the other machine-learning methods do not change, except for the ridge value for the RBFs, which is 0.0001. The computational time was measured on an Intel Core i5 2300 2.8-GHz processor. The RMSE values listed in Table 7 show that all the machine-learning models are statistically more precise than ZeroR and linear regression, the two baseline methods considered; statistical significance was assessed with the corrected resampled t test (Nadeau and Bengio 2003) at a significance level of 5%. RBFs are more accurate than MLPs if \(\hbox {N}_{\mathrm{min}}\) and \(\hbox {N}_{\mathrm{max}}\) are not considered, while the opposite happens if both inputs are considered. Regression trees are at least 15% more precise than the best MLP or RBF configurations. RF is even more accurate, by between 33% for the complete dataset and 42% for the incomplete dataset, with a statistically significant difference in both cases.
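For reference, the statistic of the corrected resampled t test can be computed as sketched below. This is a minimal illustration of the published formula rather than the code used in the study: the usual paired t statistic is modified by replacing the factor 1/n with 1/n + n_test/n_train, which compensates for the overlap between training sets across folds. The function name is hypothetical.

```python
import math
from statistics import mean, variance


def corrected_resampled_t(diffs, n_train, n_test):
    """t statistic of the corrected resampled t test (Nadeau and Bengio 2003).

    diffs holds one score difference (e.g. RMSE_A - RMSE_B) for each of the
    k * r train/validation splits; n_train and n_test are the split sizes.
    """
    n = len(diffs)
    var = variance(diffs)  # sample variance, with n - 1 in the denominator
    if var == 0:
        raise ValueError("all differences are identical; t is undefined")
    return mean(diffs) / math.sqrt((1.0 / n + n_test / n_train) * var)


# Tiny check with made-up differences: mean 2, variance 1, n = 3,
# and a 9:1 train/validation split as in tenfold cross-validation.
print(corrected_resampled_t([1.0, 2.0, 3.0], n_train=9, n_test=1))
```

The statistic is compared with Student's t distribution with n − 1 degrees of freedom. For the 100 differences of a 10 × 10 scheme, the two-sided 5% critical value (99 degrees of freedom) is approximately 1.98, so the difference between two models is declared significant when the absolute value of the statistic exceeds that value.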

Considering the standard deviation of the roughness in the dataset, 0.88 \(\upmu \)m, the RMSE can be considered a little high (30% for the best model), although still acceptable from the industrial point of view compared with similar works (Bustillo et al. 2011a, b; Maudes et al. 2017; Teixidor et al. 2015). To analyze this fact, Figure 4 shows the prediction error for each instance of the dataset (cross size) for the RF model. It can be seen that the higher errors are all related to instances with high roughness values. This result can be expected due to the small number of instances with high roughness values: the machine-learning models are programmed to minimize the total error, so a limited number of instances with high roughness values will be largely disregarded by the model, which will therefore produce higher prediction errors for those instances.

Fig. 4 Predicted roughness versus real roughness for the RF model

Fig. 5 Regression tree for the prediction of surface roughness

With regard to the computational times listed in Table 7, the quickest machine-learning model was the simplest: the regression tree. Ensembles and ANNs required longer computational times. Although RF required twice as long as the MLPs, it has to be remembered that the tuning process of an MLP is more complex, with 336 configurations versus five configurations for RF, making the total time required to obtain an accurate model significantly longer for the MLPs than for the ensembles.
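A rough order-of-magnitude comparison makes this point concrete. The sketch below assumes, purely for illustration, that every configuration of a given technique takes about as long as its best configuration, which will not hold exactly because training time depends on the parameter values; the function name is hypothetical.

```python
def total_tuning_cost(n_configs, relative_time_per_config):
    """Total tuning effort in units of one MLP training run."""
    return n_configs * relative_time_per_config


mlp_cost = total_tuning_cost(336, 1.0)  # 336 configurations, unit time each
rf_cost = total_tuning_cost(5, 2.0)     # 5 configurations, twice as long each
print(mlp_cost / rf_cost)
```

Under this assumption, the grid search for MLPs costs 336 units against 10 units for RF, that is, more than 30 times as much, even though a single RF model is slower to build.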

In addition, although regression trees provide a model that is 11.5% less accurate than RF ensembles, they provide visual and useful information on the cutting process to the process engineer. Figure 5 shows the M5P tree built with the whole of the complete dataset. All the linear models (LMs) built at the leaves of the tree have the same structure as described in Eq. 2, where \(\hbox {C}_{1}\) to \(\hbox {C}_{3}\) are different constants.

$$\begin{aligned} { Ra} = \hbox {C}_{1}\cdot N_{{ mean}} + \hbox {C}_{2}\cdot t + \hbox {C}_{3} \qquad (2) \end{aligned}$$

Besides, all the nodes evaluate the same input: the processing time, t. Therefore some immediate conclusions can be extracted from this tree: first, the relationship between drive power and roughness can be considered linear within small ranges of processing time. Second, \(V_{B}\), \(N_{{ min}}\), and \(N_{{ max}}\) are not required by the model to predict Ra; this result does not mean that those variables play no role in the surface roughness of the face milling process. It does mean, however, that the regression tree is able to extract the information included in these inputs from the processing time and the average drive power. In this way, if the processing time with one cutting tool under fixed cutting conditions is known and if the RMSE of the regression tree is sufficient, the process engineer can predict surface roughness without stopping the machine to measure the flank wear. The third conclusion is related to the LMs: \(\hbox {C}_{1}\) takes the same value (0.1933) for all LMs except LM6 where its value is more than twice as high (0.4279); therefore the process engineer knows that if the processing time is over 43 min, the average drive power should be carefully monitored because of its strong influence on roughness.
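The structure of such a model tree can be sketched as follows: every internal node tests the processing time, and every leaf applies a linear model with the form of Eq. 2. The sketch collapses the tree to two leaves around the 43-min limit and uses the two \(\hbox {C}_{1}\) values quoted above; \(\hbox {C}_{2}\) and \(\hbox {C}_{3}\) are set to zero as placeholders because their values are not given in the text, so the absolute predictions are not meaningful, only the sensitivity to drive power. The class names are hypothetical.

```python
from bisect import bisect_left
from dataclasses import dataclass


@dataclass(frozen=True)
class LeafModel:
    c1: float  # coefficient of the average drive power N_mean
    c2: float  # coefficient of the processing time t
    c3: float  # intercept

    def predict(self, n_mean, t):
        return self.c1 * n_mean + self.c2 * t + self.c3


class TimeSplitModelTree:
    """Model tree whose internal nodes all test the processing time t."""

    def __init__(self, thresholds, leaves):
        # thresholds: ascending split points on t (min). Leaf i covers
        # t <= thresholds[i]; the last leaf covers everything above.
        if list(thresholds) != sorted(thresholds):
            raise ValueError("thresholds must be in ascending order")
        if len(leaves) != len(thresholds) + 1:
            raise ValueError("need exactly one more leaf than thresholds")
        self.thresholds = list(thresholds)
        self.leaves = list(leaves)

    def leaf_for(self, t):
        return self.leaves[bisect_left(self.thresholds, t)]

    def predict(self, n_mean, t):
        return self.leaf_for(t).predict(n_mean, t)


# C1 values from the text; C2 = C3 = 0.0 are placeholders, not fitted values.
tree = TimeSplitModelTree(
    thresholds=[43.0],
    leaves=[LeafModel(0.1933, 0.0, 0.0), LeafModel(0.4279, 0.0, 0.0)],
)

# Change in predicted Ra for a 1-kW rise in average drive power.
print(tree.predict(2.0, 20.0) - tree.predict(1.0, 20.0))
print(tree.predict(2.0, 50.0) - tree.predict(1.0, 50.0))
```

The two printed differences equal the \(\hbox {C}_{1}\) of the leaf selected by t, which is why the same rise in drive power has more than twice the effect on roughness once the processing time exceeds 43 min.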

Finally, if the RF models are used to evaluate the different cutting conditions, further conclusions of industrial interest can be generated. First, a 3D plot was built (Fig. 6), showing the average drive power, \(N_{{ mean}}\), on the X-axis, the processing time, t, on the Y-axis, and the predicted roughness, Ra, on the Z-axis. This 3D plot is built by removing \(V_{B}\), \(N_{{ min}}\), and \(N_{{ max}}\) from the dataset and retraining the model (with an RMSE of 0.2457 \(\upmu \)m, in the same range as for the complete dataset). As the accuracy of the model is within the same range, RF ensembles are also able to extract the information included in \(V_{B}\), \(N_{{ min}}\), and \(N_{{ max}}\) from the other inputs. The average drive power and the processing time are evaluated in the same experimental ranges as shown in Table 5, because no machine-learning model can be relied upon to give reasonably accurate predictions outside its training range.
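The data behind such a plot can be generated by evaluating the retrained model on a regular grid that never leaves the experimental ranges. The sketch below is illustrative only: the predictor is a hypothetical stand-in for the retrained RF model, and the two ranges are placeholders for the Table 5 values, which are not reproduced here.

```python
def axis(low, high, n_points):
    """n_points equally spaced values covering [low, high] exactly."""
    if n_points < 2:
        raise ValueError("n_points must be at least 2")
    inner = [low + (high - low) * i / (n_points - 1) for i in range(n_points - 1)]
    return inner + [high]  # the end point is set explicitly to avoid overshoot


def response_surface(predict, x_range, y_range, n_points=20):
    """Evaluate predict(x, y) on a grid restricted to the given ranges."""
    xs = axis(x_range[0], x_range[1], n_points)
    ys = axis(y_range[0], y_range[1], n_points)
    zs = [[predict(x, y) for x in xs] for y in ys]  # zs[j][i] is at (xs[i], ys[j])
    return xs, ys, zs


# Placeholder ranges (NOT the Table 5 values) and a stand-in predictor.
N_MEAN_RANGE_KW = (1.0, 3.0)
T_RANGE_MIN = (0.0, 60.0)


def stand_in_predictor(n_mean, t):
    return 0.1 * n_mean + 0.01 * t  # arbitrary function for illustration


xs, ys, zs = response_surface(stand_in_predictor, N_MEAN_RANGE_KW, T_RANGE_MIN, 5)
print(len(xs), len(ys), len(zs))
```

The three returned lists can be passed directly to any 3D surface plotting routine; because the grid is bounded by the training ranges, the plotted surface never shows an extrapolated prediction.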

Fig. 6 Roughness predicted by the RF ensemble as a function of time and drive power

Fig. 7 Roughness predicted by the RF ensemble as a function of flank wear and drive power

Figure 6 shows that the processing time has the highest influence on workpiece roughness, especially if the processing time exceeds 40 min. The influence of drive power is smaller, but, depending on the processing time, there is a threshold beyond which the roughness increases by around 0.15–0.4 \(\upmu \)m, a considerable step depending on the required surface quality of the workpiece.

A second useful 3D graph can be generated considering the average drive power on the X-axis, the flank wear on the Y-axis, and the predicted roughness on the Z-axis (Fig. 7). When generating this 3D plot, t, \(N_{{ min}}\), and \(N_{{ max}}\) are removed from the dataset and the model is retrained. In this case, the accuracy of the model was lower: 0.4139 \(\upmu \)m (RMSE), an expected value considering that the processing time was one of the main information sources for the machine-learning model. Despite the decreased accuracy of the model, the 3D plot can still provide conclusions that are very useful for the process engineer. Again, the average drive power and the flank wear were evaluated in the same range as for the experiments (Table 5).

This 3D plot covers the case where the processing time is unknown or cannot be evaluated, for example, the very common situation in production centers where different cutting conditions are applied during machining with the same cutting tool. In this case, the process engineer can stop the machine before the cutting tool begins a new operation, evaluate the flank wear, and monitor the drive power to obtain a rough prediction of the surface roughness. So, if the drive power and flank wear are low, the roughness may remain low, but if either of these two parameters exceeds a certain threshold, namely 2.2 kW for drive power and 0.43 mm for flank wear, surface roughness will increase dramatically in a very short period of time (depending on the respective rates of increase of drive power and flank wear). Therefore, the process engineer can set safety thresholds for the cutting process and ensure that they are not exceeded by monitoring the drive power, stopping the machine, and measuring the flank wear.
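A check of this kind is straightforward to automate. The following sketch uses the two thresholds quoted above; the function and constant names are hypothetical, and the limits apply only to the machining process studied here.

```python
DRIVE_POWER_LIMIT_KW = 2.2  # threshold quoted in the text
FLANK_WEAR_LIMIT_MM = 0.43  # threshold quoted in the text


def exceeded_limits(drive_power_kw, flank_wear_mm):
    """Return the names of the monitored quantities above their threshold."""
    exceeded = []
    if drive_power_kw > DRIVE_POWER_LIMIT_KW:
        exceeded.append("drive power")
    if flank_wear_mm > FLANK_WEAR_LIMIT_MM:
        exceeded.append("flank wear")
    return exceeded


# Example readings (made up): drive power above its limit, flank wear below.
print(exceeded_limits(2.5, 0.30))
```

An empty result means that both quantities are within the range where roughness is expected to stay low; any non-empty result signals that a tool change should be considered before the next operation.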

In summary, the proposed AI models can be used in two ways by the process engineer. First, regression trees can be built to provide immediate visual information, such as the linear relationship between drive power and roughness in small ranges of processing time up to a limit of 43 min; exceeding this limit implies that the drive power should be carefully monitored, because its influence on the roughness becomes very strong. Second, if high model accuracy is required, RF models can be used to build 3D charts considering two inputs. When drive power and processing time are considered, the same limit of around 40 min, defined for the regression trees, is identified. When drive power and flank wear are considered, the roughness may remain at low values if both inputs are low. However, if either of these two parameters exceeds a certain threshold, namely 2.2 kW for drive power and 0.43 mm for flank wear, roughness levels will increase dramatically over a very short period of time (depending on the respective rates of increase of the drive power and the flank wear).


Conclusions

The productivity of milling operations is mainly limited by tool wear, which produces degradation in the surface quality of the machined workpiece. In real workshops, in some cases the cutting tool always works under the same cutting conditions and the total processing time of each tool is well known, but in other cases the cutting tool is used under different cutting conditions and the only way to evaluate its present state is to measure the flank wear, \(V_{B}\), directly. In both cases, the process engineer can consider the drive power as an indirect indicator of the state of the tool, because all CNC machine-tools provide this parameter in real time.

A practical approach based on AI models has been proposed in this research to provide useful information to the process engineer in real time on the expected workpiece surface roughness in both cases, based only on these parameters and avoiding the installation of additional sensors on the machine.

First, the experimental data were collected to cover a broad range of wear conditions and processing times, while acquiring data on the drive power, for a fixed machining process: the face milling of carbon-quality structural steel 45 (similar to AISI 1045). Second, different machine-learning techniques were tested using a \(10\times 10\) cross-validation, from well-known ANNs such as MLPs and RBFs to more recent techniques such as ensembles of regression trees. The tuning processes of the main parameters of each technique were based on a grid search. The results have shown that:

  • Regression trees are at least 15% more precise than the best MLP or RBF configuration.

  • RF ensembles have even greater accuracy: between 33 and 44% depending on the inclusion in the dataset of the minimum and maximum drive power or only the average value for each cutting experiment.

  • The tuning stage of ensembles and regression trees is shorter compared with ANNs, because they have only one parameter to be tuned, compared with the three parameters that have to be tuned in the case of MLPs.

Furthermore, the practical use of the most accurate models is presented following two possible strategies:

  • Regression trees can provide immediate visual information, such as the linear relationship between inputs and outputs or thresholds in the behavior of the variables that should be carefully monitored, because exceeding these thresholds implies rapid degradation of the workpiece roughness if a tool change is not performed. Specifically, \(V_{B}\), \(N_{min}\), and \(N_{max}\) are not required for roughness prediction by this model, because the regression tree is able to extract the information included in these inputs from the processing time and the average drive power.

  • RF models can be used to build 3D charts considering different inputs: drive power and processing time can be used to identify limits in the behaviors of these inputs, as with regression trees, while drive power and flank wear can be used to identify ranges in the variables where the roughness may remain at low values. Both 3D charts are suitable for one of the previously presented real workshop situations: fixed cutting tool conditions and a well-known processing time, or changing cutting conditions and periodic measurements of flank wear.

Further research will focus on applying machine-learning techniques specially designed for unbalanced datasets to this industrial problem, taking into account the unbalanced nature of the datasets found in real workshops, which is due to the small number of instances that describe low-quality outputs, such as the high roughness values in this case study. This kind of technique may be expected to improve the overall accuracy of the models, especially in conditions outside of the main working range. In addition, broader conclusions could be drawn on the capability of RF to achieve highly accurate prediction models of tool wear, through the development of a more extensive dataset including information on wear evolution measured under different cutting conditions.