Tool wear monitoring and prognostics challenges: a comparison of connectionist methods toward an adaptive ensemble model
Abstract
In a high speed milling operation, the cutting tool acts as the backbone of the machining process and requires timely replacement to avoid loss of a costly workpiece or machine downtime. To this aim, prognostics is applied to predict tool wear and estimate the tool's life span, so that the cutting tool can be replaced before failure. However, the life span of cutting tools ranges from minutes to hours; time is therefore critical for tool condition monitoring. Moreover, the complex nature of the manufacturing process requires models that can accurately predict tool degradation and provide confidence for decisions. In this context, a data-driven connectionist approach is proposed for the tool condition monitoring application. In brief, an ensemble of Summation Wavelet-Extreme Learning Machine models with an incremental learning scheme is proposed. The approach is validated on cutting force measurements from a Computer Numerical Control machine. Results clearly show the significance of the proposition.
Keywords
Applicability · Data-driven · Ensemble · Monitoring · Prognostics · Robustness · Reliability
Introduction
Prognostics for manufacturing refers to tool wear prediction and estimation of its life span for timely replacement. More precisely, for tool condition monitoring application, the prognostics model uses monitoring data from sensors (e.g. vibration, force or acoustic emission) to predict tool wear after each cut and to determine the number of cuts that could be made safely before failure.
In recent years, research on prognostics for manufacturing has grown rapidly, and a vast number of prognostics algorithms have been introduced to enable short-term or long-term decisions, particularly from the data-driven category. According to the literature, artificial neural networks (ANNs) are the most widely used connectionist methods among data-driven prognostics approaches for prediction in milling operations (Grzenda and Bustillo 2013). Some examples from recent publications: Pal et al. (2011) used a standard backpropagation neural network and a Radial Basis Function network for predicting tool condition; this work also evaluated the robustness of ANNs against uncertainty of the input data. Das et al. (2011) used an ANN approach to learn the relationship between extracted features and the wear magnitude of the cutting tool. In Wang and Cui (2013), a Levenberg-Marquardt algorithm is introduced to improve the accuracy of an auto-associative neural network for tool wear monitoring. Wu et al. (2015) proposed a Bayesian multi-layer perceptron approach to estimate tool wear. Cojbasic et al. (2015) proposed a one-pass Extreme Learning Machine (ELM) algorithm to estimate the roughness of machined surfaces.

The life of cutting tools ranges from minutes to hours; time for tool condition monitoring is therefore critical, which calls for rapid connectionist approaches.

The common drawbacks of classical connectionist approaches are model complexity, slow iterative tuning, imprecise learning rates, local minima and overfitting.

Due to uncertainties from different sources, such as the tool degradation process, the data, the operating conditions and the model itself, it is essential to manage and quantify uncertainty to enable decisions.

It is difficult to generalize a tool wear prediction model to cutting tool data that are not included in the learning phase.
1. Define prognostics modeling challenges.
2. Compare SW-ELM with rapid learning approaches.
3. Build an SW-ELM ensemble (SW-ELME) with an incremental learning scheme.
4. Validate the SW-ELME on unknown cutting tool data.
Towards an enhanced data-driven prognostics
Data-driven tool wear monitoring framework
To transform raw monitoring data into relevant behavior models, the data-driven tool wear monitoring framework is based on the following steps (Fig. 2).
Data acquisition
During the cutting of a metal workpiece, cutter wear increases due to the varying loads on the flutes, which repeatedly engage and disengage with the workpiece surface (Das et al. 2011). This results in growing imperfections in the workpiece surface, i.e., reduced dimensional accuracy of finished parts. Most CNC machines cannot detect tool wear online; direct measurement requires optical or electrical resistance sensors. Therefore, cutting performance is usually estimated through indirect tool condition monitoring (without shutting down the machine), by acquiring data that can be related to suitable wear models (Zhou et al. 2011). The most commonly employed signals are cutting vibration (Haddadi et al. 2008) and cutting force (Zhai et al. 2010). Such data are collected at regular intervals under given operating conditions.
Data processing
Cutting vibration measurements benefit from a wide frequency range and are easy to implement (Ding and He 2011). Cutting force signals, however, are more sensitive to tool wear than vibration (Ghasempoor et al. 1998) and are preferred for modeling due to their good measurement accuracy (Zhou et al. 2006). They are also easy to manipulate and considered the most useful for predicting cutting performance (Zhai et al. 2010; Zhou et al. 2009).
The raw monitoring data acquired from the cutting tools are redundant and noisy, and cannot be used directly for tool wear prediction. The data processing step extracts and selects features from vibration/force measurements, preferably ones with monotonic trends (Javed et al. 2015). Feature selection can be done by transforming the features to another space or by retaining those with the highest information content (Benkedjouh et al. 2013; Javed et al. 2015).
Prognostics modeling
This step aims at building an effective model capable of predicting tool wear during the machining process and estimating the tool's life span to enable short-term or long-term decisions. Data-driven tool wear modeling is achieved in two steps: learning and testing. In the learning phase, data are used to establish a model that learns the relation between input features and the target measured wear. The learning step is directly linked to tool wear prediction performance in the test phase: for example, lack of data, uncertainty in data collection/processing, and varying context can strongly impact model performance. Moreover, in the learning phase, model complexity, parameter initialization and computation time are factors that should be properly addressed to build the right model.
Open challenges of prognostics modeling
According to the literature, various approaches for prognostics exist, i.e., physics-based, data-driven and hybrid approaches (Javed 2014). However, real prognostics systems that meet industrial challenges are still scarce. This can be attributed to the inherent uncertainties of the deterioration process, lack of sufficient data, sensor noise, unknown operating conditions, and engineering variations, which prevent building prognostics models that accurately capture the evolution of degradation. In other words, the highly complex and nonlinear operational environment of industrial equipment makes it hard to establish efficient prognostics models that are robust enough to tolerate uncertainty in the data and reliable enough to show acceptable performance under diverse conditions (Javed et al. 2012; Hu et al. 2012; An et al. 2015). Robustness of prognostics models appears to be an important aspect (Liao 2010) and still remains an open issue (Javed et al. 2012; Camci and Chinnam 2010). Besides that, reliability is also crucial to prognostics: a reliable prognostics model should be capable of dealing with variations in the data that are directly associated with the context (e.g., for machining, the variable geometry/dimensions of cutters, material differences of components, etc.). It has been found that robustness and reliability of a prognostics model are closely related (Peng et al. 2010), and both should be considered essential to ensure accuracy of the estimates (Javed et al. 2012). Moreover, a prognostics model has to be chosen according to implementation requirements and constraints that can limit its applicability (Javed et al. 2012; Sikorska and Hodkiewicz 2011).

Robustness of prognostics—it can be defined as “the ability of a prognostics approach to be insensitive to inherent variations of the input data”.

Reliability of prognostics—it can be defined as “the ability of a prognostics approach to be consistent in situations when new/unknown data are presented”.

Applicability of prognostics—it can be defined as “the ability of a prognostics approach to be practically applied under industrial constraints”.
Choice of data-driven prognostics approach
1. Better generality and system-wide scope.
2. No degradation process model required.
3. Easy to implement, with low complexity.
4. Little knowledge of the equipment required.
5. Usually low computation time.
Brief overview of ANN architectures
Constructing a good neural network model is a nontrivial task, and practitioners still encounter several issues that may affect the performance of ANNs and limit their applicability (Singh and Balasundaram 2007). Examples of such problems include: parameter initialization, complexity of the hidden layer, activation functions, slow iterative tuning, local minima, overfitting, and generalization ability (Javed et al. 2012).
In general, ANNs are classified into two types of architectures: the feedforward network (FFNN) and the recurrent neural network (RNN). A FFNN has connections in the forward direction only, whereas an RNN has cyclic connections (Fig. 5). Around 95 % of the literature concerns FFNNs (Feng et al. 2009). However, such systems must be tuned to learn parameters like weights and biases in order to fit the studied problem. According to the literature, the most popular learning scheme for FFNNs is the Extreme Learning Machine (ELM) (Huang et al. 2004), and for RNNs it is the Echo State Network (ESN) (Jaeger 2001). Unlike classical training techniques for ANNs, ELM and ESN avoid slow iterative learning and are based on random projection. In brief, with the ELM/ESN algorithms, the input-hidden layer/reservoir parameters are randomly initialized, and learning is achieved only by solving a least squares problem. In addition, both are sensitive to the number of neurons in the hidden layer/reservoir. The main differences between ELM and ESN are depicted in Fig. 5.
To the best of our knowledge, ELM has a proven universal approximation capability (Huang and Chen 2007, 2008; Huang et al. 2006), whereas for ESN this is not the case. In addition, a recent survey shows the advantages of ELM over conventional methods for training ANNs (Huang et al. 2011). As a matter of fact, ELM is an effective algorithm with several advantages: ease of use, quick learning speed and capability for nonlinear activation (Shamshirband et al. 2015). Such findings highlight ELM as a suitable candidate for prognostics.
The Extreme Learning Machine
Basically, ELM is a batch learning scheme for single hidden layer feedforward neural networks (SLFNs). A slight difference between the architecture of ELM and a typical SLFN is that there is no bias for the neurons in the output layer. To enable rapid learning, the input weights and hidden neuron biases are chosen randomly, without any prior knowledge of the hidden-to-output layer weights and independently of the learning data. Consequently, training an ELM reduces to a system of linear equations, and the unknown weights between the hidden layer and the output layer nodes can be determined analytically by applying the Moore-Penrose generalized inverse (Rao and Mitra 1971; Petković et al. 2016).
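The batch learning scheme above can be sketched in a few lines of Python. This is a minimal illustration rather than the implementation evaluated in the paper; the sigmoid activation and the uniform initialization range are assumptions made here for simplicity:

```python
import numpy as np

def elm_train(X, T, n_hidden, rng=None):
    """Train a single-hidden-layer ELM (sketch).

    X: (n_samples, n_features) inputs; T: (n_samples,) targets.
    Input weights and biases are random and never tuned; only the
    output weights beta are solved analytically in one least-squares
    step via the Moore-Penrose pseudoinverse.
    """
    rng = np.random.default_rng(rng)
    W = rng.uniform(-1, 1, (X.shape[1], n_hidden))  # random input weights
    b = rng.uniform(-1, 1, n_hidden)                # random hidden biases
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))          # hidden-layer output (sigmoid)
    beta = np.linalg.pinv(H) @ T                    # analytic output weights
    return W, b, beta

def elm_predict(X, W, b, beta):
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    return H @ beta
```

Because the only trained parameters are solved in a single least-squares step, there is no iterative tuning and no learning rate, which is the source of ELM's speed.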
In view of the expected performances of a prognostics model highlighted in the “Open challenges of prognostics modeling” section, practical considerations related to model accuracy and implementation should be addressed for real applications. In this context, the benefits, issues and requirements of the ELM algorithm are as follows.
Benefits

ELM does not require slow iterative learning; it is a one-pass algorithm.

ELM has only one control parameter to be manually tuned, i.e., the number of hidden neurons.
Issues and requirements

Due to the random initialization of parameters (weights and biases), an ELM model may require a complex hidden layer (Rajesh and Prakash 2011). This may cause ill-conditioning and reduce the robustness of ELM to variations in the input data, so that the expected output of the model may not be close to the real output (Zhao et al. 2009). The variance of the randomly initialized weights can also affect the model's generalization ability, which should be considered. Moreover, random initialization of parameters results in poor consistency of the algorithm: it gives a different solution on each run, which makes it less reliable.

Hidden neuron activation functions must be chosen carefully so that they contribute to better convergence of the algorithm, handle nonlinear inputs, and yield a compact network structure for a suitable level of accuracy (Javed et al. 2012; Jalab and Ibrahim 2011; Huang and Chen 2008).

Like any ANN, ELM does not quantify model uncertainty. In terms of prognostics, a single ELM model therefore lacks real tangible foresight. It is thus required to bracket the unknown future in order to show the reliability of the estimates and to enable timely decisions.
Proposed data-driven approach
Summation Wavelet-Extreme Learning Machine (SW-ELM)
SW-ELM combines ANNs and wavelet theory for estimation or prediction problems, and appears to be an effective tool for different industrial applications (Javed et al. 2014). SW-ELM is also a one-pass learning scheme for SLFNs. It benefits from an improved parameter initialization phase that minimizes the impact of the random weights and biases (of the input-hidden layer), a structure with dual activation functions that handles nonlinearity better, and the ability to work on the actual scales of the data.
Structure and parameters

Structure: each hidden node holds a parallel conjunction of two different activation functions (\(f_1\) and \(f_2\)) rather than a single activation function. The output of a hidden neuron is the average of the dual activations (\(\bar{f}=\left( f_1+f_2\right)/2\)) (see Fig. 6).

Activation function: convergence of the algorithm is improved by using an inverse hyperbolic sine (Eq. 6) and a Morlet wavelet (Eq. 7) as dual activation functions, which operate element-wise on the array X (\(x_{j},~j=1,2,\ldots ,n\)).

Parameter initialization: to provide a better starting point for the algorithm, two types of parameters are considered: the wavelet parameters (dilation and translation), adapted by a heuristic procedure (Oussar and Dreyfus 2000), and the SLFN parameters (weights and biases from the input to the hidden layer), initialized by the Nguyen-Widrow (NW) procedure (Nguyen and Widrow 1990).
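As a small illustration of the dual-activation structure described above, the hidden-node output can be written as follows. The standard Morlet form \(\cos(5x)\,e^{-x^2/2}\) is assumed here, since Eqs. 6 and 7 are not reproduced in this excerpt:

```python
import numpy as np

def dual_activation(a):
    """SW-ELM hidden-node output: the element-wise average of two
    activations, f_bar = (f1 + f2) / 2.
    f1: inverse hyperbolic sine; f2: Morlet wavelet (standard form
    cos(5x) * exp(-x^2 / 2) assumed here)."""
    f1 = np.arcsinh(a)
    f2 = np.cos(5 * a) * np.exp(-a ** 2 / 2)  # Morlet wavelet
    return (f1 + f2) / 2.0
```

The asinh term grows slowly for large inputs while the wavelet term localizes around zero, which is how the conjunction handles nonlinearity better than either function alone.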
Learning scheme
SW-ELM ensemble (SW-ELME) with incremental learning
To elaborate the incremental learning procedure for the ensemble structure, consider a learning data record of 630 samples (inputs and targets) from two cutting tools. During online application on a new cutting tool, the input feature sample (after a cut) and the corresponding tool wear value predicted by the SW-ELME are appended to the learning data record (which becomes 631 samples). The SW-ELME is then retrained on that data, and the model parameters (i.e., weights and biases) are updated before the next input. The learning procedure continues after each cut until the FT is reached. This proposition allows performing incremental learning without actual tool wear values, using artificial data from predictions, which improves the adaptability of the prognostics model and helps manage its uncertainty.
Note that, due to the rapid learning ability of the SW-ELM algorithm, the proposed incremental learning can be computationally efficient. However, computation time increases with the complexity of the ensemble structure.
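The incremental loop described above can be sketched as follows. This is a simplified illustration: `train` and `predict` are placeholders for the SW-ELM training and prediction routines, and the ensemble output is taken as the plain average of the member models:

```python
import numpy as np

def ensemble_predict(models, x, predict):
    # Ensemble output: average of the individual model predictions.
    return np.mean([predict(m, x) for m in models])

def incremental_run(X_record, T_record, X_new, n_models, train, predict,
                    failure_threshold):
    """After each cut: predict wear, append (input, predicted wear) to the
    learning record as artificial data, retrain the ensemble, and stop
    once the predicted wear reaches the failure threshold (FT)."""
    X_record, T_record = list(X_record), list(T_record)
    wear_track = []
    for x in X_new:                               # one sample per cut
        models = [train(np.array(X_record), np.array(T_record), seed=i)
                  for i in range(n_models)]       # retrain before next input
        w_hat = ensemble_predict(models, x, predict)
        wear_track.append(w_hat)
        X_record.append(x)                        # store input and its
        T_record.append(w_hat)                    # predicted wear (no true label)
        if w_hat >= failure_threshold:            # FT reached: stop
            break
    return wear_track
```

Retraining the whole ensemble after every cut is only affordable because each member is a one-pass learner; with iteratively trained networks, the same loop would be impractical online.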
Case study: tool condition monitoring
Experimental arrangements
Data acquisition and processing
Most importantly, even if the operating conditions are constant, the cutting force is affected by cutter geometry, coating and workpiece properties, which impacts the reliability of tool wear estimation models. Considering these complications of tool wear modeling, it is important to highlight the characteristics of all cutters used: cutting tools C33 and C18 had the same geometry but different coatings, while cutting tool C09 had its own geometry and coating (Table 2).
Tool wear model settings and performance metrics
1. A comparison of tool wear prediction models against prognostics challenges (“Open challenges of prognostics modeling” section).
2. An adaptive ensemble to predict tool wear, estimate tool life span and give confidence for decision making.
Selected force features
No | Force feature
1  | Maximum force level
2  | Total amplitude of cutting force
3  | Amplitude ratio
4  | Average force
Type of cutting tools used during experiments
Cutters | Geometry | Coating
C33     | Geom1    | Coat1
C18     | Geom1    | Coat2
C09     | Geom2    | Coat3
To discuss the robustness, reliability, and applicability of the wear estimation models (“Open challenges of prognostics modeling” section), performances are assessed in terms of accuracy, network complexity and computation time. More precisely, the evaluation metrics are: the coefficient of determination (R2), which should be close to 1; the complexity of the hidden layer; and the learning/testing time in seconds (s).
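For reference, the coefficient of determination used throughout the results can be computed as one minus the ratio of the residual sum of squares to the total sum of squares:

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination (R2): 1 is a perfect fit, 0 matches
    always predicting the mean, and negative values are worse than the
    mean predictor (as seen in some reliability tests below)."""
    y_true, y_pred = list(y_true), list(y_pred)
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot
```

This definition explains why R2 can drop below zero on unknown cutters: the model's residuals are then larger than those of the naive mean predictor.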
Comparison of connectionist approaches
Robustness and applicability: results discussion
Robustness and applicability for a single cutter model
Cutter 09           | SW-ELM         | ELM     | ESN
Hidden nodes        | 16             | 16      | 16
Activation function | asinh & Morlet | sigmoid | tanh
Training time (s)   | 0.0009         | 0.0005  | 0.014
R2                  | 0.824          | 0.796   | 0.542

Cutter 18           | SW-ELM         | ELM     | ESN
Hidden nodes        | 12             | 12      | 12
Activation function | asinh & Morlet | sigmoid | tanh
Training time (s)   | 0.0007         | 0.0004  | 0.013
R2                  | 0.955          | 0.946   | 0.59
Reliability and applicability: results discussion
Reliability on partially known data
Reliability and applicability for three cutters models
Train: C33, C18, C09; Test: C18 | SW-ELM         | ELM     | ESN
Hidden nodes                    | 20             | 20      | 20
Activation function             | asinh & Morlet | sigmoid | tanh
Training time (s)               | 0.002          | 0.001   | 0.04
R2                              | 0.837          | 0.836   | 0.817

Train: C33, C18, C09; Test: C33 | SW-ELM         | ELM     | ESN
Hidden nodes                    | 16             | 16      | 16
Activation function             | asinh & Morlet | sigmoid | tanh
Training time (s)               | 0.002          | 0.0009  | 0.04
R2                              | 0.847          | 0.80    | 0.75
It can be observed from the results that, even when the learning data are increased, ELM-based methods are still faster than ESN. The average learning times for both tests show that ELM is less time consuming for the same model complexity. As far as accuracy (R2) is concerned, SW-ELM showed better reliability performance on cutter data with different attributes. Detailed simulation results are presented in Fig. 13. In brief, Fig. 13a shows average accuracy (R2) performance for 5 different network complexities, where the best results are achieved by SW-ELM for the tests on cutters C18 and C33 (with 20 and 16 hidden nodes, respectively). Considering these results, Fig. 13b compares the steadiness of all models (SW-ELM, ELM and ESN) over 100 trials. One can see that SW-ELM is more stable to input variations, as its test accuracy (R2) is consistent over 100 trials on cutters C18 and C33. Finally, Fig. 13c compares average tool wear prediction results (from 100 trials) on cutter C33.
Reliability on totally unknown data
It can be observed from all tests that SW-ELM has better reliability performance than ELM and ESN. The averaged accuracy of SW-ELM over these tests also improves on our previous results (Javed et al. 2012). Note that for the tests on cutters C33 and C09 the accuracy of each approach dropped to a poor level, i.e., \(R2<0\). Therefore, model reliability still needs to be improved when totally unknown data with different attributes are used, which is the aim of the following proposition. Detailed simulation results for the best test case (SW-ELM, ELM and ESN models) are illustrated in Figs. 14 and 15.
According to the results in Fig. 14, SW-ELM has better prediction performance, as indicated by the stability of its R2 values over 100 trials. The prediction results in Fig. 15 show that all models except SW-ELM (i.e., ELM and ESN) are unable to estimate the tool's initial wear. Moreover, all models are unable to estimate the tool's worn-out state from the data of unknown cutter C18.
Synthesis

All connectionist algorithms discussed above (SW-ELM, ELM, ESN) are based on random projection.

For all tests on robustness and reliability, SW-ELM outperforms the ELM and ESN algorithms.

SW-ELM performs better due to its improved parameter initialization and its structure with dual activation functions.

The SW-ELM algorithm requires two parameters to be set by the user: the number of hidden neurons and the parameter initialization constant C.

SW-ELM takes about twice the learning time of ELM.

The ELM algorithm has the best applicability, with only one parameter to tune manually and the fastest training time.

ESN requires several parameters to be set by the user and more training time than ELM-based methods.

For some reliability tests, ELM and ESN showed close accuracy performances.

ESN is much more sensitive to input variations than ELM-based methods.

Like any ANN, SW-ELM, ELM and ESN cannot quantify or manage prediction uncertainty.
Adaptive SW-ELME and its reliability
Considering the better performance of SW-ELM over ELM and ESN, this section presents the reliability of the SW-ELM ensemble with an incremental learning scheme.
Simulation settings
The initial step is to determine the hidden layer complexity of a single SW-ELM model that yields satisfactory performance. Then, multiple SW-ELM models of the same complexity are combined to produce an averaged output. The hidden layer of each SW-ELM model is set to 7 neurons, and the number of SW-ELM models in the ensemble is set to 50 (Fig. 7).
To reduce the uncertainty of the estimates, the features from each cutter's data are filtered to obtain smooth trends by applying an rloess filter with a span of 0.9 (Fig. 9). Basically, rloess is a robust local regression filter that assigns lower weight to outliers; see Mathworks (2010). Each individual model is trained on the same dataset but initialized with different parameters, i.e., weights and biases. The parameter initialization constant is set to \(C=0.0001\).
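To give an intuition for the filtering step, the sketch below shows the outlier-downweighting idea behind robust local regression: repeatedly smooth, then reduce the weight of points with large residuals (bisquare weights). This is not MATLAB's rloess implementation (which fits local polynomials rather than local weighted means); it is a simplified stand-in for illustration only:

```python
import numpy as np

def robust_smooth(y, span=0.9, iters=3):
    """Illustrative robust smoother: local weighted averaging where
    points with large residuals get lower (bisquare) weight, so
    isolated outliers barely influence the smoothed trend."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    half = max(1, int(span * n) // 2)     # half-width of the local window
    w = np.ones(n)                        # robustness weights, start uniform
    for _ in range(iters):
        s = np.empty(n)
        for i in range(n):                # weighted local mean
            lo, hi = max(0, i - half), min(n, i + half + 1)
            s[i] = np.average(y[lo:hi], weights=w[lo:hi])
        r = np.abs(y - s)                 # residuals
        m = np.median(r) + 1e-12
        w = np.clip(1 - (r / (6 * m)) ** 2, 0, None) ** 2  # bisquare weights
    return s
```

After a couple of iterations, an outlier's weight drops to near zero, which is why the filtered features show smooth monotonic trends even with noisy force measurements.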
The tests are performed on the cutter data using a leave-one-out strategy, e.g., learning on C33 and C18 and testing on C09. The cutting tool life span is determined when the predicted wear intersects the FT (Eq. 15), which is set to the maximum tool wear at 315 cuts. For each test, the lower and upper confidence bounds of the tool wear predictions and the evolution of the probability density function are given to quantify uncertainty (Fig. 16). Also, the total time to learn and test the SW-ELME online is given to show its suitability for a real application.
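The life span estimate described above (the first intersection of the predicted wear trajectory with the FT) reduces to a simple scan:

```python
def estimated_life(predicted_wear, failure_threshold):
    """Return the first cut index (1-based) at which the predicted wear
    trajectory reaches the failure threshold FT, or None when the
    threshold is never reached within the prediction horizon."""
    for cut, wear in enumerate(predicted_wear, start=1):
        if wear >= failure_threshold:
            return cut
    return None
```

Comparing this index with the actual 315 cuts gives the life span error reported in the conclusion table.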
SW-ELME: results discussion
Reliability and applicability for unknown data
Train: C33 & C09; Test: C18 | SW-ELM         | ELM     | ESN
Hidden nodes                | 4              | 4       | 4
Activation function         | asinh & Morlet | sigmoid | tanh
Training time (s)           | 0.0009         | 0.0004  | 0.055
R2                          | 0.701          | 0.44    | 0.6

Train: C09 & C18; Test: C33 | SW-ELM         | ELM     | ESN
Hidden nodes                | 4              | 4       | 4
Activation function         | asinh & Morlet | sigmoid | tanh
Training time (s)           | 0.0008         | 0.0004  | 0.054
R2                          | -0.5           | -1.3    | -1.9

Train: C33 & C18; Test: C09 | SW-ELM         | ELM     | ESN
Hidden nodes                | 16             | 16      | 16
Activation function         | asinh & Morlet | sigmoid | tanh
Training time (s)           | 0.0026         | 0.0013  | 0.058
R2                          | -0.73          | -1.2    | -0.98
Moreover, for each test case the lower and upper confidence bounds indicate that the final target values lie within the confidence level (Fig. 16). Finally, due to the ensemble strategy and the increased data, for each test case the total time for learning and testing (online) is around 2 minutes, which is quite satisfactory from a practical point of view.
Conclusion
Reliability and applicability for unknown data
Tool | Cuts | Estimated | Error | R2   | Time (s)
C33  | 315  | 313       | 2     | 0.89 | 119
C18  | 315  | 311       | 4     | 0.74 | 133
C09  | 315  | 303       | 12    | 0.52 | 112
Footnotes
1. Note: the classical definition of reliability, “the ability of an item to perform a required function under given conditions for a given time interval” (NF EN 13306 2010), is not retained here. The sense used in this paper follows the application of machine learning approaches in PHM, which do not consider reliability as a dependability measure (Bosnić and Kononenko 2009).
Acknowledgments
This work was carried out within the Laboratory of Excellence ACTION funded by the French Government through the program “Investments for the future” managed by the National Agency for Research (ANR11LABX0101).
References
 An, D., Kim, N. H., & Choi, J. H. (2015). Practical options for selecting datadriven or physicsbased prognostics algorithms with reviews. Reliability Engineering & System Safety, 133, 223–236.CrossRefGoogle Scholar
 Benkedjouh, T., Medjaher, K., Zerhouni, N., & Rechak, S. (2013). Health assessment and life prediction of cutting tools based on support vector regression. Journal of Intelligent Manufacturing, 26(2), 213–223.CrossRefGoogle Scholar
 Bhat, A. U., Merchant, S., & Bhagwat, S. S. (2008). Prediction of melting point of organic compounds using extreme learning machines. Industrial and Engineering Chemistry Research, 47(3), 920–925.CrossRefGoogle Scholar
 Bosnić, Z., & Kononenko, I. (2009). An overview of advances in reliability estimation of individual predictions in machine learning. Intelligent Data Analysis, 13(2), 385–401.Google Scholar
 Camci, F., & Chinnam, R. B. (2010). Healthstate estimation and prognostics in machining processes. IEEE Transactions on Automation Science and Engineering, 7(3), 581–597.CrossRefGoogle Scholar
 Cojbasic, Z., Petkovic, D., Shamshirband, S., Tong, C. W., Ch, S., Jankovic, P., et al. (2015). Surfaceroughnessprediction by extreme learning machine constructed withabrasivewater jet. Precision Engineering. doi: 10.1016/j.precisioneng.2015.06.013.
 Das, S., Hall, R., Herzog, S., Harrison, G., & Bodkin, M. (2011). Essential steps in prognostic health management. In IEEE Conference on prognostics and health management. Denver, CO, USA.Google Scholar
 Ding, F., & He, Z. (2011). Cutting tool wear monitoring for reliability analysis using proportional hazards model. The International Journal of Advanced Manufacturing Technology, 57(5–8), 565–574.CrossRefGoogle Scholar
 Echo state network. http://reservoircomputing.org/software.
 NF EN 13306. (2010). Terminologie de la maintenance.Google Scholar
 Feng, G., Huang, G. B., Lin, Q., & Gay, R. (2009). Error minimized extreme learning machine with growth of hidden nodes and incremental learning. IEEE Transactions on Neural Networks, 20(8), 1352–1357.CrossRefGoogle Scholar
 Gao, R., Wang, L., Teti, R., Dornfeld, D., Kumara, S., Mori, M., et al. (2015). Cloudenabled prognosis for manufacturing. CIRP AnnalsManufacturing Technology. doi: 10.1016/j.cirp.2015.05.011.
 Ghasempoor, A., Moore, T., & Jeswiet, J. (1998). Online wear estimation using neural networks. Proceedings of the Institution of Mechanical Engineers, Part B: Journal of Engineering Manufacture, 212(2), 105–112.CrossRefGoogle Scholar
 Grzenda, M., & Bustillo, A. (2013). The evolutionary development of roughness prediction models. Applied Soft Computing, 13(5), 2913–2922.CrossRefGoogle Scholar
 Haddadi, E., Shabghard, M. R., & Ettefagh, M. M. (2008). Effect of different tool edge conditions on wear detection by vibration spectrum analysis in turning operation. Journal of Applied Sciences, 8(21), 3879–3886.CrossRefGoogle Scholar
 Hu, C., Youn, B. D., Wang, P., & Yoon, J. T. (2012). Ensemble of datadriven prognostic algorithms for robust prediction of remaining useful life. Reliability Engineering & System Safety, 103, 120–135.CrossRefGoogle Scholar
 Huang, G. B., & Chen, L. (2007). Convex incremental extreme learning machine. Neurocomputing, 70(16), 3056–3062.CrossRefGoogle Scholar
 Huang, G. B., & Chen, L. (2008). Enhanced random search based incremental extreme learning machine. Neurocomputing, 71(16), 3460–3468.CrossRefGoogle Scholar
 Huang, G. B., Chen, L., & Siew, C. K. (2006). Universal approximation using incremental constructive feedforward networks with random hidden nodes. IEEE Transactions on Neural Networks, 17(4), 879–892.CrossRefGoogle Scholar
 Huang, G. B., Wang, D. H., & Lan, Y. (2011). Extreme learning machines: A survey. International Journal of Machine Learning and Cybernetics, 2(2), 107–122.CrossRefGoogle Scholar
 Huang, G. B., Zhu, Q. Y., & Siew, C. K. (2004). Extreme learning machine: A new learning scheme of feedforward neural networks. In International Joint conference on neural networks. Budapest, Hungary.Google Scholar
 Huang, G. B., Zhu, Q. Y., & Siew, C. K. (2006). Extreme learning machine: Theory and applications. Neurocomputing, 70, 489–501.CrossRefGoogle Scholar
 Jaeger, H. (2001). The echo state approach to analyzing and training recurrent neural networkswith an erratum note. Bonn, Germany: German National Research Center for Information Technology GMD Technical Report, 148, 34.Google Scholar
 Jaeger, H. (2002). Tutorial on training recurrent neural networks, covering BPPT, RTRL. GMDForschungszentrum Informationstechnik: EKF and the echo state network approach.Google Scholar
 Jalab, H. A., & Ibrahim, R. W. (2011). New activation functions for complexvalued neural network. International Journal of the Physical Sciences, 6(7), 1766–1772.Google Scholar
 Javed, K. (2014). A robust & reliable data-driven prognostics approach based on extreme learning machine and fuzzy clustering. Ph.D. thesis, Université de Franche-Comté.
 Javed, K., Gouriveau, R., & Zerhouni, N. (2014). SW-ELM: A summation wavelet extreme learning machine algorithm with a priori parameter initialization. Neurocomputing, 123, 299–307.
 Javed, K., Gouriveau, R., Zerhouni, N., & Nectoux, P. (2015). Enabling health monitoring approach based on vibration data for accurate prognostics. IEEE Transactions on Industrial Electronics, 62(1), 647–656.
 Javed, K., Gouriveau, R., Zerhouni, N., Zemouri, R., & Li, X. (2012). Robust, reliable and applicable tool wear monitoring and prognostic: Approach based on an improved-extreme learning machine. In IEEE conference on prognostics and health management. Denver, CO, USA.
 Khosravi, A., Nahavandi, S., Creighton, D., & Atiya, A. (2011). Comprehensive review of neural network-based prediction intervals and new advances. IEEE Transactions on Neural Networks, 22(9), 1341–1356.
 Li, X., Lim, B. S., Zhou, J. H., Huang, S., Phua, S. J., & Shaw, K. C. (2009). Fuzzy neural network modeling for tool wear estimation in dry-milling operation. In Annual conference of the prognostics and health management society. San Diego, CA, USA.
 Liao, L. (2010). An adaptive modeling for robust prognostics on a reconfigurable platform. Ph.D. thesis, University of Cincinnati.
 Massol, O., Li, X., Gouriveau, R., Zhou, J. H., & Gan, O. P. (2010). An exTS-based neuro-fuzzy algorithm for prognostics and tool condition monitoring. In 11th international conference on control automation robotics & vision ICARCV'10. Singapore, pp. 1329–1334.
 Mathworks: Curve fitting toolbox. (2010). http://mathworks.com/help/toolbox/curvefit/smooth.html
 Nguyen, D., & Widrow, B. (1990). Improving the learning speed of 2-layer neural networks by choosing initial values of the adaptive weights. In International joint conference on neural networks IJCNN. San Diego, CA, USA.
 Oussar, Y., & Dreyfus, G. (2000). Initialization by selection for wavelet network training. Neurocomputing, 34(1–4), 131–143.
 Pal, S., Heyns, P. S., Freyer, B. H., Theron, N. J., & Pal, S. K. (2011). Tool wear monitoring and selection of optimum cutting conditions with progressive tool wear effect and input uncertainties. Journal of Intelligent Manufacturing, 22(4), 491–504.
 Peng, Y., Dong, M., & Zuo, M. J. (2010). Current status of machine prognostics in condition-based maintenance: A review. International Journal of Advanced Manufacturing Technology, 50, 297–313.
 Petković, D., Danesh, A. S., Dadkhah, M., Misaghian, N., Shamshirband, S., & Pavlović, N. D. (2016). Adaptive control algorithm of flexible robotic gripper by extreme learning machine. Robotics and Computer-Integrated Manufacturing, 37, 170–178. doi: 10.1016/j.rcim.2015.09.006.
 Rajesh, R., & Prakash, J. S. (2011). Extreme learning machines—A review and state-of-the-art. International Journal of Wisdom Based Computing, 1, 35–49.
 Rao, C. R., & Mitra, S. K. (1971). Generalized inverse of matrices and its applications. New York: John Wiley and Sons.
 Ren, L., Lv, W., & Jiang, S. (2015). Machine prognostics based on sparse representation model. Journal of Intelligent Manufacturing, pp. 1–9. doi: 10.1007/s10845-015-1107-8.
 Rizal, M., Ghani, J. A., Nuawi, M. Z., & Haron, C. H. C. (2013). Online tool wear prediction system in the turning process using an adaptive neuro-fuzzy inference system. Applied Soft Computing, 13(4), 1960–1968.
 Saikumar, S., & Shunmugam, M. (2012). Development of a feed rate adaption control system for high-speed rough and finish end-milling of hardened EN24 steel. International Journal of Advanced Manufacturing Technology, 59(9–12), 869–884.
 Shamshirband, S., Mohammadi, K., Chen, H. L., Samy, G. N., Petković, D., & Ma, C. (2015). Daily global solar radiation prediction from air temperatures using kernel extreme learning machine: A case study for Iran. Journal of Atmospheric and Solar-Terrestrial Physics, 134, 109–117. doi: 10.1016/j.jastp.2015.09.014.
 Sikorska, J. Z., Hodkiewicz, M., & Ma, L. (2011). Prognostic modelling options for remaining useful life estimation by industry. Journal of Mechanical Systems and Signal Processing, 26(5), 1803–1836.
 Singh, R., & Balasundaram, S. (2007). Application of extreme learning machine method for time series analysis. International Journal of Intelligent Technology, 2(4), 256–262.
 Wang, G., & Cui, Y. (2013). On-line tool wear monitoring based on auto associative neural network. Journal of Intelligent Manufacturing, 24(6), 1085–1094.
 Wu, Y., Hong, G., & Wong, W. (2015). Prognosis of the probability of failure in tool condition monitoring application—A time series based approach. The International Journal of Advanced Manufacturing Technology, 76(1–4), 513–521.
 Zemouri, R., Gouriveau, R., & Zerhouni, N. (2010). Improving the prediction accuracy of recurrent neural network by a PID controller. International Journal of Systems Applications, Engineering & Development, 4(2), 19–34.
 Zhai, L. Y., Er, M. J., Li, X., Gan, O. P., Phua, S. J., Huang, S., Zhou, J. H., Linn, S., & Torabi, A. J. (2010). Intelligent monitoring of surface integrity and cutter degradation in high-speed milling processes. In Annual conference of the prognostics and health management society. Portland, Oregon, USA.
 Zhao, G., Shen, Z., Miao, C., & Man, Z. (2009). On improving the conditioning of extreme learning machine: A linear case. In 7th international conference on information, communications and signal processing ICICS'09. Piscataway, NJ, USA.
 Zhou, J., Li, X., Gan, O. P., Han, S., & Ng, W. K. (2006). Genetic algorithms for feature subset selection in equipment fault diagnostics. Engineering Asset Management, 10, 1104–1113.
 Zhou, J. H., Pang, C. K., Lewis, F., & Zhong, Z. W. (2009). Intelligent diagnosis and prognosis of tool wear using dominant feature identification. IEEE Transactions on Industrial Informatics, 5(4), 454–464.
 Zhou, J. H., Pang, C. K., Zhong, Z. W., & Lewis, F. L. (2011). Tool wear monitoring using acoustic emissions by dominant-feature identification. IEEE Transactions on Instrumentation and Measurement, 60(2), 547–559.