1 Introduction

Chatter is a self-excited vibration occurring at the tool-workpiece interface in machining. It causes excessive cutting forces, reduced dimensional accuracy and surface finish, accelerated tool wear and tool damage, and lower productivity and production quality [1]. Chatter has been studied continuously by many authors since the early twentieth century [2, 3], and interest in chatter in milling has increased substantially since the mid-2000s [2, 4], as evidenced by the number of publications shown in Fig. 1. Self-excited vibration in machining has diverse sources, such as regenerative, mode coupling, frictional and forced thermal–mechanical [1, 2]. Regenerative chatter is the most common and most studied chatter type in milling. As in most of the literature, the term “chatter” in this paper refers to regenerative vibration unless another type is explicitly mentioned.

Fig. 1 Publications on chatter in milling over time [2]

Research on chatter can be categorized into three main lines: theoretical modelling and prediction; chatter detection; and chatter control and suppression [1, 2, 4]. Online detection aims to identify chatter occurrence by classifying the milling process as a stable or an unstable cut. Accurate online chatter detection can serve as input to an intelligent CNC controller that suppresses or mitigates chatter by adaptive control of the machining parameters, or that activates active vibration control devices installed on the machine tool, such as an active damper or a smart spindle [5,6,7]. In recent years, intelligent machine tools have developed continuously, driven by the rapid progress of new information and communication technologies. Novel artificial intelligence approaches have been used for machinery diagnostics and quality enhancement [8], which has accelerated research on diverse topics in machining process monitoring and tool condition monitoring (TCM), including chatter detection [2, 4, 9].

Chatter detection systems are usually composed of the following modules: data acquisition, signal processing, feature extraction, monitoring model and decision/classification [10]. These are the major elements being reviewed in this paper. The method flow and primary objectives are illustrated in Fig. 2.

Fig. 2 A common method flow in chatter detection
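The module chain of Fig. 2 can be sketched as a composition of functions, one per module. The sketch below is a minimal illustration under stated assumptions: a single RMS feature and a fixed threshold stand in for the signal processing, feature extraction and classification techniques reviewed in Sections 3 to 5, and the class name, function names and threshold value are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

import numpy as np


@dataclass
class ChatterPipeline:
    """Chains signal processing, feature extraction and decision modules (cf. Fig. 2)."""
    preprocess: Callable[[np.ndarray], np.ndarray]
    extract: Callable[[np.ndarray], Dict[str, float]]
    classify: Callable[[Dict[str, float]], str]

    def run(self, frames: List[np.ndarray]) -> List[str]:
        # Each acquired frame passes through the modules in sequence.
        return [self.classify(self.extract(self.preprocess(f))) for f in frames]


def remove_mean(x: np.ndarray) -> np.ndarray:
    # Minimal signal processing step: remove the sensor's DC offset.
    return x - x.mean()


def rms_feature(x: np.ndarray) -> Dict[str, float]:
    # A single time-domain feature; real systems extract many more.
    return {"rms": float(np.sqrt(np.mean(x ** 2)))}


RMS_THRESHOLD = 1.0  # hypothetical; would be calibrated on labelled stable/chatter cuts


def threshold_decision(features: Dict[str, float]) -> str:
    return "chatter" if features["rms"] > RMS_THRESHOLD else "stable"


pipeline = ChatterPipeline(remove_mean, rms_feature, threshold_decision)
rng = np.random.default_rng(0)
quiet = 5.0 + 0.2 * rng.standard_normal(2048)   # low vibration on top of a DC offset
loud = 5.0 + 3.0 * rng.standard_normal(2048)    # strong vibration on top of a DC offset
labels = pipeline.run([quiet, loud])
```

Keeping the modules as interchangeable callables mirrors how the reviewed studies combine different signal processing, feature and classifier choices on the same acquired data.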

Owing to the large number of studies on chatter and related areas, diverse reviews relevant to chatter detection have been published, as follows. Chatter in milling has been reviewed by Quintana et al., Zhu et al. and Yue et al. [1, 2, 4], and chatter in turning by Quintana et al., Urbikain et al. and Siddhpura et al. [1, 9, 11]. Wu et al. reviewed chatter in thin-wall machining [12], Pan et al. and Yuan et al. discussed robotic machining [13, 14], and Yan et al. focused on blade milling [15]. Other reviews cover the mathematical models of chatter and machining dynamics, by Altintas et al. [3, 16], Mamedov et al. [17] and Insperger et al. [18], and vibration measurement, by Murthy et al. [19]. Besides, Teti et al. [20] reported strategies for process monitoring, while Jardine et al. [21] reviewed the diagnostics of mechanical systems.

Sensors and signal processing have been reviewed for condition monitoring of manufacturing processes by Goyal et al., Iliyas et al. and Liang et al. [22,23,24] and for cutting tool monitoring by Nath, Bhuiyan et al. and Rehorn et al. [25,26,27], while Xu et al. and Sayyad et al. [28, 29] summarized data-driven cases, and multi-sensor applications and data fusion were analysed by Kong et al., Majumder et al. and Khaleghi et al. [30,31,32]. TCM technologies were reviewed by Sun et al. [33], and Zhou and Xue [10], Mohanraj et al. [34] and Pimenov et al. [35] summarized TCM in milling, including chatter detection. Sensor types were reviewed by Rizal et al. and Dhobale et al. [36, 37]; Lauro et al. [38] outlined signal processing techniques; Kuntoğlu et al. and Wong et al. [39, 40] discussed data acquisition and processing; Imad et al. [41] described advances in machining monitoring; and Zhu et al. and Peng et al. [42, 43] reviewed wavelet-based methods. The use of digital image processing was reviewed by Dutta et al. [44], and vision-based techniques were outlined by Liu et al. [45].

The applications of artificial intelligence (AI) and data analytics in intelligent manufacturing can be found in various reviews [46,47,48,49]. AI-based systems have been reviewed for machining monitoring by Abellan-Nebot et al. [50], for rotating machinery diagnosis by Liu et al. and Saini et al. [51, 52] and for TCM by Pimenov et al. [53]. Machine learning (ML) techniques in manufacturing have been reviewed with regard to signal processing by Kim et al. [54], for machining processes in four different reviews [37, 55,56,57] and for TCM by Tran et al. [58], while six other manuscripts [59,60,61,62,63,64] listed ML applications for machinery and machining diagnostics. Deep learning (DL) methods in machining monitoring and intelligent manufacturing are discussed by various authors in seven peer-reviewed publications [65,66,67,68,69,70,71]. All these works summarize findings in machining condition monitoring and reflect the growing interest in chatter and its impact on advanced manufacturing.

However, despite the extensive reviews published by Zhu et al. and Yue et al. [2, 4], little attention has been paid to detailing the methods used in chatter detection, including the signal processing approaches, the variety of extracted features, the alternatives for condition classification, and their advantages, challenges and implementation. This paper reviews these topics to bridge that gap. It aims to provide a critical review of chatter detection in milling and the diverse methods employed, with a focus on signal processing techniques and sensor-based methods, including feature extraction and classification models.

The starting point was a systematic literature search to identify publications that contribute to answering the question: what methods for chatter detection in milling have been reported from 2008 to date? A total of 134 peer-reviewed journal publications satisfied the systematic search criteria [5, 72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,102,103,104,105,106,107,108,109,110,111,112,113,114,115,116,117,118,119,120,121,122,123,124,125,126,127,128,129,130,131,132,133,134,135,136,137,138,139,140,141,142,143,144,145,146,147,148,149,150,151,152,153,154,155,156,157,158,159,160,161,162,163,164,165,166,167,168,169,170,171,172,173,174,175,176,177,178,179,180,181,182,183,184,185,186,187,188,189,190,191,192,193,194,195,196,197,198,199,200,201,202,203,204]. The search also identified the eleven works most cited by chatter detection publications: [72, 112, 117, 194, 205,206,207,208,209,210,211].

Chatter detection approaches, like any condition monitoring system, can be categorized as “data-driven” or “physics-based”. Data-driven systems evaluate historical data using diverse signal processing techniques, whereas physics-based systems use mathematical functions, physics models and control theory [212, 213]. Recently, Rahimi et al. [149] highlighted the advantages and disadvantages of each methodology in chatter detection: data-driven approaches may fail due to errors during training or a lack of data at different cutting conditions, while physics-based approaches may wrongly detect transient states as chatter and require complex mathematical modelling. To overcome these limitations, they proposed combining both alternatives. This review focuses on data-driven methods, which are more widely reported.

In the following sections, a critical analysis of the chatter detection literature is presented. Section 2 describes the elements employed for data acquisition and the challenges faced in experimental design. Section 3 summarizes the signal processing techniques. Section 4 outlines feature extraction and the proposed chatter indicators, Sect. 5 discusses the classification models used and their effectiveness, and Sect. 6 presents challenges and future research directions, followed by the conclusions in Sect. 7.

2 Sensing, data acquisition and experimental design

The first step of monitoring is the acquisition of physical parameters, in the form of signals, through different sensors. Data are acquired using a single sensor or multiple sensors attached to the machine tool; the signals pass through signal conditioners to a data acquisition system (DAQ) for sampling, storage and processing. In chatter detection, the common signals are acceleration, cutting force, sound and acoustic emission, while motor current is the common signal obtained from an in-machine sensor. This section introduces the sensor types, multi-sensor approaches and sensor fusion techniques used in milling process monitoring, followed by a brief overview of experimental design for data acquisition under stable and unstable milling conditions.

2.1 Types of sensors

2.1.1 Accelerometers—vibration

Researchers broadly agree that vibration is one of the most recommended signals for TCM. There is a clear physical reason for this: acceleration is the immediate response to an acting force (Newton’s second law). Moreover, mechanical vibration can be monitored across the whole machine tool-workpiece system (with varying sensitivity), and vibration information closely related to machining dynamics is relatively easy to acquire and deliver. In general terms, vibration refers to the amplitudes and frequencies of the oscillations that occur in the relative displacement of the tool-workpiece system. Accelerometers are among the most used sensors, as they can measure dynamic changes up to 20 kHz, and are employed to estimate cutting forces and vibrations in milling [214].

As chatter is a self-excited vibration, its occurrence is relatively easy to identify in the acceleration signal when the process develops from a stable to an unstable cut, as shown in Fig. 3. The vibration axis most sensitive to chatter varies with the milling type, the cutting parameters and the mechanical properties of the workpiece and the cutting tool; however, the most common approach is to monitor the vibration normal to the feed direction. Stable machining is dominated by forced vibration, in which the tooth passing frequency and its harmonics have the dominant amplitudes in the signal spectrum, whereas chatter in milling is a self-excited vibration in which the chatter frequency dominates the spectrum, as detailed later in Section 3.2.
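The spectral distinction above can be illustrated with a simplified check: compute the tooth passing frequency f_tp = nN/60 (n the spindle speed in rev/min, N the number of teeth), locate the dominant spectral peak, and test whether it lies close to a harmonic of f_tp. This is a minimal sketch on synthetic signals, assuming the chatter component dominates the spectrum; it is not one of the reviewed indicators, the 5 Hz tolerance and all signal parameters are hypothetical, and real spectra also contain spindle rotation and runout components.

```python
import numpy as np


def tooth_passing_frequency(spindle_rpm, n_teeth):
    # f_tp = n * N / 60 in Hz, with n in rev/min and N the number of teeth
    return spindle_rpm * n_teeth / 60.0


def dominant_peak_check(x, fs, f_tp, tol_hz=5.0):
    """Return the dominant spectral peak and whether it is a tooth passing harmonic."""
    window = np.hanning(len(x))                      # reduce spectral leakage
    spectrum = np.abs(np.fft.rfft((x - x.mean()) * window))
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    spectrum[0] = 0.0                                # ignore the DC bin
    f_peak = float(freqs[np.argmax(spectrum)])
    order = round(f_peak / f_tp)                     # nearest harmonic order
    is_harmonic = order >= 1 and abs(f_peak - order * f_tp) <= tol_hz
    return f_peak, bool(is_harmonic)


# Synthetic example (hypothetical values): 6000 rpm, 4 teeth -> f_tp = 400 Hz
fs = 10_000.0
t = np.arange(int(fs)) / fs                          # 1 s of data, 1 Hz resolution
rng = np.random.default_rng(0)
forced = np.sin(2 * np.pi * 400 * t) + 0.5 * np.sin(2 * np.pi * 800 * t)
stable = forced + 0.05 * rng.standard_normal(t.size)
chatter = stable + 3.0 * np.sin(2 * np.pi * 1230 * t)   # non-harmonic chatter component

f_tp = tooth_passing_frequency(6000, 4)
stable_peak = dominant_peak_check(stable, fs, f_tp)
chatter_peak = dominant_peak_check(chatter, fs, f_tp)
```

In the stable signal the dominant peak falls on the first tooth passing harmonic, whereas in the chatter signal it falls at 1230 Hz, away from any harmonic of 400 Hz. Because chatter at its onset is often weaker than the forced components, practical indicators compare energy ratios rather than only the single largest peak.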

Fig. 3 Chatter development in a vibration signal in the time domain [166]

Some studies report acquiring vibration signals with a triaxial sensor but mostly use only a single channel in their results, without clarifying which channel was chosen, the selection criterion or any changes observed in the omitted signals. By contrast, Chen et al. [178] tested different combinations of multi-channel setups and showed that fusing the signals perpendicular to the feed gives the same accuracy as using all three axes at a lower computational cost, while using only the X-axis signal (feed direction) gave the lowest accuracy. In subsequent work, however, Chen et al. [165] successfully detected chatter in the feed direction. Regarding signal type, the literature has shown that vibration signals are more sensitive to chatter than force signals in both conventional milling [72] and micro-milling [5]. In addition, some novel sensors have been used for vibration monitoring, such as micro-electro-mechanical systems (MEMS) accelerometers [215]. The use of vibration signals for chatter detection was reported in the works [5, 83, 88,89,90, 95, 96, 106, 111, 113, 118,119,120,121,122, 124, 125, 128, 132, 134, 138, 143, 147, 150,151,152,153, 158, 160, 163,164,165,166, 169, 171, 174,175,176, 178,179,180,181, 183, 188, 189, 194, 197, 198, 200,201,202,203, 216,217,218,219,220,221,222,223].

2.1.2 Dynamometer—cutting forces

Cutting force is the parameter most responsive to variations in the machining process and is readily used for monitoring; moreover, an unstable cut raises the cutting forces progressively [3, 224]. In chatter detection, Du et al. [225] showed that the distribution of cutting forces exhibits white-noise behaviour under unstable conditions, while Huang et al. used statistical values of the force signal as chatter detection criteria in titanium milling [86]. Their results showed that under chatter, the maximum cutting force increased by at least 60% and its standard deviation by about 80%. Wei et al. [114] used only the signal in the vertical direction (Z), while Tangjitsitcharoen and Pongsathornwiwat [74, 226] recorded all three axes and calculated the average variance of each axis and the normalized ratios of these variances (X/Y, X/Z and Y/Z) in 1-s frames. Under chatter, these ratios increased for two different cutter types, flat and ball nose end mills. Dynamometers are the most reported sensor for acquiring force signals; as an alternative, Tansel et al. [227] suggested a torque-based tool. Another novel alternative is the use of measurement systems integrated in the spindle or other machine tool components, either to sense the cutting forces, as introduced by Altintas et al. [228], or to measure other physical parameters from which the cutting forces are estimated. The latter approach has been reported by Denkena et al. [229, 230], who integrated strain gauges for active chatter suppression [231].
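The two force-based statistics described above can be sketched as follows: per-frame variance ratios between force axes, in the spirit of [74, 226], and the percentage increase of the peak value and standard deviation relative to a stable reference cut, the quantities for which Huang et al. [86] reported increases of at least 60% and about 80% under chatter. This is a minimal sketch under stated assumptions: the exact normalization used in [74, 226] is not reproduced, the function names and synthetic force signals are hypothetical, and any decision threshold would need calibration for the specific machine, tool and material.

```python
import numpy as np


def variance_ratios(fx, fy, fz, fs, frame_s=1.0):
    """Variance ratios X/Y, X/Z and Y/Z of three force channels in consecutive frames."""
    n = int(frame_s * fs)                          # samples per frame
    ratios = []
    for start in range(0, len(fx) - n + 1, n):     # non-overlapping frames
        frame = slice(start, start + n)
        vx, vy, vz = np.var(fx[frame]), np.var(fy[frame]), np.var(fz[frame])
        ratios.append({"X/Y": vx / vy, "X/Z": vx / vz, "Y/Z": vy / vz})
    return ratios


def relative_increase(reference, current):
    """Percentage increase of peak value and standard deviation versus a stable reference."""
    peak = 100.0 * (np.max(np.abs(current)) / np.max(np.abs(reference)) - 1.0)
    std = 100.0 * (np.std(current) / np.std(reference) - 1.0)
    return peak, std


# Synthetic forces (hypothetical): 1 s stable, then 1 s with a 180 Hz chatter component in X and Y
fs = 1000.0
t = np.arange(int(2 * fs)) / fs
rng = np.random.default_rng(1)
chatter = 3.0 * np.sin(2 * np.pi * 180 * t) * (t >= 1.0)
fx = rng.standard_normal(t.size) + chatter
fy = rng.standard_normal(t.size) + chatter
fz = rng.standard_normal(t.size)

ratios = variance_ratios(fx, fy, fz, fs)
peak_inc, std_inc = relative_increase(fx[:1000], fx[1000:])
```

Here the X/Z and Y/Z ratios rise in the second frame because the synthetic chatter component acts only in the X and Y channels, while X/Y stays close to one; which ratios respond in practice depends on the cutter and the cutting direction.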

There is still no consensus on whether cutting forces are more suitable than vibration signals; nevertheless, the literature has shown that both signal types can be used effectively for chatter detection. Li et al. applied a multiscale method to acceleration signals [134] and force signals [157], obtaining accurate results in both cases. Likewise, Chang et al. [143] showed that their approach was suitable for both signal types. In summary, force signals have been employed for chatter detection in the studies [73,74,75,76,77, 84, 86, 101, 112, 114,115,116, 123, 126, 130, 131, 133, 137, 140, 143, 144, 148, 154, 155, 157, 159, 162, 167, 170, 173, 184,185,186, 193, 204, 227, 232,233,234,235,236].

2.1.3 Acoustic emission and sound signals

Acoustic emission (AE) has been proposed as an alternative because it is sensitive to changes in chip formation and shows an asymptotic change in the transition from stable cutting to chatter [237]. AE is a transient stress wave generated by the sudden release of elastically stored energy from localized sources within a material, such as fracture or crack generation. AE sensors typically operate between 20 kHz and 1 MHz, a range inaudible to humans. Marinescu and Axinte [210] discussed the efficiency of AE and concluded that it provides better accuracy and higher resolution than cutting force but requires more data processing. The use of AE sensors is still debated in the literature: AE signals can reveal chatter in high-frequency sampled data, but they are highly susceptible to contact interactions and sensor position. Duro et al. [238] showed that AE sensors are highly sensitive to sensor location and cutting parameters. On the other hand, despite the different physics involved, Delio et al. [207] proposed sound signal acquisition through microphones as an alternative to AE. Microphones are used in acoustic testing to measure sound, noise and vibration at frequencies within the audio band (<20 kHz). This alternative was confirmed by Sallese et al. [78], while Tsai et al. [187] used the acoustic peak value as a chatter threshold, and Gao et al. [79] employed microphones in thin-wall milling. Cao et al. [80] removed the disturbance of the tooth passing frequency and its harmonics from sound signals, increasing detection accuracy. Morgan et al. [239] employed pre-recorded acoustic signals to plot and evaluate chatter stability. Although acoustic emission and sound both originate from mechanical vibration, the frequency ranges of interest differ: conventional microphones are limited to a frequency response of about 20 kHz, whereas AE sensors respond up to 1 MHz. Both alternatives are more sensitive and suitable for wider frequency ranges than conventional accelerometers [240, 241]. A detailed discussion of the different sensors employed for TCM, including their technical specifications, can be found in the reviews by Sun et al. [33], Pimenov et al. [35] and Kuntoğlu et al. [39]. The use of acoustic or sound-based signals for chatter detection was reported in [78,79,80, 87, 91, 124, 142, 146, 149, 168, 187, 197, 216, 224, 239, 242, 243].
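Removing the tooth passing frequency and its harmonics from a sound signal, as Cao et al. [80] did, can be approximated by a frequency-domain mask. The sketch below zeroes narrow bands around every harmonic of f_tp below the Nyquist frequency and transforms back to the time domain. It is an illustrative assumption, not the filtering procedure of [80]; the band half-width, the function name and the synthetic signal are hypothetical.

```python
import numpy as np


def remove_tpf_harmonics(x, fs, f_tp, half_width_hz=3.0):
    """Zero the spectrum within +/- half_width_hz of every tooth passing harmonic."""
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    mask = np.zeros(freqs.shape, dtype=bool)
    for harmonic in np.arange(f_tp, fs / 2.0, f_tp):   # f_tp, 2 f_tp, ... below Nyquist
        mask |= np.abs(freqs - harmonic) <= half_width_hz
    spectrum[mask] = 0.0
    return np.fft.irfft(spectrum, n=len(x))


# Synthetic sound signal (hypothetical): 400 Hz tooth passing + harmonic + 1230 Hz chatter tone
fs = 20_000.0
t = np.arange(int(fs)) / fs
sound = (np.sin(2 * np.pi * 400 * t) + 0.5 * np.sin(2 * np.pi * 800 * t)
         + 0.2 * np.sin(2 * np.pi * 1230 * t))
residual = remove_tpf_harmonics(sound, fs, f_tp=400.0)
```

After masking, only the 1230 Hz component remains. A brick-wall FFT mask is convenient for offline analysis of stationary frames; for streaming data, a cascade of IIR notch filters (a comb filter) is a common alternative, and spindle speed fluctuations require wider bands or tracking of the harmonics.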

2.1.4 Other sensors and simulated signals

Other physical parameters have also been used, such as displacement measurements [129, 131, 139] and images of the machined surface [161, 244], which have gained wide attention recently with the emergence of DL techniques. Other innovative sensor alternatives include thin films [245,246,247], embedded devices [82], sensor-integrated workpiece fixtures [229, 248] and wireless sensors [99, 249,250,251]. Ma et al. [246] employed a thin-film piezoelectric sensor and obtained the same detection accuracy as with a dynamometer and higher accuracy than with an accelerometer or a microphone. Similar results were reported by Luo et al. [247, 252]. The use of non-conventional sensors for chatter detection is reported in [82, 85, 120, 129, 131, 135, 139, 161, 188, 190,191,192, 215, 229, 244,245,246,247, 249, 252, 253].

Alternatively, internal sensors and signals intrinsic to the machine tool can be used. Aslan and Altintas [127] developed a model-based system that uses drive motor current for chatter detection. Such sensorless approaches may suffer from limited detection accuracy unless the current signals are compensated. Yamato et al. [254] and Sugiyama et al. [255] detected chatter in turning and milling, respectively, by phase monitoring in the electrical system. They also evaluated servo information in the spindle for chatter detection by applying a disturbance observer [172, 256, 257]. The works [84, 127, 172, 177, 199, 255,256,257,258,259] reported chatter detection using signals intrinsic to the machine tool control, which requires complex mathematical modelling and profound knowledge of machining dynamics. In contrast, Zheng et al. [260] confirmed the suitability of internal signals for chatter detection, compared with acceleration signals, using diverse features and signal processing techniques commonly employed with external signals.

Simulated signals have been widely used as a convenient input to assess signal processing algorithms for chatter detection or to validate proposed chatter features, for instance, as shown in [120, 121, 130, 141, 164, 194, 234, 235, 261,262,263]. Using model-based simulation for AI training and experimental signals for testing has also been reported, for example by Ozgur and Sener [89] and Vashisht and Peng [199]. Different modelling approaches have been used to generate simulated dynamic signals; some models represent more realistic scenarios and consider several process variables, such as complex tool or workpiece geometries and cutting conditions that vary over time.
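To illustrate how simulated signals can be generated, the sketch below integrates a single-degree-of-freedom regenerative cutting model in which the force depends on the difference between the current vibration and the vibration one revolution earlier (a single-tooth, turning-like simplification). It is a toy model under stated assumptions, not any of the cited simulation approaches: milling models add time-varying directional coefficients, multiple teeth, helix and the tool leaving the cut, and all parameter values (mass, stiffness, damping ratio, cutting coefficient Ks, time step, stop limit) are hypothetical.

```python
import numpy as np


def simulate_regenerative(b, spindle_rpm, m=1.0, k=1.0e7, zeta=0.02, Ks=1.0e9,
                          dt=1.0e-5, duration=0.3, x0=1.0e-6, x_limit=1.0e-3):
    """Single-DOF regenerative cutting model, linear (the tool never leaves the cut).

    m x'' + c x' + k x = Ks * b * (x(t - T) - x(t)),  with T = 60 / spindle_rpm.
    The nominal chip thickness only adds a static offset in this linear model and is
    omitted; the run starts from a small displacement x0 and stops if |x| > x_limit.
    """
    c = 2.0 * zeta * np.sqrt(k * m)
    delay = int(round(60.0 / spindle_rpm / dt))     # regeneration delay in samples
    n = int(round(duration / dt))
    x = np.zeros(n)
    x[0] = x0
    v = 0.0
    for i in range(n - 1):
        x_delayed = x[i - delay] if i >= delay else 0.0   # surface left by the previous pass
        force = Ks * b * (x_delayed - x[i])
        v += (force - c * v - k * x[i]) / m * dt          # semi-implicit Euler step
        x[i + 1] = x[i] + v * dt
        if abs(x[i + 1]) > x_limit:
            return x[: i + 2]
    return x


# Absolute stability limit of this model: b_min = 2 k zeta (1 + zeta) / Ks
b_min = 2.0 * 1.0e7 * 0.02 * 1.02 / 1.0e9
stable_signal = simulate_regenerative(0.5 * b_min, spindle_rpm=3000)
chatter_signal = simulate_regenerative(10.0 * b_min, spindle_rpm=3000)
```

With these values, a width of cut of half the absolute stability limit produces a decaying response at any spindle speed, whereas ten times that limit at 3000 rpm produces a growing oscillation; such pairs can serve as labelled "stable" and "chatter" examples when testing detection algorithms.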

2.1.5 Influence of sensor direction

The literature shows that for some sensor types, such as force and vibration sensors, measurements in different directions have different characteristics and sensitivities. Data from a single axis are often sufficient for chatter detection; however, assessing multi-channel signals in different directions, together with sensor fusion, has been reported to increase detection accuracy. Kuljanic et al. [72] observed chatter in vibration signals in all three directions of movement, while Navarro-Devia et al. [264] provided experimental data showing that the acceleration signal in each axis responds differently to the cutting parameters and tool condition. Moreover, chatter stability models of milling as a multipoint operation (i.e., multiple teeth with interrupted cuts) indicate that the dynamic chip thickness is caused by vibrations in both the feed and the normal-to-feed directions, which affect the forces in both directions, as detailed in [3, 265,266,267]. Consequently, assessing signals in all three axes increases detection performance, as shown by Chen et al. [178] and Tangjitsitcharoen et al. [74] for diverse signal types. Nevertheless, processing multi-channel data increases computation time.

In summary, each sensor type has advantages and disadvantages. Force signals are not recommended for high tooth passing frequencies because of the limited bandwidth of conventional dynamometers [4], and latest-generation equipment may be required. In addition, force signals have shown distortion caused by the spindle modes when the sampling frequency is far higher than the natural frequencies [228]. Acceleration signals have a wide operating frequency range, but they are sensitive to vibrational disturbances from different sources and require filtering to reduce the effect of noise. Acoustic emission and sound signals are affected by sensor position and tool wear and are susceptible to external noise. Sensor cost, reliability and service life are also major concerns when selecting sensors for a monitoring system intended for industrial use.

2.2 Multi-sensor approach and sensor fusion

Considering the limitations of each sensor type and the increasing demand for higher accuracy and robustness in chatter detection, simultaneous acquisition and processing of multiple signal types has been used. A multi-sensor approach refers to the simultaneous acquisition of signals from two or more different sensors. Most published work on chatter detection employs a single-sensor approach. However, a multi-sensor approach can improve detection accuracy and reliability through sensor fusion, in which data from different sensors are combined to reduce the detection uncertainty that may arise with a single-sensor approach.

There are three main types of information fusion strategies: data-level fusion, feature-level fusion and decision-level fusion. In chatter detection, feature-level fusion is the most reported approach. Data fusion techniques are mostly applied to multi-channel sensors, as done by Duro et al. [238], to merge data rather than select the top-ranked features. Srinivasan et al. [216] integrated vibration and acoustic signals for chatter and tool condition classification in milling and found that combining acoustic and vibration data gave higher classification accuracy than either data source alone. Ding et al. [77] recommended using cutting forces in conjunction with acoustic and acceleration sensors to increase chatter detection accuracy.
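Feature-level and decision-level fusion can be contrasted in a few lines. The sketch below, under stated assumptions, standardizes each sensor's features and concatenates them into one feature vector per frame (feature-level fusion), and combines per-sensor chatter decisions by majority vote (decision-level fusion). The function names, feature values and voting rule are hypothetical; data-level fusion, which merges raw signals before feature extraction, is not shown.

```python
import numpy as np


def feature_level_fusion(*feature_blocks):
    """Z-score each sensor's feature matrix (rows = frames) and concatenate the columns."""
    fused = []
    for block in feature_blocks:
        block = np.asarray(block, dtype=float)
        mu = block.mean(axis=0)
        sigma = block.std(axis=0)
        sigma[sigma == 0] = 1.0          # constant features would otherwise divide by zero
        fused.append((block - mu) / sigma)
    return np.hstack(fused)


def decision_level_fusion(*decisions):
    """Majority vote over per-sensor decisions (1 = chatter, 0 = stable); ties -> stable."""
    votes = np.vstack(decisions)
    return (2 * votes.sum(axis=0) > votes.shape[0]).astype(int)


# Hypothetical features for four frames: accelerometer (RMS, kurtosis) and microphone (level in dB)
accel = [[0.1, 3.0], [0.2, 3.2], [1.5, 6.0], [1.7, 6.5]]
sound = [[50.0], [52.0], [70.0], [75.0]]
fused = feature_level_fusion(accel, sound)          # shape (4, 3), input to one classifier

# Hypothetical per-sensor classifier outputs for the same frames
decision = decision_level_fusion([0, 0, 1, 1], [0, 1, 1, 1], [0, 0, 0, 1])
```

In practice, the mean and standard deviation used for scaling should be estimated on training data only and reused on new frames. With an even number of sensors, this voting rule resolves ties as stable, trading some sensitivity for fewer false alarms.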

Kuljanic et al. [72, 117] carried out a series of milling tests with several sensors to detect chatter onset. The sensors included a rotating dynamometer, accelerometers, and acoustic emission and electrical power sensors in different combinations. A set of chatter indicators was obtained in the time and frequency domains, and single-sensor and multi-sensor systems were compared in terms of both accuracy and robustness. The results showed that all the multi-sensor approaches performed better than the single-sensor alternatives. It was concluded that “multisensors systems composed of three or four sensors are the most promising solution for reliable and robust chatter identification.” Furthermore, they recommended that the optimal configuration would be “the multisensor system composed of the axial force sensor and accelerometers” [72]. The scheme of this multi-sensor approach is shown in Fig. 4.

Fig. 4 Scheme of multi-sensor approach for chatter detection by Kuljanic et al. [117]

Overall, multiple signals and sensor fusion have achieved better chatter detection accuracy than single-sensor approaches. Sun et al. [81] combined acceleration and force signals for chatter detection and forecasting. Tran et al. [145] developed an inexpensive, easily implemented multi-sensor data fusion using microphone and accelerometer sensors. Combining sound and acceleration signals, they obtained a classification accuracy of 97%, compared with 92% using only the sound signal and 90% using only the vibration signal. Li et al. [156] merged features from force and acceleration signals, obtaining 96% accuracy, and Gao et al. [182] used 7 signals from 3 different sensors and extracted a total of 36 features. In contrast, Arriaza et al. [268] used acceleration and sound signals but obtained no significant difference compared with a single sensor, which could, however, be due to the dataset size or the feature generation. This indicates that detection performance depends not only on the type and number of signals, but also on the signal processing and feature extraction.

Figure 5 illustrates the percentage of each signal type reported. Vibration is the most reported signal among single-sensor methods and cutting force the second, consistent with the TCM literature [10, 269]. This review also reveals that multi-sensor approaches to chatter detection were overlooked over the last decade but have regained attention in the last 2 years. This shows a gap in the use of multi-sensor approaches for chatter detection compared with other TCM research areas, where the approach has been continually reported over the last decade for tool wear and tool failure monitoring [10, 36].

Fig. 5 Percentage of the signal types processed in chatter detection literature

2.3 Experimental design and model-based testing condition selection

In a monitoring system, the selected sensors convert physical quantities such as force or vibration into electrical signals through signal conditioning circuitry. A data acquisition (DAQ) system samples the conditioned sensor signals into digital values and stores them in a computer for analysis and processing. Acquiring useful and reliable monitoring data from a machining process requires an adequate design of experiments, including the selection of several parameters that can affect detection accuracy. For instance, the sampling rate, together with the anti-aliasing filter, is a key setting for accurate data acquisition. It must comply with the Nyquist–Shannon sampling theorem, i.e. be more than twice the highest frequency of interest in the original signal. An excessively high sampling rate could increase noise susceptibility and cause low-frequency information to be overlooked [269], as occurred in the study by Dubey et al. [270]. Some authors have acquired data during air cutting and tool idling to identify and remove background noise, as in [78, 91, 101, 138, 239]. In addition, the cutting parameters should be carefully selected before the cutting tests to cover a wide range of cutting stability conditions while reducing experimental cost and time. The determination of milling parameters, such as spindle speed, feed rate and depth of cut, can be facilitated by chatter prediction concepts such as the stability lobe diagram (SLD). These methodologies, their challenges and the reported design considerations are briefly discussed in this section.
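A quick sampling-rate check follows from the Nyquist criterion: identify the highest frequency of interest (here, a chosen number of tooth passing harmonics or the highest structural mode near which chatter is expected, whichever is larger) and sample above twice that value. The helper below is a hypothetical sketch; the margin factor of 2.56 is a convention used by many FFT analyzers and is adopted here as an assumption to leave room for the roll-off of the anti-aliasing filter.

```python
def sampling_plan(spindle_rpm, n_teeth, n_harmonics, f_mode_max_hz, margin=2.56):
    """Highest frequency of interest, Nyquist rate and a rate with a practical margin.

    Hypothetical helper: f_max covers n_harmonics of the tooth passing frequency and
    the highest structural mode near which chatter is expected.
    """
    f_tp = spindle_rpm * n_teeth / 60.0
    f_max = max(n_harmonics * f_tp, f_mode_max_hz)
    nyquist_rate = 2.0 * f_max          # theoretical lower bound (must be exceeded)
    recommended_rate = margin * f_max   # leaves room for the anti-aliasing filter roll-off
    return f_max, nyquist_rate, recommended_rate


f_max, nyquist_rate, recommended_rate = sampling_plan(12000, 2, 5, 2500.0)
```

For example, at 12000 rpm with a two-flute cutter (f_tp = 400 Hz), five harmonics reach 2 kHz, but a structural mode at 2.5 kHz sets f_max, giving a Nyquist rate of 5 kHz and a margined rate of 6.4 kHz.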

2.3.1 Stability diagram and chatter prediction

Modelling and simulation of chatter stability can significantly facilitate the experimental design for chatter detection. Comprehensive reviews of chatter prediction and stability modelling can be found in [1,2,3,4]. Chatter stability in milling has been researched extensively, for example by Altintas and co-workers [16, 271,272,273,274,275]. In milling, chatter occurs in a band close to a natural frequency or its harmonics, and it may be modulated by the fundamental frequencies [112] owing to the system dynamics and the cutting conditions [143]. Moreover, milling processes have multiple chatter frequencies, whereas turning usually has a single chatter frequency [206]. Chatter stability limits can be estimated with the stability lobe diagram (SLD), a chart that shows which cutting parameters yield a chatter-free process. In an intelligent CNC controller, the SLD can be used to select optimal cutting parameters that avoid chatter or to adjust the cutting parameters to suppress or mitigate chatter once its onset is detected. The SLD is obtained from the dynamic response of the machine tool and predicts the stability limits in terms of the axial depth of cut (DOC) and the spindle speed [1,2,3,4]. Figure 6 presents a typical SLD and the selection of experimental parameters based on it, illustrating how researchers choose the parameters before testing to obtain data under diverse stability conditions. Li et al. [261] proposed a chatter criterion for time-domain simulation of the dynamic milling process to generate a set of stability diagrams, which can be used to determine chatter-free cutting conditions.
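As an illustration of how an SLD is computed from an FRF, the sketch below implements the classic single-degree-of-freedom regenerative stability solution. For each chatter frequency ωc above the natural frequency, the limiting width of cut is b_lim = -1/(2 Ks Re[G(ωc)]), the phase between successive waves is ε = 3π + 2ψ with ψ the phase of G, and lobe number j gives the tooth period T = (ε + 2πj)/ωc and the spindle speed n = 60/(N T). This is a simplified sketch under stated assumptions: a single mode, a scalar cutting coefficient Ks and no directional factors, radial immersion or helix effects, so it does not reproduce the milling models cited above; all numerical values are hypothetical.

```python
import numpy as np


def stability_lobes(m, k, zeta, Ks, n_teeth=1, lobes=range(5), n_points=2000):
    """Single-DOF regenerative stability lobes (simplified, not full milling dynamics).

    Returns a list of (spindle_rpm, b_lim) array pairs, one per lobe number.
    """
    wn = np.sqrt(k / m)
    c = 2.0 * zeta * np.sqrt(k * m)
    # Candidate chatter frequencies above the natural frequency, where Re[G] < 0
    wc = np.linspace(1.0001 * wn, 1.5 * wn, n_points)
    G = 1.0 / (k - m * wc ** 2 + 1j * c * wc)        # receptance FRF
    b_lim = -1.0 / (2.0 * Ks * G.real)                # limiting width (depth) of cut
    eps = 3.0 * np.pi + 2.0 * np.angle(G)             # phase between inner and outer waves
    curves = []
    for lobe in lobes:
        T = (eps + 2.0 * np.pi * lobe) / wc           # tooth passing period (s)
        curves.append((60.0 / (n_teeth * T), b_lim))  # spindle speed in rpm
    return curves


curves = stability_lobes(m=1.0, k=1.0e7, zeta=0.02, Ks=1.0e9, n_teeth=4)
b_min_numeric = min(b.min() for _, b in curves)
b_min_analytic = 2.0 * 1.0e7 * 0.02 * (1 + 0.02) / 1.0e9  # 2 k zeta (1 + zeta) / Ks
```

Plotting each (rpm, b_lim) pair gives the familiar lobes; cuts below the lower envelope are predicted stable. The numerical minimum agrees with the closed-form absolute stability limit 2kζ(1 + ζ)/Ks of this model, which is a useful sanity check when replacing the analytical G with a measured FRF.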

Fig. 6 Selection of the milling parameters based on the stability lobe diagram [159]

The dynamic response of the machine tool-workpiece system is traditionally identified through hammer testing to obtain a set of frequency response functions (FRFs). The hammer test has limitations when dealing with slender, rotating structures such as end mills [4, 106, 276, 277], and an SLD calculated from stationary FRFs may be inaccurate for rotating components, especially at high spindle speeds [206, 278]. This may lead to the omission of bistable zones, which are stable zones where a sizeable perturbation can still cause chatter [279, 280]. Stability islands, shown in Fig. 7, are isolated zones in the SLD where the chatter condition switches back and forth; they are associated with the time periodicity of the process. They depend mostly on the number of teeth, the radial depth of cut and the tool geometry [3, 265, 280,281,282,283,284,285] and may occur even with multivariable cutters [286,287,288]. However, when FRF measurements were used to calculate the stability lobes, Otto et al. [289] found no stable or unstable islands, because the probability of stability islands decreases as the number of flexible eigenmodes increases, and measured FRFs account for more structural degrees of freedom.

Fig. 7 Representation of stability islands in the SLD. a Stable island in the chatter zone, adapted from [280], and b unstable island in the rightmost stability lobe, adapted from [265]

Some researchers have explored different approaches to obtain more accurate SLDs for estimating chatter stability [290, 291] or novel prediction methods [292]. Olvera et al. [293] assessed the effects of the mass and the connecting cable of an accelerometer on the FRF measurement of a thin-wall workpiece and found that the eigenfrequencies were shifted to higher frequencies. Grossi et al. [294,295,296] identified speed-varying FRFs under operational conditions based on a spindle speed ramp-up test. Brecher et al. [297] presented a method to determine SLDs experimentally by varying the spindle speed and/or depth of cut continuously in-process, enabled by communication between the machine control and the chatter detection algorithm. Kiss et al. [298, 299] identified stability boundaries with an impact test under operational conditions. Bravo et al. [277] proposed a method to determine the stability lobes of thin-wall milling considering the flexibility of both the workpiece and the machine. In developing a 3-D stability model for machining thin walls and thin floors, Campa et al. [300, 301] also considered the flexibility of both the machine and the workpiece in three Cartesian directions. Olvera and Elías-Zúñiga introduced an enhanced multistage homotopy perturbation method (EMHPM) to solve differential equations with strong nonlinearities [302]. This method has been used to calculate the stability lobes of a multivariable milling tool with variable pitch, variable helix and variable rake angle [286, 287, 303]. To avoid mislabelling the stability condition, i.e. labelling the data as a stable or unstable cut, both inspection of the workpiece surface and signal analysis are highly recommended for chatter identification [277, 304].

2.3.2 Cutting parameters and experiment setup

A wide range of milling parameters has been reported for the cutting tests in the chatter detection literature to obtain signals under stable and unstable conditions. Most tests were conducted with a constant DOC and different spindle speeds. To acquire data on the transition from stable to chatter conditions, some experiments continuously increase the DOC by using a wedge-shaped (i.e. ramp-shaped) workpiece, as shown in Fig. 8. The ramp configuration allows data to be collected over a wide range of DOC and chatter to be identified easily [301], but it may generate uneven forces on each tooth and reduce the regenerative effect. Alternatively, a step-shaped configuration has also been reported [101, 133, 134].

Fig. 8

Wedge-shape workpiece configuration and surface finish transition [153]

Factors to consider in chatter detection tests include the cutting tool type and geometry, tool material, tool immersion, tool runout, workpiece material, chip load and milling type. Jensen et al. and Dong et al. [305, 306] studied the effect of diverse milling parameters on the SLD and their impact on chatter occurrence. Tool immersion was studied by Merdol and Altintas [307], while the effect of the workpiece material was evaluated by Ibañez et al. [308]. Tool runout was studied by Insperger et al. [309] and Ma et al. [310], whose results showed that runout modifies the chatter frequencies but enhances the stability boundaries, while Otto et al. [289] pointed out the need to consider runout and tool geometry for an accurate stability prediction. Mann et al. [311, 312] investigated the effect of the milling type, down milling and up milling, using an analytical and an experimental approach, respectively. The authors verified that each milling type produces a different dynamic response at the same cutting parameters. Albertelli et al. [164] detected chatter under a variable spindle speed regime, while Qu et al. [313, 314] analysed the effect of feed per tooth on thin-wall milling stability. Recent studies found that chatter stability limits in thin-wall milling also depend on the tool position and workpiece geometry [304, 315, 316].

Regarding the workpiece materials used in the chatter detection literature, aluminium alloys are the most widely employed, as in [73, 76, 78, 79, 81, 83, 84, 87,88,89,90,91, 93, 95, 96, 101, 106, 111,112,113,114,115,116,117,118,119,120,121,122,123,124,125,126,127,128,129,130,131,132,133,134,135,136,137,138,139,140,141,142,143,144,145,146,147,148,149,150,151,152,153,154,155,156,157,158,159,160,161,162, 172, 183, 194, 198, 200,201,202, 204, 222, 234, 235, 290]. Interest in Ti6Al4V milling has also risen, and this alloy was used in [77, 86, 117, 160, 169,170,171, 173,174,175,176,177,178,179,180,181,182, 184,185,186, 197, 243, 317], while other studies employed various steels [5, 72, 74, 117, 135, 149, 163,164,165,166,167,168, 203, 221, 223, 236, 260]. A few scholars used simulated signals, as mentioned in the previous section, used non-conventional materials such as composites, or did not indicate the machined material [141, 187,188,189,190,191,192,193, 195, 196, 199]. Chatter detection across different materials has been assessed in only three works [117, 135, 160]. Li et al. and Ren et al. [101, 138] claimed to have used different materials, while others, such as Hauptfleischová et al. [223], employed different tools, but no further details on their effect on chatter occurrence and detection were provided. It is also noted that the effect of tool coating on chatter has not been widely studied, even though coatings modify the tribological behaviour in machining, affecting wear, friction, cutting forces and temperature [318,319,320,321].

3 Signal processing

Signal processing is performed to extract useful information from the signals acquired by the sensing devices. In chatter detection, the signals are also processed to filter and reduce noise, identify the dominant frequencies, isolate data from specific frequency bands and ease the extraction of chatter features. The techniques employed to identify chatter onset work in the time domain, the frequency domain or the time–frequency domain, or follow other approaches; they are discussed in this section.

3.1 Time domain analysis

Time domain analysis assesses a signal over time, since chatter causes transients and amplitude modulations, as shown in Fig. 3. Statistical parameters have been used consistently to keep the computational cost low. While most studies extract statistical or other conventional features that represent the signal characteristics in the time domain, as detailed in Section 4.1, some have proposed dimensionless chatter indicators based on statistical parameters, for instance, [5, 74, 182]. Others have treated the signals as time series and applied different mathematical approaches for data analysis, as presented in [144, 147, 170]. All these methodologies aim to identify the chatter onset as early as possible by extracting diverse chatter features, including nonlinear parameters that evaluate the degree of randomness, complexity or similarity of the signals. Several novel time domain features have been reported in recent years, based on entropy concepts, multiscale techniques, topological analysis and fractal patterns, but statistical features are still widely used. Details about the signal features used as chatter indicators are discussed in Section 4.
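The conventional time domain statistics mentioned above can be sketched in a few lines of NumPy. This is an illustrative sketch only, assuming a 1-D signal held in an array and fixed-length sliding windows; the function names `time_domain_features` and `windowed_features` are hypothetical, and the feature set is a generic one rather than that of any specific cited work.

```python
import numpy as np

def time_domain_features(x):
    """Common time domain chatter indicators for one signal window."""
    x = np.asarray(x, dtype=float)
    centred = x - x.mean()
    sd = x.std()                                  # standard deviation
    rms = np.sqrt(np.mean(x ** 2))                # root mean square
    return {
        "rms": rms,
        "sd": sd,
        "kurtosis": np.mean(centred ** 4) / sd ** 4,   # 3 for Gaussian noise
        "skewness": np.mean(centred ** 3) / sd ** 3,
        "p2p": x.max() - x.min(),                      # peak-to-peak value
        "crest": np.max(np.abs(x)) / rms,              # dimensionless
    }

def windowed_features(x, window, step):
    """Slide a fixed-length window over x and compute features per segment."""
    return [time_domain_features(x[i:i + window])
            for i in range(0, len(x) - window + 1, step)]
```

The window length and step trade detection delay against estimate variance, which is the segmentation issue raised in the next paragraph.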

Schmitz et al. [322] evaluated a chatter detection technique based on the statistical variance of the once-per-revolution sampled (OPRS) audio signal during milling, which exploits the synchronous nature of stable cuts and the asynchronous nature of unstable cuts to identify chatter. Zhao et al. [141, 323] used multiple sampling per revolution (MSPR) to represent the stability characteristics more reliably and to achieve real-time detection of early chatter in high-speed milling. Navarro-Devia et al. [324] presented a study assessing key time domain statistical features from multiple sensors for monitoring titanium milling, in which signal segmentation methods were considered, since the window size affects the computational efficiency and accuracy of TCM [141, 324].
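The once-per-revolution idea can be illustrated with a minimal sketch, assuming a constant spindle speed and nearest-sample picking (no interpolation). In a stable cut the response is synchronous with rotation, so the OPRS values barely change; a chatter component at a non-synchronous frequency makes them scatter. The function name `oprs_variance` is hypothetical, and this is not the exact indicator of [322].

```python
import numpy as np

def oprs_variance(signal, fs, spindle_rpm, start=0):
    """Variance of the signal sampled once per spindle revolution.

    Assumes constant spindle speed; picks the nearest sample to each
    revolution instant rather than interpolating.
    """
    samples_per_rev = fs * 60.0 / spindle_rpm
    idx = np.round(np.arange(float(start), len(signal) - 1, samples_per_rev)).astype(int)
    once_per_rev = np.asarray(signal, dtype=float)[idx]
    return np.var(once_per_rev)
```

For example, at 6000 rpm and a 10 kHz sampling rate, a signal made only of tooth-passing harmonics gives a variance close to zero, whereas adding a 537 Hz component raises it markedly.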

The Poincaré section method for chatter detection in the time domain is based on two-dimensional vibration, i.e. vibration in a plane. For a stable milling process, the trajectory points sampled at the same angular position lie close to one another, while for an unstable process the trajectory becomes chaotic and the intersection points are scattered [217]. The Poincaré section approach employing OPRS has been applied to different signal types, such as vibration acceleration [217, 220, 265], displacement [312] and force signals [325, 326], and several indicators can be extracted from it. Once-per-tooth period sampling has been used to identify and characterize stable milling, secondary Hopf (quasi-periodic chatter) and period-n instabilities [265, 327, 328], as shown in Fig. 9, while subharmonic sampling allows each instability type and the stability islands to be identified [265, 329]. Alternatively, to observe the state transitions under diverse stability conditions, scholars have used the bifurcation diagram, which compiles multiple Poincaré maps over a range of axial depths at a single spindle speed [265, 330].
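To make the once-per-tooth classification concrete, the sketch below samples a signal once per tooth period and searches for the smallest n such that the samples repeat every n tooth periods: n = 1 corresponds to a stable cut, n = 2 to period doubling, and no match to a quasi-periodic (secondary Hopf) response. It assumes a constant speed and a noise-free signal; the function names and the relative tolerance `rel_tol` are hypothetical choices, and real measurements would need a noise-aware criterion such as cluster dispersion.

```python
import numpy as np

def once_per_tooth(signal, fs, spindle_rpm, n_teeth):
    """Sample the signal once per tooth passing period (nearest sample)."""
    step = fs * 60.0 / (spindle_rpm * n_teeth)    # samples per tooth period
    idx = np.round(np.arange(0.0, len(signal) - 1, step)).astype(int)
    return np.asarray(signal, dtype=float)[idx]

def detect_period(samples, max_period=4, rel_tol=1e-3):
    """Smallest n for which samples[k + n] matches samples[k], else None."""
    s = np.asarray(samples, dtype=float)
    scale = np.max(np.abs(s)) + 1e-12
    for n in range(1, max_period + 1):
        if np.max(np.abs(s[n:] - s[:-n])) <= rel_tol * scale:
            return n
    return None   # no short period: e.g. quasi-periodic chatter
```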

Fig. 9

Time domain representation of acceleration signals under diverse cutting conditions. Adapted from Fig. 7 of [331]: sampled acceleration signals (d, e, f) and Poincaré diagrams (g, h, i) for unstable period-doubling chatter (d, g), stable cut (e, h) and quasi-periodic unstable cut (f, i)

All these methods analyse the behaviour of the signal over time to identify changes that indicate the chatter onset. Time domain analysis can require less data and processing time than FFT-based methods while showing acceptable accuracy, but frequency-based methods have been more widely used, particularly when the chatter frequencies and the dynamic characteristics of the system are known or estimated beforehand, as described in the next section.

3.2 Frequency domain analysis

Signal information expressed in the frequency spectrum is effective for identifying frequency variations due to chatter. FFT and power spectral density (PSD) are the most widely employed methods for signal pre-processing, data labelling and determination of dominant frequencies. This is because chatter is characterized by changes in the frequency components and the energy distribution, which increase the uncertainty of the signal [113, 206]. The frequency spectrum is also commonly used to compare machining conditions and to determine whether the acquired data correspond to a stable or a chatter cut, as shown in Fig. 10. Frequency analysis also serves as a reference method to show the viability and advantages of a proposed technique, as in [77, 147, 191, 233]. Recent research efforts aim to extend frequency analysis with new chatter indicators. Jo et al. [253] proposed the sum of frequency components in a high-frequency band, from which statistical features were later extracted to identify chatter. Chang et al. [143] searched the vibration frequencies under diverse cutting conditions and identified chatter occurrence without needing a threshold. Jeong et al. [236] employed a liftered spectrum, while Shimana et al. [191] suggested the pseudo-auto-correlation function as a faster alternative.
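A basic FFT screening step can be sketched as follows: compute a Hann-windowed amplitude spectrum, mask out the tooth passing frequency and its harmonics, and report the largest remaining peak as a chatter frequency candidate. This is a minimal sketch under stated assumptions (known tooth passing frequency, a fixed guard band `guard_hz`, spindle runout harmonics ignored); the function name is hypothetical.

```python
import numpy as np

def dominant_chatter_frequency(signal, fs, tooth_freq, n_harmonics=20, guard_hz=5.0):
    """Largest spectral peak away from the tooth passing harmonics.

    Returns (frequency in Hz, approximate single-sided amplitude).
    """
    x = np.asarray(signal, dtype=float)
    x = x - x.mean()
    window = np.hanning(len(x))
    spectrum = np.abs(np.fft.rfft(x * window)) * 2.0 / window.sum()
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    mask = np.ones_like(freqs, dtype=bool)
    for k in range(1, n_harmonics + 1):
        mask &= np.abs(freqs - k * tooth_freq) > guard_hz   # exclude k * f_tooth
    candidates = np.where(mask)[0]
    i = candidates[np.argmax(spectrum[candidates])]
    return freqs[i], spectrum[i]
```

Comparing the returned amplitude against a threshold would turn this into a simple chatter flag, but the threshold depends on the cutting conditions, as discussed in Section 4.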

Fig. 10

Analysis of cutting forces in the time domain and the frequency domain to label signals as a stable, b slight chatter and c severe chatter, when employing a two-flute end mill. Adapted from: [112]

Frequency domain methods are also used to obtain decomposition parameters, estimate frequency bands, detect chatter occurrence or validate a new chatter model, as seen in [73, 88, 119, 162, 218, 332, 333]. For instance, Afazov and Scrimieri [334] proposed a Digital Twin model for chatter in milling and used the FFT of measured signals for model validation. These techniques allow chatter occurrence to be identified easily; however, one limitation of the FFT is that it cannot track frequency fluctuations over time. Hence, time–frequency methods are required to assess non-stationary and time-varying signals, and such methods have been widely employed over the last decade, as detailed in the next section.

3.3 Time–frequency domain analysis

Diverse time–frequency analysis methods have been reported for analysing the frequency components of nonlinear and time-varying signals in mechanical diagnosis. In most chatter applications, the first step is to decompose a signal into a set of new sub-signals, usually as a function of the dominant frequencies or specific frequency bands, or simply to reduce the effect of noise. The methods differ in the criteria and mathematical operations used to decompose and filter the original signal.

The short-time Fourier transform (STFT) computes the Fourier transform of successive fixed-length segments of the signal, each weighted by a window function. However, the window parameters constrain the resolution of the resulting time–frequency spectrograms: a higher frequency resolution reduces the time resolution and vice versa. For chatter detection, it is recommended to select the window parameters so that each band contains a dominant frequency of the system but is narrow enough for each dominant frequency to fall within a unique spectrogram [179].
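The resolution trade-off can be seen directly in a hand-written STFT: the bin spacing is fs / nperseg and the frame spacing is hop / fs, so a longer window narrows the frequency bins but blurs the instant at which chatter appears. This is a minimal NumPy sketch with a Hann window; `stft_magnitude` is a hypothetical name, and library routines (e.g. in SciPy) add padding and scaling options not reproduced here.

```python
import numpy as np

def stft_magnitude(x, fs, nperseg, hop):
    """Magnitude STFT with a Hann window.

    Frequency resolution is fs / nperseg (Hz); frame spacing is hop / fs (s).
    Returns (freqs, frame_centre_times, magnitude[n_freqs, n_frames]).
    """
    x = np.asarray(x, dtype=float)
    window = np.hanning(nperseg)
    starts = np.arange(0, len(x) - nperseg + 1, hop)
    frames = np.array([x[s:s + nperseg] * window for s in starts])
    mag = np.abs(np.fft.rfft(frames, axis=1))
    freqs = np.fft.rfftfreq(nperseg, d=1.0 / fs)
    times = (starts + nperseg / 2) / fs
    return freqs, times, mag.T
```

With fs = 10 kHz and nperseg = 1000 the bins are 10 Hz apart, so a 537 Hz chatter component is reported in the 540 Hz bin; doubling nperseg halves the bin spacing but also halves the number of frames per second.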

In contrast, the wavelet transform (WT) decomposes a signal into a set of scaled and shifted wave-like functions (wavelets), from which the original data can be reconstructed using the weighting coefficient of each function (i.e. the wavelet coefficients). Yoon and Chin [233] showed that WT is as reliable as FFT and can also act as a noise filter. A continuous WT (CWT) provided better resolution and noise filtering than an STFT [79, 325]. Tangjitsitcharoen et al. [75] found that chatter frequencies can appear in different bands when using WT decomposition. Wang and Liang [163] used the discrete WT (DWT) for denoising and the probability distribution of wavelet modulus maxima as a non-dimensional chatter index. Cabrera et al. [115] reconstructed a denoised signal through wavelet denoising and defined the resulting peaks as chatter thresholds.

WT-based methods provide low frequency resolution for high-frequency components and low time resolution for low-frequency bands, which may hinder the analysis when relevant information lies in those bands. The wavelet packet transform (WPT) and wavelet packet decomposition (WPD) are more flexible, as they offer the same frequency resolution over the full frequency range [194]. WPD performance increases with the number of generated frequency bands, but so does the computational cost [176]. Yuan et al. [203] reported that the wavelet coherence function performs better than the wavelet cross-spectrum. Nevertheless, these methods require some decomposition parameters to be selected in advance, and these values modify the obtained sub-signals.

Self-adaptive approaches and nonlinear signal processing methods have been developed as alternatives to WT-based ones. Among these, empirical mode decomposition (EMD) is widely applied to fault diagnosis [335]. EMD decomposes a signal into a set of intrinsic mode functions (IMFs) representing simple oscillatory modes. Unlike simple harmonic functions, the IMFs have amplitudes and frequencies that vary over time. The Hilbert-Huang transform (HHT) combines EMD with Hilbert spectral analysis and is used in chatter identification to compute the instantaneous amplitude and frequency. Rusinek et al. [170] concluded that chatter states could be detected by determining the HHT modal functions and assessing the statistical features of the IMFs. Susanto et al. [125] demonstrated that HHT can be employed for chatter detection and tool condition monitoring simultaneously, while Cao et al. [194] applied WPT and HHT sequentially in end milling, and Rusinek and Lajmert [193] used an improved HHT in the milling of a composite material. Liu et al. [116] instead combined several methods: they obtained IMFs using EMD, used WPD for signal reconstruction and extracted chatter features from the HHT spectrum.
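The Hilbert spectral step of the HHT can be sketched as below, assuming an IMF has already been obtained (the EMD sifting itself is not implemented here). The analytic signal is built with the usual FFT construction (zeroing negative frequencies and doubling positive ones); its magnitude gives the instantaneous amplitude and the derivative of its unwrapped phase gives the instantaneous frequency. Function names are hypothetical.

```python
import numpy as np

def analytic_signal(x):
    """Analytic signal via the FFT: keep DC/Nyquist, double positive bins."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    h = np.zeros(n)
    if n % 2 == 0:
        h[0] = h[n // 2] = 1.0
        h[1:n // 2] = 2.0
    else:
        h[0] = 1.0
        h[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(np.fft.fft(x) * h)

def instantaneous_amp_freq(imf, fs):
    """Instantaneous amplitude (length n) and frequency in Hz (length n - 1)."""
    z = analytic_signal(imf)
    amplitude = np.abs(z)
    phase = np.unwrap(np.angle(z))
    frequency = np.diff(phase) * fs / (2.0 * np.pi)
    return amplitude, frequency
```

For an amplitude-modulated 537 Hz component the sketch recovers the envelope and a constant 537 Hz; on real IMFs, end effects and mode mixing make both estimates noisier.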

Ensemble empirical mode decomposition (EEMD) alleviates the mode mixing of EMD in noisy signals by adding white noise and averaging the results over an ensemble of trials, which suppresses the noise level and enhances the narrow-band quality of the IMFs; this improves chatter detection, as presented in [111, 113], while Wan et al. [122] used EEMD within the HHT. Local mean decomposition (LMD) decomposes the signal into a set of product functions and has also been employed in chatter detection [81, 87, 151, 336], performing better than EMD. However, LMD is not widely used because, according to Yang et al. [133], it fails to converge when the step size is poorly selected; moreover, like EEMD, it is an iterative process and cannot extract the fault feature [337]. The sampling frequency also affects mode mixing in noise-assisted EMD-based algorithms [338].

Variational mode decomposition (VMD) is an alternative that overcomes these issues. It performs an adaptive decomposition of the signal into IMFs and, compared with EMD-based decomposition methods, has a sound theoretical basis and lower noise sensitivity [176, 337]. For chatter detection, Zhang et al. [112] decomposed force signals using VMD and obtained a higher response to chatter than with WPD, while Li et al. [138] added an adaptive filter to remove the spindle harmonics. A constraint of VMD is that the decomposition parameters K and α (the number of modes and the balancing parameter) must be defined in advance. Liu et al. [130] identified the dominant frequencies with FFT and proposed an automatic parameter selection approach based on the kurtosis of the reconstructed signal, so that the reconstructed signal represents the condition of interest. Moreover, they compared their approach with that of Zhang et al. [112] and showed that the detection sensitivity increased when the chatter indicator was evaluated in IMFs of non-dominant frequencies. Yang et al. [133] selected the VMD decomposition parameters automatically using the simulated annealing (SA) algorithm, while Liu et al. [181] used particle swarm optimization (PSO) and the maximum crest factor of the envelope spectrum (CE). Both works show that the choice of decomposition parameters can affect the value of the chatter index. VMD has been shown to perform better in chatter detection than EMD and EEMD, and Seyrek et al. [339] recently presented a detailed comparison of these three techniques. Hence, VMD is currently one of the most widely accepted signal processing methods for this application. Figure 11 illustrates the decomposition of a signal using VMD and the identification of the fundamental frequency (fz), its harmonics and the chatter frequencies across the IMFs.

Fig. 11

IMFs obtained through optimized VMD and their FFT spectra, with the identification of the fundamental and chatter frequencies [181]

This review shows that, when the monitoring signals are decomposed into different frequency bands, either as IMFs or as wavelet packets, the number of decomposed signals in the literature ranges from 5 to 16. Although each study uses different chatter features and classification models, most employed only up to three decomposed components, namely those with high energy rates or containing the fundamental and chatter frequencies, for signal reconstruction and feature extraction. Various alternatives have been studied to enhance the identification of the chatter frequency bands. Energy operators, such as the squared energy and Teager-Kaiser operators, have been used in chatter detection, together with various decomposition methods, to estimate the instantaneous frequency and amplitude [179, 189]. Lee et al. [168] used the Teager-Huang transform (THT), which combines EMD with the Teager energy operator. Moreover, novel methods have been considered as alternatives to VMD. The synchrosqueezing transform (SST), an STFT-based method, has been employed to reduce the effect of the tooth passing frequency and its harmonics. It has been successfully applied to sound [80], displacement [131] and vibration signals [201], while Yan and Sun [93] proposed a multi-SST for early chatter detection. Variational nonlinear chirp mode decomposition (VNCMD) and short-time difference spectrum analysis (STDSA) have been applied in turning, while Tu et al. [148] used iterative nonlinear chirp mode decomposition (INCMD), which combines VNCMD and HHT, in milling. Dong et al. [155] proposed a Vold-Kalman filter and showed that it performs better against noise than EMD and VMD. Singular spectrum analysis (SSA) has been considered for chatter detection using force [126] and acceleration signals [166], showing accurate and rapid detection of the transition state. SSA is a non-parametric method that treats the signal as a time series, embeds it as a sequence of lagged segments and decomposes it into simpler components.
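The Teager-Kaiser energy operator mentioned above has a simple discrete form, psi[x(n)] = x(n)^2 - x(n-1) x(n+1), which equals A^2 sin^2(Omega) for a sinusoid of amplitude A and digital frequency Omega. The sketch below also applies the DESA-2 energy separation formulas (Maragos, Kaiser and Quatieri) to recover amplitude and frequency; it assumes a narrow-band input such as a single IMF, and the function names are hypothetical rather than taken from [179, 189].

```python
import numpy as np

def teager_kaiser(x):
    """Discrete Teager-Kaiser energy operator, centred on samples 1 .. N-2."""
    x = np.asarray(x, dtype=float)
    return x[1:-1] ** 2 - x[:-2] * x[2:]

def desa2(x, fs):
    """DESA-2 instantaneous amplitude and frequency (Hz) of a narrow-band signal."""
    x = np.asarray(x, dtype=float)
    psi_x = teager_kaiser(x)[1:-1]        # centred on samples 2 .. N-3
    y = x[2:] - x[:-2]                    # symmetric difference, centred on 1 .. N-2
    psi_y = teager_kaiser(y)              # centred on samples 2 .. N-3
    ratio = np.clip(1.0 - psi_y / (2.0 * psi_x), -1.0, 1.0)
    omega = 0.5 * np.arccos(ratio)        # rad/sample
    amplitude = 2.0 * psi_x / np.sqrt(psi_y)
    return amplitude, omega * fs / (2.0 * np.pi)
```

Only three samples enter each operator value, which is why energy operators are attractive when a low computational cost is required.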

Albertelli et al. [164] pointed out that it is highly challenging to apply time–frequency methods to a process with variable spindle speed. Instead, they employed cyclostationary analysis in the tool angular domain for machining at constant and variable speeds. Cyclostationarity analysis and angular synchronous averaging (ASA) have been employed with vibration [134] and cutting force signals [157]. Lamraoui et al. [259] used signals from the spindle encoder to evaluate the instantaneous angular speed (IAS), showing that chatter increases the second-order components (i.e. the energy properties). Cyclostationary indicators have been proposed in the angular domain, based on the periodic and residual parts of the angular speed and cutting force signals, and the IAS-based indicator is recommended as it does not require additional sensors [84]. The application of adaptive variational mode decomposition to chatter detection has been reported recently in [142] and [198]. Mishra and Singh [87, 340,341,342,343] investigated a spline-based local mean decomposition technique, while Zhang et al. [137] used a morphological empirical wavelet transform (EWT). Ren and Ding employed an adaptive Hankel low-rank decomposition to separate the chatter-related components from the observations [101]. However, the works cited in this paragraph agree that there are still various challenges to adapting these methods to industrial applications. Some of these challenges are discussed in Section 6.
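The split into periodic and residual parts can be sketched with a synchronous average, assuming the signal has already been resampled to the angular domain so that each revolution spans an integer number of samples (the encoder-based resampling is not shown). The periodic part is the mean revolution; the residual is what remains, and its energy share is one simple, hypothetical indicator in the spirit of the cyclostationary approaches above, not the exact indicator of [84] or [164].

```python
import numpy as np

def synchronous_average_residual(signal, samples_per_rev):
    """Revolution-synchronous average and the residual (non-synchronous) part.

    Assumes the signal is already sampled uniformly in angle, with an
    integer number of samples per revolution.
    """
    x = np.asarray(signal, dtype=float)
    n_rev = len(x) // samples_per_rev
    revs = x[:n_rev * samples_per_rev].reshape(n_rev, samples_per_rev)
    periodic = revs.mean(axis=0)              # average over revolutions
    residual = (revs - periodic).ravel()
    return periodic, residual

def residual_energy_ratio(signal, samples_per_rev):
    """Share of signal energy not explained by the synchronous average."""
    _, residual = synchronous_average_residual(signal, samples_per_rev)
    x = np.asarray(signal, dtype=float)[:len(residual)]
    return np.sum(residual ** 2) / np.sum(x ** 2)
```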

3.4 Additional analysis approaches

An additional approach is to analyse the information in a bidimensional form, i.e. the data are transformed and processed as an image instead of as a signal. Image processing techniques, especially deep learning, have become common in chatter detection in recent years. In three investigations [91, 165, 179], different signal types were processed in the time–frequency domain and the resulting spectrograms were evaluated as images, which performed better than conventional methods because it allows the non-stationary properties of the signals to be described as scalar features. For instance, in the works by Chen et al. [165, 179], the spectrogram was separated into a set of grayscale 2D images according to the dominant frequency bands, as shown in Fig. 12. Next, candidate features were extracted using second-order image statistics (e.g. contrast, correlation and homogeneity). The authors compared their technique against the time domain features used by Lamraoui et al. [106] and the wavelet-based chatter indicators of Yao et al. [209]. Tran et al. [159] used a convolutional neural network (CNN) to process scalograms. Meanwhile, Koohestani et al. [220] applied image processing to Poincaré sections. Bidimensional analysis allows properties to be quantified and the information to be assembled into a monitoring algorithm more flexibly.
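Second-order image statistics such as contrast, homogeneity and correlation are usually computed from a grey-level co-occurrence matrix (GLCM). The sketch below builds a symmetric, normalised GLCM for one non-negative pixel offset from an image array (for instance a spectrogram band treated as a grayscale image) and derives the three statistics with the common Haralick-style definitions. The quantisation scheme, the offset and the function name are assumptions for illustration, not the settings used by Chen et al.

```python
import numpy as np

def glcm_features(image, levels=8, offset=(0, 1)):
    """Contrast, homogeneity and correlation of a grey-level co-occurrence matrix.

    Quantises the image to `levels` grey levels and uses one non-negative
    (row, column) offset; the GLCM is symmetrised and normalised.
    Correlation is undefined (nan) for a constant image.
    """
    img = np.asarray(image, dtype=float)
    lo, hi = img.min(), img.max()
    if hi == lo:
        q = np.zeros(img.shape, dtype=int)
    else:
        q = np.minimum((img - lo) / (hi - lo) * levels, levels - 1).astype(int)
    dr, dc = offset
    rows, cols = q.shape
    ref = q[:rows - dr, :cols - dc].ravel()     # reference pixels
    nbr = q[dr:, dc:].ravel()                   # neighbours at the offset
    glcm = np.zeros((levels, levels))
    np.add.at(glcm, (ref, nbr), 1.0)
    glcm = glcm + glcm.T
    p = glcm / glcm.sum()
    i, j = np.indices(p.shape)
    contrast = np.sum(p * (i - j) ** 2)
    homogeneity = np.sum(p / (1.0 + (i - j) ** 2))
    mu_i, mu_j = np.sum(i * p), np.sum(j * p)
    sd_i = np.sqrt(np.sum(p * (i - mu_i) ** 2))
    sd_j = np.sqrt(np.sum(p * (j - mu_j) ** 2))
    correlation = np.sum(p * (i - mu_i) * (j - mu_j)) / (sd_i * sd_j)
    return contrast, homogeneity, correlation
```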

Fig. 12

Processing of STFT spectrograms into a group of grayscale images [179]

There are some risks with this approach. Additional data processing could increase the time delay, accumulate error or lose information. This can be observed by comparing the results of [165] and [179], where the number of dominant frequencies changed from one work to the other, so the number of images and the selected features differed. Thus, transforming data into images could increase the processing required before the signal information can be used as input to a monitoring model.

Another type of image processing analyses images of the machined surface of the workpiece. Khalili and Danesh [344] employed the WT to decompose surface images, and Zhu et al. [192] applied a CNN, along with optimization techniques, to images of the machined parts. A similar method was applied to evaluate chatter marks in turning [345]. However, processing surface images could be difficult for real-time detection due to the harsh machining environment, including narrow spaces, tool and workpiece movements, chip interference and the effect of cutting fluids on image quality and image transmission speed.

On the other hand, transforming sensor data into images has facilitated the use of DL techniques in TCM, as these methods have been extensively studied for image processing tasks. Image processing has been applied to T-F representations in [136, 152, 159, 196] and to pictures of the machined surface using vision-based techniques in [135, 161, 188, 190, 192]. Analysing cutting information as an image instead of as a signal has also been successful in other machining tasks. Sun et al. [346] combined texture images and features from vibration signal IMFs to monitor the surface condition and revealed chatter occurrence. Furthermore, image decomposition methods have been reported in diverse fault diagnosis applications, including bidimensional EMD (BEMD)-based approaches and texture analysis. These methods could also be considered for chatter detection.

In summary, extensive signal processing approaches have been developed and applied to chatter detection. The signal processing methods employed in data-driven chatter detection are compiled in Table 1. For publications that compared two or more techniques, the table presents only the method with the best performance according to the respective authors. VMD is a relatively new method for this application, and current research focuses on optimization techniques to select the best decomposition parameters. Singularity analysis, SST, EWT, synchronous averaging techniques, cyclostationary analysis and angular sampling are emerging approaches. Nevertheless, conventional T-F methods are still being studied in terms of the extraction of new candidate features, the automatic selection of chatter indicators, the use of hybrid methods (i.e. combinations of two or more approaches) and the feasibility of novel techniques such as bidimensional analysis, along with the continued study of model-based systems.

Table 1 Signal processing methods employed in chatter detection in milling

Regardless of the signal processing technique, most of the reported methods need to identify signal characteristics that represent or indicate the chatter onset. To achieve this, a wide range of features has been extracted and evaluated to assess the stability condition. The following section discusses the diverse features reported in the chatter detection literature.

4 Feature extraction

It is essential to extract signal characteristics that represent the condition of the machining process while reducing the dimension of the raw data and keeping the relevant information about the process condition. Over the last decade, diverse features have been proposed and evaluated as indicators of chatter occurrence. In data-driven machine health monitoring systems, including chatter detection, features are designed from expert knowledge, and hand-crafting them may require extensive human labour [65].

4.1 Feature generation

A variety of methods have been used to construct signal indicators, known as features, that reflect the condition of the machine. Features should be highly sensitive to the occurrence of the condition of interest and computable by a monitoring algorithm. For instance, Fig. 13 shows the response of three common chatter features (fractal dimension, standard deviation and power spectral entropy) for a vibration signal and illustrates how these features vary as the process shifts from stable to unstable cutting. The features extracted in the chatter literature are highly diverse and are chosen according to the signal type, the number of tests and expert knowledge. They can be preliminarily categorized into the following types.

  • Statistical features: They are mostly extracted from the time domain and frequency domain and represent the data as a statistical parameter of a probability distribution.

  • Energy features: As chatter occurrence is characterized by the change of frequency and energy distribution, many authors have extracted features that represent it.

  • Complexity features: Recently, there has been a trend towards features that describe the irregularity or complexity of the signals.

Fig. 13

Response of three common chatter features, fractal dimension (c2), standard deviation (c3) and PSE (c4), for an acceleration signal (c1) from stable to chatter state. Adapted from [121]

Some signal features, such as kurtosis, spectral kurtosis, power spectral entropy (PSE) and nonlinear energy operators, have also been used as decomposition criteria, feature selection criteria, optimization criteria or performance indicators. In the frequency domain, it is common to evaluate the amplitude and frequency components, along with other frequency features such as PSE.

The most common chatter indicators were found to be conventional features such as root mean square (RMS), standard deviation (SD), PSE, kurtosis and peak-to-peak value. They are still used as chatter features for decomposed and filtered signals. However, a sizeable share of the literature has introduced and validated new features, along with new criteria to choose the candidate feature and the threshold value. The selection criterion usually emerges from multiple experiments under several cutting parameters in both stable and unstable conditions. Hence, many authors aimed to eliminate the influence of the cutting conditions by defining a normalized or dimensionless threshold, while some recent studies in milling dynamics [341, 349,350,351] have suggested a different approach: using a quantitative value to represent chatter stability instead of a qualitative designation (e.g. stable, transition, chatter). In other words, these four studies proposed that chatter occurrence should be represented as a numeric variable rather than as a categorical state. They then determined chatter occurrence whenever the calculated value exceeded a defined threshold. This methodology is comparable to the one used in the literature for flank wear (VB) in tool life assessment: the flank wear is measured, and a uniform VB ≥ 0.3 mm is usually used as the criterion to determine that the tool is worn and has reached the end of its useful life, as detailed in the international standard ISO 8688-2, item 7.4 [352]. The study of chatter from a quantitative perspective is detailed in Section 6.

Some features are used as references to validate newly proposed alternatives and have therefore been replicated continually in the literature. Cao et al. [111] evaluated the C0 complexity index and PSE, two features related to stochastic noise that decrease during periodic chatter. They proposed considering features that may decrease, instead of conventionally seeking features that rise in unstable cutting, as most traditional statistical parameters do. Similarly, Ji et al. [90] extracted the fractal dimension (FD) and PSE, two complementary features, one of which increases and the other decreases with chatter, and later included the standard deviation [121]. Furthermore, Liu et al. [176] processed decomposed vibration signals and used the RMS of the sub-signals as input to a hybrid algorithm. They noticed that the system had a lower chatter detection error when using VMD rather than WPD, showing that the fitness of a feature also depends on the signal processing used and its ability to remove or filter irrelevant information. The energy entropy criterion proposed by Zhang et al. [112] has been used as a reference for novel features and techniques. Yang et al. [133] used statistical features together with a genetic algorithm (GA) as an alternative to the iterative method proposed by Liu et al. [130]. Liu et al. [177] defined a feature whose threshold value varies with the tool geometry. Dong and Zhang [120] proposed the complexity index, while Zhao et al. [141] used a maximum entropy-based feature.
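Power spectral entropy treats the normalised power spectrum as a probability distribution and computes its Shannon entropy; a spectrum dominated by a few chatter peaks gives a low value, while broadband noise gives a value near the maximum. The sketch below uses a plain periodogram and normalises by the logarithm of the number of bins so that PSE lies in [0, 1]. This normalisation and the function name are assumptions; the cited works may use other PSD estimators or scalings.

```python
import numpy as np

def power_spectral_entropy(x, normalize=True):
    """Shannon entropy of the normalised periodogram of x."""
    x = np.asarray(x, dtype=float)
    psd = np.abs(np.fft.rfft(x - x.mean())) ** 2
    p = psd / psd.sum()
    p = p[p > 0]                       # 0 * log(0) is taken as 0
    h = -np.sum(p * np.log(p))
    return h / np.log(len(psd)) if normalize else h
```

This behaviour matches the observation of Cao et al. that PSE decreases when the energy concentrates in periodic chatter components.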

Energy features have also been studied. The energy ratio (ER) is the proportion of the total energy of the original signal contained in a decomposed component, such as an IMF, a wavelet packet or a frequency band. Kuljanic et al. [72], however, proposed a different indicator, also called energy ratio (CIER), which represents the ratio of the energy of the aperiodic component to the total energy of a signal, computed by frequency domain analysis. Perrelli et al. [92] calculated the CIER using WPD, showing that frequency-based CIER has a better chatter response. Al-Regib and Ni [189] proposed an energy ratio index based on the Wigner time-frequency distribution. Caliskan et al. [124] employed ER, with multiple sensors and a Kalman filter, to identify chatter frequencies. Fu et al. [113] proposed two energy features, the normalized energy ratio (NER) and the coefficient of variation of the spectrum (CV), which represent the energy increase and variation during chatter.
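Both kinds of energy ratio can be sketched from a single periodogram: the band ER divides the energy in a frequency band by the total, and a CIER-like ratio divides the energy outside the spindle frequency harmonics (which include the tooth passing harmonics) by the total. This is a simplified sketch assuming a constant, known spindle frequency and a rectangular window; `aperiodic_energy_ratio` is a hypothetical variant of the idea, not the exact formulation of [72] or [92].

```python
import numpy as np

def band_energy_ratios(x, fs, band_edges):
    """Energy ratio of each band [f_lo, f_hi) to the total spectral energy."""
    x = np.asarray(x, dtype=float)
    psd = np.abs(np.fft.rfft(x)) ** 2
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    total = psd.sum()
    return np.array([psd[(freqs >= lo) & (freqs < hi)].sum() / total
                     for lo, hi in band_edges])

def aperiodic_energy_ratio(x, fs, spindle_freq, guard_hz=2.0):
    """Share of spectral energy outside the spindle frequency harmonics."""
    x = np.asarray(x, dtype=float)
    psd = np.abs(np.fft.rfft(x - x.mean())) ** 2
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    k = np.round(freqs / spindle_freq)           # nearest harmonic order
    periodic = (k >= 1) & (np.abs(freqs - k * spindle_freq) <= guard_hz)
    return psd[~periodic].sum() / psd.sum()
```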

To assess the change in energy distribution that occurs with chatter, Zhang et al. [112] employed the energy entropy (EE), which describes the distribution of the ERs through the Shannon entropy, to classify three types of chatter. Yao et al. [76] evaluated the energy entropy of a single frequency band and called it the relative wavelet packet EE (WPE), while Cuka et al. [219] used the Teager energy operator, which detects variations in instantaneous amplitude and frequency and has a lower computational cost than FFT-based features.
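
The energy entropy and the Teager energy operator can both be sketched in a few lines of Python. This assumes the ERs are already available (for example from a wavelet-packet decomposition) and uses the standard discrete operator psi[n] = x[n]^2 - x[n-1]x[n+1]; the function names and the normalisation by the logarithm of the number of bands are assumptions for illustration.

```python
import math


def energy_entropy(ratios, normalize=True):
    """Shannon entropy of an energy-ratio distribution (e.g. wavelet-packet ERs)."""
    total = sum(ratios)
    p = [r / total for r in ratios if r > 0]
    h = -sum(pi * math.log(pi) for pi in p)
    if normalize and len(ratios) > 1:
        h /= math.log(len(ratios))  # 1.0 means energy spread evenly over all bands
    return h


def teager_energy(x):
    """Discrete Teager energy operator: psi[n] = x[n]^2 - x[n-1] * x[n+1]."""
    return [x[n] ** 2 - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]


# For A*sin(omega*n + phi) the operator is constant and equal to A^2 * sin(omega)^2
tone = [2.0 * math.sin(0.3 * n + 0.1) for n in range(50)]
psi = teager_energy(tone)
```

The normalised entropy is 1 when energy is spread evenly over all bands and 0 when it sits in a single band, so a concentration of energy in a chatter band lowers it. For a pure sinusoid the operator output stays at A²sin²(Ω), which is why changes in its output track changes in instantaneous amplitude and frequency.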

Entropy features represent signal randomness and complexity. Several have been reported for chatter detection, including permutation entropy (PE) [134, 157], Rényi entropy (RE) [162], sample entropy (SampEn) [133, 142, 186], approximate entropy (ApEn) [88, 133, 169, 170, 183] and dispersion entropy [151]. Tran et al. [154] used fuzzy entropy for feature selection. Wang et al. [151] achieved the highest accuracy using conventional statistical features and signal processing rather than entropy-based features computed from raw data. Yang et al. [133] found that chatter occurred at the time point where ApEn and SampEn showed an asymptotic change of direction, in agreement with Li et al. [134, 157], who instead proposed multiscale entropy features. However, this change was observed only in the IMF containing the chatter frequency, not in the whole or reconstructed signal. Recently, Hauptfleischová et al. [223] compared several chatter indicators, including the coarse-grained entropy rate (CER), RE, SD, FD and PSE, for the online identification of chatter in milling, and reported that the CER criterion appears to be the most useful, offering high reliability and relatively fast computation.
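
As one example of these entropy measures, sample entropy can be sketched directly from its usual definition, SampEn = -ln(A/B), where B and A count pairs of similar templates of length m and m + 1. The cited works do not fix m and r here, so m = 2 and r = 0.2 times the standard deviation are assumed as common defaults; the function name and the synthetic signals are illustrative.

```python
import math
import random
import statistics


def sample_entropy(x, m=2, r=None):
    """Sample entropy SampEn(m, r) = -ln(A / B), excluding self-matches.

    B counts template pairs of length m within tolerance r (Chebyshev distance),
    A counts pairs of length m + 1.
    """
    n = len(x)
    if r is None:
        r = 0.2 * statistics.pstdev(x)  # common default, assumed here

    def count_pairs(length):
        # The same n - m start indices are used for both lengths
        templates = [x[i:i + length] for i in range(n - m)]
        count = 0
        for i in range(len(templates)):
            for j in range(i + 1, len(templates)):
                if max(abs(a - b) for a, b in zip(templates[i], templates[j])) <= r:
                    count += 1
        return count

    b = count_pairs(m)
    a = count_pairs(m + 1)
    if a == 0 or b == 0:
        return math.inf  # undefined: no matches at one of the template lengths
    return -math.log(a / b)


# A strictly periodic signal versus white noise (synthetic, not machining data)
periodic = [0.0, 1.0, 0.0, -1.0] * 50
rng = random.Random(42)
noise = [rng.gauss(0.0, 1.0) for _ in range(300)]
se_periodic = sample_entropy(periodic)
se_noise = sample_entropy(noise)
```

The periodic signal gives a value of about 0 because every template that matches for m points also matches for m + 1, while white noise gives a much larger value. The double loop is O(N²), which illustrates why computational cost matters when such features are evaluated online.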

In many studies, the threshold value was highly dependent on the cutting parameters, the signal processing technique employed and the number of cutting tests, so recent studies have sought features that are insensitive to the cutting parameters. Chen et al. [162] applied a threshold to an entropy-based feature that had first been normalized as a function of the cutting parameters. Yang et al. [133] extracted chatter features from the filtered signal to reduce the effect of the cutting parameters, while Cheng et al. [198] proposed a coefficient indicator instead of an absolute threshold. Entropy measures have also been applied as criteria for feature ranking and selection, as in [258]. Mei et al. [126] processed force signals by SSA and used the singular value entropy (SVE) as a chatter indicator. Wang et al. [166] presented a similar approach, using the singular spectrum entropy together with the energy ratio in WPD as features. Liu et al. [130] evaluated an entropy-based feature in a frequency band that excludes the fundamental frequency and its harmonics.

Various nonlinear analysis methods have also been studied; for instance, Lyapunov exponents were reported in [169, 193, 326, 353]. Fractal dimension (FD) is another feature used to measure the complexity of a pattern and the intrinsic properties of a signal, although fractal properties can also be detected by the CWT [38]. Zhuo et al. [354] employed the FD in the time and frequency domains, Chen et al. [180] and Liu et al. [355] evaluated multifractal-based features, and Feng et al. [356] used a dichotomy-binary strategy to reduce the computation time required by fractal methods. Jing et al. [357] designed two indicators based on the p-leader multifractal spectrum to identify stable, weak-chatter and chatter states in micro-milling, where spindle speeds above 20,000 rpm, the small cutter size and the miniature workpiece dimensions affect the process dynamics. Topological features and similarity measures have been proposed because they are robust to noise, as shown in turning experiments [358, 359] and in simulated end-milling signals by Yesilli et al. [263]. Other candidate features include the Hölder exponent of vibration signals [360] and various wavelet packet-based features reported in turning [361,362,363], including a multiscale wavelet packet entropy (MWPE) proposed to detect chatter regardless of the beat effect [364].
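
The text does not specify which FD estimator the cited studies used, so the sketch below uses Higuchi's method as one common choice for time series: it measures the mean curve length L(k) at several sampling intervals k and takes the FD as the slope of log L(k) against log(1/k). The function name, `k_max` and the test signals are assumptions for illustration.

```python
import math
import random


def higuchi_fd(x, k_max=10):
    """Higuchi fractal dimension: slope of log L(k) versus log(1/k)."""
    n = len(x)
    log_inv_k, log_l = [], []
    for k in range(1, k_max + 1):
        lengths = []
        for m in range(k):
            num_steps = (n - 1 - m) // k
            if num_steps < 1:
                continue
            dist = sum(abs(x[m + i * k] - x[m + (i - 1) * k]) for i in range(1, num_steps + 1))
            # Higuchi normalisation (n - 1) / (num_steps * k), then divide by k
            lengths.append(dist * (n - 1) / (num_steps * k) / k)
        if lengths:
            log_inv_k.append(math.log(1.0 / k))
            log_l.append(math.log(sum(lengths) / len(lengths)))
    # Least-squares slope of log L(k) against log(1/k)
    mx = sum(log_inv_k) / len(log_inv_k)
    my = sum(log_l) / len(log_l)
    num = sum((a - mx) * (b - my) for a, b in zip(log_inv_k, log_l))
    den = sum((a - mx) ** 2 for a in log_inv_k)
    return num / den


line = [float(i) for i in range(200)]                # smooth curve: FD close to 1
rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(1000)]   # white noise: FD close to 2
fd_line = higuchi_fd(line)
fd_noise = higuchi_fd(noise)
```

A smooth ramp gives an FD of about 1 and white noise about 2, so a change in FD reflects a change in signal irregularity. A constant signal would make L(k) zero and the logarithm undefined, a case this sketch does not guard against.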

Multiscale features are used because the coarse-graining procedure [365] allows them to represent system dynamics over multiple temporal scales, and this concept has been applied to chatter features. Liu et al. [181] studied the effect of the scale factor and the VMD decomposition parameters using multiscale PE (MPE). Li et al. [134, 157] and Liu et al. [234] presented the advantages of multiscale PSE (MPSE) and MPE, while multiscale dispersion entropy (MDE) was reported by Liu et al. [317]. Nevertheless, because coarse-graining reduces the number of data points by the scale factor, multiscale entropy features may suffer from information aliasing. To address this issue, Zhao et al. proposed alternatives applied to mechanical fault diagnosis [366, 367], while Chen et al. [368] proposed an improved PE (IPE) that recognizes complexity better than PE under noisy conditions.
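
The coarse-graining procedure and its use in multiscale permutation entropy can be sketched as follows. This is a minimal sketch assuming non-overlapping averaging windows and the Bandt-Pompe ordinal-pattern definition of PE, normalised by log(order!); the function names, the embedding order and the scale list are illustrative assumptions.

```python
import math
import random
from collections import Counter


def coarse_grain(x, scale):
    """Non-overlapping averages of `scale` consecutive samples (length shrinks by `scale`)."""
    return [sum(x[j * scale:(j + 1) * scale]) / scale for j in range(len(x) // scale)]


def permutation_entropy(x, order=3, delay=1):
    """Normalised permutation entropy in [0, 1] based on ordinal patterns."""
    patterns = Counter()
    for i in range(len(x) - (order - 1) * delay):
        window = [x[i + j * delay] for j in range(order)]
        # Ordinal pattern = indices that sort the window (ties broken by position)
        patterns[tuple(sorted(range(order), key=window.__getitem__))] += 1
    total = sum(patterns.values())
    h = -sum((c / total) * math.log(c / total) for c in patterns.values())
    return h / math.log(math.factorial(order))


def multiscale_permutation_entropy(x, scales, order=3):
    return [permutation_entropy(coarse_grain(x, s), order) for s in scales]


rng = random.Random(3)
noise = [rng.gauss(0.0, 1.0) for _ in range(600)]
mpe = multiscale_permutation_entropy(noise, scales=[1, 2, 3])
```

A monotonic ramp contains a single ordinal pattern and gives PE = 0, while white noise stays close to 1 at every scale. The shrinking length of `coarse_grain(x, s)` is the data reduction behind the aliasing issue mentioned above.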

Some hand-crafted features have also been reported. A distribution-based criterion was proposed to obtain a threshold value independent of cutting conditions [80, 118]. Jia et al. [202] designed a synthetic criterion (SC) that combines the standard deviation and the autocorrelation function. Similarly, the multi-dimensional indicator (Q-factor) uses the centre frequencies and the oscillatory characteristics of the signal [129], and it was employed in mirror milling [123]. Sun et al. [81] correlated several features from vibration and force signals with a hidden Markov model (HMM). Vela-Martinez et al. [128] introduced rescaled range (R/S) analysis, which characterizes the long-term predictability of a signal, along with a fractal-based index, the Hurst exponent, as an indicator of stable cutting. This approach was also used with detrended fluctuation analysis [150] and with multiscale entropy [171]. The chatter indicators proposed by Kuljanic et al. in [72] and [117] are widely used as references. Chen et al. [147] proposed a normalized cyclostationary energy ratio (NER) and compared it with two indicators from [72] (the autocorrelation coefficient indicator CIAC and the energy ratio indicator CIER), the variance ratio from [122] and some entropy features, obtaining clearly superior performance, as shown in Fig. 14.
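
Rescaled range analysis can be sketched as follows: split the signal into windows of several sizes, compute R/S (range of the cumulative mean-adjusted sum divided by the standard deviation) in each window, and take the Hurst exponent as the slope of log(mean R/S) against log(window size). This is a minimal sketch of the classic estimator; the window sizes, function names and synthetic signals are assumptions, not the settings of Vela-Martinez et al.

```python
import math
import random


def rescaled_range(chunk):
    """R/S of one window, or None if the window has zero variance."""
    mean = sum(chunk) / len(chunk)
    cum, y = 0.0, []
    for v in chunk:
        cum += v - mean
        y.append(cum)
    s = math.sqrt(sum((v - mean) ** 2 for v in chunk) / len(chunk))
    return (max(y) - min(y)) / s if s > 0 else None


def hurst_rs(x, window_sizes=(16, 32, 64, 128, 256)):
    """Hurst exponent as the slope of log(mean R/S) versus log(window size)."""
    log_n, log_rs = [], []
    for w in window_sizes:
        values = [rescaled_range(x[i:i + w]) for i in range(0, len(x) - w + 1, w)]
        values = [v for v in values if v is not None]
        if values:
            log_n.append(math.log(w))
            log_rs.append(math.log(sum(values) / len(values)))
    mx, my = sum(log_n) / len(log_n), sum(log_rs) / len(log_rs)
    return (sum((a - mx) * (b - my) for a, b in zip(log_n, log_rs))
            / sum((a - mx) ** 2 for a in log_n))


# Synthetic illustration: uncorrelated noise versus its cumulative sum (a random walk)
rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(4096)]
walk, level = [], 0.0
for v in noise:
    level += v
    walk.append(level)
h_noise = hurst_rs(noise)  # near 0.5; small windows bias the classic estimate upward
h_walk = hurst_rs(walk)    # strongly persistent series: close to 1
```

Values near 0.5 indicate uncorrelated fluctuations and values approaching 1 indicate strong persistence, which is the property R/S-based indicators use to separate cutting states.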

Fig. 14

Comparison of the NER (proposed) to other indicators, for chatter occurrence at 8.6 s [147]

In summary, there is no consensus about the best feature for chatter detection, which has driven continuous research into new indicators, as listed in Table 2. Statistical features are still widely employed. However, over the last 10 years, various chatter criteria have been proposed to reduce the effect of the cutting parameters and background noise. The use of multiple features instead of a single index has been investigated to improve system performance in single-sensor, multi-channel and multi-sensor configurations. As the number of candidate features grows, the features that provide the highest accuracy at the lowest computational cost must be selected. The considerations and approaches for feature selection are discussed in the following section.

Table 2 Feature types reported for chatter detection

4.2 Feature selection

The trend of using more than one signal feature for chatter detection calls for feature selection and dimensionality reduction methods, which eliminate irrelevant or redundant features while reducing the computational cost and the risk of overfitting. The literature review showed that the selection of features and thresholds seems to depend on the number of tests, the signal type and the milling parameters. For instance, in the time-frequency image analysis by Chen et al. of a difficult-to-cut material with 82 tests, the top features were second-order statistical features [179], whereas for micro-milling of low-carbon steel with 29 tests, the candidate features were first-order statistical features [165]. It has also been reported that the optimal features and sensor type vary with the condition of interest and the experimental settings [106, 370, 371].

For feature dimensionality reduction, principal component analysis (PCA) is a commonly used technique, as shown in [175, 186, 196]. Fu et al. [132], Chen et al. [144] and Dun et al. [185] used it as a baseline to show the advantages of their methods. Jo et al. [253] suggested that the modified independent component analysis (MICA) method outperforms PCA, while Liu et al. [175] illustrated the contribution of PCA with different signal processing and classification methods.
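
The core of PCA can be sketched without external libraries by computing the covariance matrix of the feature vectors and extracting its leading eigenvector by power iteration. This minimal sketch returns only the first principal component; the function name and the toy feature matrix are hypothetical, and an SVD-based library routine would normally be preferred.

```python
import math


def first_principal_component(rows, iterations=200):
    """First principal component of a feature matrix via power iteration.

    rows: list of feature vectors (one per signal segment).
    Returns (unit direction, explained-variance ratio, projected scores).
    """
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centred = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(c[a] * c[b] for c in centred) / (n - 1) for b in range(d)]
           for a in range(d)]
    v = [1.0] * d  # fixed start vector; fails only if orthogonal to the top eigenvector
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(t * t for t in w))
        if norm == 0.0:
            break
        v = [t / norm for t in w]
    eigenvalue = sum(v[a] * sum(cov[a][b] * v[b] for b in range(d)) for a in range(d))
    total_variance = sum(cov[a][a] for a in range(d))
    scores = [sum(c[j] * v[j] for j in range(d)) for c in centred]
    return v, eigenvalue / total_variance, scores


# Two perfectly correlated toy features: one component explains all the variance
rows = [[t, 2.0 * t] for t in range(10)]
direction, explained, scores = first_principal_component(rows)
```

When features are strongly correlated, as here, a single projected score carries essentially all the variance, which is the redundancy that PCA removes before classification. Power iteration converges slowly when the two largest eigenvalues are close, another reason to use a library implementation on real data.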

When multiple features are used, or the same feature is assessed from multiple signals, the reported feature selection techniques include ReliefF [180], t-SNE [153, 195, 236] and recursive feature elimination (RFE) [145, 151, 200, 372, 373], among others. Wang et al. [151] extracted features in the amplitude, frequency and nonlinear domains from acceleration sensors. A feature evaluation model based on RFE was used to obtain the main feature parameters related to the machine tool state, which helped improve the accuracy of chatter detection on a CNC machine. They noted that when only one of the 15 features was used, the RMS of the denoised vibration signal provided the highest accuracy regardless of the classification model.
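
The RFE loop can be sketched as follows: fit a model, rank the features by the magnitude of their learned weights, drop the weakest, and repeat. The estimator used in the cited works is not given here, so an L2-regularised logistic regression on standardised features is assumed; all names, hyperparameters and the synthetic three-feature dataset are hypothetical.

```python
import math
import random


def standardise(X):
    """Z-score each column so that logistic-regression weights are comparable."""
    stats = []
    for col in zip(*X):
        mean = sum(col) / len(col)
        std = math.sqrt(sum((v - mean) ** 2 for v in col) / len(col)) or 1.0
        stats.append((mean, std))
    return [[(v - m) / s for v, (m, s) in zip(row, stats)] for row in X]


def fit_logistic(X, y, epochs=300, lr=0.1, l2=1e-3):
    """Batch gradient descent for L2-regularised logistic regression."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, xi))
            z = max(-30.0, min(30.0, z))  # avoid overflow in exp
            err = 1.0 / (1.0 + math.exp(-z)) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * (g / n + l2 * wj) for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def recursive_feature_elimination(X, y, n_keep):
    """Refit, drop the feature with the smallest |weight|, repeat until n_keep remain."""
    Xs = standardise(X)
    remaining, eliminated = list(range(len(X[0]))), []
    while len(remaining) > n_keep:
        subset = [[row[j] for j in remaining] for row in Xs]
        w, _ = fit_logistic(subset, y)
        weakest = min(range(len(remaining)), key=lambda k: abs(w[k]))
        eliminated.append(remaining.pop(weakest))
    return remaining, eliminated


# Synthetic example: feature 0 separates the classes, features 1 and 2 are noise
rng = random.Random(0)
y = [i % 2 for i in range(200)]  # 1 = chatter, 0 = stable
X = [[2.0 * yi + rng.gauss(0, 0.5), rng.gauss(0, 1), rng.gauss(0, 1)] for yi in y]
kept, dropped = recursive_feature_elimination(X, y, n_keep=1)
```

Refitting after every removal, rather than ranking once, is what distinguishes RFE from a simple filter: the weights of correlated features can change once a partner feature is gone.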

The findings of this review suggest that the number of training tests and the experimental conditions may affect both the definition of chatter criteria and the selection of candidate features. As a result, a system may fail under cutting conditions outside the parameter range covered by the training data. This issue is being addressed through novel monitoring-model techniques and the use of multiple features for more accurate chatter detection. Accordingly, various AI methods for classifying data described by multiple features have been reported in the chatter detection literature, and the next section discusses the techniques published in recent years.

5 Classification models

5.1 Supervised learning

Once features are extracted and selected, the next step is to train and assess the monitoring model that will be used to detect chatter occurrence. However, relatively few of the studies on chatter detection include a classification model. Various methods have been applied, including support vector machines (SVM) and neural networks (NN), while HMMs have been reported mostly for turning.

First, the input data (i.e. the chatter features) must be labelled with the corresponding classes. Most studies used binary classification; however, some scholars have highlighted the need to identify the transition zone, also called slight chatter, as a separate class instead of treating it as chatter [112, 121, 134, 147, 152, 156, 157, 159, 181, 182, 201]. Figures 3, 9, 10, 12 and 13, in previous sections, show the various stability states in different signal types and domains. Figure 15 illustrates how a force signal changes across the different states and the effect on surface quality [157]. There are three main approaches to data labelling: using the traditional SLD, as shown in Fig. 6; analysing the frequency spectrum of the signals, as shown in Fig. 10; or inspecting the machined surface, as illustrated in Fig. 15. However, each approach has challenges that could cause data mislabelling, especially in determining the transition between states, as discussed in Section 2.3. Wan et al. [153] evaluated the impact of wrong labelling on detection accuracy and proposed analysing unlabelled data with an ANN, combined with Adaboost-SVM, to reduce the impact of mislabelling. Dun et al. [185] proposed unsupervised DL to reduce mislabelling, and Liu et al. [355] investigated a semi-supervised method requiring only a small quantity of labelled data. The k-means clustering algorithm has been used with different signals and features [75, 197, 243] to avoid a complex model training process and mislabelling. Tangjitsitcharoen et al. [75] used the ratio of the average variances of the dynamic cutting forces to their absolute variances, while Wang et al. [197] employed a structure-function method (SFM).
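
The label-free grouping that k-means provides can be sketched with Lloyd's algorithm. This minimal sketch assumes two hypothetical features per signal segment (for example an RMS value and a band energy ratio) and a deterministic farthest-point initialisation; neither the features nor the initialisation is taken from the cited works.

```python
import math


def kmeans(points, k, iterations=100):
    """Lloyd's k-means with deterministic farthest-point initialisation."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        farthest = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(farthest))
    labels = []
    for _ in range(iterations):
        # Assignment step: nearest centroid for every point
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            updated.append([sum(col) / len(members) for col in zip(*members)]
                           if members else centroids[c])
        if updated == centroids:
            break  # converged: assignments no longer change the centroids
        centroids = updated
    return labels, centroids


# Hypothetical (RMS, energy ratio) pairs for six signal segments
features = [(0.10, 0.20), (0.15, 0.10), (0.12, 0.18),
            (2.00, 0.90), (2.20, 0.85), (1.90, 0.95)]
labels, centres = kmeans(features, k=2)
```

k-means only returns cluster indices, so deciding which cluster corresponds to chatter (for instance, the one with the higher RMS centroid) remains a separate step; this is where the labelling effort is shifted rather than eliminated.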

Fig. 15

States of the force signal and the corresponding workpiece surface quality captured by a portable microscope with a magnification of × 100 [157]

SVM was the most widely employed classification method, as it gives high accuracy in classifying machine conditions. For instance, Chen et al. used SVM with EEMD [178] and with image processing [165], and it showed performance similar to k-nearest neighbours (KNN) [180]. Wang et al. [151] compared different classification models with single and multiple features. Their results, shown in Fig. 16, indicate that the accuracy of each model varies with the feature used and that SVM had the highest susceptibility compared with the other methods, while the probabilistic neural network (PNN) had slightly lower accuracy. NNs perform similarly to SVM and have some advantages, such as shorter training time [145, 180]. Wang et al. [166] showed that an SVM optimized with PSO outperformed various models, including NNs. Shi et al. and Cao et al. [146, 195] reported that reinforced KNN achieves higher accuracy than SVM and decision trees; however, because it stores the training observations as part of the model itself, it requires considerable memory. The reported performance of classification methods varies considerably across studies, possibly because of differences in cutting conditions, signal types, features employed, etc. The development of other classification approaches, such as novel neural network architectures and deep learning, is needed to overcome these limitations.
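
The memory cost attributed to KNN-type models follows from how the classifier works: the training set is the model. The sketch below is plain KNN with a majority vote, not the reinforced variant cited above, and the two-feature training data are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Plain k-nearest-neighbour vote (Euclidean distance).

    The entire training set is stored and scanned for every query, which is
    the memory and run-time cost associated with KNN-type classifiers.
    """
    order = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


# Hypothetical two-feature training set: (RMS, energy ratio in a chatter band)
train_X = [(0.10, 0.20), (0.15, 0.10), (0.12, 0.18),
           (2.00, 0.90), (2.20, 0.85), (1.90, 0.95)]
train_y = ["stable", "stable", "stable", "chatter", "chatter", "chatter"]
```

Training is trivial (the data are simply kept), but every prediction scales with the size of the stored set, which is the trade-off against SVM mentioned in the text.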

Fig. 16 

The detection accuracy of diverse classification models under single feature condition [151]

5.2 Deep learning

Deep learning (DL) approaches have been used to monitor various machining processes, including TCM. DL techniques can automatically identify, select and extract features without expert intervention, and they have shown better performance than SVM in classification tasks. Moreover, DL allows the classification of multiple conditions and the unsupervised use of multi-channel signals in machinery condition monitoring. However, DL is a relatively new approach, and only a limited number of researchers have applied it to chatter detection, all recently.

Rifai et al. [135] developed a machine-vision-based system for surface roughness estimation and chatter identification, integrating deep convolutional neural networks (CNNs) to bypass the feature extraction step. Fu et al. [132] trained a deep belief network (DBN) on vibration signals and compared their method with WT for feature extraction and with SVM for chatter classification. Tran et al. [159] used a CNN with the CWT of force signals, as shown in Fig. 17, and obtained higher accuracy than conventional ML methods. Zhu et al. [192] instead applied a CNN, along with optimization techniques, to images of the machined parts, whereas Jeong et al. [236] proposed a one-dimensional CNN. Gao et al. [196] used a deep residual CNN (DR-CNN) with the CWT of a vibration signal, and later used SVM-VMD for chatter detection. Unver and Sener [89] evaluated the capabilities of a CNN trained on model-based simulations and tested on experimental signals. However, Zheng et al. [260] showed that CNNs require a large training dataset for chatter detection regardless of the signal type.

Fig. 17

CNN architecture for chatter detection [159]

Long short-term memory (LSTM) networks are commonly used to process sequential data. Shi et al. [136] used an LSTM with the sigmoid function to process the STFTs of multiple acceleration signals as images, and Sharmila and Balasubramanian [100] compared the performance of three different DL architectures. Although DL techniques achieve accurate chatter detection, they face the same challenges as DL approaches in other applications, including dependence on the scale and quality of the dataset, the use of domain knowledge to reduce model size and enhance performance, management of imbalanced classes, transfer learning and visualization of learned representations, as detailed by Zhao et al. [65].

5.3 Discussion of classification models

A summary of the classification models commonly employed in chatter detection is listed in Table 3. It shows that fewer than half of the studies included a classification model as part of the methodology. A detailed comparison of the classification accuracy of various chatter detection models has been reported by Tran et al. [154], as shown in Table 4. The latest systems have achieved accuracies above 93% using multiple features. Nevertheless, accuracy is recommended only for balanced data, and binary classification performance can also be assessed with other metrics, such as recall (sensitivity), precision, specificity (selectivity) and F1 score, which account for incorrect predictions and the performance on each class. Only the works [146, 152, 165, 178, 180, 192] considered performance metrics beyond the commonly employed accuracy. In addition, other common issues in ML classification, such as underfitting and overfitting, class imbalance, misclassification cost, transfer learning and data mislabelling, have not been extensively explored in chatter detection.
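
Why accuracy alone can mislead on imbalanced data is easy to see from the confusion-matrix definitions. The sketch below computes the metrics named above, treating chatter as the positive class; the function name and the toy label sequences (10 chatter and 90 stable segments) are hypothetical.

```python
def binary_metrics(y_true, y_pred, positive="chatter"):
    """Confusion-matrix metrics, treating `positive` as the class of interest."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)

    def ratio(num, den):
        return num / den if den else 0.0  # report 0 when the metric is undefined

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)  # sensitivity
    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "precision": precision,
        "recall": recall,
        "specificity": ratio(tn, tn + fp),  # selectivity
        "f1": ratio(2 * precision * recall, precision + recall),
    }


y_true = ["chatter"] * 10 + ["stable"] * 90
always_stable = ["stable"] * 100
some_detector = ["chatter"] * 8 + ["stable"] * 2 + ["chatter"] + ["stable"] * 89
naive = binary_metrics(y_true, always_stable)    # accuracy 0.9 but recall 0.0
better = binary_metrics(y_true, some_detector)   # tp=8, fn=2, fp=1, tn=89
```

A model that never predicts chatter still reaches 90% accuracy on this split while detecting none of the chatter segments, which recall and F1 expose immediately.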

Table 3 Methods employed as classification model in chatter detection in milling (2007–2022)
Table 4 Classification accuracy of diverse chatter detection models using feature selection. Adapted and expanded from [154]

One factor that reduces accuracy is the influence of random initial weights, hyperparameters and thresholds in the training phase, mainly for NNs [269]. Threshold values also vary with the dynamic conditions of the machine, which reduces classification accuracy [375]. Optimization methods have been used to select various parameters for signal processing and for training AI classification models [96, 121, 133, 136, 145, 166, 176, 181, 188, 192, 234, 317, 343], with PSO and GA applied most frequently. Tran et al. [154] proposed using a similarity classifier, since the performance of conventional approaches depends heavily on parameter adjustment. Zhang et al. [112] noted that the choice of decomposition parameters strongly affects chatter sensitivity. Sick [269] reviewed TCM with ANNs in turning and described general considerations and criteria for the optimal design of any intelligent TCM system; most of the author's recommendations may also be valid for chatter detection systems using NNs or other methods. Other considerations for classification models include class imbalance, misclassification cost, mislabelling, multi-class and multi-label cases and diverse performance metrics. While DL has shown promising results, real-world application will also require research on data curation, model interpretation and physics-based alternatives, as recently indicated by Zhang et al. and Arias et al. [376, 377].
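
As an illustration of the PSO searches mentioned above, the sketch below implements a basic global-best particle swarm over a bounded parameter box. It is a minimal sketch under stated assumptions: the objective `validation_error_proxy` is a hypothetical smooth stand-in for a cross-validation error surface over two log-scaled hyperparameters, and the inertia and acceleration coefficients are common textbook values, not those of any cited study.

```python
import random


def pso_minimize(objective, bounds, n_particles=30, iterations=100,
                 w=0.7298, c1=1.49618, c2=1.49618, seed=0):
    """Global-best particle swarm optimisation over a box-bounded search space."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    for _ in range(iterations):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull towards personal best + pull towards swarm best
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                lo, hi = bounds[d]
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)  # clip to bounds
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


def validation_error_proxy(p):
    """Hypothetical error surface with its minimum at log10(C) = 1, log10(gamma) = -2."""
    log_c, log_gamma = p
    return (log_c - 1.0) ** 2 + (log_gamma + 2.0) ** 2


best, best_val = pso_minimize(validation_error_proxy, bounds=[(-3.0, 3.0), (-5.0, 1.0)])
```

In a real pipeline each objective call would train and validate a classifier, so the number of particles times iterations sets the tuning cost; the random seed is fixed here only to make the sketch reproducible.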

6 Challenges and future perspectives

This literature review reveals that research on chatter detection in milling has focused on developing new signal processing techniques and defining new features and chatter indicators. Compared with the TCM literature, relatively few chatter detection studies use multi-sensor and information fusion approaches. A practical and reliable online chatter detection system is required for advanced manufacturing processes, but no such system has yet been reported in wide industrial application. Altintas and Aslan [378] introduced a process control and monitoring system that was implemented in a production scenario to detect various tool conditions. The system was claimed to detect and avoid chatter in real time, but further details are awaited. The following subsections summarize some major challenges in developing accurate and robust chatter detection systems, together with perspectives for future research.

6.1 Multi-condition monitoring and identification

A machining process can exhibit different types of abnormal conditions, such as chatter onset, tool wear, and tool chipping or breakage. Whether it targets tool wear, tool failure or chatter onset, a machining process monitoring system extracts and evaluates the main attributes of the acquired signal to classify the process condition. Most existing monitoring approaches focus on a single malfunction condition, since it is difficult to extract signal features that can distinguish multiple conditions [379]. Furthermore, the occurrence of one condition could affect the feature values used to monitor another. Hence, it is worthwhile to study the effect of tool wear on chatter occurrence and detection. Tool wear has been studied and modelled for frictional chatter, and some researchers have reported that it strongly affects the dynamics of metal cutting [380,381,382,383]. The coexistence of regenerative and frictional chatter has been studied and recently modelled by Yan et al. [384,385,386] and Weremczuk et al. [387], while Celikag et al. [388] reported that mode-coupling and regenerative chatter do not co-occur. Moreover, Matsumoto et al. [389] and Stavropoulos et al. [390] concluded that chatter reduces TCM accuracy.

It has been reported that tool wear progression changes the tool-workpiece contact and affects the process damping and cutting force coefficients [3]. Moradi et al. [391, 392] found that chaotic vibration becomes dominant as the tool wear width increases, while Ghorbani [393] demonstrated that regenerative chatter accelerates tool wear and shortens tool life. Li et al. [394], Gurdal et al. [395] and Wang et al. [396] agree that the stability limit varies over time with tool wear progression, because wear modifies the cutting forces and tool geometry and thus the dynamics and process damping of the cutting system. Nevertheless, their conclusions contradict each other on some points. Wang et al. [379] proposed a multi-condition identification approach for both tool wear and chatter, although they used a different spindle speed for each case, which may have affected the AI accuracy and leaves open how the approach performs when both events co-occur at spindle speeds other than those used in their experiments. Similarly, Srinivasan et al. [216], Tansel et al. [204] and Lamraoui et al. [95] reported that a single monitoring system acquiring signals from indirect sensors can perform simultaneous chatter detection and tool wear diagnosis, but the correlation between the two phenomena has not been widely investigated. In addition, various studies [264, 397,398,399,400,401,402] provided experimental results on the effect of tool wear on the signal waveform and on the energy distribution across frequency bands when machining different materials, as shown in Fig. 18. Recent models also indicate that chatter frequencies are time-varying, particularly in thin-walled milling, as studied by Wang et al. [403]. Consequently, the effect of tool wear on chatter occurrence and on subsequent chatter detection should be studied more comprehensively from a data-driven perspective.
The development of monitoring systems able to assess and identify diverse malfunction conditions in machining operations simultaneously will be of interest, along with investigations to understand the correlations among them.

Fig. 18

The variation of tool wear and vibration waveforms [401]

6.2 Robustness enhancement

A robust monitoring system must work under varying machining conditions. The literature review has shown that few chatter detection studies have been applied to different tool geometries, workpieces and cutting parameters, as discussed in Section 2.3. Real-time chatter detection should identify chatter onset under different machining conditions, and novel AI methods could improve this capability. Chatter threshold definition and model training require a comprehensive dataset covering a wide range of cutting conditions, and a method may not be suitable for conditions outside the range of its training data. Yesilli et al. [373, 404] showed that monitoring models have lower accuracy when tested with cutting parameters different from those used for training. One potential remedy is transfer learning, which studies the knowledge transfer capability of an ML model, i.e. its response to test data from a domain different from the training conditions. Transfer learning has been researched for fault diagnosis and anomaly detection in various engineering applications and was recently reported for chatter detection in milling by Unver and Sener [89, 405]. This approach may attract more research in the near future because it could enable chatter detection under varying conditions. In addition, open-source datasets are needed to facilitate and accelerate the development of chatter detection systems, like the NASA-Ames milling dataset and the 2010 PHM challenge dataset, which are frequently used to assess TCM and tool wear prognostics approaches. As indicated by Theissler et al. [406], the lack of access to real-world datasets, together with the high cost and effort of data acquisition, makes fair benchmarking of novel methods against published ones difficult.

The advantages of multi-sensor approaches and information fusion have been reported in various studies [10, 269, 407, 408], and these methods can be employed to improve the reliability of chatter detection systems. The fusion of different chatter indicators [121], feature reduction and selection [145] or the use of multiple classification methods with data fusion [409] could be useful for determining transient states, detecting chatter early or identifying the type of instability.

6.3 Hybrid approaches and parameters optimisation

Model-based chatter prediction can significantly facilitate online chatter detection. However, the dynamic parameters used to compute the stability lobe diagram (SLD) often contain inaccuracies, which limit the usefulness of the predicted stability parameters for real-time chatter detection. One example is that most studies ignore the possible occurrence of stability islands when labelling data. Li et al. [134, 157] identified a few cutting conditions where the experimental results did not fully match the stability states predicted by the conventional SLD. Hence, hybrid chatter detection approaches and novel AI algorithms may lead to promising solutions. For instance, Hanachi et al. [410] fused data-driven and physics-based results to enhance tool wear prognosis and RUL estimation in milling, while Rahimi et al. [149] and Liu et al. [140] proposed hybrid chatter detection systems that combine data-driven and physics-based methods. Oleaga et al. [411] employed three different machine learning techniques to predict the chatter frequency and critical depth of cut in milling, and Postel et al. [412] used DL with ensemble transfer learning to improve SLD prediction using experimental data.

The optimal and automatic selection of signal processing parameters is a current research trend in chatter detection, especially for VMD and WPD. Each dataset may require its own parameter setup throughout the monitoring system, including the decomposition parameters and the hyperparameters of the various ML classification models. There is no consensus in the literature about the features that best represent chatter occurrence, and the features with the highest response to a tool condition are not necessarily optimal for accurate classification [370, 413]. The use of various optimization methods throughout the chatter detection process is an emerging trend, as discussed in Section 5.3.

6.4 Industry 4.0 and Digital Twins

In recent years, Industry 4.0 has been the latest trend in manufacturing towards automation and data exchange, thanks to rapid advances in cyber-physical systems (CPS), the internet of things (IoT), cloud computing and artificial intelligence. The success of a smart factory relies heavily on the functionality and reliability of manufacturing equipment such as machine tools [8, 46, 47], so machining process monitoring is crucial. Li et al. [6] introduced a system framework based on Industry 4.0 concepts for fault diagnosis and prognosis in machining centres using data mining approaches. However, this work is still at an early conceptual stage, and, as the authors pointed out, considerable challenges remain. In the context of Industry 4.0, the wide use of instrumented machine tools, together with the proliferation of sensor technology and the emergence of the IoT paradigm, makes large volumes of manufacturing data available, bringing Big Data to the shop floor. Recently developed DL algorithms offer broad opportunities, as well as many challenges, in handling this Big Data to extract useful information about the machining process status, such as the stability condition and chatter onset [68].

Moreover, the development of intelligent machine tools able to monitor and control multiple process modules has been considered key to building the next generation of smart factories in Industry 4.0 [7]. Over the last 10 years, intelligent spindles with sensing, decision-making and control capabilities have been investigated, aiming to achieve optimum machining processes and reliable operation. A major expected function is the monitoring and control of chatter [7]. In the concept of such a spindle, displacement sensors and accelerometers are integrated into the spindle structure to measure vibration signals. The measured signals are analysed by advanced signal processing techniques in the decision-making module for chatter detection or prediction. When chatter occurs, the control module is activated to achieve active chatter control of the intelligent spindle [7]. Smart cutting tools have also been reported [414], including force-based smart cutting tools, cutting temperature-based internally cooled cutting tools, fast tool servos (FTS) and smart collets for ultraprecision and micro-manufacturing, as well as instrumented smart spindles [415]. These tools can provide real-time in-process data such as cutting forces and can therefore respond more proactively to machining dynamics, process variations and complexity. Hence, in next-generation smart machining systems equipped with such tools, chatter detection and control capabilities are expected to be further enhanced by novel sensors and their integration at the factory level. However, extensive research is still needed before such intelligent spindles and smart tools can be employed in industrial practice.

A Digital Twin is a core component of a CPS that can accurately capture the behaviour and real-time state of the manufacturing system [416]. Ladj et al. [416] proposed a knowledge-based Digital Shadow as the core component of a future Digital Twin, demonstrating tool failure detection as a case study. Ward et al. [417] presented a machining Digital Twin capable of real-time adaptive control. The digital machining platform consists of several modules, including an online machining-induced residual stress (MIRS) control system, a chatter detection and control system and an adaptive feed rate control system. The method uses real-time model-based simulations to predict stability conditions, and closed-loop feedback updates the spindle speed and feed rate in real time to prevent chatter without additional sensors. Nevertheless, this approach relies on the accuracy of the stability model used in the simulation, and its computational demands in practical machining operations are high. Furthermore, the literature on chatter detection using Digital Twins with simulated signals remains very limited. Given the importance of Digital Twins in the Industry 4.0 paradigm, this topic deserves more attention.

7 Conclusions

This paper presents a comprehensive review of recent literature on chatter detection methodologies for milling. It has critically analysed the diverse reported techniques, including data acquisition and the sensor types employed, stability models for experimental design, signal processing methods, feature extraction and condition classification models. The review shows that recent research has concentrated on developing new signal processing techniques and defining new chatter indicators for single-sensor approaches. Vibration is the most widely used signal type, but detection accuracy improves when multiple signals and more than one feature are used. Multi-sensor approaches and information fusion are therefore current topics of research.

Regarding signal processing, VMD and WPD are among the most popular methods. Significant research effort is devoted to the automatic selection of their optimal decomposition parameters. Another trend is the use of novel signal processing techniques to overcome the limitations of VMD, with singularity analysis, SST, EWT, cyclostationarity analysis and angular sampling emerging as alternatives. Deep learning has recently been used in combination with image representations, while supervised machine learning techniques are combined with diverse signal processing methods.

The literature analysis suggests that the selection of features and thresholds appears to depend on the experimental conditions and the acquired data, owing to the lack of open datasets. At present, there is no agreement on the best feature for detecting chatter, which has motivated the continued study of new alternatives. However, several studies report a benefit from using multiple features and information fusion instead of a single chatter indicator. The performance of a chatter detection system depends on the signal type employed, the signal processing technique, the feature extraction, and the selection and training of the classification model. At present, no single configuration can be claimed to be optimal. The best choice depends on the application itself, the selection of chatter indicators, the experimental conditions, and the size and segmentation of the data.

Research challenges and future perspectives have been discussed and can be summarized as follows:

- simultaneous monitoring of different tools and machining conditions using multiple sensors and information fusion;
- development of a robust monitoring system that provides accurate and reliable chatter detection under varying machining conditions;
- integration of model-based chatter prediction into the online chatter detection system;
- use of novel AI algorithms for the optimal and automatic selection of signal processing and classification parameters;
- utilization of Big Data on the shop floor and of smart manufacturing systems within the Industry 4.0 framework;
- development of a machining Digital Twin capable of real-time chatter detection and prevention or adaptive control.

In addition, research on novel signal processing methods and chatter features will remain a topic of interest.