Identification of abnormal tribological regimes using a microphone and semi-supervised machine-learning algorithm

Functional surfaces in relative contact and motion are prone to wear and tear, resulting in loss of efficiency and performance of the workpieces/machines. Wear occurs in the form of adhesion, abrasion, scuffing, galling, and scoring between contacts. However, the rate of the wear phenomenon depends primarily on the physical properties and the surrounding environment. Monitoring the integrity of surfaces by offline inspections leads to significant wasted machine time. A potential alternate option to offline inspection currently practiced in industries is the analysis of sensors signatures capable of capturing the wear state and correlating it with the wear phenomenon, followed by in situ classification using a state-of-the-art machine learning (ML) algorithm. Though this technique is better than offline inspection, it possesses inherent disadvantages for training the ML models. Ideally, supervised training of ML models requires the datasets considered for the classification to be of equal weightage to avoid biasing. The collection of such a dataset is very cumbersome and expensive in practice, as in real industrial applications, the malfunction period is minimal compared to normal operation. Furthermore, classification models would not classify new wear phenomena from the normal regime if they are unfamiliar. As a promising alternative, in this work, we propose a methodology able to differentiate the abnormal regimes, i.e., wear phenomenon regimes, from the normal regime. This is carried out by familiarizing the ML algorithms only with the distribution of the acoustic emission (AE) signals captured using a microphone related to the normal regime. As a result, the ML algorithms would be able to detect whether some overlaps exist with the learnt distributions when a new, unseen signal arrives. To achieve this goal, a generative convolutional neural network (CNN) architecture based on variational auto encoder (VAE) is built and trained. 
During the validation of the proposed CNN architecture, acoustic signals corresponding to the normal and abnormal wear regimes were identified with accuracies of 97% and 80%, respectively. Hence, our approach shows very promising results for in situ and real-time condition monitoring or even wear prediction in tribological applications.


Introduction
Wearing or deterioration of surface integrity between two mating surfaces is common as they experience interfacial friction [1]. Interfacial friction over a period of time results in progressive wear wherein local asperities get deformed and the chemical composition of the mating surfaces changes [1]. The nature of deformation at the local asperities is primarily governed by the physical and geometrical properties of the sliding surfaces [2]. The deformation and/or deterioration results in loss of material leading to deviation from original design dimensions, which may cause vibrations of the moving parts. Alteration of the chemical composition, such as forming oxide layers in functional surfaces, influences the stress state on the asperities [3]. In terms of maintenance, diagnosing in situ and real-time changes in the mating surfaces' asperities and composition will significantly increase the machinery's life span. Apart from increasing the life span, it also optimizes energy consumption loss during the process [4].
The term "scuffing" describes an adhesive wear mechanism that leads to severe wear damage and catastrophic failure of a sliding surface, e.g., in journal bearings [5]. Despite extensive research over the past decades [3,6,7], the underlying mechanism of scuffing remains poorly understood due to its complexity and sudden occurrence. As early as 1937, Blok [8] proposed that scuffing is initiated when a critical temperature is reached. Other theories include the breakdown of elastohydrodynamic lubrication [9], asperity deformation [3], and the local reduction of iron oxide at increased temperatures [7]. Abrasive wear occurs between surfaces that differ in hardness and are in relative motion, and it is the main source of abrasive debris and surface deformation [10]. Debris particles between surfaces and deformation of surfaces tend to restrict the motion of the sliding surface, deteriorating the performance and simultaneously producing heat [11]. Scoring is a severe form of abrasive wear characterized by the formation of grooves in the direction of the sliding motion [12]. Adhesive wear is another form of tribological malfunction, in which the materials in relative motion diffuse to the opposing surfaces. It is usually the consequence of lubricant starvation, resulting in the formation of cold welds [13].
In the present work, the tribological behaviour of a linearly oscillating "self-lubricating" bushing was studied. In contrast to a rotary bearing operation, the sliding in axial direction does not generate enough hydrodynamic pressure between bushing and shaft. A stable lubricated operation relies on hydrostatic pressure build-up, either through an external pressurized lubricant supply or, as in the present application (see Section 2.1), through an even distribution of the lubricant supplied by the lubricant reservoirs provided by the "self-lubricating" bushing. Lubricant film thickness is not very well defined in this setup, as the release of the lubricant from the embedded lubricant reservoirs is not controlled externally. Mixed friction/lubrication conditions, i.e., partial direct contact between the bushing and the shaft, may occur temporarily. A local breakdown of the lubricant film may lead to a solid contact between the bushing and the shaft. This usually causes higher levels of friction forces together with higher levels of frictional heat generation, generation of wear particles [14], vibration and noise emission [15], and consequentially scuffing.
An effective means to ensure the efficiency of machines with moving components experiencing interfacial friction is to monitor the states of the mating surfaces. The complexity of the wear mechanism has usually restricted the measurement of wear status to offline techniques. Offline techniques rely either on visual inspection or on sophisticated devices such as a confocal microscope or scanning electron microscope (SEM) to measure the wear [16,17]. This kind of time-discrete offline inspection requires the component to be disassembled from the machine, inspected, and reassembled. Such offline inspection results in loss of useful machining time, requires a skilled workforce, and adds the risk of damaging the parts during the unmounting and remounting operations. Owing to the limitation in accessing the mating surfaces directly in real time, a recent trend in wear monitoring is to place sensors in the proximity of the process zone that are capable of capturing the surface state [18,19]. The sensor data can then be interpreted and correlated to different wear mechanisms [20].
Over the years, acoustic emission (AE) systems have been increasingly used in industry for monitoring tribological phenomena due to their ability to detect minor material changes. They pick up elastic waves that originate from plastic deformation and other events related to wear [21,22]. The two major advantages of AE are that it is a non-destructive monitoring technique and that its installation requires minimal alteration of the machine. AE signals from sliding surfaces have been used successfully to characterize crack growth [21,23,24], slip between surfaces [25], phase transformation [26,27], surface damage [28], accumulation of wear debris [29,30], etc. AE waveform features such as fast Fourier transforms (FFT) [23,31,32,33], root mean square (RMS) [34,35], amplitude [34], count [23,28], etc. have been correlated with specific tribological conditions. Conventional frictional force and coefficient of friction measurements have also been proven to quantify wear [36,37]. Studies on temperature measurements using imaging systems between the sliding contacts have also demonstrated the ability to quantify tribological conditions [38]. Additionally, techniques based on electrical resistance and vibrations are used in diagnosing abnormal wear states [39][40][41][42]. Finally, sliding interaction between asperities results in instantaneous loading and unloading that lasts only a few microseconds and can be picked up only by AE sensors. This combination of high sensitivity and temporal resolution makes AE sensing preferable to other techniques and sensors [43].
The use of machine learning (ML) algorithms along with sensor data for decision making in tribological systems, allows bridging the gap of time-discrete offline inspection and continuous wear progression monitoring. Baccar et al. demonstrated that features from continuous wavelet transform (CWT) in AE signals could be classified into five wear process stages (run-in, steady-state, surface changes, permanent wear, and wear-out) with the help of fuzzy model [44]. Shevchik et al. have developed an in situ and real-time monitoring system successfully to distinguish the steady-state, pre-scuffing, and scuffing regimes for a simulated tribo-condition involving stainless steel and grey cast iron. They analyze AE data with support vector machines (SVM) based on radial basis kernel and random forest (RF) [45,46]. On the same tribosystem, they also proposed a monitoring system based on diffusion maps to predict the sliding surfaces state, including scuffing regimes [47]. Histogram features extracted from digital images have been used along with Naïve Bayes and decision trees to predict scratches and defects on sheet metal surface [48]. Progressive wear in bearings has been extensively researched using ML models successfully [49][50][51]. Sadegh et al. have used artificial neural networks (ANNs) and genetic algorithm (GA) for classification of the lubrication conditions in journal bearings successfully [52]. Deep neural network (DNNs) have also been reported to classify vibration data of the normal and five abnormal conditions in journal bearings [53]. Based on the literature results, we can conclude that when the sensors carry the representative signal information of the tribological conditions, any state-of-the-art ML algorithm should identify the significant patterns for modelling and monitoring the sliding state conditions [54].
An autoencoder is a neural network architecture consisting of two parts, namely an encoder and a decoder. The encoder compresses the input into a sparse representation that preserves as much information as possible. The decoder tries to reconstruct the original data from the sparse representation. The primary objectives of an autoencoder architecture are to learn the input distribution and the structure embedded in the data, and to reconstruct the input effectively [55]. The network's key attribute is the design of the sparse representation layer, the so-called bottleneck or latent space layer. During autoencoder training, the loss is computed as the difference between the input (X) and the reconstructed input (X'). The computed loss is backpropagated to update the weights of the neurons. Although autoencoders learn the input space distribution, the bottleneck layer learns only a single value for each dimension, which limits the network's learning capability to the data it is exposed to. An autoencoder that learns the input distribution in a probabilistic manner rather than as a single value is called a variational auto encoder (VAE). Unlike a standard autoencoder, where a single value is output for each dimension, the encoder part of a VAE gives two vectors describing the mean and variance of the latent state distributions, as shown in Fig. 1. The sparse representation or the bottleneck layer is built by randomly sampling from this distribution. Such sampling reduces the chance of overfitting, helps in interpolating, and regularizes the latent space [56]. Autoencoders and VAEs find application in anomaly detection, denoising, information retrieval, etc. [57][58][59][60].
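The sampling step described above can be illustrated with a minimal NumPy sketch. It assumes the common convention in which the encoder outputs a mean vector and a log-variance vector; the function name `sample_latent` and the numerical values are hypothetical and are not taken from the paper.

```python
import numpy as np

def sample_latent(mu, log_var, rng):
    """Draw z = mu + sigma * eps with eps ~ N(0, I)."""
    sigma = np.exp(0.5 * log_var)          # standard deviation from log-variance
    eps = rng.standard_normal(mu.shape)    # noise independent of the encoder output
    return mu + sigma * eps

rng = np.random.default_rng(0)
mu = np.array([0.5, -1.0, 2.0])            # hypothetical encoder means
log_var = np.array([-20.0, 0.0, 1.0])      # hypothetical encoder log-variances
z = sample_latent(mu, log_var, rng)        # one latent sample per dimension
```

With a very small variance (first dimension) the sample stays close to the mean, whereas larger variances spread the samples and thereby smooth the latent space.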
With little knowledge of the cause of the wear mechanism [61], simulating the wear mechanisms and obtaining a corresponding sensor signature in a laboratory or industry setup for training ML models is a cumbersome and expensive task. As an alternative, this gap can be compensated using semi-supervised ML, where the model is trained only with data that can be simulated and that are of interest, i.e., the normal regime. In this work, instead of training the ML algorithm to classify all wear mechanisms such as scuffing, abrasive wear, etc., that occur between sliding contacts, we propose a VAE architecture to distinguish the normal regime from all other wear regimes, i.e., binary classification. The signals collected from the airborne acoustic sensor are used for training the VAE architecture. Subsequently, signals from the real environment are used to identify the status of wear, i.e., normal or abnormal regimes. The paper is organized into five sections. Section 1 gives a brief outline of the wear phenomena, sensor technology, and machine learning algorithms used for in situ monitoring of abnormal tribological contacts. Section 2 introduces the experimental setup and the proposed methodology. Analysis of the acoustic signal acquired during the experiment is reviewed in Section 3. Section 4 presents the results of the abnormal regime prediction using the trained VAE model. Finally, Section 5 summarizes the findings of our contribution.

Tribological setup and experimental conditions
Experiments were carried out on a custom-built transversally oscillating tribometer setup. The tribological pairing consisted of a "self-lubricating" bronze bushing (beforehand infused with lubricant) and a steel shaft, see Fig. 2. The bronze bushing has an inner diameter of 24 mm, wall thickness of 8 mm, and a length of 30 mm. The shaft of diameter 24 mm made of hardened and polished Cr-steel was mounted on a movable table, capable of performing oscillations with variable frequencies and stroke lengths by pneumatic cylinders. For the current experiments, the tribometer was operated at a nominal oscillation frequency of 1 Hz and a stroke length of 50 mm. The real oscillation frequency varied depending on the axial load due to the pneumatic drive and the wear phenomenon. For instance, during scuffing in the bearing, there was a reduction in the operating frequency. A normal load of 6 kN was applied with a lever system's help, gradually increasing over an initial duration of 1.5 hours ("run-in" period). Out of the tests performed, one example test featuring the required failure states was selected for this investigation.
The tribometer was equipped with a precision Bruel & Kjaer 4189-A-021 microphone to record airborne sound emissions (see Fig. 2). As highly resolved temporal data reveal hidden details in tribological contacts [32,62], a state-of-the-art high speed data acquisition board cDAQ 9174 from National Instruments was employed, and AE data were recorded. Axial and normal force (mounted on the loading lever) data were recorded using load cells (U9C) and a 24-bit bridge module with a maximum data acquisition rate of 50 kS/s for the purpose of computing the ground truths. The acquisition frequency was set to 5 kHz. Furthermore, the sample temperature was measured using a thermocouple mounted inside the rim of the bearing. An overview of the sensor parameters is shown in Table 1.
Experiments were carried out until a stop criterion was reached, i.e., when either the axial force or the sample temperature exceeded a pre-defined threshold value, namely 3.5 kN for the axial force or 150 °C bushing temperature. Typical running times of the experiments lie between 10 and 12 hours or around 40,000 cycles.

Data evaluation methodology
The proposed methodology for predicting the abnormal regime with the VAE model consists of three steps. The first step involves the construction of the network and the choice of the input size. The second step consists of randomly splitting the data from the normal regime into 70%, 20%, and 10% subsets, followed by training of the network. To do so, 70% of the normal regime data is fed to the VAE model for training. During training, the reconstruction loss decreases with epochs, which signifies that the network has started to learn the distribution and embedded patterns inside the normal regime's acoustic signals. After model training, the 20% subset of the normal regime data is fed into the model, and the reconstruction loss is computed. A threshold is calculated from the resulting reconstruction loss distribution as the sum of its mean and three standard deviations. The third and final step is passing the known signals corresponding to scuffing and wear, together with the 10% subset from the normal regime, into the model and comparing the reconstruction loss with the threshold. The signals with a lower reconstruction loss than the computed threshold are labelled as the normal regime. In contrast, the ones with a higher reconstruction loss are labelled as the abnormal regime. The intuition behind our approach is that the model learns the distribution of the data it has been trained on and would fail to reconstruct any unfamiliar distribution. The schematic of the proposed methodology is shown in Fig. 3.

Experimental results
Figure 4 shows the evolution of the axial force signal during the experiment trial. During the run-in, i.e., the first ≈ 5,000 cycles, the normal load was increased slowly until the nominal load was reached. The maxima of each cycle were taken and plotted against the cycle number.
The graph shows occasional spikes of the axial force even during normal operation, which may be ascribed to short-time metal-to-metal contact due to insufficient lubrication as well as to the detachment of wear debris from the edges of the lubricant reservoirs (see Fig. 5) and subsequent transport through the contact zone. The experiment reached a phase of abnormal behaviour after about 23,000 cycles, which is characterized by pronounced peaks and elevated axial force values. The experiment ended, exceeding the pre-set threshold value of 3,500 N. However, abnormal behaviour was detected in the force signal about 2,500 cycles before the experiment stopped.   ). These maxima are distributed equally over time during normal operation, and a cycle length of about 1 second can be derived from the graph, which corresponds to the set operation frequency. In contrast, the microphone signal taken towards the end of the experiment (Fig. 6(b)) shows an asymmetric distribution of the sound emitted at the turning points (narrow peaks up to 1 V), and the cycle length is almost double its original value, corresponding to a frequency of ≈ 0.5 Hz. It has to be noted that additional increased sound emissions were also recorded between the turning points (peaking at ~0.5 V), i.e., during the movement of the bushing. Ongoing wear phenomena must have emitted these during the pass over the lubricant reservoirs. Figure 7 shows a comparison of the energy density between the normal and abnormal operation signals in five different bands, namely 0-2 kHz, 2-4 kHz, 4-6 kHz, 6-8 kHz, and 8-10 kHz. The energy density was computed for a window size of 5,000 data points. The comparison plot in Fig. 7 reveals that the signals corresponding to two regimes have distinct distributions motivating the use of autoencoder architecture. Our approach is to familiarize a generative network, such as VAE, to understand the distribution corresponding only to the normal regime. 
Then, the trained VAE model would evaluate whether some overlaps exist with the learnt distributions when a new, unseen signal arrives. Based on the overlapping score, the AE signal's characteristic can be detected, allowing us to differentiate the normal to the abnormal regime.
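A band-wise energy comparison such as the one in Fig. 7 could be computed along the following lines. This is a minimal sketch, not the authors' implementation: it assumes that the energy in a band is the sum of the squared FFT magnitudes of one 5,000-point window sampled at 20 kHz, and the helper name `band_energies` and the synthetic test tone are hypothetical.

```python
import numpy as np

def band_energies(window, fs, edges):
    """Return the spectral energy of one window in each [lo, hi) band (Hz)."""
    spectrum = np.abs(np.fft.rfft(window)) ** 2           # power per frequency bin
    freqs = np.fft.rfftfreq(len(window), d=1.0 / fs)      # bin centre frequencies
    # Half-open bands: the Nyquist bin at exactly edges[-1] is not counted.
    return np.array([spectrum[(freqs >= lo) & (freqs < hi)].sum()
                     for lo, hi in zip(edges[:-1], edges[1:])])

fs = 20_000                                    # sampling rate after downsampling
edges = [0, 2_000, 4_000, 6_000, 8_000, 10_000]  # the five bands of Fig. 7
t = np.arange(5_000) / fs
window = np.sin(2 * np.pi * 3_000 * t)         # synthetic 3 kHz tone, not measured data
energies = band_energies(window, fs, edges)    # energy lands in the 2-4 kHz band
```

Applying the same function to windows from the normal and abnormal regimes and comparing the resulting five-element vectors would give a comparison of the kind shown in Fig. 7.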
The software interface used to acquire and store the data from the microphone and the load cell was NI LabVIEW. The acoustic emission data and the force-sensor ground truths for each cycle were synchronized offline. Prior to using the normal regime's raw acoustic emission as an input to the convolutional neural network (CNN) model, downsampling and preprocessing were performed. The downsampling of the microphone data was based on the frequency components present in the signals. As the signals corresponding to the normal and abnormal regimes carried frequency components up to 10 kHz, the raw microphone data, originally sampled at 102.4 kHz, were downsampled to 20 kHz, satisfying the Nyquist-Shannon sampling theorem [63]. A 20 kHz Butterworth low-pass filter was also applied to the raw signal, as the sensor's operating range was well within 20 kHz. The preprocessed microphone data from the normal regime were split into sliding windows of 5,000 data points, constituting a data set of 4,000 rows.
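The windowing step, together with the 70/20/10 split of the normal regime data described in the methodology, can be sketched as follows. The sketch assumes non-overlapping windows (the overlap of the sliding windows is not stated in the text) and uses synthetic noise in place of the microphone data; `make_windows` and `split_rows` are hypothetical helper names.

```python
import numpy as np

def make_windows(signal, size):
    """Cut a 1D signal into non-overlapping rows of `size` samples."""
    n = len(signal) // size                    # incomplete tail is discarded
    return signal[: n * size].reshape(n, size)

def split_rows(rows, fractions, rng):
    """Randomly split rows into consecutive parts with the given fractions."""
    idx = rng.permutation(len(rows))
    cuts = np.cumsum([round(f * len(rows)) for f in fractions[:-1]])
    return [rows[part] for part in np.split(idx, cuts)]

rng = np.random.default_rng(0)
signal = rng.standard_normal(5_000 * 40)       # synthetic stand-in for normal-regime audio
rows = make_windows(signal, 5_000)             # 40 rows of 5,000 points each
train, val, test = split_rows(rows, (0.7, 0.2, 0.1), rng)
```

Here `train` would be used to fit the model, `val` to derive the reconstruction threshold, and `test` to check the normal regime predictions.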

Failure prediction
A 10-layer CNN architecture was selected to build the VAE model, with 5 layers corresponding to the encoder (E) and the remaining 5 layers to the decoder (D), as illustrated in Fig. 8. The input layer (or the first layer) of the VAE model accepts the preprocessed acoustic time-series signal with a size of 5,000 data points converted into a tensor format of size 1 × 1 × 5,000 × batch size. The batch size represents the number of samples that will be propagated into the network at a time. The sparse representation (or the bottleneck layer) was designed with a size of 1 × 90 × 27. Built-in functions from the PyTorch library [64] were chosen to do the 1-dimensional (1D) convolution, batch normalization, and activation. The tanh activation function was employed to introduce nonlinearity in the model training. The decoder part (D) of the VAE model was the symmetric inverse of the encoder. However, due to the upscaling from the bottleneck layer to the final output layer, 1D transpose convolution from the PyTorch library was preferred instead of a 1D convolution. As shown in Fig. 8, filters of four different sizes, namely (3,3), (5,5), (7,7), and (9,9), were used to perform the upsampling and downsampling.
The VAE model was trained in a hardware-accelerated graphical processing unit (GPU) environment, namely an NVIDIA® Titan. The training process comprises three stages. In the first stage, the input signal (1 × 1 × 5,000 × batch size) was forward passed into the model and reconstructed. A batch size of 100 was used for the training. The second stage involved computing the difference between the input and the generated output with a loss function based on the mean squared error (MSE) coupled with the Kullback-Leibler (KL) divergence loss. The third and final stage of the training process involved back-propagating the loss to alter the network's weights so as to reduce its magnitude. The parameters used for the model training are listed in Table 2. The VAE was trained using the Adam optimizer with a dropout rate of 0.5 and a learning rate of 0.001 for 250 epochs.
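The loss used in the second training stage can be illustrated with a short NumPy sketch. It assumes the standard closed-form KL divergence between a diagonal Gaussian and a unit Gaussian and an unweighted sum of the two terms; the weighting factor `beta` and the function name `vae_loss` are assumptions, since the relative weighting of the terms is not stated in the text.

```python
import numpy as np

def vae_loss(x, x_rec, mu, log_var, beta=1.0):
    """Return (total, mse, kl) for a batch of inputs and reconstructions."""
    mse = np.mean((x - x_rec) ** 2)                     # reconstruction term
    # KL(N(mu, sigma^2) || N(0, 1)), summed over latent dims, averaged over batch
    kl = -0.5 * np.mean(np.sum(1 + log_var - mu ** 2 - np.exp(log_var), axis=1))
    return mse + beta * kl, mse, kl

x = np.ones((4, 5_000))                                 # hypothetical batch of 4 windows
zeros = np.zeros((4, 8))                                # hypothetical 8-dimensional latent
total, mse, kl = vae_loss(x, x, zeros, zeros)           # perfect fit, unit Gaussian latent
_, _, kl_shifted = vae_loss(x, x, np.ones((4, 8)), zeros)
```

The KL term vanishes when the encoder output matches the unit Gaussian and grows as the latent means drift away from zero, which is what regularizes the latent space during training.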
The acoustic signals used as input for training the VAE model are shown in Fig. 9. In this figure, the signals enclosed in the green boundaries correspond to the normal regime (70%), which will be used to train the VAE model. The signals enclosed in the red boundaries correspond to wear and scuffing regimes, which will be used to test the trained VAE model on its prediction accuracy. A total of 3,500 signals of length 5,000 data points were used as input for training the algorithm. Though the AE signals were captured at ~100 kHz initially, as discussed in the end of Section 3, they were downsampled to 20 kHz as they carried frequency components up to 10 kHz, as shown in Fig. 7. Figure 10 shows the training loss curves for the VAE model using the time-series signals of length 5,000 data points corresponding to the normal regime. As visualized in Fig. 10, the loss values reduce with iterations (epochs), confirming that the VAE model has learned the AE signals' distributions corresponding to the normal regime. The training lasted for 250 epochs and loss values saturated after 150 epochs.
The reconstruction loss distribution is computed on the remaining 30% of the AE signals corresponding to the normal regime (stable interfacial contact). The reconstruction loss distribution for the normal regime data set is depicted in Fig. 11(a). The overall reconstruction loss distribution lies within the limits of 0 to 0.008. The threshold value of 0.0065 to detect the abnormal regime is calculated from this distribution based on:
Threshold = μ + 3σ    (1)
where μ is the mean and σ is the standard deviation of the reconstruction loss. Any AE signal whose reconstruction loss exceeds the calculated threshold (0.0065) will be flagged as belonging to an abnormal regime. The trained VAE model's predictive ability is assessed by comparing the reconstruction loss of known abnormal regime signals with the threshold value. Figure 11(b) shows the distribution of reconstruction loss for the abnormal regime signals (known ground truth based on the force sensor data discussed in Fig. 4). From Fig. 11(b), it is evident that most of the distribution is concentrated above the computed threshold value of 0.0065, and the VAE model can identify these signals without their being part of the training set. Figure 12 shows the reconstruction of the signals corresponding to the normal and abnormal regimes. In the case of the normal regime, depicted in Fig. 12(a), whose distribution is familiar to the VAE model, the reconstructed signal envelope is nearly identical to the original. However, in the case of the abnormal signals in Fig. 12(b), which the VAE model is unfamiliar with, the reconstruction is poorer, yielding a large difference between the original and reconstructed signals. A total of 500 signals corresponding to the abnormal regime were tested, and the VAE model correctly detected 403 of them, an accuracy of around 80%. In addition, 500 signals corresponding to the normal regime were tested, of which ~97% were identified correctly.
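Equation (1) and the subsequent flagging step can be sketched as follows. The loss values below are synthetic and only illustrate the mechanics; they are not the measured reconstruction losses, and the helper names `threshold_from_normal` and `flag_abnormal` are hypothetical.

```python
import numpy as np

def threshold_from_normal(losses, k=3.0):
    """Eq. (1): mean plus k standard deviations of the normal-regime losses."""
    losses = np.asarray(losses, dtype=float)
    return float(losses.mean() + k * losses.std())

def flag_abnormal(losses, threshold):
    """Return a boolean array that is True where the loss exceeds the threshold."""
    return np.asarray(losses, dtype=float) > threshold

rng = np.random.default_rng(0)
normal_val = rng.normal(0.003, 0.001, size=1_000)    # synthetic normal-regime losses
threshold = threshold_from_normal(normal_val)
abnormal = rng.normal(0.009, 0.002, size=500)        # synthetic abnormal-regime losses
detection_rate = flag_abnormal(abnormal, threshold).mean()
```

In the experiment reported above, the corresponding detection rate for the abnormal regime is 403 / 500 ≈ 0.81.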
Apart from using the tanh activation function, the model was also trained using activation functions such as sigmoid, rectified linear unit (ReLU), and scaled exponential linear unit (SELU). However, based on learning the normal regime signals and reconstructed distributions, it was found that the tanh activation has the lowest reconstruction error. The tanh activation has a low reconstruction threshold value of 0.0065 and so performs better than sigmoid, ReLU and SELU, having reconstruction
Taking into account the proposed VAE architecture with 10 layers, the total number of trainable parameters is 0.131 million. An important step in developing an optimized VAE architecture is to compare it with other configurations in terms of prediction accuracy. The 10-layer VAE architecture was selected after comparing the prediction accuracy of models with different layer configurations (6, 8, 10, and 12 layers) while keeping the number of trainable parameters around 0.13 million and the sparse representation layer size constant. Even numbers of layers were chosen to maintain the symmetry between the encoder and decoder. The accuracy of the different VAE configurations is listed in Table 3. From this table, it is seen that the VAE configurations with 10 layers and 12 layers have similar accuracy. Given that the 12-layer model requires training 7,000 more parameters, the 10-layer model was selected.
Even better accuracy should be achievable if the cycle, sensor, and positional encoder data were synchronized. Under such circumstances, each cycle can be split into moving windows and used as an input to the VAE model. The failure mechanisms between tribological contacts develop over time. As a result, the chances of their overlapping with the normal regime are significant. Therefore, a soft threshold can be defined based on the history before the present cycle. Any drift outside the tolerance of the reconstruction threshold can be used for flagging abnormality, which will be part of our future work. Finally, apart from optimizing the experimental plan, the ML model can be optimized in the choice of the bottleneck layer, the training parameters, and the VAE network itself by adding skip connections, which is also intended for future work.
The proposed method was tested on experimental data under controlled laboratory conditions.