Deep learning-based fault diagnostic network of high-speed train secondary suspension systems for immunity to track irregularities and wheel wear

Fault detection and isolation of high-speed train suspension systems is of critical importance to guarantee train running safety. Firstly, the existing methods for fault detection or isolation of train suspension systems are briefly reviewed and divided into two categories, i.e., model-based and data-driven approaches, and the advantages and disadvantages of these two categories are summarized. Secondly, a 1D convolution network-based fault diagnostic method for high-speed train suspension systems is designed. To improve the robustness of the method, a Gaussian white noise strategy (GWN-strategy) for immunity to track irregularities and an edge sample training strategy (EST-strategy) for immunity to wheel wear are proposed. The whole network is called the GWN-EST-1DCNN method. Thirdly, to show the performance of this method, a multibody dynamics simulation model of a high-speed train is built to generate the lateral acceleration of a bogie frame corresponding to different track irregularities, wheel profiles, and secondary suspension faults. The simulated signals are then fed into the diagnostic network, and the results show the correctness and superiority of the GWN-EST-1DCNN method. Finally, the 1DCNN method is further validated using tracking data of a CRH3 train running on a high-speed railway line.


Introduction
As railway transportation develops rapidly in many regions worldwide, condition monitoring of high-speed trains is receiving increasing attention. Failures of suspension systems increase the vibration of vehicle components, reduce running stability, and may even lead to severe accidents such as derailment [1,2]. Therefore, it is of critical importance to diagnose the faults of railway vehicle suspension systems.

Literature review
Currently, the approaches for fault detection or isolation (FD/I) of railway vehicle suspension systems can be classified into two main categories, i.e., the model-based approach and data-driven approach [3].

Model-based approach
The first reported approaches for railway vehicle suspension fault diagnosis (RVSFD) are mainly model-based approaches. This type of approach usually requires the development of a sophisticated dynamic model to determine the relationship between faulty states and vehicle responses. Data collected by the sensors are then fed into such models to predict the corresponding vehicle dynamic responses. The outputs of the model are compared with the real-time measurement data, and the residual between the measured data and the predicted data is used to identify failures [4]. As shown in Table 1, the model-based approaches with different strategies for FD/I of railway vehicle suspensions over the past two decades can be mainly classified into four sub-categories [5]: (1) Kalman filter-based (KF-based); (2) interacting multiple models-based (IMM-based); (3) Rao-Blackwellised particle filter-based (RBPF-based); and (4) recursive least squares-based (RLS-based).
Table 1 (fragment, KF-based row): [10], Wei et al. [11], Jesussek et al. [12], Jesussek et al. [13], Jesussek et al. [14], Onat et al. [15], Zoljic-Beglerovic et al. [16], Wei et al. [17], and Xu et al. [18].

(1) KF-based [10][11][12][13][14][15][16][17][18]: The Kalman filter (KF) is an effective tool for estimating the state variables of a dynamic system. An early study using KF for fault detection and isolation (FD&I) of vehicle suspension systems is [10]. In this work, a 2D half-vehicle model, which considered the lateral and yaw motions of two wheelsets and one bogie as well as the lateral motion of the carbody, was built. Based on this vehicle model with 7 degrees of freedom (DOFs), a KF-based method was proposed to detect and isolate the faults of the secondary suspension (anti-yaw damper and secondary lateral damper). The results showed that the KF-based method was computationally efficient and could identify abrupt faults of vehicle suspension systems. In [11], a light rail vehicle (LRV) model consisting of three wagons, which considered two DOFs of each carbody (bounce and pitch motions) and one DOF of each bogie (bounce motion), was built. Based on the 9-DOF LRV model, a KF-based method was used to generate residuals for fault isolation of the primary and secondary suspensions (dampers and springs). In [12], based on a vehicle model with the same structure as in [10], a hybrid extended Kalman filter-based (HEKF-based) approach combined with a nonlinear residual generator was proposed for FD&I of the secondary suspension (anti-yaw damper and secondary lateral damper). In [13,14], a three-dimensional vehicle model with 46 DOFs was built, and a multiple Kalman filter-based (MKF-based) approach was applied for FD&I of the secondary suspension system (anti-yaw damper, secondary vertical damper, and secondary lateral damper). With this MKF-based method, high robustness against track uncertainties can be achieved. In [15], based on a 7-DOF model of the ERRI B176 benchmark vehicle, a linear KF scheme was employed to diagnose faults of the secondary vertical suspension. Moreover, it was stated that this method can be used for condition monitoring of the secondary suspension instead of calendar-based maintenance. In [16], based on a two-mass oscillator with 2 DOFs (i.e., one-eighth of the entire vehicle model), a cubature Kalman filter-based (CKF-based) approach was proposed to diagnose the faults of the secondary vertical suspension.

(2) IMM-based [19][20][21]: In [19,20], the IMM approach was proposed to detect the failure of the secondary suspension (secondary lateral damper and secondary lateral spring) of the same vehicle as described in [10]. This approach is similar to the KF-based approaches, but it additionally includes mode mixing. Parallel KFs are no longer separated but interact with each other. The input state-space vector at each time step for a given filter is a combination of the output state-space vectors of all filters at the previous time step. This combination is based on the mode likelihood and given transition probabilities between modes. Based on the above studies, a model updating procedure was proposed in [21] to adapt the baseline model when a fault was detected, and to allow for the identification of simultaneous faults. In this work, a 2D vehicle model with 7 DOFs (the same as described in [10]) and a three-dimensional (3D) vehicle model with 34 DOFs were built, and the IMM approach was used to detect the faults in the secondary suspension (secondary lateral damper and secondary lateral spring).

(3) RBPF-based [22][23][24]: Another classical model-based approach reported in the earlier literature is the RBPF approach. For instance, an RBPF-based approach was proposed in [22] to estimate the secondary suspension parameters (anti-yaw damper and secondary lateral damper) of a half-vehicle model with 7 DOFs. The experimental results showed that the RBPF-based method was more promising than the traditional EKF-based approach. The RBPF-based method, however, is usually computationally expensive, and it is thus more suitable for cases where the detection time is of minor importance. Similar studies were presented in [23,24]. Although this approach relies on MKF, it differs from the previous approaches since the associated model is not selected in advance to represent a fixed fault type and magnitude [5].

(4) RLS-based [25,26]: The above model-based approaches used in RVSFD all bear traces of KF; more precisely, they are more or less based on KF. In [25], KF was not used, but a closely related time-domain filter known as RLS was adopted instead.
RLS is an algorithm equipped with memory and machine learning features, and it can identify multiple parameters simultaneously from an input-output linear system by filtering the error signal between the measured and simulated outputs. In this work, a 3D vehicle model with 42 DOFs was simulated. The field test data from an E464 locomotive were adopted to validate the feasibility of this approach. The results showed that this approach was promising in RVSFD. A similar study was presented in [26].
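The residual idea shared by the KF-based approaches reviewed above can be illustrated with a deliberately small example. The sketch below is not any of the cited methods: it assumes a hypothetical scalar system x[k+1] = a·x[k] + w, z[k] = x[k] + v, and the names `kalman_residuals`, `simulate`, and `mean_square`, the noise variances, and the constant-offset "fault" are all assumptions made for illustration.

```python
import math
import random


def kalman_residuals(measurements, a=0.9, q=0.01, r=0.04, x0=0.0, p0=1.0):
    """Run a scalar Kalman filter and return the normalized innovations.

    Assumed (hypothetical) model: x[k+1] = a*x[k] + w, z[k] = x[k] + v,
    with process noise variance q and measurement noise variance r.
    """
    x, p = x0, p0
    normalized = []
    for z in measurements:
        # Predict the state and its variance one step ahead.
        x_pred = a * x
        p_pred = a * a * p + q
        # Residual (innovation) between the measurement and the prediction.
        s = p_pred + r                      # innovation variance
        residual = z - x_pred
        normalized.append(residual / math.sqrt(s))
        # Correct the prediction with the Kalman gain.
        gain = p_pred / s
        x = x_pred + gain * residual
        p = (1.0 - gain) * p_pred
    return normalized


def simulate(n, a=0.9, q=0.01, r=0.04, seed=0):
    """Generate n noisy measurements from the assumed scalar model."""
    rng = random.Random(seed)
    x, zs = 0.0, []
    for _ in range(n):
        x = a * x + rng.gauss(0.0, math.sqrt(q))
        zs.append(x + rng.gauss(0.0, math.sqrt(r)))
    return zs


def mean_square(values):
    return sum(v * v for v in values) / len(values)


healthy = simulate(2000)
faulty = [z + 2.0 for z in healthy]         # hypothetical fault: constant offset
print(mean_square(kalman_residuals(healthy)))  # near 1 when the model matches
print(mean_square(kalman_residuals(faulty)))   # clearly larger under the fault
```

When the data match the model, the normalized residuals have roughly unit variance; a fault that the model does not describe inflates them, which is what a residual threshold test exploits.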
One of the biggest merits of the aforementioned model-based approaches is that, through mathematical modeling, the relationship between the input (a faulty state) and the output that reflects the system dynamic behavior can be clearly established [27]. This helps researchers and even field staff to understand the diagnostic model, which is helpful for engineering applications. However, the following issues currently limit the development of these model-based approaches:
• High modeling difficulty. Vehicle dynamics models are challenging to build accurately, mainly for two reasons [8]: (1) train suspension systems are often nonlinear, and it is usually extremely difficult to obtain detailed and accurate parameters of the nonlinear elements, such as dampers and springs; (2) in train dynamics simulation, it is difficult to consider the elasticity of the carbody, bogie, wheelset, etc.
• High hardware cost. The above model-based approaches all require a relatively large number of sensors, which makes the hardware used in RVSFD rather expensive and raises concerns about the reliability of the transducers. For instance, the minimum number of sensors used in [24] is 3, and more are used in [28].
• Low computing efficiency. Dynamics simulation involves a large number of nonlinear force calculations and iterative computations, especially when a complicated vehicle-track coupling system needs to be considered, resulting in low calculation efficiency.
In conclusion, the model-based approaches have great potential in RVSFD, but the corresponding vehicle model needs to be accurately established, and the calculation efficiency should also be improved.

Data-driven approach
The data-driven approach does not rely on vehicle simulation models but requires historical tracking data and prior training. As shown in Table 1, the data-driven approaches with different strategies for FD/I of railway vehicle suspensions over the past two decades can be mainly classified into four sub-categories: (1) SM-based; (2) ML-based; (3) HM-based; and (4) DL-based.

(1) SM-based [29][30][31][32][33][34][35]: A classical SM-based approach used in RVSFD is based on the cross-correlation function. In [29], the cross-correlation function between the accelerations of two bogies was applied to determine the health conditions of vehicle suspension systems. The basic idea of this approach is that a faulty element in vehicle suspensions can alter the symmetry of a vehicle with a symmetrical configuration, which results in a coupling relationship between motions that can be observed in the cross-correlation function. Aiming at identifying the failure of the vertical primary suspension, this work analyzed the impact of the faulty damper on the correlation between the bounce, pitch, and roll acceleration signals. Similar studies were presented in [29,30]. If this approach is used to identify different failures of vehicle suspension systems, such as the primary vertical spring failure and secondary vertical damper failure, different cross-correlation functions corresponding to different faults must be observed and counted. This approach, therefore, is also a kind of statistical method. Another interesting method based on a stochastic functional model (FM) of the system dynamics under varying payload was postulated in [32]. Using the model-induced parameter space, the healthy system state under variable operating conditions was represented by a certain parameter subspace, which was constructed in an initial learning phase. In the inspection phase, fault detection was achieved by checking whether the current system dynamics belonged to the healthy parameter subspace or not. Moreover, a conventional statistical time series detection method was also introduced in [32] for comparison purposes, and the experimental results showed the superiority of the FM-based method. Similar studies were presented in [33][34][35][36][37].
In [38], a nonparametric, statistical time-series-based method was proposed to characterize the primary and secondary suspension faults of a self-steering three-piece MKV bogie. The method made use of changes in the vibration signal spectrum, and a verified dynamic simulation model was developed to generate the vehicle suspension acceleration response for the healthy and faulty states. The results showed that, with this method, the damage of the primary, secondary, and stabilizer springs could be detected.

(2) ML-based [39][40][41][42]: Traditional ML methods are mainly composed of two steps: (a) feature extraction; and (b) pattern recognition. In [39], a wavelet entropy-support vector machine (SVM)-based approach was proposed to diagnose the faults in high-speed train suspension systems. More specifically, the wavelet entropy of the bogie frame accelerations was extracted as the feature that reflects the fault states of suspension systems, and SVM was then adopted to classify these different faults. In this work, three types of vehicle suspension faults, including yaw dampers removal (YDR), lateral dampers removal (LDR), and air springs removal (ASR), were considered. In [40], the features used to characterize different signals were the dominating frequency along with the corresponding relative damping coefficient, the root mean square (RMS) of the lateral bogie frame acceleration, and the mean ratio of the axlebox acceleration to the bogie frame acceleration. Two classifiers (i.e., linear SVM and Gaussian SVM) were used for the FD&I of yaw dampers of high-speed trains. The simulation results showed that both classifiers could identify the faulty yaw dampers well. Moreover, the Gaussian SVM classifier performed slightly better in the training and testing phases, while it had a higher risk of overfitting to the current dataset. In [41], to diagnose the faults of the lateral suspension system of railway vehicles, four time-domain features (mean, standard deviation, skewness, and kurtosis) and three frequency-domain features (frequency center (FC), root mean square frequency (RMSF), and root variance frequency (RVF)) of the bogie lateral accelerations were extracted. After that, three classifiers (Dempster-Shafer (D-S) evidence theory, Fisher discrimination analysis (FDA), and SVM) were applied to the fault classification. The results showed that all three classifiers could classify the faults with high accuracy, and that the D-S evidence theory outperformed the other two classifiers. In [42], to monitor the stiffness and damping coefficients of the vehicle suspension systems of high-speed trains in real time, the position, height, and width of the largest peak in the magnitude spectrum of the axlebox accelerations were considered as the input features. The model used in this work was a multi-output support vector regression (MSVR). Besides, it was also stated that, unlike the model-based approaches, this data-driven approach did not rely on accurate dynamics models.

(3) HM-based [3,27]: The HM-based method refers to the combination of signal processing methods and ML-based methods. For instance, in order to diagnose the faults in the secondary suspension systems of high-speed trains, a feature extraction method based on multiscale permutation entropy and linear local tangent space alignment (MPE-LLTSA) was proposed in [3]. More specifically, a preliminary high-dimensional feature matrix was constructed using MPE, and LLTSA was then used to reduce the dimensionality and obtain a low-dimensional feature matrix. The classifier used in this work was a multi-class SVM. The results showed that the MPE-LLTSA-SVM method could accurately recognize the secondary suspension faults when the track irregularities and wheel profiles were relatively constant. However, the robustness against track irregularities and wheel wear was not well solved. In [27], the random decrement technique (RDT) was used to extract the free response of the bogie frame lateral accelerations. The output of the RDT was then analyzed using the Prony method to identify the characteristic exponents of the system. In the fault classification step, two classifiers were compared, i.e., artificial neural networks (ANN) and k-nearest neighbor (k-NN), and the k-NN classifier proved to be more reliable than the ANN classifier.

(4) DL-based [43]: In recent years, DL methods have begun to be applied in various industries, but research on RVSFD is relatively scarce. In [43], to diagnose the faults of suspension systems of high-speed trains, three synchrony measurements (instantaneous phase synchrony, amplitude envelope synchrony, and composite synchrony) were applied to estimate the similarity between bogie acceleration signals, and a synchrony group convolutional network was proposed for feature extraction and pattern classification of the multichannel monitoring system. The effectiveness of the method was validated by a simulation dataset.
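The hand-crafted feature extraction step used by the ML-based approaches reviewed above can be sketched for the four time-domain features named for [41] (mean, standard deviation, skewness, and kurtosis). This is only an illustration: the function name `time_domain_features` is hypothetical, and the use of population moments and non-excess kurtosis is an assumption, since the exact definitions used in the cited work are not given here.

```python
import math


def time_domain_features(x):
    """Return mean, standard deviation, skewness, and kurtosis of a signal.

    Assumes population (1/n) moments and non-excess kurtosis, so a
    Gaussian signal would give a kurtosis close to 3.
    """
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / n
    std = math.sqrt(var)
    skewness = sum((v - mean) ** 3 for v in x) / (n * std ** 3)
    kurtosis = sum((v - mean) ** 4 for v in x) / (n * std ** 4)
    return {"mean": mean, "std": std, "skewness": skewness, "kurtosis": kurtosis}


# Toy acceleration samples (made-up values, not measured data).
print(time_domain_features([1.0, 2.0, 3.0, 4.0, 5.0]))
```

A feature vector of this kind would then be passed to a classifier such as an SVM; the frequency-domain features are omitted here to keep the sketch dependency-free.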
One of the biggest merits of the aforementioned data-driven approaches is that they do not rely on sophisticated dynamics models or high-fidelity simulations. In particular, the data-driven approaches combine the power of data analysis and engineering domain knowledge to generate a model that can be trained quickly and adapted easily to different vehicle suspension systems [42]. Moreover, two model-based approaches (a robust observer, and the KF combined with the generalized likelihood ratio test (GLRT)) and two data-driven approaches (dynamical principal components analysis (DPCA) and dynamical canonical variate analysis (CVA)) were introduced in [8] to detect the faults in the primary and secondary suspensions of an urban rail vehicle. The comparison results showed that the data-driven approaches outperformed the model-based approaches in terms of modeling, computational efficiency, and accuracy. However, the data-driven approaches currently used in RVSFD still face the following challenges:
• Database establishment. Whether from the perspective of data statistics or of model training, data-driven approaches require a massive amount of historical tracking data, which is a common problem facing the entire industry.
• Development of adaptive fault feature extraction methods. As far as ML-based approaches are concerned, the first step is to extract features that reflect the fault status of vehicle suspension systems. In reality, vehicle suspension systems have many nonlinear components [44][45][46][47], including springs, dampers, etc. The nonlinear factors of the vehicle components usually result in acquired signals that contain multiple natural oscillation modes, especially when multiple faults are coupled together [3]. As a result, it is difficult to characterize these nonlinear signals by using traditional single time-domain or frequency-domain feature extraction methods [48][49][50][51].
• High sensitivity of collected signals to track irregularities and wheel wear. The running of a vehicle on a track is achieved through the wheel-rail contact. Track irregularities seriously affect the wheel-rail contact, such as the contact area and contact force, and thereby the vibration signals used for condition monitoring [52]. More importantly, the wheel profile changes continuously as the mileage increases due to wear [53,54], which also seriously affects the vibration signals. In short, the high sensitivity of the collected signals to track irregularities and wheel wear could affect the robustness of data-driven approaches.

Motivation
With the advent of the era of big data, railway companies have started to establish related databases. Specifically, high-speed trains are usually equipped with a large number of sensors, and it is easy to acquire tracking data from these trains, which lays a solid foundation for the study, application, and promotion of data-driven approaches in RVSFD. The train suspension system is a highly nonlinear system [3]. As described in Sect. 1.1.2, it is difficult to accurately characterize these nonlinear signals with traditional single feature extraction methods [55]. To overcome this problem, we proposed the MPE-LLTSA feature extraction method for RVSFD in [3], which can realize the feature extraction of signals at multiple scales. However, a deep understanding of the objects, as well as the corresponding signals, is still a prerequisite, and extensive expertise and data analysis capabilities are required for building this method. Therefore, a simple and adaptive feature extraction method that can overcome the nonlinear interference of the vehicle system is required. In the era of big data, DL is a powerful tool that has been successfully applied in many fields, such as image recognition [56], earthquake prediction [57], transportation planning [58], fault diagnosis of rotating machinery [59], and multibody dynamics simulation (MBS) [59]. Exploring its potential in the fault diagnosis of railway vehicles is a topic of great interest. Motivated by this, this paper aims at developing a DL-based fault diagnosis method for RVSFD.
As described in Sect. 1.1.2, track irregularities and wheel wear affect the vibration signals used for condition monitoring of railway vehicles. Therefore, the developed fault diagnosis method for train suspension systems must be guaranteed to be immune to changes in track irregularities and wheel wear before being put into use. In our previous work [3], we briefly discussed the robustness of the diagnostic method against wheel wear, but it was not studied in depth. In addition, the interference caused by track irregularities was not analyzed. To improve the previous work, these two issues are the main subjects of this study.

Contribution and structure of this paper
The main contribution of this work is summarized as follows: (

The rest of this paper is structured as follows. In Sect. 2, an MBS model of a high-speed train is built, where three measured track irregularities and seven tracking wheel profiles are presented. In Sect. 3, a DL-based diagnosis network for RVSFD is presented. In Sect. 4, the simulation result is presented. In Sect. 5, the presented diagnosis network is validated using the tracking data of a CRH3 train running on a high-speed railway line. Finally, concluding remarks are briefly given in Sect. 6.

Vehicle-track coupled model
Here, the description of the vehicle-track coupled model is divided into two parts: the vehicle-track model (Sect. 2.1) and track irregularities and wheel wear (Sect. 2.2).

Vehicle-track model
The vehicle model built in our work consists of three substructures, one for the carbody and two for the bogies, where each bogie consists of one bogie frame, two wheelsets, and four axleboxes. These rigid bodies are connected by the primary and secondary suspensions. Assuming a constant running speed, the carbody is modeled with 5 DOFs. Each bogie frame and each wheelset has 6 DOFs, and each axlebox only rotates relative to the corresponding wheelset, i.e., it has one DOF. In total, the MBS model of the vehicle has 49 DOFs. The final vehicle model simulated in SIMPACK is shown in Fig. 1a. For more information on the vehicle model, including its main parameters, see Ref. [3].
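The total of 49 DOFs follows directly from the body counts given above (one carbody, two bogie frames, four wheelsets, and eight axleboxes):

```latex
\underbrace{1 \times 5}_{\text{carbody}}
+ \underbrace{2 \times 6}_{\text{bogie frames}}
+ \underbrace{4 \times 6}_{\text{wheelsets}}
+ \underbrace{8 \times 1}_{\text{axleboxes}}
= 5 + 12 + 24 + 8 = 49
```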
The wheelset is supported by two rails, and the wheel-rail contact is computed using the Hertzian contact [61] and FASTSIM [62] algorithms. Simulating the track structure according to the realistic condition (e.g., 'rail → rail slab → concrete base → substructure' [63]) would involve a large number of DOFs, thus increasing the computational effort considerably. Therefore, referring to Refs. [64,65], the track model is simplified as a co-running track of the form 'rail → track slab → ground' [66]. The stiffness and damping of the fastener system are considered between the rail and the track slab, and the stiffness and damping of the cement-asphalt mortar are considered between sleeper and ground, as shown in Fig. 1b. Some parameters of the track model are listed in Table 2.

Track irregularities
The operating environment of a train is complex and changeable, and track irregularities are often not constant [67][68][69]. The developed FD&I method must be immune to the disturbance of track irregularities before it is implemented in actual engineering. To investigate the impact of track irregularities on the robustness of the fault diagnosis method, track irregularities measured on three different high-speed railway lines are introduced, namely the Wuhan-Guangzhou railway line (WG-line), the Beijing-Tianjin railway line (JJ-line), and the Zhengzhou-Xi'an railway line (ZX-line). Figure 2 shows the track irregularities over a 1000 m section of each line.

Wheel wear
When a train is running, the wheel profile changes continuously due to wear, which affects the wheel-rail contact, including the contact force, contact patch size, etc., and further affects the dynamic characteristics of the bogie, including the bogie frame acceleration. The FD&I method may incorrectly attribute the change in the bogie acceleration to a failure of the vehicle suspension system. To analyze the impact of wheel wear on the robustness of the diagnostic method, the wheel profile evolution of a CRH3 high-speed train running on the WG-line is introduced in our work. The total length of the WG-line is 1,068.8 km, the minimum curve radius of the line is 7,000 m, the gauge is 1,435 mm, and the rail cant is 1/40. This line is a typical example of China's high-speed railway lines. In Fig. 3, the new S1002CN profile and the worn S1002CN profiles obtained as the mileage increases are presented. It can be seen that the wear at the flange root is mild, and the wear volume is mainly distributed in the range of -25 to 25 mm relative to the nominal rolling circle of the wheel. Eventually, a "hollow wear" that commonly occurs in high-speed trains develops. More information concerning the tracking measurement of the wheel profile evolution can be found in [53].

Diagnostic network design
The description of the diagnostic network is divided into three subsections. In Sect. 3.1, the design of the deep neural network is described. Two strategies to improve the robustness against track irregularities and wheel wear, i.e., a GWN-strategy and an EST-strategy, are described in Sect. 3.2. The whole structure of the diagnostic network for railway vehicle suspension systems is described in Sect. 3.3.

One-dimensional CNN (1DCNN)
A traditional two-dimensional CNN (2DCNN) is designed to take advantage of the spatial features in 2D images by using locally connected and tied-weights convolutional filters that operate on multiple pixels simultaneously rather than on a single pixel [56,70], and this approach can better detect the dependencies between pixels. In a 2DCNN, 2D input data are first converted into 3D data (width, height, and depth), where the depth is 1 for a one-band image and 3 for a three-band image (red, green, blue). Next, a feature map is obtained by repeatedly applying convolution operators across sub-regions of the entire image, followed by adding a bias term and applying a nonlinear activation function. If the kth feature map at a given layer is represented as $h_k$, whose filters are determined by weights $W_k$ and bias $b_k$, the feature map $h_k$ is then expressed by

$$h_k = f\left(W_k * x + b_k\right)$$

where $x$ is the input of the layer, $*$ denotes the convolution operation, and $f$ is the nonlinear activation function.

However, for 1D time-series data, such as the acceleration data used in this paper, a 1DCNN is usually a better choice [71]. Figure 4 shows the difference between the 2DCNN and 1DCNN: applying a 2DCNN to a 2D image generates a 2D output, whereas applying a 1DCNN to a 1D signal generates a 1D output. The convolutional filter of the 1DCNN is one-dimensional, which enables it to detect the interdependencies in 1D data.
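A single 1D feature map of the kind described above (a filter slid along the signal, a bias added, and a nonlinearity applied) can be sketched in a few lines. This is a minimal illustration rather than the network used in this paper: the function name `conv1d_feature_map`, the "valid" (no padding) sliding window, and the default `tanh` activation are assumptions, and, as in most CNN libraries, the filter is applied without flipping (cross-correlation).

```python
import math


def conv1d_feature_map(x, w, b, activation=math.tanh):
    """Compute one 1D feature map: activation(filter response + bias).

    x: input signal (list of floats), w: filter weights, b: scalar bias.
    Uses a 'valid' window, so the output has len(x) - len(w) + 1 samples.
    """
    k = len(w)
    return [
        activation(sum(w[j] * x[i + j] for j in range(k)) + b)
        for i in range(len(x) - k + 1)
    ]


# Toy signal and a difference-like filter (made-up values).
signal = [0.0, 1.0, 2.0, 3.0, 4.0]
print(conv1d_feature_map(signal, [1.0, 0.0, -1.0], 0.0))
```

In a real layer, many such filters are learned in parallel, one per feature map.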

The Architecture of the designed 1DCNN
The architecture of the 1DCNN model designed in our work is shown in Fig. 5. The proposed model includes 7 main blocks. The first 5 blocks are designed for feature extraction, and each of them consists of a 1DCNN layer, an advanced activation function (AAF) layer, and a max-pooling layer. The last two blocks are fully connected layers designed for classification.

For the design of the 1DCNN layer, existing studies [56,70] have shown that the feature maps should change from wide and shallow to narrow and deep from the input layer to the output layer. This rule has proven to be very effective in many successful CNN models, such as the classic AlexNet [56] and VGGNet [70]. This article, therefore, follows this rule to adjust the number of convolution kernels in the CNN layers, i.e., the number of convolution kernels in each subsequent layer is twice that of the previous layer. This strategy increases the depth of the feature maps from the first block to the last block. In our model, the number of convolution kernels in the first CNN layer (block 1) is set to 32, and the number of convolution kernels in the last CNN layer (block 5) is 512.
Fig. 5 The designed architecture of the proposed 1DCNN model

For the selection of the activation function, most studies on CNN models use the rectified linear unit (ReLU) function [72]. This function, however, has a disadvantage: a too-large learning rate or gradient can easily lead to the "death" of neurons, and the ReLU function often cannot perform well when the nonlinear relationship of the input dataset is very complicated [73]. The vehicle suspension system is a highly nonlinear system, so for signals from such systems, this activation function is not an ideal choice. Therefore, an advanced activation function, the parametric rectified linear unit (PReLU) proposed in [73], is used in our work, and its expression is given by

$$f(y_i) = \begin{cases} y_i, & y_i > 0 \\ a_i y_i, & y_i \le 0 \end{cases}$$

where $y_i$ is the input of the activation function on the ith channel, which is the output of the 1DCNN layer in this work; $a_i$ is a coefficient controlling the slope of the negative part, whose value is learned automatically from the data during the training phase to adapt to datasets with different nonlinear relationships.
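The behavior of PReLU [73], which passes positive inputs unchanged and scales non-positive inputs by a channel-wise slope, can be sketched as follows. The function name `prelu` is hypothetical, and the slope `a` is passed in as a fixed number here, whereas in training it would be a learned parameter.

```python
def prelu(y, a):
    """Parametric ReLU for one value: y if y > 0, otherwise a * y.

    a is the negative-part slope; it is learned during training in a real
    network and fixed here for illustration. a = 0 recovers the plain ReLU.
    """
    return y if y > 0 else a * y


# Made-up activations on one channel with an assumed slope of 0.25.
print([prelu(v, 0.25) for v in [-2.0, -0.5, 0.0, 1.5]])
```

Because the negative part keeps a nonzero slope, a unit that receives negative inputs still passes a gradient, which is the motivation for preferring it over ReLU here.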
The design of the pooling layer is critical for the CNN model since it can significantly reduce the model parameters and the time required for training without sacrificing model accuracy [55,60]. Therefore, after the 1DCNN and activation function layers of each of the first 5 blocks, a local max-pooling layer is added to extract the key features of the 1DCNN layer output and reduce the model parameters. In addition, the stride of each max-pooling layer is set to 2, which reduces the width of the feature maps. Further, to allow the output of the 1DCNN part to be used as the input of the subsequent fully connected layer (block 6), the dimension of this output also has to be transformed. This article uses a global average pooling layer [74] instead of the conventional flatten layer, which further reduces the model parameters and increases efficiency.
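The "wide and shallow to narrow and deep" progression of the five feature-extraction blocks can be traced with a small shape calculation. This is a sketch under stated assumptions: the input length of 1024 samples is hypothetical, each convolution is assumed to preserve the width (same padding), and each max-pooling window is assumed to equal its stride of 2; only the kernel counts (32 doubling to 512) and the stride come from the text. The helper names are illustrative.

```python
def feature_map_plan(input_len, first_filters=32, n_blocks=5, pool_stride=2):
    """List the assumed feature-map width and depth after each block."""
    plan = []
    width, filters = input_len, first_filters
    for block in range(1, n_blocks + 1):
        width //= pool_stride          # pooling with stride 2 halves the width
        plan.append({"block": block, "filters": filters, "width": width})
        filters *= 2                   # next block uses twice as many kernels
    return plan


def global_average_pool(feature_maps):
    """Reduce each channel (a list of values) to its mean value."""
    return [sum(channel) / len(channel) for channel in feature_maps]


for row in feature_map_plan(1024):     # 1024 is a hypothetical input length
    print(row)
print(global_average_pool([[1.0, 3.0], [2.0, 4.0]]))
```

Under these assumptions, global average pooling after block 5 yields one value per kernel, i.e., a 512-element vector regardless of the input length, whereas a flatten layer would yield width times depth values.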
In the first fully connected layer (block 6), the number of neurons is set to 64, and the advanced activation function PReLU is used as the activation function.The number of neurons in the last fully connected layer (block 7) is set to 4 since the number of fault categories in this work is 4 (see Sect. 4), and then, the Softmax [75] classification function is used as the activation function to output the model's predicted probability for each type of failure.
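The Softmax step that turns the four outputs of the last fully connected layer into class probabilities can be sketched as below. The function name `softmax` and the example scores are illustrative assumptions; subtracting the maximum before exponentiation is a common numerical-stability measure and is not stated in the text.

```python
import math


def softmax(logits):
    """Convert raw class scores into probabilities that sum to 1."""
    m = max(logits)                        # shift for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Made-up scores for the 4 fault categories (indices 0..3).
probs = softmax([2.0, 1.0, 0.1, -1.0])
predicted = probs.index(max(probs))        # index of the most likely category
print(probs, predicted)
```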

Two strategies for increasing robustness against track irregularities and wheel wear

Gaussian white noise strategy against track irregularities
Track irregularities affect the dynamic characteristics of railway vehicles, including the signals required for suspension fault diagnosis. Under different track irregularity conditions, the amplitude and frequency distribution of the acceleration are often different, which may affect the robustness of the fault diagnosis method. Figure 6 shows the lateral acceleration of the bogie frame under three different track irregularity conditions when the yaw damper fails (YDF). It can be clearly seen that, although the three signals share roughly the same overall trend, such as the frequency of the peaks, they differ in some relatively high-frequency, low-amplitude impact components between the peaks.
To make the fault diagnostic network immune to these relatively high-frequency, low-amplitude impact components caused by track irregularities, a strategy of adding Gaussian white noise (GWN) to the original signal is proposed, i.e., the GWN-strategy. Adding GWN to the raw signal is common in signal processing and pattern recognition. For instance, it is used with empirical mode decomposition (EMD), and the method named ensemble empirical mode decomposition (EEMD) has been developed on this basis [76]. It should be noted that the amplitude of the noise affects the diagnosis accuracy. However, no specific equation has been reported in the literature to guide the choice of the noise amplitude. Thus, for an investigated signal, different noise levels should be tried to select an appropriate one. In this paper, after many trials, it is suggested that the amplitude of the added white noise be about 0.2 times the standard deviation of the investigated signal. This value is also suggested for EEMD in Ref. [76]. It is important to note that adding white noise does not eliminate the high-frequency components of the signal; rather, it gives all signals high-frequency components, and thus the diagnostic method becomes immune to them. A similar approach is also commonly used in CNN-based image recognition [77]. The feasibility of this strategy is demonstrated in Sect. 4.
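The GWN-strategy can be sketched in a few lines of standard-library Python. The sketch assumes that the "amplitude" of the noise means its standard deviation, which is the usual reading in the EEMD setting; the function name, the seed parameter, and the sinusoidal stand-in for an acceleration record are illustrative assumptions, not the authors' implementation.

```python
import math
import random
import statistics


def add_gaussian_white_noise(signal, ratio=0.2, seed=None):
    """Add zero-mean Gaussian noise whose standard deviation is
    `ratio` times the (population) standard deviation of `signal`."""
    rng = random.Random(seed)
    sigma = ratio * statistics.pstdev(signal)
    return [x + rng.gauss(0.0, sigma) for x in signal]


# Hypothetical stand-in for a lateral acceleration record.
clean = [math.sin(0.05 * n) for n in range(5000)]
noisy = add_gaussian_white_noise(clean, ratio=0.2, seed=0)
```

Since no closed-form rule for the noise level is available, `ratio` is left as a parameter so that several levels can be tried, as the text recommends.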

Edge sample training strategy against wheel wear
As described in Sect. 2.2.2, the wheel profile changes as the train running mileage increases. This process is continuous and always affects the bogie frame acceleration. Figure 7 shows the bogie frame lateral accelerations for a new S1002CN profile, a worn S1002CN profile after running 95,000 km (S1002CN-W95K), and a worn S1002CN profile after running 190,000 km (S1002CN-W190K) when the yaw damper fails; as the wear increases, the vibration amplitude of the bogie acceleration also increases. Therefore, using a model trained on the dataset of the new wheel profile (S1002CN) to identify the datasets of the worn wheel profiles (S1002CN-W95K and S1002CN-W190K) may cause misidentification, i.e., the fault diagnosis method may incorrectly attribute the change in the bogie acceleration to a failure of the vehicle suspension system. To overcome this problem, an EST-strategy is proposed in this paper, i.e., during the phase of training dataset establishment, the dataset corresponding to the new wheel profile (S1002CN) and the dataset corresponding to the most worn wheel profile (S1002CN-W190K) are used together as the training dataset for the fault diagnosis method. With the EST-strategy, the interference of wheel wear with the robustness of the fault diagnosis method can be suppressed, and it is not necessary to train on the dataset of every worn wheel profile that evolves during the running of the wheel (which would, of course, also be unrealistic). The feasibility of this strategy is demonstrated in Sect. 4.
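The EST-strategy amounts to a rule for assembling the training set: draw samples only from the two extreme ("edge") wheel profiles and hold out the intermediate profile for testing. The sketch below illustrates that rule under stated assumptions; the dictionary keys, the function name, and the toy dataset with placeholder signals are hypothetical.

```python
import random

EDGE_PROFILES = ("S1002CN", "S1002CN-W190K")   # new and most worn profiles


def build_est_training_set(samples, n_train, edge_profiles=EDGE_PROFILES, seed=0):
    """Randomly draw `n_train` training samples from the edge profiles only.

    `samples` is a list of dicts with the hypothetical keys
    'profile', 'label' and 'signal'.
    """
    pool = [s for s in samples if s["profile"] in edge_profiles]
    return random.Random(seed).sample(pool, n_train)


# Toy dataset: 3 profiles x 4 states x 5 records, placeholder signals.
profiles = ["S1002CN", "S1002CN-W95K", "S1002CN-W190K"]
states = ["normal", "LDF", "YDF", "Y&LDF"]
dataset = [
    {"profile": p, "label": s, "signal": [0.0]}
    for p in profiles for s in states for _ in range(5)
]

train = build_est_training_set(dataset, n_train=30)
test = [s for s in dataset if s["profile"] == "S1002CN-W95K"]
```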

Diagnostic network of railway vehicle suspension systems
Finally, the architecture of the designed diagnostic network of train suspension systems is illustrated in Fig. 8. The whole process is called the GWN-EST-1DCNN method, and it consists of the following three phases:
Phase I Data preprocessing. In this phase, the bogie frame accelerations concerning different faults are first collected, and the GWN-strategy described in Sect. 3.2.1 is then applied to the original acceleration signals.
Phase II Training dataset establishment. Based on the EST-strategy described in Sect. 3.2.2, the samples corresponding to the new wheel profile (S1002CN) and the samples corresponding to the most worn wheel profile (S1002CN-W190K) are chosen as the training dataset for the diagnostic network, and their upper envelopes are extracted.
Phase III Fault diagnosis and visualization. The 1DCNN designed in Sect. 3.1.2 is trained to classify the different kinds of faults, and the final results are visualized by the Andrews curve [78].
Andrews curve is a method for visualizing high-dimensional datasets by mapping each observation onto a function.
For a k-dimensional dataset, each observation is written as $\xi_i = (x_{i1}, x_{i2}, \ldots, x_{ik})$. The Andrews curve is a plot of $(t, y_{it})$ in the range $t \in [-\pi, \pi]$, where $y_{it}$ is given by
$$y_{it} = \frac{x_{i1}}{\sqrt{2}} + x_{i2}\sin(\lambda_1 t) + x_{i3}\cos(\lambda_1 t) + x_{i4}\sin(\lambda_2 t) + x_{i5}\cos(\lambda_2 t) + \cdots$$
where $\lambda_j = j$, $j = 1, 2, \ldots$ Andrews curves that lie close together indicate that the corresponding data points are also close together, and thus the Andrews curve is suitable for visualizing the clustering and classification of high-dimensional datasets.
More information concerning Andrews curve can be found in [79].
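As an illustrative sketch, the standard Andrews definition (first component divided by the square root of 2, followed by alternating sine and cosine terms with harmonics 1, 1, 2, 2, ...) can be evaluated with the standard library alone. The function names and the number of evaluation points are assumptions for illustration.

```python
import math


def andrews_value(x, t):
    """Evaluate the Andrews function of observation `x` at angle `t`."""
    y = x[0] / math.sqrt(2.0)
    for j, value in enumerate(x[1:], start=1):
        harmonic = (j + 1) // 2            # 1, 1, 2, 2, 3, 3, ...
        if j % 2 == 1:
            y += value * math.sin(harmonic * t)
        else:
            y += value * math.cos(harmonic * t)
    return y


def andrews_curve(x, n_points=201):
    """Sample the curve on an even grid over [-pi, pi]."""
    ts = [-math.pi + 2.0 * math.pi * k / (n_points - 1) for k in range(n_points)]
    return ts, [andrews_value(x, t) for t in ts]


# A hypothetical 4-dimensional feature vector.
ts, ys = andrews_curve([0.2, 1.0, -0.5, 0.3])
```

Plotting `ys` against `ts` for every observation, colored by class, gives the kind of cluster view described in the text: observations with similar features produce curves that stay close over the whole interval.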

Simulation
In the simulation experiment, a normal state and three failure states are constructed. Firstly, the secondary lateral dampers, the yaw dampers, and both the yaw dampers and the secondary lateral dampers of the front bogie are, respectively, removed (abbreviated as LDF, YDF, and Y&LDF). Secondly, simulation experiments are performed under the normal state and these three failure states (i.e., normal, LDF, YDF, and Y&LDF). A tri-axial accelerometer is installed on the bogie frame (see Fig. 8). This position is chosen because we plan, in the future, to use the data collected by only one sensor to monitor faults that include primary suspension faults (only secondary suspension faults are tested in this paper), and the vibration acceleration of the bogie frame is a compromise choice. The sampling frequency is selected as 250 Hz. The acceleration signal used in this work is the lateral acceleration signal from the tri-axial accelerometer.
The vehicle speed is equal to 250 km/h. The specific fault construction process was described in detail in the authors' previous work [3]. The simulation experiment consists of 4 cases. Case I (Sect. 4.1.1) shows the feasibility of using the designed 1DCNN method for train suspension systems in the case of the same railway line and the same wheel profile. Through the univariate analysis method, Case II (Sect. 4.1.2) and Case III demonstrate that the GWN-strategy and the EST-strategy improve the robustness against track irregularities and wheel wear, respectively. Case IV demonstrates that the GWN-EST-1DCNN method is not disturbed by simultaneous changes in track irregularities and wheel wear.

Case I (same railway line, and same wheel profile)
A vehicle model, with S1002CN wheel profiles, running on the WG-line is taken as an example to illustrate the feasibility of using the 1DCNN method for train suspension systems in the case of the same railway line and the same wheel profile. The training dataset and testing dataset are composed of 3,456 samples and 1,408 different samples, respectively (see Table 3). The length of each sample is t = 5 s. Using the 1DCNN designed in Sect. 3.1, the final results, including the convergence rate, confusion matrix, and visualization features, are shown in Fig. 9. Figure 9a indicates that the designed 1DCNN method converges within 30 epochs. The confusion matrix for the testing samples is shown in Fig. 9b, and it can be seen that all the states are fully distinguished by the 1DCNN method (100%). To visualize the classification and clustering results, the Andrews plot is presented in Fig. 9c, which shows that the four states can be completely separated and the clustering result is excellent. Overall, Fig. 9 indicates that the designed 1DCNN can diagnose the secondary suspension faults and has the potential to be applied in RVSFD.
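With a sampling frequency of 250 Hz and a sample length of 5 s, each sample fed to the network contains 1,250 points. The sketch below cuts a long acceleration record into such samples; the use of non-overlapping windows and the function name are assumptions, since the text does not state how the records are segmented.

```python
SAMPLING_RATE_HZ = 250   # sampling frequency stated in the text
SAMPLE_LENGTH_S = 5      # sample length stated in the text


def split_into_samples(signal, rate_hz=SAMPLING_RATE_HZ, length_s=SAMPLE_LENGTH_S):
    """Cut a record into consecutive, non-overlapping fixed-length samples.

    A trailing remainder shorter than one sample is discarded.
    """
    n = rate_hz * length_s
    return [signal[i:i + n] for i in range(0, len(signal) - n + 1, n)]


# Hypothetical 12 s record (3,000 points) -> two 5 s samples.
record = list(range(3000))
samples = split_into_samples(record)
```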

Case II (same wheel profile, and different railway lines)
To demonstrate that the GWN-strategy can improve the robustness of the diagnostic method against track irregularities, three measured track irregularities plotted in Fig. 4 are introduced here. The wheel profile used here is S1002CN. The training dataset and testing dataset are composed of 3,456 samples and 4,992 different samples, respectively (see Table 3). The results are shown in Fig. 10. It can be clearly seen that the LDF state cannot be well distinguished without the GWN-strategy. On the ZX-line, 47.9% of the LDF samples were incorrectly identified as the normal state, and the corresponding incorrect recognition rate on the JJ-line is 11.6%. With the GWN-strategy, however, these incorrect recognition rates decrease from 47.9% and 11.6% to 16% and 4%, respectively. The classification and clustering effect also improves: for the ZX-line without the GWN-strategy, the Andrews curve cannot show the LDF state (blue curve) at all. The simulation experiment proves that the GWN-strategy can improve the robustness against track irregularities.
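The incorrect recognition rates quoted above are row-normalized entries of the confusion matrix: the share of samples of one true state that were assigned to another state. A minimal sketch of that computation follows; the label lists are made-up toy values, not results from this paper, and the function names are illustrative.

```python
def confusion_matrix(true_labels, predicted_labels, classes):
    """Count matrix with true states as rows and predicted states as columns."""
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for t, p in zip(true_labels, predicted_labels):
        matrix[index[t]][index[p]] += 1
    return matrix


def misrecognition_rate(matrix, classes, true_class, predicted_class):
    """Fraction of `true_class` samples that were labeled `predicted_class`."""
    row = matrix[classes.index(true_class)]
    total = sum(row)
    return row[classes.index(predicted_class)] / total if total else 0.0


classes = ["normal", "LDF", "YDF", "Y&LDF"]
# Toy example: 10 LDF samples, 3 of them mistaken for the normal state.
true_labels = ["LDF"] * 10 + ["normal"] * 10
predicted_labels = ["normal"] * 3 + ["LDF"] * 7 + ["normal"] * 10

cm = confusion_matrix(true_labels, predicted_labels, classes)
rate = misrecognition_rate(cm, classes, "LDF", "normal")
```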

Case III (same railway line, and different wheel profiles)
To demonstrate that the EST-strategy can improve the robustness of the diagnostic method against wheel wear, a new S1002CN wheel profile, the most worn wheel profile (S1002CN-W190K), and a wheel profile with a degree of wear between the two (S1002CN-W95K) are introduced here (see Fig. 3). The rail line used here is the WG-line. The training dataset and testing dataset are composed of 3,456 samples and 1,480 different samples, respectively (see Table 3). Note that when using the EST-strategy, the samples of the training dataset are randomly selected from the datasets corresponding to the S1002CN and S1002CN-W190K wheel profiles, while without the EST-strategy, the training dataset is only the dataset corresponding to the S1002CN wheel profile; the testing dataset is the dataset corresponding to the S1002CN-W95K wheel profile. Without the EST-strategy, all the Y&LDF samples are incorrectly recognized as YDF. By contrast, when using the EST-strategy, the accuracy is significantly improved. Besides, the Andrews curve also shows that when the EST-strategy is applied, the difference in the curve distribution of these four states becomes obvious. The simulation experiment proves that the EST-strategy can improve the robustness against wheel wear.

Case IV (different railway lines, and different wheel profiles)
In this subsection, we test the fault states under different track irregularities and different wheel profiles, and the training dataset and testing dataset are composed of 3,456 samples and 4,992 different samples, respectively (see Table 3). Following the technical route of the GWN-EST-1DCNN method described in Sect. 3.3, the final results in Fig. 12 show that, compared with simply using the 1DCNN method, the GWN-EST-1DCNN method can classify the fault states regardless of whether the wheel profile or the railway line changes, except for a slightly worse prediction of the LDF samples on the ZX-line.
Overall, it can be concluded that the recognition result is greatly improved.

Discussion
The advantages of the proposed GWN-EST-1DCNN method mainly arise from the following two aspects:
(1) Track irregularities affect the bogie accelerations required for train suspension fault diagnosis. Under different track irregularities, these acceleration signals contain different relatively high-frequency, low-amplitude impact components. The strategy of adding Gaussian white noise (GWN-strategy) to the original acceleration signals can improve the immunity of the diagnostic method to track irregularities since it reduces the sensitivity of the diagnostic method to changes in the track spectrum.
(2) The wheel profile changes as the train running mileage increases, and the amplitude of the bogie acceleration increases with it. Therefore, using a model trained on the dataset of the new wheel profile to identify the datasets of the worn wheel profiles may cause misidentification, i.e., the fault diagnostic method may incorrectly attribute the change in the bogie acceleration to a failure of the vehicle suspension system. The EST-strategy can improve the immunity of the diagnostic method to wheel wear mainly for two reasons: (I) The training dataset of the diagnostic method covers a wider range of samples, so the testing dataset can be identified more accurately to a certain extent. (II) Training the diagnostic method with the datasets corresponding to the new wheel profile and the most worn wheel profile makes the method immune, to a certain extent, to the changes in the acceleration amplitude caused by wheel wear.

The actual operating conditions of railway vehicles are more complicated. To verify the performance of the 1DCNN method in real-life conditions, the field tracking data of a CRH3 train running on a high-speed railway line are applied. More information concerning the monitoring system can be found in [3]. The acceleration was measured through a tri-axial accelerometer mounted on the bogie frame (see Fig. 13), and the original sampling frequency was equal to 2 kHz. In our work, the lateral acceleration signal from the tri-axial accelerometer, resampled to 250 Hz, is used. When processing the tracking data, it was found that the acceleration signal of the front bogie of the third car was vibrating abnormally, and its amplitude was, for most of the time, greater than that of the acceleration signals of the other front bogies. Upon inspection, it was found that a hydraulic cylinder of the secondary lateral damper at the front bogie of the third car was short of oil (see Fig. 13b). Figure 13c shows the vehicle speed, the lateral acceleration of the front bogie of the third car (abnormal), and the lateral acceleration of the front bogie of the second car (normal).
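Going from 2 kHz to 250 Hz is a reduction by an integer factor of 8. The text does not say which resampling method was used, so the sketch below is only one simple possibility: averaging each block of 8 consecutive points, which acts as a crude low-pass filter before the rate reduction. The function name and the toy record are assumptions.

```python
ORIGINAL_RATE_HZ = 2000
TARGET_RATE_HZ = 250
FACTOR = ORIGINAL_RATE_HZ // TARGET_RATE_HZ   # 8


def decimate_by_block_mean(signal, factor=FACTOR):
    """Reduce the sampling rate by averaging consecutive blocks of `factor` points.

    This is an assumed, simplistic method; a dedicated anti-aliasing
    filter followed by downsampling would be the more careful choice.
    """
    n_blocks = len(signal) // factor
    return [sum(signal[k * factor:(k + 1) * factor]) / factor
            for k in range(n_blocks)]


# One second of a made-up 2 kHz record -> 250 points.
one_second = [float(n % 8) for n in range(2000)]
resampled = decimate_by_block_mean(one_second)
```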

Fault diagnosis
Three states, including the normal state (normal), the lateral damper failure at the speed of 80-100 km/h (LDF80-100), and the lateral damper failure at the speed of 180-200 km/h (LDF180-200), are selected for analysis. The training dataset and testing dataset are, respectively, composed of 2,560 samples (normal: 1,000; LDF80-100: 780; LDF180-200: 780) and 1,024 samples (normal: 500; LDF80-100: 262; LDF180-200: 262), and the length of each sample is t = 5 s. The designed 1DCNN method described in Sect. 3.1 is applied to these signals, and the final results, including the convergence rate, confusion matrix, and Andrews curve, are shown in Fig. 14. Figure 14a indicates that the proposed 1DCNN method converges within 30 epochs. The confusion matrix for the testing samples is provided in Fig. 14b, and it can be seen that all the states are fully distinguished (100%). The Andrews plot presented in Fig. 14c shows that the three states can be completely separated and the clustering result is excellent. The experimental results obtained using the field tracking data further verify that the proposed 1DCNN-based method can accurately identify the faults of the vehicle suspension systems at different speeds.
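Because the failure states are defined per speed range, each 5 s sample has to be matched with the speed at which it was recorded. The following sketch shows one way this could be done, keeping a sample only when every speed reading in its window lies inside a single band; this selection rule, the function name, and the toy speed readings are assumptions, as the text does not describe the procedure.

```python
# Speed bands used for the two failure states in the text (km/h).
SPEED_BANDS_KMH = {"80-100": (80.0, 100.0), "180-200": (180.0, 200.0)}


def speed_band(window_speeds_kmh, bands=SPEED_BANDS_KMH):
    """Return the band name if all readings fall in one band, else None."""
    if not window_speeds_kmh:
        return None
    for name, (low, high) in bands.items():
        if all(low <= v <= high for v in window_speeds_kmh):
            return name
    return None


# Made-up speed readings for three 5 s windows.
band_a = speed_band([85.0, 90.0, 95.0])
band_b = speed_band([185.0, 190.0, 199.0])
band_c = speed_band([95.0, 105.0])          # crosses a band edge
```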
Due to the limitation of experimental resources, this paper only verified the ability of the 1DCNN method to identify the secondary lateral damper failure using the field tracking data.

The developed fault diagnosis method for train suspension systems, therefore, must be guaranteed to be immune to changes resulting from track irregularities and wheel wear before being put into use. To solve this issue, a GWN-EST-1DCNN-based method for high-speed train suspension systems is proposed. This method consists of three phases. In the first phase (data preprocessing), a strategy of adding Gaussian white noise (GWN-strategy) is applied to the original signal, making the diagnostic method immune to the interference caused by track irregularities. In the second phase (training dataset establishment), an EST-strategy is proposed to improve the robustness of the diagnostic network against wheel wear. In the third phase (training and recognition), a 1DCNN-based fault diagnostic network of high-speed train suspension systems is built. Simulation experiments show the superiority and correctness of the proposed method. In addition, the field tracking data of a CRH3 train running on a high-speed railway line are used to further verify the effectiveness of the 1DCNN method. The test results show that the method has the potential to be applied in the field of railway engineering.
This paper ends with the following notes.
(1) It should be noted that the trained DL algorithm is extremely sensitive to the vehicle speed because the axlebox acceleration caused by different suspension faults varies at different vehicle speeds. Therefore, during on-board monitoring, the suspension status can be determined by obtaining the axlebox acceleration at a constant speed (e.g., 200 km/h or 250 km/h). However, to achieve real-time monitoring, more velocity conditions need to be further analyzed.
(2) In the simulation experiments, only the complete damage of the dampers in the secondary suspension system is simulated. The degradation of suspension systems, including dampers, will be studied in follow-up work.
(3) In the field experimental part, due to the limitation of experimental resources, this paper only verified the ability of the 1DCNN method to identify the secondary lateral damper failure using the field tracking data.
and $\sigma$ is the activation function.

Fig. 1 Models for simulation: a vehicle model and b track model

Fig. 3 Measurement of the wheel profile evolution: a measured wheel profiles and b their wear depth distributions with respect to mileage [53]

Fig. 4 The differences between convolution networks: a 2DCNN and b 1DCNN

Fig. 6 Lateral acceleration of the bogie frame under three different track irregularity conditions (YDF, S1002CN profile)

Fig. 7 Lateral acceleration of the bogie frame under three different wheel profiles (YDF, WG-line)

Fig. 8 The architecture of the fault diagnostic network (GWN-EST-1DCNN) of railway vehicle suspension systems

Fig. 9 Results for S1002CN profile and WG-line: a convergence rate, b recognition, and c visualization

Fig. 13 Measurements of vehicle speed and front bogie acceleration: a the 3D drawing of the bogie; b the secondary lateral damper; c the vehicle speed, the lateral acceleration of the front bogie at the third car (abnormal), and the lateral acceleration of the front bogie at the second car (normal)

Fig. 14 Fault diagnosis results: a convergence rate, b recognition, and c visualization

Table 1 Summary of approaches used in RVSFD

Table 2 Primary parameters of the track model

Table 3 The number of samples in different cases