Recurrent neural network model for high-speed train vibration prediction from time series

In this article, we want to discuss the use of deep learning model to predict potential vibrations of high-speed trains. In our research, we have tested and developed deep learning model to predict potential vibrations from time series of recorded vibrations during travel. We have tested various training models, different time steps and potential error margins to examine how well we are able to predict situation on the track. Summarizing, in our article we have used the RNN-LSTM neural network model with hyperbolic tangent in hidden layers and rectified linear unit gate at the final layer in order to predict future values from the time series data. Results of our research show the our system is able to predict vibrations with Accuracy of above 99% in series of values forward.


Introduction
Deep learning systems are widely applied in many fields of modern technology due to high precision and many possible applications. Intelligent transportation is related to the variety of developments in technology. We can read about many interesting models used to predict or simulate vibrations. In [8] was proposed a model of vibration analysis for elements joined by friction. Neural networks were used to predict ground vibration in [3]. As a result, warning system for geological activities was developed. Similarly in [4] was presented how to use heuristic model to help on prediction of blast-produced ground vibration. A model composed for open-pit mine vibrations prediction presented in [12] was also based on neural network.
Deep learning models are very often used in diagnostic purposes for variety of technical systems. In [11] was presented how to evaluate displacement from data to improve transportation systems. Model presented in [5] was developed to help on fault detection in residual generator from collected data. There are also very interesting models of machine learning devoted to the topic of vibration analysis in means of transport. In [10] was presented an idea to evaluate vibration of high-speed railway by using deep learning. The model proposed in [13] was oriented on effect of vibrations that come from trains to urban areas. This topic was explored in [17] to show various aspects of such vibrations. Results presented in [21] described how energy consumption analysis may help in prediction of interior noise in a high speed. A wide spectrum of numerical experiments for various urban trains was presented in [7]. In [18] was presented an analytical approach to use neural networks for simulation of dynamical elements in moving vehicles. Among models used in analysis of time series, Recurrent Neural Networks (RNN) are presented in many efficient applications. An idea discussed in [20] has been developed for application of RNN in traffic noise prediction. Results presented in [6] show how to model RNN to infer global temporal structure. Presented results show that this type of neural networks can well adapt to variety of examples. In [19] was discussed efficiency of RNN in stochastic time series analysis, while results of [16] show solution to improve long time series processing with fuzzy granulation.  In this article, we want to discuss application of deep learning model to predict potential vibrations of high-speed trains from time series of recorded travels. In our work, we have prepared a novel lightweight solution for predicting the future train vibrations using time series data provided as an input. What is more, the proposed model performs not only well in terms of training and evaluation times but also in high Accuracy terms. Our proposed prediction model is developed using Recurrent Neural Network (RNN) with Long Short-Term Memory (LSTM) neurons. We have examined which training model would help to better train this system, and as a result, we propose to use NAdam training algorithm. Tests were also done for different configurations of the developed classifier. We have tested how many readings in a time proposed classifier can need to work with highest potential efficiency. We have also tested how long would it be necessary to train the classifier to work with assumed efficiency. Because we are working with time series data, we have decided to use LSTM neurons in our architecture as they have the best ability to remember past values and thus predicting trends. Other neuron types are ineffective because of great regression in performance in both training time and Accuracy standpoint. Results of our findings are compared and discussed to draw conclusions for future work on efficient predictors of long time series of sensor readings.

Deep learning model
The research in our project is oriented on prediction of possible vibrations in high-speed trains by using deep neural network model. We have developed a model of Recurrent Neural Network to analyze signal from the high-speed train. In Fig. 1 is presented a sample explanation of our research assumptions. We assume that high-speed train is moving at regular speed on the track. During motion vibration signal is read from suspension sensors and processed by developed LSTM-RNN to decide if the train is stable as visible in Fig. 2. If the system discovers malfunction or difference from regular situation on the track, it sends warning to the operator. As a result, driving of such high-speed train becomes more safe and easier.

Data
In order to develop our system, we have used the dataset provided by Enfang Cui [2]. This collection consists of 200 hours of metro train vibration energy harvesting data collected at intervals of 2 minutes. After simple data analysis, we have spotted that values range is quite similar across the whole range and the differences are small. Thus, we have decided to use standard Minimum-Maximum scaling algorithm for our data preprocessing to fit them into the 0-1 range. This preprocessing helped the network results to be rescaled to the original shape. For our experiments, this collection was split into the train/test subsets with the ratio of 70:30.

Recurrent neural network
Because we are working with the time series data, we have decided to use Recurrent Neural Network in order to allow it to learn the dependencies of values in time and that way give us better results. In our RNN model, we have used one Input layer followed by 8 LSTM layers and finally 2 standard layers. For all hidden layers, we have used Hyperbolic Tangent activation function, and for the output Fig. 1 In our research, we assume that high-speed train or metro is moving on the truck and sensors are reading vibrations from suspension system. Recordings are analyzed by our developed deep learning model to help operator during driving layer we have decided to use ReLU function as we are dealing with the value prediction which can be higher than the maximum value of the training set so we do not want to be restricted by -1 to 1 range that the Tanh function gives us. In some cases, better results could be given by using leaky ReLU or ELU activation functions; however, in this dataset it is not needed because vibrations could not be negative, especially when using ReLU is much more computational heavy and one of our goals was to reduce the training times to the minimum. The final architecture is shown in Fig. 3, while LSTM neuron model is presented in Fig. 4.
Each of LSTM neurons works with memory recall ability mathematically modeled as: where we assume: x t as input time series array, f t forget gate activation function, i t input/update gate activation value, o t output gate activation value, h t hidden state,c t cell input activation, c t cell state, while other symbols are W,b weights matrices and bias, r sigmoid activation function, tanh hyperbolic tangent activation function. For calculations, we assume c 0 ¼ 0 and h 0 ¼ 0.

Model training
After conducting experiments to get the best possible Accuracy of our predictor, we have used NAdam optimization algorithm. Because of high performance and short training times, it is widely used in machine learning research. To improve model Accuracy and reduce training even more, we have used learning rate decay. It allowed deep neural network to first quickly adapt to the given problem and then slowly polish the final efficiency. NAdam formula can be described as follows. First we need to compute values a i and b i which are later used for computing correlationsâ i andb i : where h 1 and h 2 are constant hyper-parameters and z is a current gradient value of the error function. Correlations are described as follows: After that the formula for updating weights is defined as follows: where c is a small, constant value and g is a learning rate. Next we apply NAG formula to classical Adam using equations below: Finally, we modify Adam algorithm update rule; thus, we need to change equations forâ i and x i : We have used Mean Squared Logarithmic Error as fitness function for our model, since it gave well understanding of the training process for this type of data.

Experiments
Specification of our computer is as follows: • CPU AMD Ryzen Threadripper 2950X 16 cores/32 threads • GPU 29 NVidia RTX 2080 8GB • RAM 64GB Because in our work we are dealing with time series data, we have chosen empirically to use Recurrent Neural Network (RNN) with Long Short-Term Memory (LSTM) neurons. However, to ensure that our choice was correct we have done a comparison between RNN-LSTM, classical artificial neural network (ANN) and other types of RNN models. Results are shown in Table 1. We can see that in our tests models which use Long Short-Term Memory neurons are having the best Accuracy. Our proposed model is giving result similar to Bi-LSTM; however, from our experiments we see that training of our model was much shorter, what gave us a big advance and justified our choice to present this idea as a main result of our research in this project.

Finding the best training algorithm
In our experiments, we searched for the best training algorithm for this dataset. To do this, we have conducted series of experiments running the same network architecture training by different methods of optimization. Results are found in Table 4 and more detailed description can be found below: • NAdam Predictor reached over 99.12% making it the best fitting algorithm for our task.
More results containing charts of changes to Loss functions and Accuracy in time domain are found in Fig. 5. As we can see, some did not learn any valuable dependencies Fig. 8 Results of Accuracy and Loss function in training for prediction of using different error margins. In blue line, we can see results on train data, while in orange we can see results on test data c Fig. 9 Results of searching for how many future values should the network predict. In blue line, we can see results on train data, while in orange we can see results on test data transforming all data into the same, constant values. In some cases, it worked; however, the Accuracy was very high reaching over 95%. Finally, we have chosen NAdam as the training algorithm due to the best Accuracy. In numerical experiments, we have divided our data into 70:30 for training and testing in our experiments. K-fold cross-validation has been used for training data. The model was trained in configuration with reshuffled training data at each iteration of applied training algorithm.

Searching for the best time step
To improve Accuracy on time-changing data even more, we had to find the best time step for our proposed predictor. Results are shown in Table 2, and charts are presented in Figs. 6, 7. As we can see, proposed RNN-LSTM is very sensitive to changing the time step value. For example, changing it from 20 to 19 resulted in almost 50% reduction in Accuracy; however, further reduction to 18 increased it again. Because of that, we have tested some different configurations, and finally, we have chosen the time step value of 20 as it gave us the best results. In Table 2, we can see how the number of calculations improves decision processes in the proposed system. We can see that the number of iterations above 10 does not influence much this process. After experiments, we have concluded that 10 iterations are a minimum to make the system both flexible to input data and maintain prediction ability well.

Computing overall Accuracy based on error margin experiments
Our architecture predicts future time series values with a very high Accuracy. As shown in Figs. 8, 9 and Table 3, if we allow the network to have an error margin of 5% of the real value predictor Accuracy reaches about 75.66% which leads to the conclusion that proposed network architecture is very accurate with most of its predictions. However, if we step up the margin to 10%, the Accuracy stays around 99.12%. That implies that proposed RNN-LSTM network correctly predicts the overall trend, but it is still not intelligent enough to be ideal in terms of the exact value for time series. On the other hand, stepping even more gives us the Accuracy of 100% so we can conclude that this network is a good fit for the given problem.

Training times
During our numerical experiments, despite logging model's performance in terms of Loss and Accuracy, we have also tested training times. Results are shown in Table 4. As we can see, different optimization algorithms have also much different training times, which can lead us to confirm our choice of NAdam training algorithm for machine learning applications of time series data.

Non-machine learning prediction methods
In this section, we want to compare our proposed approach to statistical method. We have, therefore, implemented various measures. Sample Moving Average (SMA) for vibration prediction from readings is calculated as: where n is number of assumed factors, c i is i-th value. Exponential Moving Average (EMA) for vibration prediction from readings is calculated as: where EMA 0 ¼ Y 0 and Y is a value from the i-th value of the set taken into account, a -is 2 nÀ1 where n is the number of values of the set.
Linear Trend Line for vibration prediction from readings is built for the number of detected cases as: where X i is i-th value of the set, X is arithmetic mean of the set, Y i is i-th value of the set, Y is arithmetic mean of the set. Directional coefficient a of the trend line and free expression b are calculated as Results of vibration prediction are presented in Fig. 10. The violet line represents the data from sensor, while light blue exemplifies the LTL. Red and green chart constitutes Exponential Moving Average and Sample Moving Average, respectively. EMA and SMA have the same value of a ''step''; therefore, we display them on the same charts.

Conclusions
In Fig. 11, we see comparison between original sensor readings and results of prediction from our developed deep learning system. When we analyze results, we can see that original readings without prediction and dumping system show higher fluctuations in each interval than our proposed deep learning model. As a result, we see that application of neural network is very efficient in analysis of data samples. In Table 5, we can see comparisons to other models. Our proposed model is working very well. Proposed LSTM-RNN has reached lowest value MSE among compared solutions. For compared MAE, proposed model is second among all compared. Therefore, we can see that proposed model is working very well with time series data. Our model is based on RNN with LSTM neurons, while other is using attention mechanisms or random sampling ideas. Results show that proposed by us simplified processing gives MSE and MAE lowest than other presented models. These results confirm efficiency of our idea. We have also tested other configurations in our experiments. After some simple testing with changing the layers type from ReLU to ELU, we have experienced no gain in Accuracy (in some cases even up to 1% decrease), what's more the training time performance was highly decreased coming about 1/4 times longer than the ReLU. The reason is because of the higher computational complexity of ELU function and adding more negative data which are not only unusable for this model as all values should be above zero, but also creates additional noise. Therefore, we decided to present the proposed architecture as our final model in this task. Our proposed model has three sensitive points. One of them is time as we have to assume how many individual sensor readings we can take, so that both proposed neural network can properly predict and the relevant systems can react. The self-learning method may turn out to be another problem, i.e., when excessively monotonous sections of the route may influence neural network. The last vulnerable point is reality itself, as long as we can very accurately predict standard train runs, we cannot be sure whether they will always be like that, as heavy construction works near the traction can affect the entire system.

Final remarks
In our article, we proposed vibration prediction model by the use of LSTM-RNN deep learning architecture. We collaterally presented optional applications of those predictions to create a system allowing to decrease sensible vibrations to satisfying extent. As the entire architecture was developed based on the data from telegraphic sensors during a drive, the model was devised in regard to that sort of transport. However, being obtained with accurate measures and equipment, we will be able to recreate the systems on the rules of a different transport, including air transport, road transport and water transport. In the future, we are willing to focus on adaptive ability, which will yield in further development in deep learning models for smart transport.
Acknowledgements Authors would like to acknowledge contribution to this research from the Rector of the Silesian University of Technology, Gliwice, Poland, under proquality grant no. 09/010/RGJ21/ 0007.
Author contributions All authors contributed equally to the research by conceptualization, methodology, software, validation, formal analysis, investigation, resources, data curation, writing-original draft, review & editing.

Declarations
Conflict of interest The authors declare that they have no known competing financial interests or personal relations that could have appeared to influence the work reported in this paper.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless Dual-Stage Two-Phase attention-based Recurrent Neural Network 0.0661 N/A indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons. org/licenses/by/4.0/.