Trajectory prediction based on long short-term memory network and Kalman filter using hurricanes as an example

Trajectory data objectively reflect the movement patterns of moving objects, so trajectory prediction has high application value. Hurricanes often cause incalculable losses of life and property, and trajectory prediction can be an effective means of mitigating the damage they cause. With the popularization and wide application of artificial intelligence technology, this paper takes a machine learning perspective and trains a trajectory prediction model on historical trajectory data using a long short-term memory (LSTM) network. An improved LSTM (ILSTM) trajectory prediction algorithm that improves on the prediction of the simple LSTM is proposed, and a Kalman filter is then used to filter the prediction results of the ILSTM algorithm; the combined method is called LSTM-KF. Simulation experiments on Atlantic hurricane data from 1851 to 2016 show that, compared with the simple LSTM and ILSTM algorithms, the LSTM-KF trajectory prediction algorithm has the lowest prediction error and the best prediction effect.


Introduction
With the continuous development of satellite navigation, wireless communication and other technologies, mobile intelligent devices with positioning functions are now widely used. When people use these devices, they actively or passively record a large number of historical trajectories, leading to the formation of spatiotemporal trajectories [1,2]. Spatiotemporal trajectory data can accurately record the activity of moving objects over a long period of time and objectively reflect the patterns of that activity. Mobile communication equipment, migrating animals, means of transportation and meteorological clouds are examples of moving objects in specific application fields. Therefore, mining the temporal and spatial patterns contained in historical trajectories and predicting the future position of a moving object have high application value. Trajectory prediction of moving objects has become a hot topic in many research fields. Hurricanes, which are strong and deep tropical cyclones produced in the Atlantic Ocean and the eastern part of the North Pacific Ocean and are also known as typhoons or cyclones, are a potentially valuable application [3]. Hurricane trajectories are one type of spatiotemporal trajectory. As a common natural phenomenon, hurricanes often cause great losses of life and property. On October 12, 2019, the typhoon 'Hagibis' landed on the Izu Peninsula in Shizuoka-ken, Japan. This typhoon killed 88 people, and 7 people went missing. More than 3900 people were affected during this typhoon. In addition, typhoon 'Hagibis' caused 102.73 billion yen of losses in Japan's agriculture, forestry, fishery and other related industries. It is therefore very important to monitor and record the trajectories of hurricanes and to provide support for hurricane analysis and forecasting [4].
Jun Tang (tangjun06@nudt.edu.cn), College of Systems Engineering, National University of Defense Technology, Changsha 41000, China
The moving object studied in this paper is the hurricane. Trajectory prediction, as the most important means of reducing the damage caused by a hurricane, has become a hot issue in the field of trajectory research. Common hurricane prediction methods involve climate persistence, integrated forecasting and probability forecasting [5,6]. These prediction methods are complex because they involve many factors, such as phase transition, vertical advection and the influence of the boundary layer. DeMaria [7] introduced statistical methods to predict the intensity of hurricanes in the Atlantic Ocean and the East Pacific Ocean, extending the forecast time from three days to five days. In [8], a new spatiotemporal multivariate function model was proposed to improve prediction. This model regarded the dimensions, accuracy and wind speed of hurricanes as functions of time to capture the spatial and temporal trends of the path and intensity of a hurricane.
Machine learning is a part of artificial intelligence. In essence, machine learning enables computers to simulate human learning behaviour, automatically acquire knowledge and skills through learning, continuously improve performance and realize artificial intelligence [9]. Common machine learning algorithms include regression algorithms, support vector machines (SVM), clustering algorithms, dimension reduction algorithms and neural networks [10]. Neural networks are currently the most influential algorithms in machine learning. With the popularization and application of new meteorological observation equipment, it is possible to obtain a large amount of ground and upper-air data from meteorological observation stations, as well as satellite and radar detection data. These data come in different formats, such as numbers, text and images, and have spatial and temporal characteristics [11]. Machine learning methods offer new ways to uncover the physical mechanisms of weather change hidden in these massive data, explore signals that indicate how the weather evolves and establish new weather forecasting methods. In recent years, weather forecasting methods combined with machine learning have become a hot spot in meteorology, mathematics and computer science. The application of machine learning to hurricane prediction has also produced fruitful results. In [12], fuzzy c-means clustering was used to classify and forecast typhoon trajectories, which is suitable for moving objects with fuzzy trajectory boundaries. Song [13] combined big data and data mining technology, trained a long short-term memory (LSTM) neural network, and established a typhoon path prediction model based on machine learning. In [14], a sparse recurrent neural network (RNN) combined with a flexible topological structure was proposed to predict the trajectory of Atlantic hurricanes. The dynamic time warping (DTW) distance between the direction of target hurricanes and other hurricanes in the dataset was determined and compared to predict their future trajectories. Baik [15,16] used a backpropagation (BP) network to forecast typhoon intensity and compared it with results from the regression method. The results showed that the error of the BP network's typhoon intensity forecast model is lower than that of the regression method, demonstrating the application prospects of BP networks in typhoon intensity forecasting. The application of machine learning methods to hurricane prediction is still a new research field. These methods handle nonlinear problems well and are especially suitable for nonlinear problems with complex internal mechanisms. Therefore, machine learning methods can be applied in meteorology.
LSTM, as a neural network, has long been widely used for text sequences and images. In this paper, an LSTM-KF trajectory prediction algorithm is proposed by combining LSTM and the Kalman filter. The simulation results show that the LSTM-KF algorithm performs well. The rest of the paper is organized as follows: Section 2 introduces related methods of trajectory prediction and the basic mechanism of the neural network. The relevant methods and technologies used in trajectory prediction are described in Section 3. In Section 4, the improved LSTM trajectory prediction algorithm and a trajectory prediction algorithm based on LSTM and the Kalman filter (LSTM-KF) are proposed. Finally, in Section 5, Atlantic hurricane data are used in a simulation experiment to verify the rationality of the improved LSTM and the LSTM-KF trajectory prediction algorithms.

Trajectory prediction method
At present, the active space of moving objects can be divided into restricted movement [17,18,31] and free movement [19][20][21][22]. Current trajectory prediction methods mainly focus on restricted movement trajectory prediction with certain constraints (such as road networks). Because this kind of trajectory follows specific behaviour habits or motion patterns, useful patterns are easily found and the prediction results are satisfactory in terms of both accuracy and effectiveness. However, in nature, free movement is more frequent than restricted movement and is more important to predict, especially for disaster prevention and mitigation. Free movement trajectory data are sparsely sampled, and it is difficult to accurately capture the moving direction, turning position and other moving characteristics of moving objects, which makes the prediction of sparse trajectories in free space a difficult problem in trajectory data research. Hurricane trajectory prediction is a kind of free movement trajectory prediction. According to the prediction cycle, trajectory prediction can be divided into short-term trajectory prediction [23] and long-term trajectory prediction [24].
Trajectory prediction methods can be divided into traditional prediction methods based on a constructed motion model [25][26][27][28], methods based on frequent pattern mining technology [29][30][31] and methods based on machine learning [32][33][34][35][36]. The prediction methods based on a constructed motion model mainly consider the current movement of moving objects, such as their speed and direction, and match the predicted movement with the constructed motion function for prediction. Wolfson et al. [25] proposed a spatiotemporal model of moving objects (MOST) that took location as a dynamic attribute and then predicted the location of a mobile object based on linear constraints. Junghans et al. [26] proposed a model and prediction method of a moving area that used the minimum closed box to model the moving area and used linear regression (LR) and recursive motion functions (RMF) to determine the evolution mode of the area.
Trajectory prediction based on frequent pattern mining technology achieves the goal of trajectory prediction by analysing historical data in a moving object database, processing the historical data and mining the matching pattern. Monreale [29] comprehensively considered the spatiotemporal information of moving objects to build a T-pattern tree and mined the motion law of the moving objects. Long [30] mined frequent paths based on an FP growth algorithm that introduced the speed of moving objects into the prediction and proposed a simple and effective trajectory prediction algorithm (E3TP). Kim et al. [31] combined the characteristics of road networks, calculated the similarity between trajectories to search for candidate trajectories, evaluated the historical trajectories stored in a historical mobile database to determine the subtrajectories with similar trajectories, and finally, predicted a future path by analysing the direction of the candidate trajectories.
Machine learning methods mainly aim at mining the behaviour characteristics of moving objects in a historical trajectory and perform trajectory prediction by improving the Markov model, probability graph model, support vector machine model and neural network model. Based on the hidden Markov model, Qiao [32] proposed an adaptive parameter selection algorithm to improve the adaptability of trajectory prediction in a big data environment. Li [33] introduced the concept of fuzzy trajectory to solve the sharp boundary problem caused by fixed mesh generation and improved the traditional LSTM model to make full use of the proximity and periodic characteristics of the historical trajectory to improve the prediction accuracy of the trajectory position. Leege [34] proposed a path prediction method based on machine learning that combined and sorted the traffic flow of fixed arrival routes of airplanes by their actual flight path and meteorological data and trained the model by using historical data to perform time prediction. Related technologies used for moving object trajectory prediction are listed in Table 1.

Overview of neural network
Neural networks (also known as artificial neural networks or ANNs) are among the most powerful machine learning algorithms. The general idea of a neural network is to learn through training. First, the input is processed by the neurons in the hidden layer and the output layer to produce an output, which is compared with the expected output. Then, a backpropagation learning algorithm is used to repeatedly correct the connection weight coefficients from the input layer to the hidden layer and from the hidden layer to the output layer. When the error between the output of the neural network and the expected output reaches the preset error convergence standard, the training is stopped to obtain a network model with an improved generalization ability [37]. Figure 1 shows the general structure of a neural network that consists of an input layer, hidden layer and output layer. The input layer is responsible for receiving the signal, the hidden layer is responsible for the decomposition and processing of data, and the final result is integrated in the output layer. The circles in each layer in Fig. 1 represent processing units that can be thought of as simulating a neuron. Neurons are linked by connections, which can be assigned different weights. The core task of a neural network is to train a set of nearly perfect weights. Several processing neurons make up a layer, and several layers make up a network, which is known as a neural network. A neural network with more than one hidden layer is called a deep neural network.
The BP neural network was the first neural network described. Werbos [38] first proposed the BP neural network in 1974, and Rumelhart et al. [39] developed the theory of the network. The learning process of a BP neural network consists of the forward propagation of a signal and the backpropagation of error. The cell structure of a BP neural network is shown in Fig. 2. In Fig. 2, $S_j = \sum_{i=1}^{n} w_{ij} x_i + b_j$ and $y_j = f(S_j)$, where $f(\cdot)$ represents the transfer function, $w_{ij}$ represents the weight and $b_j$ represents the bias. The weights and biases are acquired by the neural network through constant training. In forward propagation, $y_j$ is the output of the neuron. If the actual output does not match the expected output, then the error backpropagation stage is entered. The purpose of the backpropagation process is to adjust the weights and biases to reduce the error between the output value and the expected value. The backpropagation process is summarized in references [38,39]. At present, many advances in image recognition and speech recognition technology are derived from the development of deep neural networks.

Table 1 Related technologies used for moving object trajectory prediction: Wolfson [25], Junghans [26], Prevost [27], Jeung [28], Monreale [29], Long [30], Kim [31], Qiao [32], Li [33], Leege [34]

Trajectory definition and long short-term memory
Trajectory data store large amounts of position information of moving objects at different times. A collection of such data in chronological order is called a trajectory $Trj$.

Definition 1
Trajectory. Moving objects move in geographic space, and their positions change with time. An ordered sequence of m discrete position points and their related auxiliary information is defined as a trajectory:
$Trj = \{(P_1, I_1), (P_2, I_2), \cdots, (P_m, I_m)\}$ (1)
where $P_m$ represents the spatial position information of the m-th trajectory point and $I_m$ represents the relevant information of the m-th trajectory point. Taking hurricane data as an example, $I_m$ can represent the maximum sustained wind speed and the status of system.

Definition 2 Subtrajectory.
A subtrajectory refers to an ordered set of trajectory points within a certain trajectory, as shown in Eq. 2,
where k is the length of the subtrajectory.
Definition 3 Grid trajectory. The geographic space of a moving object is divided into different areas by fixed grids (such as squares, triangles or hexagons), and each coordinate point of the trajectory is mapped to the grid space, where $S_m$ represents the grid index to which the spatial position of the m-th trajectory point is mapped. As shown in Fig. 3, the corresponding grid trajectory is $Trj = \{(P_1,$ …
The long short-term memory (LSTM) network [40] is a kind of recurrent neural network (RNN) that is specially designed to solve the long-term dependence problem of the general RNN. LSTM is now widely used for temporal sequences. Trajectory data are also a kind of temporal data, so LSTM can be applied to trajectories [13,33]. The cell structure of LSTM is shown in Fig. 4. The cell generally includes a forget gate, an input gate and an output gate. It acts like a memory machine that constantly forgets some information, remembers some information, and takes previous input into account before producing output. In the cell structure of LSTM, $x_t$ is the input at the current moment, $h_{t-1}$ is the output at the last moment, and $C_t$ is generally referred to as the cell state, which is the cell state of long-term memory.
The forget gate, as the name implies, controls whether information should be forgotten. In LSTM, the forget gate controls how much of the long-term cell state at the last moment, $C_{t-1}$, is retained at the present moment, as shown in Eq. 4.
$f_t = \sigma(W_f \bullet [h_{t-1}, x_t] + b_f)$ (4)
The value of $f_t$ in each dimension is in the range (0, 1). Information is forgotten when $f_t$ is close to 0 and retained when $f_t$ is close to 1. Here $W_f$ represents the weight matrix, $b_f$ represents the bias, $\bullet$ represents the matrix product and $\sigma$ represents the sigmoid function, as shown in Eq. 5.
$\sigma(x) = \frac{1}{1 + e^{-x}}$ (5)
The input gate is responsible for processing the current input and selectively storing new information. The input gate has two steps. In Eq. 6, $i_t$ controls how much of the current input can be stored. The value of $i_t$ in each dimension is in the range (0, 1). In Eq. 7, the tanh function generates a new candidate vector $\tilde{C}_t$, which represents the cell state of short-term memory at the current moment.
$i_t = \sigma(W_i \bullet [h_{t-1}, x_t] + b_i)$ (6)
$\tilde{C}_t = \tanh(W_c \bullet [h_{t-1}, x_t] + b_c)$ (7)
where $W_i$, $W_c$ represent weight matrices and $b_i$, $b_c$ represent biases. The tanh function is shown in Eq. 8.
$\tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$ (8)
Before the output gate is applied, the cell state of long-term memory needs to be updated, as shown in Eq. 9. The cell state of short-term memory at the current moment, $\tilde{C}_t$, and the cell state of long-term memory at the last moment, $C_{t-1}$, together form the new cell state.
$C_t = f_t * C_{t-1} + i_t * \tilde{C}_t$ (9)
where $*$ represents the elementwise product. The output gate is used to selectively output information from the cell. As shown in Eqs. 10 and 11, $o_t$ controls how much of $C_t$ can be used as the current output. The value of $o_t$ in each dimension is in the range (0, 1). $h_t$ is the output at time t, also called the hidden state.
$o_t = \sigma(W_o \bullet [h_{t-1}, x_t] + b_o)$ (10)
$h_t = o_t * \tanh(C_t)$ (11)
where $W_o$ is the weight matrix and $b_o$ is the bias. $W_f$, $W_i$, $W_c$, $W_o$ and $b_f$, $b_i$, $b_c$, $b_o$ are the weights and biases of the different gates, respectively. These weights and biases are obtained by continuously training the LSTM network on historical training data. The notations referred to in Section 3 are listed in Table 2.
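The gate computations described above can be illustrated with a single cell step. The following is a minimal sketch in plain Python, assuming the common LSTM formulation in which each gate acts on the concatenation of the previous hidden state and the current input; the function names (`lstm_step`, `affine`) and the `params` layout are hypothetical and are not taken from the paper.

```python
import math

def sigmoid(v):
    """Elementwise sigmoid of a list of floats."""
    return [1.0 / (1.0 + math.exp(-a)) for a in v]

def tanh(v):
    """Elementwise tanh of a list of floats."""
    return [math.tanh(a) for a in v]

def affine(W, v, b):
    """Return W . v + b, with W given as a list of rows."""
    return [sum(w * a for w, a in zip(row, v)) + b_k for row, b_k in zip(W, b)]

def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM cell step; params maps 'f', 'i', 'c', 'o' to (W, b) pairs."""
    z = h_prev + x_t  # list concatenation [h_{t-1}, x_t] (assumed input layout)
    W_f, b_f = params["f"]
    W_i, b_i = params["i"]
    W_c, b_c = params["c"]
    W_o, b_o = params["o"]
    f_t = sigmoid(affine(W_f, z, b_f))   # forget gate
    i_t = sigmoid(affine(W_i, z, b_i))   # input gate
    c_tilde = tanh(affine(W_c, z, b_c))  # candidate (short-term) cell state
    # New long-term cell state: keep part of the old state, add part of the candidate.
    c_t = [f * c + i * g for f, c, i, g in zip(f_t, c_prev, i_t, c_tilde)]
    o_t = sigmoid(affine(W_o, z, b_o))   # output gate
    h_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]  # hidden state
    return h_t, c_t

# Toy example: hidden size 2, input size 1, all weights and biases zero.
zeros = ([[0.0] * 3 for _ in range(2)], [0.0, 0.0])
params = {name: zeros for name in "fico"}
h_t, c_t = lstm_step([0.3], [0.0, 0.0], [1.0, -2.0], params)
```

With all-zero parameters every gate evaluates to 0.5 and the candidate state to 0, so the step simply halves the previous cell state, which makes the role of the forget gate easy to see. In practice the weights would come from training rather than being set by hand.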

Encoding of trajectory data
Hurricane trajectory data must be converted, through trajectory data encoding, into a form that the LSTM network can accept. In this paper, the latitude and longitude coordinates, maximum sustained wind speed and status of system of each hurricane are selected. When the hurricane trajectory is meshed, the size of the mesh blocks is 1 × 1 degree. Because of the spherical nature of the earth, the area of each grid block is not uniform in square miles, but since most points surround the earth's equator, the size difference between grid blocks is negligible. The grid trajectory, as shown in Fig. 3, is taken as the input of LSTM. The encoding of trajectory data can be divided into three steps:
Step 1: One-hot encoding for categorical data. The grid index $S_m$ in the grid trajectory can be one-hot encoded. The status of system of hurricane data is also categorical and can be one-hot encoded in the same way. The status of system has nine types: tropical cyclone of tropical depression intensity (TD), tropical cyclone of tropical storm intensity (TS), tropical cyclone of hurricane intensity (HU), extratropical cyclone (EX), subtropical cyclone of subtropical depression intensity (SD), subtropical cyclone of subtropical storm intensity (SS), a low that is neither a tropical cyclone, a subtropical cyclone nor an extratropical cyclone (LO), tropical wave (WV) and disturbance (DB). One-hot encoding is the most commonly used and basic coding method for converting tags into vectors. It first requires categorical values to be mapped to integer values. Each integer value is then represented as a binary vector that is zero everywhere except at the index position, which has a value of 1. The one-hot encoding of the grid index is shown in Fig. 5a, and the one-hot encoding of the status of system is shown in Fig. 5b.

Table 2 Notations used in Section 3
$f_t$, $i_t$, $o_t$: Output of the forget gate, input gate and output gate, between 0 and 1
•: matrix product
*: elementwise product
$C_t$: Cell state of long-term memory
$\tilde{C}_t$: Cell state of short-term memory
$h_t$: Hidden state output at time t
Step 2: The numerical data are standardized so that their range is [0, 1].
To eliminate the effect of different units and scales between indicators, data standardization is needed to make the indicators comparable. After the original data are standardized, each index is on the same order of magnitude, which is suitable for a comprehensive comparative evaluation. The numerical data in $I_m$ are standardized by Eq. 12.
The maximum sustained wind speed of hurricane data is numerical, so it is normalized to [0, 1].
Step 3: Concatenate the categorical data and numerical data to form a complete trajectory vector. The transformation from a grid trajectory to a trajectory vector is thus realized by one-hot encoding and standardization. The trajectory data coding process is shown in Fig. 6.
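The three encoding steps can be sketched as follows. This is a minimal illustration, assuming that the standardization in Eq. 12 is min-max scaling (which matches the stated [0, 1] range) and that the trajectory vector is ordered as grid one-hot, status one-hot, then wind speed; the function names, the ordering and the example values are hypothetical.

```python
# The nine status-of-system codes listed in Step 1.
STATUS_TYPES = ["TD", "TS", "HU", "EX", "SD", "SS", "LO", "WV", "DB"]

def one_hot(index, size):
    """Step 1: binary vector of length `size` with a single 1 at `index`."""
    vec = [0.0] * size
    vec[index] = 1.0
    return vec

def min_max(value, lo, hi):
    """Step 2: scale a numerical value to [0, 1] (assumed form of Eq. 12)."""
    return (value - lo) / (hi - lo)

def encode_point(grid_index, n_grids, status, wind, wind_min, wind_max):
    """Step 3: concatenate categorical and numerical parts into one vector."""
    return (one_hot(grid_index, n_grids)
            + one_hot(STATUS_TYPES.index(status), len(STATUS_TYPES))
            + [min_max(wind, wind_min, wind_max)])

# Toy example with only 4 grid cells; wind speed limits are made-up values.
vec = encode_point(grid_index=2, n_grids=4, status="HU",
                   wind=80.0, wind_min=10.0, wind_max=150.0)
```

In this toy example the resulting vector has 4 + 9 + 1 = 14 entries. With a real 1 × 1 degree mesh the grid part would be far longer, since it has one entry per grid cell.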

Improved LSTM (ILSTM)
The simple LSTM trajectory prediction algorithm selects subtrajectories with a sequence length of K. The subtrajectories are then meshed, encoded and fed into the LSTM network to predict the index of the (K + 1)-th grid. The flow diagram of the simple LSTM trajectory prediction algorithm is shown in Fig. 7. The algorithm is divided into three modules: data preprocessing, model training and prediction. The data preprocessing module grids and encodes the trajectory data to produce a trajectory vector acceptable to the LSTM. The training module finds the best LSTM prediction model. The prediction module predicts the next grid index according to the trained prediction model.
The LSTM network can learn historical knowledge, but it may not be suitable for learning new knowledge. For example, a hurricane trajectory may pass through an area that hurricanes have never visited in the historical record or have visited only a few times, so the LSTM network may produce a large error between the predicted position and the real position due to insufficient learning. At the same time, few features are available for hurricane data: the Atlantic hurricane data used in this paper include only coordinates, maximum sustained wind speed and status of system, so the lack of features is also an important factor that restricts neural network learning.
On average, hurricanes move at speeds of 15 km/h to 20 km/h and can reach 30 km/h according to hurricane trajectory analysis [41,42]. With data intervals of 6 h, the distance a hurricane can travel between two records is therefore limited. The simple LSTM trajectory prediction algorithm selects the grid with the highest probability value as the predicted position. Predicting a single position in this way may produce an extreme case, in which the predicted grid position deviates significantly from the normal position range. If the hurricane position predicted by the simple LSTM trajectory prediction algorithm exceeds a certain limit, it needs to be corrected. We therefore improve the simple LSTM trajectory prediction algorithm by selecting a reasonable position from a candidate set of multiple predicted grids. The candidate set is sorted according to the probability predicted by the simple LSTM; that is, the grids with the top N probability values form the candidate set $C = \{S_1, S_2, \cdots, S_{topN}\}$ of the prediction results. The first grid in C whose centre coordinates lie within a Euclidean distance (Eq. 13) of less than the threshold $\tau$ from the coordinates of $seq_{flag}$ is selected as the predicted grid, and the centre coordinates of this grid are the predicted coordinates. If no grid in candidate set C lies within the threshold $\tau$, then the grid centre coordinates of $seq_{flag}$, that is, $(lon^{central}_{seq_{flag}}, lat^{central}_{seq_{flag}})$, are used as the coordinates of the predicted trajectory point. The selection of $\tau$ is based on experience. The ILSTM trajectory prediction algorithm corrects the predicted position only in a few extreme cases; in most cases, the first grid in the candidate set already satisfies the condition. The improved LSTM trajectory prediction algorithm is shown in Fig. 8.
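The candidate selection rule of ILSTM can be sketched as a short function. This is an illustrative sketch only: it assumes that the LSTM output is available as a list of per-grid probabilities, that each grid has known centre coordinates, and that the distance is the plain Euclidean distance in degrees; the names `ilstm_select`, `grid_centres` and `flag_centre` are hypothetical.

```python
import math

def ilstm_select(probs, grid_centres, flag_centre, top_n, tau):
    """Return the centre of the most probable grid within distance tau of the flag point.

    probs        -- predicted probability for every grid index
    grid_centres -- (lon, lat) centre of every grid index
    flag_centre  -- (lon, lat) grid centre of the flag point seq_flag
    """
    # Candidate set: indices of the top_n grids, ordered by decreasing probability.
    ranked = sorted(range(len(probs)), key=lambda s: probs[s], reverse=True)[:top_n]
    for s in ranked:
        lon, lat = grid_centres[s]
        if math.hypot(lon - flag_centre[0], lat - flag_centre[1]) < tau:
            return grid_centres[s]
    # No candidate is close enough: fall back to the flag point's grid centre.
    return flag_centre

# Toy example with three grids; the most probable grid is implausibly far away.
probs = [0.1, 0.6, 0.3]
centres = [(-60.5, 20.5), (-40.5, 35.5), (-61.5, 21.5)]
flag = (-60.5, 21.5)
picked = ilstm_select(probs, centres, flag, top_n=2, tau=3.0)
fallback = ilstm_select(probs, centres, flag, top_n=2, tau=0.5)
```

In the first call the top-ranked grid is rejected because it lies far outside the threshold, so the second candidate is returned; in the second call no candidate passes the tighter threshold and the flag point's own grid centre is used.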

Trajectory prediction based on LSTM and Kalman filter
The Kalman filter [43], as a data processing technology that removes noise and restores real data, has been applied in fields such as communication, navigation, guidance and control. Kalman filtering estimates the state of a system optimally through the observation of the input and output. Because the observation data include the influence of system noise and external interference, the optimal estimation can also be regarded as a filtering process.

Definition 6
Prediction trajectory sequence Z. For a trajectory $Trj$ with length n, step is used as the extraction interval of the subtrajectory and k is the extraction length of the subtrajectory, and the coordinates of the next point of each subtrajectory are predicted. If step = 1, then n − k subtrajectory sequences will be obtained, so n − k predicted coordinates will also be generated. The sequence formed by these predicted coordinates is called the predicted trajectory sequence $Z = \{(lon_1, lat_1), (lon_2, lat_2), \cdots, (lon_{n-k}, lat_{n-k})\}$.
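The extraction of subtrajectories in Definition 6 can be sketched as a sliding window. The sketch below assumes that the prediction target of each subtrajectory is the point immediately following it; the function name `extract_subtrajectories` is hypothetical.

```python
def extract_subtrajectories(trj, k, step=1):
    """Return (subtrajectory, next_point) pairs from a trajectory.

    Each subtrajectory has length k; the point right after it is the
    target whose coordinates are to be predicted.
    """
    return [(trj[s:s + k], trj[s + k]) for s in range(0, len(trj) - k, step)]

# Toy trajectory with n = 6 (lon, lat) points and subtrajectory length k = 4.
trj = [(-60.0, 20.0), (-60.5, 20.4), (-61.0, 20.9),
       (-61.4, 21.5), (-61.7, 22.2), (-61.9, 23.0)]
pairs = extract_subtrajectories(trj, k=4)
```

With n = 6, k = 4 and step = 1 this yields n − k = 2 pairs, matching the count given in the definition.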
In this paper, the predicted trajectory sequence Z generated by the ILSTM method is treated as observation data. The Kalman filter then filters the predicted trajectory sequence to produce a more accurate, optimal estimate. First, the state vector of the hurricane movement is defined, as shown in Eq. 14.
$X = [lon, lat, V_{lon}, V_{lat}]^T$ (14)
where $V_{lon}$, $V_{lat}$ represent the velocity components in longitude and latitude, respectively. The hurricane trajectory data contain no velocity information at the current moment, so for convenience we set $V_{lon}$, $V_{lat}$ to 0 at the initial moment. When the Kalman filter is used in this paper, the system state vector of the trajectory mark point $Trj_{flag}$ is input into the system as the initial value, that is, $X(0) = [lon_{Trj_{flag}}, lat_{Trj_{flag}}, 0, 0]^T$. Kalman filtering requires a discrete control process system. The system state equation (15) and observation equation (16) of hurricane movement are as follows:
$X(m) = A X(m-1) + W(m)$ (15)
$Z(m) = H X(m) + V(m)$ (16)
where X(m) represents the system state vector and describes the state of the hurricane at time m (Eq. 14). A represents the state transition matrix, which describes the transition of the motion state from the previous time to the current time. W(m) represents the system noise, whose statistical characteristics are similar to white noise or Gaussian noise. $Z(m) = [lon_m, lat_m]^T$ is the observation vector. In this paper, the predicted trajectory sequence Z is used as the observation sequence, and Z(m) is its m-th predicted value. H is the observation matrix, and V(m) is the observation noise.
The core of the Kalman filter algorithm is a recursive procedure that achieves the optimal state estimate by updating the current state variables from the previous estimated value and the current observed value. The state equation of the system is used to estimate the state of the system at the next time point. If the current time of the system is m, then the current state can be estimated from the previous state according to the state equation of the system.
$X(m|m-1) = A X(m-1|m-1)$ (17)
where X(m|m − 1) is the state estimate predicted from the previous state and X(m − 1|m − 1) is the optimal estimate of the previous state. The state has now been updated, but the covariance corresponding to X(m|m − 1), denoted P(m|m − 1), has not. Equation 18 gives this covariance, where Q is the system noise covariance matrix.
$P(m|m-1) = A P(m-1|m-1) A^T + Q$ (18)
The present state X(m|m − 1) has now been estimated. Next, we obtain the measured value of the current state, that is, the trajectory coordinates Z(m) predicted by the ILSTM algorithm. Combining the estimated value and the measured value, we obtain the optimal estimated value X(m|m) at time m, as shown in Eq. 19.
$X(m|m) = X(m|m-1) + K [Z(m) - H X(m|m-1)]$ (19)
where K is the Kalman gain matrix shown in Eq. 20 and R is the observation noise covariance matrix. In this paper, K is a 4 × 2 matrix.
$K = P(m|m-1) H^T [H P(m|m-1) H^T + R]^{-1}$ (20)
We have now produced the best estimate X(m|m) at time m. However, to keep the Kalman filter running until the end of the system process, we need to update the covariance of X(m|m) at time m, where I is the identity matrix.
$P(m|m) = (I - K H) P(m|m-1)$ (21)
This completes the Kalman filtering process for trajectory prediction. The function of the Kalman filter in this paper is to produce an optimal estimate of the trajectory coordinates predicted by the ILSTM algorithm. The notations referred to in Section 4 are listed in Table 3.
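The predict and update cycle described above can be sketched in plain Python. This is a sketch under stated assumptions, not the authors' implementation: the paper does not give the entries of A and H, so a constant-velocity transition matrix and a position-only observation matrix are assumed here, and Q, R and the initial covariance are taken as scaled identity matrices. All function and parameter names are hypothetical.

```python
def matmul(a, b):
    """Matrix product of two list-of-rows matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(a):
    return [list(col) for col in zip(*a)]

def add(a, b):
    return [[x + y for x, y in zip(r, s)] for r, s in zip(a, b)]

def sub(a, b):
    return [[x - y for x, y in zip(r, s)] for r, s in zip(a, b)]

def inv2(m):
    """Inverse of a 2 x 2 matrix."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def identity(n, scale=1.0):
    return [[scale if i == j else 0.0 for j in range(n)] for i in range(n)]

def kalman_filter(z_seq, x0, dt=1.0, q=1.0, r=1.0):
    """Filter a sequence of (lon, lat) observations; x0 = [lon, lat, V_lon, V_lat]."""
    # Assumed constant-velocity model: position advances by velocity * dt.
    A = [[1.0, 0.0, dt, 0.0], [0.0, 1.0, 0.0, dt],
         [0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]]
    # Assumed observation model: only longitude and latitude are observed.
    H = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
    Q, R = identity(4, q), identity(2, r)
    x = [[v] for v in x0]  # state as a column vector
    P = identity(4)        # assumed initial covariance
    filtered = []
    for lon, lat in z_seq:
        x_pred = matmul(A, x)                                 # state prediction
        P_pred = add(matmul(matmul(A, P), transpose(A)), Q)   # covariance prediction
        S = add(matmul(matmul(H, P_pred), transpose(H)), R)   # innovation covariance
        K = matmul(matmul(P_pred, transpose(H)), inv2(S))     # Kalman gain (4 x 2)
        innovation = sub([[lon], [lat]], matmul(H, x_pred))
        x = add(x_pred, matmul(K, innovation))                # optimal estimate
        P = matmul(sub(identity(4), matmul(K, H)), P_pred)    # covariance update
        filtered.append((x[0][0], x[1][0]))
    return filtered

# One observation two degrees east of the initial position.
smoothed = kalman_filter([(-58.0, 20.0)], x0=[-60.0, 20.0, 0.0, 0.0])
```

With these settings the filtered longitude lies between the predicted position (−60) and the observed one (−58), which illustrates how the gain weighs the model prediction against the ILSTM output. Larger values of `r` would pull the estimate towards the model prediction, and larger values of `q` towards the observation.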

Prediction error comparison experiment
The real trajectories are $T = \{Trj_1, Trj_2, \cdots, Trj_n\}$. The predicted trajectories are $\hat{T} = \{\hat{Trj}_1, \hat{Trj}_2, \cdots, \hat{Trj}_n\}$. The geometric space error between the predicted trajectory point and the actual trajectory point is used as the prediction error. As shown in Eq. 22, the root mean square error (RMSE) is measured in degrees.
Here, n represents the number of predicted trajectories and m represents the number of predicted trajectory points contained in each predicted trajectory.
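The error measure can be illustrated with a short function. Since the exact form of Eq. 22 is not reproduced here, the sketch below assumes that the squared Euclidean distance (in degrees) between each real and predicted point is averaged over all points of all trajectories before taking the square root; the function name `rmse` and the data layout are hypothetical.

```python
import math

def rmse(real, predicted):
    """Root mean square error in degrees.

    real, predicted -- lists of trajectories, each a list of (lon, lat) points
    of equal length and in matching order.
    """
    total, count = 0.0, 0
    for trj, trj_hat in zip(real, predicted):
        for (lon, lat), (lon_hat, lat_hat) in zip(trj, trj_hat):
            total += (lon - lon_hat) ** 2 + (lat - lat_hat) ** 2
            count += 1
    return math.sqrt(total / count)

# One trajectory with two points; the first prediction is off by (3, 4) degrees.
error = rmse([[(0.0, 0.0), (1.0, 1.0)]], [[(3.0, 4.0), (1.0, 1.0)]])
```

In this toy case the squared errors are 25 and 0, so the result is the square root of 12.5.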
In this experiment, hurricanes in 2016 are used as the verification set. The prediction errors of the simple LSTM algorithm, ILSTM algorithm and LSTM-KF algorithm are compared. The experimental parameters are shown in Table 4. Figures 10 and 11 show the change of the RMSE value with the neural network training epochs. In Fig. 10, Q is [1,0,0,0; 0,1,0,0; 0,0,1,0; 0,0,0,1], and in Fig. 11, Q is [2,0,0,0; 0,2,0,0; 0,0,2,0; 0,0,0,2]. Figures 10 and 11 show that the RMSE of the simple LSTM trajectory prediction algorithm is initially large, because the LSTM network has not learned enough at the initial moment. However, the RMSEs of the ILSTM and LSTM-KF trajectory prediction algorithms are significantly lower than that of the simple LSTM. As the number of LSTM network training epochs increases, the RMSE of the simple LSTM algorithm gradually decreases and becomes stable. Even so, the prediction error of the simple LSTM remains significantly higher than that of the ILSTM and LSTM-KF trajectory prediction algorithms.
To better display the RMSE of the three methods, the RMSEs of epoch 71 to epoch 80 shown in Figs. 10b and 11b are listed in Table 5. It can be seen from Table 5 that the RMSE of each epoch of LSTM-KF is smaller than that of ILSTM and simple LSTM. In other words, the prediction effect of LSTM-KF is the best of the three methods. In Fig. 10b, the average prediction error of the simple LSTM from epoch 71 to epoch 80 is 2.1065°, that of the ILSTM is 1.4690°, and that of the LSTM-KF is 1.4605°. In Fig. 11b, the average prediction error of the simple LSTM from epoch 71 to epoch 80 is 2.3865°, that of the ILSTM is 1.5141°, and that of the LSTM-KF is 1.5017°. Therefore, the prediction error of LSTM-KF is the smallest of the three methods. The prediction error of the simple LSTM trajectory prediction algorithm is approximately 2°, and its effect is the worst of the three methods. The prediction error of LSTM-KF is slightly smaller than that of ILSTM, which shows that the trajectory coordinates predicted by LSTM-KF are more precise than those predicted by ILSTM. It can be concluded that LSTM-KF has the best prediction effect of the three methods.

Comparison experiment of single hurricane trajectory prediction
In this experiment, five hurricanes named 'KARL', 'MATTHEW', 'NICOLE', 'BONNIE' and 'ALEX' are selected from hurricanes that occurred in 2016. The simple LSTM, ILSTM and LSTM-KF algorithms are used to predict the trajectories of these hurricanes. The experimental parameters are set as shown in Table 6. In Fig. 15, the ellipse marks the last two predicted points for hurricane NICOLE, and in Fig. 16, the ellipse marks predicted points for hurricane BONNIE. These predicted points deviate significantly from the original hurricane track. In Fig. 16, the parts marked by the boxes 'Contrast 1' and 'Contrast 2' indicate that the predicted trajectory points of the simple LSTM algorithm are obviously disordered, while the predicted trajectory points of ILSTM and LSTM-KF basically follow the original trajectory. Therefore, there are some defects in the simple LSTM trajectory prediction. The prediction accuracy of the simple LSTM trajectory prediction algorithm is greatly reduced without adequate learning. The simple LSTM trajectory prediction algorithm is good at learning from historical data, but for time-series data with few samples and abrupt changes, it easily produces large prediction errors. The ILSTM and LSTM-KF algorithms, which modify the simple LSTM algorithm, show better robustness and can reduce the errors of the simple LSTM algorithm. It can be seen from Table 5 that ILSTM performs a rough prediction of trajectory coordinates, while LSTM-KF performs a more precise prediction of trajectory coordinates.

Parameter sensitivity experiment and time performance analysis
In order to verify the influence of different system noise covariance matrices Q and observation noise covariance matrices R on the prediction results of the LSTM-KF algorithm, different values of Q and R are selected for the experiment. The RMSE difference between ILSTM and LSTM-KF, ΔRMSE, defined in Eq. (23), is used to measure the impact of Q and R on the prediction results. The experimental parameters are shown in Table 7. In Eq. (23), RMSE_LSTM-KF represents the root mean square error of the LSTM-KF algorithm, RMSE_ILSTM represents the root mean square error of the ILSTM algorithm, and epochs represents the number of training rounds, which is set to 100.
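As a rough illustration of how such a measure can be computed, the sketch below assumes that ΔRMSE is the RMSE of ILSTM minus the RMSE of LSTM-KF, averaged over the training epochs. This form is an assumption, chosen to agree with the text treating a negative ΔRMSE as a degradation caused by the filter; the function name `delta_rmse` and the sample curves are hypothetical.

```python
def delta_rmse(rmse_ilstm, rmse_lstm_kf):
    """Mean per-epoch gap RMSE_ILSTM - RMSE_LSTM-KF (assumed form).

    A positive value means the Kalman filter lowered the error on average.
    """
    if len(rmse_ilstm) != len(rmse_lstm_kf):
        raise ValueError("both RMSE curves must cover the same epochs")
    if not rmse_ilstm:
        raise ValueError("at least one epoch is required")
    gaps = [a - b for a, b in zip(rmse_ilstm, rmse_lstm_kf)]
    return sum(gaps) / len(gaps)


# Hypothetical RMSE curves over 100 epochs (illustrative values only).
epochs = 100
ilstm_curve = [1.5 + 0.5 / (epoch + 1) for epoch in range(epochs)]
lstm_kf_curve = [value - 0.04 for value in ilstm_curve]

print(f"delta RMSE = {delta_rmse(ilstm_curve, lstm_kf_curve):.4f}")
```

Swapping the two arguments flips the sign, so the order of subtraction has to be fixed once and used consistently when comparing different Q and R settings.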
When Q = [1,0,0,0; 0,1,0,0; 0,0,1,0; 0,0,0,1], the change in ΔRMSE with R is shown in Fig. 17a. When Q = [2,0,0,0; 0,2,0,0; 0,0,2,0; 0,0,0,2], the change in ΔRMSE with R is shown in Fig. 17b. It can be seen from Fig. 17 that the observation noise covariance matrix R affects the prediction accuracy of the LSTM-KF algorithm. Figure 17a and b show that ΔRMSE tends to increase and then decrease as R increases, rather than increasing indefinitely. In Fig. 17a, the maximum ΔRMSE is 0.0419 when R = R_2, and the minimum ΔRMSE is -0.0033 when R = R_10. In Fig. 17b, the maximum ΔRMSE is 0.0404 when R = R_4, and the minimum ΔRMSE is 0.0077 when R = R_0.1. Therefore, different values of R have different influences on the accuracy of the predicted trajectory, and an inappropriate R may even have a negative effect on the predicted results. For example, ΔRMSE is negative when R = R_10 in Fig. 17a.
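The role of Q and R can be made concrete with a small Kalman filter sketch. It is only an illustration under stated assumptions: a four-dimensional state (two coordinates and their rates of change), a constant-velocity transition matrix F, an observation matrix H that reads the two coordinates, and an identity initial covariance. These choices are made here to match the 4×4 shape of Q and the 2×2 shape of R, and are not taken from the experimental setup. The observations stand for the coordinates predicted by the network.

```python
import numpy as np


def kalman_filter(observations, Q, R, dt=1.0):
    """Filter a sequence of 2-D points with a constant-velocity Kalman filter."""
    # Assumed state: [coord_1, coord_2, rate_1, rate_2].
    F = np.array([[1, 0, dt, 0],
                  [0, 1, 0, dt],
                  [0, 0, 1, 0],
                  [0, 0, 0, 1]], dtype=float)  # assumed state transition
    H = np.array([[1, 0, 0, 0],
                  [0, 1, 0, 0]], dtype=float)  # assumed observation matrix
    observations = np.asarray(observations, dtype=float)
    x = np.array([observations[0, 0], observations[0, 1], 0.0, 0.0])
    P = np.eye(4)  # assumed initial state covariance
    filtered = [x[:2].copy()]
    for z in observations[1:]:
        # Predict step: propagate the state and inflate uncertainty by Q.
        x = F @ x
        P = F @ P @ F.T + Q
        # Update step: the gain K shrinks as R grows.
        S = H @ P @ H.T + R
        K = P @ H.T @ np.linalg.inv(S)
        x = x + K @ (z - H @ x)
        P = (np.eye(4) - K @ H) @ P
        filtered.append(x[:2].copy())
    return np.array(filtered)


# Hypothetical two-point track: the filter starts at the first observation.
track = [[0.0, 0.0], [1.0, 1.0]]
small_r = kalman_filter(track, Q=np.eye(4), R=np.eye(2))
large_r = kalman_filter(track, Q=np.eye(4), R=10.0 * np.eye(2))
print(small_r[1], large_r[1])
```

In this sketch a larger R lowers the Kalman gain, so the filtered point stays closer to the model prediction and further from the observation, whereas a larger Q has the opposite effect. This trade-off is one way to read why both very small and very large values can hurt the filtered result.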
Then, the observation noise covariance matrix R is set to [1, 0; 0, 1], [2, 0; 0, 2], [3, 0; 0, 3] and [4, 0; 0, 4] in turn, and the system noise covariance matrix Q is set to the values shown in Table 8. The change in ΔRMSE with the system noise covariance matrix Q is shown in Fig. 18. When R = [1, 0; 0, 1], the change in ΔRMSE with Q is shown in Fig. 18a. When R = [2, 0; 0, 2], it is shown in Fig. 18b. When R = [3, 0; 0, 3], it is shown in Fig. 18c, and when R = [4, 0; 0, 4], it is shown in Fig. 18d. As shown in Fig. 18a, b, c and d, ΔRMSE again roughly increases and then decreases. Therefore, the system noise covariance matrix Q affects the prediction accuracy of the LSTM-KF algorithm, and inappropriate values may have a negative impact. For example, in Fig. 18b, c and d, ΔRMSE is negative for the first values of Q compared. However, it is worth noting that the best improvement obtained by tuning Q and R is only around 0.05. As shown in Fig. 17a, the maximum ΔRMSE is 0.0419 when R = R_2. As shown in Fig. 18a, the maximum ΔRMSE is 0.0535 when Q = Q_0.6. As shown in Fig. 18b, the maximum ΔRMSE is 0.0556 when Q = Q_0.9. As shown in Fig. 18d, the maximum ΔRMSE is 0.0512 when Q = Q_5. Selecting appropriate values of Q and R improves the predicted results, but the improvement may be limited to a small range.
In order to test the time performance of the LSTM-KF algorithm, the model training time and the prediction response time are recorded for different network structures. The hardware platform of this experiment is an Intel(R) Core(TM) i7-9850H CPU at 2.60 GHz with 16 GB of memory and a Quadro RTX 3000 GPU. Hurricane data from 1895 to 2015 are used for training, and the model training time is recorded, as shown in Fig. 19. The hurricane trajectories in 2016 are then forecast, and the prediction response time is recorded, as shown in Fig. 20. It can be seen from Fig. 19 that for neural network structures of 128*128*256, 256*256*256, 128*256*512 and 256*256, the training time of the model is 870 s, 929 s, 913 s and 817 s, respectively. As shown in Fig. 20, for the same network structures, the response time of the model is 5 s, 6 s, 6 s and 4 s, respectively. Although the LSTM-KF algorithm takes more time in model training, the prediction response time for the whole year of 2016 remains at a few seconds.
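A timing comparison of this kind can be sketched as follows. The functions `train_model` and `predict_year` are hypothetical placeholders standing in for the real training and forecasting routines, so the measured values carry no meaning here; only the measurement pattern is illustrated.

```python
import time


def timed(fn, *args, **kwargs):
    """Run fn and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


# Hypothetical stand-ins for the real training and prediction routines.
def train_model(structure):
    return {"structure": structure}


def predict_year(model):
    return []


# Layer sizes of the four network structures compared in the text.
structures = [(128, 128, 256), (256, 256, 256), (128, 256, 512), (256, 256)]

timings = {}
for structure in structures:
    model, train_seconds = timed(train_model, structure)
    _, predict_seconds = timed(predict_year, model)
    timings[structure] = {"train_s": train_seconds, "predict_s": predict_seconds}

for structure, result in timings.items():
    name = "*".join(str(size) for size in structure)
    print(f"{name}: train {result['train_s']:.3f} s, predict {result['predict_s']:.3f} s")
```

Training and prediction are timed separately because only the prediction response time matters once a trained model is deployed.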

Conclusion
Trajectory prediction has become a research hotspot in many fields. Hurricanes are serious threats to people's lives and cause significant economic losses. Effective prediction of the trajectory of a hurricane has good application value.
In this paper, from the perspective of machine learning, using real Atlantic hurricane data, an LSTM network is applied to hurricane trajectory prediction, and the prediction model is trained using historical hurricane data. The main contributions of this paper are as follows:
1. In the stage of data preprocessing, the trajectories are gridded and encoded, and a trajectory vector is generated by combining numerical data with categorical data as the input to the LSTM.
2. Based on the simple LSTM trajectory prediction algorithm, this paper improves the prediction module and proposes an improved LSTM trajectory prediction algorithm.
3. Combined with a Kalman filter, the predicted coordinates of the improved LSTM trajectory prediction algorithm are filtered, and the LSTM-KF trajectory prediction algorithm is proposed.
4. Real Atlantic hurricane data from 1851 to 2016 are used in simulation experiments. The prediction results of the LSTM-KF trajectory prediction algorithm are better than those of the improved LSTM algorithm and the simple LSTM algorithm.
The trajectory prediction algorithm proposed in this paper considers only a few factors, such as latitude and longitude, maximum sustained wind speed and the system state, and its prediction error is still large. It is hoped that more meteorological factors can be considered in future studies to establish a more complete prediction model.