
1 Introduction

With the rapid growth of the urban population, the pressure on urban traffic continues to increase. City buses are the most important and popular mode of transportation for most urban residents. Accurately predicting passenger flow in each time period is important for scheduling buses according to passenger travel patterns and for improving vehicle utilization to meet passenger demand. However, passenger flow exhibits non-linear dynamics, is affected by time and external factors, and has complex temporal and spatial characteristics. It is therefore crucial to develop a multivariate prediction model that integrates multiple influencing factors to predict passenger flow.

There are two main ways to develop a passenger flow prediction model. On the one hand, passenger flow forecasting can be treated as a regression problem: time and other external factors are used to construct the feature space, and machine learning algorithms such as Linear Regression and Support Vector Regression (SVR) are used to fit a prediction model. On the other hand, bus passenger flow data is typical time series data, so the task can also be treated as a time series forecasting problem. Time series forecasting mines the temporal information of passenger flow within a time window and builds a prediction model from the overall sequential characteristics of the data. Because this approach accounts for the temporal structure of the data, it is widely used in passenger flow and traffic flow prediction.

In recent years, deep learning has made breakthrough progress in many fields, and researchers have begun to apply it to time series prediction tasks. Convolutional neural networks (CNN) can extract local features of time series data, while Recurrent Neural Networks (RNN) and their improved variants, long short-term memory (LSTM) and bi-directional long short-term memory (BILSTM), can capture the sequential characteristics of the data. In addition, the attention mechanism, applied within recurrent networks, improves their ability to process very long sequences. Building on these results, this paper proposes ACLB, a neural network model that combines the attention mechanism, CNN, LSTM, and BILSTM, designed for the characteristics of multivariate bus passenger flow sequence data.

2 Related Work

Traditional time series forecasting models include smoothing methods and autoregressive methods such as ARIMA and SARIMA. Li Jie and Peng Qiyuan [1] used the SARIMA model to predict passenger flow on the Guangzhou-Zhuhai Intercity Railway and achieved good results. Many researchers have begun to apply deep learning to time-series-related problems [2,3,4,5]. Yun Liu et al. combined CNN and LSTM to propose the DeepConvLSTM model [7] for human activity recognition (HAR); this model automatically extracts human behavior features and temporal features. Fazle Karim et al. [8] replaced the pooling and fully connected layers of the CNN with a Fully Convolutional Network (FCN) for time series classification, and combined it with LSTM to establish the LSTM-FCN and ALSTM-FCN models. Xie Guicai et al. [4] proposed a multi-scale fusion temporal-pattern convolutional network based on CNN. The model designs short-term and long-term pattern components to extract the short-period and long-period spatiotemporal features of the time series, then fuses and recalibrates these features to produce the final prediction; however, the model does not consider the influence of external factors beyond the flow of people itself.

3 Model: ACLB

Bus passenger flow prediction must consider the complex non-linear relationship between urban bus passenger flow and temporal and spatial factors. The passenger flow in a given period is affected not only by adjacent periods but also by various current external factors. For example, weekday passenger flow has obvious morning and evening peaks, while on holidays the peaks are delayed; sudden rainfall may cause a sharp drop in the number of people taking public transportation. Moreover, each feature of the data contributes differently to the final prediction result. Therefore, the prediction model should not only capture the temporal and spatial characteristics of the time series data but also reduce the interference of weakly correlated data on the prediction result. To address these problems, this paper proposes a new neural network model, ACLB. The structure of the ACLB model is shown in Fig. 1:

Fig. 1. The structure of the ACLB model

The ACLB model consists of a CNN-LSTM layer, an attention layer, a BILSTM layer, a fully connected layer, and an output layer. ACLB incorporates an attention mechanism on top of CNN-LSTM so that the model can extract the spatiotemporal features of the data while focusing its attention on the key features, and the BILSTM layer is added to capture the bidirectional temporal dependencies of the time series.
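To make the composition concrete, the following Keras sketch assembles the layers in the order just described. It is a minimal illustration, not the authors' released code: the kernel size of 3, the 100 LSTM units, and the dropout of 0.5 follow Sect. 4.2, while the filter count and the width of the fully connected layer are assumptions.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_aclb(window_len: int, n_features: int) -> tf.keras.Model:
    inp = layers.Input(shape=(window_len, n_features))
    # CNN-LSTM layer: two Conv1D + pooling blocks, then two stacked LSTMs.
    x = layers.Conv1D(64, kernel_size=3, padding="same", activation="relu")(inp)
    x = layers.MaxPooling1D(pool_size=2, padding="same")(x)
    x = layers.Conv1D(64, kernel_size=3, padding="same", activation="relu")(x)
    x = layers.MaxPooling1D(pool_size=2, padding="same")(x)
    x = layers.LSTM(100, return_sequences=True, dropout=0.5)(x)
    x = layers.LSTM(100, return_sequences=True, dropout=0.5)(x)
    # Attention layer: dot-product attention over the CNN-LSTM hidden states.
    x = layers.Attention()([x, x])
    # BILSTM layer capturing forward and backward dependencies.
    x = layers.Bidirectional(layers.LSTM(100))(x)
    # Fully connected layer and single-value output (passenger flow at i + h).
    x = layers.Dense(64, activation="relu")(x)
    out = layers.Dense(1)(x)
    return models.Model(inp, out)
```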

3.1 CNN-LSTM Layer

In a CNN-LSTM structure, the CNN is used as a feature extractor, and the sequence output by the CNN is fed into the LSTM for training. This structure was originally used mainly for image caption generation [4], but research has shown that CNN-LSTM also applies to time series forecasting [2, 9,10,11], for example electricity forecasting [12, 13] and stock closing price forecasting. The CNN-LSTM layer in the ACLB model uses this combined structure to extract the local features and temporal features of the data. The CNN-LSTM layer is shown in Fig. 2:

Fig. 2. The structure of the CNN-LSTM layer

Convolutional Neural Networks.

In machine learning tasks, feature extraction is a critical step; for time series prediction, extracting good features can significantly improve model performance. A CNN consists of convolutional layers, pooling layers, fully connected layers, and an output layer. It is generally used for feature extraction in fields such as image and text processing, and it is also effective on time series data. The convolutional layer is the core of the CNN and acts as an automatic feature extractor, while pooling reduces the overall computational cost of the model.

Long Short-term Memory.

CNN can effectively extract local features of time series data, but it cannot capture temporal dependencies. Therefore, after the CNN extracts spatiotemporal features, LSTM [14, 15] is used to capture the temporal dependencies of the series. LSTM is an improvement on RNN: it adds a forget gate, an update gate, an output gate, and a memory cell C to the basic RNN, alleviating the vanishing and exploding gradient problems of RNN so that the LSTM can capture long-term dependencies. The structure of an LSTM node is shown in Fig. 3:

Fig. 3. The structure of an LSTM node

$$ \hat{C}^t = \tanh (w_c [a^{t - 1} ,X^t ] + b_c ) $$
(1)
$$ \Gamma_u = \sigma (w_u [a^{t - 1} ,X^t ] + b_u ) $$
(2)
$$ \Gamma_f = \sigma (w_f [a^{t - 1} ,X^t ] + b_f ) $$
(3)
$$ \Gamma_o = \sigma (w_o [a^{t - 1} ,X^t ] + b_o ) $$
(4)
$$ C^t = \Gamma_u * \hat{C}^t + \Gamma_f * C^{t - 1} $$
(5)
$$ a^t = \Gamma_o * \tanh (C^t ) $$
(6)

\(\hat{C}^t\) is the candidate memory cell value to be written, \(a^{t-1}\) is the activation value (hidden state) of the previous LSTM node, \(X^t\) is the input value of the current node, \(C^t\) is the memory cell value, \(\Gamma_u\) is the update gate, \(\Gamma_f\) is the forget gate, \(\Gamma_o\) is the output gate, \(\sigma\) is the sigmoid activation function whose output ranges from 0 to 1, and \(b_c ,b_u ,b_f ,b_o\) are bias values. The memory cell C is the key structure in LSTM: it carries information across the entire LSTM so that key sequence information is retained or discarded, alleviating the exploding and vanishing gradient problems. From Fig. 3 and formulas (1)–(6), it can be seen that when the memory cell value is passed from the previous node to the current node, its value is controlled by the current node's forget gate, update gate, and input value X.
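A direct NumPy transcription of formulas (1)–(6) makes the gating explicit. This is a minimal sketch: the parameter dictionary, the weight shapes, and the function name are illustrative assumptions, not part of the paper.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(a_prev, c_prev, x_t, params):
    """One LSTM node following formulas (1)-(6)."""
    # Concatenate previous activation and current input: [a^{t-1}, X^t].
    z = np.concatenate([a_prev, x_t])
    c_hat = np.tanh(params["w_c"] @ z + params["b_c"])    # (1) candidate cell
    gamma_u = sigmoid(params["w_u"] @ z + params["b_u"])  # (2) update gate
    gamma_f = sigmoid(params["w_f"] @ z + params["b_f"])  # (3) forget gate
    gamma_o = sigmoid(params["w_o"] @ z + params["b_o"])  # (4) output gate
    c_t = gamma_u * c_hat + gamma_f * c_prev              # (5) memory cell
    a_t = gamma_o * np.tanh(c_t)                          # (6) activation
    return a_t, c_t
```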

3.2 Attention Layer

The attention mechanism [14, 16, 17] is inspired by the cognitive mechanism of the human brain: the brain can grasp the key information within complex input and ignore meaningless information. The attention mechanism assigns weights to the input data so that the model focuses on the important features of the data. The structure of the attention mechanism is shown in Fig. 4:

Fig. 4. The structure of the attention layer

$$ \alpha^{t,i} = \frac{\exp (e^{t,i} )}{\sum_{i = 0}^{n} \exp (e^{t,i} )} $$
(7)
$$ e^{t,i} = S^{t - 1} \cdot a^i $$
(8)

\([a^0 ,a^1 , \ldots ,a^n ]\) are the hidden states output by the CNN-LSTM layer. \(\alpha^{t,i}\) represents the proportion of the model's attention allocated to \(a^i\) in the input sequence when the attention layer outputs the value \(S^t\). The attention mechanism keeps the model focused on the most critical information.
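Formulas (7) and (8) can be transcribed directly; the NumPy sketch below computes the weights and the weighted-sum readout. The context-vector step is standard attention practice added here for completeness, not something the paper spells out.

```python
import numpy as np

def attention_step(s_prev, hidden_states):
    """Attention weights and context per formulas (7)-(8).

    s_prev: previous attention-layer output S^{t-1}, shape (d,)
    hidden_states: CNN-LSTM hidden states [a^0, ..., a^n], shape (n+1, d)
    """
    scores = hidden_states @ s_prev          # (8) e^{t,i} = S^{t-1} . a^i
    weights = np.exp(scores - scores.max())  # numerically stable softmax
    weights /= weights.sum()                 # (7) alpha^{t,i}
    context = weights @ hidden_states        # weighted sum of hidden states
    return weights, context
```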

3.3 BILSTM Layer

BILSTM [18, 19] consists of two LSTMs with opposite information propagation directions. This structure enables BILSTM to capture the forward and backward information of the sequence.

The sequence \([S^1 ,S^2 , \ldots ,S^t , \ldots ,S^{n - 1} ,S^n ]\) from the attention layer is input into the BILSTM to obtain \([H^1 ,H^2 , \ldots ,H^t , \ldots ,H^{n - 1} ,H^n ]\). The formulas are as follows:

$$ \overrightarrow{H}^t = \overrightarrow{LSTM} (\overrightarrow{C} ,S^t ,\overrightarrow{h}^t ) $$
(9)
$$ \overleftarrow{H}^t = \overleftarrow{LSTM} (\overleftarrow{C} ,S^t ,\overleftarrow{h}^t ) $$
(10)
$$ H^t = w_1 \overrightarrow{H}^t + w_2 \overleftarrow{H}^t $$
(11)

In formulas (9), (10), and (11), C is the memory cell value, \(S^t\) is the current input, and h is the hidden state of the previous node. The arrows (\(\leftarrow , \to\)) indicate the direction of information flow. \(\overrightarrow{H}^t\) and \(\overleftarrow{H}^t\) are the outputs of the two LSTMs running in opposite directions, and \(H^t\) is the output of the BILSTM.
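In Keras, this bidirectional combination is available as a wrapper. The following one-liner is a sketch in which merge_mode="sum" approximates the combination in (11), though Keras applies no learned weights \(w_1 , w_2\):

```python
from tensorflow.keras import layers

# Forward and backward LSTMs over the attention outputs S^1..S^n;
# merge_mode="sum" adds the forward and backward outputs at each step.
bilstm = layers.Bidirectional(layers.LSTM(100, return_sequences=True),
                              merge_mode="sum")
```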

4 Experiment

4.1 Construct Training Set

The data set consists of historical bus card data and weather data from a city in Guangdong, covering August 1, 2014 to December 31, 2014. The number of passengers is counted at one-hour intervals, useless fields are removed, and the weather information corresponding to each period is inserted. \(x_i = \left[ {\text{passenger flow}}, {\text{temperature}}, rainfall, \ldots \right]\) represents the passenger flow and external factor data in period i of the day, and \(X_i = (x_{i - k} ,x_{i - k + 1} , \ldots ,x_i )\) represents the time series from period i − k to i. The passenger flow forecasting problem is then defined as (12):

$$ Y_{i + h} = f(X_i ) $$
(12)

\(Y_{i + h}\) is the passenger flow predicted by the model at time i + h. In the following experiments, h is set to 1, i.e., the model predicts the passenger flow one hour after the current moment. The original data are used to construct the sample set \(Z = (X_1 ,X_2 ,X_3 , \ldots ,X_n )\): the data from August 1 to November 30, 2014 form the training set, December 1 to December 15 the test set, and December 16 to December 31 the validation set.
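As a concrete illustration of this windowing, the sketch below builds the samples \(X_i\) and targets \(Y_{i+h}\) from the hourly table. The function name and the assumption that passenger flow sits in column 0 are illustrative, not from the paper.

```python
import numpy as np

def make_windows(series: np.ndarray, k: int, h: int = 1):
    """Build samples X_i = (x_{i-k}, ..., x_i) and targets Y_{i+h}.

    series: hourly records, shape (T, n_features); column 0 is assumed
            to hold the (normalized) passenger flow.
    """
    X, y = [], []
    for i in range(k, len(series) - h):
        X.append(series[i - k:i + 1])  # window of k+1 consecutive hours
        y.append(series[i + h, 0])     # passenger flow h hours ahead
    return np.stack(X), np.array(y)
```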

4.2 Model Details

The CNN-LSTM layer in the ACLB model has two CNN layers, two pooling layers, and two LSTM layers, with all convolution kernels set to \(3 \times 1\). Each LSTM has 100 hidden neurons with dropout = 0.5, and the BILSTM layer has 100 hidden neurons. During training, the learning rate is 0.001 and the batch size is 10. To show that the improvements in the ACLB model are effective, its performance is compared with CNN, LSTM, BILSTM, CNN-LSTM, and FCN-ALSTM.
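Under these hyper-parameters, a training run might be configured as follows. This sketch reuses the hypothetical build_aclb and make_windows helpers from the earlier listings; the data path, Adam optimizer, epoch count, split point, and window/feature sizes are assumptions, and only the learning rate of 0.001 and batch size of 10 come from the paper.

```python
import numpy as np
import tensorflow as tf

# Hypothetical preprocessed hourly table: shape (T, n_features), flow in column 0.
data = np.load("bus_hourly.npy")       # placeholder path, an assumption
X, y = make_windows(data, k=23, h=1)   # 24-hour windows, predict 1 h ahead
X_train, y_train = X[:-768], y[:-768]  # illustrative split only

model = build_aclb(window_len=24, n_features=data.shape[1])
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
              loss="mse")
model.fit(X_train, y_train, batch_size=10, epochs=50)  # epochs assumed
```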

4.3 Result

The evaluation indicators are RMSE and MAPE. To avoid the influence of different dimensions on the model, the passenger flow data have been normalized. According to the data in Table 1, compared with the single models CNN, LSTM, and BILSTM, CNN-LSTM reduces RMSE by 0.188, 0.159, and 0.003, respectively, and MAPE by 12.6%, 11.6%, and 2.6%, respectively. Compared with the CNN-LSTM and FCN-ALSTM models, the ACLB model reduces RMSE by 0.024 and 0.022, and MAPE by 1.3% and 1.5%, respectively.

Table 1. Model performance evaluation (passenger flow prediction result when h = 1)

Therefore, the ACLB model effectively reduces the error of the passenger flow forecast and improves its accuracy.
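For reference, the two evaluation indicators on the normalized series follow their standard definitions; the paper does not spell out its exact formulas, so this is a minimal sketch.

```python
import numpy as np

def rmse(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def mape(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    # Expressed as a percentage; assumes y_true has no zero entries.
    return float(np.mean(np.abs((y_true - y_pred) / y_true)) * 100.0)
```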

Figure 5(a)–(e) shows the per-period RMSE comparison of ACLB with all baseline models from December 29 to 31.

Fig. 5. RMSE comparison of ACLB with LSTM, BILSTM, CNN, CNN-LSTM, and FCN-ALSTM. Passenger flow data have been normalized, so RMSE is unitless

5 Conclusion

In this article, we propose ACLB, a new model for passenger flow prediction. To evaluate its performance, we used ACLB and several baseline models to predict the passenger flow in the next hour. The experimental results show that the ACLB model performs well. However, the data set in this article is only a small sample; in future work, we will verify the performance of the ACLB model on larger data sets.