Introduction

Traffic prediction is a crucial part of urban transportation systems. Over the years, researchers have proposed prediction methods for different types of traffic flows, such as link-level [10], region-level [25], and network-level [9] traffic prediction. Most existing link-level research has focused on accurately predicting the traffic flow on a specific link, which is quite different from this study. Traffic propagation flow (tpf), which represents the number of vehicles that transfer from an upstream road link to a downstream road link, is the object of study in this paper. The tpf reflects the dynamic changes of traffic flow in the road network, and urban traffic management departments can use it to carry out traffic management and control strategies. For instance, if the forecast traffic propagation volume far exceeds the capacity of the downstream link, reducing the green time of the corresponding phase is a good measure to avoid congestion. In particular, traffic propagation volume is a key parameter for designing intersection signal timing schemes.

It is challenging to accurately predict the traffic propagation flow because of its complex temporal and spatial features. Like other traffic flows, the tpf shows repeatability and periodicity in the time dimension, so the most intuitive approach is to predict it with time-series models, such as the Autoregressive Integrated Moving Average (ARIMA) model [17] and the Kalman filtering model [32], or with deep learning models, such as the Recurrent Neural Network (RNN) model [5], the Long Short-Term Memory (LSTM) model [23], and the Gated Recurrent Unit (GRU) model [18]. These methods have proved effective in specific scenarios, such as highways with no side roads.

However, for intricate urban road networks, classical time-series models cannot accurately predict the tpf. This is because the traffic propagation volume is also affected by spatial dependencies determined by the topology of the network, such as a sudden change of traffic flow on the upstream road link or a change in the capacity of the downstream road link. Take the situation in Fig. 1 as an example. There are four Traffic Propagation Transactions (\(TPT\)s) in total, each made up of an upstream link, a downstream link, and the traffic propagation volume between them. We assume that vehicles travel from link a to link d through link c, and from link b to link e through link c. The downstream link of \(TPT1\), which is link c, is the upstream link of \(TPT3\), so they are adjacent traffic propagation transactions. An increase in the traffic propagation volume of \(TPT1\) has a high probability of causing an increase in the traffic propagation volume of \(TPT3\). Therefore, spatial dependencies should also be considered when forecasting the traffic propagation flow [26].

Fig. 1 The relationships among traffic propagation transactions

Convolutional Neural Network (CNN) models can be used to capture the spatial features of traffic flow in intricate road networks. With the invention of the Graph Convolutional Network (GCN), spatial features can now be extracted from non-Euclidean data, such as road networks, social networks, and other graph-structured data [1, 31]. However, most GCN models only consider the influence between adjacent nodes. For instance, as shown in Fig. 1, only the effects of \(TPT1\) on \(TPT3\) and \(TPT4\), and the effects of \(TPT2\) on \(TPT3\) and \(TPT4\), are considered for traffic propagation prediction, which is not sufficient, because the tpf is affected by various spatial dependencies. At least two additional spatial dependency features should be considered: (1) the relationship between traffic propagation transactions involving the same upstream link. As shown in Fig. 1, \(TPT3\) and \(TPT4\) have the same upstream link, i.e., Link c. Since the capacity of Link c is limited, the traffic propagation volumes from Link c to Link d and Link e are also limited. Therefore, the traffic propagation volumes of \(TPT3\) and \(TPT4\) are influenced and constrained by one another; and (2) the relationship between traffic propagation transactions involving the same downstream link. As shown in Fig. 1, \(TPT1\) and \(TPT2\) have the same downstream link, i.e., Link c. Because the capacity of Link c is limited, the traffic propagation volumes from Link a and Link b to Link c are also limited. Therefore, the traffic propagation volumes of \(TPT1\) and \(TPT2\) are influenced and constrained by one another. Without considering these two spatial dependency features, a GCN-based traffic propagation prediction model cannot achieve good prediction results.

Therefore, we propose a novel GCN-based traffic propagation prediction model named Traffic Propagation Prediction Graph Convolutional Network (TPP-GCN) to predict the traffic propagation volumes in urban road network.

The main contributions of our paper are summarized as follows:

  • We propose a novel GCN and GRU-based model to predict traffic propagation flow, which is a key parameter for traffic signal control, traffic guidance, and other applications of intelligent transportation. On the one hand, the spatial characteristics of the tpf are considered, including the effects of nearby tpf. On the other hand, the temporal characteristics of the tpf, such as its periodicity and repetition in time, are captured.

  • We propose a multi-graph convolutional network in the GCN part of the TPP-GCN model. As a result, the model can take into account several factors that affect the traffic propagation flow: the influences of adjacent traffic propagation transactions, and the relationships between \(TPT\)s with the same upstream link or the same downstream link.

  • We conduct extensive experiments on a real-world dataset to evaluate the TPP-GCN model. The experimental results show that the TPP-GCN model outperforms the benchmark models HA, ARIMA, GCN, GRU, and T-GCN on practically all prediction scales.

  • We explain the TPP-GCN model’s applicable conditions in the discussion part to better implement the model in practice.

The rest of the paper is organized as follows. Sect. "Related work" covers the classical methods for urban traffic prediction. In Sect. "Definitions and problem formulation", we introduce the definitions and problem formulation. The TPP-GCN model is introduced in Sect. "TPP-GCN model", including the overall framework of TPP-GCN, the multi-graph convolutional network, and the Gated Recurrent Unit. Sect. "Experiment" describes experiments based on real-world data and reports the performance of the TPP-GCN model and the benchmark methods. Finally, we present the conclusion in Sect. "Conclusion".

Related work

Traffic flow prediction is a hot research topic in urban transportation management systems. Accurately predicting traffic flow can support various intelligent decision-making applications, such as intersection signal control and congestion mitigation. Researchers have done a lot of work on this topic. In general, traffic prediction methods can be divided into two types: model-driven methods and data-driven ones. The model-driven methods include the cell transmission model [6], queuing theory [7], the car-following model [13], and three-phase traffic theory [16]. Methods of this type generally rely on various assumptions and ideal conditions, which makes it difficult for them to achieve good prediction performance in real scenarios. The data-driven traffic prediction methods, in contrast, consider both feasibility and accuracy in real-world applications. As a result, this kind of model has drawn increasing attention in recent years.

The data-driven models use actual historical traffic data to predict traffic flow and include two specific types: statistical models and machine learning-based models. Statistical methods use historical data to extract trends and periodic characteristics of traffic flow that can be used to predict future traffic flow. The classical ARIMA model, Kalman filter model, and Bayesian model have been used in traffic prediction studies since the 1970s [3, 14, 27]. In the years that followed, variants of these models kept developing: Shahriari et al. proposed a novel E-ARIMA to improve traffic prediction performance [17]. Trinh et al. developed an incremental Unscented Kalman Filter (UKF) to predict traffic flow and speed with incomplete traffic data [20]. Gu et al. introduced an improved Bayesian model named IBCM-DL for urban traffic prediction [4]. These methods are typically applied to small-scale traffic datasets and simple road scenes, and it is difficult to apply them to complex road networks and large-scale traffic datasets. Machine learning methods, such as the convolutional neural network (CNN) [28], the recurrent neural network (RNN) [2], and the graph convolutional network (GCN) [29], effectively address these problems.

Many machine learning-based studies have been carried out to capture the temporal dependency features of urban traffic flow. For example, Shu et al. introduced a GRU-based model called the Bi-GRU prediction model to predict urban short-term traffic flow [18]. Wang et al. proposed a deep learning model based on the long short-term memory (LSTM) model to predict long-term sequence traffic flow and verified that the proposed model has better accuracy and stability [23]. These methods have been shown to perform well in single-link scenarios. However, the prediction performance becomes worse when these methods are applied to complex road network scenarios. This is because the traffic flow on a given road link is influenced not only by its own capacity but also by the capacities of other road links.

Therefore, some studies have tried to capture the spatial dependency features of traffic flow, such as the geographic information of the given link and the traffic volumes of adjacent road links. For instance, Zhang et al. introduced a Convolutional Neural Network (CNN)-based deep learning method to predict short-term traffic flow [28]. Sun et al. proposed a CNN-based model called CNN-BDSTN to capture the spatial features of traffic speed for traffic speed prediction in urban road networks [19]. CNN-based models usually divide the urban area into grids and convert traffic flow into Euclidean data. However, the road traffic flow is thereby turned into grid traffic flow, which is usually used for visualization and cannot support intelligent decision-making applications.

In recent years, numerous studies have demonstrated that the spatial dependency features of traffic flow in complex urban road networks can be captured using the Graph Convolutional Network (GCN) [8, 21]. A popular research topic in traffic prediction is the use of the GCN model to capture spatial dependency features together with time-series models to capture temporal dependency features. Zhao et al. proposed a GCN and GRU-based traffic prediction model called temporal graph convolutional network (T-GCN), which captures both the spatial features and the temporal features of traffic flow in complex road networks [29]. Li et al. introduced a GCN and LSTM-based deep learning model called graph and attention-based long short-term memory network (GLA), with which the spatial–temporal features of traffic flow can be captured [11]. Li et al. proposed the Dynamic Graph Convolutional Recurrent Network (DGCRN) model for urban traffic flow prediction, in which dynamic spatial features can be obtained [9]. Although most GCN-based studies considered the influences of adjacent links on a given link, many significant spatial factors that influence traffic flow were disregarded. Traffic prediction research still faces challenges: (1) To the best of our knowledge, no research has been done on predicting traffic propagation, which is an important part of traffic prediction. Existing methods cannot be directly applied to traffic propagation prediction. (2) Most of the existing GCN-based methods do not fully take into account the influence of the complex spatial dependency features of traffic propagation flow in urban road networks, as we mentioned in Sect. "Introduction".

Based on the discussion above, we propose a novel multi-graph learning-based approach to predict urban traffic propagation flow, called traffic propagation prediction graph convolutional network (TPP-GCN), which captures the complex spatial and temporal features of traffic propagation flow in complex urban road networks.

Definitions and problem formulation

Table 1 lists some notations used in this paper.

Table 1 Notations

Most existing GCN-based traffic prediction models construct a graph \(G\left\langle {V,E} \right\rangle\) from the road links and the connectivity between links, i.e., each road link is represented by a vertex \(v\) in \(V = \left\{ {v_{1} ,v_{2} ,...,v_{n} } \right\}\) and each connection between two vertices is represented by an edge \(e\) in \(E = \left\{ {e_{1} ,e_{2} ,...,e_{m} } \right\}\). In contrast, here we define the graph as follows:

Definition 1

Traffic propagation transaction, \(TPT\), is a three-dimensional vector \(\left\langle {Link_{u} ,Link_{d} ,Q_{u \to d} } \right\rangle\), where \(Link_{u}\) and \(Link_{d}\) are the upstream link and the downstream link, and \(Q_{u \to d}\) is the traffic propagation volume from \(Link_{u}\) to \(Link_{d}\) in a specific time interval \(t\), which can be 5 min, 10 min, etc. As shown in Fig. 2, \(TPT1\) and \(TPT2\) can be obtained from \(Link \, a\), \(Link \, b\), and \(Link \, c\).

Fig. 2 Illustration of traffic propagation transactions
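As a minimal sketch of Definition 1, a \(TPT\) can be stored as a small record holding the upstream link, the downstream link, and the volume observed in one interval. The class name `TPT`, the helper `build_tpts`, the link labels, and the counts below are hypothetical and chosen only for illustration; they are not taken from the study data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TPT:
    """One traffic propagation transaction <Link_u, Link_d, Q_{u->d}>."""
    upstream: str    # Link_u
    downstream: str  # Link_d
    volume: int      # Q_{u->d}: vehicles moving from Link_u to Link_d in one interval


def build_tpts(turn_counts):
    """Turn a {(upstream, downstream): vehicle count} mapping into TPT records."""
    return [TPT(u, d, q) for (u, d), q in sorted(turn_counts.items())]


# Hypothetical counts for a single 5-min interval (values are made up).
turn_counts = {("a", "c"): 12, ("b", "c"): 7}
tpts = build_tpts(turn_counts)
```

Keeping the record immutable makes it safe to use a transaction as a dictionary key when graphs over transactions are built later.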

Definition 2

Traffic propagation transaction graph \(G^{\prime}\left( {V^{\prime},E^{\prime}} \right)\): the vertices of graph \(G^{\prime}\) are all the traffic propagation transactions in the road network, and the edges of graph \(G^{\prime}\) depend on the relationships among these transactions, the most intuitive of which is connectivity. For instance, in Fig. 2, traffic propagation transactions \(TPT2\) and \(TPT1\) are adjacent along the direction of traffic flow, so there is an edge from \(TPT2\) to \(TPT1\). Using more relationships to construct different graph topologies is a good way to capture richer spatial features of traffic propagation flow, which can improve prediction performance. In this paper, several different topologies are defined (see Sect. "Multi-graph convolution part" for details), and these topologies correspond to a variety of graphs, i.e., \(G^{\prime}1\), \(G^{\prime}2\), \(G^{\prime}3\), etc.

Given an urban road network, we can obtain its traffic propagation transaction graph \(G^{\prime} = \left( {V^{\prime},E^{\prime}} \right)\). The traffic propagation volume matrix \(Q^{N \times T}\) can be obtained on graph \(G^{\prime}\), where \(N\) represents the number of transactions and \(T\) represents the number of time intervals. The goal of this paper is to establish the function between historical volume matrix and future volume matrix, which can be defined as follows:

$$ \left[ {Q^{t - 1} ,Q^{t - 2} ,...,Q^{{t - T^{\prime}}} } \right]\mathop{\longrightarrow}\limits^{gc\left( \cdot \right) + gru\left( \cdot \right)}\left[ {Q^{t} ,Q^{t + 1} ,...,Q^{t + T} } \right], $$
(1)

where \(gc\left( \cdot \right)\) represents the graph convolution process and \(gru\left( \cdot \right)\) represents the process of calculating temporal features.
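The mapping from historical to future volumes can be illustrated by how training pairs might be cut from the volume matrix. The sketch below assumes, following the convention of Eq. (9), that the history window covers the intervals just before \(t\) and that the future window starts at \(t\); the function name `make_samples` and the parameters `hist_len` and `horizon` are hypothetical.

```python
def make_samples(Q, hist_len, horizon):
    """Cut (history, future) pairs from a volume matrix.

    Q is a list of N rows (one per TPT), each holding the volumes of all
    time intervals. Each history block has hist_len columns and each
    future block has horizon columns.
    """
    n_steps = len(Q[0])
    samples = []
    for t in range(hist_len, n_steps - horizon + 1):
        history = [row[t - hist_len:t] for row in Q]  # intervals t-hist_len .. t-1
        future = [row[t:t + horizon] for row in Q]    # intervals t .. t+horizon-1
        samples.append((history, future))
    return samples


# Toy matrix with N = 2 transactions and 6 intervals (values are made up).
Q = [[1, 2, 3, 4, 5, 6],
     [10, 20, 30, 40, 50, 60]]
samples = make_samples(Q, hist_len=3, horizon=2)
```

With 6 intervals, a history of 3, and a horizon of 2, this yields 6 − 3 − 2 + 1 = 2 samples.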

TPP-GCN model

Overall framework

The overall framework of the TPP-GCN model is shown in Fig. 3.

Fig. 3 The overall framework of the proposed TPP-GCN

The left part is the multi-graph convolutional network component. According to the topology of the urban road network and the historical traffic data, all the traffic propagation transactions can be obtained. With each \(TPT\) as a vertex and the relationships between \(TPT\)s as edges, various types of traffic propagation graph structures can be established. The traffic propagation volume matrix \(Q^{N \times T}\) is input into the network for sequential graph convolution operations on the different graphs. The output is a new feature matrix that contains rich spatial feature information.

The right part is the Gated Recurrent Unit component. The new feature matrix output by the first part is vectorized and fed into the GRU model. After proper training, all parameters in the GRU are determined and the model is fitted. With the fitted model, the future traffic propagation volumes can be forecast.

Multi-graph convolution part

The classical CNN model is generally used to extract spatial features from Euclidean data, such as images and videos. However, it is difficult to apply it to non-Euclidean data such as social networks and road networks [24]. To tackle this problem, Graph Convolutional Networks can be utilized, which carry out the convolutional operations in the Fourier domain. Convolutional filters are employed in the GCN model to capture spatial characteristics. Therefore, the GCN model has been utilized extensively in studies such as product attribute prediction [30], urban traffic revitalization index prediction [22], and sentiment classification [33].

Here, we present three convolution layers based on three kinds of graphs in the GCN part to capture the spatial features of traffic propagation flow:


Graph I: The edges of the graph are determined by the connectivity of traffic propagation transactions and the direction of traffic flow. If the downstream link of traffic propagation transaction \(TPTi\) is the upstream link of \(TPTj\), there is an edge from \(TPTi\) to \(TPTj\). In contrast, no edge exists from \(TPTj\) to \(TPTi\), since traffic does not flow in this direction.

It is obvious that the upstream \(TPTi\) positively affects the downstream \(TPTj\): a larger volume of the upstream \(TPTi\) leads to a larger volume of the downstream \(TPTj\). Thus, the edges of Graph I are generated from the relationships between upstream and downstream traffic propagation transactions. The structure of Graph I for the road network in Fig. 1 is shown in Fig. 4a. The edge weights in Graph I are all set to 1.

Fig. 4 Illustration of a Graph I, b Graph II, and c Graph III of the road network in Fig. 1

Thus, the first convolution layer is based on Graph I, and the output of the first layer is determined by the input, which can be defined as follows:

$$ H^{\left( 1 \right)} = \sigma \left( {\widetilde{D}_{1}^{{ - \frac{1}{2}}} \widehat{A}_{1} \widetilde{D}_{1}^{{ - \frac{1}{2}}} Q^{(I)} \theta^{\left( I \right)} } \right), $$
(2)

where \(H^{\left( 1 \right)}\) is the output of the first layer, \(\sigma \left( \cdot \right)\) is the activation function, \(\widetilde{D}_{1}\) is the degree matrix of this graph, \(Q^{\left( I \right)}\) is the input traffic propagation volume matrix, and \(\theta^{\left( I \right)}\) represents all parameters of the input layer. \(\widehat{A}_{1}\) is the sum of the identity matrix and matrix \(A_{1}\), which reflects the connectivity of traffic propagation transactions and the direction of traffic flow. Each element \(a_{1}^{ij}\) in matrix \(A_{1}\) is calculated as follows:

$$ a_{1}^{ij} = \begin{cases} 1, & TPTi\text{'s downstream link is the upstream link of } TPTj \\ 0, & \text{otherwise} \end{cases}. $$
(3)
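A minimal sketch of Eq. (3) and of the normalized adjacency used in Eq. (2) is given below for the four transactions of Fig. 1. Two points are assumptions rather than statements from the text: the assignment of \(TPT1\) to \(TPT4\) to the link pairs (a, c), (b, c), (c, d), and (c, e), and the use of the row sums of \(\widehat{A}_{1}\) as the degree matrix for this directed graph. The function names are hypothetical.

```python
import math


def graph1_adjacency(tpts):
    """Eq. (3): a_ij = 1 if TPT i's downstream link is TPT j's upstream link."""
    n = len(tpts)
    A = [[0.0] * n for _ in range(n)]
    for i, (_, down_i) in enumerate(tpts):
        for j, (up_j, _) in enumerate(tpts):
            if down_i == up_j:
                A[i][j] = 1.0
    return A


def normalized_adjacency(A):
    """Compute D^{-1/2} (A + I) D^{-1/2}, with D taken as the row sums of A + I."""
    n = len(A)
    A_hat = [[A[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in A_hat]  # assumption: out-degree including the self-loop
    return [[A_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]


# Assumed (upstream, downstream) pairs for TPT1..TPT4 in Fig. 1.
fig1_tpts = [("a", "c"), ("b", "c"), ("c", "d"), ("c", "e")]
A1 = graph1_adjacency(fig1_tpts)
A1_norm = normalized_adjacency(A1)
```

Under these assumptions, \(TPT1\) and \(TPT2\) each point to \(TPT3\) and \(TPT4\), and no edge points back upstream.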

Graph II: The edges of the graph are determined by the influences between traffic propagation transactions with the same upstream link. If the upstream link of traffic propagation transaction \(TPTi\) is also the upstream link of \(TPTj\), there are two edges between them, one in each direction. The structure of Graph II for the road network in Fig. 1 is shown in Fig. 4b.

In this case, it is hard to say whether the effect is positive or negative, so we use the correlation coefficient of \(TPTi\) and \(TPTj\) to determine the edge weight. If the historical volumes of \(TPTi\) and \(TPTj\) are uncorrelated, the weights of the two edges between them are set to a smaller value. On the other hand, if the historical volumes of \(TPTi\) and \(TPTj\) are positively or negatively correlated, the weights are set to a larger value.

The second convolution layer is based on Graph II, and the output of the second layer is determined by output of first layer, which can be defined as follows:

$$ H^{\left( 2 \right)} = \sigma \left( {\widetilde{D}_{2}^{{ - \frac{1}{2}}} \widehat{A}_{2} \widetilde{D}_{2}^{{ - \frac{1}{2}}} H^{\left( 1 \right)} \theta^{\left( 1 \right)} } \right), $$
(4)

where \(H^{\left( 2 \right)}\) is the output of the second layer, \(\widetilde{D}_{2}\) is the degree matrix of this graph, \(\theta^{\left( 1 \right)}\) represents all parameters of the first layer, and \(\widehat{A}_{2}\) is the sum of the identity matrix and matrix \(A_{2}\), which reflects the influences between traffic propagation flows with the same upstream link. Each element \(a_{2}^{ij}\) in matrix \(A_{2}\) is calculated as follows:

$$ a_{2}^{ij} = \begin{cases} \dfrac{\left| cor\left( \overline{Q}_{i}^{1\sim 288} ,\overline{Q}_{j}^{1\sim 288} \right) \right| + 1}{2}, & TPTi \text{ and } TPTj \text{ have the same upstream link} \\ 0, & \text{otherwise} \end{cases}, $$
(5)

where \(\left| {cor\left( {\overline{Q}_{i}^{1\sim 288} ,\overline{Q}_{j}^{1\sim 288} } \right)} \right|\) is the absolute value of the correlation coefficient of \(TPTi\) and \(TPTj\), and \(\overline{Q}_{i}^{1\sim 288}\) and \(\overline{Q}_{j}^{1\sim 288}\) are the historical average traffic propagation volumes of \(TPTi\) and \(TPTj\) over the 288 5-min intervals of a day. For two transactions with the same upstream link, the value of \(a_{2}^{ij}\) ranges from 0.5 to 1.
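The weight rule of Eq. (5) can be sketched as follows, assuming the Pearson correlation coefficient is meant. The same helper also covers the case in which transactions are grouped by a shared downstream link. The function names, the 4-point toy profiles (in place of 288-point daily profiles), the exclusion of \(i = j\) (self-loops are assumed to come from the identity matrix), and the treatment of constant profiles as uncorrelated are all assumptions made for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # assumption: a constant profile is treated as uncorrelated
    return cov / math.sqrt(vx * vy)


def shared_link_weights(tpts, profiles, end):
    """Weights (|cor| + 1) / 2 between TPTs sharing a link.

    end = 0 groups by upstream link, end = 1 groups by downstream link.
    """
    n = len(tpts)
    A = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j and tpts[i][end] == tpts[j][end]:
                A[i][j] = (abs(pearson(profiles[i], profiles[j])) + 1) / 2
    return A


# Assumed link pairs for TPT1..TPT4 in Fig. 1 and made-up average profiles.
fig1_tpts = [("a", "c"), ("b", "c"), ("c", "d"), ("c", "e")]
profiles = [[1, 2, 3, 4], [2, 1, 2, 1], [1, 2, 3, 4], [4, 3, 2, 1]]
A2 = shared_link_weights(fig1_tpts, profiles, end=0)  # same upstream link
A3 = shared_link_weights(fig1_tpts, profiles, end=1)  # same downstream link
```

Taking the absolute value means that strongly negative and strongly positive correlations both receive weights close to 1, while uncorrelated pairs receive 0.5.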


Graph III: The edges of the graph are determined by the influences between traffic propagation transactions with the same downstream link. If the downstream link of traffic propagation transaction \(TPTi\) is also the downstream link of \(TPTj\), there are two edges between them, one in each direction. The structure of Graph III for the road network in Fig. 1 is shown in Fig. 4c.

As in Graph II, we use the correlation coefficient of \(TPTi\) and \(TPTj\) to determine the edge weights in Graph III.

The third convolution layer is based on Graph III, and the output of the third layer is determined by output of the second layer, which can be defined as follows:

$$ H^{\left( 3 \right)} = \sigma \left( {\widetilde{D}_{3}^{{ - \frac{1}{2}}} \widehat{A}_{3} \widetilde{D}_{3}^{{ - \frac{1}{2}}} H^{\left( 2 \right)} \theta^{\left( 2 \right)} } \right), $$
(6)

where \(H^{\left( 3 \right)}\) is the output of the third layer, \(\widetilde{D}_{3}\) is the degree matrix of this graph, \(\theta^{\left( 2 \right)}\) represents all parameters of the second layer, and \(\widehat{A}_{3}\) is the sum of the identity matrix and matrix \(A_{3}\), which reflects the influences between traffic propagation flows with the same downstream link. Each element \(a_{3}^{ij}\) in matrix \(A_{3}\) is calculated as follows:

$$ a_{3}^{ij} = \begin{cases} \dfrac{\left| cor\left( \overline{Q}_{i}^{1\sim 288} ,\overline{Q}_{j}^{1\sim 288} \right) \right| + 1}{2}, & TPTi \text{ and } TPTj \text{ have the same downstream link} \\ 0, & \text{otherwise} \end{cases}, $$
(7)

where, for two transactions with the same downstream link, the value of \(a_{3}^{ij}\) ranges from 0.5 to 1.

Gated recurrent unit part

Urban traffic flow exhibits distinct trends and periodicity in the time dimension, so researchers have employed classic time-series models, such as the ARIMA model. Some deep learning models have recently been used in studies on urban traffic prediction. The recurrent neural network (RNN) model has outperformed traditional time-series models in traffic prediction tasks [12]. However, RNN-based models are prone to the vanishing and exploding gradient problems, which worsen the prediction performance. The Gated Recurrent Unit (GRU) model, which has fewer parameters, successfully addresses this issue [15]. Therefore, the GRU model is utilized in this paper.

The GRU model has already been described in detail in previous studies, and we have not modified it here. Interested readers may refer to those studies [25, 29]. Note that the inputs of the GRU model are defined as the outputs of the GCN part:

$$ Q_{GRU}^{\left( I \right)} = gc\left( {A_{3} ,gc\left( {A_{2} ,gc\left( {A_{1} ,Q^{\left( I \right)} } \right)} \right)} \right), $$
(8)

where \(gc\left( \cdot \right)\) represents the graph convolution process, and the right side is the result of the three-step graph convolution process based on Graph I, Graph II, and Graph III. The GRU is then trained to learn the function between the future traffic propagation volume matrix and the historical matrix after the graph convolution:

$$ \left[ {\widehat{Q}^{t} ,\widehat{Q}^{t + 1} ,...,\widehat{Q}^{t + T} } \right] = gru\left( {\left[ {Q^{t - 1} ,Q^{t - 2} ,...,Q^{{t - T^{\prime}}} } \right]} \right), $$
(9)

where \(T\) and \(T^{\prime}\) are the time interval numbers of future traffic propagation volume matrix and the historical matrix, respectively.

With appropriate training, all the parameters are determined, which means that a precise and complete TPP-GCN model is obtained.
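The three-step composition in Eq. (8) can be sketched with plain matrix products. This is an illustration only: the activation \(\sigma\) is assumed to be ReLU, the normalized adjacency matrices and the weight matrices are toy identity matrices, and the names `gc` and `multi_graph_features` are hypothetical. The GRU stage is omitted.

```python
def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]


def relu(X):
    """Element-wise ReLU, assumed here as the activation function."""
    return [[max(0.0, v) for v in row] for row in X]


def gc(A_norm, H, theta):
    """One graph convolution step: sigma(A_norm H theta)."""
    return relu(matmul(matmul(A_norm, H), theta))


def multi_graph_features(A_norms, Q, thetas):
    """Apply the graph convolution on Graph I, II, and III in sequence."""
    H = Q
    for A_norm, theta in zip(A_norms, thetas):
        H = gc(A_norm, H, theta)
    return H


# Toy inputs: 2 transactions, 2 features, identity graphs and weights.
I2 = [[1.0, 0.0], [0.0, 1.0]]
Q_in = [[1.0, -2.0], [3.0, 4.0]]
Q_gru = multi_graph_features([I2, I2, I2], Q_in, [I2, I2, I2])
```

With identity graphs and weights, the only effect is the ReLU, which clips the negative entry to zero. In a trained model, each step would use its own normalized adjacency matrix and learned weights.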

The pseudocode of TPP-GCN

The pseudocode for the TPP-GCN is shown in Table 2.

According to Table 2, the computation complexity of the TPP-GCN model is \(\mathcal{O}\left( {N \cdot T} \right)\), in which \(N\) is the number of vertices of the traffic propagation transaction graph \(G^{\prime}\) and \(T\) is the number of hidden units in the GRU. In addition, we evaluate the training complexity of the TPP-GCN model. If the dataset includes \(M\) time periods, and the number of hidden units is still \(T\), then we have a total of \(M - T + 1\) samples for training. In the training process, if the batch size and the number of epochs are \(b\) and \(e\), respectively, then a total of \(\frac{M - T + 1}{b} \cdot e\) training iterations are performed. To achieve a better prediction effect, we usually recommend that the value of \(M\) be much larger than \(T\) in the actual training process. Therefore, we can infer that the training complexity of the TPP-GCN model is \(\mathcal{O}\left( {N \cdot \frac{M}{b} \cdot e} \right)\).

Table 2 The pseudocode for the TPP-GCN
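The iteration count used in the training complexity estimate can be checked with a few lines of arithmetic. The sketch below plugs in values stated elsewhere in the paper (batch size 32, 600 epochs, 64 for \(T\)); the value \(M = 21 \times 288 = 6048\) is derived here from the 21-day training set and the 5-min aggregation and is an assumption, as is rounding the last partial batch up. The function name is hypothetical.

```python
import math


def training_iterations(M, T, b, e):
    """Return (number of samples, number of training iterations)."""
    samples = M - T + 1
    # Assumption: a final partial batch still counts as one iteration.
    return samples, math.ceil(samples / b) * e


M = 21 * 288  # assumed: 21 training days of 5-min intervals
samples, iterations = training_iterations(M, T=64, b=32, e=600)
```

Under these assumptions there are 5985 samples, 188 batches per epoch, and 112,800 iterations in total, which is consistent with the count growing roughly as \(\frac{M}{b} \cdot e\) when \(M\) is much larger than \(T\).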

Experiment

Data description

Graph data

We selected the road network in a square area of Huaqiangbei commercial area in Futian District of Shenzhen to verify the proposed model. As shown in Fig. 5, the road network contains 15 intersections and 22 links, and the letters E, W, S, and N stand for eastbound, westbound, southbound, and northbound, respectively. The study area has a total area of 215,000 square meters and measures 500 m from East to West and 430 m from North to South.

Fig. 5 Illustration of study area and road network

Traffic propagation flow data

We used taxi GPS data from January 2019 as the flow data source to measure the traffic propagation flow in the study network. We performed a series of processing tasks on the taxi GPS data: (1) The first 3 days of January 2019 fall in the New Year holiday period, during which the changes in urban traffic flow are very different from those on ordinary weekdays and weekends, so we removed the data for those three days. As a result, a total of 28 days of GPS data are utilized. (2) We filtered the taxi GPS points within the study area by latitude and longitude coordinates. A total of 8,190,000 GPS point coordinates and 38,0000 vehicle trajectories were gathered. (3) The average traffic propagation volumes of all traffic propagation transactions, which represent traffic propagation features, were aggregated every 5 min. (4) A training dataset and a test dataset were generated from the 28-day dataset. The test dataset includes the data from January 25th to January 31st, 2019, and the remaining 21 days of data are used as the training dataset.
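The 5-min aggregation in step (3) can be sketched as a simple binning of link-to-link transfers. The input format below, one record per vehicle transfer with a time in seconds since midnight, is hypothetical and presumes that GPS points have already been matched to links; the function name and the sample records are made up for illustration.

```python
from collections import Counter

INTERVAL_S = 300  # 5 min, giving 288 intervals per day


def aggregate_transfers(transfers):
    """Count transfers per (upstream, downstream, interval index).

    transfers is an iterable of (seconds_since_midnight, upstream, downstream).
    """
    counts = Counter()
    for ts, upstream, downstream in transfers:
        counts[(upstream, downstream, int(ts // INTERVAL_S))] += 1
    return counts


# Made-up transfer records for one day.
transfers = [
    (10, "7W", "4W"),
    (299, "7W", "4W"),
    (300, "7W", "4W"),
    (86399, "14W", "7W"),
]
volumes = aggregate_transfers(transfers)
```

Each key identifies one transaction in one interval, so the counts for a fixed link pair form one row of the volume matrix for that day.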

Benchmark model and evaluation measurement

Several benchmark models are applied for comparison and evaluation with the TPP-GCN model in this paper:

(a) Historical average (HA), which uses the historical traffic volume in the same time interval as the predicted value.

(b) Autoregressive Integrated Moving Average (ARIMA) [17], which is a common parameter-based traffic prediction model.

(c) Gated Recurrent Unit (GRU): see Sect. "Gated Recurrent Unit part" for details.

(d) Graph Convolutional Network (GCN): see Sect. "Multi-graph convolution part" for details; this model only carries out the graph convolution process on Graph I.

(e) Temporal Graph Convolutional Network (T-GCN) [29], which captures spatial–temporal dependency features of traffic flow.

(f) TPP-GCN I: see Sect. "Multi-graph convolution part" for details; this model only carries out the graph convolution process on Graph I and Graph II.

(g) TPP-GCN II: see Sect. "Multi-graph convolution part" for details; this model only carries out the graph convolution process on Graph I and Graph III.
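As an illustration of the simplest benchmark, the historical average can be sketched as the per-interval mean over past days. The function name and the toy data are hypothetical, and the sketch assumes that every day is averaged with equal weight, which the text does not specify.

```python
def historical_average(days):
    """Predict each interval as the mean of that interval over past days.

    days is a list of daily volume series of equal length
    (for example, 288 values for 5-min intervals).
    """
    n_days = len(days)
    n_slots = len(days[0])
    return [sum(day[k] for day in days) / n_days for k in range(n_slots)]


# Two made-up days with two intervals each.
prediction = historical_average([[10, 20], [20, 40]])
```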

To compare the differences between predicted values \(\widehat{{q_{i} }}\) and real values \(q_{i}\) of the TPP-GCN model and the benchmark models, we use three standard indicators:

(1) Mean Absolute Error (MAE):

$$ MAE = \frac{1}{N}\sum\limits_{i = 1}^{N} {\left| {q_{i} - \widehat{{q_{i} }}} \right|} . $$
(10)

(2) Mean Absolute Percentage Error (MAPE):

$$ MAPE = \frac{1}{N}\sum\limits_{i = 1}^{N} {\frac{{\left| {q_{i} - \widehat{{q_{i} }}} \right|}}{{q_{i} }}} . $$
(11)

(3) Root-Mean-Square Error (RMSE):

$$ RMSE = \sqrt {\frac{1}{N}\sum\limits_{i = 1}^{N} {\left( {q_{i} - \widehat{{q_{i} }}} \right)^{2} } } . $$
(12)
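The three indicators can be sketched in a few lines using their standard definitions, with squared errors inside the RMSE. The function names and sample values are made up for illustration. Skipping intervals whose real value is zero in the MAPE is an assumption added here to avoid division by zero; the text does not say how such intervals are handled.

```python
import math


def mae(q, q_hat):
    """Mean absolute error."""
    return sum(abs(a - b) for a, b in zip(q, q_hat)) / len(q)


def mape(q, q_hat):
    """Mean absolute percentage error (as a fraction, not multiplied by 100)."""
    # Assumption: intervals with a real value of zero are skipped.
    pairs = [(a, b) for a, b in zip(q, q_hat) if a != 0]
    return sum(abs(a - b) / a for a, b in pairs) / len(pairs)


def rmse(q, q_hat):
    """Root-mean-square error."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(q, q_hat)) / len(q))


q_true = [10, 20, 40]
q_pred = [12, 18, 40]
scores = (mae(q_true, q_pred), mape(q_true, q_pred), rmse(q_true, q_pred))
```

For these values, the absolute errors are 2, 2, and 0, so the MAE is 4/3, the MAPE is 0.1, and the RMSE is the square root of 8/3.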

Hyperparameters

The TPP-GCN model and all benchmark models are implemented using PyTorch.

The TPP-GCN model’s learning rate, batch size, and number of training epochs are set to 0.001, 32, and 600, respectively. The number of hidden units of the model is set to 64, which is the optimal value according to Ref. [29]. The L2 loss function was selected for the training process.

The T-GCN model’s learning rate, batch size, and number of training epochs are set to 0.001, 32, and 5000, respectively. Following Ref. [29], the number of hidden units was set to 64 for our dataset, which led to the best prediction result. The L2 loss function was also selected for this model.

The hyperparameters of GRU, TPP-GCN I, and TPP-GCN II are all the same as those of the TPP-GCN model.

Results

Table 3 reports the performance of the TPP-GCN model and the benchmark models for five prediction horizons: 5 min, 15 min, 30 min, 45 min, and 60 min. The TPP-GCN model performs best over almost all prediction horizons.

Table 3 Prediction performances of TPP-GCN and benchmark models

We can infer the following from Table 3: (a) The deep learning models outperform the time-series models in the traffic propagation prediction task for all prediction horizons. The intricate features of traffic propagation in the spatial dimension cannot be captured by conventional time-series models like HA and ARIMA. (b) TPP-GCN I, TPP-GCN II, and TPP-GCN outperform the T-GCN model, which supports our hypothesis that considering the relationships between traffic propagation transactions with the same upstream or downstream link captures richer spatial features and enhances the performance of traffic propagation prediction. (c) TPP-GCN performs better on smaller prediction scales. This is because the spatial features of the relationships between \(TPT\)s with the same upstream or downstream link have less influence on long-term traffic propagation flow.

Figure 6 compares the ground truth values with the prediction results of the models on traffic propagation transaction \(TPT11\), which represents the traffic propagation flow from Link 7W to Link 4W in Fig. 5. This transaction is a typical one among all 46 transactions, with an average value of around 20 vehicles per 5 min, on which all models perform better. The prediction curves of TPP-GCN I, TPP-GCN II, and TPP-GCN are closer to the ground truth curve than the curves of GRU, GCN, and T-GCN on all prediction days. In particular, the TPP-GCN model outperforms all other models, which verifies its validity and accuracy. In addition, the traffic propagation flow, like other types of traffic flow, has the characteristics of trend and periodicity. That is to say, the traffic propagation volumes are small at night and large during the daytime. The TPP-GCN model accurately captures the trend and periodicity characteristics of the traffic propagation flow.

Fig. 6 Comparison of all models’ prediction results and ground truth values on (a) Jan. 25th (Friday), (b) Jan. 27th (Sunday), (c) Jan. 28th (Monday), and (d) Jan. 30th (Wednesday)

Discussion

To verify the benefit of considering the two additional spatial dependency features, we further compare the results of TPP-GCN I, TPP-GCN II, and T-GCN on \(TPT\)s that have the same upstream link and on \(TPT\)s that have the same downstream link at the 5-min time scale. As shown in Table 4, the TPP-GCN I model outperforms the T-GCN model on the \(TPT\)s that have the same upstream link, and the TPP-GCN II model outperforms the T-GCN model on the \(TPT\)s that have the same downstream link.

Table 4 Prediction performances of T-GCN, TPP-GCN I, and TPP-GCN II on \(TPT\)s that have the same upstream/downstream link at the 5-min time scale

In addition, Fig. 7 shows boxplots of the MAE, MAPE, and RMSE of the T-GCN, TPP-GCN I, and TPP-GCN II models on the \(TPT\)s that have the same upstream/downstream link. In total, there are 14 \(TPT\)s that have the same upstream link and 16 \(TPT\)s that have the same downstream link. The three subplots in the top row show that the T-GCN model and the TPP-GCN I model both perform well on all three indicators for all 14 \(TPT\)s. However, the TPP-GCN I model has more concentrated indicator values than the T-GCN model, which means that the TPP-GCN I model achieves better results in most cases. The three subplots in the bottom row of Fig. 7 show even more clearly that TPP-GCN II has more concentrated indicator values than the T-GCN model.

Fig. 7 Boxplots of MAE, MAPE, and RMSE of the prediction results. The three subplots in the top row compare TPP-GCN I and T-GCN on \(TPT\)s that have the same upstream link, and the three subplots in the bottom row compare TPP-GCN II and T-GCN on \(TPT\)s that have the same downstream link
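One way to quantify how "concentrated" the indicator values in a boxplot are is the interquartile range (IQR), i.e., the height of the box. The following is a minimal sketch of that comparison. The per-\(TPT\) MAE values are made up for illustration and are not results from this paper, and the names `iqr`, `mae_model_a`, and `mae_model_b` are hypothetical.

```python
import numpy as np


def iqr(values):
    """Interquartile range (75th minus 25th percentile) of per-TPT indicator values."""
    q1, q3 = np.percentile(values, [25, 75])
    return float(q3 - q1)


# Made-up per-TPT MAE values for two models (NOT results from the paper).
mae_model_a = np.array([2.0, 2.5, 3.0, 3.5, 6.0])  # wider spread, one poor TPT
mae_model_b = np.array([2.4, 2.6, 2.8, 3.0, 3.2])  # more concentrated

spread_a = iqr(mae_model_a)
spread_b = iqr(mae_model_b)
more_concentrated = "model_b" if spread_b < spread_a else "model_a"
```

With these illustrative values the second model has the smaller IQR, which corresponds to a shorter box in the boxplot and to more consistent performance across \(TPT\)s.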

Fig. 8 Comparison of MAPE values for different traffic propagation volumes: (a) from Link 7W to Link 4W; (b) from Link 14W to Link 7W; (c) from Link 15W to Link 14W

The indicators in Table 4 are significantly smaller than those in Table 3. This is because the traffic propagation volumes of the \(TPT\)s selected for Table 4 are larger than those of the remaining \(TPT\)s, and we infer that smaller traffic propagation volumes lead to worse prediction performance. To verify this, we evaluate the impact of the traffic propagation volume on the TPP-GCN model’s performance at the 5-min time scale. The MAPE values for various traffic propagation volumes are shown in Fig. 8. When the traffic propagation volume is less than 5 vehicles per 5 min, the MAPE values of the TPP-GCN model for the three traffic propagation transactions are 1.18, 0.95, and 0.98, indicating poor performance. The MAPE values drop sharply as the traffic propagation volume increases. When the traffic propagation volume is greater than 20 vehicles per 5 min, the three MAPE values stabilize at extremely low levels, indicating that the model has reached its best performance. Therefore, we recommend applying the TPP-GCN model to road networks with large traffic propagation volumes, i.e., at least 15 vehicles per 5 min.
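The sensitivity of MAPE to small volumes follows directly from its definition: the same absolute error is divided by a smaller ground truth value. The sketch below illustrates this with synthetic data in which every prediction is off by exactly 2 vehicles. It assumes the standard MAPE definition expressed as a fraction and excludes zero-flow intervals; the function names and bin edges are hypothetical choices for illustration, not the evaluation code used in this paper.

```python
import numpy as np


def mape(y_true, y_pred):
    """Mean absolute percentage error as a fraction, skipping zero-flow intervals."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    mask = y_true > 0  # MAPE is undefined where the true volume is zero
    return float(np.mean(np.abs(y_pred[mask] - y_true[mask]) / y_true[mask]))


def mape_by_volume(y_true, y_pred, edges):
    """Compute MAPE separately for each volume bin [lo, hi) defined by edges."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    result = {}
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (y_true >= lo) & (y_true < hi) & (y_true > 0)
        if in_bin.any():
            result[(lo, hi)] = mape(y_true[in_bin], y_pred[in_bin])
    return result


# Synthetic volumes (vehicles per 5 min) with a constant absolute error of 2 vehicles.
y_true = np.array([2, 4, 10, 20, 40])
y_pred = y_true + 2

bins = mape_by_volume(y_true, y_pred, edges=[0, 5, 20, 100])
```

For this synthetic input, the bin below 5 vehicles per 5 min has a MAPE of 0.75, while the bins from 5 to 20 and from 20 upward have MAPE values of about 0.2 and 0.075, even though the absolute error is identical everywhere. This is consistent with the qualitative pattern described above for Fig. 8.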

Conclusion

This paper proposes a multi-graph convolution-based model for traffic propagation prediction called TPP-GCN. The model consists of a multi-graph convolutional network and a gated recurrent unit (GRU). In the first part, the impact of adjacent traffic propagation transactions (\(TPT\)s) and the relationships between \(TPT\)s with the same upstream link or the same downstream link are all considered in establishing the multi-layer graph convolutional network. The multi-graph convolutional operation captures comprehensive spatial features of the traffic propagation flow, and the GRU part extracts its temporal features. This paper also carries out experiments on real-world traffic data to compare and evaluate the model against existing benchmark models. The experimental results show that the TPP-GCN model outperforms the other benchmark models at almost all prediction horizons. The implementation of TPP-GCN and the dataset are available at https://github.com/Joker-L0912/TPP-GCN.

Although the TPP-GCN model performs well at most prediction horizons, traffic flow volatility has a negative impact on its performance. Future work will therefore concentrate on optimizing the model to lessen the influence of traffic flow volatility. In addition, the TPP-GCN model only considers spatial features in the first-order receptive field, which may limit its prediction performance. In future research, we will consider the second-order and third-order receptive fields of spatial features to improve the model. Finally, the TPP-GCN model has so far been applied only to predicting the traffic propagation flow of an urban road network. To validate the model’s applicability, we will extend it to other domains, such as traffic congestion propagation and knowledge transfer prediction in social networks.