1 Introduction

With the ongoing growth of urban populations, transportation problems in major cities, such as traffic congestion [1, 2] and air pollution [3, 4], are becoming increasingly severe. Intelligent transportation systems (ITS) are emerging as a crucial infrastructure for smart cities, and traffic forecasting plays a significant role in their advancement. Early intervention based on traffic prediction can enhance the efficiency of urban transport systems and guide scheduling and staff preallocation.

Accurate traffic prediction relies on traffic big data as its foundation. Bello et al. [5] and Zhang et al. [6] identified five characteristics of big data, referred to as the 5 V features: large volume, high velocity, large variety, veracity, and value. The 5 V features highlight the notable increase in data volume. Traffic data is a kind of big data. It is also multi-source, heterogeneous, and multi-modal, since it comes from different sources such as sensors installed on the road (e.g., loop detectors), transactions in urban public transportation systems (e.g., bus or subway), and smartphones equipped with GPS receivers. Furthermore, traffic big data takes various forms, encompassing text, video, numbers, and symbols. Therefore, how to perceive, collect, and manage traffic big data is a significant issue. Analyzing and exploiting the value of traffic big data presents another significant challenge.

Fig. 1 The overall architecture of traffic forecasting

In recent decades, numerous studies have addressed the aforementioned challenges in traffic forecasting. Early on, traffic forecasting was treated directly as a time series problem, and many traditional time series forecasting methods were applied, e.g., Auto-Regressive Integrated Moving Average (ARIMA) [7]. Later, data-driven machine learning methods such as Support Vector Regression (SVR) [8] were employed to handle more complex traffic data. Even though these models are built on strict mathematical theories, their performance relies heavily on feature engineering and is often limited by their feature representation capacity. Moreover, they rely on certain unrealistic assumptions, so their performance in real-world applications remains unsatisfactory. To overcome these limitations, many researchers have turned to deep learning techniques for modeling high-dimensional spatio-temporal data. For example, Recurrent Neural Networks (RNNs) (e.g., LSTM [9], Gated Recurrent Unit (GRU) [10]) have been utilized to capture the temporal features of traffic data. Additionally, Convolutional Neural Networks (CNNs) have been applied to represent the spatial features of grid-based traffic data effectively. More recently, Graph Convolutional Networks (GCNs) have gained popularity for learning spatial correlations in graph-based data.

There have been numerous surveys on traffic forecasting [11,12,13,14,15,16,17,18,19,20], but few studies focus on the holistic architecture of traffic forecasting. As depicted in Fig. 1, for traffic prediction, researchers generally first analyze the characteristics of traffic data. Subsequently, they consider how to design a model that can effectively capture spatial and temporal features. Finally, they devote attention to developing a loss function that enables accurate traffic prediction. Following this pipeline, we aim to present a comprehensive overview based on the overall architecture of traffic forecasting. More specifically, we first analyze traffic data collection, traffic data formats, and traffic data characteristics. Then, we summarize the methods of spatial representation modeling, temporal representation modeling, and spatio-temporal representation modeling, respectively. Lastly, we summarize the specific prediction tasks related to traffic data, such as traffic flow forecasting, traffic speed forecasting, traffic demand forecasting, and other hybrid traffic forecasting.

The contributions of this work are listed as follows:

  1.

    We summarize and analyze traffic forecasting tasks by considering the overall architecture, including traffic data analysis, traffic data modeling, and traffic forecasting applications.

  2.

    We review the open datasets and source resources for traffic forecasting.

  3.

    We present the existing problems and challenges of traffic forecasting and discuss future research directions and ideas.

The rest of this paper is organized as follows: Section 2 introduces related surveys and traffic forecasting tasks. Section 3 presents traffic data analysis, including traffic data collection, traffic data formats, and traffic data characteristics. Traffic data modeling is explored in Sect. 4. The applications of traffic forecasting are introduced in Sect. 5. Section 6 provides public datasets for traffic forecasting. Section 7 discusses the challenges of current traffic forecasting models and presents future research directions. Section 8 concludes this paper.

2 Related Work

In this section, we first review related surveys on traffic forecasting and then discuss the distinctions between existing surveys and our work. Additionally, we provide preliminaries on traffic forecasting, categorizing it into single-step and multi-step forecasting tasks.

2.1 Related Survey

The existing traffic forecasting surveys can be divided into three categories: deep learning-based, graph neural network-based, and task-based reviews. Deep learning-based short-term traffic forecasting was roughly classified into five generations in [15]. Three categories of deep learning models for traffic forecasting were highlighted in [14]: grid-based, graph-based, and multivariate time series models. Data preparation methods and five classes of traffic forecasting models were summarized in [16]. That survey covered spatio-temporal dataset collection technology, urban map division, and other traffic data issues (e.g., missing data, data imbalance, and data uncertainty) as three aspects of traffic data preparation. Moreover, it covered five classes of traffic forecasting methods: statistics-based methods, machine learning-based methods, deep learning-based methods, reinforcement learning-based methods, and transfer learning-based methods. Yin et al. [17] reviewed traditional and deep learning methods for traffic forecasting. Traffic prediction techniques were categorized into four groups in [18]: machine learning, computational intelligence, deep learning, and hybrid methods. Traffic volume prediction and speed prediction problems were discussed in [19], where the models were divided only into statistical models, machine learning-based methods, and graph neural network-based methods.

Table 1 Comparison between this and other related surveys

Because the spatial graph structure is non-Euclidean, graph neural networks and their variants have emerged to represent the spatio-temporal relations in traffic prediction. Recently, many surveys about graph neural networks for traffic prediction have been conducted. The spatio-temporal graph neural network models were divided into graph convolutional recurrent neural networks, fully graph convolutional networks, graph multi-attention networks, and self-learning graph structures in [11]. A GNN-based traffic forecasting survey was given in [12], which discussed diverse traffic forecasting applications, such as traffic flow forecasting, traffic speed forecasting, traffic demand forecasting, and other hybrids. Jiang et al. [13] summarized open data and big data tools applied in traffic estimation and prediction, described the different data types utilized for traffic estimation and forecasting tasks, and released a new GPS trajectory dataset for further research. Jin et al. [20] introduced spatio-temporal graph data generation methods and the deep learning architectures used in spatio-temporal graph neural networks (STGNNs). Their survey also thoroughly analyzed existing STGNN methods for temporal learning, spatial learning, and spatio-temporal fusion. Further, it examined some recently emerging approaches that integrate STGNNs with other advanced learning frameworks. It mainly focused on STGNNs in various fields, including public safety, healthcare, transportation, environment, climate, and others.

In conclusion, as summarized in Table 1, these existing surveys mainly focus on methods or task classification without delving into the overall architecture of traffic forecasting. Our study differs from existing surveys in that we present the overall architecture of traffic forecasting, including traffic data analysis, traffic data modeling, and traffic forecasting applications.

2.2 Traffic Forecasting

Traffic forecasting is an essential component of intelligent transportation systems. It generally predicts the future traffic state (e.g., traffic speed, traffic flow, traffic demand, etc.) at a particular time interval, given historical observed data and other related external data. Traffic forecasting tasks are usually classified into short-term forecasting [21, 22] and long-term forecasting [23, 24] based on the length of the prediction time interval. The time interval of short-term traffic forecasting is generally 5 min or 15 min, and the time interval of long-term traffic forecasting is 1 h or even more. In addition, according to the number of predicted time intervals, traffic prediction tasks can be divided into single-step traffic prediction [21, 22] and multi-step traffic prediction [23,24,25], as shown in Fig. 2. We introduce single-step traffic prediction in Definition 1 and multi-step traffic prediction in Definition 2.

Fig. 2 An example of single-step prediction and multi-step prediction

Definition 1

(Single-step Traffic Prediction) Given historical observed traffic data of n time steps \(X=(x_{t-n},x_{t-n+1},...,x_{t})\), the single-step traffic prediction problem aims to predict the traffic data \(x_{t+1}\) at the next time interval \(t+1\).

Definition 2

(Multi-step Traffic Prediction) Given observations of historical time steps \(X=(x_{t-n},x_{t-n+1},...,x_{t})\), the multi-step traffic prediction problem aims to predict the traffic data of the next m time steps, denoted as \(Y=(x_{t+1},x_{t+2},...,x_{t+m})\).
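As a minimal sketch of Definitions 1 and 2, the following Python snippet slices a univariate series into input windows and prediction targets. The function name `make_windows` and the parameters `n_hist` (number of historical steps per window) and `m` (number of predicted steps) are illustrative assumptions rather than notation from the surveyed works; setting `m=1` corresponds to the single-step case.

```python
import numpy as np

def make_windows(series, n_hist, m):
    """Slice a 1-D series into (history, target) pairs.

    n_hist: number of historical steps in each input window (assumed name).
    m: number of future steps to predict; m=1 is single-step prediction.
    """
    series = np.asarray(series, dtype=float)
    num_samples = len(series) - n_hist - m + 1
    if num_samples <= 0:
        raise ValueError("series is too short for the requested window sizes")
    # Each row of X is a history window; the matching row of Y holds the next m values.
    X = np.stack([series[i:i + n_hist] for i in range(num_samples)])
    Y = np.stack([series[i + n_hist:i + n_hist + m] for i in range(num_samples)])
    return X, Y

flow = np.arange(10)                         # toy series for illustration
X1, Y1 = make_windows(flow, n_hist=4, m=1)   # single-step samples
Xm, Ym = make_windows(flow, n_hist=4, m=3)   # multi-step samples
```

The same windows serve both tasks; only the target length changes, which is why many models are evaluated in both settings with one data pipeline.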

3 Traffic Data Analysis

Before designing traffic prediction models, researchers usually need to obtain traffic data, investigate its features and patterns, and identify its existing issues. Therefore, in this section, we first analyze traffic data collection scenarios and summarize various approaches to address different data preprocessing issues. Secondly, we summarize two traffic data formats and discuss the distinctions between traffic region data and traffic point data. Finally, we present traffic data characteristics and provide some examples. The overall analysis is shown in Fig. 3.

Fig. 3 Traffic data analysis for traffic forecasting

3.1 Traffic Data Collection

Many sensors have been deployed in the real world. Traffic data is mainly collected by urban sensing or automatic fare collection. Urban sensing can be divided into fixed urban sensing, mobile urban sensing, and passive urban sensing. Firstly, surveillance equipment installed on the road collects traffic speed and traffic flow data, which is a kind of fixed urban sensing. Secondly, shared bikes and taxis moving through the city collect traffic speed and traffic flow data, which is a kind of mobile urban sensing. Moreover, transactions in urban transportation systems and smartphones equipped with GPS receivers constitute passive urban sensing.

3.1.1 Traffic Data Pre-processing

The collected traffic data faces the challenges of biased samples, data sparsity, and missing data. To deal with these problems, researchers have developed many models for data pre-processing [26].

For biased samples of traffic data, Zhou et al. [27] designed a causal spatio-temporal graph learning framework (CauSTG) to achieve invariance in spatio-temporal data. For data sparsity, Liu et al. [28] proposed contrastive learning methods that introduced four data augmentation methods for spatio-temporal graph prediction. For missing data, the main approaches include traditional machine learning methods (e.g., Principal Component Analysis (PCA) [29], Matrix or Tensor Factorization (MF or TF) [30]) and deep learning methods [31, 32]. On the one hand, some scholars employ RNNs and their variants for traffic data imputation. For instance, Kong et al. [31] developed a dynamic graph convolutional recurrent imputation network (DGCRIN) to impute missing traffic data. On the other hand, adversarial learning has been integrated into traffic data imputation. Yuan et al. [32] proposed a generative loss and a center loss to minimize the reconstruction errors of imputed entries and to ensure that each imputed entry and its neighbors conform to the local spatio-temporal distribution.

3.2 Traffic Data Formats

Based on its format, traffic data can be categorized into traffic region data and traffic point data, as shown in Fig. 4. Traffic region data resembles grid data [33]. Traffic point data is recorded at discrete locations such as sensors and subway/bus stations [19].

Fig. 4 An example of traffic data formats

The difference between traffic region data and traffic point data lies in their spatial arrangement. The former is arranged in a regular grid, while the latter is irregular. Therefore, traffic region data can be considered a special form of traffic point data in which the points are regularly distributed in the spatial dimensions. That is, traffic point data is a more general representation than traffic region data.

To sum up, of the two typical kinds of traffic data, traffic region data is the more special type because it is regularly arranged in the temporal and spatial dimensions. Traffic point data is a kind of spatio-temporal data that is irregularly arranged in the spatial dimensions, and it is a more general representation than traffic region data. Therefore, the relationship between traffic region data and traffic point data is one of special to general. Accordingly, this paper adopts the idea of "from special to general" and summarizes the related work on the two kinds of traffic data in that order.
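The special-to-general relation can be illustrated with a short sketch that turns a regular grid of regions into a graph of points. The helper `grid_to_adjacency` and the 4-neighbour connectivity rule are assumptions made for illustration; other neighbourhood definitions (for example 8-neighbour links) are equally possible.

```python
import numpy as np

def grid_to_adjacency(height, width):
    """Return the adjacency matrix of an H x W grid with 4-neighbour links.

    Cell (r, c) becomes node r * width + c, so region (grid) data can be
    handled with the same tools as point (graph) data.
    """
    n = height * width
    A = np.zeros((n, n), dtype=int)
    for r in range(height):
        for c in range(width):
            i = r * width + c
            if c + 1 < width:    # link to the right-hand neighbour
                A[i, i + 1] = A[i + 1, i] = 1
            if r + 1 < height:   # link to the neighbour below
                A[i, i + width] = A[i + width, i] = 1
    return A

A = grid_to_adjacency(2, 3)      # a 2 x 3 grid becomes a 6-node graph
```

The reverse conversion is not generally possible, since an arbitrary set of stations has no natural grid layout; this is the sense in which point data is the more general format.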

3.3 Traffic Data Characteristics

Traffic data can be characterized as multi-source, heterogeneous, multi-modal, and subject to complex spatio-temporal dependence. Researchers primarily utilize traffic big data from various sources, including sensors such as loop detectors deployed in the city, transactions in urban public transportation systems, and smartphones equipped with GPS receivers. The structure of traffic big data is diverse and heterogeneous, involving non-linear spatial data and different traffic patterns. Non-linear spatial data refers to station relations that are irregular: stations can be located anywhere, so their relations resemble a graph. Different traffic patterns indicate that various stations exhibit distinct travel patterns at different times. For instance, residential areas may correlate strongly with business areas in the morning but with shopping malls in the evening, when people return home after dinner. Additionally, traffic big data encompasses various data types, such as text, video, numbers, and symbols [34]. Traffic data is typical spatio-temporal data that changes constantly with time and space, resulting in multi-scale temporal relationships and dynamic, evolving spatial relationships. Temporal correlation mainly refers to the changes in traffic status over time, which give the data a certain degree of proximity, periodicity, and trend. Spatial correlation is manifested as the mutual influence of traffic conditions between the nodes of the urban road network. Besides, the functional semantics of a city can also affect its spatial relationships, so that distant nodes with the same urban function exhibit similar spatial patterns. Spatial relations also exhibit both local and global correlations, for example at the road level and the region level.

In addition, traffic data is also susceptible to external factors such as weather, emergencies, holidays, etc., further increasing the complexity of traffic data. To sum up, how to design an effective traffic model to obtain spatio-temporal correlations after traffic data analysis is the main problem for traffic data mining.

4 Traffic Data Modeling

Traffic data is a kind of spatio-temporal data that contains spatial and temporal dimensions. Thus, we categorize traffic data modeling into three groups: spatial representation, temporal representation, and spatio-temporal representation. Spatial representation generally deals with grid data or graph data. Scholars usually employ convolutional neural networks to learn the features of grid data, and graph neural networks or their variants to learn the complex and dynamic spatial dependencies of graph data [35,36,37,38,39,40,41,42]. Temporal representation treats time as sequence data, for which researchers usually utilize RNN [43], TCN [44], Causal TCN, or their variants [45, 46]. Spatio-temporal representation means that a model captures spatial and temporal features simultaneously, as in STSGCN [47] and STJGCN [48]. Finally, we also discuss research that combines spatio-temporal modeling with other promising methods, such as meta-learning [49], ODE [23], self-supervised learning [28], and continual learning [50].

Table 2 Summary of traffic graphs

Fig. 5 Examples of traffic graphs

4.1 Spatial Representation

In this section, we summarize methods of spatial representation in two parts, namely, spatial structure and spatial model.

4.1.1 Spatial Structure

The spatial structure of traffic region data is regular and takes the form of a grid. For traffic point data, spatial relationships are irregular and change over time, and they are expressed through a graph structure. The spatial information incorporates latent semantic features in addition to inherent geographic features. Thus, many researchers focus on traffic graph construction. We group traffic graphs into static graphs [35,36,37,38,39], virtual graphs [35, 39,40,41,42, 51], hierarchical graphs [53, 54, 57], and dynamic graphs [55, 56, 58,59,60,61], which are discussed in this survey; examples are shown in Fig. 5.

The physical graph, adjacency graph, and neighborhood graph are static graphs that capture the geographic information of stations. Static graphs are constructed from real-world transportation systems, such as road networks and urban public transportation systems. The functional similarity graph (e.g., based on POI), temporal pattern similarity graph (e.g., based on Dynamic Time Warping, DTW), distance graph, origin–destination graph, and heuristic graph are virtual graphs, which mainly mine the similarity of different nodes or the implicit relations between nodes from various aspects. Hierarchical graphs model the natural hierarchical structure of traffic systems and reflect the interaction between micro and macro layers (e.g., road segments and regions). Dynamic graphs are primarily generated from data to tackle the uncertainty of spatial dependence, with node connections evolving over time.

We summarize the definitions of the various traffic graphs in detail and describe how each is constructed in existing works. We also provide references for each type of traffic graph in Table 2.

  1.

    Static Graph: The static graph represents geographic connections. The adjacency matrix \(A^{t}\) can be defined as follows:

    $$\begin{aligned} A^{t}_{ij} = {\left\{ \begin{array}{ll} 1, &{} \text { if } v_{i} \text { and } v_{j} \text { are adjacent,} \\ 0, &{} \text { otherwise. } \end{array}\right. } \end{aligned}$$
    (1)

    where \(A^{t}_{ij}\) is an element of the adjacency matrix at time t, and \(v_{i}\) and \(v_{j}\) are different nodes in the graph. Different works refer to the static graph by different names, such as physical graph [35], neighbourhood graph [36, 38], and adjacency graph [37, 41, 42].

  2.

    Distance Graph: The traffic patterns of adjacent stations can be highly correlated. For example, residents within a region may have similar daily travel patterns. Thus, the distance graph is an important factor for traffic prediction based on the perspective of geography. The distance weight matrix D is calculated using a Gaussian Kernel [62] as follows:

    $$\begin{aligned} D_{ij}=exp \left(-\frac{DS(v_i,v_{j})^2}{\sigma ^{2}}\right) \end{aligned}$$
    (2)

    where \(DS(v_i,v_{j})\) indicates the shortest travel distance between different stations \(v_i\) and \(v_j\), \(\sigma\) is the standard deviation of travel distances. There are different distance graph representation types, such as travel distance graph [37] and distance graph [39, 40].

  3.

    Functional Similarity Graph: Generally, locations with similar functionalities or utilities (e.g., shopping malls, schools, parks, hospitals, etc.), have strong spatio-temporal correlations. Geng et al. [36] defined the functional similarity graph \(A^s\) by using the POI similarity. The formulation is below:

    $$\begin{aligned} A^s_{ij}=Sim(P_{v_i},P_{v_j}) \in [0,1] \end{aligned}$$
    (3)

    where \(P_{v_i}\) and \(P_{v_j}\) are the POI vectors of regions \({v_i}\) and \({v_j}\), respectively, and Sim(.) is a similarity function. Moreover, Shao et al. [39] enhanced the functional similarity graph by using Pearson correlation coefficients [63] to construct the global contextual function similarity graph \(W_{ij}^{F}\):

    $$\begin{aligned} W_{ij}^{F}:={\left\{ \begin{array}{ll} \frac{\sum _{k=1}^{K}\left( f_{i,k}-\bar{F_{i}} \right) \left( f_{j,k}-\bar{F_{j}} \right) }{\sqrt{\sum _{k=1}^{K}\left( f_{i,k}-\bar{F_{i}} \right) ^2}\sqrt{\sum _{k=1}^{K}\left( f_{j,k}-\bar{F_{j}} \right) ^2}}, &{} \text { if } i\ne j, \\ 0, &{} \text { otherwise. } \end{array}\right. } \end{aligned}$$
    (4)

    where K is the total number of functions, the vector of the numbers of global contextual similarity functions of vertex \(v_i\) is denoted as \(F_i = \left\{ f_{i,1},f_{i,2},...,f_{i,k},..., f_{i,K}\right\}\), and \(\bar{F_{i}}\) is the mean of its entries. The function similarity graph is used in [35, 36, 39, 42, 51].

  4.

    Temporal Pattern Similarity Graph: Existing works mainly utilize the DTW algorithm [52] to capture temporal similarities among the traffic time series of different node pairs. The similarity score S(i, j) between stations i and j is as follows:

    $$\begin{aligned} S(i,j)=exp(-DTW(X^i,X^j)) \end{aligned}$$
    (5)

    Moreover, Shao et al. [39] used Pearson correlation coefficients [63] to design temporal similarity matrix \(W^T\), which is as follows:

    $$\begin{aligned} W_{ij}^{T}:={\left\{ \begin{array}{ll} \frac{\sum _{p=1}^{P}\left( t_{i,p}-\bar{T_{i}} \right) \left( t_{j,p}-\bar{T_{j}} \right) }{\sqrt{\sum _{p=1}^{P}\left( t_{i,p}-\bar{T_{i}} \right) ^2}\sqrt{\sum _{p=1}^{P}\left( t_{j,p}-\bar{T_{j}} \right) ^2}}, &{} \text { if } i\ne j, \\ 0, &{} \text { otherwise. } \end{array}\right. } \end{aligned}$$
    (6)
  5.

    Origin–Destination Graph: The origin–destination distribution of ridership is vital for traffic prediction. Thus, Liu et al. [35] defined a correlation ratio matrix C, formulated as follows:

    $$\begin{aligned} C\left( i,j \right) = \frac{OD\left( i,j \right) }{\sum _{k=1}^{N}OD\left( i,k \right) } \end{aligned}$$
    (7)

    where OD(i, j) is the total number of passengers that travel from station j to station i. The origin–destination graph has various names, such as correlation graph [35, 42], transportation connectivity [36], and population flow graph [37].

  6.

    Heuristic Graph: To utilize heuristic knowledge and human insights, Shao et al. [39] defined a new graph model called the heuristic graph. The formulation is as follows:

    $$\begin{aligned} W_{ij}^{H}:={\left\{ \begin{array}{ll} exp\left( -\frac{\left\| d_{ij}^{H}\right\| ^2}{\sigma _{H}^{2}} \right) &{} \text { for } i\ne j, \\ 0, &{} otherwise. \end{array}\right. } \end{aligned}$$
    (8)

    The distribution distance is calculated by the Euclidean distance \(d_{ij}^{H}=\sqrt{\left( \alpha _{1}-\alpha _{2} \right) ^2 + \left( \beta _{1}-\beta _{2} \right) ^2}\). \(\sigma _{H}^{2}\) is a parameter to adjust the distribution of \(W_{ij}^H\).

  7.

    Hierarchical Graph: Guo et al. [53] constructed the interaction between micro and macro layers of GCNs, which integrated the different scales of features of road segments and regions for improving traffic forecasting performance.

  8.

    Dynamic Graph: Xie et al. [55] employed a dynamic graph relationship learning module to learn dynamic spatial relationships between metro stations without a predefined graph adjacency matrix.
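Several of the graph definitions above reduce to a few lines of array code. The following NumPy sketch covers the distance graph (Eq. 2), the DTW-based similarity (Eq. 5), the Pearson-based similarity (Eq. 6), and the origin–destination ratio (Eq. 7). All function names are hypothetical, the DTW routine is the textbook dynamic program with an absolute-difference cost, and taking \(\sigma\) as the standard deviation over all entries of the distance matrix is only one possible reading of the text; none of this is claimed to match the exact implementations in the cited works.

```python
import numpy as np

def distance_graph(dist):
    """Eq. (2): Gaussian kernel over pairwise travel distances.

    Assumption: sigma is the standard deviation of all entries of dist.
    """
    dist = np.asarray(dist, dtype=float)
    sigma = dist.std()
    return np.exp(-(dist ** 2) / sigma ** 2)

def dtw_distance(a, b):
    """Textbook DTW distance with absolute-difference local cost."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

def dtw_similarity(a, b):
    """Eq. (5): S(i, j) = exp(-DTW(X^i, X^j))."""
    return np.exp(-dtw_distance(a, b))

def pearson_graph(series):
    """Eq. (6): Pearson correlation between node series, zero on the diagonal.

    series has shape (num_nodes, P); constant rows would produce NaN.
    """
    W = np.corrcoef(np.asarray(series, dtype=float))
    np.fill_diagonal(W, 0.0)
    return W

def od_ratio_graph(od):
    """Eq. (7): divide each row of the OD matrix by its row sum.

    Rows with no passengers are left as zeros to avoid division by zero.
    """
    od = np.asarray(od, dtype=float)
    row_sum = od.sum(axis=1, keepdims=True)
    return np.divide(od, row_sum, out=np.zeros_like(od), where=row_sum > 0)

# Toy inputs, invented purely for illustration.
dist = np.array([[0.0, 1.0, 2.0], [1.0, 0.0, 1.0], [2.0, 1.0, 0.0]])
series = np.array([[1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0], [4.0, 3.0, 2.0, 1.0]])
D = distance_graph(dist)
W_T = pearson_graph(series)
C = od_ratio_graph([[0, 3, 1], [2, 0, 2], [0, 0, 0]])
S_01 = dtw_similarity(series[0], series[1])
```

The quadratic DTW loop is adequate for a handful of stations; for city-scale networks a windowed or approximate DTW would usually be preferred.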

4.1.2 Spatial Model

We summarize the existing spatial representation methods and divide them into five categories: Grid-based methods, GNN-based methods, Attention-based methods, Multi-graph fusion methods, and Graph learning-based methods. We describe the details of the spatial modeling methods for the above five categories.

  1.

    Grid-Based Methods: Traffic region data and image-like data have similar data structures, and a region can be regarded as a pixel in an image. Therefore, Convolutional Neural Networks (CNNs), which have made breakthroughs in the field of image processing, have also been applied to representation learning of traffic region data; these are the grid-based spatial representation methods. For example, Zhang et al. [64] proposed the ST-ResNet model, based on residual convolution units, for crowd flow forecasting. Guo et al. [65] developed a spatio-temporal three-dimensional convolutional neural network (ST-3DNet) to extract spatio-temporal features from traffic grid data. Some scholars combined CNN and LSTM into a convolutional long short-term memory network (ConvLSTM) [66] to process spatio-temporal grid data. In general, these methods can learn effective feature representations by capturing temporal and spatial correlations in traffic grid data, owing to the powerful expressive ability of CNNs.

  2.

    GNN-Based Methods: Traffic point data has an irregular structure in the spatial dimension, so traditional CNN models cannot be directly used for representation learning of traffic graph data. Recently, owing to the strong ability of graph convolutional networks to represent graph structure features, many GNN-based methods for traffic prediction have been proposed. GNNs leverage a neighborhood aggregation strategy to sample and aggregate the features of neighboring nodes [67,68,69]. DCRNN [43] viewed traffic flow patterns as a diffusion process and utilized bidirectional random walks to learn spatial dependency on a directed graph. This was the first attempt to combine GNNs with RNNs for traffic forecasting. STGCN [44] integrated GNNs and CNNs to capture spatio-temporal correlation for traffic forecasting and is a representative work among GNN- and CNN-based traffic forecasting methods.

  3.

    Attention-Based Methods: After the GNN-based works, some researchers employ attention mechanisms to capture complex and dynamic spatial and temporal correlation. For instance, GMAN [24] leveraged attention mechanisms to capture spatial and temporal relationships for traffic forecasting. ASTGCN [70] employed spatial attention, temporal attention, and graph convolution network to extract dynamic spatio-temporal dependencies for traffic prediction.

  4.

    Multi-Graph Fusion Methods: As mentioned in Sect. 4.1.1, scholars have designed a variety of traffic graphs to learn different spatial features. How to integrate multiple traffic graphs is crucial for accurate traffic prediction. In this survey, we classify multi-graph fusion methods into three groups: static and virtual graph fusion methods; static and dynamic graph fusion methods; and static, virtual, and dynamic graph fusion methods. For the first class, there are many static and virtual graph fusion methods [35,36,37,38, 40]. Geng et al. [36] designed a neighborhood graph, a functional similarity graph, and a transportation connectivity graph to explicitly model complex spatial relations and leveraged a contextual gated RNN to learn temporal relations. For the second class, Li et al. [56] fused a static graph with a dynamically generated graph to model complex and dynamic spatial features and used an RNN model to capture temporal information for traffic forecasting. For the third class, Shao et al. [39] constructed a static graph and four virtual graphs, then fused the five graphs into a dynamically learned graph to capture spatial features.

  5.

    Graph Learning Methods: The graph structure changes over time. Thus, many graph learning works have emerged. We divide graph learning methods into two groups: adaptive graph learning methods and graph generation methods. Adaptive graph learning methods adaptively learn graph node features and spatial dependencies. For example, AGCRN [71] developed two adaptive modules to automatically learn node features and spatial dependencies. Wu et al. [72] designed an adaptive matrix to extract hidden spatial dependencies for traffic forecasting, which opened up the graph learning research direction and had a significant impact on the traffic forecasting domain. Graph generation methods dynamically generate traffic graphs at different times to capture dynamic spatial correlation. For instance, Wu et al. [45] employed node embedding and edge relation learning to represent dynamic spatial dependencies. Ye et al. [46] designed a graph learning method based on multi-scale temporal dependency to learn the evolution of spatial relations, and the proposed model could reduce the number of graph learning parameters. Zhang et al. [73] utilized static structure learning to generate a shared spatial structure and employed dynamic structure learning to model the unique structure of each node.
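Two ideas from the list above, neighbourhood aggregation in GNN-based methods and a learned adjacency in adaptive graph learning methods, can be sketched in NumPy as follows. The symmetric normalisation follows the common GCN formulation, and the adaptive matrix is built as a row-wise softmax(ReLU(E1 E2^T)) from two node-embedding matrices, a form used in Graph WaveNet-style models. The surveyed text does not spell out either formula, so both should be read as assumptions; all names are hypothetical and the weights are random rather than trained.

```python
import numpy as np

def normalize_adjacency(A):
    """Symmetric normalisation D^{-1/2} (A + I) D^{-1/2} with self-loops."""
    A_hat = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    return A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_layer(A_norm, H, W):
    """One aggregation step: ReLU(A_norm @ H @ W)."""
    return np.maximum(A_norm @ H @ W, 0.0)

def adaptive_adjacency(E1, E2):
    """Row-wise softmax(ReLU(E1 @ E2.T)) built from node embeddings."""
    scores = np.maximum(E1 @ E2.T, 0.0)
    scores = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    exp = np.exp(scores)
    return exp / exp.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)  # 3-node path graph
H = rng.normal(size=(3, 4))       # node features: 3 nodes, 4 channels
W = rng.normal(size=(4, 2))       # layer weights (random, untrained)
H_next = gcn_layer(normalize_adjacency(A), H, W)

E1 = rng.normal(size=(3, 5))      # source-node embeddings (would be learned)
E2 = rng.normal(size=(3, 5))      # target-node embeddings (would be learned)
A_adp = adaptive_adjacency(E1, E2)
```

In a trained model the embeddings and weights would be optimised jointly with the forecasting loss, so the learned adjacency reflects dependencies that a predefined graph may miss.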

Fig. 6 A timeline of important research on spatial models for traffic prediction

As shown in Fig. 6, scholars first divided the city into grids and utilized deep learning-based approaches to learn grid-based spatial correlation [33]. However, these grid-based deep learning methods cannot deal with non-Euclidean spatial dependencies. Thus, GNN-based studies emerged to capture complex graph structure features, such as DCRNN [43], which combines GNN and RNN, and STGCN [44], which integrates GNN and CNN for traffic prediction. Afterwards, some researchers developed attention-based methods to model spatio-temporal relations. The above works mainly concentrate on a single or static graph and cannot represent complex and implicit spatial features. Therefore, multi-graph fusion methods construct various graph structures (e.g., Virtual Graph [35], Hierarchical Graph [53]) and fuse them to learn spatial dependencies. Traffic patterns differ across stations/nodes, and spatio-temporal dependencies are dynamic and evolve over time. Graph learning methods are therefore applied to achieve accurate traffic prediction through node embedding and edge relation learning [45].

4.2 Temporal Representation

In this section, we review methods of temporal representation in two parts, namely, temporal feature and temporal model.

Fig. 7 Examples of temporal features

4.2.1 Temporal Feature

Temporal features are categorized into closeness, periodicity, trend, daily, weekly, and holiday. Closeness means that traffic does not change dramatically between neighboring time steps. As shown in Fig. 7a, the traffic trend of adjacent days changes only slightly and is relatively stable. Periodicity means that traffic patterns fluctuate periodically at different time scales. As shown in Fig. 7b, the traffic patterns of each weekday are similar and periodic. Trend means that traffic patterns follow some fluctuation rule, such as an increase during the morning peak. As shown in Fig. 7c, traffic flow shows a growth trend on each Monday and a downward trend on each Saturday. The daily feature means that one day’s traffic pattern may be similar to the next day’s, which can be observed in Fig. 7d. The weekly feature means that traffic patterns may be similar among weekdays and among weekends, as illustrated in Fig. 7e. The traffic trend can change significantly on different holidays, as depicted in Fig. 7f.
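One common way to expose these temporal features to a model is to slice the history into a recent segment, a daily segment, and a weekly segment. The sketch below assumes 5-minute sampling (288 steps per day) and uses the hypothetical helper `temporal_segments`; the segment lengths are illustrative only and do not come from the surveyed works.

```python
import numpy as np

STEPS_PER_DAY = 288                  # assumes 5-minute sampling
STEPS_PER_WEEK = 7 * STEPS_PER_DAY

def temporal_segments(series, t, n_recent=3, n_daily=2, n_weekly=1):
    """Collect model inputs for predicting series[t].

    recent: the n_recent steps immediately before t (closeness)
    daily:  the same time of day on the previous n_daily days
    weekly: the same time of week in the previous n_weekly weeks
    """
    earliest = min(t - n_recent, t - n_daily * STEPS_PER_DAY, t - n_weekly * STEPS_PER_WEEK)
    if earliest < 0:
        raise ValueError("not enough history before index t")
    recent = series[t - n_recent:t]
    daily = series[[t - k * STEPS_PER_DAY for k in range(n_daily, 0, -1)]]
    weekly = series[[t - k * STEPS_PER_WEEK for k in range(n_weekly, 0, -1)]]
    return recent, daily, weekly

series = np.arange(3 * STEPS_PER_WEEK)   # toy series whose value equals its index
t = 2 * STEPS_PER_WEEK + 100
recent, daily, weekly = temporal_segments(series, t)
```

Holiday effects are usually handled differently, for example as an external indicator feature, since they do not recur at a fixed offset.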

4.2.2 Temporal Model

Traffic data is a typical kind of time series data, but it differs from general time series data in two respects. 1) Traffic data possesses not only temporal attributes such as closeness, periodicity, and trend but also intricate spatial relationships that are dynamic, heterogeneous, and hierarchical. Traffic prediction models therefore need to learn spatial and temporal information simultaneously. Certain multivariate time series prediction techniques that take spatial representation into account can be applied to traffic prediction. For instance, MTGNN [45] and ESG [46] have shown impressive results on specific traffic datasets. However, these studies primarily focus on learning dynamic spatial features while neglecting other critical spatial attributes of traffic data, such as heterogeneity and hierarchy, so specialized models are needed to represent spatial heterogeneity and hierarchy accurately. 2) In some domains, such as electricity, spatial modeling is not required for time series prediction tasks; in fact, including spatial representation can even degrade the model’s performance [74, 75]. Models designed for such domains are unsuitable for traffic forecasting because they do not consider complex spatial modeling. In this section, we group the temporal representation methods into RNN-based, TCN-based, Causal TCN-based, and Self-attention-based methods.

  1.

    RNN-Based Methods: RNN-based methods are traditional methods that treat the temporal dimension as sequence data. Early on, RNNs and their variants were widely used to represent temporal features. Ye et al. [38] employed three long short-term memory network (LSTM)-based modules to learn three temporal properties (e.g., closeness, daily periodicity, weekly trend) of target stations. Zhang et al. [33] designed three residual networks to learn the closeness, period, and trend properties for traffic forecasting. Liu et al. [35] leveraged GRU to capture temporal features. Yao et al. [66] learned the spatial relations and the temporal correlations with CNN and LSTM [76] modules, respectively. Zhao et al. [77] leveraged GRU to learn dynamic temporal dependence for traffic forecasting. RNNs have thus been used extensively for temporal learning in traffic prediction research. Nonetheless, these models have a notable drawback: the recurrent structure must process the sequence one time step at a time, which increases computational cost and reduces model efficiency. Compared to RNNs and their variants, Temporal Convolutional Networks (TCN) can effectively tackle this issue with their parallel 1D-CNN structures.

  2.

    TCN-Based Methods: TCN-based methods employ parallel 1D-CNN structures to capture temporal features. Zhao et al. [77] proposed a temporal graph convolutional network for temporal representation, which combined GNN with gated RNN for traffic forecasting. STGCN [44] designed gated temporal convolutions to learn significant temporal features. Although TCN is an efficient parallel neural architecture for time series learning, it does not consider the temporal order of spatio-temporal graph data or multi-scale temporal features. In contrast, Causal TCN-based methods can explicitly model the causal relation of temporal data and learn multi-scale temporal dependencies.

  3.

    Causal TCN-Based Methods: Causal TCN-based methods mainly integrate convolution operators with different kernel sizes to obtain more extensive temporal features. Wu et al. [72] applied the dilated causal convolution to model temporal features of nodes. Wu et al. [45] modified 1D convolutions with different convolutional kernels to capture temporal patterns of multiple frequencies. Ye et al. [46] employed the dilated convolution to obtain multi-scale temporal features. While Causal TCN-based methods effectively capture the causal and multi-scale relations of time series, they fall short of adequately representing long-range temporal dependencies.

  4.

    Self-Attention-Based Methods: Self-attention-based methods are designed for long time series prediction and are popular for long-range temporal representation. The typical model is the Transformer [78]. Grigsby et al. [79] employed the Transformer to learn longer temporal information. STDGRL [55] leveraged the Transformer to model long-range temporal dependencies.
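To make the dilated causal convolution used by Causal TCN-based methods more concrete, the following NumPy sketch implements a single-channel version. It is a simplified illustration rather than the layer of any cited model; the helper names `dilated_causal_conv1d` and `receptive_field` and the fixed kernel `[1, 1]` are assumptions chosen for readability.

```python
import numpy as np


def dilated_causal_conv1d(x, kernel, dilation):
    """y[t] = sum_k kernel[k] * x[t - k * dilation], with zeros before t = 0."""
    x = np.asarray(x, dtype=float)
    pad = (len(kernel) - 1) * dilation
    padded = np.concatenate([np.zeros(pad), x])  # left padding keeps the layer causal
    y = np.zeros_like(x)
    for k, w in enumerate(kernel):
        start = pad - k * dilation
        y += w * padded[start:start + len(x)]
    return y


def receptive_field(kernel_size, dilations):
    """Number of input steps visible to one output step after stacking the layers."""
    return 1 + (kernel_size - 1) * sum(dilations)


# Push a unit impulse through three stacked layers with dilations 1, 2, 4.
h = np.zeros(16)
h[5] = 1.0
for d in (1, 2, 4):
    h = dilated_causal_conv1d(h, kernel=[1.0, 1.0], dilation=d)
```

The impulse never influences outputs before its own time step, and doubling the dilation per layer grows the receptive field exponentially with depth, which is why such stacks capture multi-scale patterns with few layers.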

In conclusion, RNN-based methods [33, 35] can obtain sequence features of the temporal dimension. However, at each time step they rely on the output aggregated from the previous time step, which makes the computation difficult to parallelize. Thus, TCN-based temporal representation methods [44, 77] are employed to learn temporal features with parallel 1D-CNN structures. To account for causality and multi-scale attributes, causal TCN-based methods [45, 46, 72] are proposed to obtain a broader receptive field and multi-scale temporal features. Furthermore, self-attention-based methods such as the Transformer [78] are designed to capture long-range temporal representation.
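The reason self-attention handles long-range dependencies well is that every time step attends to every other time step in a single layer. The sketch below shows single-head scaled dot-product self-attention over the time axis in NumPy. The random projection matrices stand in for learned parameters, and all names and dimensions are illustrative assumptions.

```python
import numpy as np


def temporal_self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention; x has shape (T, d_in)."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])        # (T, T) pairwise similarities
    scores -= scores.max(axis=-1, keepdims=True)   # stabilise the softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights


rng = np.random.default_rng(0)
T, d_in, d_k = 12, 4, 8
x = rng.normal(size=(T, d_in))
w_q, w_k, w_v = (rng.normal(size=(d_in, d_k)) for _ in range(3))
out, attn = temporal_self_attention(x, w_q, w_k, w_v)
```

Unlike a recurrent update, all rows of `attn` are computed at once, at the price of a cost that grows quadratically with the sequence length `T`.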

4.3 Spatio-Temporal Representation

We analyze two types of spatio-temporal representation methods: methods that model spatial and temporal dependencies simultaneously, and methods that combine spatio-temporal modeling with other advanced technologies. To capture the spatio-temporal dependencies simultaneously, Song et al. [47] developed a spatio-temporal GNN model to learn localized and heterogeneous spatio-temporal dependencies synchronously. For the combination of spatio-temporal modeling with other advanced algorithms, we divide the methods into four classes: Meta-learning-based methods, ODE Network-based methods, Self-supervised-based methods, and Continual learning-based methods.

  1.

    Meta-Learning Based Methods: To extract the traffic meta-knowledge and learn the diversity of spatio-temporal correlations, some scholars employ Meta-learning for traffic prediction. For instance, Fang et al. [49] designed two meta-learning-based modules to fuse multi-source external data with temporal and spatial features. ST-MetaNet [80] utilized a meta graph attention network to learn complex spatial correlations.

  2.

    Ordinary Differential Equation (ODE) Network-based Methods: Stacking more graph convolutional layers should in principle enlarge the receptive field, yet in practice the performance of graph convolutional networks degrades as the depth increases. Inspired by the neural ODE network [81], Fang et al. [23] designed a tensor-based ordinary differential equation network to learn spatio-temporal relationships dynamically. To combine continuous modeling and the neural ODE network, Jin et al. [82] integrated a neural ODE network with GNN for traffic forecasting.

  3.

    Self-Supervised Based Methods: To solve data scarcity and heterogeneity issues, some researchers consider using self-supervised learning for traffic prediction. For example, Liu et al. [28] integrated contrastive learning with spatio-temporal GNN networks to improve the accuracy and robustness of traffic forecasting. It had a strong generalization ability and could be a plug-in for many existing spatio-temporal methods. SPGCL [83] maximized the distinctive gap between positive and negative samples and constructed an optimal graph with a self-paced strategy for traffic forecasting. Ji et al. [84] proposed a spatio-temporal contrastive learning framework, which could better represent traffic patterns and capture traffic data heterogeneity features.

  4.

    Continual Learning Based Methods: Continual learning, also called life-long learning or incremental learning, trains the model sequentially on streaming data that arrives incrementally from different tasks. Chen et al. [50] introduced continual learning to obtain efficient updates and effective prediction for streaming traffic data.

In summary, it is crucial to mine the characteristics of spatio-temporal data, discover traffic patterns from traffic data, and leverage suitable and promising technologies (e.g., ODE, contrastive learning, continual learning) to build an effective traffic prediction framework.
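To illustrate the idea behind ODE network-based methods, the following sketch replaces a stack of discrete graph convolution layers with a continuous propagation dH/dt = (Â − I)H + H0, integrated by explicit Euler steps. This is a deliberately simplified, assumed formulation for illustration only; it is not the tensor-based model of [23], and the restart term H0, the step count, and the function names are our own choices. The adjacency matrix is assumed to be symmetric.

```python
import numpy as np


def normalized_adjacency(adj):
    """Symmetric normalisation D^{-1/2} (A + I) D^{-1/2} of a symmetric adjacency."""
    a = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a.sum(axis=1))
    return a * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def graph_ode_euler(adj, h0, t_end=1.0, steps=20):
    """Integrate dH/dt = (A_hat - I) H + H0 from t = 0 to t_end with Euler steps."""
    drift = normalized_adjacency(adj) - np.eye(adj.shape[0])
    h = h0.astype(float).copy()
    dt = t_end / steps
    for _ in range(steps):
        h = h + dt * (drift @ h + h0)
    return h


# Ring graph with four nodes and two features per node.
ring = np.array([[0, 1, 0, 1],
                 [1, 0, 1, 0],
                 [0, 1, 0, 1],
                 [1, 0, 1, 0]], dtype=float)
h0 = np.arange(8, dtype=float).reshape(4, 2)
h_final = graph_ode_euler(ring, h0)
```

Here the integration time `t_end` plays the role of network depth, so the effective depth can be varied without adding parameters.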

Table 3 The typical work for traffic forecasting

5 Traffic Forecasting Applications

Traffic prediction has various applications. Here, we summarize the main applications of existing traffic prediction work, including traffic flow forecasting, traffic speed forecasting, traffic demand forecasting, and other hybrid traffic forecasting. We also organize and analyze typical studies on traffic forecasting in Table 3.

5.1 Traffic Flow Forecasting

Traffic flow typically refers to the number of vehicles, crowds, or passengers passing through a particular space, such as a road segment, a sensor point deployed on the road, or a bus/subway station, in an observed time interval. Accurate traffic flow prediction can help reveal real-time traffic demands and is critical for traffic management, public safety, route planning, line scheduling, and staff preallocation.

For the spatial representation of traffic flow forecasting, researchers usually utilize grid-based [33], multi-graph [35], and dynamic graph methods [55]. For instance, Zhang et al. [33] partitioned cities into regular grid maps based on geographical coordinates and organized the collected traffic data as Euclidean 2D or 3D tensors, so that CNNs can be applied to extract spatial topologies. Du et al. [85] designed a hybrid multi-modal learning method to jointly learn spatio-temporal dependencies for short-term traffic flow forecasting.
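The grid-based organization can be sketched as follows: GPS points observed in one time interval are binned into a regular grid over a bounding box, which yields one 2D slice of the tensor that a CNN would consume. The bounding box, the grid size, and the function name `to_grid_counts` are hypothetical values chosen for illustration and do not reproduce the preprocessing of [33].

```python
def to_grid_counts(points, lat_range, lon_range, n_rows, n_cols):
    """Count (lat, lon) points per grid cell; row 0 is the southernmost band."""
    lat_min, lat_max = lat_range
    lon_min, lon_max = lon_range
    grid = [[0] * n_cols for _ in range(n_rows)]
    for lat, lon in points:
        if not (lat_min <= lat < lat_max and lon_min <= lon < lon_max):
            continue  # ignore points outside the study area
        row = int((lat - lat_min) / (lat_max - lat_min) * n_rows)
        col = int((lon - lon_min) / (lon_max - lon_min) * n_cols)
        # Guard against floating-point rounding at the upper boundary.
        grid[min(row, n_rows - 1)][min(col, n_cols - 1)] += 1
    return grid


# Five observations in one interval; the last lies outside the bounding box.
observations = [(39.85, 116.25), (39.95, 116.35), (39.95, 116.25),
                (39.96, 116.26), (41.00, 116.30)]
counts = to_grid_counts(observations, (39.8, 40.0), (116.2, 116.4), n_rows=2, n_cols=2)
```

Stacking such slices over consecutive intervals, and over channels such as inflow and outflow, gives the 3D tensors mentioned above.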

These grid-based methods are suitable for predicting region-level traffic data but cannot model non-Euclidean graph structures. Therefore, several multi-graph fusion methods have been proposed to learn physical and semantic spatial information. For example, PVCGN [35] could effectively capture complex ridership correlations from tailored traffic graphs. Specifically, a physical graph was constructed directly from the real-world station topology. In addition, a similarity graph and a correlation graph were designed as complementary graphs to reveal the similarity and correlation of inter-station passenger flow. The model incorporated the physical graph and the two virtual graphs into a graph convolution gated recurrent unit (GC-GRU) to learn the spatio-temporal relation and applied a fully connected gated recurrent unit (FC-GRU) to model the global evolution information. Finally, a Seq2Seq model with GC-GRU and FC-GRU was employed for metro ridership forecasting.
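The contrast between a physical graph and a data-driven virtual graph can be sketched as below. The physical graph comes from a station edge list, while the virtual graph links each station to the stations whose historical flow is most correlated with its own. Using Pearson correlation with a top-k rule is an assumption made for this illustration and is not the construction used in [35]; the function names are likewise hypothetical, and `top_k` is assumed to be smaller than the number of stations.

```python
import numpy as np


def physical_graph(n_stations, edges):
    """Undirected 0/1 adjacency built from the real station connections."""
    a = np.zeros((n_stations, n_stations))
    for i, j in edges:
        a[i, j] = a[j, i] = 1.0
    return a


def correlation_graph(flows, top_k):
    """Link each station to its top_k most correlated stations.

    flows has shape (n_stations, n_timesteps); edge weights are the correlations.
    """
    corr = np.corrcoef(flows)
    np.fill_diagonal(corr, -np.inf)          # exclude self-links from the ranking
    a = np.zeros_like(corr)
    for i in range(corr.shape[0]):
        neighbours = np.argsort(corr[i])[-top_k:]
        a[i, neighbours] = corr[i, neighbours]
    return a


# Four stations on a line; stations 0/2 and 1/3 share the same flow shape.
t = np.arange(48)
base_a = np.sin(2 * np.pi * t / 24)
base_b = np.cos(2 * np.pi * t / 24)
flows = np.stack([base_a, base_b, 2 * base_a + 1, 3 * base_b])
physical = physical_graph(4, [(0, 1), (1, 2), (2, 3)])
virtual = correlation_graph(flows, top_k=1)
```

In this toy case the virtual graph connects stations 0 and 2, which are not physically adjacent, showing how complementary graphs can expose relations that the topology alone misses.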

Multi-graph fusion methods of this kind rely on predefined spatial dependencies that are based on prior knowledge. However, the spatial relationships in traffic data constantly evolve across time steps. Therefore, Xie et al. [55] designed a dynamic graph convolutional network to learn spatial features, leveraged a Transformer to obtain long-range temporal information, and employed gated fusion to combine spatial features and temporal dependencies for urban subway station passenger flow forecasting.
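Gated fusion of spatial and temporal representations is commonly written as z = σ(H_s W_s + H_t W_t + b) followed by H = z ⊙ H_s + (1 − z) ⊙ H_t. The sketch below implements this common form in NumPy. Whether [55] uses exactly this parameterization is not stated here, so the formula, the names, and the random weights should be read as assumptions.

```python
import numpy as np


def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))


def gated_fusion(h_s, h_t, w_s, w_t, b):
    """Blend spatial and temporal features with an element-wise learned gate."""
    z = sigmoid(h_s @ w_s + h_t @ w_t + b)
    return z * h_s + (1.0 - z) * h_t, z


rng = np.random.default_rng(0)
n_nodes, d = 5, 4
h_s = rng.normal(size=(n_nodes, d))   # spatial features per node
h_t = rng.normal(size=(n_nodes, d))   # temporal features per node
w_s = rng.normal(size=(d, d))         # stand-ins for learned weights
w_t = rng.normal(size=(d, d))
b = np.zeros(d)
fused, gate = gated_fusion(h_s, h_t, w_s, w_t, b)
```

Because the gate lies between 0 and 1, each fused entry is a convex combination of the corresponding spatial and temporal entries.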

The typical methods for the temporal representation of traffic flow forecasting are RNN and its variants for short-term traffic flow prediction [35, 85, 86] and self-attention-based methods for long-term traffic flow prediction [55]. Li et al. [86] utilized a GNN and a residual LSTM for traffic flow prediction. First, they calculated the correlation coefficient with min-max normalization to remove spatial heterogeneity. Then, they employed a z-score transformation to eliminate daily periodicity and obtain stronger temporal auto-correlation.
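One plausible reading of these two preprocessing steps is sketched below: min-max scaling puts sensors with different magnitudes on a common range, and a z-score computed separately for each time-of-day slot removes the recurring daily profile. The exact procedure of [86] is not described here, so the slot-wise z-score, the function names, and the synthetic data are assumptions.

```python
import numpy as np


def min_max(x):
    """Scale a series to [0, 1]; assumes the series is not constant."""
    return (x - x.min()) / (x.max() - x.min())


def zscore_by_slot(flow, steps_per_day):
    """Standardise each time-of-day slot across days.

    flow is a 1D array whose length is a multiple of steps_per_day.
    """
    days = flow.reshape(-1, steps_per_day)
    mean = days.mean(axis=0)
    std = days.std(axis=0)
    std = np.where(std == 0, 1.0, std)    # avoid division by zero
    return ((days - mean) / std).reshape(-1)


# Two weeks of hourly flow: a fixed daily profile plus noise.
rng = np.random.default_rng(0)
steps_per_day, n_days = 24, 14
profile = 100 + 50 * np.sin(2 * np.pi * np.arange(steps_per_day) / steps_per_day)
flow = np.tile(profile, n_days) + rng.normal(scale=5.0, size=steps_per_day * n_days)
scaled = min_max(flow)
residual = zscore_by_slot(flow, steps_per_day)
```

After the slot-wise transformation every hour of the day has zero mean and unit variance, so what remains is the deviation from the typical daily pattern.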

In addition, some researchers regard traffic flow prediction as a service-level/line-level task [88, 89]. For example, Luo et al. [89] proposed a spatio-temporal hashing multi-graph convolution network, in which two types of sub-graphs were constructed from the perspectives of physical adjacency and semantic similarity, respectively. This model explicitly captures spatio-temporal dependencies among bus stations/lines. Luo et al. [88] designed the MDL-SPFP model to jointly predict the arriving bus service flow, the line-level on-board passenger flow, and the line-level boarding/alighting passenger flow. The model combines three modules, namely an attention mechanism, a residual block, and a multi-scale convolution, to capture various complex non-linear spatio-temporal dependencies.

Below, we discuss the differences among passenger flow forecasting tasks at the station, line, and bus service levels. Station-level forecasting predicts the passenger flow of each station without distinguishing between lines. Line-level forecasting distinguishes the passenger flow of different stations on different lines and is therefore more fine-grained. Bus service-level forecasting further distinguishes the passenger flow of different stations, lines, and vehicles. Therefore, ridership prediction at the line/bus service level incorporates additional prior knowledge, such as specific line and vehicle information. Moreover, its spatio-temporal modeling needs to consider intermediate hubs and their influence.

5.2 Traffic Speed Forecasting

Traffic speed is generally defined as the average speed of vehicles passing through a certain location, such as a sensor point, in the observed time interval. Since this task mainly concerns vehicles, researchers develop spatio-temporal modeling methods to predict vehicle speed, which is beneficial for travel planning.

For the spatial representation of traffic speed forecasting, sensors installed on highways are irregularly distributed, and different sensors are connected through a graph structure. Thus, scholars generally use graph neural networks to capture the spatial correlation for traffic speed prediction. For the temporal representation of traffic speed forecasting, TCN-based [44] and causal TCN-based methods [45, 46] are proposed to capture temporal features with different receptive fields. For instance, Li et al. [43] proposed a bidirectional diffusion convolution framework to model spatial dependency and employed a sequence-to-sequence architecture with GRU to extract temporal dependency for traffic speed forecasting. This is a pioneering work on traffic speed prediction based on graph neural networks. Afterward, researchers established different models based on the distinct characteristics of traffic speed data. To learn spatio-temporal relations synchronously, STSGCN [47] constructed a spatio-temporal synchronous modeling mechanism to learn localized spatio-temporal correlations and designed multiple modules for different periods to model spatio-temporal heterogeneity for traffic speed forecasting.
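The bidirectional diffusion convolution can be sketched as a truncated sum over random-walk steps in both edge directions: sum over k of θ_fwd[k] (D_O⁻¹W)^k X + θ_bwd[k] (D_I⁻¹Wᵀ)^k X. The NumPy code below uses one scalar coefficient per step and direction, which is a simplification of the learned filters in [43]; the function and argument names are illustrative assumptions.

```python
import numpy as np


def diffusion_conv(x, w, theta_fwd, theta_bwd):
    """Truncated bidirectional diffusion over a weighted directed graph.

    x: node signals of shape (n_nodes, n_features)
    w: weighted adjacency matrix, w[i, j] is the weight of edge i -> j
    theta_fwd, theta_bwd: K coefficients for steps 0 .. K-1 in each direction
    """
    out_deg = w.sum(axis=1, keepdims=True)
    in_deg = w.sum(axis=0, keepdims=True).T
    p_fwd = w / np.where(out_deg == 0, 1.0, out_deg)    # D_O^{-1} W
    p_bwd = w.T / np.where(in_deg == 0, 1.0, in_deg)    # D_I^{-1} W^T
    x_fwd = x.astype(float)
    x_bwd = x.astype(float)
    out = np.zeros_like(x_fwd)
    for k in range(len(theta_fwd)):
        out += theta_fwd[k] * x_fwd + theta_bwd[k] * x_bwd
        x_fwd = p_fwd @ x_fwd
        x_bwd = p_bwd @ x_bwd
    return out


# Directed path 0 -> 1 -> 2 with a unit signal on node 0.
w = np.array([[0, 1, 0],
              [0, 0, 1],
              [0, 0, 0]], dtype=float)
x = np.array([[1.0], [0.0], [0.0]])
downstream = diffusion_conv(x, w, theta_fwd=[0, 0], theta_bwd=[0, 1])
upstream = diffusion_conv(x, w, theta_fwd=[0, 1], theta_bwd=[0, 0])
```

On this toy path the two directions give different results, which is the motivation for treating upstream and downstream influence separately on directed road networks.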

Because shallow GNNs are incapable of capturing long-range spatial correlations, Fang et al. [23] developed a tensor-based ordinary differential equation (ODE) network to model spatio-temporal dependencies for traffic speed forecasting. This work applied deeper networks to learn spatio-temporal features synchronously, constructed a semantic adjacency matrix to obtain spatial features, and designed a temporal dilated convolution structure to extract long-term temporal dependencies. To obtain dynamic spatio-temporal dependencies, Lu et al. [90] combined the graph sequence neural network with a horizontal attention mechanism and a vertical attention mechanism to process graph sequences for traffic speed prediction.

5.3 Traffic Demand Forecasting

Traffic demand is the number of passengers with pick-up or drop-off demands, such as ride-sharing, taxi, or bike sharing, for a particular region in the observed time interval. Accurate traffic demand prediction can help to guide an efficient disposition of supplies.

For the spatial representation of traffic demand forecasting, scholars usually employ grid-based [85], multi-graph fusion [36], and dynamic graph methods [58]. For the temporal representation of traffic demand forecasting, the general methods are RNN and its variants [36, 91]. In early work, Du et al. [91] designed a dynamic transition CNN to obtain spatial distributions and learn dynamic demand evolution for traffic demand forecasting. This grid-based traffic demand prediction method utilizes deep learning to model the spatial and temporal correlation. Then, Geng et al. [36] constructed a neighborhood graph, a functional similarity graph, and a transportation connectivity graph to learn non-Euclidean spatial dependencies. This work builds on a grid-based spatial structure. The neighborhood graph was designed based on spatial proximity. The functional similarity graph was built from point-of-interest similarity vectors. The transportation connectivity graph was constructed based on connections through motorways, highways, or public transportation. The model utilized GNN to learn the features of the three graphs, fused the outputs, and applied a contextual gated RNN to model temporal features for ride-hailing demand forecasting. This study employs various predefined geographic adjacency and functional graphs to represent complex spatial semantic information, but it ignores dynamic spatial and temporal modeling. Therefore, Huang et al. [58] developed a dynamic spatial-temporal GNN model for the traffic demand prediction task.

5.4 Other Hybrid Traffic Forecasting

Besides the above three classes of traffic forecasting tasks, we summarize the other hybrid traffic forecasting tasks, such as traffic accident forecasting and delay time forecasting.

  1.

    Traffic Accident Forecasting: The target of traffic accident forecasting is to predict the number of traffic accidents from historical time series. Wang et al. [51] designed a model to learn spatio-temporal dependencies from geographical and semantic perspectives for traffic accident forecasting. Zhou et al. [92] constructed a unified framework that addresses the sparsity of urban accidents at finer granularities and over multiple steps for traffic accident forecasting.

  2.

    Delay Time Forecasting: The delay time of certain lines in the railway system is vital for urban public transportation management. Heglund et al. [93] utilized the sub-graph idea and constructed a graph-based formulation of a subset of the British railway network to predict the cascading delay time of the railway.

6 Public Datasets

To facilitate the participation and contribution of other researchers, we collect and organize a variety of relevant open datasets for traffic forecasting tasks. These datasets are described in detail below and listed in Table 4.

  1.

    TaxiBJ [33]: This dataset contains the taxi inflow and outflow generated from the GPS data of more than 34,000 taxicabs in Beijing over four periods: Jul 1st, 2013 to Oct 30th, 2013; Mar 1st, 2014 to Jun 30th, 2014; Mar 1st, 2015 to Jun 30th, 2015; and Nov 1st, 2015 to Apr 10th, 2016. This grid-based dataset has 32 \(\times\) 32 grids, and the taxi flow data are counted every 30 min.

  2.

    SHMetro [35]: This dataset records 811.8 million transaction records of the metro system in Shanghai. It contains 288 metro stations and 958 physical edges. The time is from Jul 1st, 2016, to Sep 30th, 2016. The time interval of each station’s inflow and outflow is 15 min.

  3.

    HZMetro [35]: This dataset records 58.75 million transaction records of the metro system in Hangzhou. It contains 80 stations and 958 physical edges. The time slice size of each station’s inflow and outflow is 15 min. The time is from January 1 to January 25, 2019.

  4.

    SHSpeed [94]: This dataset contains 10-minute traffic speed data derived from taxi trajectory data from 1 April 2015 to 30 April 2015. It includes 156 urban road segments in the central area of Shanghai.

  5.

    METR-LA [43]: This dataset contains traffic speed or volume derived from the highway network of Los Angeles County. It involves 207 loop detectors. The time slice is 5 min. The time is from Mar 1st, 2012 to Jun 30th, 2012.

  6.

    Performance Measurement System (PeMS): This dataset covers over 18,000 vehicle detector stations on the freeway system across all major metropolitan areas of California. The time is from 2001 to 2019. The samples are collected every 30 s and aggregated to 5 min. Each record includes a timestamp, station ID, district, freeway, direction of travel, total flow, and average speed.

  7.

    Bike-NYC [91]: This dataset is derived from the NYC Bike system and involves 416 stations. The period is from 1st Jul. 2013 to 31st Dec. 2016.

  8.

    Taxi-NYC [91]: The taxi trip data are collected from both the yellow and green taxis in New York and record pick-up and drop-off dates/times, pick-up and drop-off locations, trip distances, itemized fares, rate types, payment types, and driver-reported passenger counts for each trip. The data start from 2009.

  9.

    T-drive [95]: This dataset contains many taxicab trajectories derived from 30,000 taxis in Beijing. The time is from Feb 1st, 2015, to Jun 2nd, 2015. It is utilized to learn smart driving directions and provide a user with the practically fastest route to a given destination at a given departure time.

  10.

    LargeST [96]: This is a large-scale traffic forecasting dataset with 8,600 sensors, whose metadata consist of sensor ID, latitude, longitude, district, county, highway location, lane, type, and direction. It is divided into three subsets: Greater Los Angeles (GLA), Greater Bay Area (GBA), and San Diego (SD). This dataset provides a vital foundation for large-scale traffic prediction research.
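Several of the datasets above are released at a fixed time slice, and the PeMS entry mentions that 30 s samples are aggregated to 5 min. A minimal sketch of such an aggregation is given below, where flow is summed and speed is averaged with flow as the weight. How PeMS itself performs the aggregation is not specified here, so the flow-weighted average and the function name are assumptions.

```python
import numpy as np


def aggregate_30s_to_5min(flow_30s, speed_30s):
    """Aggregate 30 s samples into 5 min bins (10 samples per bin).

    Returns the total flow per bin and the flow-weighted mean speed per bin.
    Trailing samples that do not fill a complete bin are dropped.
    """
    per_bin = 10
    n_bins = len(flow_30s) // per_bin
    flow = np.asarray(flow_30s[:n_bins * per_bin], dtype=float).reshape(n_bins, per_bin)
    speed = np.asarray(speed_30s[:n_bins * per_bin], dtype=float).reshape(n_bins, per_bin)
    total_flow = flow.sum(axis=1)
    weighted = (flow * speed).sum(axis=1)
    # Bins without any vehicles have no defined speed and are reported as NaN.
    mean_speed = np.divide(weighted, total_flow,
                           out=np.full(n_bins, np.nan), where=total_flow > 0)
    return total_flow, mean_speed


# Two full bins followed by three leftover samples.
flow_30s = [1] * 10 + [2] * 10 + [5] * 3
speed_30s = [60] * 10 + [30] * 10 + [99] * 3
flow_5min, speed_5min = aggregate_30s_to_5min(flow_30s, speed_30s)
```

The same pattern applies when coarser slices, such as the 15 min or 30 min intervals of the metro and taxi datasets, are derived from finer raw records.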

Table 4 The list of traffic forecasting datasets

7 Challenges and Future Directions

In this section, we discuss some general challenges for traffic forecasting tasks and present possible future research directions.

7.1 Challenges

We have investigated the realms of traffic data analysis, traffic data modeling, and its various applications. Despite the notable progress made in traffic prediction, several challenges remain that demand attention. We have outlined these challenges below:

  1.

    Heterogeneous Data: Traffic forecasting tasks can leverage traffic data and external factors, such as weather and POI, to train the model. However, traffic data is usually confronted with missing data, sparse data, and noise. Therefore, heterogeneous traffic data fusion has been a challenging problem in the traffic domain.

  2.

    Multi-task Prediction: In traffic forecasting, a multi-task model usually integrates all traffic information and jointly trains multiple transportation forecasting models. For instance, Li et al. [97] employed knowledge adaption to adaptively transfer knowledge from a station-intensive source to station-sparse sources for traffic demand prediction. Wang et al. [98] developed continual learning to train the model sequentially when data from various tasks arrive in a streaming manner. The main challenges are how to model the diverse data formats and the various inherent spatio-temporal correlations in different traffic forecasting tasks.

  3.

    Model Interpretability: Machine learning, deep learning, and graph convolution network models have been highly successful in learning complex traffic data for traffic forecasting. However, these are black-box, end-to-end models that lack interpretability and explainability, an issue that has received increasing attention. While there have been studies on the explainability of Graph Neural Networks (GNN) in other domains [99,100,101], there is still limited research in the traffic prediction area. Therefore, a primary challenge lies in applying explainability techniques to the task of traffic forecasting [102, 103].

7.2 Future Directions

In this section, we suggest the following directions for further investigation in traffic prediction.

  1.

    Large-Scale Graph-Based Traffic Forecasting: Most existing studies mainly consider small-scale traffic graph data. However, the many monitoring sensors/stations in a smart city can form a large-scale traffic graph. Therefore, it is essential to design models that can handle traffic prediction on large-scale traffic graphs.

  2.

    Self-Supervised Based Traffic Forecasting: Graph contrastive learning has been shown to improve the performance of deep learning models [104]. In particular, data augmentation for GNNs has been identified as useful for the semi-supervised node classification task [105]. However, graph contrastive learning has been explored less for spatio-temporal graphs. In particular, temporal dependencies and adaptive data augmentation are not utilized in existing graph augmentation methods. Thus, adaptive data augmentation techniques can be leveraged to identify the importance of edges/nodes and boost the performance of traffic forecasting.

  3.

    Knowledge Guided Traffic Forecasting: Traffic big data contains a great deal of traffic knowledge, both explicit and implicit. Explicit traffic knowledge can be external data, such as weather, air quality, POI, etc. Implicit traffic knowledge is defined as hidden or latent knowledge, such as traffic potential fields [84, 106], traffic causal knowledge [107], traffic knowledge graphs [108,109,110,111], etc. How to mine, represent, and integrate traffic knowledge is a hot direction for accurate traffic forecasting.

  4.

    Large Model Guided Traffic Forecasting: In the urban spatio-temporal prediction domain, Huawei proposed the Pangu large model [112] for weather forecasting, which employed a three-dimensional deep network to handle complex weather patterns. Traffic data is multi-source (e.g., weather, accidents, etc.) and multi-modal (e.g., numbers, images, video, etc.). Thus, how to integrate large models with traffic data to achieve traffic prediction is a promising research direction.

8 Conclusion

This paper presented a comprehensive review of the overall architecture of traffic forecasting. We first summarized related surveys on traffic prediction, pointed out the differences between our study and these existing surveys, and gave the preliminaries of traffic forecasting. Then, three critical parts of traffic forecasting were summarized: traffic data analysis, traffic data modeling, and traffic forecasting applications. We investigated traffic data in terms of data collection, data formats, and data characteristics. Moreover, we reviewed traffic data modeling in terms of spatial representation, temporal representation, and spatio-temporal representation. The applications of traffic forecasting were discussed, including traffic flow forecasting, traffic speed forecasting, traffic demand forecasting, and other hybrid traffic forecasting. Furthermore, we provided an up-to-date collection of open datasets for traffic forecasting. Finally, we pointed out the challenges and future directions for subsequent research.