A recurrent neural network for urban long-term traffic flow forecasting

This paper investigates the use of a recurrent neural network to predict urban long-term traffic flows. A representation of long-term flows together with related weather and contextual information is first introduced. A recurrent neural network approach, named RNN-LF, is then proposed to predict long-term flows from multiple data sources. Moreover, a parallel GPU implementation of the proposed solution (GRNN-LF) is developed, which boosts the performance of RNN-LF. Several experiments have been carried out on real traffic flow data from a small city (Odense, Denmark) and a very large city (Beijing). The results reveal that the sequential version (RNN-LF) deals effectively with the traffic of small cities. They also confirm the scalability of GRNN-LF compared to the most competitive GPU-based software tools when dealing with big traffic flows such as the Beijing urban data.

The traffic flow is computed by counting the number of objects (cars, passengers, cabs, buses, etc.) that cross a given location during a time interval. In the last few years, several learning algorithms have been proposed for traffic flow forecasting [1][2][3][4][5]. However, these algorithms can only predict short-term flows, i.e., flows represented by a single flow observation; they provide short-term flow forecasting, but not long-term. A long-term flow is defined as the set of flow sequences captured during a specific time period [6,7]. In the last decade, several works have addressed sequence data forecasting. Zhao et al. [8] proposed a global weighting algorithm for forecasting sequence data retrieved from real image datasets. The sequence data is constructed from the set of training data, which is viewed as basis functions. Each sequence is estimated by weighting all the basis functions using the average distance over all the training data. The results of the global weighting approach are satisfactory when the training and test data have high similarity values. However, its accuracy tends to decrease when there is a large gap between the training and the test images. Chen et al. [9] developed a softmax regression approach that models the posterior emotion as a softmax transformation of linear functions of the features of each sequence. The learning parameters are obtained by minimizing the errors between the predicted results and the ground truth using the gradient descent method [10]. Chang et al. [11] developed a shared sparse learning approach, whose main assumption is that a shared sparse coefficient can match each data sample to the remaining training sequences using the L1 norm. Lv et al. [12] developed a hybrid supervised and unsupervised model to automatically predict the driver's braking intensity in a smart vehicle environment. The cylinder brake pressure data is fitted to a Gaussian distribution and labeled into three classes (low, moderate, and intensive) using a Gaussian mixture model. Random forests and artificial neural networks are then used to predict braking intensity from the generated labels and the vehicle state information retrieved from the controller area network bus signals. Most sequence data forecasting methods rely on optimization techniques. Obtaining the best fitting parameters for these techniques is challenging, which reduces the accuracy of the prediction process (notably in scenarios with heterogeneous data). Therefore, these methods cannot be applied to long-term traffic flow forecasting, and new methods are needed.

Motivation
For illustration, consider the daily schedule of a person presented in Fig. 1. This person has some flexibility to arrange his daily activities from 11:00 onward. His lunch may be scheduled between 11:30 and 13:00, some personal business between 15:00 and 16:00, and he may return home anytime from 17:00 to 20:00. An accurate long-term traffic flow forecast would enable this person to schedule these activities optimally (early morning or overnight) while minimizing the time wasted in traffic jams. Existing models only predict short-term traffic flow, not the flow over the long intervals required in this scenario. This motivates the need for models of long-term flows over several time intervals and raises the following questions:
1. How can we efficiently represent long-term flow values for different time intervals?
2. How can we predict new long-term flows from historical data?
3. How can we improve the accuracy of the existing sequence data forecasting process?
4. How can we predict real long-term flows from real city data?

Contributions
In this work, we consider the aforementioned questions and propose a new framework for predicting long-term traffic flows. To the best of our knowledge, we are the first to predict long-term traffic flows. The main contributions of the paper are summarized as follows:

Outline
The paper is organized as follows. Section 2 reviews existing traffic flow forecasting algorithms. Section 3 presents the proposed framework for long-term traffic flow forecasting. Section 4 presents the GPU implementation of the framework. Experimental analysis of the two real case studies is given in Section 5. Section 6 presents the main findings of applying the proposed framework to urban traffic data. Section 7 concludes the paper and previews future work.

Literature review
In the last few years, several studies have investigated traffic flow prediction [1,3,13]. Yang et al. [2] used a state-of-the-art deep learning model (auto-encoder Levenberg-Marquardt) to improve the accuracy of traffic flow forecasting. It was designed using the Taguchi method to develop an optimized structure and learn traffic flow features. A layer-by-layer feature granulation was used with a greedy layer-wise unsupervised learning algorithm. Huang et al. [4] proposed a hybrid model that incorporates online seasonal adjustment factors and an adaptive Kalman filter to model high traffic flow rates in a seasonal period. Four seasonal adjustment factors (daily, weekly, long daily, and long weekly) are first determined using the online seasonal adjustment factors. The adaptive Kalman filter is then applied to estimate the high traffic flow rate under a normality assumption. Zhang et al. [14] proposed the use of machine learning and evolutionary computation to predict flow rates. The most relevant features are first selected using both random forests and a genetic algorithm. The best features are then fed into a support vector machine for learning and predicting new flow rates. Daraghmi et al. [15] used a negative binomial model to smooth both spatial and temporal variables in traffic flow forecasting. Chen et al. [5] proposed an ensemble learning approach, represented by multiple non-linear least squares support vector regression models, for predicting traffic flow. To adjust the parameters of these models, a heuristic harmony search approach was applied. Chan et al. [16] treated the problem of time variation between the historical flows and a new flow. They integrated particle swarm optimization into a fuzzy neural network for predicting short-term traffic flow. We conclude from this review that existing solutions for traffic flow prediction (including those based on deep learning) are restricted to short-term flows, in which only a single flow value is observed. To the best of our knowledge, there is no work on long-term flows in the context of urban traffic forecasting. The remainder of this paper addresses this by proposing a novel framework that uses an RNN for long-term traffic flow forecasting based on weather information, historical traffic flow data, and contextual information.

Proposed approach
Figure 2 presents the general predictive long-term traffic flow framework. It includes three steps:
1. Data Collection: Urban sensing and data acquisition technologies are used to collect urban traffic data. Each data row represents one observation defining the date, the time, and the type of objects (vehicles or bikes) that pass through a given location. Table 1 gives an example of data collected in Gronlandsgade, Odense (Denmark).
2. Extraction and Merging: The long-term traffic flows are first extracted from the urban traffic data obtained in the previous step, and the flows are then merged with the weather and contextual information to build the historical long-term traffic flow database.
3. Learning: The long-term traffic flows of a new observation are predicted from the historical long-term traffic flow database designed in the previous step.
A detailed explanation of the main steps of this framework is given in the following.

Extraction and merging step
A long-term traffic flow database is created in this step. The long-term traffic flows are first extracted from the urban traffic data obtained in the data collection step. The daily long-term traffic flows are then merged with the daily weather information and the daily contextual information.
Extraction. A traffic flow is defined as the number of vehicles passing through a location in the road network during a given time interval. A long-term traffic flow (D_F) links flow values to their likelihood of occurrence during a given period of time. We estimate the probability of each D_F from its empirical counterpart based on real-life measurements. Let I be the set of time instants, at regular time intervals, at which flow measurements are collected at a specific location, and let X = {x_1, ..., x_|I|} be the set of corresponding flow values. Let λ be the duration of the time interval between two successive measurement instants, and μ the duration considered by each flow measurement x ∈ X. For example, X can be the number of vehicles or bikes that passed through a location in the previous hour, measured every 5 minutes and collected during a period of one month (30 days). In this case, λ = 5 minutes, μ = 60 minutes, and the number of measurements is |I| = 43200/5 = 8640. From I we extract a collection T = {T_1, ..., T_r} of non-intersecting subsets of τ consecutive time instants. For a subset T_j, j = 1, ..., r, we denote by ι(T_j) the time instant at which the subset starts; that is, T_j = {ι(T_j), ι(T_j) + 1, ..., ι(T_j) + τ}. To each T_j, j = 1, ..., r, corresponds a set of flow measurements X_{T_j}. For example, each X_{T_j}, j = 1..7, can contain the flow measurements between 7:00 and 9:00 for one day of a week. The flow measurements in each set X_{T_j}, j = 1, ..., r, can be represented as discrete random variables Y_j ∈ N_0 to capture the uncertainty related to those measurements. Consequently, each Y_j can be described by its probability mass function f_{Y_j}(y) = P(Y_j = y). To estimate f_{Y_j} = f_j, we use the empirical density function f̂_j of the flow, given by the relative frequency of the measurements in X_{T_j}, i.e.,

f̂_j(y) = |{x ∈ X_{T_j} : x = y}| / |X_{T_j}|.

Example 1. Table 2 illustrates how to extract the long-term traffic flows from real traffic data for the week days retrieved from Gronlandsgade, located in Odense, Denmark.
We use an interval size of 20, which generates the flow intervals and the final long-term traffic flows shown in Table 2. The weather information is composed of several features; the contextual information likewise comprises several features that represent the profile of the day. In this paper, we limit the contextual information to the following two features.
- Weekend vs. regular day: the type of the day in the observation, 0 for a weekend day, 1 for a regular day.
- Event day: whether the day includes specific events such as New Year's Day, a national celebration day, or others. We set 0 for an event day, 1 for a non-event day.
In the long-term traffic flow database DB = {DB_1, DB_2, ..., DB_r}, each row DB_i is a tuple <F_i, W_i, C_i>. F_i = {F_i1, F_i2, ..., F_ik} is its long-term traffic flow, where k is the number of possible flow intervals. W_i = {W_i1, W_i2, ..., W_ip} is the related weather information, where p is the number of weather features. C_i = {C_i1, C_i2, ..., C_in} is the set of contextual features, where n is the number of contextual features. Table 3 presents an example of the long-term traffic flow database with 5 long-term traffic flows, 4 intervals, 3 weather features (conditions: CD, average wind temperature degree: AWT, and wind speed: WS), and 2 pieces of contextual information (weekend vs. regular day: WR, and whether it is an event day: E).
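As an illustration of the extraction and merging step, the sketch below builds an empirical flow distribution and assembles one database row <F_i, W_i, C_i>. The interval size, the measurement values, and the feature values are hypothetical, chosen only to mirror the structure described above.

```python
from collections import Counter

def empirical_flow_distribution(measurements, interval_size=20):
    """Map each flow measurement to its interval and return the
    relative frequency of every interval (the empirical pmf)."""
    bins = [x // interval_size for x in measurements]
    counts = Counter(bins)
    total = len(measurements)
    return {b * interval_size: counts[b] / total for b in sorted(counts)}

# Hypothetical vehicle counts observed in one time window over several days.
flows = [12, 18, 25, 31, 22, 17, 41, 38, 19, 24]
F_i = empirical_flow_distribution(flows)       # long-term traffic flow F_i
W_i = {"CD": "clear", "AWT": 8.5, "WS": 14.0}  # weather features (p = 3)
C_i = {"WR": 1, "E": 1}                        # contextual features (n = 2)
row = (F_i, W_i, C_i)                          # one database tuple <F_i, W_i, C_i>
```

With these counts, `F_i` is `{0: 0.4, 20: 0.5, 40: 0.1}`: four measurements fall in interval [0, 20), five in [20, 40), and one in [40, 60), and the relative frequencies sum to 1 as a probability mass function must.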

Learning step
The long-term traffic flow of a new observation is predicted in this step using a recurrent neural network (many-to-many) architecture. As sketched in Fig. 3, the input data consists of the flow information, the weather information, and the contextual information of the current day, while the output is the long-term traffic flow for the next day. We apply a multilayer feedforward neural network. Each neuron in layer l is linked to all neurons of layer (l − 1) with different weight values. The neurons of the input layer are associated with the input data F_{i−1} (k possible intervals), W_{i−1} (p weather features), and C_{i−1} (n contextual features), so the number of neurons in the input layer is k + p + n. The neurons of the output layer are connected to the output of the network (F̂_i1, F̂_i2, ..., F̂_ik). The aim is to minimize the error between the output of the network and the long-term traffic flow F_i such that

E = Σ_{j=1}^{k} (F̂_ij − F_ij)².   (2)

The output of the m-th neuron in layer l, noted s_m^l, is given by

s_m^l = φ( Σ_{j=1}^{|l−1|} ω_{mj}^{l−1} · s_j^{l−1} + b_m^l ),   (3)

where φ(·) is the activation function, |l−1| is the number of neurons in layer l − 1, s_j^{l−1} is the output of the j-th neuron in layer l − 1, ω_{mj}^{l−1} is the weight connecting s_m^l and s_j^{l−1}, and b_m^l is the bias of s_m^l. Note that the sum of the outputs of all neurons in the output layer should be equal to 1; this matches the form of the long-term traffic flows. At each iteration i, the weight update rule is

ω ← ω − μ · ∂E/∂ω,   (4)

where μ is the learning rate. Algorithm 1 presents the pseudo-code of the RNN-LF algorithm. It starts by initializing the weight values; the function GenerateRandomValue generates random values between 0 and 1. At each iteration i, the neurons of the input layer receive the input data consisting of the flow, weather, and contextual information. This input is propagated through all neurons of the network using the Computing function, which evaluates (3). The output of the network is compared to the target output, and the error is determined using (5). This error value is propagated back across the network to update the weight values as in (4). The process is repeated until all the input data is processed, which minimizes the error function shown in (2).
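The training loop described above can be sketched in a few lines of numpy. This is a minimal illustration under stated assumptions, not the actual RNN-LF implementation: it assumes a single hidden layer with tanh activation, a softmax output layer so that the predicted flow distribution sums to 1, squared-error loss, and illustrative layer sizes and learning rate.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Illustrative sizes: k flow intervals, p weather features, n contextual features.
k, p, n, hidden = 4, 3, 2, 8
W1 = rng.uniform(0, 1, (hidden, k + p + n))  # weights initialized in [0, 1]
W2 = rng.uniform(0, 1, (k, hidden))
mu = 0.1                                     # learning rate

x = rng.random(k + p + n)                    # input <F_{i-1}, W_{i-1}, C_{i-1}>
target = softmax(rng.random(k))              # next-day flow distribution F_i

losses = []
for _ in range(200):
    h = np.tanh(W1 @ x)                      # hidden-layer outputs
    y = softmax(W2 @ h)                      # outputs sum to 1, as required
    err = y - target
    losses.append(float(err @ err))          # squared error, cf. Eq. (2)
    J = np.diag(y) - np.outer(y, y)          # Jacobian of the softmax
    d2 = J @ (2.0 * err)                     # gradient w.r.t. the output pre-activation
    d1 = (W2.T @ d2) * (1.0 - h ** 2)        # backprop through tanh
    W2 -= mu * np.outer(d2, h)               # weight update, cf. Eq. (4)
    W1 -= mu * np.outer(d1, x)
```

Running the loop drives the squared error down; the bias terms, multiple hidden layers, and the Gaussian activation used in the experiments are omitted here for brevity.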

Complexity analysis
The theoretical complexity cost of the proposed framework is divided into the following costs:

GRNN-LF
Generally speaking, recurrent neural network models need massive training data in the learning process, which requires high computational time on a single-CPU machine. The emergence of HPC tools such as multi-core CPUs, GPUs, and cluster computing helps boost the performance of such models. In this paper we are interested in GPU computing. Several GPU-based software tools (including machine learning algorithms) have been developed for real-world applications, e.g., [17][18][19][20]. However, these tools are limited: they come with predefined parameters and do not provide flexibility for users. For instance, a user cannot specify which tasks should be performed in parallel on the GPU, how to distribute tasks to the GPU blocks or even to the threads of a block, or how to manage the shared memories of the GPU blocks. To address this limitation, one needs to understand GPU computing in depth and use lower-level tools such as the CUDA libraries, 1 which communicate directly with the GPU hardware components.
In this section, we follow this approach and propose a new algorithm (GRNN-LF) that deals with efficient implementation of RNN-LF on GPU.

RNN-LF analysis
To design an efficient GPU-based approach, the most time-consuming tasks of the RNN-LF version should be determined. In Algorithm 1, RNN-LF is divided into three steps: i) initialization of weights (lines 3 to 9), ii) computing outputs (lines 10 to 20), and iii) updating weights (line 21). Timing flags are placed before and after each of these three steps. Table 4 presents the experimental results for every step of the RNN-LF algorithm separately using different amounts of training data. When increasing the training data size from 1,000 to 10,000, computing outputs and updating weights are clearly the most time-consuming tasks: the computing outputs task exceeds 77% of the overall RNN-LF runtime, whereas updating weights does not exceed 26%. This explains why these two tasks are the main targets for GPU acceleration.
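The per-step timing breakdown of Table 4 can be reproduced in spirit with a small profiling harness. The three phases below are toy stand-ins for the RNN-LF steps, not the actual implementation; only the measurement pattern (flags before and after each step, ratios of total runtime) follows the paper.

```python
import time

def profile(phases, repeat=50):
    """Time each named phase over several repetitions and return
    its share of the total runtime as a percentage."""
    totals = {name: 0.0 for name, _ in phases}
    for _ in range(repeat):
        for name, fn in phases:
            t0 = time.perf_counter()
            fn()
            totals[name] += time.perf_counter() - t0
    grand = sum(totals.values())
    return {name: 100.0 * t / grand for name, t in totals.items()}

# Toy stand-ins for the three RNN-LF steps.
weights = [0.0] * 1000
ratios = profile([
    ("initialization", lambda: [0.5 for _ in range(100)]),
    ("computing outputs", lambda: sum(i * w for i, w in enumerate(weights))),
    ("updating weights", lambda: [w + 0.01 for w in weights]),
])
```

The returned percentages sum to 100 and identify which phase dominates, mirroring the ratio-of-CPU-time analysis used to decide what to offload to the GPU.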

Principle
GPUs (Graphics Processing Units) are graphics cards initially developed for video games, but their use as powerful computing tools is now gaining ground in many software application domains. The GPU architecture comprises two hosts: i) the CPU host and ii) the GPU host. The former contains one processor and one main memory. The latter is a multi-threading system that consists of multiple computing cores, where each core executes a block of threads. Threads of a block on the same core communicate with one another through a shared memory, whereas communication between blocks relies on a global memory. CPU/GPU communication is carried out over hardware buses.
In the following, GRNN-LF is introduced as an adaptation of RNN-LF to GPU architectures for boosting runtime performance. The most time-intensive operations are sent to the GPU for processing, whereas the less time-consuming tasks are kept on the CPU. The initialization of weights is first performed on the CPU, and the training data is then transmitted to the GPU, which computes the outputs and updates the weights. Two main operations are distinguished:
1. Computing outputs: The process starts by handling the second layer using the output of the first layer, and so forth with the next layers until all layers have been processed.
2. Updating weights: Each block of threads is mapped onto one neuron, and every thread updates the connection between the neuron of its block and one neuron in the previous layer.
Algorithm 2 presents the pseudo-code of GRNN-LF using standard CUDA operations.
From a theoretical standpoint, GRNN-LF improves on the sequential RNN-LF by exploiting the massively threaded computing of GPUs for computing outputs and updating weights. GRNN-LF also minimizes CPU/GPU communication by defining only two communication points: the first when the training database is loaded into the GPU, and the second when the final weight values are returned to the CPU. Moreover, GRNN-LF does not suffer from thread divergence, because each thread performs one multiplication in both the computing outputs and updating weights steps. It also provides efficient memory management, which exploits the shared memories of the blocks. However, synchronization between threads is needed at each step. Three synchronization points are thus required at each iteration: the first when computing the output of each neuron, the second when switching between layers while computing outputs, and the last when passing to the next entry of the training data.
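The one-thread-per-connection mapping has a direct sequential analogue: computing a layer's outputs is a matrix-vector product, where each output entry corresponds to the work of one GPU block and each multiply to one thread, followed by a per-block reduction (the first synchronization point). The numpy sketch below (not actual CUDA code) shows both views of one forward step and that they agree; the tanh activation and the sizes are illustrative.

```python
import numpy as np

def layer_outputs_loop(W, s_prev, b):
    """Per-neuron view: 'block' m computes neuron m; its 'threads' each
    handle one product W[m, j] * s_prev[j], and the partial products are
    then reduced (the per-block synchronization point)."""
    out = np.empty(W.shape[0])
    for m in range(W.shape[0]):                 # one GPU block per neuron
        partial = [W[m, j] * s_prev[j] for j in range(W.shape[1])]  # threads
        out[m] = np.tanh(sum(partial) + b[m])   # reduction + activation
    return out

def layer_outputs_vectorized(W, s_prev, b):
    """Whole-layer view: the same computation as one matrix-vector product."""
    return np.tanh(W @ s_prev + b)

rng = np.random.default_rng(1)
W = rng.standard_normal((5, 7))     # 5 neurons, 7 inputs from the previous layer
s_prev = rng.standard_normal(7)
b = rng.standard_normal(5)
```

Both functions return identical outputs; the GPU version simply evaluates all the per-connection products concurrently instead of in a Python loop.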

Performance evaluation
A number of experiments have been carried out using real traffic flow datasets from two different cities, Odense and Beijing. In the following, we first test the RNN-LF approach on Odense urban traffic data using several configurations (location, number of hidden layers, and learning rate) and compare it with baseline sequence data forecasting algorithms. The ability to predict real long-term traffic flows is also demonstrated. GRNN-LF is then evaluated using both the Odense and Beijing traffic datasets and compared with baseline GPU machine learning approaches. Throughout, the time interval used for determining each flow value is fixed to 5 minutes, and each long-term traffic flow is measured within 4 hours (from 8:00 to 12:00) of each day. To ensure a fair comparison between the proposed solution and the baseline algorithms, all experiments were carried out in the same environment and on the same operating system. The results reported are the average of 100 different runs. To prevent overfitting, we used the well-known dropout function available in Keras. We also used high numbers of epochs to prevent underfitting (100 for the sequential version; 1,000 and 2,000 for the GPU-based version).

Data description
Two kinds of data have been used:
1. The first is real urban traffic data from Odense Kommune (Denmark). 2 The data is a set of rows, where each row contains information related to the cars detected at specific locations, such as the gap, length, location, date time, speed, and class, as well as the weather data. The location is represented by latitude and longitude, the speed is given in km/h, and the date time (in the format YYYY-MM-DD hh:mm:ss) represents the year, month, day, hour, minute, and second at which the car passed the given location. The most important information for each car is as follows: the class is an integer that defines the type of the vehicle or bike, e.g., 2 represents a passenger car; Tempm is the average daily temperature in °C; Wspdm is the wind speed in km/h; and Conds is a description of the weather conditions. In this study, we focus on ten locations, described in Table 5. The traffic data input is obtained from the Odense flows observed between 1 January 2017 and 30 April 2018.
2. The second is real urban traffic data from the Beijing traffic flow. 3 It consists of more than 900 million traffic flow entries over a two-month period at one location. The most important information for each car is as follows: the class in this dataset defines the type of vehicle or bus.

RNN-LF vs. sequence data prediction algorithms
The Odense dataset is used in this part of the experiments. The predictive rate is defined as the number of long-term traffic flows that are correctly predicted over all tested long-term traffic flows. The first part of this experiment tunes the RNN-LF parameters. Figure 4 presents the predictive rate of the RNN-LF algorithm for different locations in the city, different numbers of hidden levels (3, 5, 10), and different learning rate values (0.2, 0.5, 0.8, and 1.0). When varying the learning rate from 0.2 to 0.8 and the number of hidden levels from 3 to 5, the predictive rate increases for all locations. When the learning rate is set to 1.0 and the number of hidden levels to 10, the predictive rate decreases. The Gaussian function is used as the activation function, as it simulates a sequence much better than the other activation functions (e.g., the Sinc function). In conclusion, we set i) the number of hidden levels to 5 and ii) the learning rate to 0.8, as the best fitting parameters of RNN-LF. The next experiment investigates how RNN-LF predicts new long-term traffic flows. Figure 5 shows the long-term traffic flows predicted by RNN-LF compared with the real long-term traffic flows and with the global weighting algorithm on the Anderupvej location. This figure shows a strong correlation between the results returned by RNN-LF and the real long-term flows, which confirms the superiority of the proposed framework over the existing algorithms. The following features contributed to this performance:
- the integration of several sources of data: flow information, weather information, and contextual information;
- the recurrent neural network approach, which learns multiple outputs from multiple inputs of different sizes;
- the configuration of the recurrent neural network, which uses the best fitting values for the parameters (the number of hidden layers and the learning rate).
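The predictive rate used above can be computed as follows. The matching criterion (exact equality of the predicted and observed flow sequences) and the example sequences are illustrative assumptions, not taken from the paper.

```python
def predictive_rate(predicted, actual):
    """Share of long-term traffic flows predicted correctly:
    correct predictions over all tested flows."""
    correct = sum(1 for p, a in zip(predicted, actual) if p == a)
    return correct / len(actual)

# Hypothetical predicted vs. observed flow-interval sequences for 4 days.
pred = [(0, 20, 40), (20, 20, 40), (0, 0, 20), (40, 40, 60)]
true = [(0, 20, 40), (20, 40, 40), (0, 0, 20), (40, 40, 60)]
rate = predictive_rate(pred, true)   # 3 of the 4 days match
```

In this toy case the predictive rate is 0.75, since one of the four daily sequences differs from its ground truth.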

GRNN-LF performance
The performance of GRNN-LF with different dataset sizes is tested in this part. Figure 6 shows the speed-up of GRNN-LF compared to the sequential version using the Odense traffic data. The speed-up is calculated as λ_1/λ_b, where λ_1 is the runtime of the sequential version and λ_b is the runtime using b GPU blocks. Across the locations used, the speed-up of GRNN-LF is more than 50 for non-dense locations (low traffic flows) and reaches 160 for dense locations (high traffic flows). The last experiment considers big databases (the Beijing traffic flow data) and compares GRNN-LF to state-of-the-art GPU-based recurrent neural network software tools such as Agib's work [17] and Du's work [18]. Table 6 presents the runtime when varying the number of flows from 100 million to 900 million. The results show that GRNN-LF outperforms all the other GPU-based algorithms: GRNN-LF processes the Beijing traffic flow in less than 3,300 seconds, while the best performing GPU-based recurrent neural network software takes about 4,000 seconds. These results confirm the effectiveness of GRNN-LF, which performs an intelligent mapping between the training data and the massively threaded GPU, whereas the other GPU-based software tools are not flexible and offer only a generic mapping without any deep analysis of the problem to be solved on the GPU.
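The speed-up figures above follow the definition λ_1/λ_b directly; the runtimes in the example below are hypothetical and serve only to show the arithmetic.

```python
def speed_up(runtime_sequential, runtime_parallel):
    """Speed-up metric lambda_1 / lambda_b: sequential runtime over
    the runtime of the GPU version with b blocks."""
    return runtime_sequential / runtime_parallel

# Hypothetical runtimes (in seconds) for one dense Odense location.
s = speed_up(8000.0, 50.0)
```

Here a sequential run of 8,000 s against a 50 s GPU run yields a speed-up of 160, the order of magnitude reported for dense locations.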

Discussions
This section discusses the main findings and limitations from the application of our approach to both the Odense and Beijing real traffic data. Applying a learning technique to a specific application domain requires methodological refinement and adaptation. In this context, we argue that our approach benefits from the knowledge extracted in the refinement step, which provides the intelligence required for predicting the long-term traffic flows.
However, the application of our approach has some limitations:
- The prediction model is based on three dimensions (flow information, weather information, and contextual information) of one location. In practice, the flow could be influenced by information from other locations. Therefore, spatial information could also be used for long-term traffic flow forecasting. Further, studying the correlations among different long-term traffic flows would allow a better understanding of the urban traffic data and consequently improve long-term traffic flow forecasting.
- The considered contextual information could be too coarse-grained for some applications. For instance, regular urban traffic may be similar on weekdays and weekends in some areas of a big smart city, where hardly any difference in traffic can be identified between weekdays and weekends. Considering other contextual information for such scenarios may improve long-term traffic flow forecasting.
- The prediction model is based on the database of entire long-term traffic flows. This gives satisfactory results for datasets of small and medium size. However, the accuracy decreases for large and big datasets, e.g., the Beijing traffic data. One direction to address this is to study the correlation between long-term traffic flows, for instance by grouping the long-term traffic flows into similar clusters and applying the learning model to these clusters separately. Another way is to preprocess the data by removing noise (detecting outliers) and/or extracting relevant features (feature selection and extraction).

Conclusions and perspectives
A novel traffic flow prediction framework has been proposed in this paper. It aims to learn long-term traffic flows from multiple data sources. In this framework, the set of long-term traffic flows with weather and contextual information is first generated. The RNN-LF algorithm is then used to predict new long-term traffic flows. The scalability of the proposed framework has been investigated, and the results show that RNN-LF outperforms the state-of-the-art learning models for predicting sequence data.
In addition, RNN-LF could predict long-term traffic flows in the real case of the Odense traffic flow data. To deal with big traffic flow data in real time, an HPC-based version of RNN-LF has been developed. This approach, called GRNN-LF, has been implemented on GPU by developing an efficient mapping between the thread blocks and the training data, together with memory management optimized across the different levels of GPU memory. The less time-intensive task (initialization of weights) is performed on the CPU, while the two most time-intensive tasks (computing outputs and updating weights) benefit from the massive GPU threading. In the computing outputs task, each block is mapped onto one neuron, and every thread computes the output of the connection between this neuron and one neuron of the previous layer. In the updating weights task, each block of threads is mapped onto one neuron, and every thread updates the connection between the neuron of its block and one neuron situated in the previous layer. The results reveal that the GPU-based approach achieves a speed-up of more than 160 over the sequential RNN-LF when dealing with the Odense traffic data. The results also show the superiority of GRNN-LF over the existing GPU-based solutions for neural network learning when dealing with the big Beijing traffic data. Motivated by the promising results reported in this paper, we plan to investigate the following:

Fig. 3
Fig. 3 RNN-LF framework. φ(·) is the activation function; |l| is the number of neurons in layer l; s_j^{l−1} is the output of the j-th neuron in layer l − 1; ω_{mj}^{l−1} is the weight value connecting the neurons s_m^l and s_j^{l−1}; b_m^l is the bias value associated with the neuron s_m^l.

1. Long-term traffic flow database construction cost: The long-term traffic flow database is built from the traffic flow. This operation requires |DB_i| scans of the i-th long-term traffic flow, i.e., Σ_{i=1}^{t} |DB_i| scans in total.
2. Learning cost: The complexity cost of the recurrent neural network is O(L), where L is the number of layers.
The overall complexity cost of the proposed framework is therefore O(Σ_{i=1}^{t} |DB_i| + L).
3 https://www.beijingcitylab.com/

Fig. 5
Fig. 5 The original and the predicted long-term traffic flows by RNN-LF and the global weighting algorithm

Table 1
Data collection: example of the Gronlandsgade location in Odense (Denmark)

Table 4
Ratio of CPU time (%) of the main RNN-LF tasks for different training data sizes.
With n neurons in each layer, n blocks and n threads per block are required to compute the outputs of the given neural network. GRNN-LF defines a local table, say table_j, for computing the weights of the neuron s_j. Each block of threads is mapped onto one neuron, and every thread updates the connection between the neuron of its block and one neuron situated in the previous layer. With L layers and n neurons in each layer, L × n blocks and n threads per block are required to compute and update all the weights of the given neural network. GRNN-LF also defines a local table (table_i) for updating the weights of the i-th neuron of the neural network. This table is allocated in the shared memory of each block. The process takes L iterations, where L is the number of layers in the neural network. At each iteration i, each block b_j is mapped onto the neuron s_j.

Table 5
Odense location description

Table 6
Runtime (seconds) of GRNN-LF and baseline GPU algorithms for Beijing traffic flow data