Examining vehicular speed characteristics through divergences from prior distributions

Various approaches have been proposed in the literature to interpret vehicular speed characteristics. This study turns to entropy-based approaches and focuses on the maximum entropy method of statistical mechanics and the Kullback–Leibler (KL) divergence approach for examining vehicular speeds. The vehicle speeds at the selected highway are analyzed in order to find the disparities among them. However, it turned out that the speed dynamics could not be distinguished through the speed distributions; hence the maximization of Shannon entropy seems insufficient for comparing the speed distributions of the data sets. For this reason, the KL divergence approach was applied. This approach compares the speed distributions on the basis of two prior distribution models, i.e., uniform and Gaussian. The trends of the KL divergences obtained from both priors were examined. The KL divergence values for the highway speed data sets ranged between about 0.53 and 0.70 for the uniform case, while for the Gaussian case the values were between 0.16 and 0.33. The KL divergence trends for the real speeds were analogous in both cases, but they differed significantly when the synthetic data sets were employed. As a result, the KL divergence approach proves suitable as an indicator for comparing the speed distributions.


Introduction
Over recent decades, traffic flow has become far more complex, especially as a result of the increasing number of vehicles in cities. Many other factors may also contribute to rising traffic complexity, e.g., the location of signalization, land-use characteristics, and the interaction among vehicles. In such a complex traffic system, all factors are in fact directly or indirectly related to the traffic flow parameters. Hence, in real traffic flow, the memory is non-Markovian, time is discrete, and time-dependence is present [1,2]. Long-range interactions also emerge, and the impact of these properties makes the speed distributions non-Gaussian; these distributions could be a crucial guide for interpreting traffic flow dynamics. The distribution functions of real traffic flow parameters, e.g., vehicular speeds, might not be well described within the classic extensive entropy formalism, i.e., the celebrated Boltzmann-Gibbs (BG) thermostatistics approach [2]. Within this approach, the maximum entropy principle could not handle the speed dynamics, and thus could not compare the speed distributions of traffic flow, since it may not describe the real traffic flow sufficiently.
In the literature, there are numerous approaches examining the complexity and dynamics of traffic flow, and one of them is the use of entropy-based methods. In this respect, various recent studies, e.g., [1][2][3][4][5][6], are typical in the vehicular traffic arena. Of these, for example, Kosun and Ozdemir [1] focus on the platoon formation of vehicles and propose an upper and a lower limit of the Tsallis q entropic index. In the study [2], vehicular speeds are considered within the framework of nonextensive thermostatistics. Superstatistical analyses are performed, and the pdfs of the vehicular speeds at the highway are associated with q-Gaussian distributions. Two distinct Tsallis q values are specified, and the time-dependency and time-independency of the traffic flow are discussed. Zhang and Shang [3] propose multivariate multiscale distribution entropy and examine the traffic congestion at different time scales. Depending on the method, the complexity of the traffic system is assessed with the complexity-entropy causality plane. Yan et al. [5] construct complex networks of multivariate traffic flow time series and examine the dynamics of the hourly data. The authors also use network structure entropy to analyze the network structure, and the calculated values are interpreted. In addition, the study [7] is related to the multivariate multiscale sample entropy in traffic. Xu et al. [8] focus on travel time predictability based on historical travel time series. The refined composite multiscale entropy algorithm is applied, and the connection between the upper bound of travel time predictability and entropy is given. Another study [9] considers the stimulus-response equation and gives the flow-concentration relationship in traffic. The entropy is maximized by the Lagrange multiplier technique, and then the equation involving the flow, concentration and speed parameters is defined. Silva et al. [10] examine the global, daily and hourly vehicle velocity dynamics using information theory concepts, considering the causality Complexity-Entropy plane. This plane is also utilized in [11] to characterize the global velocity behaviors, where the compatibility of the velocities with correlated noise having an $f^{-k}$ power spectrum with $k \geq 0$ is stated. In both [10] and [11], the Bandt-Pompe methodology is applied to the time series of vehicle velocities. The studies investigate the behavior of the velocities in terms of noises in detail and present the correlation degrees in the light of different events.
Even though such methods concerning traffic complexity and dynamics are present in the literature, the KL divergence approach is not usual in this context. As examples of utilizing relative entropy (KL divergence) in vehicular traffic, the studies concerning driving conditions [12], travel time [13][14][15], and vehicle following [16] could be given. For instance, in the study [12], transition probabilities for the vehicle speed and acceleration with regard to highway and city driving are considered. Then, KL divergence analyses are performed; driving style, i.e., smooth and aggressive, and locations are the main concerns in these analyses, which distinguish similar and dissimilar driving conditions. Ernst et al. [14] focus on travel time estimation, and their main aim is to obtain the divergence between the true and estimated distributions. For this purpose, KL divergence is utilized, and two types of matching technologies, i.e., id matching and signature matching, are also compared. In the study, the authors put forward signature matching as the best choice since it minimizes the dissimilarity between the true and estimated travel time distributions. Furthermore, the KL divergence approach is utilized in [17] in order to quantitatively measure the similarity between different driving styles. The study [18] examines the amount of naturalistic driving data required for understanding driver behaviors. The KL divergence method is used to test the similarity between density functions considering different amounts of data. Another paper [19] focuses on understanding the characteristics of bus speed distributions. In the paper, two problems are emphasized in the investigation of speed distributions: appropriate modeling of the speed distributions, and explaining the distribution characteristics.
According to the authors, regarding the latter problem, there is relatively less research that focuses on analyses explaining the speed distribution characteristics. The KL divergence approach is proposed as a clustering measure for ranking the road sections on a bus route. The accelerating behavior of the driver is studied in [20], where the KL divergence approach is used with Kernel Density Estimation in a convergence examination algorithm. The issue in the algorithm is to find the amount of data required to obtain a convergent distribution of velocity. In [21], a technique for identifying anomalies in flow distribution probabilities is based on k-nearest neighbors, and the KL divergence approach is adapted to obtain the distances between distributions. The spatial and temporal traffic flow information is considered, and the proposed outlier detection framework identifies the outliers for flow distributions. Although such examples could be expanded according to the traffic literature, the real speed distribution characteristics have not been examined through uniform and Gaussian prior distributions within the KL divergence approach in a way that reveals the similarities between the divergence trends based on them; hence this study aims to fill this gap. To this end, this study examines the speed distribution characteristics of the highway in the city of İstanbul, Turkey, and points to the divergence from randomness in the traffic flow.
To begin with, this study is based on the analogy between molecules moving in a closed system and vehicles in highway traffic flow. Here, it is considered that the traffic flow system is dynamic and that steady-state flow is present. It is also assumed that the kinetic energies of the vehicles are conserved. In brief, the maximum entropy method of statistical mechanics is first tested in the study in order to reflect the speed distribution characteristics. The speed distributions are obtained analytically on the basis of the maximum entropy principle, and the uncertainty parameter (beta) is identified with respect to the Gaussian distribution. However, it is inferred that this seems insufficient for comparing the vehicular speeds, since their characteristics could not be distinguished through the pdfs. Hence, a different approach is required. In order to find the characteristics of the speed distributions and compare them, the KL divergence approach is utilized. To apply the KL divergence approach, two reference (prior) distributions are considered. For the clarity of the comparisons, one is taken as the uniform distribution, which represents the steady distribution of vehicle speeds, and the other is taken as the Gaussian distribution. The KL divergence approach computes the deviations of the speed distributions from both the uniform and the Gaussian distributions, and then compares them. A set of distinctive inferences is drawn as a result of employing the reference distributions in the KL divergence analyses.

Method
Entropy is basically a measure of the uncertainty and disorder in the system under scrutiny. Entropy also expresses ignorance about the subject that one considers. In this paper, two methods are utilized to examine the vehicular speed characteristics. These are the maximum entropy method of statistical mechanics and the KL divergence approach, which are linked with the speed distributions and the divergences of the distributions, respectively. They are briefly outlined as follows. Shannon [22] made a connection between the information/uncertainty about an event and its probability distribution, and elaborated the concept of entropy in the context of the theory of communication. Once complete information is provided, the entropy is zero [23]. In this study, the maximum entropy principle is utilized in the framework of vehicular speeds. Accordingly, in order to understand the vehicular speed characteristics, the author examines the properties of the speed distributions of the molecules (vehicles) instead of dealing with each individual speed.

Maximum entropy method
The basic idea behind the maximum entropy method is the maximization of the Shannon entropy, Eq. (1), under a set of constraints. Conventionally, the constraints employed in the maximization of the entropy are given as in Eqs. (2) and (3).
\[ S = -k \sum_i p_i \ln p_i \tag{1} \]
\[ \sum_i p_i = 1 \tag{2} \]
\[ \sum_i p_i \varepsilon_i = E \tag{3} \]
where $S$ is the entropy, $k$ is a positive constant, $p_i$ is the probability of occurrence of the $i$th state, $\varepsilon_i$ is the state energy, and $E$ is the mean energy.
Equations (2) and (3) express the normalization of the probabilities and the conservation of the mean energy, respectively.
For equal probabilities, the entropy is defined as in Boltzmann's celebrated formula $S = k \ln W$, where $W$ designates the number of discrete states, and $p_i = 1/W$. There is a connection between the entropy of a system and the probability of its state [24]. One of the properties of this entropy is additivity, i.e., the total entropy is the sum of the entropies of the systems. For instance, the entropies of two independent systems could be denoted $S_{BG}(A)$ and $S_{BG}(B)$, respectively. Then, the total entropy is defined as $S_{BG}(A+B) = S_{BG}(A) + S_{BG}(B)$; note that this is applicable within the Boltzmann-Gibbs thermostatistics domain [25].
In the thermodynamic equilibrium state, the Shannon entropy is maximum under the given constraints [26]. The method of Lagrange multipliers is utilized in the maximization of the entropy function. The maximization of the entropy for the vehicular speed problem is given in detail under Sect. 3.
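The two properties mentioned above, the value ln W for equal probabilities and additivity over independent systems, can be checked numerically. The following is a minimal sketch, assuming k = 1 and natural logarithms; the function name `shannon_entropy` and the example probabilities are hypothetical and are not taken from the study.

```python
import math

def shannon_entropy(probs, k=1.0):
    """Return S = -k * sum(p_i * ln p_i); zero-probability states contribute 0."""
    return -k * sum(p * math.log(p) for p in probs if p > 0.0)

# Equal probabilities over W states: the entropy reduces to k * ln W.
W = 4
equal = [1.0 / W] * W
peaked = [0.7, 0.1, 0.1, 0.1]  # hypothetical, less uncertain distribution

# Two hypothetical independent systems A and B and their joint distribution.
system_a = [0.5, 0.5]
system_b = [0.2, 0.3, 0.5]
joint = [pa * pb for pa in system_a for pb in system_b]

print(shannon_entropy(equal), math.log(W))
print(shannon_entropy(peaked))
print(shannon_entropy(joint), shannon_entropy(system_a) + shannon_entropy(system_b))
```

With only the normalization constraint, the equal-probability case gives the largest entropy, which is why any additional constraint (such as a mean energy) is what shapes the resulting distribution.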

Kullback-Leibler (KL) divergence approach
Let $p(x)$ and $q(x)$ be two probability distributions [23] over a discrete random variable. The KL divergence between $p(x)$ and $q(x)$ is defined as in Eq. (4).
\[ D_{KL}(p \,\|\, q) = \sum_x p(x) \ln \frac{p(x)}{q(x)} \tag{4} \]
where $p(x)$ is the posterior distribution and $q(x)$ is the prior distribution.
KL divergence [27] is an information-based measure of the dissimilarity between the given probability distributions $p(x)$ and $q(x)$, and it indicates the divergence between the two distributions. However, it is not a metric.
The fundamental properties of KL divergence can be listed as below [23].
\[ D_{KL}(p \,\|\, q) \geq 0, \quad \text{with equality if and only if } p = q \]
\[ D_{KL}(p \,\|\, q) \neq D_{KL}(q \,\|\, p) \quad \text{in general} \]
and thus KL divergence is not a true distance.
In this paper, KL divergence approach is utilized for calculating the divergences between the real vehicular speed probability distributions and the selected reference distributions as displayed in Sect. 4.
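As an illustration of the definition and of why the KL divergence is not a true distance, the following minimal sketch computes the divergence in both directions for two small distributions. It assumes natural logarithms (the base is an assumption here), and the function name `kl_divergence` and the example probabilities are hypothetical.

```python
import math

def kl_divergence(p, q):
    """D_KL(p || q) = sum_x p(x) * ln(p(x) / q(x)), in nats."""
    total = 0.0
    for px, qx in zip(p, q):
        if px == 0.0:
            continue          # the term 0 * ln(0 / q) is taken as 0
        if qx == 0.0:
            return math.inf   # p puts mass where the prior has none
        total += px * math.log(px / qx)
    return total

posterior = [0.5, 0.3, 0.2]   # hypothetical observed probabilities
prior = [1 / 3, 1 / 3, 1 / 3]  # hypothetical reference probabilities

forward = kl_divergence(posterior, prior)
backward = kl_divergence(prior, posterior)
print(forward, backward)                    # non-negative and not equal to each other
print(kl_divergence(posterior, posterior))  # zero when the two distributions coincide
```

Because the two directions differ, the choice of which distribution plays the role of the prior matters for the reported values.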

Revisiting the maximum entropy method for the vehicular speed problem
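Vehicular speed data are in fact discrete. The author intends to adapt these discrete data to the continuous case. For this purpose, distinct from the conventional expression of the mean energy constraint, the new constraint is defined over the mean of speed squares. This assumption also leads to the identification of beta, Eq. (12). Accordingly, the entropy $S = -k \sum_i p_i \ln p_i$ is maximized subject to the normalization and the mean of speed squares constraints, denoted in Eq. (5) and Eq. (6), respectively.
\[ \sum_i p_i = 1 \tag{5} \]
\[ \sum_i p_i v_i^2 = \langle v^2 \rangle \tag{6} \]
where $v_i$ is the speed of the state $i$, $p_i$ is the probability of the state $i$, and $\langle v^2 \rangle$ is the mean of speed squares.
Let us define a function $F(p_i, \alpha, \beta)$ covering the entropy and the aforementioned constraints for the vehicular speed problem, by Eq. (7).
\[ F(p_i, \alpha, \beta) = -k \sum_i p_i \ln p_i - \alpha \left( \sum_i p_i - 1 \right) - \beta \left( \sum_i p_i v_i^2 - \langle v^2 \rangle \right) \tag{7} \]
Let us differentiate $F(p_i, \alpha, \beta)$ with respect to the variable $p_i$ and set the result to zero, as indicated in Eq. (8).
\[ \frac{\partial F}{\partial p_i} = -k \left( \ln p_i + 1 \right) - \alpha - \beta v_i^2 = 0 \tag{8} \]
where $\alpha$ and $\beta$ are the Lagrange undetermined multipliers.
When Eq. (8) is solved for $p_i$, with the constant $k$ absorbed into the multipliers, the formulation is given below, where $1/Z$ is the normalization constant, which is defined in terms of $Z$ in Eq. (10).
\[ p_i = \frac{1}{Z} e^{-\beta v_i^2} \tag{9} \]
\[ Z = \sum_i e^{-\beta v_i^2} \tag{10} \]
where $\beta$ is the inverse temperature in the thermodynamic sense. The Gaussian probability distribution is given below with regard to the speed variable $v$, where $\bar{v}$ is the mean speed with $\bar{v} = 0$, $C$ is the constant, and $\sigma^2$ is the variance.
\[ p(v) = C \, e^{-(v - \bar{v})^2 / (2\sigma^2)} \tag{11} \]
In order to identify $\beta$, Eqs. (9) and (11) are considered. As a result, $\beta$ is determined as
\[ \beta = \frac{1}{2\sigma^2} \tag{12} \]
Hence, the probability distribution for the vehicular speed data is expressed as below.
\[ p_i = \frac{1}{Z} e^{-v_i^2 / (2\sigma^2)} \tag{13} \]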
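The identification of beta can also be checked numerically. The sketch below is an illustration rather than the author's procedure: it assumes that the maximum entropy solution has the form p_i = exp(-beta v_i^2)/Z, that the speeds are scaled so that the mean speed is zero, and that the target mean of speed squares equals 1 (so that sigma^2 = 1). The function names, the speed grid and the bisection bounds are hypothetical.

```python
import math

def maxent_probs(speeds, beta):
    """p_i = exp(-beta * v_i**2) / Z, with Z = sum_i exp(-beta * v_i**2)."""
    weights = [math.exp(-beta * v * v) for v in speeds]
    z = sum(weights)
    return [w / z for w in weights]

def mean_square(speeds, probs):
    """Mean of speed squares, sum_i p_i * v_i**2."""
    return sum(p * v * v for p, v in zip(probs, speeds))

def solve_beta(speeds, target, lo=0.0, hi=50.0, iterations=100):
    """Bisection on beta; the mean of speed squares decreases as beta grows."""
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if mean_square(speeds, maxent_probs(speeds, mid)) > target:
            lo = mid  # distribution still too wide, so beta must increase
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Hypothetical grid of scaled speeds centred on a mean speed of zero.
speeds = [0.05 * i for i in range(-120, 121)]
beta = solve_beta(speeds, target=1.0)
probs = maxent_probs(speeds, beta)
print(beta)  # close to 1 / (2 * sigma**2) = 0.5 for sigma**2 = 1
```

On this fine, symmetric grid the multiplier found by enforcing the mean of speed squares constraint agrees closely with the value obtained by matching the exponent to the Gaussian form.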

Data processing and analyses
Traffic studies may require different kinds of data; in this respect, one may consult the references, e.g., [28,29], for some descriptions and applications. Concerning the data usage in more recent studies, low-frequency probe vehicle data are considered, e.g., in [30], to identify the network-wide turn-level intersection congestion. In another study [31], the complementary usage of traffic counter and probe car data is taken into account in estimating the path travel time. Besides, in car following models, trajectory data sets are considered, e.g., when investigating the impact of signal status on headway in [32], and when generating a Markov chain and training a neural network in [33]. For the purpose of this study, only vehicle speed data measured at a fixed point on the road are needed. Hence, for the analysis of vehicle speeds, one month of speed data based on 24-h measurements obtained from the selected surveillance point at the İstanbul highway is considered. The data were measured at two-minute intervals and correspond to average values. For this study, the speed data of three lanes (left lane, middle lane, right lane) in one direction are separated into two groups, each covering fifteen days. These six different data sets are then employed, as denoted in Table 1.
After determining the data sets, the inconsistent speed values (null values and zero values) were identified and discarded from each data set. The outliers were visualized by means of a box plot (Fig. 1). It is easily noticed that they are more pronounced in Data set #4.
However, some of them are not omitted, since it is surmised that they stem from the nature of the speeds (e.g., interactions). Only Data set #4 involves very low values such as 1 and 2 km/h, which would be potential outliers. On examining all the data sets, the values less than 12 km/h were discarded so that the data sets are consistent with each other.
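The cleaning steps described above can be sketched as a single filter. This is only an illustration under the stated rules (null and zero values removed, values below 12 km/h removed); the function name `clean_speeds`, the representation of null readings as `None` or NaN, and the sample values are hypothetical.

```python
import math

def clean_speeds(raw_speeds, min_speed_kmh=12.0):
    """Drop null readings, zero values and speeds below the threshold (km/h)."""
    cleaned = []
    for value in raw_speeds:
        if value is None or math.isnan(value):
            continue  # null reading
        if value < min_speed_kmh:
            continue  # zero values and very low speeds such as 1 or 2 km/h
        cleaned.append(float(value))
    return cleaned

# Hypothetical two-minute average speeds, not taken from the study's data.
raw = [None, 0, 1, 2, 11.9, 12, 85.5, float("nan"), 104]
cleaned = clean_speeds(raw)
print(cleaned)
```

Applying the same threshold to every data set keeps the lower end of the histograms comparable, while higher values that may reflect vehicle interactions are retained.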
As expected, the dynamics of the vehicular speeds vary from one data set to another, and in order to compare them, the maximum entropy method of statistical mechanics is tested first. The histograms of the speeds extracted from the six considered data sets of traffic flow are plotted in Fig. 2.
The traffic flow involves non-random correlations and long-term memory, and these could be associated with the speeds. According to the speed data sets, and as the author expected, the probabilities do not follow an inverse relationship with the energies (speeds), unlike the case in Boltzmann-Gibbs thermostatistics [34].
It is evident that the data do not comply with the fitted plots of a selected standard Gaussian distribution, even if it is scaled for each obtained histogram. Further, it is unclear how far the distributions differ from each other. Thus, this situation does not allow a reliable interpretation for making a quantitative comparison of the speed characteristics. As a result, this method, i.e., the maximum entropy method of statistical mechanics, seems insufficient for comparing the vehicular speed characteristics. For this reason, the paper puts forward the KL divergence approach, which provides divergences that serve as quantitative indicators for the comparison of the distribution characteristics. In this approach, two common distributions, i.e., uniform and Gaussian, are selected as references. This study also asks whether there is a similarity between the trends of the divergences from both reference distributions across the data sets. Let us analyze the speed data sets through the KL divergence approach in the following section.

Exerting KL divergence approach
As mentioned, the distribution function given by Eq. (13) cannot describe the fluctuations in the real vehicular speeds sufficiently, due to loss of information. That is, the analytical solution of this function cannot handle the analysis of the speed data sets efficiently. The distribution function may in fact suit the mean speed region of the traffic flow well. However, the real traffic data are time-dependent. Although some data sets appear close to each other at first glance on the plots (Fig. 2), they have different dynamical behavior. The differences among the dynamical behaviors of the speed data sets are outlined through the KL divergence approach. It is expected that the KL divergence values vary for each data set. Notice that despite this variation, the flow does not exhibit nonequilibrium, i.e., dynamic equilibrium is present overall, and thus the system does not break down. The following sections present the results of the KL divergence analyses in the light of the two prior distributions.

The use of Gaussian prior distribution for KL divergence analysis: Gaussian case
The fact that the distribution of vehicle speeds could not be described with a Gaussian distribution stems most particularly from the tail region. Besides, the vehicular speed distributions have inherently different internal dynamics in each data set. In this section, the KL divergence technique therefore compares the dynamics of the speed distributions against a Gaussian reference distribution. This allows the deviation from Gaussianity to be obtained. To be comparable, the KL divergences are calculated for the six data sets based on a standard Gaussian prior distribution by Eq. (4). If the observed probabilities are considerably different from the probabilities of the Gaussian prior distribution, the related regions of the histograms could not be characterized with the reference Gaussian distribution. The KL divergences are shown in Fig. 3.
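One possible way to carry out such a calculation on a histogram is sketched below. The study does not state how the Gaussian prior was discretized, so the following choices are assumptions made for illustration: the speeds are standardized to z-scores, the prior probability of each bin is the standard Gaussian mass in that bin (renormalized over the covered range), and natural logarithms are used. All function names, bin edges and sample values are hypothetical.

```python
import math

def normal_cdf(z):
    """Cumulative distribution function of the standard Gaussian."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def histogram_probs(values, edges):
    """Relative frequencies over bins [edges[i], edges[i+1]); values outside are ignored."""
    counts = [0] * (len(edges) - 1)
    for x in values:
        for i in range(len(counts)):
            if edges[i] <= x < edges[i + 1]:
                counts[i] += 1
                break
    total = sum(counts)
    return [c / total for c in counts]

def gaussian_prior_probs(edges):
    """Standard Gaussian mass per bin, renormalized over the covered range."""
    mass = [normal_cdf(b) - normal_cdf(a) for a, b in zip(edges, edges[1:])]
    total = sum(mass)
    return [m / total for m in mass]

def kl_divergence(p, q):
    """D_KL(p || q) in nats; empty observed bins contribute 0."""
    return sum(px * math.log(px / qx) for px, qx in zip(p, q) if px > 0.0)

# Hypothetical standardized speeds (z-scores), not taken from the study's data.
z_speeds = [-2.4, -1.1, -0.6, -0.3, -0.2, 0.0, 0.1, 0.2, 0.4, 0.5, 0.7, 1.3]
edges = [-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0]

p = histogram_probs(z_speeds, edges)
q = gaussian_prior_probs(edges)
print(kl_divergence(p, q))
```

Because the prior probabilities are small in the outer bins, an observed probability located there is weighted by a large logarithmic ratio, which is consistent with the sensitivity of the Gaussian case to the tail region.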
It is also inferred from the Gaussian case that, when the D KL values tend to zero, the stability of the system rises and the influence of the tail region becomes negligible. In this respect, one would state that the system is steady-state and time-independent. This corresponds to Gaussian-like dynamics and results in near-zero values of the KL divergences; Fig. 5 shows how the divergences are obtained. As a result, the KL divergence analysis based on the Gaussian prior distribution could suffice to describe the tail effects of the real data. Thus, large deviations from Gaussianity in the tail region could inform us about the stability and indicate the time-dependency of the traffic flow.

The use of uniform prior distribution for KL divergence analysis: uniform case
The uniform distribution is also selected as a reference since it assigns fixed probabilities to the given speed intervals and is easily comparable; hence it is meaningful to select uniformity as a prior distribution. The uniform prior can be employed to indicate the divergences of the speed distributions from uniformly distributed probabilities. This also shows the degree of divergence from the evenness of the distribution, as well as the uncertainty about the event. Another reason to select the uniform prior is to inquire whether there is a similarity between the trends of the KL divergence values obtained through the Gaussian and uniform prior distributions. In order to understand the grounds of the divergences clearly, one should examine the speed histogram bins one by one, while considering Eq. (4). For the uniform case, the fluctuations in the D KL values (Fig. 4) mainly hinge on the probability values in the mean speed region (if the dominant probabilities do not emerge in the tails), because the larger divergences stem from there.
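For a uniform prior, Eq. (4) reduces to a closed form, which may help to see why the uniform case reflects the overall evenness of the histogram. The short derivation below assumes that the prior assigns probability 1/N to each of N histogram bins and that natural logarithms are used; the symbols N, u and H(p) (the Shannon entropy with k = 1) are introduced here for illustration.

```latex
D_{KL}(p \,\|\, u)
  = \sum_{i=1}^{N} p_i \ln \frac{p_i}{1/N}
  = \ln N \sum_{i=1}^{N} p_i + \sum_{i=1}^{N} p_i \ln p_i
  = \ln N - H(p),
\qquad
H(p) = -\sum_{i=1}^{N} p_i \ln p_i .
```

Under these assumptions, the divergence depends only on the set of bin probabilities and not on which bins hold them. It vanishes for a perfectly even histogram and reaches ln N when all the probability sits in a single bin.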
The trends of the D KL values obtained from both cases are also relevant for inferring the dynamics of the speeds in each data set, by crosschecking the D KL trends of the two cases. That is, should an apparent distinction occur between the trends of the divergences obtained from the two cases, one may suspect, e.g., influential peaks in the distribution, which is clarified in the following section. Furthermore, in the divergence diagrams (Fig. 5), the divergences of both cases show a similar trend in the mean speed region and its vicinity within each data set. It is also inferred that the influence of the tail region on the D KL values is more significant in the Gaussian case than in the uniform case.

Results and discussion
This study examines the speed distribution characteristics based on Gaussian and uniform prior distributions through the KL divergence approach. For the Gaussian case, significant probabilities in the tail region have much more influence on the D KL values than in the uniform case, since the prior probability is lower in this region. The results of both cases in the KL divergence approach allow the speed distribution dynamics of the data sets to be compared. In Fig. 5, the divergences obtained from both the Gaussian and the uniform distributions over the histograms are displayed. In the plots, the larger the deviation from zero, the more those regions diverge from the defined Gaussian or uniform probabilities. The divergences lie in different ranges for each data set, as seen on the vertical axes. In the tail region, near-zero values are prevalent for the uniform case, while higher values and fluctuations are more pronounced in the Gaussian case, and these are rather noticeable in Data set #4. As a common inference for both cases, a discernible variation in the values emerges in the vicinity of the mean speed. Please note that such a variation may be basically associated with the height of the distribution, and thus with the use of the specified prior distributions.
In this study, the proposed approach is tested in two ways. Firstly, next generation simulation (NGSIM) data are utilized. Considering five freeway lanes, 24-h speed data are extracted from the Interstate 80 (I-80) Freeway NGSIM data set [35]. For this study, the speed data of the individual freeway lanes are designated as Data set #7, Data set #8, Data set #9, Data set #10, and Data set #11, respectively. The zero values were identified and discarded from each data set. The data sets were scaled, and the histograms are depicted accordingly (Fig. 6). The analyses were repeated for the Gaussian and uniform cases by employing these data sets. The results show that the KL divergence trends obtained from the two cases agree once again (Figs. 8 and 9).
Secondly, four synthetic data sets (named Data set #12, Data set #13, Data set #14, and Data set #15) are generated so as to pursue the possible differentiation between the KL divergence trends of the two cases. Some extreme cases of distributions are also tested in this way. A heavy-tailed data set is generated at first (Data set #12). After employing this data set along with the real data sets, the similarity of the trends of the two cases disappears, considering a fast decline in the uniform case. Further, a sharp peak is placed in the left tail, then in the mean speed region, and then in both regions, which are seen in the histograms of Data sets #13, #14, and #15, respectively (Fig. 7). The analyses are performed again for the two cases. With regard to all the data sets, the KL divergence results are depicted in Figs. 8 and 9 for the Gaussian and uniform cases, respectively.
Fig. 4 KL divergence values for each data set based on uniform distribution
It is found that the trends do not remain similar after employing the synthetic data sets (Figs. 8 and 9). This finding answers the question raised earlier in the study, namely whether the trends of the two cases are similar. Thus, it is inferred that the dissimilarity between the trends may emerge under certain circumstances.
To sum up, the analyses are conducted through the real speed and synthetic data sets. In the light of this, it is tentatively suggested that the KL divergence trends obtained through uniform and Gaussian prior distributions are analogous; however, the trends would differ as the data distribution approaches uniformity or influential peak(s)/outliers occur in the distribution.
The following results could be derived from the analyses.
• Once a heavy-tailed distribution is processed, the D KL value obtained from the uniform case would show a conspicuous decline (Data set #12).
• If a peak is divided into more than one peak in the given distribution, the D KL value obtained from the uniform case would decrease. However, switching the locations of those peaks in the distribution does not affect the D KL value, due to the fixed prior probabilities of the uniform case; one may examine the histograms of Data set #14 and Data set #15.
• The influence of the tail region on the D KL values is rather noticeable in the Gaussian case; Data set #13 can be given as an extreme example.
• For the Gaussian case, the locations of the bin probabilities in the histograms are pivotal since they could affect the D KL values. In this respect, one may examine the D KL results of Data set #13 and Data set #15.
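The role of the bin locations in the two cases can be illustrated with a small numerical sketch. It is not a reproduction of the synthetic data sets of the study: the two histograms below are hypothetical and contain the same bin probabilities, with the dominant peak placed either near the mean or in the left tail. The Gaussian prior is assumed to be the standard Gaussian mass per bin, renormalized over the covered range, and natural logarithms are assumed.

```python
import math

def normal_cdf(z):
    """Cumulative distribution function of the standard Gaussian."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def gaussian_prior_probs(edges):
    """Standard Gaussian mass per bin, renormalized over the covered range."""
    mass = [normal_cdf(b) - normal_cdf(a) for a, b in zip(edges, edges[1:])]
    total = sum(mass)
    return [m / total for m in mass]

def kl_divergence(p, q):
    """D_KL(p || q) in nats; empty observed bins contribute 0."""
    return sum(px * math.log(px / qx) for px, qx in zip(p, q) if px > 0.0)

edges = [-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0]   # hypothetical scaled-speed bins
uniform_prior = [1.0 / 6.0] * 6
gauss_prior = gaussian_prior_probs(edges)

# Same bin probabilities, different locations of the dominant peak.
peak_near_mean = [0.02, 0.08, 0.60, 0.20, 0.08, 0.02]
peak_in_left_tail = [0.60, 0.08, 0.02, 0.20, 0.08, 0.02]

for name, hist in (("near mean", peak_near_mean), ("left tail", peak_in_left_tail)):
    print(name, kl_divergence(hist, uniform_prior), kl_divergence(hist, gauss_prior))
```

In this sketch the uniform-case value is the same for both histograms, since every bin has the same prior probability, whereas the Gaussian-case value rises sharply when the peak moves into the tail, where the prior probability is low.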

Conclusions
In this study, the vehicular speed characteristics of the highway are examined through the Kullback-Leibler (KL) divergence approach, addressing the insufficiency of the maximum entropy method of statistical mechanics. By utilizing two prior distributions, i.e., Gaussian and uniform, in this approach, the KL divergences are obtained. According to the divergences, the fluctuations among the speed distributions are revealed. Moreover, the results point to the non-random behavior of vehicular speeds. It is also found that the trends of the divergences based on the two distributions are analogous for the real speed data sets. Through the employment of the synthetic data sets, the emergence of the dissimilarity between the trends is exhibited. Another result of this approach is that the influence of the tail region on the divergences is notable in the Gaussian case. Consequently, the Gaussian case seems more sensitive in describing the dynamics of the tail region. Besides, the outcomes of the KL divergence analyses serve as a quantitative indicator for making a relevant comparison of the speed distributions and for better understanding the dynamics of the vehicular speeds. As a final remark, the proposed approach may be adapted to different multi-particle complex systems.

Compliance with ethical standards
Conflict of interest The author states that there is no conflict of interest.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.