1 Introduction

Establishing models that estimate the stochastic and dynamic transition of traffic regimes (DTTR) is important for predicting future traffic conditions and developing timely and effective countermeasures to address congestion. For example, when two major traffic regimes (free-flow and congested) are analyzed, the DTTR involves four transition phenomena: evolving from the free-flow to the congested regime (breakdown), staying in the congested regime, returning from the congested to the free-flow regime (recovery), and staying in the free-flow regime in the next observation period. Since time is a major factor in their occurrence, these four transition processes can be referred to as the dynamic transition of the traffic regimes.

The DTTR is complex in nature and is influenced by several factors, such as driver behavior, demand, vehicle mix, and weather conditions. Furthermore, the DTTR can vary greatly by day of the week and by lateral lane location on the same highway. Understanding the impact of these factors is useful for implementing advanced traffic management strategies such as variable speed limits, variable message signs, congestion pricing, and ramp metering to improve the efficiency of traffic operations [1, 2].

Among the DTTR phenomena, the breakdown process is well studied in the literature, and its theory has recently been introduced in roadway capacity estimation [3,4,5,6,7,8,9,10,11,12]. One major limitation of many previous investigations of the breakdown phenomenon is that they ignore the operational differences due to lateral lane locations on freeways. In the analysis, the traffic data of a multi-lane facility are usually aggregated and implicitly treated as one unit [1, 13]. The resulting model is also called a complete-pooled model [14, 15], which indicates that the operational characteristics are averaged across lanes. In practice, however, the operational characteristics of freeway segments may vary significantly across lanes [1, 13, 16], and this variation is sometimes influenced by operational policies. For instance, in urban areas, some states in the USA restrict heavy vehicles to lanes near the shoulder. Also, some states discourage drivers from using lanes near the median unless passing slow-moving vehicles. Moreover, the operational characteristics of the lanes near the shoulder can be more strongly influenced by weaving (merging onto the freeway and diverging to exit it) than those of lanes near the median [13, 17]. These factors introduce variations in the operating characteristics of a highway [18, 19]. Developing a model that does not take these characteristics into account and constrains the effect of influencing factors on the breakdown process to be the same across all lanes may lead to incorrect conclusions.

Recognizing that operations, and thus the breakdown process, vary across lanes on the freeway, some empirical studies evaluated individual lanes separately. One study compared the complete-pooled and the lane-based approaches for estimating the breakdown phenomenon on diverging sections [1]. The study shows that using the lane-based approach significantly improves the accuracy of the extracted breakdown flow rate, while the aggregated approach underestimates the breakdown flow rate. Another study evaluated individual lane breakdown behavior on merging freeway sections [16]. It also concludes that there is a significant difference in the breakdown phenomenon among lanes.

Separating data and developing a model for each group is also referred to as the no-pooled model [14]. One notable drawback of this model is that the operational characteristics of lanes are assumed to be independent, which implicitly suggests that the data for each lane come from completely different sources. Such a model assumes that the operational characteristics of one lane do not affect other lanes. However, this may not be the case in traffic operations. The breakdown usually starts in one lane, generally a lane near the shoulder, and then other lanes follow [20]. Consequently, dependence in operational characteristics, as well as some similarities across different lanes, exists. Instead of conducting a separate analysis for each lane, some studies have utilized the hierarchical (random-effect) model to estimate the breakdown phenomenon [13]. This type of model is also referred to as a partial-pooled model. It provides a trade-off between the properties of the complete-pooled and no-pooled models by accounting for both the between-group and within-group variations [15, 21]. The hierarchical model also recognizes the group similarities and integrates such information in the parameter estimates [14, 21]. Using the hierarchical Weibull model, the study in [13] indicates that there is a significant variation in operational characteristics across different lanes on the freeway. Further, the study suggests that aggregating data could potentially ignore the possibility of one lane being congested while the rest of the lanes on the same freeway segment are not (partial breakdown or semi-congested state).

In summary, despite the growing literature on the probabilistic characteristics of the breakdown process, the disparity effects on the other transition phenomena that describe the DTTR have not been quantified in the literature. This study attempts to fill the research gap by developing hierarchical regression models to calibrate the transition probabilities that describe the DTTR and quantify the associated variations due to different lateral lane locations and days of the week. The posterior distributions of the parameters of the proposed models are all estimated within the Bayesian framework to account for model and parameter uncertainties. Moreover, the transition phenomena that define the DTTR are identified on the basis of the number of traffic regimes, which are estimated using the Gaussian mixture model (GMM). This study uses one-year traffic data collected from a freeway facility located in Jacksonville, Florida. To the best of the authors’ knowledge, the approach herein has not been presented in the existing literature.

2 Study sites and data description

Data for the analysis were acquired from the Regional Integrated Transportation Information System (RITIS) database. For the purposes of this study, two detectors (Sites 1 and 2) for the southbound traffic on I-295 in Jacksonville, Florida, shown in Fig. 1a, were selected. At these sites, the posted speed limit is 65 miles per hour (mph). Site 1 has three lanes, while Site 2 has four. All lanes are standard 12 feet wide. The two sites consist of general purpose lanes with no managed lanes. Both sites are located just upstream of off-ramps and are prone to congestion, especially during peak hours. The traffic variables gathered for modeling were traffic volume and speed aggregated at 15-min intervals. These data were collected from March 1, 2015, to March 31, 2016, excluding weekends and public holidays.

Fig. 1 Detector locations on Google map (a) and 24-h profiles of the traffic flow parameters at the two sites (b)

Figure 1b shows the 24-h time series of the speed variable at Sites 1 and 2 for all data (one-year data) used in modeling. These figures reveal that both sites experience congestion only in the morning peak period, which runs from 6 a.m. to 9 a.m. Further assessment of the traffic speed in Fig. 1 shows that Site 1 has relatively lower speeds than Site 2. In the free-flow state, the densest region of the time series scatter plot lies between 59 and 68 mph for Site 1 and between 61 and 81 mph for Site 2.

In order to obtain enough data on the breakdown and other transition events for modeling, only the morning peak period data were evaluated in the current study. The approach of grouping data into intervals, particularly to isolate the peak period, and then developing a statistical model is consistent with previous studies [22,23,24,25]. A review of the traffic data during the selected peak period indicates that more than 7800 observations on each lane were used to develop the dynamic transition model for Site 2. For Site 1, on the other hand, fewer than 2400 observations were available to the authors for the period from March 1, 2015, to March 31, 2016. A possible reason is a detector malfunction. The descriptive statistics of the traffic data by lane for both sites are shown in Table 1.

Table 1 Descriptive statistics of flow parameters during the peak period

3 Speed thresholds for clustering traffic states

To identify the traffic regimes using the speed variable, the speed distribution of each lane was examined. It was found that the speed distributions at both sites have more than one subpopulation. The subpopulations of the speed distribution were clustered into homogeneous components using the finite GMM. The GMM provides a highly flexible framework for fitting various distribution shapes, including data with heterogeneous characteristics such as the traffic speed variable [25,26,27,28]. The GMM fitted to the speed data \(y\) can be represented as follows:

$$\begin{aligned} & f\left( y \right) = \mathop \sum \limits_{i = 1}^{n} w_{i} N_{i} \left( {y|\mu_{i} ,\sigma_{i}^{2} } \right), \\ & \left( {w_{1} , \ldots ,w_{n} } \right)\,\sim\,{\text{Dirichlet}}\left( {1, \ldots ,1} \right), \\ & \mu_{1} , \ldots ,\mu_{n} \,\sim\,N\left( {0,100^{2} } \right), \\ & \sigma_{1} , \ldots ,\sigma_{n} \,\sim\,{\text{HalfCauchy}}\left( {0, 10} \right), \\ \end{aligned}$$
(1)

where \(N_{i} \left( {y|\mu_{i} ,\sigma_{i}^{2} } \right)\) is the Gaussian distribution of component \(i\), \(\mu_{i}\) is the mean parameter of component \(i\), \(\sigma_{i}\) is the standard deviation of component \(i\), \(n\) is the total number of Gaussian distributions in the mixture model, and \(w_{i}\) is the mixing probability of component \(i\).
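As a minimal sketch of the mixture density in Eq. 1, the following evaluates \(f(y)\) for fixed parameter values. The function names and all numeric values are hypothetical illustrations, not estimates from this study, and the prior distributions of Eq. 1 are not part of the sketch.

```python
import math

def normal_pdf(y, mu, sigma):
    """Density of N(mu, sigma^2) evaluated at y."""
    z = (y - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def mixture_pdf(y, weights, means, sigmas):
    """Finite Gaussian mixture density f(y) = sum_i w_i N(y | mu_i, sigma_i^2)."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("mixing probabilities must sum to 1")
    return sum(w * normal_pdf(y, m, s) for w, m, s in zip(weights, means, sigmas))

# Hypothetical two-component example (congested, free-flow); speeds in mph
weights = [0.3, 0.7]
means = [45.0, 66.0]
sigmas = [8.0, 4.0]
density_at_60 = mixture_pdf(60.0, weights, means, sigmas)
```

Because the mixing probabilities sum to one, the mixture integrates to one like any single density.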

Two GMMs were developed with the PyMC3 package in the Python programming language to detect the speed thresholds for clustering traffic conditions in the Site 1 and Site 2 datasets. The GMM parameters were estimated using Markov chain Monte Carlo (MCMC) simulation with the No-U-Turn Sampler (NUTS) step. As indicated in Eq. 1, non-informative prior distributions were used in the model. The mixing probabilities were assumed to follow the Dirichlet distribution, as in [27, 28]. For the mean parameters, the prior distribution was the normal distribution with zero mean and standard deviation of 100, \(N\left( {0, 100^{2} } \right).\) Also, the standard deviation parameters in the model were assumed to follow the half-Cauchy distribution, \({\text{HalfCauchy}}\left( {0, 10} \right).\) In the analysis, a total of 10,000 iterations were sampled in each model, whereby the initial 5000 iterations were discarded as warm-up samples, while the last 5000 iterations were used for inference. Convergence was monitored using the Gelman–Rubin statistic and trace plots.
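For readers unfamiliar with the Gelman–Rubin statistic, the sketch below computes the classic (non-split) potential scale reduction factor from several equal-length chains of one parameter. The source does not state how many chains were run or which variant was reported, so the function name, the two-chain setup, and the toy samples are assumptions for illustration only.

```python
import math

def gelman_rubin(chains):
    """Classic potential scale reduction factor for equal-length chains."""
    m = len(chains)          # number of chains (at least 2)
    n = len(chains[0])       # draws per chain
    chain_means = [sum(c) / n for c in chains]
    grand_mean = sum(chain_means) / m
    # Within-chain variance W: average of the per-chain sample variances
    w = sum(
        sum((x - mu) ** 2 for x in c) / (n - 1)
        for c, mu in zip(chains, chain_means)
    ) / m
    # Between-chain variance B, scaled by the chain length
    b = n * sum((mu - grand_mean) ** 2 for mu in chain_means) / (m - 1)
    var_hat = (n - 1) / n * w + b / n
    return math.sqrt(var_hat / w)

# Toy samples: two chains exploring the same region vs. two chains stuck apart
mixed = [[1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 4.0]]
stuck = [[1.0, 2.0, 3.0, 4.0], [11.0, 12.0, 13.0, 14.0]]
rhat_mixed = gelman_rubin(mixed)
rhat_stuck = gelman_rubin(stuck)
```

Values close to 1 suggest that the chains agree, whereas values well above 1 indicate that the chains have not converged to a common distribution.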

To select the appropriate number of mixture components, the widely applicable information criterion (WAIC) was used in the analysis [29]. Findings from the analysis indicate that two mixture components were sufficient to approximate the speed distributions of all lanes in the Site 1 dataset. As presented in Fig. 2a, the two-component GMM predicts the field data distributions with reasonable accuracy. These mixture components can be referred to as the congested and free-flow regimes. Using the GMM estimates (mean and standard deviation), the speed thresholds were calculated, i.e., the speed values that minimize the classification error of data between the estimated components. This approach has been used before to calculate the speed thresholds that group speed data into different traffic regimes [25, 28, 30]. The results of the analysis reveal that the lane near the shoulder has the highest speed threshold (63.1 mph) compared to the middle lane (56.1 mph) and the lane near the median (59 mph). Visual inspection of the speed distributions in Fig. 2a suggests that the shoulder lane has comparatively higher speeds than the middle and median lanes. The calculated speed thresholds presented in Fig. 2a were further used for modeling the dynamic transition of traffic conditions.
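One way to compute such a threshold, sketched below, is to find the speed between the two component means at which the weighted component densities are equal, since assigning each observation to the component with the larger weighted density minimizes the classification error. Whether the mixing probabilities entered the authors' calculation is not stated in the text; the sketch accepts them as inputs, and equal weights reproduce the unweighted case. The function name and the numeric values are hypothetical.

```python
import math

def speed_threshold(w1, mu1, s1, w2, mu2, s2):
    """Speed between mu1 and mu2 where w1*N(mu1, s1^2) equals w2*N(mu2, s2^2)."""
    # Equating the log weighted densities gives a*y^2 + b*y + c = 0
    a = 1.0 / (2.0 * s2 ** 2) - 1.0 / (2.0 * s1 ** 2)
    b = mu1 / s1 ** 2 - mu2 / s2 ** 2
    c = (mu2 ** 2 / (2.0 * s2 ** 2) - mu1 ** 2 / (2.0 * s1 ** 2)
         + math.log((w1 * s2) / (w2 * s1)))
    if abs(a) < 1e-12:
        roots = [-c / b]          # equal variances: the equation is linear
    else:
        disc = b * b - 4.0 * a * c
        if disc < 0:
            raise ValueError("component densities do not cross")
        roots = [(-b + math.sqrt(disc)) / (2.0 * a),
                 (-b - math.sqrt(disc)) / (2.0 * a)]
    lo, hi = min(mu1, mu2), max(mu1, mu2)
    between = [r for r in roots if lo <= r <= hi]
    if not between:
        raise ValueError("no crossing point between the component means")
    return between[0]

# Hypothetical components with equal weights and spreads: threshold is the midpoint
threshold = speed_threshold(0.5, 45.0, 5.0, 0.5, 65.0, 5.0)
```

With unequal spreads or weights, the crossing point shifts toward the component with the smaller weighted density near the midpoint.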

Fig. 2 Speed thresholds for clustering traffic regimes. a Site 1 dataset. b Site 2 dataset

For the Site 2 dataset, three components were found to best estimate the data distributions for each lane, corresponding to the free-flow, congestion on-set/dissolution (COD) or transitional flow, and congested regimes. As seen in Fig. 2b, the expected posterior distributions approximate the field data distributions well. In contrast to Site 1, the modeling results suggest that, for the threshold separating the COD and congested regimes, the lane near the median has the highest value (56.2 mph), followed by the inner-left lane (55.7 mph) and the inner-right lane (55 mph), while the lane near the shoulder has the lowest value (51 mph). A similar pattern was seen in the thresholds that separate the COD and free-flow regimes. The estimated trend for the Site 2 dataset mirrors what was revealed in one of the previous studies [13].

4 Modeling the dynamic transition of the traffic regimes

To analyze the dynamic transition of the traffic regimes estimated by the GMM, two Markov chain (MC) models were developed. The first model was the two-regime MC regression for the Site 1 dataset, and the second model was the three-regime MC regression for the Site 2 dataset. The two MC regressions are discussed in the following subsections.

4.1 Two-regime MC model

Suppose that the traffic states are observed as a sequence of finite regimes at a discrete time interval \(t\) (\(t = 15\,{\text{min}}\)). The first-order MC model that probabilistically describes the transition of regimes is presented in Eq. 2. Note that the transition probabilities of this model are fitted with the explanatory variable \(x_{t}\) (flow rate at current time \(t\)) to account for variations or heterogeneity associated with the time-varying effect [24, 25]. The resulting transition probabilities are non-stationary: they vary as time progresses, depending on the currently observed flow rate.

$$p_{ij} \left( {x_{t} } \right) = {\text{Prob}}(S_{t + 1} = S_{j}^{{\prime }} |S_{t} = S_{i} , X_{t} = x_{t} ),$$
(2)

where \(p_{ij}\) is the probability of evolving from traffic regime i to j, \({\text{Prob}}(\,)\) is the probability function, \(S_{t}\) is the current observed traffic regime, \(S_{t + 1}\) is the next traffic regime, and \(S_{j}^{'}\) is the future estimated traffic regime.

Basically, the two-regime MC regression is defined by the four transition processes, which can be summarized in a matrix format as follows:

$$\varvec{P} = \left[ {\begin{array}{*{20}c} {p_{\text{ff}} } & {p_{\text{fc}} } \\ {p_{\text{cf}} } & {p_{\text{cc}} } \\ \end{array} } \right],$$
(3)

where the sum of each row equals 1, \(p_{\text{ff}}\) is the probability of staying in the free-flow regime, \(p_{\text{fc}}\) is the probability of evolving from the free-flow to the congested regime (breakdown probability), \(p_{\text{cf}}\) is the probability of evolving from the congested to the free-flow regime (recovery probability), and \(p_{\text{cc}}\) is the probability of staying in the congested regime.
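To make the structure of Eq. 3 concrete, the sketch below classifies a short, hypothetical speed series with a single threshold and counts the observed transitions. This yields a stationary, flow-independent version of the matrix and is therefore only a simplified illustration: the transition probabilities in this study depend on the flow rate through the regressions described next. The function names, the threshold, and the speeds are assumptions.

```python
def classify(speed, threshold):
    """Label an observation 'c' (congested) below the threshold, else 'f' (free-flow)."""
    return "c" if speed < threshold else "f"

def transition_matrix(regimes, states=("f", "c")):
    """Row-normalized transition counts between consecutive observations."""
    counts = {i: {j: 0 for j in states} for i in states}
    for current, nxt in zip(regimes, regimes[1:]):
        counts[current][nxt] += 1
    matrix = {}
    for i in states:
        total = sum(counts[i].values())
        matrix[i] = {j: (counts[i][j] / total if total else 0.0) for j in states}
    return matrix

# Hypothetical consecutive 15-min speeds (mph) within one peak period
speeds = [66, 64, 58, 52, 50, 61, 65, 67]
regimes = [classify(v, 59.0) for v in speeds]
P = transition_matrix(regimes)
# P["f"]["c"] is the empirical breakdown share, P["c"]["f"] the recovery share
```

In practice, transitions should only be counted between consecutive intervals of the same day and lane, which this toy series assumes.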

The traffic regimes estimated by the GMM are categorical in nature (e.g., the free-flow and congested regimes). There are two commonly used regressions for evaluating the influencing factors for categorical response variables: probit and logistic regression. We selected the logistic regression model in the analysis because its results can be easily interpreted using the odds ratio. To investigate the disparity effects associated with different lateral lane locations and days of the week (i.e., Monday through Friday) in the DTTR, binary hierarchical logistic regressions were applied to estimate the transition probabilities presented in Eq. 3. In the analysis, traffic data are assumed to be nested within lanes and days of the week. In this case, data within the same group are hypothesized to be correlated [14, 21, 31]. Suppose that a freeway has L lanes and m vehicles observed in each lane in each day (m = 1, …, M, where M is the total number of vehicles on the freeway). The transition process of the traffic regime \(R_{ij}\) can be predicted as follows:

$$\begin{aligned} & R_{ij} \sim{\text{Bernoulli}}\left({p_{ij} \left({x_{t}} \right)} \right), \\ & p_{ij} \left({x_{t}} \right) = \frac{1}{{1 + \exp \left({- \eta_{m}} \right)}}, \\ & \eta_{m} = \alpha_{0l} + \alpha_{1} x_{mt} + \epsilon_{k}, \\ \end{aligned}$$
(4)

where \(\alpha_{0l}\) is the random intercept associated with the lane lateral location, with the lane ordinal number \(l = 1, \ldots , L; \alpha_{1}\) represents the flow rate parameter; and \(\epsilon_{k}\) is the random effect associated with the day of the week, k = 1,…,5.
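The linear predictor and inverse-logit link in Eq. 4 can be sketched as follows for fixed parameter values. All numbers, lane names, and day labels are hypothetical, and the covariate is assumed here to be the log-transformed flow rate; in the study the parameters are posterior distributions rather than point values.

```python
import math

def transition_probability(x, lane, day, lane_intercepts, flow_coef, day_effects):
    """p_ij(x) from Eq. 4: inverse logit of alpha_0l + alpha_1 * x + epsilon_k."""
    eta = lane_intercepts[lane] + flow_coef * x + day_effects[day]
    return 1.0 / (1.0 + math.exp(-eta))

# Hypothetical parameter values for a breakdown (free-flow to congested) model
lane_intercepts = {"shoulder": -20.0, "middle": -23.0, "median": -23.0}
flow_coef = 3.0
day_effects = {"Mon": 0.1, "Tue": 0.0, "Wed": -0.1, "Thu": 0.0, "Fri": 0.05}

x = math.log(2000.0)  # log of flow rate in veh/h/lane (assumed covariate scale)
p_shoulder = transition_probability(x, "shoulder", "Mon", lane_intercepts, flow_coef, day_effects)
p_median = transition_probability(x, "median", "Mon", lane_intercepts, flow_coef, day_effects)
```

At the same flow rate, a larger lane intercept translates directly into a higher transition probability, which is how the model expresses lane-to-lane disparity.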

4.2 Three-regime MC model

In calibrating the transition probabilities for the Site 2 dataset, the dynamic transition was assumed to occur in a sequential manner: free-flow to congestion on-set, then to the congested regime, and congested regime to congestion dissolution, then to the free-flow regime. The congestion dissolution and congestion on-set are assumed to have similar characteristics and thus are considered as one regime in the current study. Based on the three-phase theory by Kerner et al. [32], which indicates that there is no direct transition between the congested and free-flow regimes, direct transitions from the free-flow to the congested regime and from the congested to the free-flow regime are ignored in the current study. As a result, the transition probabilities for these processes were set to zero in the matrix (Eq. 5).

$$\varvec{\pi}= \left[ {\begin{array}{*{20}c} {\pi_{\text{ff}} } & {\pi_{\text{fo}} } & 0 \\ {\pi_{\text{of}} } & {\pi_{\text{oo}} } & {\pi_{\text{oc}} } \\ 0 & {\pi_{\text{co}} } & {\pi_{\text{cc}} } \\ \end{array} } \right],$$
(5)

where the sum of each row equals 1, \(\pi_{\text{ff}}\) is the probability of staying in the free-flow regime, \(\pi_{\text{fo}}\) is the probability of evolving from the free-flow to the COD regime, \(\pi_{\text{of}}\) is the probability of evolving from the COD to the free-flow regime, \(\pi_{\text{oo}}\) is the probability of staying in the COD regime, \(\pi_{\text{oc}}\) is the probability of evolving from the COD to the congested regime, \(\pi_{\text{co}}\) is the probability of evolving from the congested to the COD regime, and \(\pi_{\text{cc}}\) is the probability of staying in the congested regime.

As indicated in Eq. 5, the first and third rows each have two nonzero elements, which indicates that each of these rows contains two dependent transition processes. These transitions were calibrated using binary logistic random-effect regressions similar to those fitted for Eq. 3. In contrast, the transition processes in the second row, which include COD to free-flow, staying in the COD regime, and COD to the congested regime, were calibrated using the multinomial logistic random-effect regression (Eq. 6).

$$\begin{aligned} & R_{ij } \sim {\text{Multinomial}}\left( {\pi_{ij} \left( {x_{t} } \right)} \right), \\ & \pi_{ij} \left( {x_{t} } \right) = {\text{Prob}}\left( {R_{ij } = v} \right) = \frac{{{ \exp }\left( {\lambda_{mv} } \right)}}{{\mathop \sum \nolimits_{v = 1}^{V} { \exp }\left( {\lambda_{mv} } \right)}}, \\ & \lambda_{mv} = \beta_{0lv} + \beta_{1v} x_{mvt} + \varepsilon_{kv} , \\ \end{aligned}$$
(6)

where \(\pi_{ij}\) is the probability of evolving from regime i to j, \(\beta_{0lv}\) is the random intercept for the transition process \(v\), \(\beta_{1v}\) represents the flow rate parameter for the transition process \(v\), and \(\varepsilon_{kv}\) is the random-effect term for the transition process \(v\).
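The softmax form of Eq. 6 for the COD row of the transition matrix can be sketched as below. The sketch fixes the linear predictor of a base outcome at zero, a common identification convention, and uses staying in the COD regime as that base. The flow coefficients follow the values quoted in the results for this model, while the intercepts, lane names, and day labels are hypothetical and the covariate is assumed to be the log-transformed flow rate.

```python
import math

def cod_row_probabilities(x, lane, day, params, base="oo"):
    """Softmax over the outcomes of the COD row; the base outcome has predictor 0."""
    linear = {base: 0.0}
    for outcome, (intercepts, coef, day_effects) in params.items():
        linear[outcome] = intercepts[lane] + coef * x + day_effects[day]
    peak = max(linear.values())  # subtract the maximum for numerical stability
    expo = {v: math.exp(val - peak) for v, val in linear.items()}
    total = sum(expo.values())
    return {v: e / total for v, e in expo.items()}

# Hypothetical intercepts and day effects; "of" = COD to free-flow, "oc" = COD to congested
no_day_effect = {"Mon": 0.0}
params = {
    "of": ({"shoulder": 17.0}, -2.39, no_day_effect),
    "oc": ({"shoulder": -7.0}, 0.82, no_day_effect),
}
row_low = cod_row_probabilities(math.log(1000.0), "shoulder", "Mon", params)
row_high = cod_row_probabilities(math.log(2000.0), "shoulder", "Mon", params)
```

Each call returns the three probabilities of the second row of Eq. 5, which sum to one by construction.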

4.3 Parameter estimation for the two- and three-regime MC regressions

The NUTS step in the MCMC simulation implemented in the PyMC3 package was also used to calibrate the posterior distributions of the model parameters in Eqs. 4 and 6. Bayesian analysis requires the prior distributions of the model parameters to be specified before the simulations. Figure 3 shows the prior distributions selected for use in the multilevel logistic and multinomial logistic regressions, respectively. As shown in Fig. 3, the prior distributions for the random intercepts in both the two- and three-regime MC models were assigned to follow the normal distribution with mean \(\mu_{1}\) and standard deviation \(\sigma_{1}\), that is, \(\alpha_{0l} \;{\text{and}}\;\beta_{0lv} \,\sim\, N\left( {\mu_{1} ,\sigma_{1}^{2} } \right)\). To borrow strength across groups and smooth the group-level parameter estimates, the hyper-parameters were shared by all intercept coefficients [21, 31]. The advantage of assigning this type of hyper-parameter is that the resulting model gains the advantages of both a complete-pooled model and a no-pooled model [31]. The hyper-parameter priors (hyper-priors) were also assigned non-informative prior distributions. For \(\mu_{1}\), a normal distribution with mean zero and standard deviation of 100 was specified, \(\mu_{1} \,\sim\,N\left( {0, 100^{2} } \right)\), while for the \(\sigma_{1}\) hyper-parameter, the half-Cauchy distribution, \(\sigma_{1} \,\sim\,{\text{HalfCauchy}}\left( {0, 10} \right)\), was used. Note that the hyper-parameter \(\sigma_{1}\) was used to quantify the disparity effect due to lateral lane location. For the flow model coefficients, the prior distributions were assigned the normal distribution with mean zero and standard deviation of 100, \(\alpha_{1} \;{\text{and}}\;\beta_{1v} \,\sim\,N\left( {0, 100^{2} } \right).\) Furthermore, the prior distribution for the random-effect parameters \({\epsilon}_{k}\) and \({\varepsilon}_{kv}\) associated with the days of the week was specified to follow the normal distribution with mean \(\mu_{2}\) and standard deviation \(\sigma_{2}\), whereby \(\mu_{2} \,\sim\,N\left( {0, 100^{2} } \right)\) and \(\sigma_{2} \,\sim\,{\text{HalfCauchy}}\left( {0, 10} \right).\) Also, the parameter \(\sigma_{2}\) was used to calculate the variations associated with days of the week.

Fig. 3 Hierarchical structure of the multilevel regressions. a Logistic regression. b Multinomial logistic regression

5 Results

Similar to the GMM parameter estimation, 10,000 iterations were found adequate in estimating the posterior distributions of the regression’s parameters (binary and multinomial logit). Also, the initial 5000 iterations were discarded and the last 5000 iterations were used for inference. The results of the estimated regressions are presented in Tables 2 and 3. In these tables, summaries of the posterior distributions—mean, standard deviation, and the 95% posterior credible intervals (CIs) of each parameter—are reported. The next subsections discuss the results of the analysis, starting with the results discussion of Site 1 followed by Site 2 and concluding the section by discussing the disparity effects associated with factors such as lane lateral locations and days of the week.

Table 2 Parameters posterior distributions summaries for Site 1 models
Table 3 Parameters posterior distributions summaries for Site 2 models

5.1 Results of regression models for Site 1

Two regression models were fitted to calibrate the transition probabilities of the breakdown and the stay in the congested regime processes. As presented in Table 2, the coefficient of the logarithm of the flow rate has a positive sign, which indicates that the probability of traffic breakdown increases as the flow rate increases. The estimate of this coefficient suggests that a 1% increase in the log-transformed flow rate increases the likelihood of breakdown by 8.68%. The CI of this estimate does not contain zero as one of the credible values, and thus the estimate is statistically significant at the 95% credible level.

Figure 4a displays the relationship between flow rate and breakdown probability using the posterior predictive lines. This figure shows an “S”-shaped trend in the relationship between the two variables. Although the breakdowns were modeled as lifetime events by some previous studies [12, 33, 34], the pattern estimated in these studies is consistent with the pattern reported in Fig. 4a. Moreover, the boxplots presented in Fig. 4b were used to compare the breakdown probability across lanes. This figure shows that the breakdown probabilities on the lane close to the shoulder at 1000 veh/h/lane are even higher than those estimated at 2000 veh/h/lane for the lane near the median and the middle lane. Moreover, the likelihood of breakdown on the lane near the shoulder at 2000 veh/h/lane is nearly one, while the middle lane and the lane near the median have a likelihood of approximately 0.5. This situation, in which the estimated likelihood differs across lanes at the same flow rate, can lead to a partial breakdown on a highway. Similar observations are reported by one of the previous studies [13], which suggests that lanes near the shoulder have lower capacity than lanes near the median.

Fig. 4 Breakdown probability and flow rate relationship for Site 1. a Breakdown probability versus flow rate. b Breakdown probability across lanes at different flow rates

For the stay in the congested regime, the flow rate coefficient estimate in Table 2 suggests that the likelihood of staying in the congested regime increases by 1.8% when the log-transformed flow rate increases by 1%. As with the breakdown transition, the comparison of the estimated probability across lanes revealed that the likelihood of staying in the congested regime is higher on the lane near the shoulder than on the middle lane and the lane near the median at the same flow rate (Fig. 5b).

Fig. 5 Stay in the congested transition probability and flow rate relationship for Site 1. a Stay in the congested regime probability versus flow rate. b Stay in the congested regime probability across lanes at different flow rates

Note that the stay in the free-flow and the recovery transition processes (congested to free-flow) are not presented because these were considered as the base categories in the models. To clarify, the stay in the free-flow and breakdown probabilities in the transition matrix presented in Eq. 3 sum to 1. Since the logit link function was used in the hierarchical regression to fit the transition matrix, the estimates for the breakdown and the stay in the free-flow regime have the same magnitude but opposite signs (positive vs. negative). Similarly, the estimates for the stay in the congested regime and the recovery transition processes have the same magnitude but opposite signs.
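The sign relationship follows from the identity \(\text{logit}(1 - p) = -\text{logit}(p)\). The short check below illustrates it numerically; the value of the linear predictor is hypothetical.

```python
import math

def logit(p):
    """Log-odds of a probability p in (0, 1)."""
    return math.log(p / (1.0 - p))

def inv_logit(eta):
    """Inverse of the logit link."""
    return 1.0 / (1.0 + math.exp(-eta))

eta_breakdown = 0.8              # hypothetical linear predictor for breakdown
p_fc = inv_logit(eta_breakdown)  # breakdown probability
p_ff = 1.0 - p_fc                # stay in free-flow, since the row sums to 1
eta_stay = logit(p_ff)           # equals -eta_breakdown up to rounding
```

Negating every coefficient of the breakdown model therefore gives the model for staying in the free-flow regime, and the same holds for the congested row.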

5.2 Results of regression models for Site 2

Because the Site 2 dataset has three regimes (free-flow, COD, and congested), three regression models were fitted to calibrate the transition processes in Eq. 5. These include two binary and one multinomial logistic hierarchical regression. Table 3 gives the calibrated regression coefficients. The analysis of the free-flow to COD transition reveals that a 1% increase in the log-transformed flow rate increases the transition probability by 1.02%. For the transition from the congested to the COD regime (the queue discharging process), the results in Table 3 show a positive relationship: an increase in the flow rate on the highway increases the likelihood of discharging the queue. Specifically, the model estimate shows that a 1% increase in the log-transformed flow rate when the current state is the congested regime increases the queue discharge probability by 0.28%. The posterior predictive trend in the relationship between the congested to COD transition probability and the flow rate is shown in Fig. 6a. This figure shows that the predicted trend has high uncertainty because the whisker lines are widely spread around the expected predictive line. One reason for the high uncertainty of the estimates is the variation in the data. The comparison of the estimated transition probability in Fig. 6b shows that the queue discharge in the lane near the shoulder has a comparatively lower likelihood than in other lanes at the same flow rate.

Fig. 6 Transition probability and flow rate relationship for Site 2. a Congested to COD transition probability versus flow rate. b Congested to COD transition probability across lanes at different flow rates

The results of the multinomial logistic hierarchical regression in Table 3 were calibrated by considering the stay in the COD transition as the base category. This category was selected arbitrarily. One can select either the COD to free-flow or the COD to congested regime transition as the base category, and the results of the analysis will yield the same interpretation. As indicated in Table 3, the traffic flow parameter for the COD to free-flow regime transition has a negative sign. This suggests that increasing the traffic flow reduces the likelihood that the highway evolves to the free-flow state. The model coefficient particularly reveals that a 1% increase in the log-transformed flow rate reduces the probability of this transition process by 2.39%. The association between the flow rate and the COD to free-flow transition probability is illustrated in Fig. 7a. As demonstrated in this figure, the estimated trend has a decreasing “concave upward” shape. Comparing the COD to free-flow transition probability across lanes, the shoulder lane shows the highest probability for this transition, while the lane near the median, the inner-right lane, and the inner-left lane have nearly the same likelihood at the same flow rate (Fig. 7b).

Fig. 7 Transition probability and flow rate relationship for Site 2. a COD to free-flow transition probability versus flow rate. b COD to free-flow transition probabilities across lanes at different flow rates

As also presented in Table 3, the results for the COD to congested transition were significant at the 95% credible level. The estimate of the logarithm of traffic flow is 0.82, which indicates that a 1% increase in the logarithm of flow rate would cause the likelihood of the COD to congested transition to increase by 0.82% relative to staying in the COD regime.

5.3 Disparity effects caused by different lane lateral locations and days of the week

To quantify the disparity effects associated with lane lateral locations and different days of the week, the intra-class correlation coefficient (ICC) was calculated for each model. The ICC quantifies the proportion of variation that would not be accounted for by a model that ignores data clustering [35]. Alternatively, this value can be viewed as a measure of the correlation between observations within the same cluster. Because variances are non-negative in the model, the ICC ranges between 0 and 1. The disparity parameters presented in Tables 2 and 3 were used to calculate the ICC. The ICC analysis for the breakdown model (Site 1) shows that about 73% of the total variation is associated with the different lane lateral locations (Eq. 7). This value is larger than the within-group variation, i.e., the variation due to the standard logistic distribution. In this case, the breakdown events within the same lane are more similar than the breakdown events in different lanes.

$${\text{ICC}} = \left( {\frac{{\sigma_{1}^{2} }}{{\sigma_{1}^{2} + \sigma_{2}^{2} + \sigma_{\text{sl}}^{2} }}} \right) \times 100\% = \left( {\frac{{3.0^{2} }}{{3.0^{2} + 0.25^{2} + \frac{{\pi^{2} }}{3}}}} \right) \times 100\% = 73\% ,$$
(7)

where \(\sigma_{\text{sl}}^{2}\) represents the within-group variance, which is \(\sigma_{\text{sl}}^{2} = \frac{{\pi^{2} }}{3} = 3.29\) for the standard logistic distribution [35].

In contrast, days of the week were found to contribute only 0.5% of the total variation in the breakdown model (Eq. 8). Furthermore, for the Site 1 dataset, the ICC for the model of staying in the congested regime is 18% for lane lateral location, while days of the week contribute only 0.7% of the total variation.

$${\text{ICC}} = \left( \frac{\sigma_{2}^{2}}{\sigma_{1}^{2} + \sigma_{2}^{2} + \sigma_{\text{sl}}^{2}} \right) \times 100\% = \left( \frac{0.25^{2}}{3.0^{2} + 0.25^{2} + \frac{\pi^{2}}{3}} \right) \times 100\% = 0.5\% .$$
(8)
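The arithmetic in Eqs. 7 and 8 can be reproduced with a short script. This is a minimal sketch that uses only the values quoted in those equations (disparity parameters of 3.0 and 0.25 and the standard logistic variance of \(\pi^{2}/3\)); the function and variable names are hypothetical and chosen for illustration.

```python
import math

# Within-group variance of the standard logistic distribution (about 3.29).
LOGISTIC_VARIANCE = math.pi ** 2 / 3


def icc_percent(sd_target: float, sd_other: float) -> float:
    """Share (in %) of the total variance attributed to the target grouping.

    sd_target and sd_other are the disparity parameters (standard deviations)
    of the two grouping factors; both are squared before use.
    """
    var_target = sd_target ** 2
    total = var_target + sd_other ** 2 + LOGISTIC_VARIANCE
    return 100.0 * var_target / total


# Values quoted in Eqs. 7 and 8 for the Site 1 breakdown model.
sd_lane = 3.0   # lane lateral location
sd_day = 0.25   # day of the week

icc_lane = icc_percent(sd_lane, sd_day)
icc_day = icc_percent(sd_day, sd_lane)
residual = 100.0 - icc_lane - icc_day  # within-group (standard logistic) share

print(f"Lane lateral location: {icc_lane:.1f}%")
print(f"Day of the week:       {icc_day:.1f}%")
print(f"Within-group share:    {residual:.1f}%")
```

Because both ICC values share the same denominator, the two shares and the within-group share sum to 100%, which offers a quick consistency check on the reported percentages.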

Similar analyses were conducted for Site 2, and the estimates indicate that the lateral lane location has the largest impact on the COD to congested transition process (ICC = 49.4%) followed by the COD to free-flow transition (ICC = 19.7%), the free-flow to COD transition (ICC = 10.5%), and the congested to COD transition (ICC = 1%). For different days of the week, the congested to COD transition has the highest variation (ICC = 1.44%) followed by the free-flow to COD transition (ICC = 1.2%), COD to congested transition (ICC = 0.5%), and COD to free-flow transition (ICC = 0.1%).

In summary, there is considerable evidence that lane lateral locations contribute substantially more variation to the DTTR than days of the week (considering only weekdays). This observation is consistent across the two sites. Moreover, the highest disparity estimate associated with days of the week is 1.44%. Based on this estimate, one may conclude that days of the week cause negligible variability in the DTTR. Although the study in [36] investigated the difference in flow capacity across days of the week using the analysis of variance (ANOVA) approach, it reached the same conclusion: no variation in the estimated capacity flow is attributable to the day of the week.

6 Discussion

This study has presented an empirical approach for investigating the disparity effects of lateral lane locations and days of the week on the dynamic transition of traffic regimes (DTTR). In the analysis, Markov chain theory and hierarchical regression were integrated to describe the transition processes and the dependence between traffic regimes, and to capture the hierarchical structure of the traffic observations. Historical traffic flow parameters (speed and flow) collected over 1 year (2015–2016) at two freeway sites were used.

Using the GMM, the speed threshold that defines traffic conditions was identified for each lane. Overall, the hierarchical regressions used to estimate the MC transition probabilities indicated that the log-transformed flow rate is a significant variable, at the 95% posterior credible interval, in predicting the likelihood of evolving from one traffic regime to the next. The lane near the shoulder was estimated to have the highest likelihood of transitioning from one regime to the next compared to other lanes at a similar flow rate. The intra-class correlation coefficient (ICC) analysis revealed that lane lateral locations contribute a substantial percentage of the total variation in the DTTR for the Site 1 dataset. More specifically, the breakdown process was found to be more influenced by this variation than the other evaluated transition processes (ICC = 73%). For the Site 2 dataset, the largest variation due to lateral lane location was observed for the transition from the COD to the congested regime (ICC = 49.4%). Days of the week, in contrast, were found to contribute negligibly to the variation in the transition probabilities describing the DTTR. The highest ICC estimate for days of the week among the fitted hierarchical models for Sites 1 and 2 was 1.44%.

The findings from this study could be used to enhance lane-distribution strategies in intelligent transportation system applications, particularly in dynamic lane management to improve operational efficiency. Furthermore, the results are expected to increase awareness among researchers and practitioners of the variation associated with lateral lane locations and days of the week in traffic operations. This information is also useful to transportation agencies in developing other congestion countermeasures.

One limitation of this study is that the detector data used in modeling the DTTR were not filtered to remove observations affected by overlapping bottlenecks between the exit and entrance ramps. Future research should account for this situation in the analysis. Also, more research using data from sites with different characteristics is required to validate the conclusions of the current study. In addition, it is not clear whether similar conclusions would be reached if a different data resolution, such as 2 min or 5 min, were used in modeling. In future work, different data resolutions can be used in the model and compared with the current results. Another direction for future work is the analysis of the effects of spatial heterogeneity, vehicle mix, weather, and driving characteristics on the DTTR and on the number of traffic regimes in the GMM. The two sites evaluated in this study have different geometric characteristics: two regimes were identified at Site 1, whereas three regimes best describe the operating speeds at Site 2. It is not yet clear, however, whether sites with similar geometric characteristics will yield a similar number of traffic regimes.