Data analytics approach for travel time reliability pattern analysis and prediction

Travel time reliability (TTR) is an important measure which has been widely used to represent the traffic conditions on freeways. The objective of this study is to develop a systematic approach to analyzing TTR on roadway segments along a corridor. A case study is conducted to illustrate the TTR patterns using vehicle probe data collected on a freeway corridor in Charlotte, North Carolina. A number of influential factors are considered when analyzing TTR, which include, but are not limited to, time of day, day of week, year, and segment location. A time series model is developed and used to predict the TTR. Numerical results clearly indicate the uniqueness of TTR patterns under each case and under different days of week and weather conditions. The research results can provide insightful and objective information on the traffic conditions along freeway segments, and the developed data-driven models can be used to objectively predict the future TTRs, and thus to help transportation planners make informed decisions.


Introduction
Travel time reliability (TTR) is an important measure that has been widely used to represent traffic conditions on freeways. Accurately modeling TTR is important to both transportation agencies and road users. Different studies have defined TTR in different ways. The Federal Highway Administration (FHWA) [1] gave a formal definition of TTR, i.e., the consistency or dependability in travel times, as measured from day-to-day and/or across different times of the day. In a Strategic Highway Research Program 2 (SHRP 2) project conducted by Vandervalk et al. [2], TTR was defined as the variability in travel times that occur on a facility or for a trip over the course of time and the number of times (trips) that either "fail" or "succeed" in accordance with a predetermined performance standard or schedule. Table 1 provides a summary of existing TTR definitions in chronological order.
Different types of measures have also been widely applied to assess traffic performance and reliability in recent years. The four most widely used TTR measures recommended by FHWA are the 90th/95th percentile travel times, buffer index (BI), planning time index (PTI), and frequency of congestion (FOC). Other important measures include the standard deviation [11], coefficient of variation (COV) [12], percent variation [5], skew and width of travel time distribution [13], and misery index [5]. Table 2 provides a summary of the TTR measures that were developed and used in past studies in alphabetical order.
Nowadays, anonymous vehicle probe data have been greatly improved in both data coverage and data fidelity, and thus have become a reliable source for freeway TTR analysis. However, in most cases, TTR data are analyzed in the short term, which may not be able to account for the TTR variability characteristics for all the segments of a corridor in the long term. Therefore, the detailed analysis of the TTR patterns in different cases which can represent different traffic conditions will be helpful.
The purpose of this study is to develop a systematic approach to illustrating how TTR distributes and varies with respect to the time of day (TOD), day of week (DOW), year, and weather. Case studies are conducted to present different TTR patterns under different conditions. The analysis of TTR conducted and the prediction model

Table 1 Summary of existing TTR definitions
Source | Year | Definition
[3] | 1996 | Range of travel times experienced during a large number of daily trips
Charles [4] | 1997 | Probability that a component or system will perform a required function for a given period of time when used under stated operating conditions. It is the probability of a non-failure over time
Lomax et al. [5] | 1997 | Impact of nonrecurrent congestion on the transportation system
NCHRP Report 399 [6] | 1998 | Measure of the variability of travel time
1998 California Transportation Plan [7] | 1998 | Level of variability between the expected travel time and the actual travel time experienced
Fowkes et al. [8] | 2004 | Percent of on-time performance for a given time schedule
FHWA [1] | 2006 | Consistency or dependability in travel times, as measured from day-to-day and/or across different times of the day
Elefteriadou and Cui [9] | 2007 | Probability of a device performing its purpose adequately for the period of time intended under the stated operating conditions
Florida Department of Transportation [10] | 2011 | Percent of travel that takes no longer than the expected travel time plus a certain acceptable additional time
Vandervalk et al. [2] | 2014 | Variability in travel times that occur on a facility or for a trip over the course of time and the number of times (trips) that either "fail" or "succeed" in accordance with a predetermined performance standard or schedule

Literature review
TTR can be analyzed using travel time distribution data, including both single-mode distributions [16] and multimode distributions [17,18]. Eliasson [19] investigated the relationship between travel time distribution and different TOD periods. The results showed that travel times are approximately normally distributed under severe congestion conditions, whereas the travel time distribution is skewed under low levels of congestion. Emam and Al-Deek [16] employed the Anderson-Darling (AD) goodness-of-fit statistics and error percentages to evaluate the performance of different distribution types. The results indicated that the lognormal distribution provided the best model fit and that it was more efficient to use the same day of the week (e.g., Mondays) in the estimation of TTR on a roadway segment than to use mixed data (i.e., data collected across multiple weekdays), because of the significant differences between traffic patterns across multiple weekdays. In addition, the researchers noticed that the new reliability estimation method showed higher sensitivity to geographical locations. Chen et al. [18] explored the travel time distribution and variability patterns for different road types in different time periods.
The authors compared the goodness-of-fit results of several distribution types and analyzed TTR patterns with BI and COV. To investigate the impact of nonrecurring congestion on TTR, researchers around the world have studied different sources of travel time variability (including traffic incidents, inclement weather, and work zones). With respect to incidents, Hojati et al. [20] developed a method to quantify the impact of traffic incidents on TTR on freeways. The authors conducted a Tobit regression analysis to identify and quantify factors that affect the extra buffer time index based on the Queensland DOT and STREAMS Incident Management System (SIMS) database. The results indicated that changes in TTR as a result of traffic incidents are related to the characteristics of the incidents. Charlotte et al. [21] modeled TTR on urban roads in the year 2010 in the region of Paris, France. The 90th percentile of the travel time distribution was modeled with linear regression using explanatory variables including the number of lanes, mean value of the travel time distribution, travel direction, time of the day, number of accidents, and roadworks. With respect to weather conditions, Martchouk et al. [22] studied travel time variability using travel time data on freeway segments in Indianapolis collected with anonymous Bluetooth sampling techniques. The effects of adverse weather were discussed in the study. The results showed that the travel time increased during adverse weather periods, and the variance in travel times during the same periods also increased. Li et al. [23] conducted a study of the weather impact on traffic operations. Rainfall intensity data for 72 sample days in Florida were incorporated into the TTR model along with the historical speed database.
Different scenarios for each hour (under clear weather, light rain, and heavy rain conditions) were created and applied to the respective roadway segments. The results showed that speed reductions on arterials were 10% for light rain and 12% for heavy rain. Kamga and Yazici [24] merged the GPS records of taxi trips with historical weather records of New York City from 2009 to 2010 and then calculated the descriptive statistics of travel time under different TOD, DOW, and weather conditions. The authors used the classification and regression trees (C&RT) model to extract the travel time distribution information under each DOW-TOD-Weather category. Based on the COV results, the authors pointed out that inclement weather may not only result in increased travel time but also result in higher TTR. With respect to multiple influencing factors, Tu [25] developed a TTR model that considers four influencing factors: road geometry, adverse weather, speed limits, and traffic accidents. The model was validated using traffic data from urban freeways in the Netherlands in the year 2005. The results for road geometry indicated that there was a threshold value L for the length of ramp/weaving sections. If the actual length was less than L, the TTR would decrease with the decreasing length of ramp/weaving sections. If the actual length was larger than L, the length would have far less impact on TTR. Javid and Javid [26] developed a framework to estimate travel time variability caused by traffic incidents based on integrated traffic, road geometry, incident, and weather data. A series of robust regression models was developed using 2 years of data from a stretch of California's highway system. The results of the split-sample validation showed the effectiveness of the proposed models in estimating the travel time variability. The authors concluded that, for incidents occurring on weekends, the highway clearance time would be shorter.
Shoulder existence and lane width would adversely impact downstream highway clearance time. Kwon et al. [27] developed a linear regression model to study TTR with the consideration of factors under three categories: traffic influencing events (traffic incidents and crashes, work zone activity, weather, and environmental conditions), traffic demand (fluctuations in day-to-day demand and special events), and physical road features (traffic control devices and inadequate base capacity). The model was tested using data collected from the San Francisco Bay Area in the year 2008 and used to identify how each variable contributes to TTR. The results of this study provided useful insights into predicting TTR. Schroeder et al. [28] presented a methodology to conduct freeway reliability analysis based on freeway data in North Carolina. The variability impact considerations included TOD, DOW, month-of-year differences, and various nonrecurring congestion sources (such as weather, incidents, work zones, and special events). The freeway scenario generator (FSG) was used and resulted in 2508 scenarios based on freeway facility data in North Carolina. The resulting travel time distribution was presented, and a sensitivity analysis was conducted to explore the relationship between weather and incidents and the overall reliability of the facility. Table 3 provides a summary of several previous studies on TTR analysis and modeling in chronological order.
Although useful, most of these studies did not consider the long-term variation of TTR. This study conducts a step-by-step analysis of long-term TTR and presents the data analysis results. The developed data-driven models can be used to objectively predict future TTRs. This study is novel in that it identifies the segments with various TTR patterns at different locations of a corridor, which can help reveal the travel time characteristics of these locations under different traffic conditions. The research results can provide insightful and objective information on the traffic conditions along freeway segments to help transportation planners make informed decisions.

Data description
This study focuses on the travel time data gathered from the Regional Integrated Transportation Information System (RITIS) web site and uses the collected data to conduct the TTR analysis. A series of major freeway segments are selected for the case study: Interstate 77 (I-77) southbound (Fig. 1) is one of the most heavily traveled Interstate highways in the Charlotte area and runs from north to south. A total of 32 roadway segments of I-77 southbound are selected in this study, which start from the intersection with Harris Oak Blvd and end at the interchange with I-485 (Exit 2) in the south part of the city. The total length of the selected segments is 19 miles. In this study, travel time and speed data are obtained from the RITIS web site, which gathers information about roadway speeds and vehicle counts from 300 million real-time anonymous mobile phones, connected cars, trucks, delivery vans, and other fleet vehicles equipped with GPS locator devices. Travel time data from January 2011 to December 2015 aggregated at 15-min intervals are used in the present study. A sample of raw travel time data utilized in this study is shown in Table 4, which contains the following information:
TMC_Code The RITIS Probe Data Analytics Suite uses the TMC (traffic message channel) standard to uniquely identify each road segment. This field indicates the segment ID.
Measurement_tstamp This field indicates the time stamp of the record.
Speed This field indicates the current estimated harmonic mean speed for the roadway segment in miles per hour.
Travel_time_seconds This field indicates the time it takes to drive along the roadway segment.
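As a minimal sketch of how records with the four fields above might be organized for time-of-day analysis, the snippet below groups travel times by segment and 15-min slot. The timestamp format, the function name `group_by_segment_and_slot`, and the sample rows are illustrative assumptions and are not taken from Table 4.

```python
from collections import defaultdict
from datetime import datetime

def group_by_segment_and_slot(records):
    """Group travel times (seconds) by (TMC code, 15-min time-of-day slot).

    Each record is (tmc_code, measurement_tstamp, speed, travel_time_seconds).
    The timestamp layout "%Y-%m-%d %H:%M:%S" is an assumption.
    """
    groups = defaultdict(list)
    for tmc_code, tstamp, _speed, travel_time_seconds in records:
        t = datetime.strptime(tstamp, "%Y-%m-%d %H:%M:%S")
        slot = t.hour * 4 + t.minute // 15  # 0..95 across the day
        groups[(tmc_code, slot)].append(travel_time_seconds)
    return dict(groups)

# Illustrative rows only (hypothetical values).
sample = [
    ("125N04784", "2015-03-02 17:00:00", 30.0, 120.0),
    ("125N04784", "2015-03-03 17:00:00", 45.0, 80.0),
    ("125N04784", "2015-03-03 17:15:00", 50.0, 72.0),
]
grouped = group_by_segment_and_slot(sample)
```

Grouping by slot across many days gives, for each segment, the sample of travel times from which a percentile-based measure can later be computed.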

Study location identification
The historical weather data aggregated at 1-h intervals near the Charlotte Douglas International Airport can be found at www.wunderground.com. From previous studies, it is widely accepted that only severe weather events can cause a significant impact on speeds and travel times. Based on the weather characteristics in the Charlotte area and the distribution of each weather category, detailed weather conditions are further classified into three groups: normal, rain, and snow/fog/ice. Table 5 presents the detailed classification of the weather conditions. Conditions such as overcast or mostly cloudy are assumed to be no different from clear conditions because they have no obvious impact on traffic conditions; these conditions are classified as normal. All conditions such as rain or thunderstorm are categorized as rain. To ensure an acceptable sample size, snow, fog, ice pellets, and other similar conditions are combined into one group because of their low rate of occurrence.
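The three-group classification described above can be sketched as a simple keyword mapping. The keyword lists below are assumptions for illustration; the authoritative mapping is the one given in Table 5, which is not reproduced here.

```python
def weather_group(condition):
    """Map a detailed weather condition string to one of three groups.

    Keyword lists are illustrative assumptions, not the contents of Table 5.
    """
    c = condition.lower()
    if any(key in c for key in ("snow", "fog", "ice")):
        return "snow/fog/ice"
    if any(key in c for key in ("rain", "thunderstorm")):
        return "rain"
    # Clear, overcast, mostly cloudy, and similar conditions.
    return "normal"

groups = [weather_group(c) for c in ("Mostly Cloudy", "Light Rain", "Ice Pellets")]
```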
The four most widely used TTR measures in previous studies are BI, PTI, COV, and FOC. However, BI and COV have a limitation since their values depend on the average travel time, which may change over time [30]. Therefore, the PTI is chosen and used as the primary measure of TTR in this study. It is calculated as the 95th percentile travel time divided by the free-flow travel time, and it represents the total travel time, as a multiple of the free-flow travel time, that most travelers should plan for in order to ensure on-time arrival. For example, a PTI value of 1.5 at 5 p.m. means that for a trip that takes 20 min in light traffic, 30 min should be planned at 5 p.m. to make sure that the traveler is on time. The equation of PTI is provided below:
$$PTI_i = \frac{T_{i95}}{FFTT_i}$$
where $PTI_i$ is the planning time index of segment $i$; $T_{i95}$ is the 95th percentile travel time on TMC segment $i$ during the study period across multiple days (e.g., a month) or a year; and $FFTT_i$ is the free-flow travel time on TMC segment $i$ during the same observation period. For each roadway segment, the free-flow travel time is computed by dividing the length of the segment by the free-flow speed, which is defined as the 85th percentile speed during overnight hours (10 p.m. to 5 a.m.) [10,30,31].
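The PTI definition above can be sketched in a few lines. This is an illustration under stated assumptions: the percentile uses linear interpolation between order statistics (the paper does not state its percentile convention), and the function names and sample values are hypothetical.

```python
def percentile(values, q):
    """q-th percentile (0-100), linear interpolation between order statistics."""
    xs = sorted(values)
    if not xs:
        raise ValueError("values must not be empty")
    pos = (len(xs) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)

def free_flow_travel_time(length_miles, overnight_speeds_mph):
    """Segment length divided by the 85th percentile overnight speed, in seconds."""
    free_flow_speed = percentile(overnight_speeds_mph, 85)
    return length_miles * 3600.0 / free_flow_speed

def planning_time_index(travel_times_s, length_miles, overnight_speeds_mph):
    """95th percentile travel time divided by the free-flow travel time."""
    fftt = free_flow_travel_time(length_miles, overnight_speeds_mph)
    return percentile(travel_times_s, 95) / fftt

# Hypothetical 1-mile segment: 60 mph overnight, 90 s travel times at the peak.
pti = planning_time_index([90.0] * 10, 1.0, [60.0] * 8)
```

With these hypothetical inputs the free-flow travel time is 60 s, so the PTI is 1.5, matching the interpretation in the text that 1.5 times the light-traffic travel time should be planned.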
The first step to identify the study segments is to plot the two-dimensional PTI matrix for each road segment along the corridor. This provides a straightforward visual tool for decision makers to grasp the average traffic conditions along a corridor. The long-term (one-year period) PTI values of each segment from 2011 to 2015 were calculated and are shown in Fig. 2. Note that in this figure, the horizontal axis denotes the time of day and the vertical axis represents TMC segments along the selected section of I-77 southbound. Each cell represents the PTI value. The darker the color, the higher the PTI. In order to select the sections that can represent different traffic conditions, qualitative ratings are assigned to each freeway segment in the study area, and the segments are classified into different categories/levels based on the qualitative criteria of a previous study [32]. The ratings, based on the PTI values, are given as: (1) reliable (PTI < 1.5); (2) moderately to heavily unreliable (1.5 < PTI < 2.5); and (3) extremely unreliable (PTI > 2.5).
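The three rating levels can be expressed as a small helper. The source states the thresholds with strict inequalities only, so assigning the boundary values 1.5 and 2.5 to the middle category is an assumption made here for completeness.

```python
def pti_rating(pti):
    """Qualitative reliability rating from a PTI value.

    Boundary handling (1.5 and 2.5 fall in the middle category) is an assumption.
    """
    if pti < 1.5:
        return "reliable"
    if pti <= 2.5:
        return "moderately to heavily unreliable"
    return "extremely unreliable"

ratings = [pti_rating(v) for v in (1.1, 2.0, 3.2)]
```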
Based on the rating criteria mentioned above, eight segments (shown in Fig. 3) covering four PTI rating cases are selected as the sample study segments. The four cases are as follows:
Case 1 (p.m. peak only): The average PTI during the a.m. peak period is reliable and during the p.m. peak period is unreliable/extremely unreliable. The selected segments are 125-04779 and 125N04780.
Case 2 (a.m. peak only): The average PTI during the a.m. peak period is unreliable/extremely unreliable and during the p.m. peak period is reliable. The selected segments are 125N04788 and 125-04788.
Case 3 (Double peak): The average PTIs during both the a.m. and p.m. peak periods are unreliable/extremely unreliable. The selected segments are 125N04784 and 125N04785.
Case 4 (No peak): The average PTIs during both the a.m. and p.m. peak periods are reliable. The selected segments are 125-04790 and 125N04791.
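The four cases above amount to a two-way check of the average a.m. and p.m. peak PTIs against the reliability threshold. A minimal sketch follows, assuming that "unreliable" means an average PTI of at least 1.5; the function name and the sample values are hypothetical.

```python
def peak_case(am_peak_pti, pm_peak_pti, threshold=1.5):
    """Assign a segment to one of the four cases from its average peak PTIs."""
    am_unreliable = am_peak_pti >= threshold
    pm_unreliable = pm_peak_pti >= threshold
    if am_unreliable and pm_unreliable:
        return "Case 3 (double peak)"
    if pm_unreliable:
        return "Case 1 (p.m. peak only)"
    if am_unreliable:
        return "Case 2 (a.m. peak only)"
    return "Case 4 (no peak)"

# Hypothetical average peak PTIs, for illustration only.
example = peak_case(am_peak_pti=1.2, pm_peak_pti=2.3)
```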

TTR variability pattern under all conditions
The PTIs of each segment from 2011 to 2015 are shown in Fig. 4. The TTR variability pattern in each case can be categorized as follows:
Case 1 These two sections are located in the south part of the Charlotte downtown area. The volume of outbound traffic during p.m. hours is high and therefore contributes to frequent congestion during the p.m. peak. In more detail, in 2015 these two segments had noticeably higher PTI values during peak hours than in 2011-2014. This may be attributed to factors such as traffic volume, weather conditions, and accidents. One potential reason is the traffic volume on the case 1 segments from 2011 to 2015 (annual average daily traffic (AADT): 15,300, 15,200, 15,400, 15,900, and 15,900, respectively). The correlation values between the AADT and the average daily PTIs of these two segments are 0.86 and 0.83, respectively, which means that they are highly correlated. Therefore, traffic volume may be a primary reason for the TTR distribution characteristics. Based on the historical weather data, adverse weather was more frequent in 2015 than in 2011-2014. To eliminate the possible influence of adverse weather, the TTR distributions under normal conditions only are also tested for each year; the average daily PTI of 2015 decreases slightly (from 2.1 to 2.0) but is still higher than those of 2011-2014. With respect to traffic accidents, no detailed historical crash information for I-77 was found. However, the total number of crashes in Mecklenburg County increased each year from 2011 to 2015 (15,476, 15,915, 16,790, 19,847, and 21,096, respectively) [33]. This is another potential reason for the worsening traffic conditions in 2015.
Case 2 These two sections are located in the north part of the Charlotte downtown area. The volume of inbound traffic during a.m. hours is high and therefore contributes to frequent congestion during the a.m. peak. Similar to case 1, in 2015 these two segments had noticeably higher PTI values during peak hours than in 2011-2014. This may also be explained by potential factors such as traffic volume (with correlation values of 0.83 and 0.89, respectively), adverse weather, and accidents, which contribute to the worsening traffic conditions in 2015, as presented in case 1.
Case 3 These two sections are located adjacent to the Charlotte downtown area. The volumes of inbound traffic during a.m. hours and outbound traffic during p.m. hours are both high and therefore contribute to frequent congestion during both peaks. Similar to cases 1 and 2, in 2015 these two segments had noticeably higher PTI values during peak hours than in 2011-2014. However, the correlation values between traffic volume and average daily PTIs are lower (0.56 and 0.71, respectively) and not statistically significant. Therefore, this may be explained by other potential factors (such as adverse weather and accidents) that contribute to the worsening traffic conditions in 2015.
Case 4 These two sections are located far away from the Charlotte downtown area. The traffic volumes during both a.m. and p.m. hours are low and therefore contribute to the no peak condition. The variation of PTIs throughout the day of each year does not change significantly (from 1.02 to 1.13, and 1.04 to 1.15, respectively).
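The correlation values quoted in the cases above relate yearly AADT to average daily PTI. A minimal sketch of such a computation is given below, assuming the Pearson coefficient was used (the text does not name the coefficient). The yearly PTI series is not reproduced in the text, so the usage line only checks the function on a synthetic linear transform of the reported AADT values.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# AADT for 2011-2015 as reported for the case 1 segments.
aadt = [15300, 15200, 15400, 15900, 15900]
# Synthetic series (not PTI data): a perfect linear function of AADT.
synthetic = [2.0 * v + 1.0 for v in aadt]
r = pearson_r(aadt, synthetic)
```

With only five yearly observations per segment, even fairly large coefficients can fail a significance test, which is consistent with the caution expressed for case 3.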

TTR variability patterns for different days of week
The PTIs of each segment from Monday to Sunday are shown in Fig. 5, and the average PTIs are shown in Table 6. The TTR variability patterns for different days of week in each case can be categorized as follows:
Case 1 For the segments showing the p.m. peak characteristics, the travel time on Friday is the least reliable. This result is consistent with a previous study [29]. The TTR variability patterns on weekends are significantly different from those on weekdays. There are no p.m. peak characteristics in the TTR of these two segments on weekends, as the PTIs throughout the day do not change significantly. The maximum (and average) PTIs on weekends of these two segments are 1.28 (1.10) and 1.24 (1.08), respectively. The results indicate that traffic congestion is less frequent on weekends and that travel demand on weekends is perhaps much lower than that on weekdays, which is consistent with previous studies [18,34].
Case 2 For the segments showing the a.m. peak characteristics, the travel time on Tuesday is the least reliable. Similar to case 1, there are also no a.m. peak characteristics of the TTR of these two segments on weekends as the PTIs throughout the day do not change significantly.
Case 3 For the segments showing the double peak characteristics, the travel time on Friday is the least reliable. Similar to case 1, there are also no peak characteristics in the TTR of these two segments on weekends, as the PTIs throughout the day do not change significantly.
Case 4 For the segments showing no peak characteristics, the average PTIs of each DOW do not change significantly (from 1.05 to 1.07 and 1.07 to 1.09, respectively). The results indicate that traffic congestion on these two segments is infrequent on both weekdays and weekends.

TTR variability patterns under different weather conditions
The PTIs of each segment under different weather conditions are shown in Fig. 6. The TTR variability patterns in each case under different weather conditions can be categorized as follows:
Case 1 The TTR variability patterns of these two segments under normal and rain conditions are similar, and the pattern is unique under the snow/fog/ice condition. In more detail, the PTIs under the rain condition are noticeably higher than those under the normal condition throughout the day. This probably reflects the fact that rain can cause several driving problems (such as reduced visibility). Heavy rainfall may lead to hydroplaning, slippery surfaces for tires, and road flooding. Therefore, the PTIs under the rain condition increase and traffic congestion becomes more frequent. This result is consistent with other studies [23,35]. The PTIs under the snow/fog/ice condition are also higher than those under the normal condition throughout the day because of road surface and visibility problems [36]. The results show that snow/fog/ice can contribute to unexpected conditions on the roadway at any time of day, resulting in the unique TTR variability pattern under the snow/fog/ice condition. This result is also consistent with a previous study [37]. In particular, there is an extremely high PTI value at noon. Since the geometric design characteristics of all the segments are similar, a potential reason behind this unique pattern could be nonrecurrent conditions such as incidents that happened during snow on the case segments. This hypothesis should be checked in the future if more detailed data become available.
Case 2 and Case 3 Similar to case 1, the PTIs under the rain condition are noticeably higher than those under the normal condition throughout the day. The PTIs under the snow/fog/ice condition are also higher than the PTIs under the normal condition throughout the day and demonstrate a unique variability pattern.
Case 4 For segments 125-04790 and 125N04791, the PTIs under the rain condition are higher than those under the normal condition but do not increase significantly. However, the PTIs under the snow/fog/ice condition are much higher than the PTIs under the normal condition throughout the day. This result shows that adverse weather (such as snow, fog, and ice) can significantly affect the traffic conditions of the segment, and traffic congestion becomes more frequent regardless of the time of day.

TTR prediction
Time series-based TTR prediction methodology
The exponential smoothing (ETS) model is a commonly used method in time series analysis and has been widely adopted [38]. Recent observations are weighted more heavily than remote observations. The ETS equation [39] is shown as follows:
$$S_t = \alpha x_t + (1 - \alpha) S_{t-1}$$
where $S_t$ is the output of the exponential smoothing algorithm; $\alpha$ is the smoothing factor, $0 \le \alpha \le 1$; and $x_t$ is the raw data sequence. Based on the historical travel time data, the PTIs from Monday to Sunday in each year and the PTIs of each month can be calculated. Those values can be used as the input to the exponential smoothing model. The ETS model is utilized in this study to predict the PTIs from Monday to Sunday and the PTIs in each month in the year 2015. Note that the TTRs under different weather conditions are not predicted because of their limited sample size in a single year. The PTIs from Monday to Sunday and the PTIs in each month from 2011 to 2014 are used as the raw input in the model to predict the PTIs in 2015. In order to select the best smoothing factor $\alpha$, the grid search method is adopted in this study (with a step size of 0.1). The comparison results indicate that $\alpha = 1$ provides the best prediction result. Therefore, a smoothing factor of $\alpha = 1$ is used in this study to minimize the prediction error.
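The smoothing and grid search steps described above can be sketched as follows. This assumes the standard simple exponential smoothing recursion initialized with the first observation, a one-step-ahead forecast equal to the last smoothed value, and a grid search that minimizes the absolute error against a single held-out value; the paper does not spell out these details, and the sample series is hypothetical.

```python
def exponential_smoothing(series, alpha):
    """Simple exponential smoothing, initialized with the first observation."""
    if not series:
        raise ValueError("series must not be empty")
    smoothed = [series[0]]
    for x in series[1:]:
        smoothed.append(alpha * x + (1 - alpha) * smoothed[-1])
    return smoothed

def forecast_next(series, alpha):
    """One-step-ahead forecast: the last smoothed value."""
    return exponential_smoothing(series, alpha)[-1]

def grid_search_alpha(history, actual, step=0.1):
    """Pick alpha in [0, 1] on a regular grid that minimizes absolute error."""
    n_steps = round(1 / step)
    best_alpha, best_err = None, float("inf")
    for k in range(n_steps + 1):
        alpha = k / n_steps
        err = abs(forecast_next(history, alpha) - actual)
        if err < best_err:
            best_alpha, best_err = alpha, err
    return best_alpha

# Hypothetical yearly PTIs (2011-2014) and a held-out 2015 value.
history = [1.0, 1.1, 1.2, 1.3]
best = grid_search_alpha(history, actual=1.4)
```

For a steadily rising series like this hypothetical one, the search selects a smoothing factor of 1, in which case the forecast reduces to the most recent observation.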
The mean absolute percentage error (MAPE) is used in this study to evaluate the prediction results. The MAPE equation is shown as follows:
$$M = \frac{100\%}{n} \sum_{t=1}^{n} \left| \frac{A_t - F_t}{A_t} \right|$$
where $M$ is the mean absolute percentage error; $n$ is the number of predicted points; $A_t$ is the actual TTR value; and $F_t$ is the predicted TTR value.
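A minimal sketch of the MAPE computation is shown below, expressed in percent. The function name and the sample values are illustrative and are not results from this study.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    total = sum(abs((a - f) / a) for a, f in zip(actual, predicted))
    return 100.0 * total / len(actual)

# Illustrative values only: errors of 50% and 25% average to 37.5%.
error = mape([2.0, 4.0], [1.0, 5.0])
```

Because each term is divided by the actual value, the measure is undefined when an actual value is zero; this is not a concern for PTI, which is positive by construction.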

Conclusion
Through the analysis of the TTR of eight typical segments on the I-77 southbound corridor in Charlotte, NC, the TTR variability patterns are identified under different conditions. Based on the historical TTR data (2011-2014), the TTR for each DOW and the TTR of each month in the year 2015 are also predicted. The findings of this study can be summarized as follows.
In general, the TTR variability patterns of different segments along the corridor are different. Different cases including PM peak only, AM peak only, double peak and no peak are analyzed separately since they demonstrate different results. The TTR prediction result also indicates that the TTR of a year could be predicted accurately based on the long-term historical TTR data.
With respect to DOW, the TTR analysis results show that for the segments with a noticeable peak hour trend, the PTIs on weekends are much lower than those on weekdays. The TTR prediction results also show that the prediction errors on weekends are lower than those on weekdays. For the segments with no peak hour, the PTIs on weekends are similar to those on weekdays. The TTR prediction results show that the prediction errors on weekends are slightly higher than those on weekdays. In particular, for the segments under cases 1 and 3 (p.m. peak only and double peak, respectively), the PTI on Friday is the highest, i.e., travel time is least reliable on Friday. For the segments under case 2 (a.m. peak only), the PTI on Tuesday is the highest. For the segments under case 4 (no peak hour), the PTI does not change significantly across DOWs.
With respect to weather conditions, the TTR analysis results show that the PTIs under rain condition have obviously higher values than those under normal condition throughout the day. The PTIs under snow/ice/fog conditions are also higher than those under normal condition throughout the day with unique variability patterns.
In most cases, TTR data are analyzed at the segment level in the short term, which may not be able to account for the TTR variability characteristics for the whole section in the long term. This study aims to develop a systematic approach to analyzing TTR of roadway segments with different variability patterns along a corridor in the long term. However, with the limited amount of data, the impacts of accidents and roadworks on TTR are not discussed in this study. In the future, the impacts of these variables will be studied when the data can be made available. Spatial relationships between segments along the corridor and their impacts on the TTR will also be investigated. Furthermore, the TTR analysis will be conducted at a network level and relevant characteristics will be examined in detail.