Introduction

Advancements in traffic signal event logging in the past decade (Day et al. 2014; Sturdevant et al. 2012) have produced performance metrics that assess red-light running (Chen et al. 2017; Lavrenz et al. 2016), progression quality (Day et al. 2011; Hainen et al. 2014; Zheng et al. 2014), and queues (Wu et al. 2010). However, agencies still screen signalized intersections and approaches for safety improvements by utilizing crash data from the previous 3–5 years (Federal Highway Administration 2011; Indiana Department of Transportation 2012). Due to the relative infrequency of crashes at many locations, this multi-year analysis of data is needed to ensure the validity and accuracy of the agency’s statistical models. However, this method is considered reactive as agencies must wait for a substantial crash history to develop as evidence for proceeding with safety improvement projects. There is a growing interest in the industry to replace the historical method with surrogate events to reduce the time between data collection and the implementation of safety improvements.

Since the 1960s, there has been interest in supplementing or replacing crash counts with traffic conflicts (Perkins and Harris 1968). Conflicts occur more frequently than crashes and are caused by the same failures that result in crashes (Tarko 2020). The higher number of conflicts, combined with the causes they share with crashes, makes them attractive to agencies trying to statistically determine areas for safety improvements. However, conflicts have disadvantages: they can be difficult to collect, require trained personnel, and can depend on the subjective ratings of the observer.

Crowdsourced probe data that provide average segment speeds (Remias et al. 2013) have been commercially available for some time. Recent developments of probe data now include data elements such as hard-braking and acceleration from onboard sensors (Wejo and CtrlShift 2020). These data, aggregated by third-party vendors, can provide agencies with hard-braking events on their roadways.

Motivation

In July 2019, there were over 6 million hard-braking events in Indiana. In contrast, during the same month, there were only 17,652 crashes in Indiana, which represents 0.3% of the total number of hard-braking events. The motivation of this study is to use emerging crowdsourced hard-braking events for agency-wide screening of intersections and approaches for potential safety improvements, and to follow up with mitigation measures addressing emerging problems much more quickly than typical practices that rely on 3–5 years of crash data.

Literature Review

In the early years of traffic conflict analysis, a traffic conflict was defined as the occurrence of an evasive maneuver, braking, or a lane change (Older and Spicer 1976). Although there are many studies that analyze traffic conflicts, very few have looked at hard-braking events at a large scale. Bagdadi and Várhelyi presented the critical jerk method to differentiate between critical and potentially critical events (Bagdadi and Várhelyi 2013). In a follow-up paper, Bagdadi compared the critical jerk method to the longitudinal acceleration method in a naturalistic driving study focused on safety-critical braking events. The study concluded that the critical jerk method was about 1.6 times better than the longitudinal acceleration method at identifying near-crashes (Bagdadi 2013). Stipancic et al. compared hard-braking events and hard-acceleration events to crash frequency for links and intersections. For both event types, a positive correlation was found between the number of events and crash frequency for both links and intersections; however, the correlation was stronger for intersections (Stipancic et al. 2018). Li et al. analyzed roughly 1.5 million crowdsourced hard-braking events at signalized intersections, work zones, interchanges, and entry/exit ramps. The study concluded that hard-braking events could identify dilemma zones as well as work zones in need of geometry changes or more advanced warning signs (Li et al. 2020).

Using video camera footage, Essa and Sayed concluded that the highest frequency of traffic conflicts occurred at the beginning of green as the queue is discharged at a low speed while vehicles joining the queue approach at a high speed; nevertheless, they considered most of these conflicts to be low severity (Essa and Sayed 2019). While Mekker et al.’s study focused on free flow and congested conditions on interstates, the study determined that a crash was approximately 24 times more likely to occur in congested conditions than in free flowing conditions (Mekker et al. 2014). One common cause of congestion on interstates is construction activity. Desai et al. found that, in and around interstate work zones, there was approximately 1 crash/mile for every 147 hard-braking events (Desai et al. 2020).

In a study using floating car data, Kveladze and Agerholm developed a GeoVisual Analytics approach to evaluate arterial safety with a focus on vulnerable road users. This study focused on segments far from intersections and other traffic controlling measures and was able to identify times and locations where pedestrians illegally crossed the arterial roads (Kveladze and Agerholm 2020).

One concern of crowdsourced data is the penetration levels required to accurately represent the traffic flow of the network. Day et al. determined in sequential studies that aggregate data at penetration levels ranging from 0.09 to 0.8% provide an acceptable representation of the corridor for actionable corridor retiming recommendations (Day et al. 2017; Day and Bullock 2016).

Proposed Approach

This paper presents a technique that analyzes hard-braking events relative to the stop bar of a signalized intersection. These events are aggregated by weekdays over the period of 1 month. The statistical relation between hard-braking locations relative to the stop bar and historical crash rates is then examined to determine if hard-braking data can provide a scalable approach to screen for potential safety improvement projects.

Study Corridor

This study utilizes weekday hard-braking data collected between July 1 and July 31, 2019 at eight intersections along a corridor on SR-37, south of Indianapolis, IN (Fig. 1b, callout ii). The corridor is a 4- to 6-lane principal arterial with a speed limit of 55 mph. The volume along the corridor decreases from 64,000 vehicles/day at the northernmost intersection to 49,000 vehicles/day in the middle of the corridor and 38,000 vehicles/day at the southernmost intersection. Indianapolis commuters living south of the city use this corridor to commute northbound in the morning and southbound in the evening. The studied intersections (Fig. 1c), in north to south order, are Thompson Rd., Harding St., Epler Ave., Southport Rd., Wicker Rd., County Line Rd., Fairview Rd. and Smith Valley Rd. These intersections run on an actuated-coordinated operation, most of them with a cycle length of 120 s, across four different weekday time-of-day (TOD) plans:

AM peak (AM): 05:00–09:15

Mid-day (MD): 09:15–14:30

PM peak (PM): 14:30–19:00

Evening (EV): 19:00–22:00
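The TOD plan windows above can be expressed as a small lookup, which is convenient when events are later grouped by plan. This is a minimal sketch: the function name `tod_plan` is hypothetical, and treating each window as start-inclusive and end-exclusive is an assumption, since the boundary handling is not stated here.

```python
from datetime import time

# Weekday TOD plan windows as listed above. Treating each window as
# start-inclusive and end-exclusive is an assumption of this sketch.
TOD_PLANS = [
    ("AM", time(5, 0), time(9, 15)),
    ("MD", time(9, 15), time(14, 30)),
    ("PM", time(14, 30), time(19, 0)),
    ("EV", time(19, 0), time(22, 0)),
]


def tod_plan(t):
    """Return the plan label for a time of day, or None outside 05:00-22:00."""
    for label, start, end in TOD_PLANS:
        if start <= t < end:
            return label
    return None
```

For example, `tod_plan(time(17, 30))` falls in the PM peak window, while any time before 05:00 or from 22:00 onward returns `None`.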

Fig. 1 Indiana signalized corridor location for hard-braking event study

In addition to showing the location for SR-37 in Indiana, Fig. 1 shows the locations of the over 6 million July 2019 hard-braking events in Indiana (Fig. 1a). Of the 6 million hard-braking events, almost 16,000 occurred along the roughly 6.5 mile corridor (Fig. 1c).

Hard-Braking Events

Data

The hard-braking event data used in this study were made commercially available by data providers that worked directly with original equipment manufacturers (OEMs). The enhanced probe data from these connected passenger vehicles included an anonymized unique identifier with timestamp, geolocation, speed, heading and hard-braking/acceleration as attributes. The provider of these data defined hard-braking events as any vehicle deceleration with a magnitude greater than 8.76 ft/s² (0.272 g).
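To put the provider's threshold in familiar units, the sketch below converts 8.76 ft/s² to g and applies the threshold to a deceleration derived from two speed samples. In the data set the hard-braking attribute is supplied by the provider; computing deceleration from consecutive speed samples, and the names `decel_ft_s2` and `is_hard_braking`, are assumptions made here purely for illustration.

```python
G_FT_PER_S2 = 32.174               # standard gravity in ft/s^2
HARD_BRAKE_THRESHOLD_FT_S2 = 8.76  # provider's threshold quoted above
MPH_TO_FT_PER_S = 5280.0 / 3600.0  # 1 mph expressed in ft/s

# The same threshold expressed as a fraction of g (about 0.272 g).
THRESHOLD_G = HARD_BRAKE_THRESHOLD_FT_S2 / G_FT_PER_S2


def decel_ft_s2(speed_before_mph, speed_after_mph, dt_s):
    """Average deceleration (positive when slowing) between two speed samples."""
    return (speed_before_mph - speed_after_mph) * MPH_TO_FT_PER_S / dt_s


def is_hard_braking(decel):
    """True when the deceleration magnitude exceeds the provider's threshold."""
    return decel > HARD_BRAKE_THRESHOLD_FT_S2
```

Under these assumptions the threshold corresponds to losing roughly 6 mph in one second: a drop from 55 to 45 mph in 1 s exceeds it, while a drop from 55 to 50 mph in 1 s does not.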

In July 2019, there were over 1.5 million events in Greater Indianapolis (Fig. 1b), almost 16,000 events along the study corridor (Fig. 1c), and about 10,000 events at the corridor’s eight intersections. Every weekday, those intersections experienced an average of 321 events, and every hour, they experienced an average of 14 events. The penetration level of this data is estimated to be around 2%.

Methodology

The hard-braking events analyzed in this paper were sorted by intersection, distance from stop bar, and speed at which the vehicle was traveling when the hard-braking event occurred. In this study, the analysis was limited to through movements. A geofence region was drawn along the through lanes for each approach. This upstream region began parallel to the opposing direction’s stop bar and ended 1320 ft, a quarter mile, upstream. Once the geofenced region was defined, the hard-braking events that occurred within those regions were selected, and the GPS locations of each hard-braking event were compared to the location of the stop bar to calculate the distance from stop bar. Figure 2a shows the hard-braking events for an area along the study corridor. Figure 2b shows the upstream geofence regions and the geofenced hard-braking events color coded by speed. The 400 ft boundary, relative to the stop bar, roughly corresponds to the location of the dilemma zone detectors at this intersection (Gazis et al. 1960; Parsonson 1978; Zegeer and Deen 1978).
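The distance-from-stop-bar step can be sketched as follows. The study geofences the through lanes of each approach; this simplified version replaces that polygon test with a great-circle (haversine) distance to a single stop-bar point, which is an assumption, as are the function names and the dictionary layout of an event.

```python
import math

EARTH_RADIUS_FT = 6_371_008.8 / 0.3048  # mean Earth radius, meters to feet
GEOFENCE_LENGTH_FT = 1320.0             # quarter mile upstream of the stop bar


def haversine_ft(lat1, lon1, lat2, lon2):
    """Great-circle distance in feet between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_FT * math.asin(math.sqrt(a))


def events_within_geofence(events, stop_bar):
    """Return (event, distance_ft) pairs for events within 1320 ft of the stop bar.

    `events` is a list of dicts with "lat" and "lon" keys (hypothetical layout);
    `stop_bar` is a (lat, lon) tuple.
    """
    selected = []
    for ev in events:
        d = haversine_ft(ev["lat"], ev["lon"], stop_bar[0], stop_bar[1])
        if d <= GEOFENCE_LENGTH_FT:
            selected.append((ev, d))
    return selected
```

A radial distance alone would also capture events on cross streets and in the opposing direction, which is why a lane-aligned geofence (and, if available, the heading attribute) is needed in practice.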

Fig. 2 Visualization of hard-braking event processing

Analysis: Hard-Braking Events by Approach

The hard-braking events are classified by their distance from the stop bar to study the impact of dilemma zones (Gazis et al. 1960; Parsonson 1978; Zegeer and Deen 1978) and queuing. The Type II dilemma zone has been defined in previous literature as the road segment where there is a 10–90% probability of a vehicle stopping at the beginning of the yellow light (Parsonson 1978). Hard-braking events occurring less than 400 ft from the stop bar (the location of the advance detector upstream of the stop bar in a 55 mph speed limit zone) at lower speeds are possibly due to vehicles stopping for the red light, whereas such events at higher speeds could be due to dilemma zone issues. Hard-braking events occurring at distances greater than 400 ft from the stop bar are potentially due to long queues during oversaturated conditions.
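The screening logic described above can be written as a simple rule. The 400 ft and 1320 ft boundaries come from the text; the 45 mph cut between "lower" and "higher" speeds is an assumption borrowed from the speed bins discussed with the histograms, and the labels and function name are hypothetical. The output is a possible explanation to investigate, not a determination of cause.

```python
ADVANCE_DETECTOR_FT = 400.0  # approximate advance detector location
GEOFENCE_LIMIT_FT = 1320.0   # upstream end of the geofence
HIGH_SPEED_MPH = 45.0        # assumed cut between lower and higher speeds


def possible_cause(distance_ft, speed_mph):
    """Map an event's distance from the stop bar and speed to a screening label."""
    if distance_ft < 0 or distance_ft > GEOFENCE_LIMIT_FT:
        return "outside geofence"
    if distance_ft > ADVANCE_DETECTOR_FT:
        return "possible back of queue"
    if speed_mph >= HIGH_SPEED_MPH:
        return "possible dilemma zone"
    return "possible stop for red"
```

Events exactly 400 ft from the stop bar are grouped with the near category here; the text does not say how that boundary is handled.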

Figure 3 shows the number of weekday hard-braking events occurring at each intersection, stacked by distance from the stop bar, aggregated over the month of July 2019. For both NB and SB approaches, the majority of the hard-braking events occur within 400 ft of the stop bar. However, there are a few intersections (#8, Smith Valley Rd., in NB, and #4, Southport Rd., and #5, Wicker Rd., in SB) where more than 40% of hard-braking events occurred more than 400 ft from the stop bar.

Fig. 3 Number of weekday hard-braking events by intersection and distance upstream of stop bar

To understand the temporal nature of the hard-braking events and their distances from the stop bar, a heatmap was generated. Figure 4 illustrates a heatmap of the number of hard-braking events, during weekdays in July 2019, on the NB approach over a 24-h period (30-min bins) across two distance categories—less than 400 ft and greater than 400 ft. For the less than 400 ft category, the majority of hard-braking events occur during the AM, MD and PM plans (Fig. 4a), with no clear pattern or trend. For the 400–1320 ft range (Fig. 4b), there are generally fewer hard-braking events, except for perhaps intersection 8 during the PM plan.
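The counts behind such a heatmap can be assembled by binning each weekday event into one of 48 half-hour bins and one of the two distance categories. The sketch below assumes each event is a tuple of intersection number, timestamp, and distance from the stop bar; that layout and the function name are hypothetical.

```python
from collections import Counter
from datetime import datetime


def heatmap_counts(events):
    """Count weekday events per (intersection, distance band, 30-min bin).

    `events` is an iterable of (intersection_id, timestamp, distance_ft) tuples.
    Bin 0 covers 00:00-00:30 and bin 47 covers 23:30-24:00.
    """
    counts = Counter()
    for intersection, ts, distance_ft in events:
        if ts.weekday() >= 5:  # skip Saturday (5) and Sunday (6)
            continue
        bin_index = ts.hour * 2 + ts.minute // 30
        band = "<400 ft" if distance_ft < 400 else "400-1320 ft"
        counts[(intersection, band, bin_index)] += 1
    return counts
```

Each key of the returned counter corresponds to one heatmap cell, so the two panels of the figure are simply the two distance bands plotted separately.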

Fig. 4 Heatmap of weekday hard-braking events by intersection for northbound SR-37, in July 2019

Figure 5 shows a heatmap similar to Fig. 4, for the SB approach. Hard-braking events within 400 ft of the intersection (Fig. 5a) are generally higher for the PM plan, especially at intersection 8, Smith Valley Rd. Figure 5b, which comprises events occurring beyond 400 ft, shows a different pattern than the northbound approaches. Intersection 4, Southport Rd., and intersection 5, Wicker Rd., experience a large number of hard-braking events during the PM plan. This could be indicative of hard-braking events that occur at the back of long queues during the PM peak period.

Fig. 5 Heatmap of weekday hard-braking events by intersection for southbound SR-37, in July 2019

Analysis: Hard-Braking Patterns by Intersection

To further investigate the pattern of hard-braking events, histograms of the events over their distance from the stop bar, stacked by speed, are plotted for the different time-of-day plans. Figures 6 and 7 present two such patterns for intersections along the SR-37 corridor, for weekdays between 5:00 AM and 10:00 PM in July 2019.

Fig. 6 Southbound approach, SR-37 at Southport Road (Intersection 4)

Fig. 7 Southbound approach, SR-37 at Smith Valley Road (Intersection 8)

Figure 6 shows the hard-braking events at the southbound approach of intersection 4, Southport Rd. During the PM time plan (Fig. 6b), hard-braking events are occurring consistently for the entirety of the quarter-mile from the stop bar, with very few of those hard-braking events occurring at speeds over 45 mph. The aerial image in Fig. 6a shows that there are no driveways or bus stops in the region that could be contributing to these hard-braking events.

Figure 7 shows the hard-braking events at the southbound approach of intersection 8, Smith Valley Rd. The PM plan (Fig. 7b) stands out as having numerous hard-braking events within the 0–400 ft region. In some of the distance bins around 250 ft upstream of the intersection, over 60% of those hard-braking events occur at speeds above 45 mph. Dilemma zone protection is often difficult on coordinated movements as more phases compete for green time and coordinated phases are forced off.

Crash Events

Crash Data

Crash counts were aggregated by intersection using information gathered from Indiana’s online crash repository. Using the provided GPS information, crashes that were located along the corridor within 1320 ft of an intersection were assigned to that intersection. Crashes that were missing geolocation information were manually assigned to intersections on the study corridor, if applicable, by reading through the crash report's narrative.

In Indiana, during July 2019, 17,652 crashes were reported, of which 24 occurred along the roughly 6.5-mile study corridor. Ten of those 24 crashes occurred in the vicinity of an intersection. To perform a statistical correlation test, the crash time frame was increased to a 4.5-year period between January 1, 2016 and July 9, 2020. This increased the intersection crash count to 551 crashes, of which 391 were weekday crashes, and 261 of those weekday crashes were recorded as rear-end collisions.

Distribution of Crashes Among Intersections on Study Corridor

Figure 8 shows a stacked bar graph of the number of crashes categorized by manner of collision that occurred adjacent to the eight intersections along SR-37 on weekdays during the 4.5-year study period. The southbound approach of intersection 4, Southport Rd., stands out as having the most crashes (71 crashes) for the 4.5-year period. Of those 71 crashes, 70% were rear-end collisions. Likewise, the second and third highest crash count approaches, southbound intersection 5, Wicker Rd., and northbound intersection 8, Smith Valley Rd., have 75% and 65%, respectively, of their total crash count as rear-end crashes. Overall, 65% of the 391 recorded weekday crashes on this corridor were rear-end collisions.

Fig. 8 Number of weekday crashes by intersection and type on SR-37, between January 1, 2016 and July 9, 2020

Methodology

Similar to the hard-braking events, crashes are filtered by their different attributes. The crashes are characterized by their recorded manner of collision, distance from stop bar, and time of day. Finally, a statistical analysis is completed. The Spearman’s rank-order correlation, Pearson’s correlation, and Kendall’s correlation tests are applied to the hard-braking event data and the crash data for each intersection. Additionally, a sensitivity analysis is done, and a preliminary model is presented. Rear-end crashes represented the largest group of crashes among the eight intersections; therefore, the statistical analysis focuses on the comparison between hard-braking events and rear-end crashes.

Analysis: Crashes by Time of Day

Figure 9 presents a heatmap of weekday crashes aggregated over the study period. Crashes were binned by 30-min periods and assigned to their respective intersections. In the SB approach (Fig. 9b), intersection 4, Southport Rd., and intersection 5, Wicker Rd., stand out in the PM time frame as having a relatively large number of crashes. Visually this is similar to Fig. 5b where Southport Rd. and Wicker Rd. stood out as having a larger number of hard-braking events at a distance of greater than 400 ft from the stop bar.

Fig. 9 Heatmap of frequency of weekday crashes between January 1, 2016 and July 9, 2020

Correlation Between Hard-Braking Events and Crashes

Correlation Tests

In addition to the graphical visualizations highlighting similar patterns between crashes and hard-braking events, several correlation tests are performed to determine whether the two data sets are correlated. The aggregated July 2019 weekday hard-braking events occurring over a 30-min period are compared with the aggregated 4.5-year period rear-end crashes occurring over the same 30-min period. First, a simple Spearman rank-order correlation test (Spearman 1904) is conducted to evaluate the monotonic relationship between the paired data. The correlation coefficient, rs, represents the strength of that relationship. There are many interpretations in the literature (Dancey and Reidy 2007; Chan 2003) on coefficient thresholds, but this study utilizes a conservative interpretation suggested by Evans (1996) as seen in Table 1.

Table 1 Interpretation of correlation coefficient—Spearman

Tables 2 and 3 show the results of the Spearman test conducted at 95% and 99% confidence levels and highlight intersections with a strong correlation, for NB and SB, respectively. Results indicate a strong correlation between rear-end crashes and hard-braking events more than 400 ft from the stop bar at NB intersection 8, Smith Valley Rd., and at SB intersection 4, Southport Rd., and SB intersection 5, Wicker Rd. A check in the strong correlation box is used if the rs value exceeds the 0.6 threshold shown in Table 1.
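For readers who want to reproduce the rank-order test, the sketch below computes Spearman's rs as the Pearson correlation of average ranks, which handles the ties that are common in sparse 30-min crash counts. The verbal bands in `evans_label` are the commonly quoted Evans (1996) cut points and are assumed to match Table 1; only the 0.6 threshold for "strong" is stated in the text. All function names are hypothetical, and significance testing is not included.

```python
import math


def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; raises ZeroDivisionError if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman rank-order correlation of two equal-length sequences."""
    return pearson(average_ranks(x), average_ranks(y))


def evans_label(rs):
    """Verbal strength band for |rs| (assumed Evans 1996 cut points)."""
    a = abs(rs)
    if a < 0.20:
        return "very weak"
    if a < 0.40:
        return "weak"
    if a < 0.60:
        return "moderate"
    if a < 0.80:
        return "strong"
    return "very strong"
```

In this setting, `x` would hold the hard-braking counts per 30-min bin for one approach and distance band, and `y` the rear-end crash counts for the same bins.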

Table 2 Spearman’s correlation between intersection rear-end crash counts and number of hard-braking events by distance, for northbound SR-37
Table 3 Spearman’s correlation between intersection rear-end crash counts and number of hard-braking events by distance, for southbound SR-37

Interestingly, while SB intersection 8, Smith Valley Rd., experienced a high number of high-speed hard-braking events within 250 ft of the stop bar (Fig. 7b), this location does not exhibit the strong correlation with rear-end crashes that prior conflict models would suggest (Sharma et al. 2011).

Along with the Spearman’s rank-order correlation test, a Pearson’s and a Kendall’s correlation test are performed (Pearson 1895; Kendall 1990). Table 4 presents the coefficient interpretations used by this study for the Pearson’s and Kendall’s correlation tests (Mukaka 2012).

Table 4 Interpretation of correlation coefficient—Pearson and Kendall

The results for the Pearson’s and Kendall’s correlation test in SB direction are shown in Tables 5 and 6, respectively. The three intersections that are shown by the Spearman’s correlation test to have a strong correlation are also shown to have a moderate correlation by the Pearson’s and Kendall’s correlation tests. In addition to those three intersections, the Pearson’s correlation test also identifies SB intersection 1, Thompson Rd. and SB intersection 4, Southport Rd., as having a moderate correlation between number of hard-braking events and rear-end crashes in the under 400 ft region.
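The other two coefficients can be sketched in the same spirit. Pearson's r measures linear association on the raw counts, while Kendall's coefficient compares concordant and discordant pairs of observations. The tie-corrected tau-b variant is used below as an assumption, since the text does not state which Kendall variant was applied; the function names are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def kendall_tau_b(x, y):
    """Kendall's tau-b, which corrects the denominator for ties in x and in y."""
    concordant = discordant = tied_x_only = tied_y_only = 0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            dx, dy = x[i] - x[j], y[i] - y[j]
            if dx == 0 and dy == 0:
                continue  # tied in both variables: excluded from every term
            if dx == 0:
                tied_x_only += 1
            elif dy == 0:
                tied_y_only += 1
            elif dx * dy > 0:
                concordant += 1
            else:
                discordant += 1
    untied = concordant + discordant
    denom = math.sqrt((untied + tied_x_only) * (untied + tied_y_only))
    return (concordant - discordant) / denom
```

The pairwise loop is quadratic in the number of bins, which is negligible for the 48 half-hour bins per approach used here.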

Table 5 Pearson’s correlation between intersection rear-end crash counts and number of hard-braking events by distance, for southbound SR-37
Table 6 Kendall’s correlation between intersection rear-end crash counts and number of hard-braking events by distance, for southbound SR-37

Sensitivity Analysis

To determine if one month of hard-braking event data is sufficient to suggest a reasonable correlation between hard-braking events and crashes, a sensitivity analysis using Spearman’s correlation is performed. While this study primarily uses one month of hard-braking data collected from July 2019, the sensitivity analysis includes data from July and August 2019. Figure 10 shows the results of this analysis. The two plots in Fig. 10 show that the rs values plateau at around 4 weeks’ worth of data. This suggests that one month of hard-braking data is sufficient to result in a reliable correlation with over 4.5 years’ worth of crash data.
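One way to carry out such a sensitivity analysis is to accumulate the hard-braking counts week by week and recompute rs against the fixed crash counts after each added week. The sketch below assumes that procedure; the exact aggregation used in the study is not described, and the function names and input layout (one list of per-bin counts per week) are hypothetical.

```python
import math


def _ranks(values):
    """1-based ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def _spearman(x, y):
    """Spearman's rs as the Pearson correlation of the average ranks."""
    rx, ry = _ranks(x), _ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / math.sqrt(sxx * syy)


def cumulative_spearman(weekly_counts, crash_counts):
    """Return rs after 1, 2, ..., n weeks of accumulated hard-braking counts."""
    running = [0] * len(crash_counts)
    results = []
    for week in weekly_counts:
        running = [r + w for r, w in zip(running, week)]
        results.append(_spearman(running, crash_counts))
    return results
```

Plotting the returned list against the number of weeks gives a curve of the kind described above, where a plateau indicates that additional weeks no longer change the correlation appreciably.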

Fig. 10 Sensitivity analysis for Spearman correlation between hard-braking events and rear-end crashes for 8 weeks in July and August 2019

Statistical Modelling

To explore the relationship between the number of crashes and the number of hard-braking events, volumes, and other intersection attributes, a statistical model is developed. The response variable in this study, the number of crashes across the eight intersections by 30-min bins, consists of discrete, non-negative integers, which are typically modeled with a count data model. Commonly, these count data models are either a Poisson model or a negative binomial regression model (Washington 2010).

The Poisson model, which is often used to model rare-event count data, like crashes, requires the mean and variance of the response variable to be equal. When the variance is greater than the mean, there is over-dispersion in the data, which requires complex models such as the negative binomial model.

The Poisson model assumes the response variable Y has a Poisson distribution and that the logarithm of its expected value can be modeled as a linear function of the explanatory variables. The Poisson probability mass function for observation i is given by

$$\Pr \{ Y = y_{i} \} = \frac{{e^{ - \mu_{i} } \mu_{i}^{{y_{i} }} }}{{y_{i} !}},$$
(1)

where \(\mu_{i} > 0\) is the Poisson parameter, which equals both the mean and the variance of the response, \(E(Y) = VAR(Y) = \mu_{i}\). Typically, the relationship between the explanatory variables and the Poisson parameter is a log-linear model,

$$\mu_{i} = e^{{\beta X_{i} }} ,$$
(2)

where \(X_{i}\) is a vector of explanatory variables and β is a vector of estimable parameters.
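Equations (1) and (2) can be checked numerically with a few lines of code. The sketch below evaluates the log-linear mean and the Poisson probabilities in log space for numerical stability; the function names are hypothetical and the coefficient values in the usage note are made up for illustration, not estimates from this study.

```python
import math


def poisson_mean(beta, x):
    """Log-linear mean of (2): exp of the inner product of beta and x."""
    return math.exp(sum(b * v for b, v in zip(beta, x)))


def poisson_pmf(y, mu):
    """Probability of observing count y under (1), computed via logarithms."""
    return math.exp(-mu + y * math.log(mu) - math.lgamma(y + 1))


def moments(pmf_values):
    """Mean and variance of a distribution given as [P(0), P(1), ...]."""
    mean = sum(y * p for y, p in enumerate(pmf_values))
    var = sum((y - mean) ** 2 * p for y, p in enumerate(pmf_values))
    return mean, var
```

For instance, `moments([poisson_pmf(y, 2.0) for y in range(60)])` returns a mean and a variance that both equal 2 to numerical precision, which is the equidispersion property the Poisson model imposes, and `poisson_mean([0.5, 0.1], [1.0, 3.0])` evaluates exp(0.8) for a hypothetical intercept and one covariate.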

The data are considered over-dispersed when the variance of the response variable is larger than its mean. This can typically be modeled using a negative binomial model, which can be derived from (2) by adding a disturbance term. For each observation i,

$$\mu_{i} = e^{{(\beta X_{i} + \varepsilon_{i} )}} ,$$
(3)

where \(e^{\varepsilon_{i}}\) is a Gamma-distributed disturbance term with a mean of 1 and a variance of α (Washington 2010). The added disturbance term allows the variance and the mean to differ as shown below

$$VAR[y_{i} ] = E[y_{i} ][1 + \alpha E[y_{i} ]] = E[y_{i} ] + \alpha E[y_{i} ]^{2} .$$
(4)

The probability mass function for the negative binomial model is defined as

$$P(y_{i} ) = \frac{\Gamma ((1/\alpha ) + y_{i} )}{\Gamma (1/\alpha )\,y_{i} !}\left( \frac{1/\alpha }{(1/\alpha ) + \mu_{i} } \right)^{1/\alpha } \left( \frac{\mu_{i} }{(1/\alpha ) + \mu_{i} } \right)^{y_{i} } .$$
(5)
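The negative binomial probabilities in (5) and the variance relation in (4) can be verified numerically. The sketch below evaluates (5) with log-gamma functions to avoid overflow; the function name and the parameter values used in the usage note are illustrative assumptions.

```python
import math


def nb_pmf(y, mu, alpha):
    """Negative binomial probability of count y with mean mu and dispersion alpha."""
    inv = 1.0 / alpha
    log_p = (
        math.lgamma(inv + y) - math.lgamma(inv) - math.lgamma(y + 1)
        + inv * math.log(inv / (inv + mu))
        + y * math.log(mu / (inv + mu))
    )
    return math.exp(log_p)


def nb_moments(mu, alpha, max_count=400):
    """Mean and variance obtained by summing the pmf up to max_count."""
    probs = [nb_pmf(y, mu, alpha) for y in range(max_count)]
    mean = sum(y * p for y, p in enumerate(probs))
    var = sum((y - mean) ** 2 * p for y, p in enumerate(probs))
    return mean, var
```

With a mean of 2 and α = 0.5, `nb_moments(2.0, 0.5)` returns a mean of 2 and a variance of 4, matching E[y] + αE[y]² = 2 + 0.5 × 4. As α becomes very small, `nb_pmf` approaches the Poisson probabilities for the same mean.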

The Poisson model is the limiting case of the negative binomial model as α, also known as the over-dispersion parameter, approaches zero. The generalized linear model relates the mean \(\mu_{i}\) to the predictor vector \(X_{i}\) through a link function L, which for the log-linear form in (2) is the natural logarithm:

$$L(\mu_{i} ) = \beta X_{i} .$$
(6)

Table 7 shows the descriptive statistics for the model variables.

Table 7 Descriptive statistics for model variables

Both the Poisson and negative binomial models are considered. The variance of the response variable is larger than its mean, indicating that the data may be over-dispersed and favoring a negative binomial model. However, under the negative binomial model the over-dispersion parameter is not significant. Therefore, the Poisson model is selected.
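The first screening step described above, comparing the sample variance of the response with its mean, is straightforward to compute. The sketch below uses the Python standard library; the function name and the example counts in the test are hypothetical. A ratio above 1 only indicates possible over-dispersion in the raw counts. Whether it persists after conditioning on the explanatory variables is what the significance of α in the fitted negative binomial model tests.

```python
import statistics


def dispersion_ratio(counts):
    """Sample variance divided by sample mean of a list of counts."""
    mean = statistics.mean(counts)
    variance = statistics.variance(counts)  # unbiased (n - 1) sample variance
    return variance / mean
```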

Table 8 presents the estimation results. Of the seven variables, only two, hard-braking event count and volume, were found to be significant. The McFadden \(\rho^{2}\) is an indicator of the overall fit of the model and is given by

$$\rho^{2} = 1 - \frac{{{\text{LL}}(\beta )}}{{{\text{LL}}(0)}},$$
(7)

where LL(β) is the log-likelihood at convergence with a parameter vector β and LL(0) is the initial log-likelihood. \(\rho^{2}\) varies between zero and one, and a value closer to one indicates a better model. The \(\rho^{2}\) statistic for the given model is estimated to be 0.19, pointing to the preliminary nature of this model and the limited nature of the hard-braking event dataset. The parameters show that the number of rear-end crashes increases significantly with an increase in hard-braking event counts, which is fairly intuitive. Likewise, the model's parameters indicate that an increase in volume leads to an increase in rear-end crash counts.

Table 8 Estimation results
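The goodness-of-fit statistic in (7) follows directly from two log-likelihood values. The sketch below pairs it with a Poisson log-likelihood helper; the function names are hypothetical, and the log-likelihood values in the usage note are invented solely to reproduce a \(\rho^{2}\) of 0.19, since the actual values are not reported here.

```python
import math


def poisson_log_likelihood(y, mu):
    """Sum of Poisson log-probabilities for counts y and fitted means mu."""
    return sum(-m + yi * math.log(m) - math.lgamma(yi + 1) for yi, m in zip(y, mu))


def mcfadden_rho2(ll_beta, ll_zero):
    """McFadden rho-squared from the converged and initial log-likelihoods."""
    return 1.0 - ll_beta / ll_zero
```

For example, a converged log-likelihood of -81 against an initial log-likelihood of -100 gives `mcfadden_rho2(-81.0, -100.0)` = 0.19. Because both log-likelihoods are negative and the fitted model cannot fit worse than the initial one, the ratio lies between zero and one.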

Summary and Discussion

This study compared crash data over a period of 4.5 years (January 2016 to July 2020) at eight signalized intersections with one month of hard-braking data (July 2019) to determine if there was a statistical relationship between crashes and hard-braking events. Geospatial analysis was conducted on more than 6 million records to associate nearly 7,000 hard-braking events with the approaches on this corridor. Graphical illustrations comparing aggregated hard-braking events and crashes (Figs. 4, 5 and 9) demonstrated a visual relationship between the two data sets. Statistical tests showed that three intersections (8 NB, 4 SB, and 5 SB) had a strong correlation between rear-end crashes and hard-braking events occurring more than 400 ft from the stop bar (Tables 2 and 3). The same three intersections showed high hard-braking counts more than 400 ft from the stop bar in comparison with the rest of the corridor (Figs. 4b and 5b). This could indicate that the correlation between hard-braking events and rear-end crashes is stronger at locations where vehicles are hard-braking far away from the stop bar (perhaps due to long queues). Results from the sensitivity analysis showed that a sample size of at least 4 weeks of hard-braking events is needed to result in a reliable correlation with crash data (Fig. 10). Finally, results from the statistical modelling illustrated that the number of rear-end crashes increases significantly with increases in the number of hard-braking events and in volume (Table 8).

The correlation shown between rear-end crashes and hard-braking events is particularly beneficial to agencies because statistically valid data can be collected in a month or two, instead of waiting the traditional 3–5 years for a statistically significant number of crashes to occur. Histograms like those shown in Figs. 6 and 7 can provide agencies with a high-fidelity perspective on exactly where those events may be clustered to assess potential mitigation measures.

The techniques described in this paper are also scalable to larger numbers of intersections and corridors. Agencies could implement this method to assess all traffic signals within an urban area or an entire state. Such analysis would be a relatively modest effort, and perhaps more importantly, require no investment in traffic signal infrastructure to collect this performance measure data.