A comparative study on crash-influencing factors by facility types on urban expressway

This study aims to identify crash-influencing factors by facility type on the Nagoya Urban Expressway, considering the interaction of geometry, traffic flow, and ambient conditions. Crash rate (CR) models are first developed separately for four facility types: basic, merge, and diverge segments, and sharp curve. Traffic flows are then categorized, and, based on these traffic categories, the significance of the factors affecting crashes is analyzed using principal component analysis. The results reveal that the CR at merge segment is significantly higher than that at basic and diverge segments in uncongested flow, whereas it does not differ significantly among the three facility types in congested flow. In both uncongested and congested flows, sharp curve has the worst safety performance, with the highest CR. Regarding influencing factors, geometric design and traffic flow are the most significant in uncongested and congested flows, respectively. As mainline flow increases, the effect of the merging ratio on crashes increases at basic and merge segments, whereas the significance of the diverging ratio at diverge segment decreases. Meanwhile, longer acceleration and deceleration lanes are adverse to safety in uncongested flow, while shorter acceleration and deceleration lanes are adverse in congested flow. Owing to its special geometric design, crashes at sharp curve are highly associated with large centrifugal force and heavily restricted visibility.


Introduction
Improving traffic safety is an urgent worldwide issue. Crash characteristics and their influencing factors, as the theoretical basis for safety improvement, can guide policies and countermeasures aimed at mitigating hazardous conditions. To better understand crash-influencing factors, researchers have pursued an extensive array of approaches, the most prominent of which is crash data analysis [1]. Conventional approaches have established statistical links between crash rate (CR) and its explanatory factors [2,3]. In these analyses, traffic flows are generally represented by low-resolution data collected at a highly aggregated level, e.g., hourly or daily flows, and geometric features are primarily represented by classes of radius or slope [4,5]. Meanwhile, several studies have suggested that crashes are associated with the interaction of geometry, traffic flow, and ambient conditions [6]. However, most existing studies investigated these factors individually, and the related CR models were developed based on a single factor only. Moreover, aggregated analysis alone is inadequate for identifying the nature of individual crashes, since the conditions preceding individual crashes generally differ from each other [1].
Considering the insufficiency of the CR analysis above, some studies have tried to identify crash characteristics at the individual level in an effort to predict crash risk in real time [7][8][9]. These studies have analyzed the effect of traffic flow on crash risk in depth. In theory, real-time crash prediction holds great promise for proactive traffic management strategies for safety.
However, the combined effects of geometry, traffic flow, and ambient conditions on crashes have still not been well analyzed in these studies. Furthermore, these papers primarily developed a single crash model covering all traffic conditions, which conflicts with the fact that the influence of traffic flow on crashes may vary as traffic conditions change. In addition, although crash characteristics have been found to depend on facility type, where each facility type consists of uniform segments such as basic, merge, and diverge segments [2], existing studies have focused on entire intercity expressway routes without segmentation.
Another cause of the limited predictive performance of existing models is an inadequate analytic process [10]. For statistical methods, the significance and independence of explanatory variables should be established in advance to ensure reliable statistics. However, many previous studies paid little attention to this point and incorporated potential influencing factors directly into crash models.
Urban expressways are a common type of separated highway with full access control in large Japanese cities. An urban expressway generally consists of various facility types whose geometric features and traffic characteristics often differ from each other. Correspondingly, crash characteristics and their influencing factors may also differ by facility type. In addition, the crash characteristics of urban expressways and their influencing factors differ from those of intercity expressways [11]. Therefore, urban expressways deserve independent analysis, and their crash characteristics should be identified for each specific facility type.
Given these problems in existing studies, the objective of this paper is to investigate crash characteristics, based on CR models, and their influencing factors by facility type on an urban expressway. The causes are identified by considering the interaction of geometry, traffic flow, and ambient conditions, and geometric features are defined with the driver-vehicle-roadway interaction in mind. The significance of these factors in affecting crashes is compared across facility types using principal component analysis (PCA), and their influencing mechanisms are further discussed. In essence, this study can be regarded as a preparatory analysis toward a future crash risk prediction model.

Study sites
The test bed of this study is the Nagoya Urban Expressway network (NEX), shown in Fig. 1. As of December 31, 2009, the network was about 69.2 km × 2 (two directions) in total length, with over 250 ultrasonic detectors installed on the mainline at an average spacing of 500 m (range 250-750 m). Most routes are 4-lane roadways (2 lanes per direction), except the inner ring (Route no. R), which is a one-way roadway whose number of lanes varies from 2 to 5 with the ramp junctions. In limited areas, such as the links from other routes to the inner ring, small-radius curves are used. In this network, two recurrent bottlenecks are located along the Odaka line (Route no. 3).
Five databases are used in this study: (1) crash records with the occurrence time to the minute, the location to 0.1 km, and the weather and pavement conditions; (2) detector data, including traffic volume q, average speed v, and occupancy occ per 5 min; (3) geometric design data and detector locations to 0.01 km; (4) traffic regulation records for incidents (e.g., crashes, roadwork, and inclement weather), including the locations and periods of temporary lane and cross-section closures; and (5) daily sunrise and sunset times in Nagoya. Note that detector data are processed for the whole cross section in each direction. The data cover 3 years (2007-2009), except for the Kiyosu line (Route no. 6), which opened on December 1, 2007. Basic segments are extracted outside the 500 m upstream and downstream of ramp junctions, following Japanese practice [12]. Correspondingly, merge and diverge segments are the sections within 500 m upstream and downstream of on- and off-ramps, respectively. The segmentation method is shown in Fig. 2. In addition to these segments, NEX has a special geometric design: curves with small radius. Figure 3 shows CR statistics by curve radius; compared with other segments, CR is much higher on curves with radius smaller than 100 m. These curves are therefore defined as sharp curves and treated as another distinct facility type of NEX. Given the limited number of segment samples available, basic, merge, and diverge segments and sharp curve are analyzed in this study. The cross sections of the inner ring are diverse, and the lengths of the individual layouts (2-, 3-, 4-, or 5-lane) are insufficient for separate analysis. Meanwhile, all sharp curves along the inner ring are 2-lane roadways. Accordingly, only 2-lane segments are analyzed, and the geometric statistics by facility type are summarized in Table 1.
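The segmentation rule above can be sketched in Python as follows. This is a minimal illustration, not the authors' procedure: the `Location` record, its chainage field, and the ramp-position lists are hypothetical inputs, and the order in which the rules are checked (sharp curve first, then merge, then diverge) is an assumption, since the text does not say how overlapping zones are resolved.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

# Thresholds taken from the text; the precedence order below is an assumption.
RAMP_ZONE_M = 500.0           # merge/diverge zone: within 500 m up-/downstream of a ramp junction
SHARP_CURVE_RADIUS_M = 100.0  # curves with radius < 100 m are treated as sharp curves


@dataclass
class Location:
    position_m: float                 # hypothetical mainline chainage of the point
    curve_radius_m: Optional[float]   # None on tangent (straight) sections


def classify_facility(loc: Location,
                      on_ramps_m: Sequence[float],
                      off_ramps_m: Sequence[float]) -> str:
    """Return 'sharp_curve', 'merge', 'diverge' or 'basic' for one location."""
    if loc.curve_radius_m is not None and loc.curve_radius_m < SHARP_CURVE_RADIUS_M:
        return "sharp_curve"
    # Within 500 m of an on-ramp junction -> merge segment.
    if any(abs(loc.position_m - r) <= RAMP_ZONE_M for r in on_ramps_m):
        return "merge"
    # Same rule for off-ramp junctions -> diverge segment.
    if any(abs(loc.position_m - r) <= RAMP_ZONE_M for r in off_ramps_m):
        return "diverge"
    # Outside every ramp zone and not on a sharp curve -> basic segment.
    return "basic"


print(classify_facility(Location(1000.0, None), [1400.0], []))  # merge
```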

Detector data
Detectors can measure traffic only at their own locations. Therefore, a detector "coverage area" is defined for estimating traffic conditions at crash locations from detector data. At basic segments, the boundary between two consecutive coverage areas is the midpoint between two neighboring detectors. At merge and diverge segments, the boundary is at the ramp-junction point, so each segment is divided into upstream and downstream areas. Each sharp curve is matched with a single detector. Note that crash times are recorded by road administrators after the crash and do not correspond exactly to the actual occurrence time. For this reason, data in the short period just before the recorded time should be rejected to avoid mixing crash-influencing and crash-influenced data. Therefore, after invalid data and data within lane- and section-closure intervals are excluded, the latest data at least 5 min before the recorded time are used.
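A minimal Python sketch of the matching rules for basic segments, assuming detector and crash positions are given in km along one direction and each 5-min record is time-stamped by the end of its interval (both assumptions). Midpoint boundaries make the covering detector simply the nearest one; the ramp-junction boundaries used at merge and diverge segments are not modeled here. The function names are hypothetical.

```python
from typing import Optional, Sequence


def covering_detector(crash_km: float, detector_km: Sequence[float]) -> int:
    """Basic segment: coverage boundaries lie at midpoints between neighbouring
    detectors, so the covering detector is simply the nearest one."""
    return min(range(len(detector_km)), key=lambda i: abs(detector_km[i] - crash_km))


def precrash_interval(interval_end_min: Sequence[float],
                      valid: Sequence[bool],
                      crash_time_min: float,
                      lag_min: float = 5.0) -> Optional[int]:
    """Index of the latest valid 5-min interval that ends at least `lag_min`
    minutes before the recorded crash time, or None if there is none.
    `valid` is False for invalid records and for lane/section-closure periods."""
    cutoff = crash_time_min - lag_min
    best: Optional[int] = None
    for i, (end, ok) in enumerate(zip(interval_end_min, valid)):
        if ok and end <= cutoff and (best is None or end > interval_end_min[best]):
            best = i
    return best
```

For a crash recorded at minute 618, intervals ending at 600, 605, 610, 615, and 620 give a cutoff of 613, so the interval ending at 610 is used unless it has been flagged invalid.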

Geometric features
Design consistency is the conformance of a highway's geometry with driver expectancy; its importance to road safety can be understood through the driver-vehicle-roadway interaction [13], which by nature varies between locations. In this regard, geometric variation upstream of the crash location is proposed to reflect the effect of geometry on crashes. Considering the length of the detector coverage area, the following variables are extracted over a 500 m distance [12].
(1) Variation in road elevation h between the crash location and the point 500 m upstream, and the maximum elevation difference H within this 500 m distance (Fig. 4).
(2) Horizontal displacement S. A single radius cannot describe a section composed of several curves. On the other hand, centrifugal force is also associated with the horizontal displacement s in the direction of the tangent to curve j (Fig. 5). Therefore, the total horizontal displacement within the 500 m distance, $S = \sum_j s_j$, is adopted, where j is the ID of a curve, and R_j, θ_j, L_j, and s_j are the radius, central angle, arc length, and horizontal displacement of curve j, respectively.
(3) Index of centrifugal force I_CF. Centrifugal force is proportional to the square of speed v. This study therefore defines I_CF = S·v² to reflect the combined effect of speed v and horizontal displacement S; note that I_CF is an index, not the centrifugal force itself.
(4) Index of space displacement I_SD. I_SD = S·H is used in this study to represent the combined geometric features induced by horizontal and vertical variation.
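The four geometric variables can be sketched in Python as below. The equations the paper uses for s_j are not reproduced in this text, so the formula here is an assumption: each curve is treated as a circular arc, θ_j = L_j / R_j (radians), and s_j = R_j(1 - cos θ_j) is taken as the offset from the entry tangent. The sign convention for h (crash location minus the point 500 m upstream) is also assumed, and the units of I_CF and I_SD simply follow the input units.

```python
import math
from typing import Dict, Sequence, Tuple


def curve_offset(radius_m: float, arc_length_m: float) -> float:
    """s_j for one circular arc. ASSUMPTION: s_j = R_j * (1 - cos(theta_j)),
    the offset from the entry tangent, with theta_j = L_j / R_j in radians."""
    theta = arc_length_m / radius_m
    return radius_m * (1.0 - math.cos(theta))


def geometric_indices(curves: Sequence[Tuple[float, float]],
                      elevations_m: Sequence[float],
                      speed_kmh: float) -> Dict[str, float]:
    """curves: (R_j, L_j) of each curve portion within the 500 m window.
    elevations_m: profile from 500 m upstream (first) to the crash location (last)."""
    S = sum(curve_offset(r, l) for r, l in curves)   # S = sum_j s_j
    h = elevations_m[-1] - elevations_m[0]           # sign convention assumed
    H = max(elevations_m) - min(elevations_m)        # maximum elevation difference
    return {"S": S, "h": h, "H": H,
            "I_CF": S * speed_kmh ** 2,              # I_CF = S v^2 (an index, not a force)
            "I_SD": S * H}                           # I_SD = S H
```

A quarter-circle of radius 100 m (θ_j = π/2) gives s_j = 100 m under this assumption, so at 60 km/h the sketch yields I_CF = 100 × 60².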
The geometric data above are collected every 0.1 km, since crash locations are recorded in units of 0.1 km. These data are also extracted at each detector location, which is the common link between crash and detector data. Table 2 summarizes the data collection process.

Ambient conditions
Common, prevailing, and uncontrolled environmental and weather conditions are defined as ambient conditions. They are (1) ambient light, classified into daytime (from sunrise to sunset) and nighttime (from sunset to sunrise); (2) weather at the time of the crash (sunny, cloudy, or rainy); (3) pavement condition at the crash location (dry or wet); and (4) day type of the crash day (holiday or weekday). Here, holidays include all weekends and all national and traditional holidays in Japan, such as Golden Week in May and the Obon week in August.
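The ambient conditions can be encoded as 0/1 dummy variables, as sketched below in Python. The coding direction (1 = nighttime, wet pavement, or holiday), the function name, and the `holidays` set of national and traditional holiday dates are illustrative assumptions; in practice the sunrise and sunset times would come from database (5).

```python
from datetime import date, datetime, time
from typing import Dict, Set


def ambient_dummies(crash_dt: datetime, sunrise: time, sunset: time,
                    wet_pavement: bool, holidays: Set[date]) -> Dict[str, int]:
    """0/1 coding of ambient conditions (1 = nighttime / wet pavement / holiday);
    the coding direction is an assumption."""
    # Daytime is sunrise <= t < sunset; everything else is nighttime.
    night = 0 if sunrise <= crash_dt.time() < sunset else 1
    # Holidays = all weekends plus the listed national/traditional holidays.
    holiday = 1 if crash_dt.weekday() >= 5 or crash_dt.date() in holidays else 0
    return {"night": night, "wet": int(wet_pavement), "holiday": holiday}
```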

Data matching
The related detector data, geometric features, and ambient conditions for individual crashes are matched as exemplified in Table 3. The crashes matching with invalid detector data and within lane and cross-section closure intervals are excluded in advance. As a result, a total of 1,591 crashes remain for the following analysis.

Classification of traffic conditions
Congested flow, characterized by traffic oscillation, has features different from those of uncongested flow, so the two traffic regimes must be distinguished. Figure 6 shows the traffic volume-speed diagram at the Horita on-ramp junction, a typical bottleneck in NEX. The speed of 60 km/h, corresponding to the maximum flow, is defined as the critical speed v_c for classifying uncongested and congested flows [2,14]. The corresponding value at another bottleneck (Takatsuji on-ramp junction) is also found to be around 60 km/h. The same value of 60 km/h is adopted at basic and diverge segments, since virtually no bottlenecks are found at these segments in NEX. At sharp curves, a threshold speed of 45 km/h is selected for classifying the two traffic regimes based on the traffic flow-speed diagram at Tsurumai curve (Fig. 7). This value was further checked at other sharp curves and found to be generally reliable for classifying uncongested and congested flows.
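The regime classification reduces to a per-facility speed threshold. The minimal Python sketch below uses the critical speeds stated above; treating a speed exactly equal to v_c as uncongested is an assumption, as are the facility-type keys.

```python
# Critical speeds from the text: 60 km/h (basic, merge, diverge), 45 km/h (sharp curve).
CRITICAL_SPEED_KMH = {"basic": 60.0, "merge": 60.0, "diverge": 60.0, "sharp_curve": 45.0}


def traffic_regime(facility: str, speed_kmh: float) -> str:
    """Congested if the 5-min average speed is below the critical speed v_c.
    Treating v == v_c as uncongested is an assumption."""
    return "congested" if speed_kmh < CRITICAL_SPEED_KMH[facility] else "uncongested"
```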
To reflect the variation in traffic characteristics, each traffic regime is further sub-classified. Speed varies widely at low flow rates (see Figs. 6 and 7), and occupancy is not a commonly used index. Therefore, traffic density k, calculated by Eq. (3), is used as the measure of effectiveness for further classifying traffic conditions. In view of the number of crash samples available, the aggregation intervals of k are set to 10 and 30 veh/km for uncongested and congested flows, respectively.
$$ k_{ei} = \frac{q_i}{v_i} \qquad (3) $$
where q_i and v_i denote the traffic flow and average speed in 5-min interval i, respectively, and k_ei is the calculated traffic density in this interval.
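A minimal Python sketch of Eq. (3) and the density classes. It assumes q is converted from a 5-min count to an hourly rate (×12) so that q/v comes out in veh/km with v in km/h, and that the 10 and 30 veh/km classes start at 0; both are assumptions not stated in the text.

```python
import math


def hourly_rate(count_5min: float) -> float:
    """Convert a 5-min vehicle count to an hourly flow rate (veh/h)."""
    return count_5min * 12.0


def density(q_vehph: float, v_kmh: float) -> float:
    """Eq. (3): k = q / v, giving veh/km when q is in veh/h and v in km/h."""
    return q_vehph / v_kmh


def density_class(k: float, congested: bool) -> float:
    """Lower edge of the density class: 10 veh/km bins (uncongested) or
    30 veh/km bins (congested); anchoring the bins at 0 is an assumption."""
    width = 30.0 if congested else 10.0
    return math.floor(k / width) * width
```

For example, 150 vehicles in 5 min at 60 km/h gives k = 1800 / 60 = 30 veh/km.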

Calculation of crash rate (CR)
CR for traffic condition n is calculated as
$$ CR_n = \frac{NOC_n}{\sum_l Q_{nl} L_l} $$
where n and l are the IDs of the traffic condition and the coverage area, respectively; NOC_n is the number of crashes in traffic condition n; and Q_nl L_l is the vehicle kilometers traveled (VKMT) in detector coverage area l in traffic condition n.
PCA is a powerful tool for reducing a large number of observed variables to a small number of artificial variables that account for most of the variance in the dataset [15]. Through an orthogonal transformation, a set of observations of possibly correlated variables is converted into a set of values of linearly uncorrelated variables, called principal components. Technically, a principal component is a linear combination of optimally weighted observed variables [15], and the components are ranked by the amount of total variance in the observed variables that they explain. Two criteria are generally used to select the number of components to extract:
(1) 80 % rule: the extracted components should together explain at least 80 % of the variance in the original dataset.
(2) Eigenvalue rule: only components with eigenvalues greater than 1.0 are retained.
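The two selection rules can be illustrated with NumPy. This is a minimal sketch, assuming PCA is run on the correlation matrix (i.e., on standardized variables) and that loadings are defined as eigenvectors scaled by the square root of their eigenvalues; the paper does not state which software or variant was used, and the function name is hypothetical.

```python
import numpy as np


def pca_on_correlation(X: np.ndarray):
    """PCA of standardized variables via the correlation matrix.
    X has one row per crash and one column per explanatory variable."""
    R = np.corrcoef(X, rowvar=False)          # correlation = covariance of z-scores
    eigvals, eigvecs = np.linalg.eigh(R)      # eigh: R is symmetric; ascending order
    order = np.argsort(eigvals)[::-1]         # sort components by explained variance
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    ratio = eigvals / eigvals.sum()           # share of total variance per component
    # (1) 80 % rule: smallest number of components whose cumulative share >= 0.80
    n_80 = int(np.searchsorted(np.cumsum(ratio), 0.80) + 1)
    # (2) Eigenvalue rule: keep components with eigenvalue > 1.0
    n_eig = int(np.sum(eigvals > 1.0))
    # Loadings = correlations between the variables and the components
    loadings = eigvecs * np.sqrt(np.clip(eigvals, 0.0, None))
    return eigvals, ratio, n_80, n_eig, loadings
```

With two independent signals, each observed twice with small noise, both rules retain two components.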

Crash rate estimation models
In the following, differences in crash characteristics by facility type are investigated by comparing CR models based on traffic conditions. Figure 8 shows the CR trend with respect to traffic density k by facility type in uncongested flow. Sharp curve clearly behaves differently from the other segments: its CR is the highest of the four facility types at low densities and then decreases as k increases, whereas the CR at the other segments increases with k. This may be related to the small radius of sharp curves, which produces a high centrifugal force that acts on the vehicle and pushes it toward the outside of the curve; higher speed results in a higher centrifugal force. Among the other segments, CR at merge segment increases rapidly at high densities and becomes much higher than at basic and diverge segments. The paired t-test results for the three facility types (Table 4) also show that CR at basic and diverge segments is significantly lower than that at merge segment, while basic and diverge segments do not differ significantly from each other. At merge segment, merging maneuvers can cause slowdowns and lane changes in mainline traffic. These interruptions may increase the possibility of vehicle conflicts, and this possibility can further increase with k. Table 5 summarizes the CR regression models as functions of k, together with their goodness of fit, for the four facility types. The model is a power function at sharp curve and a quadratic function at the other facility types. All models and variables are significant at the 95 % confidence level (not shown in Table 5). Among the quadratic models, CR at merge segment is the most sensitive to k: per unit increase in k, its CR increases more than three times as much as at basic and diverge segments.
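The CR definition and the two model forms in Table 5 can be sketched with NumPy. This is an illustration under stated assumptions: CR is left in crashes per vehicle-km (any scaling, such as per 10^8 vehicle-km, is omitted), and the power model is fitted by ordinary least squares on log-transformed values, which is one common choice but not necessarily the authors' estimation method. Function names are hypothetical.

```python
from typing import Sequence, Tuple

import numpy as np


def crash_rate(n_crashes: int, flows_veh: Sequence[float], lengths_km: Sequence[float]) -> float:
    """CR_n = NOC_n / sum_l(Q_nl * L_l): crashes per vehicle-km in traffic condition n."""
    vkmt = sum(q * l for q, l in zip(flows_veh, lengths_km))
    return n_crashes / vkmt


def fit_quadratic(k, cr) -> np.ndarray:
    """CR = a k^2 + b k + c by least squares; returns [a, b, c]."""
    return np.polyfit(np.asarray(k, dtype=float), np.asarray(cr, dtype=float), 2)


def fit_power(k, cr) -> Tuple[float, float]:
    """CR = a k^b, fitted as a straight line in log-log space (requires k, CR > 0)."""
    b, log_a = np.polyfit(np.log(np.asarray(k, dtype=float)), np.log(np.asarray(cr, dtype=float)), 1)
    return float(np.exp(log_a)), float(b)
```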
Figure 9 shows the differences in the CR distribution with respect to k by facility type in congested flow. CR increases with k at all four facility types. Compared with the other segments, sharp curve still has the highest CR in congested flow, although no statistical regression model is developed for this facility type because of the limited number of crash samples. Since the differences in CR trends among the other segments are not clear in Fig. 9, a paired t-test is conducted, as shown in Table 6. The results indicate that there is no significant difference in CR among basic, merge, and diverge segments in congested flow. Accordingly, as shown in Table 7, an exponential function is adopted, and it fits the combined CR trend well. The model and its variables are also significant at the 95 % confidence level, although these results are not shown in Table 7.
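The paired comparison and the exponential model can be sketched in Python as follows. Pairing the CR values of two facility types by density class is an assumption about how the paired t-test was set up, and the exponential fit via least squares on ln CR is one common approach, not necessarily the paper's. For p-values, `scipy.stats.ttest_rel(a, b)` computes the same paired statistic; the sketch below avoids the SciPy dependency and returns only the t statistic and degrees of freedom.

```python
import math
from statistics import mean, stdev
from typing import Sequence, Tuple

import numpy as np


def paired_t(x: Sequence[float], y: Sequence[float]) -> Tuple[float, int]:
    """Paired t statistic and degrees of freedom for matched CR values
    (pairing by density class is an assumption)."""
    d = [a - b for a, b in zip(x, y)]
    n = len(d)
    return mean(d) / (stdev(d) / math.sqrt(n)), n - 1


def fit_exponential(k, cr) -> Tuple[float, float]:
    """CR = a * exp(b * k), fitted as a line in (k, ln CR); requires CR > 0."""
    b, ln_a = np.polyfit(np.asarray(k, dtype=float), np.log(np.asarray(cr, dtype=float)), 1)
    return float(np.exp(ln_a)), float(b)
```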

Effects of influencing factors
The analyses above reveal that CR characteristics differ by facility type, which may be related to differences in geometric design and traffic characteristics. However, CR analysis cannot examine a variety of factors within a single model. Instead, PCA is applied, and the mechanisms by which individual factors affect crashes are further investigated. Table 8 lists the individual variables together with their types and some summary statistics. Because the traffic flow diagram is two-dimensional, k and v are used together to describe traffic conditions. As for geometric features, h, I_CF, and I_SD are selected to reflect vertical, horizontal, and combined geometric variation, respectively. Dummy variables, which take the value 0 or 1, are used to incorporate ambient conditions into PCA. Weather conditions (more than two categories) are represented by pavement conditions (two categories), since the two are usually highly correlated. At merge and diverge segments, ramp traffic is a significant influencing factor on crashes [16]. This study uses the ramp flow ratio to represent the interaction between ramp and mainline traffic. The merging ratio (MR) and the diverging ratio (DR) are defined as the proportion of on-ramp and off-ramp traffic, respectively, in the sum of ramp and mainline traffic. Meanwhile, the length of the acceleration lane L_A and the length of the deceleration lane L_D are adopted to represent the space available for merging and diverging maneuvers, respectively.
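The ramp flow ratios and a PCA input row for a merge segment can be sketched as below. Which mainline detector supplies the mainline flow, and the variable order in the hypothetical `MERGE_VARIABLES` tuple, are assumptions; the list follows the variables named in the text, with pavement standing in for weather.

```python
from typing import Dict, List

# Hypothetical variable order for a merge-segment PCA input row.
MERGE_VARIABLES = ("k", "v", "h", "I_CF", "I_SD", "MR", "L_A", "night", "wet", "holiday")


def ramp_ratio(ramp_flow: float, mainline_flow: float) -> float:
    """MR (on-ramp) or DR (off-ramp): ramp flow / (ramp flow + mainline flow)."""
    total = ramp_flow + mainline_flow
    return ramp_flow / total if total > 0 else 0.0


def merge_row(record: Dict[str, float]) -> List[float]:
    """Arrange one crash record into the (assumed) merge-segment variable order."""
    return [float(record[name]) for name in MERGE_VARIABLES]
```

For example, 300 veh/h merging into 1200 veh/h of mainline traffic gives MR = 300 / 1500 = 0.2.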

PCA among various facility types
Because PCA rotates the data through a linear transformation, it can reflect only monotonic effects of factors. For this reason, uncongested flow is further classified into low- and high-density conditions at approximately 25 veh/km, the value of k at which CR is minimal (Fig. 10), since the CR model has different monotonicities in the two conditions. As a result, three traffic conditions are analyzed: low- and high-density uncongested flow, and congested flow. Table 9 shows the PCA results at basic segment in low-density uncongested flow. Following the rules introduced in Sect. 3.4, four components are selected, and, as the cumulative percent shows, together they explain at least 80 % of the variance in the original dataset.
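For a quadratic CR model, the density at minimum CR is the vertex of the parabola, k* = -b / (2a), which gives a natural split point between the two uncongested sub-conditions. The minimal Python sketch below illustrates this; any coefficients passed to it are hypothetical, since the Table 5 values are not reproduced here, and the default split of 25 veh/km is the approximate value stated in the text.

```python
def density_at_min_cr(a: float, b: float) -> float:
    """Vertex of CR = a*k**2 + b*k + c; it is a minimum only when a > 0."""
    if a <= 0:
        raise ValueError("CR has no interior minimum unless a > 0")
    return -b / (2.0 * a)


def uncongested_subclass(k: float, k_split: float = 25.0) -> str:
    """Low-/high-density split of uncongested flow (about 25 veh/km in the text)."""
    return "low-density" if k < k_split else "high-density"
```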

Low-density uncongested flow
In low-density uncongested flow, crashes at basic segment are found to be significantly associated with geometric variation (I_CF and I_SD), traffic density along with ambient light, speed coupled with pavement condition, and vertical variation h. Geometric variation is the 1st component, as large variation may require frequent speed reductions, which makes it harder for drivers to control their vehicles. At low traffic density k, drivers' attention tends to be low, and some discretionary behaviors may occur; combined with poor ambient light, this condition may increase crash risk. Meanwhile, because tire-pavement friction is reduced on wet pavement, high speed v combined with wet pavement can reduce road-holding. Here, k and v form two separate components, which further indicates that the two variables are not highly correlated at low flow rates. In addition, vertical variation h has a positive loading because of increased visibility restriction and the difficulty drivers have in maintaining vehicle control.
Principal components at the other segments are analyzed as shown in Table 10. The variables significantly related to each component are selected based on their loadings. To compare the relative significance of the same component across facility types, the percent of variance explained by each component is also provided.
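Selecting the variables "significantly related" to each component and reporting each component's share of variance can be sketched in Python as below. The cutoff of |loading| ≥ 0.5 is an assumed rule of thumb, since the text does not state the threshold used, and the function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def salient_variables(loadings: Sequence[Sequence[float]], names: Sequence[str],
                      threshold: float = 0.5) -> List[List[Tuple[str, float]]]:
    """For each component (column), list the variables whose |loading| >= threshold,
    strongest first. loadings[i][j] is the loading of variable i on component j.
    The 0.5 cutoff is an assumption."""
    result = []
    for j in range(len(loadings[0])):
        picked = [(names[i], loadings[i][j]) for i in range(len(names))
                  if abs(loadings[i][j]) >= threshold]
        picked.sort(key=lambda p: -abs(p[1]))
        result.append(picked)
    return result


def percent_variance(eigenvalues: Sequence[float]) -> List[float]:
    """Percent of total variance explained by each component (all eigenvalues given)."""
    total = sum(eigenvalues)
    return [100.0 * e / total for e in eigenvalues]
```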
One difference between merge and basic segments is that MR, combined with the length of the acceleration lane L_A, becomes a principal component at merge segment. Day type is also found to be significant. In terms of the percent of variance explained by the components, geometric variation is less significant than at basic segment. Merging traffic is an important influencing factor, since it can interrupt mainline traffic, and such interruption may become stronger as MR increases. In addition, a longer L_A provides more space for ramp and mainline traffic to adjust to merging behaviors. The influence of day type on crashes may be related to differences in vehicle composition and driver population between holidays and weekdays, although this influence requires further study of vehicle behaviors at merge segment. As for geometric variation, on-ramps in NEX are generally located far from poor alignments such as small-radius curves. It is therefore reasonable that geometric variation is less significant at merge segment than at basic segment. At diverge segment, the most significant difference from basic and merge segments is that DR and vertical variation h are related to the 1st component. A higher DR can significantly interrupt mainline traffic, since diverging vehicles must change lanes across the mainline to reach the deceleration lane. Furthermore, a larger h can make lane-changing maneuvers more difficult.
In general, sharp curve has much worse design consistency than the other segments. Crashes at sharp curve are found to be associated with poor vertical consistency (I_SD and h), high horizontal variation I_CF along with speed v, low traffic density k at nighttime, wet pavement, and holidays. In NEX, sharp curves are often designed to connect routes at different elevations, so their vertical consistency is fairly poor. A small radius combined with high v may cause a notable centrifugal force. The mechanisms of the other components are similar to those at basic, merge, and diverge segments.

High-density uncongested flow
As traffic density increases, the inter-vehicle interaction gets more intensive. The corresponding results of PCA in high-density uncongested flow are summarized in Table 11. All of the components are of statistical significance.
In high-density uncongested flow, traffic-related variables, including k and v, clearly form an independent component, reflecting the increased interaction between vehicles. Furthermore, the loadings indicate that high density, rather than low density, is adverse to safety. This finding supports the results of the CR models: CR decreases with k in low-density uncongested flow and increases with k in high-density uncongested flow.
With respect to differences by facility type, at merge segment MR becomes related to the 1st component, because ramp traffic causes more interruption as traffic density increases. As traffic density increases, DR becomes less significant than geometric variation at diverge segment. However, L_D is more important in high-density uncongested flow than in low-density uncongested flow. Once drivers perceive that lane-changing in the diverging area is difficult, they may move onto the lane nearest the off-ramp in advance, upstream of the diverging area. As a result, the impact of lane-changing maneuvers on mainline traffic becomes relatively low. At sharp curve, crashes are still found to be more probable as k decreases, consistent with the trend of the CR model.

Congested flow
As traffic density increases further, congested flow appears. In the same way, Table 12 summarizes the PCA results by facility type in congested flow. Table 12 demonstrates that the effect of traffic flow on crashes becomes more important in congested flow than in uncongested flow. At all facility types except merge segment, traffic flow is the most significant factor affecting crashes. Based on the percent of variance, the influence of geometric design decreases further.
Regarding differences by facility type, crashes at merge segment are found to be positively associated with a shorter L_A, rather than a longer L_A. In congested flow, a shorter L_A may make it more difficult to adjust speed adequately for merging and lane-changing maneuvers. In addition, the loading of day type shows that weekdays, rather than holidays, are a significant factor. This is likely related to the higher percentage of heavy vehicles on weekdays, which may induce more frequent shockwaves in congested flow. As at merge segment, weekday is also a significant factor at diverge segment, and a shorter L_D, rather than a longer one, is adverse to safety. At sharp curve, poor ambient light can significantly restrict visibility, which is critical when driving with small inter-vehicle spacing. Thus, ambient light becomes a more important factor in congested flow than in high-density uncongested flow.
From the analyses above, geometric features are found to be the most significant influencing factor in uncongested flow. In this sense, the different CR characteristics by facility type in uncongested flow may be largely associated with variation in geometry. Poor design consistency induced by a small radius is the potential cause of the highest CR at sharp curve. Ramp traffic can interrupt mainline traffic, and a longer acceleration lane may provide a longer interruption area; both features can increase crash risk at merge segment. Many diverging vehicles may move onto the lane nearest the deceleration lane in advance, upstream of the diverging area, since urban expressways carry many commuters and many drivers are familiar with the road structure. Hence, even though DR and L_D are found to be significant influencing factors, CR at diverge segment is not significantly higher than that at basic segment.
As traffic density increases, the effects of traffic-related variables increase and become more significant than geometry in congested flow. In this condition, once a breakdown initiates at a bottleneck, it can propagate to upstream sections that may consist of several facility types, where traffic conditions are not significantly different. As a result, the difference in CR characteristics among basic, merge, and diverge segments is not significant. However, owing to the heavily restricted visibility induced by its special geometric design, sharp curve still has a higher CR than the other facility types.

Conclusions and future work
This paper identified the different CR characteristics by facility type on the Nagoya Urban Expressway. In uncongested flow, CR at basic, merge, and diverge segments is convex (U-shaped) with respect to traffic density, whereas CR at sharp curve decreases as density increases. In congested flow, CR increases with traffic density at all four facility types. In both uncongested and congested flows, sharp curve has the worst safety performance, with the highest CR of the four facility types. Among the other segments, merge segment has a higher CR than basic and diverge segments in uncongested flow, whereas CR does not differ significantly among the three facility types in congested flow.
The causes of these differences were further investigated by focusing on traffic conditions and considering the interaction of geometry, traffic flow, and ambient conditions. In general, geometric features are the most significant factors in uncongested flow. As traffic density increases, the effects of traffic-related variables increase and become the most significant in congested flow. For ramp traffic, the significance of MR increases as mainline flow increases, whereas the significance of DR decreases. In addition, longer L_A and L_D are adverse to safety in uncongested flow, while shorter L_A and L_D are adverse in congested flow. Crashes at sharp curve are highly associated with the effects of its special geometric design, such as large centrifugal force and heavily restricted visibility.
The potential benefits of integrating these findings into safer geometric design and traffic control are numerous. The analysis can provide a basis for safety audits of geometric design with respect to design consistency. Meanwhile, based on the estimated CR models, road administrators can easily estimate how the safety performance of a given facility type changes with traffic conditions. Furthermore, the PCA results may help prioritize countermeasures and estimate the safety performance of an adopted countermeasure.
For more accurate analysis of crash characteristics, data with smaller time windows, e.g., 1 min or even 30 s, are highly recommended to improve the reliability of statistics, since crash occurrence is significantly associated with short-term turbulence in traffic flow [1]. Furthermore, the effect of inter-lane interaction on crashes should be examined if lane-based data are available. In this study, ramp traffic was found to play a significant role in safety at merge and diverge segments; thus, a microscopic analysis of driver behavior is needed. In essence, PCA is a qualitative analysis, and its results are insufficient for applying specific countermeasures in a given case. Future studies are expected to quantify the effects of various influencing factors on crashes.