Application of multinomial and ordinal logistic regression to model injury severity of truck crashes, using violation and crash data

In 2016 alone, around 4000 people died in crashes involving trucks in the USA, with 21% of these fatalities involving only single-unit trucks. Much research has identified the underlying factors for truck crashes. However, few studies have examined the factors unique to single-truck and multiple-vehicle crashes, and none have examined the underlying factors of severe truck crashes in conjunction with violation data. The current research assessed all of these factors using two approaches to improve truck safety. The first approach used ordinal logistic regression to investigate the contributory factors that increased the odds of severe single-truck crashes and severe multiple-vehicle crashes involving at least one truck. The literature has indicated that past violations can be used to predict future violations and crashes. Therefore, the second approach used risky violations related to truck crashes to identify the contributory factors to those violations and, in turn, to truck crashes. The driver actions of failure to keep proper lane, following too close, and driving too fast for conditions accounted for about 40% of all the truck crashes. Therefore, the violations corresponding to these driver actions were included in the analysis. Based on ordinal logistic regression, the analysis for the first approach indicated that being under non-normal conditions at the time of the crash, driving on dry-road conditions, and having a distraction in the cabin are some of the factors that increase the odds of severe single-truck crashes. On the other hand, speed compliance, alcohol involvement, and posted speed limits are some of the variables that impacted the severity of multiple-vehicle, truck-involved crashes. With the second approach, the violations related to risky driver actions, which were underlying causes of severe truck crashes, were identified, and an analysis was run to identify the groups at increased risk of truck-involved crashes. The violation results indicated that being a nonresident, driving during off-peak hours, and driving on weekends could increase the risk of truck-involved crashes. This paper offers insight into the capability of using violation data, in addition to crash data, to identify possible countermeasures to reduce crash frequency.


Introduction
The importance of the trucking industry to the economy is well acknowledged. Each year, trucks move 80% of all freight in the USA, accounting for over $700 billion worth of goods [1]. The nation's trucking industry moves about 10.5 billion tons every year, a figure that is expected to increase to 27 billion tons by 2040 [2]. In addition, the industry employs seven million people.
However, trucks are involved in a large number of crashes every year. These crashes place a huge burden on the nation in terms of death and injury. According to the Federal Motor Carrier Safety Administration (FMCSA), there were 667 truck occupant deaths (drivers and passengers), 398 of which occurred in single-vehicle crashes [3].
Wyoming has the highest fatality rate (24.7 deaths per 100,000 population) in the USA [4]. In addition, Wyoming has the highest truck crash rate in the nation: 0.52 crashes per million vehicle miles traveled (MVMT), compared with the national average of 0.26 crashes/MVMT [5]. The high truck crash and fatality rates result from the large volume of through truck traffic on Wyoming interstates, adverse weather conditions, and mountainous geometric conditions, for example steep grades and high mountains [6,7].
However, truck crashes can be reduced by improving truck safety. This improvement can be achieved through policies and regulations that enhance the performance of the trucking industry without compromising safety. The performance of the Wyoming Highway Patrol (WHP) in reducing truck-related crashes, and consequently road safety, could be enhanced by identifying the factors that increase the odds of future violations and, consequently, future crashes [8,9]. Thus, this study incorporates violation data, in addition to crash data, to identify the contributory factors to the violations that are likely to increase the odds of future severe truck crashes. Identifying these factors can help the WHP place more emphasis on the contributory factors of risky violations that result in truck crashes.
Out of 700 truck occupant deaths that occur annually in the USA, 60% occur in single-truck crashes [10]. Important factors are unique to single- and multiple-vehicle crashes involving trucks [11-13]. Therefore, this study analyzed single-truck crashes and multiple-vehicle crashes with truck involvement separately. This study also incorporated violation data to identify the groups, defined by driver, vehicle, and temporal characteristics, at higher risk of severe truck crashes. For the purpose of this study, a truck is defined as a commercial vehicle with a gross vehicle weight rating greater than 4536 kg (10,000 lb).

Literature review
Identification of contributory factors to truck crashes is a vital part of traffic safety improvement [14]. The causes of large truck crashes can be attributed to driver factors (87%), which comprise non-performance (12%), recognition (28%), decision (38%), and performance (9%) errors, and to vehicle factors (10%) [15]. Because a vast body of research has studied truck crashes, this literature review first focuses on studies identifying the factors that contribute to severe truck crashes and then discusses the importance of studying violation data to improve truck safety more efficiently.
Duncan et al. [16] used the ordered probit model to identify the impacts of different factors on injury severity in truck-involved crashes. The results indicated that driving on dry-road conditions and on non-congested roads are among the factors that increase the odds of severe crashes. Khattak et al. [10] used crash data to investigate the impact of truck rollovers and occupant injuries in single-vehicle crashes. The results showed that higher risk factors in single-truck crashes include risky driving, speeding, alcohol and drug use, traffic control violations, truck exposure to dangerous road geometry, and trucks that transport hazardous materials.
Zhu and Srinivasan [17] examined the factors impacting the severity in truck crashes. Truck driver distraction, alcohol use, and emotional factors of car drivers were some of the factors that were associated with higher severity of crashes. The contributory factors to injury severity of truck crashes in urban areas were investigated by Pahukula et al. [18]. The results indicated different time periods in a day have different contributing impacts on truck crash severity.
Although most studies examined the severity of multiple-vehicle and single-truck crashes as a whole, few studies identified the variables unique to single-truck crashes and to multiple-vehicle, truck-involved crashes. Zou et al. [19] investigated the differences between single-vehicle and multiple-vehicle truck crashes. The results indicated that there are substantial differences between the factors affecting single- and multiple-vehicle truck crashes. Thus, this study examined truck crash severity separately for single-vehicle and multiple-vehicle truck crashes.
Much research has identified a relationship between previous violations and the risk of involvement in future crashes. For instance, a study carried out by Li and Baker [20] showed that conviction records can be used to pinpoint groups with a higher risk of involvement in fatal crashes. Similarly, Elliott [21] studied the ability of previous violations to predict future offenses and crashes. The results indicated that drivers with previous ticketed offenses are at increased risk of future crashes.
In addition to crash data, some researchers [8,22] used violation data to assess unsafe driver actions to improve traffic safety. In their studies, risky violations were used as a means to predict future crashes. In another study, Chen et al. [23] examined driver records to investigate the relationship between crashes and past records of crashes and convictions. They found that pre-period crashes per driver and the pre-period number of convictions are positively correlated; failure to yield and disobeying traffic signals were the two violations that best predicted crashes. Lantz and Loftus [24] conducted a study to predict future crashes based on driver history information. In addition, effective enforcement actions that can predict driver behavior and future crash involvement were identified. The study found that reckless driving and improper turn violations are the violations associated with the highest increase in the likelihood of a future crash. Failure to keep proper lane was also among the convictions with the highest likelihood of a future crash. Terrill et al. [25] studied the correlation between traffic citations and the number of crashes on an interstate in Wyoming. It was found that there is a negative correlation between the number of issued citations and the number of crashes. However, the majority of the studies did not use conviction or violation data to investigate groups of truck drivers with an increased risk of being involved in truck crashes. Based on the literature, violations can be used as an indication of the groups that are at greater risk of getting involved in future crashes. Therefore, this study used violation data, in addition to crash data, to help mitigate the high truck crash rate in Wyoming through identification of contributory factors to truck crashes.

Data preparation
The data were combined from the three interstates in Wyoming with the highest truck-related crash rates: I-80, I-25, and I-90. Crash data from 2011 to 2014 were obtained from the Wyoming Department of Transportation (WYDOT) using the Critical Analysis Reporting Environment (CARE). The single-truck crash data set included 1371 crashes, while the multiple-vehicle, truck-involved crash data set included 1543 crashes. This study used various variables, which can be categorized as driver, environmental, vehicle, temporal, roadway, crash, and driver behavior characteristics. Driver characteristics included age, gender, residency, violation (conviction) record, and speed limit compliance at the time of the crash. Environmental characteristics included weather and roadway-surface conditions. Weight of a truck was categorized as a vehicle characteristic. Day of week and time of crash were organized under temporal characteristics. Roadway characteristics included the posted speed limit at the location where a crash occurred. Driver actions at the time of the crash, number of vehicles, and pre-collision vehicle action were categorized under crash characteristics. Driver distraction, driving under the influence (DUI) suspicion, fatigue, and the use of safety technology were categorized under driver behaviors. In this study, distraction is defined as any type of distraction such as TV, cell pager, or wireless communication inside the cabin at the time of the crash. Truck crash analyses were divided into two parts: single-truck crashes and multiple-vehicle, truck-involved crashes. Single-truck crashes were investigated separately as more than 50% of all the truck crashes were single-truck crashes. Table 1 presents the frequency and percentage of the statistically significant variables and the responses for both single-truck and multiple-vehicle, truck-involved crashes.
The violation data were obtained from the Wyoming court-reported violation database from 2011 to 2014. In single-truck crashes, the truck driver was at fault. Therefore, the violation data for this analysis were filtered to include only truck driver violations, to identify the groups that are more at risk of single-truck crashes. The 121,680 violations in the database were filtered to 17,239 truck violations. However, all violations were used for investigating the groups that were at higher risk of multiple-vehicle crashes involving at least one truck, because both truck and non-truck drivers could be at fault in these crashes. Only the violation types that resulted in truck crashes were treated as risky violations in this study: following too closely, failure to drive within a single lane, and speeding too fast for conditions. These violations account for 39% of all the causes of truck crashes on Wyoming interstates (see Fig. 1). Therefore, the groups committing these violations can be considered at higher risk of future violations and future crashes.
The violation types included in the violation analyses were matched to the driver actions in the truck crash data. For instance, the violation type ''driving too fast for conditions'' was matched to the crash driver action ''drove too fast for conditions.'' Similarly, ''follow too closely'' and ''fail to drive within single lane'' in the violation data were matched to ''followed too close'' and ''failed to keep proper lane'' in the crash data.
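The matching and filtering described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record layout and the field names `vehicle_type` and `violation_type` are hypothetical, the sample records are invented, and the label strings follow the wording used in this section rather than the actual database codes.

```python
# Illustrative sketch: field names and sample records are hypothetical.
# Mapping from risky violation type (violation data) to the matching
# driver action (crash data), using the labels quoted in the text.
RISKY_VIOLATION_TO_CRASH_ACTION = {
    "driving too fast for conditions": "drove too fast for conditions",
    "follow too closely": "followed too close",
    "fail to drive within single lane": "failed to keep proper lane",
}
OTHER = "others"


def response_category(violation_type):
    """Return the risky violation label, or 'others' for every other type."""
    key = " ".join(violation_type.lower().split())
    return key if key in RISKY_VIOLATION_TO_CRASH_ACTION else OTHER


def select_violations(records, trucks_only):
    """Keep truck violations only (single-truck analysis) or all violations."""
    return [r for r in records if not trucks_only or r["vehicle_type"] == "truck"]


sample_records = [
    {"vehicle_type": "truck", "violation_type": "Follow too closely"},
    {"vehicle_type": "car", "violation_type": "Driving too fast for conditions"},
    {"vehicle_type": "truck", "violation_type": "Seat belt"},
]

single_truck_set = select_violations(sample_records, trucks_only=True)
multi_vehicle_set = select_violations(sample_records, trucks_only=False)
single_truck_responses = [response_category(r["violation_type"]) for r in single_truck_set]
```

The truck-only subset feeds the analysis tied to single-truck crashes, while the full set feeds the analysis tied to multiple-vehicle crashes; in both, every violation outside the three risky types falls into the ''others'' category.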
The summary statistics of the significant violation variables and of the responses are included in Table 2. Violation analyses were divided into two sections: truck-related violations and all types (truck and non-truck) of violations. For single-truck crashes, only trucks were at fault in the crashes, so only truck violation data were used to identify truck drivers at higher risk of future crashes. For multiple-vehicle crashes involving at least one truck, truck and non-truck vehicles/drivers could be at fault in the crashes. Therefore, all types of violations were used for investigating the drivers at higher risk of involvement in multiple-vehicle crashes involving at least one truck. Only 12% of all the citations were assigned to trucks. The category of other violations (''others'') included violations such as seat belt, DUI, and hours of service (HOS) violations. Nonresidents of Wyoming accounted for 12% of all the truck violators. The WHP allocated only 20% of all its resources during off-peak hours, and most of the violators were male (97%).
Two statistical approaches were used in this study to improve truck safety on Wyoming interstates. First, ordinal logistic regression was used to investigate the contributory factors to severe truck crashes. Second, multinomial logistic regression was used to identify the factors that contribute to risky drivers' violations that were underlying causes of truck-related crashes.
When the response categories have a clear ordering, the response is considered ordinal. The model used for an ordinal response is as follows:

$$y_i^{*} = \mathbf{x}_i^{\prime}\boldsymbol{\beta} + \varepsilon_i$$

where $y_i^{*}$ is a linear function that determines the discrete outcome, $\mathbf{x}_i$ is a vector of observable features, $\boldsymbol{\beta}$ is a vector of regression coefficients, and $\varepsilon_i$ is an error term that has a logistic distribution with mean zero and a variance of $\pi^2/3$. Different binary variables were used as predictors in the severity model. For modeling truck crash severity, three levels of response were considered: fatal/incapacitating injury, injury/possible injury, and property damage only (PDO). Because too few fatal crashes were observed, this category was combined with incapacitating injury for the analysis.
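A short Python sketch may help show how a latent linear function with a logistic error term turns into probabilities for the three severity levels. It assumes the standard proportional-odds (cumulative logit) formulation with cutpoints separating the categories; the cutpoints and coefficients below are hypothetical values chosen for illustration and are not estimates from this study.

```python
import math

# Severity levels ordered from least to most severe.
SEVERITY_LEVELS = ["PDO", "injury/possible injury", "fatal/incapacitating injury"]


def logistic_cdf(z):
    """Cumulative distribution function of the standard logistic distribution."""
    return 1.0 / (1.0 + math.exp(-z))


def ordinal_probabilities(linear_predictor, cutpoints):
    """Category probabilities, using P(y <= k) = F(cutpoint_k - x'beta)."""
    cumulative = [logistic_cdf(c - linear_predictor) for c in cutpoints] + [1.0]
    probabilities, previous = [], 0.0
    for value in cumulative:
        probabilities.append(value - previous)
        previous = value
    return probabilities


# Hypothetical coefficients and cutpoints (assumptions, not fitted values).
beta = {"rollover_jackknife": 1.1, "driver_distracted": 0.64}
cutpoints = [1.0, 3.0]  # must be increasing


def linear_predictor(x):
    """Compute x'beta for a dict of binary predictors."""
    return sum(beta[name] * x.get(name, 0) for name in beta)


baseline = ordinal_probabilities(linear_predictor({}), cutpoints)
with_rollover = ordinal_probabilities(linear_predictor({"rollover_jackknife": 1}), cutpoints)

for level, p0, p1 in zip(SEVERITY_LEVELS, baseline, with_rollover):
    print(f"{level}: baseline={p0:.3f}, with rollover={p1:.3f}")
```

Under this formulation, a positive coefficient shifts probability toward the more severe categories, and the exponential of a coefficient is the odds ratio for being above any given severity threshold.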
However, as no ordering exists between different risky violation types, for instance between driving too fast for conditions and improper lane change, ordinal logistic regression was not suitable for analyzing violations. Multinomial logistic regression can be used to model nominal outcomes with more than two levels [26]. For the violation analysis, category J was used as the baseline against which the other violation types were compared. With J as the baseline category, J - 1 comparisons are made in relation to the reference category. The logit for the jth comparison is

$$\log\frac{\pi_{ij}}{\pi_{iJ}} = \mathbf{x}_i^{\prime}\boldsymbol{\beta}_j, \qquad j = 1, \ldots, J-1$$

where $\pi_{ij}$ is the probability of observing outcome j, $\mathbf{x}_i^{\prime}$ is a feature vector, and $\boldsymbol{\beta}_j$ is a vector of unknown regression coefficients.
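As a minimal sketch of the baseline-category logit, the Python snippet below converts J - 1 coefficient vectors into J outcome probabilities, with the baseline category fixed at a zero linear predictor. The predictor names and every coefficient value are hypothetical and serve only to illustrate the mechanics.

```python
import math

# Outcome categories; the last one plays the role of baseline category J.
CATEGORIES = [
    "too fast for conditions",
    "following too close",
    "failure to drive within single lane",
    "others (baseline)",
]


def multinomial_probabilities(x, coefficients):
    """Return J probabilities from J - 1 coefficient vectors.

    The baseline category has a linear predictor of zero, so that
    log(p_j / p_J) equals the dot product of x and beta_j.
    """
    scores = [sum(b * v for b, v in zip(beta_j, x)) for beta_j in coefficients]
    scores.append(0.0)  # baseline category J
    largest = max(scores)  # subtract the maximum for numerical stability
    exponentials = [math.exp(s - largest) for s in scores]
    total = sum(exponentials)
    return [e / total for e in exponentials]


# Hypothetical feature vector: intercept, nonresident, off-peak hour.
x = [1.0, 1.0, 0.0]
# Hypothetical coefficient vectors, one per non-baseline category.
coefficients = [
    [-2.0, 0.5, 0.3],
    [-3.0, 0.2, 0.1],
    [-2.5, 0.4, 0.6],
]

probabilities = multinomial_probabilities(x, coefficients)
for category, p in zip(CATEGORIES, probabilities):
    print(f"{category}: {p:.3f}")
```

Exponentiating a coefficient in this setup gives the multiplicative change in the odds of a given risky violation relative to the baseline category for a one-unit change in the predictor.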
For the multinomial (violation) analyses, three risky violations similar to the causes of severe truck crashes were identified. These violations were compared with other types of violations.
The current study, then, was set forward to fulfill two main objectives:
1. Using ordinal logistic regression to determine the factors impacting severe single-truck crashes and multiple-vehicle, truck-involved crashes. Two analyses were carried out to fulfill this objective:
a. Severe single-truck crash analysis.
b. Severe multiple-vehicle crash analysis, with involvement of at least one truck.
2. Using multinomial logistic regression to investigate the groups more likely to violate the laws that are the main causes of single- and multiple-vehicle, truck-involved crashes. Two analyses were carried out to fulfill this objective:
a. Analysis of those types of violations associated with single-truck crashes.
b. Analysis of those types of violations associated with multiple-vehicle crashes involving at least one truck.

Ordinal logistic regression, crash data
This first modeling approach used ordinal logistic regression to investigate the variables that increase the odds of severe truck crashes. Table 3 shows the variables remaining in the reduced model at the pre-specified significance levels. The results indicated that a driver being under non-normal conditions (e.g., being angry, depressed, or anxious) at the time of a single-truck crash increased the odds of a severe single-truck crash by a factor of about 3.6 (OR = 3.612) compared with a driver under normal conditions. It was found that dry-road conditions increase the odds of being in severe single-truck crashes. This result is in contrast with the results obtained by Kim et al. [27], which indicated that wet or snowy/icy surfaces increased the odds of injury in single-vehicle crashes. However, the difference between single-vehicle and single-truck crashes should be noted. The lower odds of injury in single-truck crashes on roads that are not dry may lie in the fact that truck drivers drive more cautiously, at lower speeds, under such conditions [28]. Being a male driver decreased the odds of being involved in severe single-truck crashes. Females are not more likely to be involved in a crash; however, when they are involved in a crash, they are more likely to be injured or killed. This result disagrees with the research carried out by Kim et al. [27], which indicated that being a male truck driver increases the odds of severe single-truck crashes.
Rollover/jackknife was another important variable that increased the odds of severe single-truck crashes. The odds of a severe crash were estimated to be about three times higher if the single-truck crash involved a rollover/jackknife than if it did not. This result is consistent with the findings of Krull et al. [29]. However, it should be noted that their research included only rollovers, in all types of single-vehicle crashes.
The results show that when a driver is ejected in a single-truck crash, the odds of a severe crash increase about four times (OR = 3.956). This is in accordance with previous studies that identified ejection as an important factor that increases the severity of truck crashes [30]. Driver distraction was identified as a factor that increases the odds of being involved in severe single-truck crashes (OR = 1.894). This result confirmed the research carried out by Bunn [31], which indicated that distraction/inattention increased the odds of a fatal motor vehicle collision.
The same analysis was run to identify contributory factors to severe multiple-vehicle crashes involving at least one truck. Out of 25 included variables, six were found to be important at the pre-specified significance level. As can be seen from Table 3, the only variable that is significant for both single-truck crashes and multiple-vehicle, truck-involved crashes is gender. However, the sign is different, indicating that while being a female truck driver increased the odds of severe single-truck crashes, being a female truck/non-truck driver decreased the odds of severe truck-involved crashes. This supports the importance of analyzing different crash types separately.
For the second analysis, crashes involving at least one truck were investigated. As more than one vehicle was involved in each crash, only the information related to the driver at fault was included in the analysis. Alcohol consumption had the highest impact on severe crashes among all significant variables (OR = 3.365). This result is in accordance with many past studies, such as Chang et al. [30], who identified alcohol as a factor that dramatically increases the likelihood of an injury.
Day of crash was another variable that impacted the severity of this type of crash. It was found that weekend driving increases the odds of severe truck-involved crashes by about 30% (OR = 1.292). This may result from higher traffic on weekends compared with weekdays. Gender was also found to be significant. However, in contrast to the single-truck analysis, female drivers were less likely to be involved in severe crashes than male drivers. This difference may result from the fact that a non-truck vehicle could be at fault in the crashes included in this analysis.
The odds of a severe crash were estimated to be 2.396 times higher when the truck driver did not comply with the speed limit. Although not much research has been done on the importance of speed compliance in preventing severe truck crashes, many studies showed that increased vehicle speed significantly increased the odds of getting involved in severe crashes [32,33]. Posted speed limit is another speed-related variable, and it increased the odds of being involved in severe crashes by about 70%.
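The odds ratios quoted in this section can be related to percentage changes in odds and to the underlying logit coefficients with a small Python sketch. The odds ratios are those reported in the text; the dictionary keys are shorthand labels introduced here, and the helper functions are illustrative rather than part of the original analysis.

```python
import math

# Odds ratios as reported in the text; the labels are shorthand.
REPORTED_ODDS_RATIOS = {
    "non-normal driver condition (single-truck)": 3.612,
    "driver ejection (single-truck)": 3.956,
    "driver distraction (single-truck)": 1.894,
    "alcohol involvement (multiple-vehicle)": 3.365,
    "weekend crash (multiple-vehicle)": 1.292,
    "speed limit non-compliance (multiple-vehicle)": 2.396,
}


def percent_change_in_odds(odds_ratio):
    """Percentage change in the odds implied by an odds ratio."""
    return (odds_ratio - 1.0) * 100.0


def log_odds_coefficient(odds_ratio):
    """Logit-scale coefficient corresponding to an odds ratio."""
    return math.log(odds_ratio)


for label, odds_ratio in REPORTED_ODDS_RATIOS.items():
    print(
        f"{label}: OR={odds_ratio:.3f}, "
        f"change in odds={percent_change_in_odds(odds_ratio):+.1f}%, "
        f"coefficient={log_odds_coefficient(odds_ratio):.3f}"
    )
```

For example, OR = 1.292 for weekend crashes corresponds to the roughly 30% increase in odds stated above, while OR = 3.956 for ejection corresponds to odds that are nearly four times as high.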

Multinomial logistic regression, violation data
The literature indicated that previous violations can be used to predict future violations and crashes. Therefore, for a better understanding of enforcement efficiency, this section included the most common types of driver violations resulting in truck crashes. The objective of this section was to identify the groups that are at higher risk of violating the laws that can result in truck-related crashes, and consequently severe truck crashes. Figure 1 shows that driving too fast for conditions (26%), following too close (3%), and failure to keep proper lane (10%) accounted for 39% of all the causes of truck crashes. Therefore, the violations corresponding to these driver actions (crash data) were identified in the violation data to identify the groups at higher risk of future violations and future crashes.
This section identified the risk of truck drivers with different driver and temporal characteristics violating particular traffic laws. Only truck-related violations were considered when evaluating the groups at higher risk of single-truck crashes. For multiple-vehicle crashes involving a truck, non-truck drivers, in addition to truck drivers, could be at fault. As a result, all types of violations, truck and non-truck, were included in the second analysis to identify the risk of different drivers violating particular traffic laws that could increase the odds of multiple-vehicle, truck-involved crashes. Also, for both analyses, only the traffic violation types that are associated with truck crashes were treated as risky violations. Therefore, the truck-related violations were filtered to identify the violations titled speeding too fast for conditions, following too close, and failure to drive within single lane. As can be seen from Table 4, being a nonresident of Wyoming, driving on weekdays, and driving in off-peak hours all increased the odds of all the risky violations. These variables, based on the literature, could consequently increase the risk of getting involved in single-truck crashes.
The same analysis was conducted on all types of violations. As can be seen from Table 4, being a truck driver had the highest impact and increased the odds of violating all of these laws. This is consistent with the crash data, which indicate that the truck is at fault in 78% of all truck-related crashes.

Discussion
Studying crashes with the purpose of reducing crash frequency is critical for the well-being of society, given the safety concern posed by truck crashes. This paper contributes to the body of knowledge by incorporating crash and violation data to identify the contributory factors to severe truck crashes.
Crash data from WYDOT were used to study the impacts of various variables on severe single-truck crashes and multiple-vehicle, truck-involved crashes. While much research has been done on the contributory factors to multiple-vehicle and single-truck crashes as a whole, this work investigated multiple-vehicle and single-truck crashes separately. One of the reasons that single-truck crashes were analyzed separately was the importance of this type of crash on Wyoming interstates, where more than 50% of all the truck-related crashes were single-truck crashes. This study also identified different significant contributory factors to single- and multiple-vehicle, truck-involved crashes. Moreover, this study included violation data, in addition to truck-related crash data.
The literature has indicated that previous violations can be used to predict future offenses and crashes. Therefore, the current study used violation data, in addition to crash data, to examine the groups that are more likely to violate the laws corresponding to the driver actions that account for a large share of single-truck and multiple-vehicle crashes. The violations leading to truck crashes were identified from more than 800 violations and the risk was examined. The results indicated that the residency of a driver, the time and day of a violation, and the type of vehicle are some of the factors that contribute to involvement in risky violations.
Soole et al. [34] reported that enforcement was effective in changing driver behavior, resulting in reduced crash rates. Studies also showed that a 90% improvement in law compliance can be achieved by enforcement approaches [35]. Therefore, the WHP can play an important role in reducing the high truck crash rate in Wyoming by allocating resources to the correct areas. Based on the violation data analyses, the recommendations to the WHP could be related to the residency of the drivers, the types of vehicle, and the time of day. On the other hand, based on the crash data analyses, different recommendations, such as setting up more warning signs and making related policies, could be put forward for WYDOT to mitigate the high truck crash rate in the state.
It is recommended that future studies incorporate more characteristics in the violation data to identify the groups at higher risk of future violations and future crashes. Future studies could also match more violation types to crash types to identify the groups at higher risk of each crash type.