1 Introduction

The latest traffic safety data provided by the National Highway Traffic Safety Administration (NHTSA, 2020) show that pedestrian deaths in Europe fell from 8342 to 5320 between 2009 and 2018, a decline of 36 percent, while in the United States the number rose 51 percent, from 4109 to 6227. Although pedestrian deaths have clearly decreased in Europe, the average death rate still exceeds 443 annually, and more attention should be paid to pedestrians and to vulnerable road users in general. Pedestrian-vehicle crashes account for most pedestrian deaths in Europe, as they do in Croatia. According to statistics from the Ministry of the Interior of Croatia, there were 7155 pedestrian-vehicle collisions in total from 2015 to 2018, about 2385 annually. Although the total number of deaths is not large (293), the death rate of 58.6 per year is still relatively high.

In Croatia, not all intersections are designed and marked in a standard way, so deficiencies remain at some intersections, e.g. no zebra crossing, no countdown signal, inappropriate signal phasing or poor lighting, all of which may lead to conflicts between pedestrians and vehicles and cause pedestrian injuries. Consequently, to reduce pedestrian-vehicle crashes and improve pedestrian safety, a variety of measures are being taken to identify the influencing factors and mitigate the injury severity of pedestrian-vehicle crashes.

Therefore, the objective of this study has two main aspects. Empirically, it aims to identify the factors influencing the injury severity of pedestrian-vehicle crashes at intersections. Theoretically, it aims (1) to allow the impact of exogenous variables to vary among intersections and (2) to accommodate heterogeneity due to unobserved effects. The results may provide insight for reducing the injury severity of pedestrian-vehicle crashes.

2 Literature review

For pedestrian-related crashes, three types of methods, namely on-site investigation, mathematical modeling and simulation, have been used to assess pedestrian injury severity, among which econometric modeling has shown particular promise [30].

Early work on pedestrian injury severity relied on probit or logistic regression models. Zajac and Ivan [31] employed an ordered probit model to investigate the factors influencing the injury severity of pedestrian crashes in Connecticut. The results showed that clear roadway width, vehicle type, driver alcohol involvement, pedestrian age of 65 years or older and pedestrian alcohol involvement were significant factors. Similarly, Sze and Wong [26] adopted a binary logistic regression model to evaluate the injury risk of pedestrian casualties, controlling for demographic, road environment and other factors while also considering temporal variation. Focusing on age and pedestrian injury severity, Kim et al. [11] conducted a heteroskedastic logit analysis and found that increasing pedestrian age raised the probability of fatal injury. Addressing injury severity in taxi-pedestrian crashes, Chung [4] used an ordered probit model and found that crashes in which the pedestrian's legs were the first impact region tended to be less severe. From a time-series perspective, Batouli et al. [1] analyzed the factors of pedestrian-vehicle crash injury severity with a logistic regression model and found that pedestrian age and pedestrian impairment were significantly associated with severity.

In recent years, panel data (random-effects, random-parameter or mixed) logit and probit models have been used to accommodate unobserved heterogeneity in pedestrian injury severity with cross-sectional and time-series data. Eluru et al. [8] developed a mixed generalized ordered response logit model to examine pedestrian injury severity levels. The results suggested that age, speed limit, location and time of day were significant influencing factors. Kim et al. [12] employed a mixed logit model to address unobserved heterogeneity in pedestrian injury severity. Heterogeneity in the mean of the random parameter for the pedestrian-solely-at-fault collision indicator was associated with pedestrian gender, while that for the traffic-sign and motorist-back-up indicators was associated with pedestrian age. Similar studies by Li and Fan [16] and Li et al. [17] supported the mixed logit model. Going further, Xin et al. [28] quantified the effects of neighborhood characteristics and the built environment on pedestrian injury severity with a random parameters generalized ordered probit model. The parameter for the elderly pedestrian indicator was random, with significant heterogeneity in both mean and variance, and elderly pedestrians involved in intersection-related crashes were highly likely to suffer severe injury. Regarding pedestrian red-light violations, Wang et al. [27] investigated the influencing factors of pedestrian-motor vehicle crashes at signalized crossings. Random parameter probit models were adopted to account for individual-specific heterogeneity; pedestrians aged 12 to 25 were likely to violate the red light, and late night, rainy weather, elderly pedestrians and bus crashes were also identified as relevant factors. From the perspective of hierarchical models, Kim et al. [13] established a hierarchical ordered model with crash features as lower-level variables and municipality features as upper-level variables. Elderly pedestrians were found to be a significant factor in pedestrian injury, and the proposed model accounted for 7% of the otherwise unexplained variation in injury severity outcomes.

Combining these models with other approaches (e.g. clustering analysis, spatial and temporal analysis) has improved the analysis of pedestrian injury severity across space and time. Mohamed et al. [19] combined data mining and regression methods to determine the influencing factors of pedestrian injury severity. A latent class model with ordered probit performed well for the New York data, whereas K-means clustering with a multinomial logit model was appropriate for the Montreal data. The findings showed that pedestrian age and other factors were significant factors of fatal crashes. Sasidharan et al. [23] employed latent class cluster analysis to investigate pedestrian crash injury severities. The results suggested that this method was effective in reducing heterogeneity and revealing hidden relationships between crash severity levels and relevant factors. Zhao et al. [32] applied the same method to injury severity in pedestrian-train crashes at highway-rail grade crossings. From a spatial perspective, Bhat et al. [2] proposed a spatial random coefficient multivariate count model to analyze pedestrian injury counts by severity level. Spatial heterogeneity, spatial dependency and spatial drift effects were accommodated, and several groups of influencing factors were found to be significant for the number of pedestrian-vehicle crashes by severity level. Integrating the spatial and temporal domains, Liu et al. [18] adopted a geographically and temporally weighted ordinal logistic regression model to explore the correlates of pedestrian injury severity. Pedestrian age, pedestrian position and other factors were found to be highly related to pedestrian injury severity. A recent study by Song et al. [25] combined spatiotemporal analysis with a hierarchical Bayesian random-effects logit model to examine the factors of pedestrian injury severity.

A variety of other methods have also been used to analyze pedestrian crash injury severity. Li et al. [14] combined classification and regression trees (CART) with random forests to analyze pedestrian injury severity under different weather conditions. The results showed that elderly pedestrians and higher speed limits led to higher injury severity. Sasidharan and Menendez [22] applied a partial proportional odds model to identify the influencing factors of pedestrian injury severity, and pedestrian age was found to significantly affect the severity level. Xu et al. [30] investigated pedestrian injury severity at signalized intersections and addressed the heterogeneity issue with Bayesian quantile regression models. The results revealed that age, pedestrian special circumstances and pedestrian contributory factors were likely to influence injury severity. Further, Mujalli et al. [20] applied Bayesian networks to determine the factors affecting pedestrian-vehicle crash severity. The significant factors increasing the risk of severe injury or fatality were speed limit, lighting and weather conditions.

To sum up, the evaluation of the injury severity of pedestrian-vehicle crashes ranges from pooled logistic or probit regression models to panel data models, cluster analysis and spatial-temporal analysis, which address injury severity from various perspectives. However, current models and approaches mainly concentrate on one specific aspect. Therefore, the purpose of this study is to (1) allow the impact of exogenous variables to vary among intersections and (2) accommodate heterogeneity due to unobserved effects.

3 Methodology

Let i (i = 1, 2, …, N) index intersections, j (j = 1, 2, …, J) index the observation intervals at intersection i, and k (k = 1, 2, …, K) index the pedestrian injury severity levels. The latent propensity for injury severity at the ith intersection in the jth interval can be expressed as:

$${y}_{ij}^{*}=\left({\alpha }^{\prime}+{\delta }_{i}^{\prime}\right){z}_{ij}+{\varepsilon }_{i}$$
(1)

where \({y}_{ij}^{*}\) denotes the latent propensity, which is mapped to the observed pedestrian injury severity \({y}_{ij}\) by the thresholds \({\omega }_{k}\) (with \({\omega }_{0}=-\infty\) and \({\omega }_{K}=\infty\)), and \({z}_{ij}\) is an (\(L \times 1\)) column vector of factors influencing injury severity. \(\alpha\) is an (\(L \times 1\)) column vector of mean effects, \({\delta }_{i}\) is an (\(L \times 1\)) column vector capturing the unobserved, intersection-specific deviations in the effects of \({z}_{ij}\) for intersection i, and \({\varepsilon }_{i}\) is a random error term (i.i.d. across intersections i). Specifically, pedestrian injury severity \({y}_{ij}\) is categorized as no injuries, slight injury, severe injury and fatality, an ordered scale that suits the ordered probit model. Because both fixed and random effects are considered within a panel data framework, the panel mixed ordered probit model can reflect both the effects of exogenous variables and unobserved heterogeneity.
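The link between the latent propensity and the observed severity category can be written out explicitly. The expression below is the standard ordered-response mapping; it is stated here as an assumption, since the text gives only the two boundary thresholds:

$${y}_{ij}=k \quad \text{if} \quad {\omega }_{k-1}<{y}_{ij}^{*}\le {\omega }_{k},\qquad k=1,\dots ,K,\qquad -\infty ={\omega }_{0}<{\omega }_{1}<\dots <{\omega }_{K}=\infty$$

With the four severity levels used in this study (K = 4), this leaves three finite thresholds to be estimated.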

Unlike the conventional ordered probit model, which is estimated by maximum likelihood, the panel mixed ordered probit model is estimated with a quasi-likelihood based method. In Eq. (1), \(\alpha\), \(\omega\) and \({\delta }_{i}\) need to be estimated, so it is assumed that

$$E\left({y}_{ik}\mid {z}_{ik}\right)={H}_{ik}\left(\alpha ,\omega ,{\delta }_{i}\right),\quad 0\le {H}_{ik}\le 1,\quad \sum_{k=1}^{K}{H}_{ik}=1$$
(2)

where \({H}_{ik}\) represents the ordered probit probability for injury severity category k, defined as \({H}_{ik}=G\left[{\omega }_{k}-\left({\alpha }^{\prime}+{\delta }_{i}^{\prime}\right){z}_{i}\right]-G\left[{\omega }_{k-1}-\left({\alpha }^{\prime}+{\delta }_{i}^{\prime}\right){z}_{i}\right]\), and G(·) denotes the cumulative distribution function of the standard normal distribution.
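The category probabilities defined above can be illustrated with a minimal sketch. It assumes the standard ordered-response convention in which the first and last thresholds are minus and plus infinity. The function names and all numerical values are hypothetical and are not estimates from this study.

```python
import math

def normal_cdf(x):
    """Standard normal CDF G(x), computed from the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def ordered_probit_probs(z, alpha, delta_i, cutpoints):
    """Return [H_1, ..., H_K] for one observation.

    z, alpha, delta_i: lists of length L (covariates, mean effects,
    intersection-specific deviations).
    cutpoints: finite thresholds in increasing order; the outer
    thresholds -inf and +inf are added here.
    """
    # Linear predictor (alpha' + delta_i') z
    xb = sum((a + d) * x for a, d, x in zip(alpha, delta_i, z))
    omega = [-math.inf] + list(cutpoints) + [math.inf]
    return [normal_cdf(omega[k] - xb) - normal_cdf(omega[k - 1] - xb)
            for k in range(1, len(omega))]

# Hypothetical inputs, for illustration only.
probs = ordered_probit_probs(z=[1.0, 0.0], alpha=[0.4, -0.2],
                             delta_i=[0.1, 0.0], cutpoints=[-1.0, 0.5, 1.5])
print([round(p, 3) for p in probs])
```

Three finite cutpoints give four probabilities, one per severity level, and by construction they sum to one.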

Thus the quasi-likelihood function at intersection i for a given \({\delta }_{i}\) can be described as:

$${L}_{i}\left(\alpha ,\omega \mid {\delta }_{i}\right)=\prod_{j=1}^{J}\prod_{k=1}^{K}\left\{G\left[{\omega }_{k}-\left({\alpha }^{\prime}+{\delta }_{i}^{\prime}\right){z}_{i}\right]-G\left[{\omega }_{k-1}-\left({\alpha }^{\prime}+{\delta }_{i}^{\prime}\right){z}_{i}\right]\right\}^{{d}_{ik}}$$
(3)

where \({d}_{ik}\) represents the proportion of injury severity category k.

The unconditional likelihood function for intersection i can be calculated as follows:

$${L}_{i}({\mathrm{\alpha }},\upomega ,{\delta }_{i})={\int }_{{\delta }_{i}}{L}_{i}\left({\mathrm{\alpha }},\upomega \left|{\delta }_{i}\right.\right)dF({\delta }_{i})$$
(4)

where \(F\) denotes the multi-dimensional cumulative normal distribution. Therefore, the quasi log-likelihood function can be written as:

$$L\left(\varphi \right)=\sum_{i}\ln {L}_{i}({\mathrm{\alpha }},\upomega ,{\delta }_{i})$$
(5)

Further details on the estimation can be found in Eluru et al. [9] and Chen et al. [3].
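The integral in Eq. (4) generally has no closed form, and one common way to approximate it is by simulation. The sketch below is a plain Monte Carlo version of Eqs. (3) to (5) under stated assumptions: the elements of the random vector are drawn as independent normals with mean zero, each observation contributes the probability of its single observed category (a 0/1 indicator in place of a proportion), and all names and values are hypothetical. It is not the estimation procedure of the cited references.

```python
import math
import random

def normal_cdf(x):
    """Standard normal CDF G(x)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def category_prob(k, xb, omega):
    """Ordered probit probability of category k (1-based) given predictor xb."""
    return normal_cdf(omega[k] - xb) - normal_cdf(omega[k - 1] - xb)

def simulated_loglik(panel, alpha, sigma, cutpoints, draws=200, seed=0):
    """Simulated quasi log-likelihood summed over intersections.

    panel: dict mapping intersection id -> list of (z, k) observations,
           z a covariate list and k the observed category in 1..K.
    sigma: standard deviations of the random deviations
           (independent normals, an assumption of this sketch).
    """
    rng = random.Random(seed)
    omega = [-math.inf] + list(cutpoints) + [math.inf]
    total = 0.0
    for observations in panel.values():
        acc = 0.0
        for _ in range(draws):
            delta = [rng.gauss(0.0, s) for s in sigma]  # one draw of delta_i
            lik = 1.0
            for z, k in observations:  # conditional likelihood, Eq. (3)
                xb = sum((a + d) * x for a, d, x in zip(alpha, delta, z))
                lik *= category_prob(k, xb, omega)
            acc += lik
        total += math.log(acc / draws)  # average over draws, then take logs
    return total

# Hypothetical unbalanced panel: "A" has two observations, "B" has one.
panel = {"A": [([1.0, 0.0], 2), ([1.0, 1.0], 3)], "B": [([0.0, 1.0], 1)]}
ll = simulated_loglik(panel, alpha=[0.4, -0.2], sigma=[0.3, 0.3],
                      cutpoints=[-1.0, 0.5, 1.5])
print(ll)
```

Because the draws are averaged within each intersection before the logarithm is taken, intersections with different numbers of observations are handled without any special treatment.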

4 Data description

The dataset was collected from the Traffic Accident Database System maintained by the Ministry of the Interior, Republic of Croatia, for 2015 to 2018. The records cover the following main aspects: injury severity features, pedestrian status, vehicle characteristics, roadway features and environmental profiles.

In total, 7155 pedestrian-vehicle crash records were retrieved, of which about 2260 occurred at intersections (shown in Fig. 1). Of these, 1995 samples were retained for the analysis after eliminating observations with missing or incomplete data. As stated above, pedestrian-vehicle injury severity in Croatia is typically categorized as no injuries, slight injury, severe injury and fatality. Because these levels are ordered, the dependent variable was modeled with an ordered probit specification. Integrating the data from different counties over different years yields a panel data framework. However, since the number of observations differs across counties and years, an unbalanced panel data model is required. To address fixed and random effects simultaneously, mixed effects were considered, which leads to the final unbalanced panel mixed ordered probit model.
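The notion of an unbalanced panel can be made concrete with a small sketch. It groups hypothetical crash records by county and year and checks whether every county-year cell holds the same number of observations. The record layout, county labels and values are invented for illustration and do not come from the Croatian dataset.

```python
from collections import defaultdict

# Hypothetical records: (county, year, severity level coded 1-4).
records = [
    ("county_A", 2015, 2), ("county_A", 2015, 3), ("county_A", 2016, 1),
    ("county_B", 2015, 2), ("county_B", 2017, 4), ("county_B", 2017, 2),
    ("county_B", 2018, 1),
]

def build_panel(rows):
    """Group records by county (panel unit), then by year (period)."""
    panel = defaultdict(lambda: defaultdict(list))
    for county, year, severity in rows:
        panel[county][year].append(severity)
    return panel

def is_balanced(panel):
    """True when every county has the same number of records in every year."""
    years = {year for by_year in panel.values() for year in by_year}
    counts = {len(by_year.get(year, []))
              for by_year in panel.values() for year in years}
    return len(counts) == 1

panel = build_panel(records)
for county, by_year in panel.items():
    print(county, {year: len(v) for year, v in sorted(by_year.items())})
print("balanced:", is_balanced(panel))
```

In this toy example the counties are observed in different years and with different numbers of records, so the panel is unbalanced, which is the situation described above.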

Fig. 1 Injury severity selected at intersections in Croatia

Since pedestrians play a significant role in pedestrian-vehicle injury severity, pedestrian status (gender, age, alcohol involvement, total number of participants, etc.) and vehicle driver circumstances were collected to examine whether either party bears more responsibility, while the vehicle-related variables include the total number of vehicles involved and the vehicle types.

The roadway characteristics include the intersection type (e.g. Y-intersection, T-intersection, four-leg intersection, roundabout), roadway conditions (e.g. dry, wet, ice, snow), speed limit and vertical/horizontal signalization. The injury features include the day, location and severity level, and the environmental conditions include visibility, public lighting, weather and the state of the environment.

In order to evaluate the proposed models, all the variables are coded numerically and listed in Table 1, including the dependent variable, categorical variables, continuous variables and indicator variables.

Table 1 Summary of parameter description

5 Results

Before running the model, the correlation among the independent variables selected from the four aspects needs to be examined. A Pearson correlation test was conducted to avoid multicollinearity. The test results show that public lighting is highly correlated with visibility, the total number of participants is highly correlated with the number of participants without injuries, and the number of slightly injured participants is highly correlated with the number of seriously injured participants. Thus, the two variables of each of these pairs do not appear together in the final results.
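A pairwise screening of this kind can be sketched as follows. The variable names, the toy data and the 0.7 cutoff are all assumptions made for illustration; the text does not state which threshold was used.

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def correlated_pairs(columns, threshold=0.7):
    """Return (name_a, name_b, r) for pairs with |r| at or above the threshold."""
    flagged = []
    for a, b in combinations(columns, 2):
        r = pearson(columns[a], columns[b])
        if abs(r) >= threshold:
            flagged.append((a, b, r))
    return flagged

# Hypothetical coded variables for eight crashes.
columns = {
    "public_lighting": [1, 1, 0, 0, 1, 0, 1, 0],
    "visibility":      [1, 1, 0, 0, 1, 0, 0, 0],
    "speed_limit":     [50, 30, 50, 60, 30, 50, 60, 30],
}
for a, b, r in correlated_pairs(columns):
    print(f"{a} vs {b}: r = {r:.3f}")
```

Only one variable from each flagged pair would then be carried into a given model specification.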

For comparison, the random-effects, random parameter and proposed panel mixed ordered probit models are estimated to evaluate injury severity. Table 2 provides the results of the three models.

Table 2 Results for random-effects, random parameter and unbalanced panel mixed ordered probit model

As shown in Table 2, several observations can be made. First, the significant variables of the random parameter and panel mixed ordered probit models are identical, and both models contain one more significant variable (pedestrian gender) than the random-effects ordered probit model. Second, the AIC and BIC values of the proposed model (2358.907 and 2402.489) are smaller than those of the random-effects (2362.854 and 2418.838) and random parameter models (2365.307 and 2421.291). The BIC gaps (about 16.3 and 18.8) and the AIC gap to the random parameter model (about 6.4) exceed 5, which suggests a meaningfully better fit, although the AIC gap to the random-effects model is smaller (about 3.9); the values of the random-effects and random parameter models are very close to each other. Third, although the quasi/log-likelihood values at zero are the same, the proposed model converges at −1168.453, whereas the random-effects and random parameter models converge at −1171.427 and −1172.653, respectively. The proposed model therefore shows the best goodness-of-fit, and the following discussion concentrates on the unbalanced panel mixed ordered probit model.
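The size of the information-criterion gaps can be checked directly from the values quoted above. The short sketch below only subtracts the reported figures; the dictionary layout and function name are illustrative choices.

```python
# AIC and BIC values as quoted in the text for Table 2.
fit = {
    "panel mixed":      {"AIC": 2358.907, "BIC": 2402.489},
    "random-effects":   {"AIC": 2362.854, "BIC": 2418.838},
    "random parameter": {"AIC": 2365.307, "BIC": 2421.291},
}

def gaps_to(reference, stats):
    """Return each other model's AIC and BIC minus the reference model's values."""
    ref = stats[reference]
    return {
        name: {crit: round(vals[crit] - ref[crit], 3) for crit in ("AIC", "BIC")}
        for name, vals in stats.items() if name != reference
    }

gaps = gaps_to("panel mixed", fit)
for name, gap in gaps.items():
    print(name, gap)
```

Positive gaps mean the proposed model has the lower (better) criterion value in every comparison.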

The first significant variable, day, is negatively related to pedestrian-vehicle injury severity, indicating that injury severity decreases from weekend to weekday. This implies that although more injuries occur on weekdays, injuries on weekends are more severe, since more pedestrians go out on weekends. The finding is consistent with Dai [5] and Senserrick et al. [24].

In Table 2, pedestrian age, especially over 65 years, and gender are two significant variables influencing pedestrian-vehicle injury severity. Compared with pedestrians under 65, pedestrians over 65 show a positive relation with injury severity, meaning that older pedestrians suffer more severe injuries. Similar results were found by Eluru et al. [8] and Li and Fan [15]. A likely reason is that the elderly need a longer walking time because of reduced mobility, so they are more injury-prone and their injuries are more severe. Pedestrian gender is negatively associated with the injury severity of pedestrian-vehicle crashes, which indicates that injury severity is lower for male than for female pedestrians. This implies that female pedestrians experience more severe injuries, with the probability increasing by 14.9% when the pedestrian is female rather than male.

A speed limit over 50 km/h has a positive relation with injury severity. Compared with speed limits of no more than 50 km/h, the probability shifts from no injuries toward severe injuries, and previous studies [10, 21, 29] have confirmed this point.

Vehicle type plays a significant positive role in the injury severity of pedestrian-vehicle crashes, indicating that the probability of severe injury increases as the vehicle type moves from non-motorized bicycles to motorcycles, which is plausible. Injuries caused by motorcycles are usually more severe than those caused by ordinary cars because of the impact force and irregular maneuvering, and previous studies [6, 15] have verified this.

The last two significant variables, the total number of participants and the number of slightly injured participants, are positively and negatively associated with the injury severity of pedestrian-vehicle crashes, respectively. The positive relation indicates that the more participants are involved, the more severe the pedestrian-vehicle crash, which is intuitive. Conversely, the more participants are slightly injured, the lower the injury severity, implying that crashes involving more slightly injured participants tend to be less severe overall.

As stated above, various methods and approaches have been applied to the injury severity analysis of pedestrian-vehicle crashes. However, most studies concentrate on hotspots or on influencing factors in general or individually, and heterogeneity due to unobserved factors may remain. To deal with this, the present study proposes the panel mixed ordered probit model, which allows the impact of exogenous variables to vary among intersections and accommodates heterogeneity due to unobserved effects.

As shown in Table 2, a closer examination of the comparison results reveals similarities and differences among the three models. First, the random parameter model and the proposed model identify the same significant factors, which indicates that the two models are very similar. Second, the random-effects model has fewer significant variables than the other two models, which implies that the random-effects ordered probit model accommodates heterogeneity only partially. Consequently, the impact of exogenous variables and the heterogeneity in the injury severity of pedestrian-vehicle crashes can be determined with the panel mixed ordered probit model, which offers additional insight into injury severity compared with current studies.

6 Conclusions

In this study, a panel mixed ordered probit model was proposed to identify the significant factors influencing the injury severity of pedestrian-vehicle crashes at intersections, allowing the impact of exogenous variables to vary among intersections and accommodating heterogeneity due to unobserved effects. The results revealed that day, pedestrian gender, pedestrian age (especially over 65 years), speed limit over 50 km/h, vehicle type and number of participants were significant factors of injury severity.

Two main findings can be drawn from the results. First, the panel mixed ordered probit model not only includes mixed effects in the error structure, as the mixed ordered probit model does, but also extends mixed effects to the exogenous variables, which allows their impact to vary among intersections and accommodates heterogeneity. To our knowledge, this is the first attempt to apply this model to the injury severity analysis of pedestrian-vehicle crashes, and it broadens the range of injury severity analysis. Second, compared with the random parameter and random-effects models, the proposed model shows better goodness-of-fit and can be considered an alternative for dealing with the heterogeneity issue more completely.

Empirically, since more pedestrians go out on weekends and injuries are then more severe, measures should be taken to alert pedestrians at intersections or to keep vehicles from conflicting with pedestrians, such as countdown signals with sound or virtual pedestrian walk screen walls. At intersections used by many pedestrians over 65 years old and by many female pedestrians, the walking time should be long enough to allow older pedestrians to cross, and more attention should be paid to female pedestrians. Speed limits should strictly follow the roadway classification, and at intersections with many pedestrians the speed limit should be set to no more than 50 km/h. Motorcycles should not be permitted in certain areas with many pedestrians, e.g. leisure spaces and CBD areas, because pedestrians are more vulnerable at those locations. Finally, more education should be provided to drivers and pedestrians so that the total number of participants in crashes can be reduced.

Some limitations should be addressed in further study. More variables related to driver and pedestrian status (e.g. pedestrian crossing behavior), vehicle conditions, roadway characteristics (e.g. signal phase), environment (e.g. residential area, school zone) and land use types around intersections (Elias and Shiftan [7]) should be included so that injury severity can be reflected more completely. In addition, the results are based on a dataset from Croatia, and it would be worthwhile to employ different data sources to verify the findings and their transferability. Future studies may also consider injury severity spatially and temporally, so that spatial and temporal issues can be addressed clearly.