1 Introduction

Transportation plays a significant role in economic, political, and social development. A higher rate of economic growth is directly related to improvements in transportation.

In Tehran (the capital of Iran), public transportation is widely used by citizens. According to statistics from the Traffic Department of Tehran Municipality, the share of public transportation in urban trips was 67% in 2015, while the shares of taxi, bus and minibus, metro, and other public transportation were 22%, 20%, 8%, and 7%, respectively. The number of passengers carried by bus and taxi was 2.798 billion in 2015, of which 1.499 billion were carried by the taxi fleet [1]. Along with the development of public transportation, particularly in metropolises, vehicle use is increasing daily, and the rate of fatal accidents is increasing as well.

According to previous studies, a significant number of urban accidents in many countries are caused by driving violations. In developing countries, moreover, car accidents are one of the major causes of death. This issue has therefore become one of the most important concerns of traffic engineers and transport planners [2]. Iran, as a developing country, is no exception: in the 10-year period from 2006 to 2016, 196,366 people were killed in accidents in Iran, which is a high figure.

Given these issues, and because a large number of trips are completed by the taxi fleet every day, driving violations by taxi drivers and the resulting increase in the probability of traffic conflicts may lead to irreparable casualties. As most of these accidents are due to unsafe driver behavior, the aim of this study is to investigate the unsafe behavior of taxi drivers in Tehran and the factors affecting their driving violations. Taxi drivers are regarded as professional drivers who spend most of their time driving and play an important role in passenger transport.

2 Literature Review

Many studies around the world have analyzed driving offenses and the factors affecting them. In Iran, a few studies have considered the psychological aspects of driving violations, but insufficient attention has been paid to the driving offenses of taxi drivers as a group of professional drivers. Moreover, studies on taxi drivers’ offenses have not adequately considered the effect of street and environmental conditions.

Ivers et al. (2010) used multinomial regression to study driving offenses and subsequent accident risk among inexperienced drivers in Australia. They concluded that inexperienced drivers who commit driving offenses are at higher risk of accidents than more experienced drivers [3]. Zhang et al. (2014) used a logistic regression model to investigate traffic offenses in Huangdao, China. Their variables covered the driver (demographic characteristics), the car, the street, and environmental factors, and they found that street lighting condition was one of the factors affecting driving offenses [4]. Oppenheim et al. (2016) analyzed driving offenses in Israel using the Driver Behaviour Questionnaire (DBQ); the results showed that variables such as age, gender, and driving experience have a direct impact on the probability of committing driving offenses [5]. Atombo et al. (2016) investigated speeding and overtaking in relation to age, gender, marital status, education level, driving experience, type of driving license, size of the car, and cause of street accidents. The results showed that age, gender, and level of education are important factors affecting drivers’ behavior. In addition, statistical analysis of the collected data showed that taxi drivers have a stronger tendency towards speeding than towards overtaking [6]. Other studies on driving behavior obtained similar results [7,8,9,10].

Esmaili and Mikaeili (2010) investigated the relation between welfare level and driving violations using logistic regression, with data collected from the Urmia traffic police and through a questionnaire. They concluded that there is a significant relation between welfare level and disregard for police regulations [11]. Liew et al. examined how driving offenses of Malaysian drivers differ by driving experience and gender. The data were obtained through a questionnaire and analyzed with the t test and the Chi-squared test. The results showed that male, experienced drivers are more likely than other drivers to exceed the speed limit, to fail to keep an appropriate following distance from the vehicle ahead, and to enter a prohibited special lane (BRT lane) [12].

In a study on taxi drivers, Zhang (2010) investigated taxi traffic accidents to identify the types of driving offenses committed by taxi drivers in China. He used simulations to analyze the data and found that accidents and driving offenses occurred due to the behavioral characteristics of the driver and passenger as well as vehicle performance [13]. Wu et al. (2007) used logistic regression and simulation tools to analyze the differences between the behavior of taxi drivers and inexperienced drivers at red lights in China. The results showed that non-professional drivers are more likely than taxi drivers to avoid crossing red lights. In addition, taxi drivers have a lower accident rate than inexperienced drivers [14].

In recent years, some studies have examined drivers’ violations in general (not specifically those of taxi drivers).

In 2017, Sagberg and Ingebrigtsen examined the effect of previous driving violations on the commission of new violations by Norwegian drivers over a 3-year period. They concluded that for drivers with more than four driving violations, the number of violations decreases significantly, while for drivers with fewer than four violations, it decreases at a slower rate. The reason given is drivers’ fear of having their driving license suspended on a fifth violation [15].

In another study, Mohammadzadeh et al. (2018) examined the relationship between the number of driving violations and the demographic and personality characteristics of drivers. After analyzing questionnaire data from 1762 drivers in Iran, they concluded that variables such as level of education, marital status, income, and driving experience affect the number of violations [16].

In light of the literature reviewed above, it is highly valuable to analyze the factors affecting the violations of urban taxi drivers, as they constitute the largest community of motorists in metropolises worldwide, and there is currently no specific research on their driving behavior and violations. In addition, variables associated with the driver, the vehicle, the road, and the environment have not been fully understood in relation to taxi drivers. Variables such as engine power of the car, car life, holiday status, season, and taxi activity type have not been evaluated in previous studies. Thus, this research aims to investigate these parameters for the violations of taxi drivers.

3 Methodology

The main purpose of this study is to investigate the unsafe behavior of taxi drivers and the factors affecting driving offenses in Tehran. To this end, all independent variables are first categorized, and Pearson’s Chi-squared test with a significance level of 0.05 is used to identify significant variables. The Chi-squared statistic is a non-parametric tool designed to analyze group differences when the dependent variable is measured at a nominal level. Like all non-parametric statistics, the Chi-squared is robust with respect to the distribution of the data. Specifically, it does not require equality of variances among the study groups or homoscedasticity in the data. It permits evaluation of both dichotomous independent variables and multiple-group studies. Unlike many other non-parametric and some parametric statistics, the calculations needed to compute the Chi-squared provide considerable information about how each of the groups performed in the study. This richness of detail allows the researcher to understand the results and thus to derive more detailed information from this statistic than from many others.
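As an illustration of the screening step, the following minimal sketch computes Pearson's Chi-squared statistic for a contingency table of one categorical predictor against violation type. It is not the SPSS procedure used in the study: the function name `chi_squared_independence`, the counts, and the small table of 0.05 critical values are assumptions introduced here for illustration only.

```python
from typing import List, Tuple


def chi_squared_independence(table: List[List[float]]) -> Tuple[float, int]:
    """Return (Pearson chi-squared statistic, degrees of freedom)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


# Hypothetical counts (not from the study):
# rows = working day / holiday, columns = two violation types
counts = [[30, 10], [20, 40]]
stat, dof = chi_squared_independence(counts)

# Standard chi-squared critical values at the 0.05 level for 1 to 3 degrees of freedom
CRITICAL_005 = {1: 3.841, 2: 5.991, 3: 7.815}
significant = stat > CRITICAL_005[dof]
```

A variable would be retained for the regression step when `significant` is true, which mirrors the Sig. < 0.05 criterion described above.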

In the next step, a multinomial logistic regression model is used to determine the relation between each independent variable and driving offenses in the city of Tehran. Logistic regression is commonly used to model categorical response variables. It can be applied to binary response variables (two values) or to response variables with r categories (r can be greater than two). Since the response variable in this study is nominal (not ordinal), the generalized logit model is the most appropriate for analysis. This model consists of r − 1 logit equations for the response variable, each comparing one category with a reference category.

In this study, “speeding” is taken as the reference category for comparison with the other types of driving violations. Each logit equation gives the odds of one of the other types of driving violations relative to “speeding”.

The type of driving violation, denoted by Y, is the response variable, and the environmental and human variables are the predictor variables, denoted by \(X_{i1}, X_{i2}, X_{i3}, \ldots, X_{ip}\), where i is the index of the observation and p is the number of independent variables. It is assumed that \(Y_i = (Y_{i1}, Y_{i2}, \ldots, Y_{ir})^T\) has a multinomial distribution with index \(n_i = \sum_{j=1}^{r} Y_{ij}\) and parameter \(\pi_i = (\pi_{i1}, \pi_{i2}, \ldots, \pi_{ir})^T\).

When the classifications 1, 2, …, r of the response variable are unordered, \(\pi_i\) is related to the independent variables through a set of r − 1 logit functions defined against a reference classification. If \(j^*\) is taken as the reference classification, the model is defined as follows:

$$\log\left(\frac{\pi_{ij}}{\pi_{ij^*}}\right) = X_i^T \beta_j, \qquad j \ne j^*, \tag{1}$$

where \(X_i^T\) is the transpose of the vector of independent variables and \(\beta_j\) is the coefficient vector for the jth level of the response variable.

Since the seven classifications of the response variable in this study have no specific order, six generalized logit models are defined for this analysis, with j taking the values from 1 to 6. Because \(X_i\) has length p, this model has \((r - 1) \times p\) parameters, which can be arranged as a matrix.

In this model:

Each classification can be selected as the reference classification. In this case, the value and interpretation of coefficients would be different.

The kth element of \(\beta_j\) is the change in the log-odds of the response falling in classification j rather than classification \(j^*\) for a one-unit increase in the kth independent variable, with the other independent variables held constant; its exponential is the corresponding multiplicative change in the odds.
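As a brief worked step, the odds interpretation follows directly from Eq. (1). Here \(\pi_{ij}\) denotes the probability of classification j for observation i, and \(x_{ik}\) and \(\beta_{jk}\) denote the kth elements of \(X_i\) and \(\beta_j\); this element-wise notation is introduced here only for illustration. Raising \(x_{ik}\) by one unit while holding the other variables fixed adds \(\beta_{jk}\) to the log-odds, so the ratio of the odds after and before the change is

$$\frac{\left.\left(\pi_{ij}/\pi_{ij^*}\right)\right|_{x_{ik}+1}}{\left.\left(\pi_{ij}/\pi_{ij^*}\right)\right|_{x_{ik}}} = \exp(\beta_{jk}).$$

For example, a hypothetical coefficient \(\beta_{jk} = 0.693\) would correspond to roughly a doubling of the odds of classification j relative to the reference, since \(\exp(0.693) \approx 2.0\).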

For a non-reference classification \(j \ne j^*\), \(\pi_{ij}\) is expressed in terms of \(\beta\) as follows:

$$\pi_{ij} = \frac{\exp(X_i^T \beta_j)}{1 + \sum_{k \ne j^*} \exp(X_i^T \beta_k)}. \tag{2}$$

For the reference classification, \(\pi_{ij^*}\) is expressed in terms of \(\beta\) as follows [17]:

$$\pi_{ij^*} = \frac{1}{1 + \sum_{k \ne j^*} \exp(X_i^T \beta_k)}. \tag{3}$$
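The probability expressions above can be sketched in a few lines of Python. This is an illustrative sketch only, not the SPSS implementation used in the study: the function name `category_probabilities`, the category labels, and all coefficient values are hypothetical.

```python
import math
from typing import Dict, List


def category_probabilities(
    x: List[float], betas: Dict[str, List[float]], reference: str
) -> Dict[str, float]:
    """Category probabilities for a generalized (baseline-category) logit model.

    `betas` holds one coefficient vector per non-reference category.
    """
    # Linear predictor X_i^T beta_j for each non-reference category
    etas = {
        name: sum(b * xi for b, xi in zip(beta, x)) for name, beta in betas.items()
    }
    denom = 1.0 + sum(math.exp(eta) for eta in etas.values())
    # Non-reference categories: exp(eta_j) / denom
    probs = {name: math.exp(eta) / denom for name, eta in etas.items()}
    # Reference category: 1 / denom
    probs[reference] = 1.0 / denom
    return probs


# Hypothetical inputs: an intercept term plus one dummy variable (e.g. working day = 1)
x = [1.0, 1.0]
betas = {
    "red_light": [-0.5, 0.9],
    "prohibited_entry": [-1.0, 0.4],
}
probs = category_probabilities(x, betas, reference="speeding")
```

By construction the probabilities sum to one, and the log of `probs["red_light"] / probs["speeding"]` recovers the linear predictor for that category, which is the relation stated in Eq. (1).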

In this study, “speeding” is taken as the reference classification. The multinomial logistic regression model is fitted using SPSS 24.

4 Data Collection

At this stage, the driving violations of taxi drivers in the 22 districts of Tehran over a 2-year period, from 2015 to 2016, were extracted from the traffic police database. The records included the type of violation (the dependent variable) together with the drivers’ characteristics, the vehicles’ specifications, and the street and environmental conditions. Finally, after correction of incomplete data, 90,553 records were used for statistical analysis.

In the Traffic Police Database, drivers’ violations are reported under 18 offense codes. Seven types of violations were selected for the model, for the following reasons: (1) the frequency of the other violations by taxi drivers was negligible (less than 0.1%), which made statistical analysis difficult, and (2) including all types of violations could render the final model statistically insignificant, so that the analysis would not provide acceptable results. In addition, for the other types of taxi violations, the resulting statistical model was not significant at the 95% confidence level. The type of driving violation of taxi drivers was therefore treated as a dependent variable with seven categories, and 11 independent variables were investigated, as shown in Table 1. All variables used in previous research were also entered into the model. The non-parametric Kendall test, which is suited to discrete variables, was used to examine the dependency among the independent variables. The results showed that all pairs of independent variables had a correlation coefficient of less than 0.5, and therefore, the independent variables were not highly correlated.
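The dependency check can be sketched as follows. The text does not state which variant of Kendall's coefficient was used; this sketch assumes tau-b, which adjusts for the ties that are unavoidable with categorized variables. The function name `kendall_tau_b` and the coded data are hypothetical.

```python
from itertools import combinations
from math import sqrt
from typing import Sequence


def kendall_tau_b(x: Sequence[float], y: Sequence[float]) -> float:
    """Kendall's tau-b rank correlation, with a correction for ties."""
    concordant = discordant = ties_x = ties_y = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0 and dy == 0:
            continue  # tied on both variables: contributes to no count
        elif dx == 0:
            ties_x += 1  # tied on x only
        elif dy == 0:
            ties_y += 1  # tied on y only
        elif dx * dy > 0:
            concordant += 1
        else:
            discordant += 1
    untied = concordant + discordant
    denom = sqrt((untied + ties_x) * (untied + ties_y))
    return (concordant - discordant) / denom if denom else 0.0


# Hypothetical ordinal codes for two categorized predictors (not study data)
age_group = [1, 1, 2, 2, 3, 3]
experience_group = [1, 2, 1, 2, 2, 3]
tau = kendall_tau_b(age_group, experience_group)

# Flag the pair if it reaches the 0.5 threshold mentioned in the text
highly_correlated = abs(tau) >= 0.5
```

Running the same check over every pair of independent variables and confirming that no pair is flagged corresponds to the screening described above.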

Table 1 Classification and the frequency of variables

Table 1 displays the classifications of all variables along with the frequency of each category.

All independent variables were categorized, and SPSS 24 was used for the statistical analyses in this study. The independent variables included driver’s age, driving experience, type of driving license, car size, engine power of the car, car life, car type, lighting status (day or night), holiday status, season, time, and type of activity of the taxi drivers committing a violation.

5 Result Analysis

In this study, the effect of each independent variable on the driving violations of urban taxi drivers in Tehran was evaluated, and the results of the Chi-squared test are shown in Table 2. As can be seen in Table 2, all independent variables investigated are significant at the 95% confidence level (Sig. < 0.05).

Table 2 Result of Chi-squared test of independent variables

A multinomial logistic regression model was used to analyze the data and identify the factors affecting the driving offenses committed by urban taxi drivers in Tehran. The forward Wald method was used to develop the model in SPSS. All effective and significant variables were identified in the first stage by the Chi-squared test. The modeling results for the type of offence, in seven categories, are given in Table 3. Speeding was selected as the reference category. In addition, for each independent variable, the last classification is used as the reference.

Table 3 Results of analysis of multinomial logistic regression model

According to the results, the probability of crossing the red light is higher on working days than on non-working days. The main reason may be the high traffic volume in the streets, which lengthens the queue behind the red light and adds travel time and delay for drivers, so they try to pass through intersections at higher speed, and this behavior increases the rate of crossing the red light. Between 12 a.m. and 6 a.m., the probability of speeding is higher than that of crossing the red light, owing to low traffic volume and the lack of police presence in the streets. Another reason for speeding during the early hours of the day may be the presence of more drivers with a lower level of alertness (due to driving under the influence of alcohol).

According to Table 3, taxi drivers over 70 years old are more likely to cross the red light than taxi drivers under the age of 30. Old age and less patience can be mentioned as two main reasons for this. According to Oppenheim et al. (2016), drivers’ age has a direct effect on the probability of committing a driving offence [5]. Moreover, as older drivers need more time to react to the red light and stop the car, they are more likely to stop on the stop line and commit this driving offence. For drivers with a class 2 driver’s license, the probability of crossing the red light was higher than for drivers with a class 1 driver’s license, because they drive more riskily and have less experience.

On the other hand, the probability of prohibited entry on working days is 2.5 times that on non-working days. In addition, taxi drivers are more likely to commit prohibited entry during the day than at night. This may be due to the high daytime traffic volume on working days, which tempts drivers to escape the heavy traffic of main streets by using local streets.

The results of this study showed that younger taxi drivers drive more riskily than older ones, so they commit more driving offenses. For instance, drivers under 30 years old are three times more likely to commit prohibited entry than drivers over 70. In addition, drivers in their 30s are more likely to enter a special lane than drivers over 70 years old. These results agree with those of Oppenheim et al. (2016) [5].

Moreover, the probability of committing a prohibited U-turn is three times higher for taxi drivers with less than 10 years of driving experience than for taxi drivers who have been driving for more than 30 years. Ivers et al. (2010) reached a similar result [3]. It should be mentioned that this result contradicts the findings of Wu et al., who found that inexperienced drivers cross the red light less often than experienced drivers [14].

In addition, jitney service drivers commit prohibited entry more often than drivers of standard taxis, because the latter take on more legal commitments and also receive higher fares, which in turn makes them less hurried and more cautious. Another reason is that jitney service drivers are more familiar with the streets in which they work, and therefore, they take more risks in driving the wrong way along one-way streets and committing prohibited entry.

According to the results given in Table 3, drivers with newer and more luxurious cars commit fewer driving offenses than drivers with older and cheaper cars. The offence of not paying attention to the police is four times more likely among van drivers than among taxi drivers with a Toyota. Zhang (2010) found that the type of car affects drivers’ behavior while driving [13]. Finally, drivers of taxis aged 0 to 3 years are less likely to commit the offence of prohibited entry to the special lane than drivers of taxis older than 9 years.

Fig. 1 presents an explanatory analysis of all significant independent variables.

Fig. 1 Committing driving offenses by taxi drivers for different variables

6 Conclusion

The aim of this study was to investigate the unsafe behavior of taxi drivers in Tehran, Iran, and the factors affecting their driving violations. To this end, the violations of urban taxi drivers in the 22 districts of Tehran over a 2-year period were collected from the Traffic Department, including information on the different driving offenses (dependent variable), drivers’ characteristics, cars’ specifications, and the street and environmental conditions in which each violation was committed. After correction of the incomplete data, 90,553 records of driving offenses were used for statistical analysis. The contributions of the present research are that a set of factors related to the driver, the vehicle, and the environment have been used together in modeling taxi drivers’ violations, and that the variables of engine power of the car, car type, holiday status, and type of taxi activity have been used and evaluated for the first time.

The primary findings of the study were that the probability of taxi drivers crossing the red light is higher on working days than on holidays, and that the probability of speeding is higher than that of crossing the red light from 12 a.m. to 6 a.m. To tackle this issue, the installation of speed cameras that can record violations in dark conditions is suggested. Moreover, equipping taxis with GPS and monitoring their speed can help to reduce speeding.

According to the results, taxi fleet drivers aged over 70 are more likely to cross the red light than drivers younger than 30. Older drivers who were trained in the distant past can be brought up to date through training.

The rate of crossing the red light is lower for drivers with a class 1 driver’s license than for those with a class 2 driver’s license. Incentives such as an insurance discount, a free annual technical inspection of the car, or a free parking card can motivate drivers to obey the rules and drive more safely. For drivers with a class 2 driver’s license, who commit more traffic offenses, recording negative points on their driving profile is suggested as a deterrent.

In addition, prohibited entry is more likely on working days: its probability on working days is 2.5 times that on non-working days. Moreover, taxi drivers are more likely to commit prohibited entry during the day. The rate of this offence can be decreased by increasing the efficiency of the existing streets through intelligent transportation systems, by restricting single-occupancy vehicles, and by using HOV lanes.

Younger taxi drivers drive more riskily than older ones and, as a result, commit more driving offenses. For instance, drivers under 30 years old are three times more likely to commit prohibited entry than drivers over 70 years old. In addition, drivers in their 30s are more likely to enter a special lane than drivers over 70 years old. This kind of offence can be decreased by enacting stricter rules for younger drivers.

Drivers with newer and more luxurious cars commit fewer driving offenses than drivers with older and cheaper cars. According to the results, the offence of not paying attention to police commands is four times more likely among van drivers than among taxi drivers with a Toyota. In addition, drivers of taxis aged 0 to 3 years are less likely to commit the offence of prohibited entry to the special lane than drivers of taxis older than 9 years.