1 Introduction

One of the problems encountered in traffic safety analysis is the difficulty of obtaining reliable estimates of exposure between different vehicle types such as trucks, buses, cars, two wheelers (2Ws) and three wheelers (3Ws). In one of the few studies dealing with this issue, Nationwide Personal Travel Survey data were used to estimate vehicle miles driven as a measure of exposure [1]. Bhalla et al. [2] estimated exposure between a pair of vehicle types as the product of the number of vehicles of that type and the average vehicle miles travelled by that vehicle type. Overall exposure of vehicles can be estimated from origin–destination surveys or household surveys, but these are not easily available. Moreover, these exposure estimates do not tell us much about the actual interaction between vehicles on the road. Attempts to correlate total distance travelled (exposure) with vehicular interaction would be inaccurate, as vehicular interactions depend significantly on the vehicle positioning pattern in addition to density and composition. Recently, some studies have tried to understand the microscopic vehicular interactions that contribute to accidents [3,4,5]. Oh et al. [4] suggested that exposure is equal to the total time a vehicle pair spends in following. Their approach allows for more precise measurement of exposure between two vehicle types on a given road. The simulation model was calibrated and validated using traffic characteristics such as volume, density, speeds and occupancy time, and then used to predict the exposure. This approach of using microscopic traffic flow simulation can be further explored to determine the exposure or interaction rate between different types of vehicles using field data.

Over the years, various microscopic traffic flow models have been developed to predict vehicular behaviour from a mid-block section of road to the network level. In microscopic traffic flow models, each vehicle is described by its own equation of motion; hence, the computational time and memory required are greater for these models. In this context, Cellular Automaton (CA) modelling has been found promising to meet this challenge. The concept of microscopic traffic flow CA model was first coined by Cremer and Ludwig [6]. Their study was followed by Nagel and Schreckenberg [7], whose model was found to be superior in modelling randomisation in traffic flow. This model was very basic as it had only four rules that governed the movement of vehicles in a stream. These rules were acceleration, deceleration, randomisation and repositioning of vehicle based on new speed. This model even in its basic form was able to replicate some of the traffic features of homogeneous, single lane road with periodic boundary conditions. Most of the CA-based traffic flow models have addressed the homogeneous traffic flow and its behaviour. Due to the discreteness of this model, it provides an opportunity to simulate large-scale real time microscopic phenomena like platoon formation and the capacity drop at transitions between free and congested flow. Later, others [8,9,10] contributed to the development of the model by adding more rule sets to increase its capability to replicate traffic features seen in multilane and heterogeneous traffic. Matthew et al. [11] proposed a modified cell size, randomisation rule and lane-changing rule of CA model for heterogeneous conditions. Mallikarjuna and Rao [12] (Ma–Ra) developed a heterogeneous traffic model for Indian conditions. They found that traffic in India is highly heterogeneous with frequent lane changing; hence, it was necessary to modify the Knospe’s model to incorporate many types of vehicles and also their lateral movements. 
Traffic composition in India includes a significant proportion of motorised 2Ws and motorised 3Ws that have smaller dimensions than cars. Mallikarjuna and Rao reduced cell dimensions to 0.5-m in length and 1.75-m in width to represent different lengths and speed differentials in each time-step for various vehicle types. Lateral and longitudinal movement rules were also improved from earlier models. Zhao et al. [13] developed a CA model for determining interactions between motorised and non-motorised vehicles near a bus stop. This model incorporated non-lane based behaviour of non-motorised vehicles. Vasic and Ruskin have developed a CA model to simulate the road network structure [14]. This was done to simulate car and bicycle traffic on mid-blocks and at intersections. Xie et al. [15] developed a CA model for modelling interactions between vehicles and pedestrians at signalised crosswalk. Their results showed that there was a critical value that divides the vehicle flow into free and congested flow portions. Zhang et al. [16] compared CA and gas dynamics models using speed density characteristics of the mixed bicycle traffic (i.e. bicycle traffic including electric bicycles). They found that the results produced from CA were more consistent with the observed data when density was lower, while gas dynamics model performed better at densities higher than 0.3 bicycles/m2. Tao et al. [17] proposed an improved brake light CA model by improving acceleration rules and to avoid over-deceleration, the randomisation probability and deceleration extent are determined according to the results of the step of deterministic deceleration. Xie and Zhao [18] developed a CA model that considered timid and aggressive driving behaviours. They modified the anticipated speed of the preceding vehicle with a new constant parameter, \(\Delta v\) representing the aggressiveness of driver. 
Further, they found that even a small proportion of timid drivers significantly reduces the road capacity, whereas many more aggressive drivers are needed to increase it. Zheng [19] exhaustively reviewed the lane changing models available for microscopic traffic flow modelling. This study suggested that a comprehensive model that captures lane changing decisions needs to be developed and that this model should be able to predict vehicle trajectories close to those observed in the field at the microscopic level. At the macroscopic level, the model should be able to reproduce fundamental traffic flow characteristics. Until now, researchers have evaluated CA models microscopically using individual vehicle trajectories [20] or macroscopically using traffic characteristics such as stream speed, average flow, density or occupancy and number of overtaking instances [8, 10, 12]. Pandey et al. [21] evaluated the CA model proposed by Mallikarjuna and Rao and found that even though the CA model could simulate fundamental diagrams (flow vs density plots) satisfactorily, it gave unexpected results when microscopic characteristics such as lane-maintaining behaviour, car-following and overtaking manoeuvres were compared with those observed in the field. Also, the mean stream speed and road capacity were higher than those observed on the road. The authors believe that this was due to the model not considering the heterogeneous nature of vehicles and differences in driving behaviour. Some of the inadequacies are listed below:

  1. Inadequacies in lateral movement rules

    In the Ma–Ra model, overtaking vehicles are supposed to meet two criteria: (a) an incentive criterion and (b) a safety criterion.

    (a) The incentive criterion requires the longitudinal gap in the target lane to be greater than the current speed multiplied by a factor. Lane changing was therefore based purely on the availability of longitudinal gaps on the road, and vehicles had no preferred or desired position on the road. This may not be true, as it was found in this study that vehicles do have a preferred position on the road [21]. For example, heavy vehicles such as trucks and buses [henceforth referred to as heavy motor vehicles (HMVs)] tend to drive closer to the median (henceforth referred to as the ‘median lane’ in this paper), while lighter vehicles such as bicycles and three wheelers prefer travelling closer to the shoulder (henceforth called the ‘shoulder lane’ in this paper). This phenomenon plays a very important role in determining stream speeds and inter-vehicular interactions, as heavy vehicles prefer the innermost high-speed lane over the outer, slower lane used by three wheelers and bicycles. Maximum gap and speed are thus not the only criteria; convenience also matters. This was also evident in the simulated stream speeds, which were higher than observed because vehicles changed lanes based on gap size alone and not on convenience or safety. Hence, there is a need to identify the inadequacies of the Ma–Ra model in determining the interaction rate between different vehicle types.

    (b) The safety criterion requires the back gap to the vehicle approaching from the rear to be greater than the current speed multiplied by a factor. It was based on the assumption that vehicles coming from the rear would never decelerate even if a vehicle entered their lane ahead of them. Hence, the required gap calculated by the safety criterion was much larger than the gaps observed in the field [21]. This may cause large and unrealistic vehicular queues at lower densities, as vehicles would rarely find gaps large enough for overtaking.

  2. Inadequacy in longitudinal movement rules

    The minimum safe distance between vehicles was considered constant, irrespective of the types of vehicles involved or their current speeds. This was unrealistic, as studies have found that vehicles maintain different longitudinal gaps based on the type of the leader vehicle. For example, an HMV may maintain a larger longitudinal gap when following a 3W or 2W carrying children than when following vehicles without children. It is also known that the minimum safe distance depends on the current speeds and the maximum deceleration rates of both vehicles.

The literature in this area suggests that, even though risk analysis has been an area of focus for researchers working in this field, most have treated the number of crashes between a vehicle pair as a subset of the total exposure between them. Exposure was expressed as the vehicle kilometres or vehicle hours travelled by that vehicle during the study period. The authors believe that this exposure is not accurate, as it is based on the assumption that driver behaviour is similar across vehicle types and road types, which may not be true. Hence, there is a need to develop a new method of measuring exposure or interaction between vehicles based on the number of car-following and overtaking events observed in the field. The number of car-following and overtaking events between two vehicle types can be described as the interaction rate between them. These events are better correlated with crashes than vehicle kilometres or vehicle hours travelled. In this study, it was assumed that most crashes occur during car-following or overtaking events; thus, a microscopic traffic flow model based on CA could be explored to simulate the interactions between vehicle pairs. This led to the development of a position preference based CA (PP-CA) model for heterogeneous traffic conditions in the present study.

The rest of the paper is organised as follows. The modified longitudinal and lateral movement rules of the proposed model are discussed in Sect. 2. Section 3 discusses data collection and extraction methodology. Subsequently, Sect. 4 includes the validation of the model using fundamental diagrams and differences in observed and simulated interaction rates. Section 5 illustrates the application of the proposed model in determining the maximum interaction rates for different vehicle pairs in given traffic conditions.

2 Position preference based CA model

2.1 Model description

The inadequacy in lateral movement rules (Sect. 1 (1a)) is addressed by introducing a position preference parameter in the incentive criterion that reduces the probability of a lane change as the distance between the target position and the preferred position increases. Conventional brake light models assume that the probability of a lane change depends only on the speed of the subject vehicle, and hence use only one parameter \(\alpha\), which captures the gap acceptance behaviour of the driver as a function of vehicle speed. In the proposed model, it is assumed that the probability also depends on the current position of the subject vehicle across the road width. As discussed earlier, vehicles tend to have a desired or preferred position on the road based on the vehicle type. As a result, vehicles often try to stick to their preferred lane even if a larger gap is available on the adjacent non-preferred lane. This phenomenon is illustrated in Fig. 1, which shows the various interacting factors that may affect the decision of the driver (i.e. of the subject vehicle) during a lane change instance in the model. In this study, the Ma–Ra model is used as the reference model for comparison, and hence ‘the conventional model’ refers to the Ma–Ra model. However, the two models are quite different.

Fig. 1 CA lattice structure and relative positioning of vehicles

2.1.1 Lateral movement rules

In this section, the lateral movement rules for the proposed model are presented. In Fig. 1, \(x_{n}^{t}\) (grey vehicle) is a subject vehicle that is trying to decide between three options. They are: Option 1 (lane change and follow the leader car \(x_{n + 2}^{t + 1}\)), Option 2 (lane change and follow the leader 3W \(x_{n + 3}^{t + 1}\)), and Option 3 (no lane change and keep following the leader vehicle \(x_{n + 1}^{t + 1}\)). Option 3 is close to a do-nothing scenario as the subject vehicle keeps following the leader 3W even if 3Ws are slower than light motorised vehicles (LMVs), heavy motorised vehicles (HMVs), and 2Ws. Also, notice that Option 3 has a lower longitudinal gap (\(g_{n}^{\text{f}}\)) compared to Options 1 and 2 (\(g_{n}^{{{\text{tf}}1}}\), \(g_{n}^{{{\text{tf}}2}}\)). Here, \(g_{n}^{\text{f}}\) is the front gap available to the subject vehicle (\(n)\) after considering the anticipated movement of the leader vehicle on the current lane. In Fig. 1, \(g_{n}^{\text{cf}}\) is the minimum safe distance between the subject vehicle \(n\) and its leader vehicle on the current lane (Option 3), calculated using Eqs. 1 or 2, whereas \(g_{n}^{{{\text{cf}}1}}\) and \(g_{n}^{{{\text{cf}}2}}\) represent the minimum safe distance between the subject vehicle \(n\) and its leader vehicle for Options 1 and 2, respectively; \(g_{n}^{\text{tb}}\) is the total back gap available on the target lane, \(g_{n}^{\text{cb}}\) is the minimum safe gap on the target lane between subject and incoming vehicles (Eq. 3), and \(l_{n}\) is the length of the subject vehicle. According to conventional CA models, Option 2 is better as it offers a larger longitudinal gap (\(g_{n}^{{{\text{tf}}2}}\)) and hence higher speeds compared to Options 1 and 3 (\(g_{n}^{{{\text{tf}}1}}\), \(g_{n}^{\text{f}}\)). 
But, Options 2 and 3 would put the subject vehicle behind 3Ws (\(n + 1\), \(n + 3\)) which have the slowest speeds and relatively higher maximum deceleration rates of the four vehicle types considered in the study. Hence, it can be assumed that the braking distance for 3Ws, which is a function of deceleration rate and speed of vehicle, would be lower than that of LMVs, HMVs and 2Ws. So, a subject vehicle following 3Ws (\(n + 1\), \(n + 3\)) needs to maintain a larger gap (safe distance) as compared to those when following LMV, HMV and 2Ws. Hence, even if \(g_{n}^{{{\text{tf}}2}}\) is larger than \(g_{n}^{{{\text{tf}}1}}\) and \(g_{n}^{\text{f}}\), the effective gap (\(g_{n}^{{{\text{tf}}2}} - g_{n}^{{{\text{cf}}2}}\)) for Option 2 can be smaller than those for Options 1 and 3. Also, if the gap \(g_{n}^{{{\text{tf}}2}}\) is not large enough the subject vehicle may be forced to change lane again and return to its present lane. On the other hand, Option 1 would put the subject vehicle behind car (\(n + 2\)). If the subject vehicle is also a car, Option 1 would allow higher speeds as compared to Options 2 and 3, in spite of Option 2 offering the highest gap. Since most cars travel closer to the median lane (Option 1) [21], Option 1 is the most preferred position for the subject vehicle. Option 1 would bring the subject vehicle closer to a preferred position and Option 2 would take it away from the preferred position. Hence, if the gap \(g_{n}^{{{\text{tf}}2}}\) is not large, Option 1 would appear better than Option 2. To incorporate this phenomenon an additional parameter beta \((\beta )\) is included to improve lane keeping profiles of vehicles. 
In the proposed PP-CA model, the probability of a subject vehicle making a lane change to a target lane decreases with an increase in the speed of the subject vehicle and in the difference between the current position and the preferred position of the subject vehicle, represented by \(\Delta x_{n}\). Together, \(\alpha\) and \(\beta\) represent, respectively, the mandatory and voluntary aspects of lane changes as observed in the field. Figure 2 shows the incentive and safety criteria used to decide if the gap in the target lane (\(g_{n}^{\text{tf}}\)) is large enough to justify a lane change. The decision to change lane is based on the current speeds of the subject and leader vehicles, denoted by \(v_{n}^{t}\) and \(v_{n + 1}^{t}\), respectively, and on the differences of the current and target positions from the preferred position, denoted by \(\Delta x_{n}\) and \(\Delta x_{n}^{t}\), respectively. Hence, the incentive criterion can be divided into two parts:

Fig. 2 Lateral movement criteria

  1. The gap calculated after considering the speed and position on the target lane (left-hand side of the incentive criterion) is larger than that calculated for the current lane (right-hand side of the incentive criterion).

  2. Either the speed of the subject vehicle is zero, or the speed of the leader vehicle is less than the maximum speed of the subject vehicle.

Further, the safety criterion requires the total gap available on the target lane to be larger than the sum of the safe gap between the subject vehicle and the incoming vehicle on the target lane and the length of the subject vehicle.
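The safety criterion can be sketched as a simple predicate. The Python below is only an illustration, not the authors' implementation: the function and argument names are hypothetical, the "total gap" is read as the total back gap \(g_{n}^{\text{tb}}\) defined with Fig. 1, and all three quantities are assumed to be expressed in the same length unit.

```python
def lane_change_is_safe(total_back_gap, safe_back_gap, vehicle_length):
    """Safety criterion sketch: g_tb must exceed g_cb + l_n.

    total_back_gap -- gap available behind the target position (g_tb)
    safe_back_gap  -- minimum safe gap to the incoming vehicle (g_cb)
    vehicle_length -- length of the subject vehicle (l_n)
    """
    return total_back_gap > safe_back_gap + vehicle_length


# Hypothetical values in metres, for illustration only.
print(lane_change_is_safe(20.0, 12.5, 4.0))  # gap is large enough
print(lane_change_is_safe(16.5, 12.5, 4.0))  # gap only equals the requirement
```

The strict inequality follows the wording "larger than" in the text, so a gap that merely equals the required value does not permit the lane change in this sketch.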

Thus, the proposed model is adequate for heterogeneous traffic conditions in developing countries, where different vehicle types with varying maximum speeds and acceleration rates are forced to share lanes. In these conditions, it is common to observe vehicles not initiating a lane change for fear of getting stuck in a non-preferred lane/position and behind slower vehicles. But the model is generally applicable to roads with slower and faster lanes, as vehicles would prefer to stay on the faster lane even if a larger gap is available on the slower lane/position.

2.1.2 Implications of position preference parameter

One of the implications of the position preference parameter \(\left( \beta \right)\) is that it provides an incentive to make a lane change even if the subject vehicle is travelling at the desired speed. A complete lane change manoeuvre may be defined as a vehicle changing lane to overtake the leader vehicle and then returning to the original lane. Thus, a complete lane change manoeuvre involves two lane change processes and at least one leader vehicle. In conventional CA models, however, a lane change is initiated only when following a slower leader vehicle, so two slower leader vehicles would be required to complete the manoeuvre: one to trigger the initial lane change and another to trigger the return. If lane change manoeuvres are not completed in simulation, or if vehicles cannot return to their original lanes after overtaking, the vehicles end up in positions where they are hardly ever observed in the field [21]; for example, HMVs travelling closer to the shoulder lane instead of the median lane. The position preference parameter allows vehicles to drift towards their preferred position on the road, as observed in the data. This improvement significantly changes the outcome of the simulation, as shown in subsequent sections. Also, in CA models, lane change rules can be symmetric or asymmetric with respect to lanes or vehicle type [9]. In symmetric models, both lanes are treated equally or all vehicle types have equal probability of occupying a position on the cell lattice. Asymmetric models are applicable when left/right overtaking is banned or when certain vehicle types are not allowed to occupy a certain position on the road. In our model, the position parameter \(\left( \beta \right)\) allows switching between symmetric and asymmetric rules. At \(\beta = 0\), the model is perfectly symmetric, but as the value of \(\beta\) increases it becomes more and more asymmetric.

2.1.3 Longitudinal movement rules

Based on the evaluation of lane change options and the longitudinal movement rules, the model determines the exact position of a vehicle on the cell matrix. The longitudinal movement rules used in the study are similar to those proposed by Mallikarjuna and Rao [12] for heterogeneous conditions, but with some modifications. In this model, originally based on Knospe’s brake light model [9], the subject vehicle reacts to the ‘brake light status’ of the leader vehicle. If the gap in front is less than the interaction headway, the subject vehicle adjusts itself based on the speed of the leader vehicle. If the gap is more than the interaction headway, the subject vehicle accelerates until it reaches its desired speed. If the gap is less than the safe gap, the subject vehicle decelerates until safe driving conditions are achieved. The acceleration is modified with a probability term \(p_{\text{o}}\) when the vehicle starts from rest and \(p_{\text{dec}}\) when the vehicle slows down. In this study, the longitudinal rules were modified such that the security gap used for calculating the effective gap for determining safe driving conditions is calculated dynamically based on the speeds and maximum deceleration rates of the subject and leader vehicles (Eq. 1). The longitudinal movement rules shown in Fig. 3 are explained below.

Fig. 3 Steps in longitudinal movement procedure

  • Step 1 Value of randomisation parameter determines the probability of deceleration based on headway and speed of the subject vehicle and the brake light status of the leader. If brake light is on (=1) and headway \((t_{n}^{\text{h}} )\) is less than the interaction headway (\(t^{\text{s}}\)), then the probability of deceleration \(p = p_{\text{bl}}\). If the speed of the subject vehicle \((v_{n}^{t} )\) is zero, then the probability of deceleration \(p = p_{\text{o}}\). Otherwise \(p = p_{\text{dec}}\).

  • Step 2 The subject vehicle would accelerate if the braking status of the leader is off (=0) and the effective headway is larger than the interaction headway.

  • Step 3 The subject vehicle would decelerate if the speed obtained from acceleration rule is larger than that for a safe gap.

  • Step 4 The randomisation rule is applied based on the probabilities calculated in Step 1 to capture the stochastic behaviour of drivers in the field assuming that vehicles decelerate randomly.

  • Step 5 Subject vehicle’s position is updated based on the speed obtained from Step 4.

In Fig. 3, \(t_{n}^{\text{h}}\) is the available time headway for the subject vehicle \(n\); the leader vehicle is referred to as \(n + 1\) and the following vehicle as \(n - 1\); \(t^{\text{s}}\) is the interaction headway between subject and leader vehicles; \(v_{n}^{t}\) and \(v_{n}^{t + 1}\) are the speeds of the subject vehicle at time-steps \(t\) and \(t + 1\), respectively; \(v_{n}^{\text{a}}\), \(v_{n}^{\text{b}}\) and \(v_{n}^{t + 1}\) are the updated speeds of the subject vehicle after applying the acceleration, braking and randomisation rules, respectively; \(l_{n}\) is the length of the subject vehicle; \(a_{n}\left(v_{n}^{t}, l_{n}\right)\) is the acceleration, which is a function of the speed and vehicle type of the subject vehicle; similarly, \(d_{n}\left(l_{n}\right)\) is the deceleration rate of the subject vehicle, which is a function of vehicle type; \(v_{n}^{\max}\) is the maximum speed of the subject vehicle; \(x_{n}^{t}\) and \(x_{n}^{t + 1}\) are the longitudinal positions of the subject vehicle at times \(t\) and \(t + 1\), respectively; \(p_{\text{o}}\), \(p_{\text{bl}}\) and \(p_{\text{dec}}\) are the probabilities of the subject vehicle applying the brake randomly under the different conditions mentioned in Step 1 (i.e. determination of the randomisation parameter); \(p_{\text{lc}}\) is the probability of lane change at any time-step. The values of these parameters are presented in Sect. 5.2.
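Step 1 of the procedure is fully specified in the text and can be sketched as a small selection function. The Python below is a minimal illustration under stated assumptions, not the authors' code: all function and argument names are hypothetical, the numeric values are made up for the example, and the form of the random slowdown in `apply_randomisation` (a fixed speed decrement, floored at zero) is an assumption borrowed from the usual brake light CA convention rather than something the text confirms. Steps 2, 3 and 5 are not reproduced because their exact update formulas appear only in Fig. 3.

```python
import random


def randomisation_probability(leader_brake_on, headway, interaction_headway,
                              speed, p_bl, p_o, p_dec):
    """Step 1: pick the probability p used later by the randomisation rule."""
    if leader_brake_on and headway < interaction_headway:
        return p_bl   # leader is braking and within the interaction headway
    if speed == 0:
        return p_o    # subject vehicle is starting from rest
    return p_dec      # all other situations


def apply_randomisation(speed, p, decel_step, rng):
    """Step 4 (assumed form): with probability p, slow down by decel_step."""
    if rng.random() < p:
        return max(speed - decel_step, 0)
    return speed


# Hypothetical values, for illustration only.
rng = random.Random(42)
p = randomisation_probability(True, 1.2, 2.0, 8, p_bl=0.9, p_o=0.5, p_dec=0.1)
new_speed = apply_randomisation(8, p, decel_step=1, rng=rng)
```

Passing the random generator explicitly keeps a simulation run reproducible for a given seed, which helps when comparing simulated and observed interaction rates.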

Safe distance is the minimum gap a vehicle would maintain in order to avoid a collision in case the vehicle in front brakes suddenly. \(b_{n}^{t}\) and \(b_{n + 1}^{t}\) are binary variables denoting the brake light status of the subject and leader vehicles, respectively, at time \(t\) (if equal to 1, the brake light is on).

The authors observed that in staggered driving conditions, the headways can be much lower than those required for a normal deceleration process. This suggests that drivers often keep a minimum gap that reflects the maximum deceleration capabilities of vehicles while following. Also, as the subject vehicle’s speed is limited by the safe gap (braking rule), the erratic deceleration behaviour of conventional CA models is avoided. This increases the scope of the model, as it can now simulate sudden braking of the leader vehicle without causing a collision or unrealistic deceleration of the subject vehicle.

2.1.4 Safe gap calculations-following gap

While applying the longitudinal movement rules of the proposed model, the safe following distance was calculated dynamically instead of adopting a constant value. As discussed in (2) of Sect. 1, the safe distance between vehicles depends on their vehicle types and speeds, and hence adopting a constant value is not very realistic. Therefore, the safe distance between the leader and follower was assumed to be a function of the velocities and deceleration rates of the two vehicles. The safe distance, shown in Eq. 1, was calculated as the difference between the distance travelled during the reaction plus braking time of the subject vehicle and the distance travelled by the leader vehicle during that time. Figure 4 illustrates car-following and explains the basis for determining the minimum safe distance \(g_{n}^{\text{cf}}\). In Fig. 4, the leader vehicle \(n + 1\) applies its brakes at time \(t = 0\); the subject vehicle \(n\) then applies its brakes after a reaction time \(t_{n}^{\text{r}}\). \(T_{n}\) is the time at which the subject vehicle’s speed becomes zero or equal to that of the leader vehicle. In order to avoid a collision at \(t = T_{n}\), the distance covered by the subject vehicle \(n\) between \(t = 0\) and \(t = T_{n}\) should be less than the sum of the gap between the two vehicles at \(t = 0\) and the distance covered by the leader vehicle between \(t = 0\) and \(t = T_{n}\). Hence, we have

$$t_{n}^{\text{r}} \cdot v_{n}^{t} + \frac{(v_{n}^{t})^{2}}{2d_{n}^{\max}} < g_{n} + \frac{(v_{n + 1}^{t})^{2}}{2d_{n + 1}^{\max}},$$

where \(d_{n}^{\max}\) and \(d_{n + 1}^{\max}\) are the maximum deceleration rates of the subject vehicle \(n\) and the leader vehicle \(n + 1\), respectively; \(v_{n}^{t}\) and \(v_{n + 1}^{t}\) are the speeds of the subject and leader vehicles at time \(t\); and \(g_{n}\) is the gap available for the subject vehicle \(n\).

Fig. 4 Safe distance while following and lane change

When \(g_{n} = g_{n}^{\text{cf}}\), the safe distance required by the subject vehicle while following is

$$g_{n}^{\text{cf}} = t_{n}^{\text{r}} \cdot v_{n}^{t} + \frac{(v_{n}^{t})^{2}}{2d_{n}^{\max}} - \frac{(v_{n + 1}^{t})^{2}}{2d_{n + 1}^{\max}}. \tag{1}$$

A negative value of \(g_{n}^{\text{cf}}\) indicates that the distance covered by the leader vehicle is larger than that covered by the subject vehicle, and hence a collision would never happen as the subject vehicle would not be able to reach the leader vehicle. In this case, the minimum safe distance is taken as the one maintained when both vehicles travel an equal distance while braking. Hence, setting the second term of Eq. 1 equal to the third term, the safe distance required by the subject vehicle would be

$$g_{n}^{\text{cf}} = t_{n}^{\text{r}} \cdot v_{n}^{t}. \tag{2}$$
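Equations 1 and 2 can be combined into a single function. The following Python sketch is illustrative only: the function name, argument names and numeric values are hypothetical, and SI units (m, m/s, m/s², s) are assumed. It evaluates Eq. 1 and falls back to the reaction distance of Eq. 2 whenever Eq. 1 yields a negative value, as described above.

```python
def safe_following_gap(v_subject, v_leader, d_max_subject, d_max_leader,
                       reaction_time):
    """Minimum safe following distance g_cf (Eq. 1, with the Eq. 2 fallback)."""
    if d_max_subject <= 0 or d_max_leader <= 0:
        raise ValueError("maximum deceleration rates must be positive")
    reaction_distance = reaction_time * v_subject
    gap = (reaction_distance
           + v_subject ** 2 / (2.0 * d_max_subject)   # subject braking distance
           - v_leader ** 2 / (2.0 * d_max_leader))    # leader braking distance
    # Eq. 2: a negative result means the leader stops farther ahead than the
    # subject can travel, so only the reaction distance is kept.
    return gap if gap >= 0 else reaction_distance


# Hypothetical example: both vehicles at 10 m/s, the leader brakes harder.
print(safe_following_gap(10.0, 10.0, 4.0, 5.0, 1.0))  # 10 + 12.5 - 10 = 12.5
# Hypothetical example: a much faster leader makes Eq. 1 negative.
print(safe_following_gap(5.0, 15.0, 4.0, 4.0, 1.0))   # falls back to 5.0
```

In the first example the leader can stop in a shorter distance than the follower, so the required gap exceeds the reaction distance alone; in the second the leader is much faster, Eq. 1 becomes negative and the reaction distance of Eq. 2 is used.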

2.1.5 Safe gap calculations-back gap

The inadequacy in the lateral movement rule (Sect. 1 (1b)) is addressed by determining the back gap distance dynamically using vehicular deceleration rates and current speeds, as explained below. In the proposed model, it is assumed that while making a lane change, the subject vehicle only looks for a safe stopping distance between itself and the incoming vehicle from the rear on the target lane, which is denoted by \(g_{n}^{\text{cb}}\). Conventional brake light models require this distance to be equal to a factor (\(\alpha\)) multiplied by the speed of the incoming vehicle. They thus ignore the fact that the incoming vehicle would decelerate in the following time-steps upon seeing the subject vehicle entering the lane ahead of it. They also ignore the speed of the subject vehicle attempting the lane change when calculating the safe distance. This leads to a very conservative lane-changing model, especially for India, where lane-changing behaviour is assumed to be much more aggressive. The safe back gap \(g_{n}^{\text{cb}}\), which is the gap between the subject vehicle (attempting the lane change) and the incoming vehicle on the target lane, is calculated considering the distances covered by the two vehicles, as shown in Fig. 4. Here, unlike the minimum following distance \(g_{n}^{\text{cf}}\), where both vehicles decelerate, the incoming vehicle \(n - 1\) decelerates while the subject vehicle \(n\) accelerates or maintains its current speed on the target lane. The subject vehicle now plays the role of the leader, so the leader's braking distance in Eq. 1 is replaced by the total distance covered by the subject vehicle, assuming it maintains its current speed on the target lane. This assumption always gives a safer back gap than the one determined by assuming that the vehicle accelerates on the target lane. Hence, replacing the distance covered by the leader vehicle with the distance covered by the subject vehicle in Eq. 1 results in the following equation:

$$g_{n}^{\text{cb}} = t_{n - 1}^{\text{r}} \cdot v_{n - 1}^{t} + \frac{\left(v_{n - 1}^{t}\right)^{2}}{2d_{n - 1}^{\max}} - \left(\frac{v_{n}^{t}}{d_{n}^{\max}} \cdot v_{n}^{t}\right), \tag{3}$$

where \(t_{n - 1}^{\text{r}}\) is the reaction time of the incoming vehicle, \(d_{n - 1}^{\max}\) is the maximum deceleration rate of the incoming vehicle approaching from behind on the target lane, and \(v_{n}^{t}\) and \(v_{n - 1}^{t}\) are the speeds of the subject and incoming vehicles, respectively, at time \(t\). For negative \(g_{n}^{\text{cb}}\), its value is taken to be the same as in Eq. 2. Note that \(g_{n}^{\text{cf}}\) and \(g_{n}^{\text{cb}}\) are computed from continuous equations and then discretised. As the cell length is 0.5 m, which is quite small compared to other CA models, the accuracy lost during discretisation (rounding to the nearest 0.5 m) is not expected to affect the model performance. The following sections present the data collection effort and the implementation and validation of the PP-CA model.
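The back gap of Eq. 3 and the discretisation to 0.5 m cells can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' implementation: the function names, argument names and numeric values are hypothetical; the fallback for a negative result is assumed to be the reaction distance of the incoming vehicle (Eq. 2 applied with the incoming vehicle as the follower); and ties in the rounding are assumed to round up, which the text does not specify.

```python
import math

CELL_LENGTH_M = 0.5  # cell length stated in the text


def safe_back_gap(v_subject, v_incoming, d_max_subject, d_max_incoming,
                  reaction_time_incoming):
    """Safe back gap g_cb in metres, following Eq. 3 term by term."""
    reaction_distance = reaction_time_incoming * v_incoming
    gap = (reaction_distance
           + v_incoming ** 2 / (2.0 * d_max_incoming)
           - (v_subject / d_max_subject) * v_subject)
    # Assumed fallback for a negative value: keep only the reaction distance.
    return gap if gap >= 0 else reaction_distance


def to_cells(distance_m):
    """Round a distance to the nearest whole number of 0.5 m cells."""
    return int(math.floor(distance_m / CELL_LENGTH_M + 0.5))


# Hypothetical example: subject at 10 m/s, incoming vehicle at 15 m/s.
gap_m = safe_back_gap(10.0, 15.0, 4.0, 5.0, 1.0)  # 15 + 22.5 - 25 = 12.5 m
gap_cells = to_cells(gap_m)                       # 25 cells
```

Keeping the continuous calculation separate from the rounding step mirrors the order described in the text, where both gaps are derived from continuous equations before being mapped onto the cell lattice.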

3 Data collection

In Ludhiana city, Punjab, India, eight arterial roads, namely (1) Chima Intersection–Samrala Intersection, (2) Chima Intersection–Vishwakarma Intersection, (3) Jagraon Bridge–Jalandhar Bypass, (4) Bharatnagar Intersection–Jagraon Bridge, (5) Bharatnagar Chowk–Model Gram, (6) Bhaiwala Chowk–Shastri Nagar, (7) Ludhiana Bypass and (8) Kundan Vidya Mandir Lane, were selected for this traffic survey. These roads were selected because of the availability of vantage points for mounting cameras and the variation in flow among them. A total of 16 h of traffic video was recorded during peak (09:00–10:00) and off-peak hours (12:00–13:00). Pedestrian foot-over bridges were used to mount cameras, as these locations provided a clear view of an 80 m road stretch. The perspective of the road from the camera also suited the image processing software used for data extraction (TRAZER™). A rectangular trap of 60 m × 7 m on the road was delineated at the outset to facilitate software calibration. Vehicles’ trajectories were drawn in TRAZER™, a video image processing software developed by Kritikal Solutions Limited, India (www.kritikalsolutions.com). Due to software limitations, all trajectories were manually marked to ensure accuracy. Figure 5 shows the marked vehicles in TRAZER™. The objectives of the study required accurate speed and gap determination, and hence the accuracy of trajectories was more important than their number. Since each hour of recording contained thousands of vehicles, this was assumed to be enough for determining speeds and gaps statistically. Hence, it was decided to collect 2 h of data on each arterial in the first round and then collect more videos for roads with high variability (if any). A regression towards the mean approach was used to determine the adequate sample size on each road. Collecting more data would have been unnecessary and expensive, as each hour of video requires more than a week to process manually and accurately.
A total of 4,983 vehicle trajectories, each containing the x-y coordinates of a vehicle on every 25th frame, were created to derive traffic flow characteristics. A flow chart explaining the steps for data extraction is presented below.

Fig. 5 Marking of vehicles in TRAZER software

3.1 Flow chart for extracting microscopic characteristics

A MATLAB program was developed to extract traffic characteristics such as individual vehicle speeds and gaps, acceleration/deceleration, density, flow and total exposure between vehicle types. Figure 6 shows a flow chart of the algorithm.

Fig. 6 Flow chart showing traffic flow characteristics extraction algorithm

A detailed outline of the flow chart is presented below.

  1. Input consists of x-y coordinates of vehicles, frame IDs, vehicle IDs and vehicle type.

  2. Determine the total flow as the number of vehicles crossing the survey site per hour.

  3. Determine the traffic composition based on the proportion of different vehicle types in the total flow.

  4. Determine the total road trap area as the product of road width (7 m) and trap length in camera (60 m).

  5. Determine the horizontal projection area for different vehicle types based on their vehicular dimensions.

  6. Create a frame array for each frame ID consisting of the trajectory arrays of all vehicles in that frame.

  7. Create a trajectory array for each vehicle ID by grouping x-y coordinates based on vehicle IDs.

    (a) Create fields within each trajectory array such as ID, vehicle type, frame IDs, x and y coordinates, length and width.

    (b) Discard trajectories with fewer than two x-y coordinates.

    (c) Apply the moving average smoothing technique to each trajectory.

    (d) Calculate speed using the first and the last y-coordinates in the trajectory and the corresponding frame IDs. Each frame is 1 s apart from the previous one.

    (e) Calculate acceleration and deceleration using the speed differential for each trajectory.

  8. Calculate area occupancy as the ratio of the total projection area occupied by all vehicles in a frame array to the total road trap area.

  9. Calculate lateral and longitudinal gaps for vehicles that overlap along length and width, respectively, in a particular frame.

    (a) This is done for all frames in a particular trajectory array.

    (b) The lateral gap on any side (left or right) of a vehicle is equal to the minimum of all gaps maintained by that vehicle to other vehicles on that side.

    (c) For the longitudinal gap, the two vehicles have to be in the same frame, and hence the maximum gap is equal to the trap length.

  10. Determine the interaction rate between different vehicle types by measuring the number and types of vehicles involved in overtaking and car-following based on their speeds and whether or not they have lateral and longitudinal gaps. In this study, the interaction rate between two vehicle types is defined as the number of vehicles (say type A) found to be following or overtaking (say type B) per 1,000 observed vehicles. A vehicle was considered to be following if its path had at least 50% overlap with that of the preceding vehicle along the direction of movement (Fig. 7). This was determined from their x-y coordinates provided by TRAZER. Since the camera could only focus on 60 m of road length in front of it, a vehicle having a longitudinal gap of more than 60 m was not considered to be following. Similarly, a vehicle was considered to be overtaking only if it had a lateral overlap (>50%) with an adjacent vehicle and its speed was more than that of the adjacent vehicle.

    Fig. 7 Lateral and longitudinal overlapping in vehicles

  11. The output was stored in three-dimensional arrays for further processing.
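Steps 7 and 8 of the outline above can be illustrated with a short sketch. It is written in Python rather than the authors' MATLAB, and it is not their program: the record layout `(frame_id, vehicle_id, x, y)`, the function names, the smoothing window of 3 and the assumption that frame IDs are consecutive integers indexing the sampled frames (1 s apart) are all choices made for this illustration.

```python
from collections import defaultdict

FRAME_INTERVAL_S = 1.0   # from the text: sampled frames are 1 s apart
TRAP_LENGTH_M = 60.0     # from the text: trap length in camera view
TRAP_WIDTH_M = 7.0       # from the text: road width


def group_by_vehicle(records):
    """Step 7: group (frame_id, vehicle_id, x, y) records into trajectories.

    Trajectories with fewer than two points are discarded (step 7b).
    """
    tracks = defaultdict(list)
    for frame_id, vehicle_id, x, y in records:
        tracks[vehicle_id].append((frame_id, x, y))
    return {vid: sorted(pts) for vid, pts in tracks.items() if len(pts) >= 2}


def moving_average(values, window=3):
    """Step 7c: centred moving average; the window shrinks at both ends.

    The window size is an assumption; the text does not state it.
    """
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo, hi = max(0, i - half), min(len(values), i + half + 1)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed


def mean_speed(track):
    """Step 7d: speed (m/s) from the first and last y-coordinates."""
    (f0, _, y0), (f1, _, y1) = track[0], track[-1]
    return abs(y1 - y0) / ((f1 - f0) * FRAME_INTERVAL_S)


def area_occupancy(footprints_m2):
    """Step 8: total projected vehicle area in a frame over the trap area."""
    return sum(footprints_m2) / (TRAP_LENGTH_M * TRAP_WIDTH_M)
```

For example, a vehicle whose y-coordinate advances from 0 m to 20 m over two sampled frames has a mean speed of 10 m/s, and vehicles covering 42 m² of the 420 m² trap give an area occupancy of 0.1.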

4 Simulation setup

A MATLAB code was written to simulate a 7-m-wide two-lane road of length 5,000 m with four vehicle types, namely HMVs, LMVs, 3Ws and 2Ws. As mentioned earlier, in CA models, the road is represented by a lattice of uniformly sized cells, and the size of the cells affects the computational time and accuracy of the model. Finer cell sizes result in higher accuracy, as vehicular gaps and dimensions can be represented more accurately, but it is believed that this also increases computational time, as there are more cells in the lattice that need to be processed at every time-step. However, the authors believe that computational time depends more on the density of vehicles on the road, the number of lanes and the length of road to be simulated, whereas cell size has less effect. Hence, it was decided to adopt a size that can accurately represent the smallest vehicular gap and dimension observed in the study. Since, in mixed traffic, the space headway could be as little as 0.5 m during queues in jam conditions, and 2Ws are the smallest vehicles in the study with a maximum width of 0.7 m, a lattice with a periodic boundary condition consisting of 10,000 cells of size 0.5 m × 0.7 m was used for simulation. Open boundaries are usually not preferred, as longer lattices are required for various simulation phases. Further, as the length of the lattice increases, the number of vehicles to be processed at each time-step also increases for a given density. This increases computational time and still does not guarantee a steady state before measurements. All road links were simulated using their traffic composition and densities per kilometre as input. Measurements were taken through a virtual detector of length 60 m in the middle of the lattice to replicate the 60 m camera trap used in the field, and the results were then averaged at different occupancies.
A total of 10 simulation runs were carried out, each simulating 3,600 s at a resolution of 8 time-steps per second. Hence, there were a total of 28,800 (3,600 × 8) time-steps in each simulation run. For each simulation run, the first 800 time-steps were discarded as a warm-up to eliminate the initial noise. The simulated vehicular trajectories were used to obtain characteristics such as individual speeds, stream speeds, gaps, occupancies and the proportion of a vehicle type in car-following or overtaking. Vehicles do not necessarily continue in the same lane, so in this study car-following represents not only vehicles in perfect car-following but also those in staggered car-following [22] with some degree of lateral overlapping (>50%) in the direction of travel, as shown in Fig. 7. Since all eight roads had similar geometry, results can be attributed to traffic flow characteristics.
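The lattice and run dimensions quoted above follow from simple arithmetic, sketched here in Python. The constant names are illustrative. The lateral cell count and the detector start index are derived values that the text does not state explicitly: they assume that the lattice spans the full 7 m width and that the 60 m detector is centred on the lattice.

```python
ROAD_LENGTH_M = 5000.0
ROAD_WIDTH_M = 7.0
CELL_LENGTH_M = 0.5        # smallest space headway observed in jam conditions
CELL_WIDTH_M = 0.7         # maximum width of a 2W
STEPS_PER_SECOND = 8
RUN_DURATION_S = 3600
WARMUP_STEPS = 800
DETECTOR_LENGTH_M = 60.0

# Longitudinal cell count: matches the 10,000 cells quoted in the text.
cells_long = round(ROAD_LENGTH_M / CELL_LENGTH_M)

# Lateral cell count: derived under the full-width assumption.
cells_wide = round(ROAD_WIDTH_M / CELL_WIDTH_M)

# Time-steps per run, and those remaining after the warm-up is discarded.
total_steps = RUN_DURATION_S * STEPS_PER_SECOND
measured_steps = total_steps - WARMUP_STEPS
warmup_seconds = WARMUP_STEPS / STEPS_PER_SECOND

# Virtual detector: 60 m expressed in cells, centred on the lattice (assumed).
detector_cells = round(DETECTOR_LENGTH_M / CELL_LENGTH_M)
detector_start = (cells_long - detector_cells) // 2

print(cells_long, cells_wide, total_steps, measured_steps, warmup_seconds)
```

Under these assumptions the warm-up of 800 time-steps corresponds to the first 100 s of each run, leaving 28,000 time-steps for measurement.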

5 Model validation

5.1 Validation of the model for fundamental diagrams

Simulations were carried out at different area occupancies (ρ a ) to evaluate the proposed model with and without the lane preference rule. Figures 8 and 10 show graphs of flow (q) and stream speed (v), respectively, with and without the position preference rule at different occupancies, with β = 0 and the safe/back gap calculated dynamically. Here, PCUs represent passenger car units. In Fig. 8, the fundamental diagram (q-ρ a plot) shows the expected parabolic shape, with the highest flow near area occupancy values of 0.16 and 0.175 for the cases without and with the position preference rule, respectively. The capacity was found to be higher without the preference rule because vehicles are free to choose any position and thus utilise the road space optimally, leading to the overestimation of flows at a given occupancy level.

Fig. 8 Flow-occupancy diagram obtained using position preference based on modified CA model

Thus, the proposed PP-CA model reproduces more realistic capacities than the conventional CA model. Figure 9 shows the fundamental diagram (q-ρ a plot) of observed and simulated values obtained using the PP-CA model. The simulated values are averaged for area occupancies and simulation runs. The speed-occupancy plot in Fig. 10 shows the common trend, with some noise around an occupancy of 0.1 due to the transition from free flow to the congested state. This is also evident from the q-ρ a plot in Fig. 8. Figures 11 and 12 show plots of flow (q) and stream speed (v) against area occupancy, respectively, for the cars-only scenario for different values of β with the safe/back gap calculated dynamically. It can be seen that as the value of β increases from 0 to 10, the capacity and stream speed on the road decrease. This is because at higher values of β, a vehicle has a much stronger tendency to stick to its preferred position/lane even if the adjacent lanes provide greater longitudinal gaps. Hence, the parameter β partially accounts for the phenomenon of some vehicles not travelling at their desired speeds in free flow conditions. Not much difference is found when the β value increases from 10 to 20. This can also be verified from Figs. 13 and 14, which show that as the value of \(\Delta x_{\text{avg}}\) increases from 0 to 9, the stream speed and capacity also increase. In Figs. 13 and 14, the plots represent the cars-only scenario for different values of \(\Delta x_{\text{avg}}\) and occupancies, with the position preference rule and the safe/back gap calculated dynamically. An increase in the value of β results in a decrease in the value of \(\Delta x_{\text{avg}}\), which represents an increase in the tendency to stick to the vehicle’s preferred lane. This results in a lower vehicular throughput of the traffic stream.
\(\Delta x_{\text{avg}}\) is the average difference between preferred position and current/target position for any vehicle type and is obtained from simulation by averaging \(\Delta x_{n}\) for all vehicles of a particular vehicle type. Note that the difference is calculated in cells. Figures 11, 12 and 13 also show that the parameter β is only sensitive at lower values of area occupancy (i.e. between 0.01 and 0.15).

Fig. 9 Fundamental diagram comparing PP-CA model simulations and observed traffic flows

Fig. 10 Relationship between area occupancy and stream speed obtained using position preference based modified CA

Fig. 11 Relationship between area occupancy and flow for different values of β and only car scenario using position preference based modified CA model

Fig. 12 Relationship between area occupancy and stream speed for different values of β and only car scenario using the PP-CA model

Fig. 13 Relationship between area occupancy and stream speed for different values of Δx (average) and only car scenario using the PP-CA model

Fig. 14 Relationship between area occupancy and flow for different values of Δx (average) and only car scenario using the PP-CA model

5.2 Validation of the model for interaction rate estimation

The aim of this study was to develop a PP-CA model to determine interactions between vehicle types for given traffic flow conditions. Hence, the model was first validated using visual inspection and the fundamental diagram (q-ρ a ) and then by comparing pair-wise simulated and observed interactions between vehicle types. For estimating interactions, whenever a vehicle was found to be following or overtaking another vehicle, it was considered to be interacting with that vehicle. Therefore, a vehicle in the measurement region can have 0, 1, 2 or 3 interactions based on the type of vehicles and the minimum gaps to the sides and front. If one vehicle is followed by another and at the same time overtaken by a third, it would have two interactions. Similarly, if there was only one vehicle in the frame, the number of interactions would be zero. Vehicles travelling side by side (with overlap >50%) and having different speeds were also considered to be involved in overtaking even if the overtaking manoeuvre could not be completed in the video. These interactions were grouped by vehicle type and expressed as the ratio of the number of vehicles interacting to the number observed for that vehicle type. Here, vehicles with a gap of more than 60 m were not assumed to be following. Similarly, the vehicles with longitudinal overlaps were considered as overtaking or overtaken.
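The following and overtaking criteria can be expressed as two pairwise tests on vehicle rectangles. The Python sketch below is one possible reading of those criteria, not the authors' implementation. In particular, it assumes that the overlap share is measured relative to the subject vehicle's own width or length, that x is the lateral coordinate and y the longitudinal one, and that a vehicle counts at most one interaction each to the front, left and right (giving 0 to 3). The `Vehicle` class and all function names are hypothetical.

```python
from dataclasses import dataclass

MAX_FOLLOWING_GAP_M = 60.0   # from the text: camera trap length
MIN_OVERLAP = 0.5            # from the text: 50% overlap threshold


@dataclass
class Vehicle:
    x: float       # lateral position of the left edge (m)
    y: float       # longitudinal position of the rear edge (m)
    width: float   # m
    length: float  # m
    speed: float   # m/s


def overlap_fraction(start_a, size_a, start_b, size_b):
    """Share of interval A that is covered by interval B (0 to 1)."""
    shared = min(start_a + size_a, start_b + size_b) - max(start_a, start_b)
    return max(0.0, shared) / size_a


def is_following(subject, other):
    """Other is ahead within 60 m and the paths overlap by at least 50%."""
    gap = other.y - (subject.y + subject.length)
    path_overlap = overlap_fraction(subject.x, subject.width, other.x, other.width)
    return 0.0 <= gap <= MAX_FOLLOWING_GAP_M and path_overlap >= MIN_OVERLAP


def is_overtaking(subject, other):
    """Vehicles are side by side (>50% overlap along length) and subject is faster."""
    side_overlap = overlap_fraction(subject.y, subject.length, other.y, other.length)
    return side_overlap > MIN_OVERLAP and subject.speed > other.speed


def count_interactions(subject, others):
    """At most one interaction each to the front, left and right (assumed)."""
    front = any(is_following(subject, o) for o in others)
    left = any(is_overtaking(subject, o) and o.x < subject.x for o in others)
    right = any(is_overtaking(subject, o) and o.x >= subject.x for o in others)
    return int(front) + int(left) + int(right)
```

With illustrative dimensions, a car that trails another vehicle by 16 m in nearly the same path while passing a slower two-wheeler on its right would register two interactions under this reading.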

Video-graphic data were collected at eight locations as discussed earlier. Since the objective of this paper was to develop a microscopic model that can simulate interaction rates between different vehicle types, the interaction rates were determined for eight vehicle pairs, as shown in Figs. 15 and 16. Eight locations generated a total of 64 data points. Along with these interaction rates, other traffic characteristics such as traffic composition, area occupancy, vehicle-wise maximum speed and mean lateral position on the road were also measured for calibration of the model. The simulations were carried out at different area occupancies. Table 1 shows the parameters used in the model for simulation. The observed and simulated interaction rates for different vehicle pairs at different locations were compared using paired sample tests. Kleijnen [23] suggested that Student’s t test can be used to verify that the expected values of \(x_{i}\) and \(u_{i}\) are equal. Then, the t-statistic becomes

$$t\text{-value} = \frac{d - \delta}{s_{d} \cdot n_{\text{s}}^{-1/2}},$$
(4)

where \(d\) is the average of the \(n_{\text{s}}\) paired differences between \(x_{i}\) and \(u_{i}\), \(\delta\) is the expected value of \(d\), \(s_{d}\) is the estimated standard deviation of \(d\), and \(n_{\text{s}}\) is the sample size. Since the 64 measured data points were not enough to assume a Gaussian distribution, Wilcoxon’s signed-rank test with continuity correction was used instead of the paired t test. This resulted in a p-value of 0.19. Table 2 shows the results of Wilcoxon’s signed-rank test and Pearson’s correlation coefficient between observed and simulated interaction rates, where V is the sum of ranks of positive differences in the paired data and r is Pearson’s product moment correlation. We can see that the observed and simulated medians are not significantly different.
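Both test statistics mentioned above are straightforward to compute from paired samples. The Python sketch below evaluates the t-value of Eq. (4) and the Wilcoxon statistic V (the sum of ranks of positive differences). It is an illustration under stated assumptions: \(s_{d}\) is taken as the sample standard deviation of the paired differences, zero differences are dropped and tied absolute differences receive average ranks, which is a common convention that the text does not specify. No p-value or continuity correction is computed, and the function names are hypothetical.

```python
import math


def paired_t_value(x, u, delta=0.0):
    """t-value of Eq. (4): (d - delta) / (s_d * n_s ** -0.5)."""
    diffs = [xi - ui for xi, ui in zip(x, u)]
    n_s = len(diffs)
    d = sum(diffs) / n_s
    # Sample standard deviation of the paired differences (assumed estimator).
    s_d = math.sqrt(sum((di - d) ** 2 for di in diffs) / (n_s - 1))
    return (d - delta) / (s_d / math.sqrt(n_s))


def wilcoxon_v(x, u):
    """Sum of the ranks of the positive paired differences."""
    diffs = [xi - ui for xi, ui in zip(x, u) if xi != ui]  # drop zero differences
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        # Find the run of tied absolute differences starting at position i.
        j = i
        while j + 1 < len(order) and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return sum(rank for rank, diff in zip(ranks, diffs) if diff > 0)
```

As a small made-up example, the pairs (10, 8), (12, 13), (9, 9) and (15, 11) have non-zero differences 2, -1 and 4 with ranks 2, 1 and 3, so V is 5.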

Fig. 15 Simulated interaction rates between LMV and different vehicle types for different values of area occupancies

Fig. 16 Simulated interaction rates between HMV and different vehicle types for different values of area occupancies

Table 1 Values of the parameters used in simulation
Table 2 Results of test of difference in median lanes and Pearson’s correlation coefficient between observed and simulated interaction rates

In Table 1, the values of acceleration, p o, p dec, p bl, p lc and interaction headway are adopted from a previous study [24], whereas the values of the other parameters were observed by the authors. Then, validation was carried out by checking for a positive correlation between the average simulated interaction rate \(\left( {I_{i}^{\text{s}} } \right)\) and the expected value of the observed interaction rate \(\left( {I_{i}^{\text{o}} } \right)\). As suggested by Kleijnen, it is important that the simulated mean increases if the observed mean increases on any road. Hence, Pearson’s correlation coefficient was calculated, assuming that in a perfect model the relationship between simulated and observed interaction rates would be linear, and was found to be 0.75.

This indicates a medium to high correlation between the observed and simulated means of interaction rates. This means that as the observed interaction rate increases for any vehicle pair and traffic condition, the simulated interaction rate also increases.
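For reference, Pearson's product moment correlation between paired observed and simulated rates can be computed as in the Python sketch below. The function name and the sample values in the usage note are illustrative and are not data from this study.

```python
import math


def pearson_r(observed, simulated):
    """Pearson's product moment correlation between two paired samples."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_s = sum(simulated) / n
    covariance = sum((o - mean_o) * (s - mean_s) for o, s in zip(observed, simulated))
    spread_o = sum((o - mean_o) ** 2 for o in observed)
    spread_s = sum((s - mean_s) ** 2 for s in simulated)
    return covariance / math.sqrt(spread_o * spread_s)
```

A perfectly linear increasing relationship, such as `pearson_r([1, 2, 3], [2, 4, 6])`, gives a value of 1, and a perfectly linear decreasing one gives -1.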

6 Vehicular interaction rate simulation

This paper further explores the effect of the area occupancy (ρ a ) of the road on the amount of interaction between heavy and light vehicle types. Simulations were carried out at different area occupancies, keeping the vehicular proportion equal for the different vehicle types, and interaction rates were measured for LMVs and other vehicle types. Similarly, interaction rates were also measured between HMVs and other vehicle types. It was found that for all vehicle pairs the interaction rate had a parabolic relationship with the area occupancy, as shown in Figs. 15 and 16. Here the interaction rate between two vehicle types was measured as the number of vehicles of type A found to be interacting (following or overtaking) with type B for every 1,000 simulated vehicles of type A found in the measurement area. For example, the interaction rate for LMV–HMV would mean the number of LMVs found following or overtaking HMVs out of 1,000 LMVs observed in the measurement area.
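The rate definition above amounts to a normalised count per vehicle pair. The Python sketch below shows one way to tabulate it. The input layout is an assumption made for illustration: `observed_types` lists the type of every vehicle seen in the measurement area, and each `(type_a, type_b)` tuple in `interactions` stands for one type A vehicle found following or overtaking a type B vehicle.

```python
from collections import Counter


def interaction_rates(observed_types, interactions, per=1000):
    """Interacting vehicles of type A with type B per `per` vehicles of type A."""
    observed = Counter(observed_types)
    interacting = Counter(interactions)
    return {
        pair: per * count / observed[pair[0]]
        for pair, count in interacting.items()
        if observed[pair[0]] > 0
    }


# Illustrative input only: 4 LMVs and 2 HMVs observed, 3 interacting LMVs.
rates = interaction_rates(
    ["LMV"] * 4 + ["HMV"] * 2,
    [("LMV", "HMV"), ("LMV", "LMV"), ("LMV", "HMV")],
)
print(rates)
```

In this made-up case, two of the four LMVs interact with HMVs, so the LMV–HMV rate is 500 per 1,000 LMVs.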

From Figs. 15 and 16, one can see that the interaction rates increase rapidly with area occupancy in the beginning and then decrease after a certain point. This is plausible, as at lower occupancies vehicles engage in both car-following and overtaking, while at higher occupancies the number of overtaking instances reduces. The number of car-following instances was not linearly related to area occupancy. It can be assumed that car-following increases in the beginning owing to the increase in area occupancy on the road. But as a particular vehicle can be followed by at most two vehicles at a time in no-lane-discipline conditions, the increase in car-following would not be as significant at higher densities. This is not the case with overtaking manoeuvres, as the possibility of overtaking would cease to exist at very high density. It was found that LMVs had higher interaction rates with LMVs and HMVs because they share the same lane, but had very low interaction with 3Ws, as 3Ws travel farthest from the median lane. It was also found that 3Ws had the lowest interaction rate with heavier vehicles such as HMVs and LMVs. This could be one of the reasons for the low fatality rate observed among 3Ws. HMV–HMV interaction rates were not analysed, as few HMV–HMV interactions were found in the field data and hence they could not be validated.

7 Conclusions

This paper attempts to extend the brake light model to heterogeneous driving behaviour by proposing more realistic longitudinal and lateral movement rules. It considers the effect of the position preference of different vehicle types and proposes a modified CA model, the position preference based CA (PP-CA) model. This model also attempts to replicate the driving pattern observed on urban arterials in Ludhiana. The average lateral positions obtained by the proposed model were more consistent with the observed values than those obtained by the Ma–Ra model. The new position preference parameter (β) was found to have a significant effect on flow and stream speed. This paper also demonstrates the use of the interaction rate between vehicle pairs as an alternative method to validate microscopic traffic flow models, which have otherwise been validated through fundamental diagrams or individual vehicle trajectories. The interaction rate between vehicles plays a significant role in determining the crash propensity on a road. Vehicles that share the same lane have relatively higher interactions and hence higher crash propensity than those that are segregated by barriers, medians or dividers. The results of the simulation showed that there was a significant relationship between area occupancy, vehicular proportion and the interaction rate between vehicle types. The higher interaction rates between the vehicle pairs HMV-2W and LMV-2W may be a cause of the higher number of fatal crashes between these two pairs, as commonly observed in developing countries. This study thus provides fresh impetus to risk analysis modelling using microscopic traffic flow models.