1 Introduction

The ongoing development of autonomous driving is a promising field of research. Once autonomous vehicles are admitted onto public roads, several benefits can be expected. They have the potential to reduce the number and severity of traffic accidents. Additionally, they would give people who are unable to drive themselves access to individual mobility. However, several aspects of autonomous driving currently prevent its introduction into real-world traffic. Among them is driving through inner city traffic, especially at unsignalized intersections. This intersection type is common in Germany in areas with low or medium traffic density. At these intersections, the right-before-left rule applies: a driver has to yield to a driver approaching on the next street to the right and has priority over a driver approaching from the next street to the left. In addition, a driver turning left has to yield to oncoming traffic. However, this rule does not define a driving order in all possible scenarios. Situations can occur in which each driver has to yield to at least one other driver, which creates a deadlock at the intersection. For this case, the German traffic regulations, for example, merely state that a driver may only proceed before someone who has priority after the drivers have communicated, and thus cooperated, with each other [1]. This is problematic for an autonomous vehicle (A-V), as it has to interpret human behavior, make a decision based on potentially unreliable predictions, and still drive safely and in a way that is acceptable to both its passengers and its human interaction partners.

In this work we focus on two aspects of driving through unsignalized inner city intersections. The first aspect is how intersections influence driving behavior [42]. To capture this, we describe an intersection by its intersection complexity, which we define based on features that describe the intersection. These include both the static environment (e.g. visibility or street width) and the dynamic environment, i.e. the traffic at the intersection. Driving behavior is described by features obtained from the driven trajectory. We then predict the behavior features using the intersection features as inputs. This analysis is based on data from a field study in real-world traffic. The study, the intersection and behavior features, and the prediction are described in detail in Sect. 3. The second aspect of this work is decision making at unsignalized intersections [43]. We present a decision-making algorithm based on a discrete event system (DES) that drives according to the traffic regulations. It can also cope with unclear situations such as deadlocks or a vehicle yielding despite not having to. The strategy for resolving these situations is based on the findings of [20], who found that human drivers prefer not to drive first in demanding situations such as a deadlock at a T-junction. Our approach does not require any explicit communication between the vehicles; the decisions are based only on the observable state of the cooperation vehicles, i.e. their position, velocity and acceleration. This is in line with findings from the literature that human drivers rely on implicit communication when approaching such scenarios [19]. The algorithm, together with a detailed validation, is presented in Sect. 4.

2 Related Work

Aspects of this work have been covered in the literature before. We first present publications relevant to the behavior analysis described in Sect. 3 and then those relevant to the behavior generation in Sect. 4. The first aspect of this work is the influence of intersection complexity on driving behavior. Previous publications have used features describing the environment of a driving task to define complexity. Reference [9] assumes inner city scenarios to be the most complex and highway driving to be the least complex. The type of scenario can also be used to discriminate between complexity levels: [20] found a T-junction to be more complex than a symmetrical narrow passage. Further features that have been used include the distinction between signalized and unsignalized intersections [24], the presence of vehicles parked at the side of the road [8], and whether a driver drove straight through an intersection or turned right or left [12]. Reference [45] uses satellite images and classifies intersections as complex if they have at least one street with multiple lanes, traffic islands, slip lanes, or more than four roads leading into the intersection. Another possible feature is visual clutter [14]. All these features describe the stationary surroundings. However, one can also consider the dynamic environment, i.e. the traffic, to describe the complexity of a situation. Reference [31] defines high complexity as situations that place high demands on both information processing and vehicle control, and low complexity as situations with low demands in both categories. Medium complexity is assigned to scenarios with high demand in one category and low demand in the other. Reference [21] uses the same definition but omits the medium class. Traffic density [28, 39, 44] can be considered for complexity, as can the occurrence of lane changes [39] or driving after congestion compared to regular driving [23].
Further aspects of traffic and of the environment of an intersection have also been studied. Reference [44] included the number of vehicles from the left and the presence of a zebra crossing in their work. Reference [30] defines complexity by the degree of urbanization, the presence of oncoming and leading traffic, and the street geometry (straight road, tight corner, soft corner). Reference [4] considers a straight road as less complex than an intersection at which a stop is required or than an overtaking maneuver. Reference [15] defines complexity by the number of advertisement signs, buildings, oncoming vehicles and further infrastructure while driving on a highway.

The second aspect of this work, decision making in the context of autonomous driving, has also been the focus of many authors. A common method for decision making at intersections and in other traffic scenarios is the partially observable Markov decision process (POMDP). Reference [26] uses a POMDP for decision making at intersections and roundabouts. Reference [18] uses a POMDP for real-time decision making in which other vehicles are treated as hidden variables, so that the driving behavior can be adapted to the most likely behavior of the other drivers. Reference [38] applies a POMDP to decision making when turning left at an intersection. The authors define several critical turning points from which a turn can be executed and select the most efficient one. Additionally, limited visibility caused by both static and dynamic objects can be considered. A possible solution to that problem is to add virtual vehicles at the edge of the occluded space [25]. Reference [2] uses POMDPs for decision making at intersections and pedestrian crossings with limited visibility. Besides POMDPs, further methods for decision making have been employed. Reference [37] uses a mixed observability Markov decision process to predict the intention of cooperation partners and bases the decision on that prediction. Reference [29] presents a framework that combines prediction, threat detection and decision making: a Bayesian network classifies the threat levels of other vehicles, and the decision is based on these levels. A decision can also be made by evaluating possible behavior policies and selecting the optimal one [5, 11]. Reference [6] selects the trajectory of an autonomous vehicle from a list of reference trajectories recorded from human drivers interacting with an additional vehicle. Finally, one can use a game-theoretic approach by considering a game between the ego vehicle and the first oncoming vehicle [36].

All these works have in common that they do not rely on explicit communication between vehicles. Instead they rely on the vehicles’ states that are observable by onboard sensors. Alternatively, decision making at intersections can also be designed to use explicit communication between the vehicles themselves or between the vehicles and a centralized coordination mechanism. Reference [27] presents an algorithm for coordination of autonomous vehicles at an intersection using model predictive control. This decentralized approach requires all vehicles to use the same algorithm and to share their current state. Reference [34] presents a centralized coordination algorithm for autonomous vehicles at unsignalized intersections. The vehicles are assigned arrival times and the problem is formulated as an absolute value problem. Reference [10] determines the driving order by centralized coordination using a mixed-integer linear problem. All vehicles transmit their state and receive their allotted time to pass the intersection. They regulate their velocity accordingly. Versions for mixed traffic and traffic lights are also suggested.

Certain aspects of inner city traffic have been modeled as DES before using Petri nets (PN). Reference [41] uses PNs to model both the traffic light control and the traffic flow at an intersection with traffic lights. A PN can also model the traffic light control at several connected intersections, with the largest intersection acting as the master control [16]. PN-based traffic light control can further be used to give arriving emergency vehicles a green light at intersections [17]. Reference [7] models a city environment consisting of intersections with traffic lights and connecting streets using deterministic time-based PNs. Reference [33] controls intersections with traffic lights using deterministic and stochastic PNs; the model is adapted in case of incidents that would otherwise cause neighboring intersections to be blocked.

In this work, we do not rely on explicit communication with the cooperation vehicles; instead, the decision making is based only on their observable state. We consider this to be more realistic, especially in the short term, as we cannot expect every vehicle to be equipped with such communication interfaces anytime soon. We further use a DES, as its decisions are easily explainable and are made using only basic operations.

3 Intersection Complexity for Behavior Prediction

In order to drive autonomously through unsignalized inner city intersections, it is helpful to understand why human drivers drive the way they do. This is important for two reasons. First, autonomous vehicles will have to interact with human drivers for the foreseeable future. An understanding of human driving behavior might make these interactions safer and more efficient, as it might enable autonomous vehicles to predict the driving behavior of their interaction partners more reliably. Second, such systems can be made to behave similarly to human drivers, which could improve their acceptance. The evaluation in this section is based on a field study conducted in the inner city of Karlsruhe, Germany [42]. In that study, 34 participants drove along a predefined course on which they encountered several unsignalized intersections. At one of these intersections, they were confronted with instructed drivers who created a deadlock situation. Since this work investigates the interaction with regular traffic, the runs through this intersection are excluded. In total, the data set includes 1818 runs through 13 unsignalized T-intersections and 565 runs through 4 unsignalized X-intersections. Four of the remaining T-intersections were specifically selected so that the data set contains intersections with both high and low traffic density as well as intersections with buildings close to and far from the street. The other intersections are included because they lie along the route between the selected intersections. The test vehicle was equipped with a 16-channel lidar, an inertial measurement unit (IMU) and two global navigation satellite system (GNSS) receivers. The data was recorded using the Robot Operating System (ROS) [35], and both the driven trajectory and the transformation of the point clouds into a global reference frame were generated using a simultaneous localization and mapping (SLAM) approach [13].
We then generated our data set by extracting the runs through the intersections included in the analysis. A run contains only those parts of the trajectory that lie within a 35 m radius of the intersection center. Vehicles and pedestrians are detected in the point clouds and their trajectories are tracked. The work described in this section has been presented in more detail before [42].
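
The cropping of a recorded trajectory into runs can be sketched as follows. This is a minimal sketch, assuming the trajectory is a time-ordered list of 2-D positions in a global frame and that every separate pass through the 35 m circle forms its own run; the function name `extract_runs` is hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]  # (x, y) in a global frame [m]


def extract_runs(trajectory: List[Point], center: Point,
                 radius: float = 35.0) -> List[List[Point]]:
    """Split a time-ordered trajectory into runs: maximal consecutive
    stretches of points within `radius` metres of the intersection center."""
    runs: List[List[Point]] = []
    current: List[Point] = []
    for x, y in trajectory:
        if math.hypot(x - center[0], y - center[1]) <= radius:
            current.append((x, y))
        elif current:  # leaving the circle closes the current run
            runs.append(current)
            current = []
    if current:  # trajectory ended inside the circle
        runs.append(current)
    return runs
```

A straight drive through the center sampled every 10 m yields one run of seven points (from -30 m to 30 m).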

3.1 Intersection and Behavior Features

From the recorded and preprocessed data we then extract several features to describe both the intersection itself and its surroundings. As we additionally need a way to describe the driving behavior of the participants, behavior features are calculated from the driven trajectories as well. The intersection features include features describing properties of the driven path, the intersection itself and features about the traffic at the intersection the participant had to interact with. The set of all features can be seen as the complexity of an intersection.

The driven path is described by the entry position and the turning direction. For the entry position \(p_{\textrm{e}}\) the T-intersection is rotated such that it resembles the letter “T”. The entry position can then either take the value left, bottom or right. The entry position is not considered in case of the X-intersections because of their symmetry. The turning direction \(p_{\textrm{t}}\) takes one of the values left, straight or right. At T-intersections not all turning directions are possible depending on the entry position.
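
Which pairs of entry position and turning direction can occur at a T-intersection follows from the "T" orientation above: from the bottom (the stem) there is no straight road, from the left the stem lies to the driver's right, and from the right it lies to the driver's left. A minimal Python sketch enumerating the feasible pairs (the names `is_feasible` and `FEASIBLE` are hypothetical) shows that six of the nine combinations remain.

```python
from itertools import product

ENTRY_POSITIONS = ("left", "bottom", "right")
TURNING_DIRECTIONS = ("left", "straight", "right")


def is_feasible(entry: str, turn: str) -> bool:
    """True if `turn` is possible when entering a T-intersection drawn as the
    letter "T" (through road on top, stem at the bottom) from `entry`."""
    if entry == "bottom":
        return turn in ("left", "right")      # no road opposite the stem
    if entry == "left":
        return turn in ("straight", "right")  # stem lies to the driver's right
    if entry == "right":
        return turn in ("straight", "left")   # stem lies to the driver's left
    raise ValueError(f"unknown entry position: {entry}")


FEASIBLE = [(e, t) for e, t in product(ENTRY_POSITIONS, TURNING_DIRECTIONS)
            if is_feasible(e, t)]
```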

Further, we define features that describe the traffic at the intersection the participants had to interact with. For that, we use the number of pedestrians \(n_{\textrm{p}}\) and the number of vehicles \(n_{\textrm{v}}\) as features. Both pedestrians and vehicles are counted if they are detected in the point clouds during the approach to the intersection. Please refer to [42] for further details on the detection and tracking. The visible vehicles are further subdivided: the number of interaction vehicles \(n_{\textrm{vi}}\) counts those vehicles that are within 10 m of the intersection center at the same time as the test vehicle and whose observed track passes the intersection center. The interaction vehicles are further classified by whether they have the right of way over the test vehicle or have to give way; the numbers of vehicles fulfilling these conditions are denoted \(n_{\textrm{rw}}\) and \(n_{\textrm{gw}}\), respectively.
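
The two conditions for an interaction vehicle can be sketched in Python as below. This is a minimal sketch under several assumptions: tracks and the test vehicle share synchronized timestamps, the priority relation of each tracked vehicle to the test vehicle is already known, and "the track passes the intersection center" is approximated by the track coming within a hypothetical tolerance `pass_tol` of the center. All names are illustrative.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Tuple

Position = Tuple[float, float]  # (x, y) [m]


@dataclass
class Track:
    positions: Dict[float, Position]  # timestamp [s] -> position
    priority: str  # "right_of_way", "give_way" or "none" w.r.t. the test vehicle (assumed given)


def _dist(p: Position, c: Position) -> float:
    return math.hypot(p[0] - c[0], p[1] - c[1])


def is_interaction_vehicle(track: Track, ego: Dict[float, Position], center: Position,
                           radius: float = 10.0, pass_tol: float = 3.0) -> bool:
    # (a) vehicle and test vehicle are within `radius` of the center at the same timestamp
    simultaneous = any(
        t in track.positions
        and _dist(track.positions[t], center) <= radius
        and _dist(p_ego, center) <= radius
        for t, p_ego in ego.items()
    )
    # (b) the track passes the center, approximated by the hypothetical tolerance `pass_tol`
    passes_center = any(_dist(p, center) <= pass_tol for p in track.positions.values())
    return simultaneous and passes_center


def traffic_features(tracks: List[Track], ego: Dict[float, Position],
                     center: Position) -> Tuple[int, int, int]:
    """Return (n_vi, n_rw, n_gw) for one run."""
    interacting = [tr for tr in tracks if is_interaction_vehicle(tr, ego, center)]
    n_rw = sum(tr.priority == "right_of_way" for tr in interacting)
    n_gw = sum(tr.priority == "give_way" for tr in interacting)
    return len(interacting), n_rw, n_gw
```

Using exact timestamp matches keeps the sketch short; with real sensor data one would rather interpolate the tracks onto a common time base.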

The final class of intersection features describes the static environment at the intersection. It includes the number of trees \(n_{\textrm{t}}\) near the intersection and along the road on which a participant enters the intersection. To assess the occlusion of an intersection during the approach, we include visibility distances. These are the distances at which reference points on the streets to the left and right of the participant's entry street become visible for the first time. The reference points are placed on the center line of the streets at a distance of

$$\begin{aligned} d_{\textrm{ref}} = v_{\textrm{max}} \, t_{\textrm{r}} + \frac{v_{\textrm{max}}^2}{2|a_\textrm{b}|} \end{aligned}$$
(1)

from the intersection center. This is the distance needed to stop when driving at the speed limit of \(v_{\textrm{max}} = 30\,\mathrm{km\,h^{-1}}\). With a reaction time of \(t_{\textrm{r}} = 1\,\mathrm{s}\) and a braking deceleration of \(a_{\textrm{b}} = 6\,\mathrm{m\,s^{-2}}\), the reference points lie at a distance of \(d_{\textrm{ref}} = 14.12\,\mathrm{m}\). We use two variants to calculate the visibility distance: one based on the point clouds and one based on object polygons. For the point cloud variant, we merge the current point cloud with the two point clouds before and after it into the merged point cloud \(\vec{P}(d)\), which represents the surroundings at distance d from the intersection center. To obtain d, the current trajectory point is projected onto the center line of the current lane, and the distance is measured along the lane center. Within \(\vec{P}(d)\), cylinders \(C_{\textrm{s},i}\) with a radius of 0.6 m are placed between the current location and each reference point i. If at least one point of \(\vec{P}(d)\) lies within \(C_{\textrm{s},i}\), reference point i is considered not visible at distance d. The visibility distance \(d_{\textrm{v,c},i}\) of each reference point is then the distance at which the reference point is visible for the first time. Alternatively, we use polygons of the buildings and tree trunks around the intersection to determine the visibility distance. For that, we draw a sight line between the current location and each reference point. If this line does not intersect any polygon, the reference point is visible. Again, the first distance d for which this holds determines the visibility distance \(d_{\textrm{v,p},i}\) of a reference point. The visibility distance of an intersection is the minimum visibility distance over all its reference points: \(d_{\textrm{v},\cdot } = \min _{i} \left( d_{\textrm{v},\cdot ,i}\right) \).
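
Eq. (1) and the two sight tests can be sketched in Python as follows. This is a minimal sketch, not the original implementation: the approach is assumed to be given as `(d, position)` pairs in temporal order (decreasing d), positions and polygons are 2-D, the eye and the reference points are assumed not to lie inside any polygon, and an intersection whose reference point never becomes visible is assigned no visibility distance (`None`), which is our own convention. All function names are hypothetical.

```python
import math
from typing import Optional, Sequence, Tuple

Point2 = Tuple[float, float]
Point3 = Tuple[float, float, float]


def reference_distance(v_max_kmh: float = 30.0, t_r: float = 1.0, a_b: float = 6.0) -> float:
    """Eq. (1): reaction distance plus braking distance at the speed limit."""
    v = v_max_kmh / 3.6  # km/h -> m/s
    return v * t_r + v ** 2 / (2.0 * abs(a_b))


def _cross(o: Point2, a: Point2, b: Point2) -> float:
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def _on_segment(p: Point2, q: Point2, r: Point2) -> bool:
    # r is collinear with p-q: check that it lies within the segment's bounding box
    return (min(p[0], q[0]) <= r[0] <= max(p[0], q[0])
            and min(p[1], q[1]) <= r[1] <= max(p[1], q[1]))


def segments_intersect(p1: Point2, p2: Point2, q1: Point2, q2: Point2) -> bool:
    d1, d2 = _cross(q1, q2, p1), _cross(q1, q2, p2)
    d3, d4 = _cross(p1, p2, q1), _cross(p1, p2, q2)
    if d1 * d2 < 0 and d3 * d4 < 0:
        return True  # proper crossing
    return ((d1 == 0 and _on_segment(q1, q2, p1)) or (d2 == 0 and _on_segment(q1, q2, p2))
            or (d3 == 0 and _on_segment(p1, p2, q1)) or (d4 == 0 and _on_segment(p1, p2, q2)))


def polygon_sight_clear(eye: Point2, target: Point2, polygons: Sequence[Sequence[Point2]]) -> bool:
    """Polygon variant: the sight line must not cross any polygon edge."""
    for poly in polygons:
        for k in range(len(poly)):
            if segments_intersect(eye, target, poly[k], poly[(k + 1) % len(poly)]):
                return False
    return True


def cylinder_sight_clear(eye: Point3, target: Point3, points: Sequence[Point3],
                         radius: float = 0.6) -> bool:
    """Point cloud variant: no point may lie inside the cylinder from eye to target."""
    axis = [t - e for t, e in zip(target, eye)]
    length_sq = sum(a * a for a in axis)
    if length_sq == 0.0:
        return True
    for p in points:
        s = sum((pi - e) * a for pi, e, a in zip(p, eye, axis)) / length_sq  # position on axis
        if 0.0 <= s <= 1.0:
            closest = [e + s * a for e, a in zip(eye, axis)]
            if math.dist(p, closest) <= radius:
                return False
    return True


def visibility_distance(approach: Sequence[Tuple[float, Point2]], ref: Point2,
                        polygons: Sequence[Sequence[Point2]]) -> Optional[float]:
    """First distance d (in temporal order) at which `ref` is visible, else None."""
    for d, pos in approach:
        if polygon_sight_clear(pos, ref, polygons):
            return d
    return None


def intersection_visibility(approach, refs, polygons) -> Optional[float]:
    dists = [visibility_distance(approach, r, polygons) for r in refs]
    if any(d is None for d in dists):
        return None  # our convention: undefined if a reference point is never seen
    return min(dists)
```

With the default parameters, `reference_distance()` returns about 14.12 m, matching the value above.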
To capture both the actual and the perceived narrowness of the road leading into the intersection, we define three widths that are calculated along the normal at each point of the trajectory. The street width \(w_{\textrm{s}}(d)\) is the distance between the points where the normal at distance d intersects the street curbs; it is calculated from the map of the intersection. For the visual range \(w_{\textrm{v}}(d)\), the point clouds are analyzed. It describes how far a driver can see to the left and right and is intended to model the perceived narrowness of the street. For each trajectory position, the lidar data is evaluated along the normal at sensor height: the first point within \(\pm 5^{\circ}\) in the vertical direction and \(\pm 10^{\circ}\) in the horizontal direction determines the visual range. This is performed both to the left and to the right of the trajectory. The available width \(w_{\textrm{a}}(d)\) combines the previous two widths and describes the space on the street that is available to drive on. At each trajectory point, the smaller of the street width \(w_{\textrm{s}}(d)\) and the visual range \(w_{\textrm{v}}(d)\) determines the available width. For the available width, however, the visual range calculation is adapted to include all points within \(\pm 15^{\circ}\) in the vertical direction. All three widths are averaged over the approach interval from 25 m to 7 m before the intersection center. A more detailed introduction to the features discussed here can be found in [42].
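
The width features can be sketched in Python as below. This is a minimal sketch under explicit assumptions: lidar points are given in a sensor frame (x forward, y left, z up, origin at sensor height), the visual range is the sum of the left and right ranges, the range to a point is its horizontal distance, and the hypothetical cap `max_range` is used when no point falls into the window. Function names are illustrative.

```python
import math
from statistics import mean
from typing import Dict, Sequence, Tuple

Point3 = Tuple[float, float, float]  # sensor frame: x forward, y left, z up [m]


def side_range(points: Sequence[Point3], side: str, vert_tol_deg: float,
               horiz_tol_deg: float = 10.0, max_range: float = 50.0) -> float:
    """Horizontal distance to the nearest point in a window around the left/right normal."""
    target_az = 90.0 if side == "left" else -90.0
    best = max_range  # hypothetical cap if no point is found
    for x, y, z in points:
        horiz = math.hypot(x, y)
        if horiz == 0.0:
            continue
        azimuth = math.degrees(math.atan2(y, x))
        elevation = math.degrees(math.atan2(z, horiz))  # relative to sensor height
        if abs(azimuth - target_az) <= horiz_tol_deg and abs(elevation) <= vert_tol_deg:
            best = min(best, horiz)
    return best


def visual_range(points: Sequence[Point3], vert_tol_deg: float = 5.0) -> float:
    return side_range(points, "left", vert_tol_deg) + side_range(points, "right", vert_tol_deg)


def available_width(street_width: float, points: Sequence[Point3]) -> float:
    # the visual range is recomputed with the wider +-15 deg vertical window
    return min(street_width, visual_range(points, vert_tol_deg=15.0))


def approach_mean(width_by_d: Dict[float, float], d_near: float = 7.0,
                  d_far: float = 25.0) -> float:
    """Average a width over the approach interval d_near <= d <= d_far."""
    return mean(w for d, w in width_by_d.items() if d_near <= d <= d_far)
```

The wider vertical window lets low obstacles such as parked cars, which the ±5° window may miss, narrow the available width.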

To describe the driving behavior at the intersections, we define three features based on the driven trajectory: the commit distance, the velocity drop and the minimum velocity. The commit distance is the distance from the intersection center at which, given the current velocity, stopping before the intersection center is no longer possible:

$$\begin{aligned} d_{\textrm{c}}= \max _{d} \left( d < v(d) \, t_{\textrm{r}} + \frac{v(d)^2}{2|a_\textrm{b}|}\right) \,. \end{aligned}$$
(2)

The commit distance can be interpreted as a measure of the distance at which the final decision to drive is made: the further from the intersection, the more offensive the driving behavior. The minimum velocity is the lowest velocity the driver reached during the approach interval from \(d_{\textrm{s}} = 25\,\mathrm{m}\) to \(d_{\textrm{e}} = 0\,\mathrm{m}\) distance to the intersection center:

$$\begin{aligned} v_{\textrm{min}}= \min (v(d)), \qquad d_{\textrm{s}} > d > d_{\textrm{e}}. \end{aligned}$$
(3)

The final behavior feature is the velocity drop. It is the ratio of the minimum velocity during the approach \(v_{\textrm{min}}\) to the mean initial approach velocity \(v_{\textrm{a}}\) in the interval from 25 m to 20 m:

$$\begin{aligned} v_{\textrm{d}}= \frac{v_{\textrm{min}}}{v_{\textrm{a}}}\,. \end{aligned}$$
(4)
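
The three behavior features of Eqs. (2) to (4) can be computed from a sampled approach as sketched below. This is a minimal sketch, assuming the trajectory is available as `(d, v)` samples (distance to the intersection center in m, velocity in m/s) and assuming that Eq. (2) uses the same \(t_{\textrm{r}}\) and \(a_{\textrm{b}}\) as Eq. (1); the function names are hypothetical.

```python
from statistics import mean
from typing import Optional, Sequence, Tuple


def stopping_distance(v: float, t_r: float = 1.0, a_b: float = 6.0) -> float:
    return v * t_r + v ** 2 / (2.0 * abs(a_b))


def behavior_features(samples: Sequence[Tuple[float, float]], t_r: float = 1.0,
                      a_b: float = 6.0) -> Tuple[Optional[float], float, float]:
    """Return (d_c, v_min, v_d) for samples of (distance d [m], velocity v [m/s])."""
    # Eq. (2): largest distance at which stopping before the center is no longer possible
    committed = [d for d, v in samples if d < stopping_distance(v, t_r, a_b)]
    d_c = max(committed) if committed else None
    # Eq. (3): minimum velocity on the open interval 25 m > d > 0 m
    v_min = min(v for d, v in samples if 25.0 > d > 0.0)
    # Eq. (4): ratio to the mean initial approach velocity between 25 m and 20 m
    v_a = mean(v for d, v in samples if 20.0 <= d <= 25.0)
    return d_c, v_min, v_min / v_a
```

For instance, a driver approaching at 8 m/s who slows to 4 m/s within 10 m of the center has a stopping distance of about 13.3 m before slowing, so the commit distance on a 1 m grid is 13 m and the velocity drop is 0.5.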

3.2 Prediction of Driving Behavior

Using the intersection and behavior features from above, we can now predict the driving behavior. For that, we train several Random Forest (RF) [3] regression models. RFs are employed because they are easy to use and can model non-linear dependencies [22]; other regression methods could be used here as well. We use the intersection features, or a subset of them, as predictors for the behavior features. For each combination of the three behavior features and the two intersection types (X- and T-intersections), 10 models are trained. For each of the 10 models, 70 % of the runs are used as the training set and the remaining 30 % as the test set. Table 1 gives the mean and standard deviation over the 10 models for all variants. The performance of the RF regression models is evaluated using the root mean squared error (RMSE):

$$\begin{aligned} \textrm{RMSE} = \sqrt{\frac{1}{N} \sum _{k=1}^{N}\left( \hat{y}_{k}-y_{k} \right) ^2}\,. \end{aligned}$$
(5)

Here, N is the number of runs in the test set, \(y_{k}\) is the behavior feature of the k-th run of the test set, and \(\hat{y}_{k}\) is the value of the behavior feature estimated by the regression model for the same run. A first analysis was performed using the entire feature set introduced in Sect. 3.1. For the T-intersection models, all 13 features were used; for the X-intersections, the entry position \(p_{\textrm{e}}\) was omitted. The results of this analysis are given in the first row of Table 1. The last row of the table contains the reference value, i.e. the result of a naive regression model that outputs the mean of the training set. For all three behavior features, the prediction error is well below the reference value, with a low standard deviation, for both the T-intersections and the X-intersections. This performance is especially noteworthy given that driving behavior might also be influenced by a driver's personality or mood.
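
The evaluation protocol (repeated 70/30 splits, RMSE as in Eq. (5), and the naive reference model) can be sketched in plain Python. This is a minimal sketch under stated assumptions: each split is assumed to be drawn at random anew, `fit` is a hypothetical callable that trains a model and returns a prediction function, and only the naive mean-of-training-set model is implemented here. An RF regressor, for example scikit-learn's `RandomForestRegressor`, could be wrapped in the same interface.

```python
import math
import random
from statistics import mean, stdev
from typing import Callable, List, Sequence, Tuple

Predictor = Callable[[Sequence], List[float]]


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Eq. (5): root mean squared error over the test set."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return math.sqrt(sum((p - t) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def naive_mean_model(X_train: Sequence, y_train: Sequence[float]) -> Predictor:
    """Reference model: always predicts the mean of the training targets."""
    m = mean(y_train)
    return lambda X: [m] * len(X)


def repeated_holdout(X: Sequence, y: Sequence[float],
                     fit: Callable[[Sequence, Sequence[float]], Predictor],
                     n_repeats: int = 10, train_frac: float = 0.7,
                     seed: int = 0) -> Tuple[float, float]:
    """Mean and standard deviation of the test RMSE over `n_repeats` random splits."""
    rng = random.Random(seed)
    idx = list(range(len(y)))
    scores = []
    for _ in range(n_repeats):
        rng.shuffle(idx)
        n_train = round(train_frac * len(idx))
        train, test = idx[:n_train], idx[n_train:]
        predict = fit([X[i] for i in train], [y[i] for i in train])
        scores.append(rmse([y[i] for i in test], predict([X[i] for i in test])))
    return mean(scores), stdev(scores)
```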

Additionally, we investigate whether the dimensionality of the feature set can be reduced. For that, we first select a subset of the most relevant complexity features. This selection is a compromise between the feature importances of all investigated model variants. The remaining features are the entry position \(p_{\textrm{e}}\) (only for the T-intersections), the turning direction \(p_{\textrm{t}}\), both visibility distance variants \(d_{\textrm{v,c}}\) and \(d_{\textrm{v,p}}\), the street width \(w_{\textrm{s}}\) and the available width \(w_{\textrm{a}}\), the number of trees \(n_{\textrm{t}}\) and the number of visible vehicles \(n_{\textrm{v}}\). This means that only one feature describes the traffic. This might, at least in part, be explained by the fact that most runs did not include any cooperation partners, as the study was conducted in regular traffic. The performance of the RF regression models with this feature set is given in the second row of Table 1. The regression is slightly less accurate than with the full feature set, but the performance is very similar, indicating that the reduced complexity feature sets are sufficient to predict the driving behavior at intersections.

As the entry position \(p_{\textrm{e}}\) and the turning direction \(p_{\textrm{t}}\) are relevant factors for the driving behavior [42], we also train models with only these two complexity features; for the X-intersections, we only use the turning direction \(p_{\textrm{t}}\). The performance of these RF regression models is given in the third row of Table 1. The results show that prediction is still possible; however, the performance decreases substantially compared to the full and reduced feature sets, especially for the X-intersections. A possible explanation for the reduced performance is that both features can assume only three distinct values each. As not all combinations are possible at T-intersections (see Sect. 3.1), there are only six distinct value combinations for the T-intersections and only three for the X-intersections. Since the model inputs can take only these few values, the regression model can output at most the same number of distinct values, which leads to a less accurate regression.
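
The limit on distinct predictions can be made concrete with a small Python sketch. Any fitted model whose only inputs are categorical, including an RF trained on \(p_{\textrm{e}}\) and \(p_{\textrm{t}}\) alone, is a function of the input combination and can therefore output at most one value per combination; among such functions, the per-combination training mean minimizes the squared training error. The sketch below implements that group-mean regressor; the function name and the toy values in the usage are hypothetical and not taken from the study.

```python
from collections import defaultdict
from statistics import mean
from typing import Hashable, List, Sequence


def fit_group_mean(X_train: Sequence[Sequence[Hashable]], y_train: Sequence[float]):
    """Predict the training mean of each categorical input combination."""
    groups = defaultdict(list)
    for x, y in zip(X_train, y_train):
        groups[tuple(x)].append(y)
    table = {key: mean(values) for key, values in groups.items()}
    overall = mean(y_train)  # fallback for combinations unseen in training

    def predict(X: Sequence[Sequence[Hashable]]) -> List[float]:
        return [table.get(tuple(x), overall) for x in X]

    return predict
```

With inputs drawn from the six feasible T-intersection combinations, such a model can never produce more than six different predictions, however much the behavior varies within each group.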

Table 1 Mean RMSE regression results for T-intersections and X-intersections using different feature sets and all behavior features: commit distance \(d_{\textrm{c}}\), minimum velocity \(v_{\textrm{min}}\) and velocity drop \(v_{\textrm{d}}\). The standard deviation is in brackets

4 Behavior Generation

The second aspect of this work is an approach to decide on the behavior of an A-V at a T-intersection, i.e. whether it drives first or waits for its cooperation vehicles (C-V) to pass the intersection. Both this high-level decision and the resulting longitudinal acceleration of the A-V are covered by our proposed decision-making algorithm. This problem poses several challenges. As the driving paths of the A-V and its C-Vs intersect, there is often no solution that guarantees freedom from collisions in every possible scenario. This would only be possible if the A-V always waited for all other vehicles to drive first, which is not a feasible option. First, it would lead to a deadlock if another A-V used the same strategy. Second, this behavior could be more confusing than helpful when interacting with human drivers, especially given that human drivers prefer others to drive first in complex scenarios such as deadlocks at T-intersections [20]. To avoid these problems, a certain degree of risk has to be accepted. A further challenge is the number of possible interactions between the vehicles involved in the situation: if all pairwise interactions are modeled explicitly, the model depends on the number of cooperation partners, and explicitly modeling all of these interactions would be difficult.

4.1 Basic Setup

The algorithm is modeled as a discrete event system (DES) and does not assume any communication between the vehicles. The only available information is the observable state of the C-Vs, i.e. their position, speed and acceleration, and the map of the intersection. As soon as a C-V is closer than 10 m to the start of the intersection, we assume that its turning direction is known, e.g. from observing the turn indicators or from the driven trajectory. Previous works from the literature support this assumption [32, 46]. In this work, the vehicles follow the center line of their lane, so only the longitudinal acceleration has to be controlled. The map is a generic T-intersection with a 90\( ^{\circ }\) angle between the bottom street and the through street; see Fig. 1 for a schematic. Additionally, we consider occlusions at the intersection. For that, we define two points that specify the corners of the obstacles between the streets that block the direct line of sight. These points are placed on the bisecting lines between the streets, and their distance from the curb is used to parameterize the visibility conditions.
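
One way to test whether such a corner point blocks the view is sketched below in Python. This is a minimal sketch under assumptions that are ours, not necessarily the authors': the intersection center is the origin, the bottom street runs along -y and the right street along +x, both have the same half width, the corner point lies on the bisector y = -x at a distance `offset` from the curb corner (hypothetical parameterization), and the obstacle fills the whole quadrant beyond the corner. Under these assumptions, and with eye and target lying beyond the corner along their respective streets, the sight line is blocked exactly when the corner lies strictly on the same side of the line as the intersection center.

```python
import math
from typing import Tuple

Point = Tuple[float, float]


def corner_point(half_width: float, offset: float) -> Point:
    """Hypothetical corner between the bottom street (along -y) and the right
    street (along +x), shifted by `offset` from the curb corner along y = -x."""
    s = half_width + offset / math.sqrt(2.0)
    return (s, -s)


def _side(a: Point, b: Point, p: Point) -> float:
    # > 0 if p lies to the left of the directed line a -> b, < 0 if to the right
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def corner_blocks_view(eye: Point, target: Point, corner: Point,
                       center: Point = (0.0, 0.0)) -> bool:
    """True if the obstacle behind `corner` blocks the sight line eye -> target.

    Assumes the obstacle fills the quadrant beyond the corner (away from the
    center) and that eye and target lie on the two streets adjacent to it."""
    return _side(eye, target, corner) * _side(eye, target, center) > 0.0
```

Moving the corner point further from the curb (a larger `offset`) enlarges the blocked region, which is how a single scalar can parameterize the visibility conditions.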

Fig. 1

Schematic representation of a scenario at a T-intersection. The visibility is determined by the visibility edges C1 and C2, which are placed on the bisecting lines between the streets originating from the intersection center IC. From these, the visible street area can be calculated. In this case, vehicles B, Y1 and L are visible; vehicles P and Y2 are not. The A-V enters the intersection from the bottom and turns left. It has to yield to vehicles from the right and has priority over vehicles from the left. Therefore, vehicle P is the P-V (as soon as it becomes visible). As both Y1 and Y2 are turning right, there is no Y-V. If Y2 were to drive straight, it would be the Y-V even before its preceding vehicle Y1 passes the intersection. Vehicle B is the B-V, as it is driving on the road the A-V intends to enter and potentially blocks this road if it is too close to the intersection. Vehicle L is driving directly in front of the A-V and is thus the L-V.

To simplify the model and reduce the number of vehicles that have to be evaluated, we only consider those vehicles that are currently relevant to the A-V. Each of these vehicles is evaluated independently, which also avoids having to model the interactions between all possible pairs of vehicles. Each relevant C-V is assigned a virtual traffic light that is either red or green. The A-V only drives offensively if all traffic lights are green; a red light thus means that the A-V cannot drive because of that vehicle. The first relevant C-V is the priority vehicle (P-V), i.e. the vehicle closest to the intersection on the next street to the right, which has priority over the A-V. If the A-V itself turns into the next street to the right, there is no P-V, as the A-V does not have to yield to any vehicle in this case. Additionally, the vehicle that has to yield to the A-V (Y-V) has to be taken into consideration. The Y-V is the vehicle closest to the intersection approaching on the next street to the left. If its path does not intersect the A-V's path, the vehicle behind it is evaluated. To ensure a safe passage through the intersection, two more vehicles have to be considered. The blocking vehicle (B-V) is the closest vehicle leaving the intersection on the road the A-V will take, and the leading vehicle (L-V) is the vehicle driving directly in front of the A-V on its path. The B-V and the L-V can be the same vehicle. All these vehicles are relevant for the decision of the A-V, either because their paths intersect the A-V's path (P-V and Y-V) or because they can prevent the A-V from leaving the intersection right away (B-V and L-V). We only consider the vehicles closest to the intersection, as only those are directly relevant for the decision of the A-V. For example, a vehicle behind the P-V is irrelevant, as it cannot interact with the A-V as long as the P-V has not passed the intersection.
The same holds for the L-V: the vehicle driving in front of the L-V does not directly affect the A-V. If one of the C-Vs passes the intersection, the situation is re-evaluated, the labels are reassigned and all subsequent considerations are based on the new assignment. With limited visibility, the A-V might currently be unable to see some vehicles even though they exist. To account for this, the A-V only assumes that no such vehicle exists if a reference point placed on the road center at a radius of 25 m from the intersection center is visible. For the B-V, this reference point is placed at a distance of 15 m, and the existence of the L-V is assumed to be known in any case. If the turning direction of a vehicle is not yet known, the worst case is assumed. Both the vehicle assignment and the visibility are illustrated in Fig. 1.
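
The role assignment and the existence rule described above can be sketched in Python as follows. This is a minimal sketch with assumptions of our own: each observed vehicle carries a hypothetical street label relative to the A-V (`"right"`, `"left"`, `"exit"`), a flag for approaching versus leaving, its absolute distance to the intersection center, and a conflict flag that is `None` while its turning direction is unknown (treated as the worst case, i.e. as a conflict). How each individual light turns red or green is not modeled here; `may_drive` only shows the "all lights green" rule.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Observed:
    vid: str
    street: str            # hypothetical label relative to the A-V: "right", "left", "exit"
    approaching: bool      # True while driving towards the intersection
    distance: float        # absolute distance to the intersection center [m]
    conflicts: Optional[bool] = None  # path crosses the A-V's path; None = turn unknown


def _closest(candidates: List[Observed]) -> Optional[Observed]:
    return min(candidates, key=lambda c: c.distance, default=None)


def assign_roles(vehicles: List[Observed], av_turns_right: bool,
                 leading: Optional[Observed] = None) -> Dict[str, Optional[Observed]]:
    # P-V: closest vehicle approaching on the next street to the right (none if A-V turns right)
    p_v = None if av_turns_right else _closest(
        [v for v in vehicles if v.street == "right" and v.approaching])
    # Y-V: closest approaching vehicle from the left whose path (possibly) conflicts
    y_v = _closest([v for v in vehicles if v.street == "left" and v.approaching
                    and v.conflicts is not False])
    # B-V: closest vehicle leaving on the road the A-V will enter
    b_v = _closest([v for v in vehicles if v.street == "exit" and not v.approaching])
    return {"P-V": p_v, "Y-V": y_v, "B-V": b_v, "L-V": leading}


def existence(role_vehicle: Optional[Observed], ref_point_visible: bool) -> str:
    """'absent' is only concluded if the role's reference point is visible."""
    if role_vehicle is not None:
        return "present"
    return "absent" if ref_point_visible else "unknown"


def may_drive(lights: Dict[str, bool]) -> bool:
    """The A-V only proceeds if every virtual traffic light is green (True)."""
    return all(lights.values())
```

In a Fig. 1-like setting where Y1 turns right and Y2's turning direction is still unknown, Y2 is conservatively assigned as the Y-V.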

4.2 Decision Making Algorithm

As the decision-making algorithm is modeled as a DES, the vehicle is described and controlled by its current state, which only changes if an event occurs. The events are defined using features computed from the observable data. The behavior of the A-V, i.e. its acceleration, is determined by the current state.
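
The DES principle (a state that changes only on events, and an acceleration determined by the state) can be illustrated with a small Python sketch. The states, event names, transitions and acceleration values below are hypothetical placeholders chosen for illustration only; they are not the state set of the proposed algorithm.

```python
from enum import Enum, auto
from typing import Dict, Iterable, Tuple


class State(Enum):  # hypothetical placeholder states
    APPROACH = auto()
    WAIT = auto()
    DRIVE = auto()


# (state, event) -> next state; pairs not listed leave the state unchanged
TRANSITIONS: Dict[Tuple[State, str], State] = {
    (State.APPROACH, "some_light_red"): State.WAIT,
    (State.APPROACH, "all_lights_green"): State.DRIVE,
    (State.WAIT, "all_lights_green"): State.DRIVE,
}

# longitudinal acceleration commanded in each state [m/s^2] (placeholder values)
ACCELERATION: Dict[State, float] = {State.APPROACH: 0.0, State.WAIT: -2.0, State.DRIVE: 1.0}


def run(events: Iterable[str], state: State = State.APPROACH) -> Tuple[State, float]:
    """Process events in order and return the final state and its acceleration."""
    for event in events:
        state = TRANSITIONS.get((state, event), state)
    return state, ACCELERATION[state]
```

Because every decision is a table lookup, each transition can be traced back to the event that caused it, which is the explainability argument for using a DES.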

4.2.1 Features

To indicate for which vehicle a feature is calculated, it is marked by a corresponding index: \(\left( \cdot \right) ^x,~x \in \left\{ \textrm{a}, \textrm{p}, \textrm{y}, \textrm{b}, \textrm{l}\right\} \). All distances are measured along the drive path of a vehicle. The distance to scenario \(d_\textrm{s}^{x}(t)\) is positive before, zero within and negative after the intersection. The beginning of an intersection is defined as the point where the lanes diverge, and its end as the point where the lanes merge. All features are calculated for the current time t; for better readability, this dependence is omitted in the following.

At an intersection, the drive paths of vehicles often intersect. The area where the lanes of two vehicles overlap is referred to as their common collision zone. The algorithm only needs the distances to the collision zones of the A-V with its C-Vs. \(d^{x}_{\textrm{c},x_{\textrm{c}},\textrm{b}}\) and \(d^{x}_{\textrm{c},x_{\textrm{c}},\textrm{e}}\) are the distances of vehicle x to the beginning and the end of the collision zone of the A-V with the C-V \(x_{\textrm{c}}\). For example, the distance of the A-V to the beginning of its collision zone with the P-V is \(d^{\textrm{a}}_{\textrm{c},\textrm{p},\textrm{b}}\), and the distance of the P-V to the beginning of the same zone is \(d^{\textrm{p}}_{\textrm{c},\textrm{p},\textrm{b}}\). Based on the distance to the collision zone, the time to the collision zone is calculated using the current velocity \(v^{x}\) of vehicle x:

$$\begin{aligned} t^{x}_{\mathrm {c,x_{\textrm{c}},\cdot }} = \frac{d^{x}_{\mathrm {c,x_{\textrm{c}},\cdot }}}{v^{x}}\,. \end{aligned}$$
(6)

Additionally, the distance required to brake to a complete stop assuming the velocity \(v_{\textrm{a}}^{x}\) and the acceleration \(a_{\textrm{a}}^{x}\) is used as a feature:

$$\begin{aligned} d_{\textrm{b}}^{x}\left( v_{\textrm{a}}^{x}, a_{\textrm{a}}^{x}\right) = {\left\{ \begin{array}{ll} -\frac{\left( v_{\textrm{a}}^{x}\right) ^{2}}{2 a_{\textrm{a}}^{x}} \,, & a_{\textrm{a}}^{x} < \, {0}\, \text {ms}^{-2} \\ 0\,, & a_{\textrm{a}}^{x} = \, {0} \, \text {ms}^{-2} \wedge v_{\textrm{a}}^{x} = \, {0} \, \text {ms}^{-1} \\ \infty \,, & \text {otherwise} \end{array}\right. }\,. \end{aligned}$$
(7)
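Equations (6) and (7) translate directly into code. The following Python sketch is illustrative only: the function names are our own, and returning an infinite time in Eq. (6) for a vehicle that is not moving (instead of dividing by zero) is an assumption the text does not state.

```python
import math


def time_to_zone(distance: float, velocity: float) -> float:
    """Eq. (6): time for a vehicle to reach a collision-zone boundary.

    Assumption: a vehicle with velocity <= 0 never reaches the zone,
    so the time is reported as infinite.
    """
    if velocity <= 0.0:
        return math.inf
    return distance / velocity


def braking_distance(v: float, a: float) -> float:
    """Eq. (7): distance needed to come to a complete stop.

    v: current velocity (m/s), a: assumed constant acceleration (m/s^2).
    """
    if a < 0.0:
        return -(v ** 2) / (2.0 * a)
    if a == 0.0 and v == 0.0:
        return 0.0  # already at standstill
    return math.inf  # not decelerating, so the vehicle never stops
```

For example, a vehicle at 10 m/s braking with the comfort deceleration of -2.5 m/s² needs 20 m to stop.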

The distance to the last stopping point \(d^{x}_{\textrm{l}}\) is the distance to the point at which a vehicle has to stop in order not to interfere with any other drive path through the intersection. The final feature is the free distance behind the B-V. It measures the distance between the end of the intersection and the rear of the B-V, including the distance the B-V needs to brake to a standstill from its current velocity in an emergency (\(a_{\textrm{e}}= {-7.5}\, \text {ms}^{-2}\)):

$$\begin{aligned} d^{\textrm{b}}_f = d^{\textrm{b}}_{\textrm{i}} - \frac{1}{2} \ l_{\textrm{v}}+ d_{\textrm{b}}^{\textrm{b}}\left( v^{\textrm{b}}, a_{\textrm{e}}\right) \,, \end{aligned}$$
(8)

where \(d^{\textrm{b}}_{\textrm{i}}\) is the current distance along the driven path from the end of the intersection and \(l_{\textrm{v}}= {4.4\,\mathrm{\text {m}}}\) is the length of the vehicle.
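Equation (8) can be sketched as follows, reusing the braking distance of Eq. (7) with the emergency deceleration. This is a hypothetical illustration with names of our own choosing; the constants are the values given in the text.

```python
import math

A_E = -7.5  # emergency deceleration a_e (m/s^2)
L_V = 4.4   # vehicle length l_v (m)


def braking_distance(v: float, a: float) -> float:
    """Eq. (7): stopping distance for velocity v and acceleration a."""
    if a < 0.0:
        return -(v ** 2) / (2.0 * a)
    if a == 0.0 and v == 0.0:
        return 0.0
    return math.inf


def free_distance_behind_bv(d_i: float, v_b: float) -> float:
    """Eq. (8): free space between the end of the intersection and the rear
    of the B-V, extended by the B-V's emergency braking distance.

    d_i: distance of the B-V along its path from the end of the intersection (m)
    v_b: current velocity of the B-V (m/s)
    """
    return d_i - 0.5 * L_V + braking_distance(v_b, A_E)
```

A B-V standing 10 m past the end of the intersection leaves 7.8 m of free distance; if it drives at 6 m/s, its emergency braking distance of 2.4 m is added.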

4.2.2 Events

In our model the behavior is supposed to differ depending on the distance of the A-V to the intersection. Thus, the approach to the intersection is split into six zones. The current zone is determined by the A-V’s distance to scenario \(d_\textrm{s}^{\textrm{a}}\). In the first zone (\(d_\textrm{s}^{\textrm{a}} > {40\,\mathrm{\text {m}}}\)) the A-V is not controlled by the decision making algorithm but drives freely. At the beginnings of the second (\({40\,\mathrm{\text {m}}} \ge d_\textrm{s}^{\textrm{a}} > {25\,\mathrm{\text {m}}}\)) and the third (\({25\,\mathrm{\text {m}}} \ge d_\textrm{s}^{\textrm{a}} > {10\,\mathrm{\text {m}}}\)) zone a single prediction of the P-V is performed and the behavior of the A-V is adapted accordingly. The A-V adapts its behavior to show its intention as early as possible. The prediction is only run twice to avoid changing the behavior too often. The fourth zone is the area just before the intersection (\({10\,\mathrm{\text {m}}} \ge d_\textrm{s}^{\textrm{a}} > {1\,\mathrm{\text {m}}}\)). In it the A-V constantly monitors the behavior of its C-Vs and adapts its own behavior if necessary. Zone 5 is the area within the intersection itself (\({1\,\mathrm{\text {m}}} \ge d_\textrm{s}^{\textrm{a}} \ge 0 \,{\text {m}}\)). In these last two zones the final decision on the behavior has to be made and then executed accordingly. The final zone 6 is the street past the intersection where the vehicle is no longer controlled by the decision making algorithm.
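The zone boundaries above amount to a simple lookup on \(d_\textrm{s}^{\textrm{a}}\). In this sketch (function names hypothetical), a second helper flags a zone change between two consecutive time steps, which corresponds to the event \(e_{2}\) introduced in the event list of this section.

```python
def zone(d_s: float) -> int:
    """Map the A-V's distance to scenario d_s (m) to the zone number 1..6."""
    if d_s > 40.0:
        return 1  # free driving, not controlled by the DES
    if d_s > 25.0:
        return 2  # first prediction of the P-V at the zone entry
    if d_s > 10.0:
        return 3  # second prediction of the P-V at the zone entry
    if d_s > 1.0:
        return 4  # constant monitoring just before the intersection
    if d_s >= 0.0:
        return 5  # within the intersection
    return 6      # past the intersection, free driving again


def entered_new_zone(prev_d_s: float, d_s: float) -> bool:
    """Zone change between the previous and the current time step."""
    return zone(prev_d_s) != zone(d_s)
```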

Table 2 Base events for the DES for decision making
Table 3 Events of the DES for decision making. Most events are a combination of base events

The model is based on events, most of which are themselves combinations of so-called base events. The base events are defined in Table 2 and the events in Table 3. Each of the four relevant C-Vs has a traffic light event assigned to it. The P-V is the only vehicle with two variants of this event. In the prediction phase (zones 2 and 3), its light is green (event \(e_{\textrm{1,p,I}}\)) if the A-V is either certain that no P-V exists (base event \(e_{\textrm{b1}}\)) or does not expect a conflict with its P-V (the A-V is predicted to enter the intersection at least \(\Delta t_{\textrm{p}}= {2.5\,\mathrm{\text {s}}}\) and \(\Delta d_{\textrm{p}}= {10\,\mathrm{\text {m}}}\) earlier, \(e_{\textrm{b2}}\)). In zones 4 and 5, the light is additionally set to green (\(e_{\textrm{1,p,II}}\)) if the P-V is currently stopped close to the intersection (its velocity is below the stop threshold of \(v_{\textrm{s}}= {0.15\,\mathrm{\text {m}\text {s}^{-1}}}\), it does not accelerate, and it is closer than the threshold \(d_{\textrm{n}}= {12\,\mathrm{\text {m}}}\) to the start of the intersection, \(e_{\textrm{b3}}\)) and its wait time \(t_{\textrm{w}}^{\textrm{p}}\) has exceeded the limit \(t_{\textrm{y}}= {2\,\mathrm{\text {s}}}\) (i.e. both vehicles have stood at the intersection for 2 s and this is not due to a deadlock, \(e_{\textrm{b4}}\)). The parameters are either design choices of the authors, and thus options to parameterize the model, or result from physical constraints.
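Since Table 2 is not reproduced here, the following sketch of the P-V traffic light events is only our reading of the text. In particular, interpreting \(e_{\textrm{b2}}\) as a margin on both the predicted arrival time and the distance to the intersection, and treating "does not accelerate" as a non-positive acceleration, are assumptions; all function and parameter names are hypothetical.

```python
V_S = 0.15   # stop threshold v_s (m/s)
D_N = 12.0   # "close to the intersection" threshold d_n (m)
T_Y = 2.0    # wait-time limit t_y (s)
DT_P = 2.5   # required time margin Δt_p (s)
DD_P = 10.0  # required distance margin Δd_p (m)


def no_conflict_predicted(t_a: float, t_p: float, d_a: float, d_p: float) -> bool:
    """Reading of e_b2: the A-V reaches the intersection at least DT_P earlier
    and is at least DD_P closer to it than the P-V."""
    return (t_p - t_a) >= DT_P and (d_p - d_a) >= DD_P


def stopped_near(v: float, a: float, d_s: float) -> bool:
    """Reading of e_b3: stopped, not accelerating, closer than D_N to the intersection."""
    return v < V_S and a <= 0.0 and 0.0 <= d_s < D_N


def wait_exceeded(t_wait: float) -> bool:
    """Reading of e_b4: the wait time has exceeded the limit t_y."""
    return t_wait > T_Y


def light_p_I(pv_absent: bool, no_conflict: bool) -> bool:
    """Event e_1,p,I = e_b1 ∨ e_b2 (prediction phase, zones 2 and 3)."""
    return pv_absent or no_conflict


def light_p_II(pv_absent: bool, no_conflict: bool,
               pv_stopped: bool, pv_waited: bool) -> bool:
    """Event e_1,p,II = e_1,p,I ∨ (e_b3 ∧ e_b4) (zones 4 and 5)."""
    return light_p_I(pv_absent, no_conflict) or (pv_stopped and pv_waited)
```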

The traffic light of the Y-V (\(e_{\textrm{1,y}}\)) is green if the Y-V is currently not within the common collision zone (\(\lnot e_{\textrm{b5}}\)) and at least one of the following base events is true: the A-V is predicted to be able to pass the collision zone before the Y-V (\(e_{\textrm{b6}}\)); the Y-V is stationary close before the intersection (\(e_{\textrm{b9}}\)); the distance of the A-V to its last stopping point is still large enough for it to stop there without exceeding the comfort deceleration of \(a_{\textrm{c}}= -2.5 \, \text {m} \, \text {s}^{-2}\), assuming a velocity within the intersection of \(v_{\textrm{i}}= {6.5\,\mathrm{\text {m}\text {s}^{-1}}}\) when driving straight and \(v_{\textrm{i}}= {4.0\,\mathrm{\text {m}\text {s}^{-1}}}\) when turning (\(e_{\textrm{b7}}\)); or the Y-V is slower than \(v_{\textrm{sl}}= {2\,\mathrm{\text {m}\text {s}^{-1}}}\), is braking such that it will come to a complete stop before the beginning of the collision zone, and the A-V has enough space remaining for a hard stop (\(a_{\textrm{h}}= {-4.5}\, \text {m} \, \text {s}^{-2}\)) should it become necessary (\(e_{\textrm{b8}}\)). The latter two base events allow the A-V to drive even though it is currently not predicted to pass the intersection before the Y-V. With these conditions we avoid unnecessarily defensive behavior: only if the A-V is very close to the intersection and still cannot safely drive first does it yield to the Y-V.
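The Y-V light and the two base events that keep the A-V from being overly defensive can be sketched as below. This is our interpretation, not the authors' definition from Table 2: we assume \(e_{\textrm{b7}}\) compares the A-V's distance to its last stopping point with the braking distance from \(v_{\textrm{i}}\) at \(a_{\textrm{c}}\), and \(e_{\textrm{b8}}\) uses Eq. (7) for both vehicles. All names are hypothetical.

```python
import math

A_C = -2.5          # comfort deceleration a_c (m/s^2)
A_H = -4.5          # hard deceleration a_h (m/s^2)
V_SL = 2.0          # "slow" threshold v_sl (m/s)
V_I_STRAIGHT = 6.5  # assumed velocity in the intersection, straight (m/s)
V_I_TURN = 4.0      # assumed velocity in the intersection, turning (m/s)


def braking_distance(v: float, a: float) -> float:
    """Eq. (7): stopping distance for velocity v and acceleration a."""
    if a < 0.0:
        return -(v ** 2) / (2.0 * a)
    if a == 0.0 and v == 0.0:
        return 0.0
    return math.inf


def can_still_stop(d_l_a: float, turning: bool) -> bool:
    """Reading of e_b7: comfort stop from v_i before the last stopping point."""
    v_i = V_I_TURN if turning else V_I_STRAIGHT
    return d_l_a >= braking_distance(v_i, A_C)


def yv_yielding(v_y: float, a_y: float, d_y_zone: float,
                v_a: float, d_l_a: float) -> bool:
    """Reading of e_b8: slow Y-V stops before the collision zone, and the A-V
    could still make a hard stop before its last stopping point."""
    return (v_y < V_SL
            and braking_distance(v_y, a_y) < d_y_zone
            and braking_distance(v_a, A_H) <= d_l_a)


def light_y(yv_in_zone: bool, av_passes_first: bool, yv_stopped_near: bool,
            av_can_stop: bool, yv_is_yielding: bool) -> bool:
    """Event e_1,y = ¬e_b5 ∧ (e_b6 ∨ e_b9 ∨ e_b7 ∨ e_b8)."""
    return (not yv_in_zone) and (av_passes_first or yv_stopped_near
                                 or av_can_stop or yv_is_yielding)
```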

The traffic light of the B-V is green (\(e_{\textrm{1,b}}\)) if the A-V is certain that no B-V exists (base event \(e_{\textrm{b10}}\)) or if there is enough space behind the B-V (i.e. the length of a vehicle \(l_{\textrm{v}}\) plus the minimum distance between vehicles at standstill \(d_{\textrm{min}}= {1.5\,\mathrm{\text {m}}}\)) for the A-V to pass the intersection without the risk of having to stop inside it (\(e_{\textrm{b11}}\)). The traffic light of the L-V is green (\(e_{\textrm{1,l}}\)) if there is no L-V (\(e_{\textrm{b12}}\)) or once it has passed the intersection (\(e_{\textrm{b13}}\)).
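A sketch of the B-V and L-V lights, together with their conjunction used in zones 4 and 5, is shown below. The comparison of the free distance \(d^{\textrm{b}}_f\) from Eq. (8) with \(l_{\textrm{v}} + d_{\textrm{min}}\) is our reading of \(e_{\textrm{b11}}\), and all names are hypothetical.

```python
L_V = 4.4    # vehicle length l_v (m)
D_MIN = 1.5  # minimum standstill gap d_min (m)


def enough_space_behind_bv(free_distance: float) -> bool:
    """Reading of e_b11: the free distance d_f^b of Eq. (8) fits one
    vehicle plus the standstill gap."""
    return free_distance >= L_V + D_MIN


def light_b(bv_absent: bool, free_distance: float) -> bool:
    """Event e_1,b = e_b10 ∨ e_b11."""
    return bv_absent or enough_space_behind_bv(free_distance)


def light_l(lv_absent: bool, lv_passed: bool) -> bool:
    """Event e_1,l = e_b12 ∨ e_b13."""
    return lv_absent or lv_passed


def all_green(p_ii: bool, y_light: bool, b_light: bool, l_light: bool) -> bool:
    """Event e_g: all four traffic lights are green (zones 4 and 5)."""
    return p_ii and y_light and b_light and l_light
```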

Additionally, some further events are needed for the model. If the A-V enters a new zone in the current time step, event \(e_{2}\) is triggered. Event \(e_{3}\) is triggered if an emergency stop before the intersection is still possible. If the turning patterns of the A-V, the P-V, and the Y-V all intersect with each other, a deadlock is possible (\(e_{4}\)). A deadlock occurs (\(e_{5}\)) if both the P-V (\(e_{\textrm{b3}}\)) and the Y-V (\(e_{\textrm{b9}}\)) as well as the A-V (\(e_{\textrm{b16}}\)) are stopped before the intersection at the same time. If only the P-V and the Y-V are standing at the intersection, a deadlock of the C-Vs occurs (\(e_{6}\)).

4.2.3 DES Model

Each zone has states associated with it, and the model can only be in a state that is associated with its current zone. Zones 1 and 6 have only one state each (\(s_{10}\) and \(s_{60}\)), as the model does not influence the behavior there. Each of the remaining zones has a state for offensive driving (states \(s_{21}\), \(s_{31}\), \(s_{41}\) and \(s_{51}\)) and one for defensive driving (\(s_{22}\), \(s_{32}\), \(s_{42}\) and \(s_{52}\)). Offensive states either prepare the A-V for driving directly through the intersection or are the state in which the vehicle actually passes the intersection. The defensive states correspond to waiting before the intersection or describe the waiting itself. State \(s_{53}\) describes offensive driving after waiting in state \(s_{52}\). The model switches between states when certain events occur. The model with all its states and events is shown in Fig. 2.

Fig. 2

DES of the A-V. If none of the events attributed to the current state occurs, the system remains in that state; these events have been omitted for better readability. The event \(e_{\textrm{g}}= e_{\textrm{1,p,II}}\wedge e_{\textrm{1,y}}\wedge e_{\textrm{1,b}}\wedge e_{\textrm{1,l}}\) describes the case that the traffic lights of all four relevant C-Vs are green in zones 4 and 5. Event \(e_{\textrm{dl}}= e_{4}\wedge e_{5}\wedge e_{\textrm{1,b}}\wedge e_{\textrm{1,l}}\) is true if a deadlock is possible and has occurred, and neither the L-V nor the B-V prevents the A-V from driving.

During the approach, the model always starts in state \(s_{10}\). It remains there until the A-V leaves the first zone (event \(e_{2}\)). When this happens, the prediction of the P-V is evaluated for the first time. In the prediction phase only the P-V is considered, as it is the only vehicle the A-V has to yield to. In case of a green light (\(e_{\textrm{1,p,I}}\)), the A-V assumes its offensive state \(s_{21}\); otherwise it drives more defensively in state \(s_{22}\). When it enters zone 3, the same evaluation is performed again: if it results in a green light, the A-V enters state \(s_{31}\), which is associated with offensive behavior; otherwise it enters state \(s_{32}\) and shows defensive behavior. When the A-V leaves zone 3, no prediction is performed; it transitions from state \(s_{31}\) to \(s_{41}\) or from \(s_{32}\) to \(s_{42}\), thus keeping its offensive or defensive behavior, respectively. This is possible because in zones 4 and 5 the prediction is run constantly, i.e. in every time step.

In addition to the constant prediction, all four relevant vehicles are now considered for decision making, as in these zones the A-V is close to or within the collision zones with its C-Vs, so dangerous situations can easily occur. If the A-V is in the defensive state \(s_{42}\), all four lights are green (event \(e_{\textrm{g}}= e_{\textrm{1,p,II}}\wedge e_{\textrm{1,y}}\wedge e_{\textrm{1,b}}\wedge e_{\textrm{1,l}}\)), and a deadlock cannot occur (\(\lnot e_{4}\)), it transitions to state \(s_{41}\). If it is in the offensive state \(s_{41}\), it switches to \(s_{42}\) if at least one of the four lights is no longer green (\(\lnot e_{\textrm{g}}\)) and there is still enough space for an emergency stop by the A-V (\(e_{3}\)). Continuing to drive when an emergency stop is no longer possible does not pose a large risk, as the parameterization of the green lights is rather conservative; additionally, this strategy avoids a potentially dangerous stop within the intersection. If the vehicle reaches the end of zone 4 and enters zone 5 (event \(e_{2}\)), it progresses from \(s_{41}\) to \(s_{51}\) or from \(s_{42}\) to \(s_{52}\), respectively. In state \(s_{51}\) the vehicle remains in this offensive state unless at least one of the traffic lights is no longer green (\(\lnot e_{\textrm{g}}\)) while there is still enough space for an emergency stop (\(e_{3}\)); in this case it transitions to state \(s_{52}\). There is no transition from \(s_{52}\) back to \(s_{51}\). Instead, the A-V can only leave the waiting state \(s_{52}\) for \(s_{53}\), either if all traffic lights are green again (\(e_{\textrm{g}}\)) while no deadlock is possible (\(\lnot e_{4}\)), or if there is a deadlock that the A-V tries to resolve (\(e_{\textrm{dl}}= e_{4}\wedge e_{5}\wedge e_{\textrm{1,b}}\wedge e_{\textrm{1,l}}\)). If the A-V detects a deadlock, it always tries to drive first; an alternative strategy would be to drive after a certain waiting period. State \(s_{53}\) is an offensive state that is assumed after the A-V has been defensive.
From it, the A-V either progresses to \(s_{60}\) after it leaves the intersection (\(e_{2}\)) or it returns to the defensive state \(s_{52}\) if it can no longer drive safely. The latter is the case if an emergency stop is still possible (\(e_{3}\)) and either a deadlock is possible (\(e_{4}\)) but the cooperation vehicles are not stopped (\(\lnot e_{6}\)) or a deadlock is not possible (\(\lnot e_{4}\)) and not all lights are green (\(\lnot e_{\textrm{g}}\)). State \(s_{60}\) is the only state of zone 6. This state is not controlled by the algorithm as the interaction at the intersection is now over.
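The transitions described above can be collected into a single transition function. The following Python sketch is our own reading of the text and of Fig. 2, not the authors' implementation. Two points are assumptions: the zone-change event \(e_{2}\) is checked before all other events of a state, and state \(s_{51}\) is assumed to proceed to \(s_{60}\) on \(e_{2}\) in the same way as \(s_{53}\), which the text does not state explicitly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Events:
    """Event values for one time step (all default to False)."""
    e2: bool = False       # A-V entered a new zone
    e3: bool = False       # emergency stop before the intersection still possible
    e4: bool = False       # deadlock possible
    e5: bool = False       # deadlock occurred (A-V, P-V and Y-V stopped)
    e6: bool = False       # deadlock of the C-Vs (P-V and Y-V stopped)
    e1_p_I: bool = False   # P-V light, prediction phase (zones 2 and 3)
    e1_p_II: bool = False  # P-V light, zones 4 and 5
    e1_y: bool = False     # Y-V light
    e1_b: bool = False     # B-V light
    e1_l: bool = False     # L-V light

    @property
    def e_g(self) -> bool:
        """All four traffic lights are green (zones 4 and 5)."""
        return self.e1_p_II and self.e1_y and self.e1_b and self.e1_l

    @property
    def e_dl(self) -> bool:
        """Deadlock possible and occurred; B-V and L-V do not obstruct."""
        return self.e4 and self.e5 and self.e1_b and self.e1_l


def step(state: str, ev: Events) -> str:
    """Return the next DES state; stay in `state` if no listed event occurs."""
    if state == "s10" and ev.e2:
        return "s21" if ev.e1_p_I else "s22"
    if state in ("s21", "s22") and ev.e2:
        return "s31" if ev.e1_p_I else "s32"
    if state in ("s31", "s32") and ev.e2:
        return "s41" if state == "s31" else "s42"  # keep offensive/defensive
    if state == "s41":
        if ev.e2:
            return "s51"
        if not ev.e_g and ev.e3:
            return "s42"
    if state == "s42":
        if ev.e2:
            return "s52"
        if ev.e_g and not ev.e4:
            return "s41"
    if state == "s51":
        if ev.e2:
            return "s60"  # assumption, see lead-in
        if not ev.e_g and ev.e3:
            return "s52"
    if state == "s52" and ((ev.e_g and not ev.e4) or ev.e_dl):
        return "s53"
    if state == "s53":
        if ev.e2:
            return "s60"
        unsafe = (ev.e4 and not ev.e6) or (not ev.e4 and not ev.e_g)
        if ev.e3 and unsafe:
            return "s52"
    return state
```

For example, `step("s52", Events(e4=True, e5=True, e1_b=True, e1_l=True))` moves the A-V into \(s_{53}\) to resolve a deadlock, and it falls back to \(s_{52}\) only while an emergency stop is still possible and the C-Vs start moving.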

4.2.4 Acceleration

So far, the DES only describes the current state of the interaction. To actually control the A-V, its behavior has to be set depending on the current state of the DES. For that, we assign a target velocity to each state (see Table 4) and control the vehicle using the intelligent driver model (IDM) [40]:

$$\begin{aligned} a^{\textrm{a}} &= a_{\textrm{m}} \ \left( 1-\left( \frac{v^{\textrm{a}}}{v_{\textrm{t}}}\right) ^{4} - \left( \frac{d^{*}}{\Delta d} \right) ^{2} \right) \quad \text {with} ~ d^{*} = d_{\textrm{min}}+ v^{\textrm{a}} \ t_{\textrm{min}} + \frac{v^{\textrm{a}} \ \Delta v}{2 \ \sqrt{a_{\textrm{m}} \ a_{\textrm{b}}}}\,. \end{aligned}$$
(9)

Here, \(a_{\textrm{m}} = {2.5}\, \text {ms}^{-2}\) is the maximum acceleration, \(a_{\textrm{b}} = \left| a_{\textrm{c}} \right| \) is the (positive) comfortable braking deceleration, \(v_{\textrm{t}}\) is the target velocity as specified in Table 4, \(\Delta d\) is the distance along the drive path to the L-V, \(\Delta v = v^{\textrm{a}} - v^{\textrm{l}}\) is the difference in velocity, and \(t_{\textrm{min}} = {1.2\,\mathrm{\text {s}}}\) is the minimum time gap between following vehicles. The acceleration \(a^{\textrm{a}}\) given by the IDM is limited to a lower threshold of \(a_{\textrm{min}} = a_{\textrm{c}}\). If there is no L-V, \(\Delta d\) is set to infinity and \(v^{\textrm{l}} = {0\,\mathrm{\text {m}\text {s}^{-1}}}\). In states \(s_{42}\) and \(s_{52}\), the A-V is supposed to stop \({1\,\mathrm{\text {m}}}\) before the last stopping point. If this is not possible, the A-V brakes harder (\(a_{\textrm{min}} = a_{\textrm{h}}\)) to still stop at that point. If this is also no longer possible, an emergency stop with \(a_{\textrm{min}} = a_{\textrm{e}}\) is initiated and the A-V stops directly at the last stopping point. To ensure that the A-V stops at its stopping point, a virtual vehicle is placed such that its rear is \(d_{\textrm{min}}\) before the stop point. The virtual vehicle is not used if there is a closer L-V. In these states, \(v_{\textrm{t}}\) is set to the same value as in the offensive states \(s_{41}\) and \(s_{51}\), respectively. This approach ensures that the A-V proceeds to its stopping point if there is no L-V before the intersection, and that it is able to restart after waiting in a queue in order to proceed to its stopping point.
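A minimal sketch of Eq. (9) with the lower acceleration limit is given below. It is our own illustration rather than the authors' code: following the standard IDM, the braking deceleration under the square root is taken as the positive value \(\left| a_{\textrm{c}} \right|\), and the guard against a non-positive target velocity is an added assumption. The function name and argument names are hypothetical.

```python
import math

A_M = 2.5    # maximum acceleration a_m (m/s^2)
A_C = -2.5   # comfort deceleration a_c (m/s^2), default lower limit a_min
A_B = -A_C   # positive braking deceleration used inside the square root
T_MIN = 1.2  # minimum time gap t_min (s)
D_MIN = 1.5  # minimum standstill gap d_min (m)


def idm_acceleration(v: float, v_t: float, gap: float = math.inf,
                     v_lead: float = 0.0, a_min: float = A_C) -> float:
    """Eq. (9): IDM acceleration of the A-V, clipped from below at a_min.

    v: A-V velocity (m/s), v_t: target velocity of the current DES state (> 0),
    gap: distance Δd to the L-V or virtual vehicle (inf if there is none),
    v_lead: velocity of that vehicle (m/s).
    """
    if v_t <= 0.0:
        raise ValueError("target velocity must be positive")
    dv = v - v_lead
    d_star = D_MIN + v * T_MIN + v * dv / (2.0 * math.sqrt(A_M * A_B))
    interaction = 0.0 if math.isinf(gap) else (d_star / gap) ** 2
    a = A_M * (1.0 - (v / v_t) ** 4 - interaction)
    return max(a, a_min)
```

For the harder and emergency stops described above, `a_min` would be set to -4.5 (\(a_{\textrm{h}}\)) or -7.5 (\(a_{\textrm{e}}\)) instead of the default.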

Table 4 Target velocities \(v_{\textrm{t}}\) in ms\(^{-1}\) for the states of the DES. Entries marked with an asterisk are used in conjunction with a virtual vehicle to enforce stopping before the intersection

4.3 Simulation Results

To test and validate our proposed decision making system, we implemented a simulation framework. To test the algorithm properly, the C-Vs have to be simulated as well. For them, a simplified version of the proposed algorithm is used, as we are only interested in testing the A-V’s algorithm. In this version, the conditions for driving depend on fewer features and events, and zones 4 and 5 of the original algorithm are merged into a single zone. In this merged zone, the decision to drive first is not revised, i.e. once the algorithm decides to drive, it continues to do so regardless of any further development of its surroundings. In case of a deadlock, a C-V waits for a random duration before it tries to resolve the situation. As the C-Vs detect a deadlock before the A-V does, a C-V can drive first even though the A-V drives as soon as it detects a deadlock. This makes it possible to test the behavior of the A-V’s algorithm when another vehicle tries to resolve a deadlock. Additionally, visibility is not taken into consideration for the C-Vs: all vehicles are visible to the simplified algorithm at all times. Finally, the C-Vs can exhibit special behavior to test certain aspects of the main algorithm: they can be set to drive first despite having to yield, or alternatively to wait for an arbitrary duration although they are allowed to drive first. This behavior is only shown when the relevant cooperation partner from the C-V’s perspective is the A-V. With both variants we can test how the A-V reacts to unexpected behavior. Additionally, the target velocity of the C-Vs inside and after the intersection can be reduced, which further tests whether the A-V only drives once the intersection is clear.

Within the simulation framework, a single run is performed as follows: first, the map is loaded and all vehicles are initialized. Then each time step is simulated: the currently visible vehicles are determined, and only the current states of these vehicles are passed to the algorithm. Next, the C-Vs are identified and the features are calculated. Afterwards, the currently active events are checked and the DES is updated. Finally, the acceleration is calculated. These steps are performed for the A-V and all C-Vs.

For the simulations we used the generic map described above, the visibility distance was set to \(d_{\textrm{v}} \in \left\{ {7\,\mathrm{\text {m}}}, {14\,\mathrm{\text {m}}}, {21\,\mathrm{\text {m}}}\right\} \), and \(n_{\textrm{c}} \in \left\{ 1,2,3,4,5,6\right\} \) cooperation vehicles were present in the simulation. Each of these 18 combinations was simulated 200 times, resulting in 3600 simulations in total. In each run, the distances to the intersection of all vehicles, their initial velocities, and their turning patterns were set randomly within a feasible range. The special behavior and the waiting durations were also set randomly.
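The experiment grid can be enumerated as follows. The per-run seed, which would drive the random initial distances, velocities, turning patterns and special behaviors, is our own addition for reproducibility and is not described in the text; the function name is hypothetical.

```python
import itertools
import random

VISIBILITY_DISTANCES = (7.0, 14.0, 21.0)  # d_v (m)
N_COOPERATION = range(1, 7)               # n_c = 1..6
RUNS_PER_COMBINATION = 200


def experiment_plan(seed: int = 0):
    """Yield (d_v, n_c, run_seed) for every simulation run of the grid."""
    rng = random.Random(seed)
    for d_v, n_c in itertools.product(VISIBILITY_DISTANCES, N_COOPERATION):
        for _ in range(RUNS_PER_COMBINATION):
            yield d_v, n_c, rng.randrange(2 ** 32)
```

The grid has 3 × 6 = 18 combinations, so 200 runs each give the 3600 simulations reported above.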

None of the simulations resulted in a collision. Note, however, that two C-Vs could in principle restart simultaneously after a deadlock; as their decision to drive is not revised, this would result in a collision between them. Such a run could safely be disregarded for the evaluation, as we are only interested in the performance of the A-V’s algorithm. For each run we also measured the time \(t_{\textrm{d}}\) needed to drive through the intersection (the time during which the A-V was within \({30\,\mathrm{\text {m}}} > d_{\textrm{s}}^{\textrm{a}} \ge 0 \,{\text {m}}\)). Averaging over all runs with the same visibility distance yields the following mean durations and standard deviations: \(t_{\textrm{d}}\left( d_{\textrm{v}} = {7\,\mathrm{\text {m}}}\right) = {12.10\,\mathrm{\text {s}}}\) (\(\sigma = {6.03\,\mathrm{\text {s}}}\)), \(t_{\textrm{d}}\left( d_{\textrm{v}} = {14\,\mathrm{\text {m}}}\right) = {12.14\,\mathrm{\text {s}}}\) (\(\sigma = {6.18\,\mathrm{\text {s}}}\)) and \(t_{\textrm{d}}\left( d_{\textrm{v}} = {21\,\mathrm{\text {m}}}\right) = {12.16\,\mathrm{\text {s}}}\) (\(\sigma = {6.23\,\mathrm{\text {s}}}\)). As these values are very similar, we did not analyze the results separately for each visibility distance. Table 5 shows the time to drive through the intersection averaged over all runs with the same number of P-Vs and Y-Vs. These results have to be interpreted with caution, as some aspects are not considered; e.g. a leading vehicle that has to wait can increase the duration even though the A-V itself would not have had to stop. Also, there are only a few runs with more than three vehicles of a kind, so these averages are less reliable. Nonetheless, the results indicate that the algorithm makes reasonable decisions: the average time to pass the intersection increases with the number of cooperation vehicles, and the increase is more pronounced for P-Vs than for Y-Vs. This is to be expected, as the A-V has to yield to P-Vs, whereas in the interaction with Y-Vs it should have to wait less often.

Table 5 Average time to clear the intersection by the number of P-Vs and Y-Vs for all visibility distances

5 Conclusion

The results from Sect. 3 show that the driving behavior of human drivers depends on the intersection. We can thus predict the driving behavior using features that describe the intersection itself, its surroundings and the traffic there. As these features can be considered a description of an intersection’s complexity, one can conclude that the complexity of an intersection influences driving behavior. We further show that it is possible to predict the driving behavior using only a subset of the most relevant features. In future work we intend to directly ask human participants to rate the complexity of such situations. With that we hope to find a relationship between the perceived complexity and the resulting behavior.

In Sect. 4 we further present a decision making algorithm that is able to reliably drive through an unsignalized T-intersection while interacting with other drivers. We validate our proposed algorithm with a simulation and the results indicate a reliable performance. Future work on this topic will include variants of this algorithm for further scenarios such as X-intersections, roundabouts or narrow passages. We further intend to run the algorithm on real world maps.