1 Introduction

Road traffic rules should ensure efficient and, above all, safe traffic. In doing so, the participants in road traffic are taken into account with their capabilities. Automated vehicles have much greater potential than manually operated vehicles concerning the exchange and utilization of data so that more suitable solutions can replace static traffic rules on a situational basis. The cooperative behavior of automated cars at the maneuver level can contribute significantly to this. Joint maneuvers can, for example, increase efficiency and comfort in road traffic. For instance, explicitly coordinated cooperative behavior allows vehicles to keep shorter safety distances than human drivers or to give way in conflicting traffic situations, such as changing lanes, entering roundabouts, or at intersections. In summary, cooperative behavior at the maneuver level based on explicit communication enables the optimization of vehicle movements concerning shared objectives, whereas, without this cooperative behavior, vehicles act only based on their own goals.

Another possible use of cooperative maneuver execution addresses emergency situations. Unforeseen events may disrupt the planned movement of a vehicle and require a change in the preconditions for trajectory planning to achieve or maintain a safe state. This is often neither dangerous nor uncomfortable because other road users act considerately and do not force other participants to make last-minute changes in their motion planning, even if just out of self-interest. However, there are situations where prompt response is required to prevent or mitigate collisions. For example, the door of a car parked at the side of the road may suddenly be torn open and protrude into the planned path of the vehicle. Likewise, pedestrians or bicyclists may unexpectedly block the path of travel, for example, by suddenly changing the direction and speed of travel without correctly being predicted by the automated vehicle.

Fig. 1
An illustration of 2 vehicles on two lanes and a pedestrian stepping accidentally on the lane and the vehicle shifting its path from one lane to the other.

Exemplary depiction of an emergency situation. The depicted pedestrian steps unexpectedly and irregularly into the lane, forcing the approaching automated vehicle to adapt its plan. The illustration indicates a cooperative lane change in response to the event

Figure 1 shows an exemplary traffic situation in which an immediate reaction of the automated vehicle is required. There, an automated vehicle approaches a pedestrian who suddenly appears in the right of two parallel lanes leading in the same direction. Depending on the time and location of the obstacle’s appearance and the speed of the approaching vehicle, a specific braking rate must be attained to avoid a collision with the obstacle without changing lanes. There may be constellations in which a lane change is more favorable in terms of an associated cost function than a pure braking maneuver, e.g., when a high braking rate is required without a lane change, or when a collision cannot be avoided without changing lanes because the braking rate is physically limited. To execute a lane change, a suitable gap is required in the adjacent lane so that the lane change does not create a risk of collision. If an appropriate gap is available, it can be used by the swerving vehicle to resolve the situation. However, if this is not the case, cooperative behavior of vehicles in the target lane would be desirable so that the vehicle can handle the emergency as smoothly as possible. Because of the dynamic nature of the situation, achieving this goal requires a quick agreement among the vehicles involved. Thus, a joint maneuver could increase road safety in safety-critical cases by allowing a coordinated, targeted response without the uncertainty and delay of implicit human communication.

Another critical point of cooperative maneuver-level behavior is the decision-making of an automated vehicle. The decisions that an automated vehicle has to make in road traffic range from very simple to complex. Examples include starting to move when a traffic light has just switched to green, selecting a cruising speed, choosing a distance to the vehicle in front, when to change lanes, and selecting a suitable gap for a lane change or crossing an intersection. Complexities are added by the traffic dynamics, differing or even unknown goals of road users, and their interactions. Reinforcement learning, a subcategory of machine learning, is particularly suitable for problems where it is relatively easy to evaluate the outcome of a decision, but engineering an algorithm to solve a given task is very complex or too time-consuming.

Up to this point, three essential aspects of Cooperative Automated Driving have been outlined. The research conducted in the CoInCiDE project on these three aspects is expounded in this chapter. Sect. 2 presents research on a foundational universal cooperation methodology based on explicit vehicle-to-vehicle communication (V2V). In the following Sect. 3, research on the further development of the method with regard to emergency situations is presented. Sect. 4 contains the research results on reinforcement learning methods for cooperative maneuver-level decision-making. Last, this chapter is concluded in Sect. 5.

2 Framework of Explicitly Negotiated Maneuver Cooperation via V2V

While human drivers on the road must rely mainly on implicit communication and communication methods that can rarely be interpreted beyond doubt, automated vehicles can easily exchange data via explicit communication. This enables explicit agreements between vehicles regarding joint maneuvers to be executed. This section presents a method, the Space-Time Reservation Procedure (STRP), based on the work already published on this topic [14, 15, 24, 25].

2.1 Related Work

Several approaches for the coordination and cooperation of automated vehicles based on explicit data exchange have already been documented in the literature. In addition, several message types that support cooperative driving functions are defined or under development, and some of them have already been standardized or are in the process of being standardized. The Cooperative Awareness Message (CAM) is already standardized and contains basic information such as the position and velocity of the sender [6]. The likewise standardized Decentralized Environmental Notification Message (DENM) can be used for exchanging data on particular danger spots in the road network [7]. It contains information about the type and position of the area to be described. Collective Perception Messages (CPM) can be used to share information about obstacles and other road users detected by the sensors of the originating system [8]. This message type is also already standardized.

In many cases, specific, frequently occurring traffic situations are considered. One example is the change between parallel lanes. An approach is to equalize the speeds of the vehicles involved in the lane change to enable the maneuver [22]. This method is adapted from a technique for cooperation at intersections [28]. Another method for cooperative lane changes on highways achieves safe lane change maneuvers based on a minimum safety spacing model (MSS), even in complex situations [34]. The method performs trajectory planning based on the distances at different points in time between the involved vehicles calculated by the MSS.

The Maneuver Coordination Message (MCM) that is currently under standardization [5] allows the exchange of trajectories. Based on this, an approach in which vehicles continuously publish their currently planned trajectory is presented in [19]. In addition to the currently planned trajectory, a trajectory can be broadcast that is marked as desired and conflicts with the plans of other road users. Other vehicles can adjust their planned trajectory so that it no longer conflicts with other road users’ desired trajectory. The desired trajectory can be executed once all trajectory conflicts are resolved. This method can be extended by a coordination protocol [35] which allows vehicles to form cooperative groups. A similar method that also relies on MCM and the continuous exchange of trajectories is presented in [21]. In this method, other trajectories in addition to the reference trajectory are sent that can be either more favorable for the sending vehicle or advantageous for other vehicles but to the disadvantage of the sending vehicle. Cooperation is achieved by evaluating the received trajectories and adjusting the reference trajectory.

A co-simulation framework for evaluating and testing cooperative driving functions is presented in [20]. The framework couples a vehicle dynamics simulation and a traffic flow simulation. It contains a machine learning module to generate and evaluate test scenarios. These three components together allow for extracting scenes from the traffic flow simulation, automatically testing them using the vehicle dynamics simulation, and evaluating the cooperative driving functions.

The Space-Time Reservation Procedure (STRP) is another approach to achieve cooperative maneuver-level behavior of automated vehicles [15]. The method is based on a structured negotiation about reservations of road space for agreeing on binding cooperative maneuvers. This approach has also been tested for more than two participating vehicles [24] and in test drives with two automated research vehicles [14]. Moreover, its universality with respect to covering all traffic scenarios has been investigated [25]. In the following, this method is presented in detail.

2.2 Definition of a Cooperative Maneuver

The foundation of the STRP is a set of rules for explicitly defining joint maneuvers. These rules avoid misunderstandings and allow the details of coordinated maneuvers to be described precisely. The reservation templates described in Sect. 2.3 are specifically adapted for different types of joint maneuvers. In this section, attributes that are used for all templates are explained. The method is based on reserving temporally and spatially limited traffic space for the exclusive use of one automated vehicle. A data set represents the restriction of the traffic space reserved during a cooperative maneuver. First, this includes information for uniquely identifying the lane containing the reservation area. This is covered by two points \(P_0\) and \(P_1\) connected by the lane to be identified. Both points are described by their longitude, latitude, and elevation coordinates. The reservation area is longitudinally bounded by the length values \(s_0\) and \(s_1\). Both values refer to the point \(P_0\) and determine the exact start and end of the reservation lengthwise. In the lateral direction, the reservation area is predetermined by the lane width. Therefore, the reservation area is spatially unambiguously defined. A time interval \([t_0,t_1]\) specifies the time limit within which the reserving vehicle must start to enter the reservation area. Otherwise, the cooperative maneuver becomes invalid. The reservation templates extend this basic definition as needed for specific traffic situations.
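The common attributes listed above can be collected in a small data structure. The following Python sketch is only an illustration under stated assumptions: the class and field names (GeoPoint, Reservation, entry_allowed_at) are hypothetical and do not reflect the actual message format or the ADORe implementation, and the units (meters, seconds, degrees) are assumed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GeoPoint:
    """A point given by longitude, latitude (degrees, assumed) and elevation (m, assumed)."""
    longitude: float
    latitude: float
    elevation: float


@dataclass(frozen=True)
class Reservation:
    """Basic reservation definition shared by all templates (hypothetical layout)."""
    p0: GeoPoint  # first point on the lane; s0 and s1 are measured from here
    p1: GeoPoint  # second point, connected to p0 by the lane to be identified
    s0: float     # longitudinal start of the reservation area
    s1: float     # longitudinal end of the reservation area
    t0: float     # earliest time at which entering may start
    t1: float     # latest time at which entering may start

    def __post_init__(self):
        # Reject definitions that would not describe a valid area or interval.
        if not self.s0 < self.s1:
            raise ValueError("s0 must be smaller than s1")
        if not self.t0 <= self.t1:
            raise ValueError("t0 must not be later than t1")

    @property
    def length(self) -> float:
        """Longitudinal extent of the reservation area."""
        return self.s1 - self.s0

    def entry_allowed_at(self, t: float) -> bool:
        """True if starting to enter at time t lies within [t0, t1]."""
        return self.t0 <= t <= self.t1
```

The lateral extent is not stored because, as described above, it follows from the lane width.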

With this set of rules for explicitly defining reservation areas for cooperative maneuvers, a schematic negotiation process between road users can take place. A vehicle can use a definition of a reservation area to request cooperative behavior from other road users via vehicle-to-vehicle communication. To do this, a request message containing the reservation definition is broadcast. All receiving vehicles can then evaluate the request based on the requested reservation and ignore, reject, or accept it depending on their own goals. The evaluation of the responses is done solely by the requesting vehicle. It can execute the intended maneuver if the vehicles required for the coordinated maneuver have agreed to collaborate. Because of physical limits or incompatible objectives, a vehicle might not send an acceptance message. In this case, the requesting vehicle can cancel the reservation using an abort message so that other participants do not avoid the reservation area unnecessarily. If a vehicle has agreed to a reservation, the agreement is binding. The vehicle must then avoid the area according to the reservation definition, provided that the reserving vehicle starts to enter the reservation area within the time interval \([t_0,t_1]\).
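The decision taken by the requesting vehicle once responses arrive can be sketched as a small function. This is a minimal illustration, not the published protocol: the names Response, Decision, and evaluate_responses are hypothetical, and the response deadline is an assumption introduced here to model vehicles that ignore the request.

```python
from enum import Enum


class Response(Enum):
    ACCEPT = "accept"
    REJECT = "reject"


class Decision(Enum):
    EXECUTE = "execute"  # all required vehicles agreed; the maneuver may start
    ABORT = "abort"      # send an abort message to release the reservation
    WAIT = "wait"        # keep waiting for outstanding responses


def evaluate_responses(required_ids, responses, deadline_passed):
    """Evaluate responses on the requesting vehicle (hypothetical helper).

    required_ids: ids of vehicles whose agreement is needed for the maneuver.
    responses: dict mapping vehicle id to Response; vehicles that ignored the
        request are simply absent.
    deadline_passed: assumed timeout flag, not part of the source description.
    """
    # A single rejection from a required vehicle makes the maneuver impossible.
    if any(responses.get(v) is Response.REJECT for v in required_ids):
        return Decision.ABORT
    # The maneuver may be executed only if every required vehicle accepted.
    if all(responses.get(v) is Response.ACCEPT for v in required_ids):
        return Decision.EXECUTE
    # Some required vehicles have not answered yet.
    return Decision.ABORT if deadline_passed else Decision.WAIT
```

Responses from vehicles that are not required for the maneuver do not influence the result, which mirrors the statement that only the agreement of the vehicles needed for the coordinated maneuver matters.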

2.3 Reservation Templates

In order to make the method universally applicable for standard driving maneuvers occurring on regular streets, three patterns for reservation are defined. These differ, e.g., in terms of additional data that is transmitted and the end of a cooperative maneuver. The first template covers a vehicle’s intention to change from a parallel lane to the lane containing the reservation area, e.g., a standard lane change. The second template covers cases in which vehicles leave the original lane, use another lane for a limited distance, and change back to the initial lane afterward. This can be used, for example, to drive around a traffic obstruction in the presence of oncoming traffic. The third template defines a reservation area located on the original lane of the vehicle. This template is suitable, for example, at intersection crossings for cooperation with cross-traffic. A more detailed presentation of the reservation templates can be found in [25].

2.3.1 Lane Change

To keep the length of the reservation area as short as possible and still allow the lane-changing vehicle a certain tolerance, an additional parameter \(v\) defines a speed at which the boundaries of the reservation area specified by \(s_0\) and \(s_1\) move along the direction of the road from time \(t_0\) onward. Furthermore, joint maneuvers agreed upon based on this reservation template end with their activation. That means the cooperative maneuver ends as soon as the reserving vehicle begins to enter the reservation area in the interval \([t_0,t_1]\). After that, the vehicles involved continue their journey individually.

Fig. 2
2 illustrations. A, depicts a graph with a dark-shaded area, a hatched area around the dark-shaded area, and an unshaded area. The illustration below depicts the vehicle on a lane and a change of path and a vertical dotted line is depicted from the dark shaded area of the graph to the point of the change of path of the vehicle.

Reservation shape for lane change: The s-t-diagram shows three different areas for the target lane. The hatched area must not be used by the reserving vehicle, the dark green area must be used for starting to enter the target lane, and the white area can be used after; \(\tau \) is an exemplary path of the vehicle, adapted from [25]

The sequence of a cooperative maneuver with this reservation template is shown in Fig. 2. At the bottom of the figure, the two lanes are sketched, and the points \(P_0\) and \(P_1\), as well as the distances \(s_0\) and \(s_1\), are drawn in so that the reservation area marked in green is spatially clearly delimited. The upper part of the figure shows an s-t diagram that contains the distances \(s_0\) and \(s_1\), the time interval \([t_0,t_1]\), and an exemplary path \(\tau \) on the target lane. Furthermore, the diagram shows three different areas. The hatched area indicates the longitudinal positions and times where the vehicle must not be on the target lane. This is the case before \(t_0\) and spatially before the lower limit of the reservation determined by \(s_0\). The area in which the vehicle must begin to enter the reservation area is shown in dark green. Within the time interval, the spatial boundaries move with velocity \(v\) so that this area forms a parallelogram in the diagram. The white area in the diagram marks time and space intervals on the target lane that may be used after the vehicle has activated the cooperative maneuver by entering the reservation area within the dark green area.
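The parallelogram-shaped entry region of the lane-change template can be expressed as a simple check. The sketch below is an illustration under assumptions: the function names are hypothetical, the vehicle is reduced to a single longitudinal reference position s measured from \(P_0\), and consistent units (meters, seconds) are assumed.

```python
def entry_window(s0, s1, t0, v, t):
    """Longitudinal bounds of the reservation area at time t.

    The boundaries start at s0 and s1 at time t0 and move along the road
    with speed v, as described for the lane-change template.
    """
    shift = v * (t - t0)
    return s0 + shift, s1 + shift


def may_start_entry(s, t, s0, s1, t0, t1, v):
    """True if starting to enter at position s and time t activates the reservation.

    Corresponds to the point (t, s) lying inside the parallelogram in the
    s-t diagram; s is an assumed single reference point of the vehicle.
    """
    if not t0 <= t <= t1:
        return False
    lower, upper = entry_window(s0, s1, t0, v, t)
    return lower <= s <= upper
```

For example, with \(s_0=10\), \(s_1=30\), \(t_0=0\), \(t_1=4\), and \(v=5\), the window at \(t=2\) spans 20 to 40, so an entry at \(s=25\) is valid while an entry at \(s=15\) is not.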

2.3.2 Evasion With Oncoming Traffic

This reservation template allows the requesting vehicle to avoid an obstacle by using another lane, e.g., the lane of the oncoming traffic. The vehicle must start entering the reservation area within the interval \([t_0,t_1]\). The defined reservation area is spatially static and must be left before reaching the upper longitudinal limit defined by \(s_1\). The maneuver ends as soon as the vehicle has left the reservation area; there is no predefined end time. Figure 3a is analogous to Fig. 2. The s-t diagram refers to the target lane. The dark green area indicates when and where the vehicle must enter the reservation. The hatched areas must not be entered at all within the target lane. The fading green color indicates the unlimited temporal validity of the reservation. A possible path \(\tau \) of the vehicle on the target lane is drawn in black.
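The rules for this spatially static template can be checked against a sampled path on the target lane. The following sketch is only an illustration: the function name and the representation of the path \(\tau \) as time-ordered (t, s) samples taken while the vehicle occupies the target lane are assumptions made here.

```python
def static_reservation_respected(samples, s0, s1, t0, t1):
    """Check a sampled path against a spatially static reservation.

    samples: time-ordered list of (t, s) pairs recorded while the vehicle
        is on the target lane (assumed representation of the path tau).
    Returns True if entering started within [t0, t1] and the vehicle stayed
    between s0 and s1 for as long as it used the target lane.
    """
    if not samples:
        return False  # the reservation was never activated
    t_entry, _ = samples[0]
    if not t0 <= t_entry <= t1:
        return False  # entry started outside the agreed time interval
    # No end time is checked: the reservation stays valid until the area is left.
    return all(s0 <= s <= s1 for _, s in samples)
```

A path that enters at \(t=1\) and leaves the target lane at \(t=6\) is accepted for \([t_0,t_1]=[0,2]\), since only the start of the entry is bounded in time.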

Fig. 3
4 illustrations. The first illustration depicts a s-t graph and the illustration below depicts reservation for evasion. The graph depicts a rectangular area with a dark shade and a light shade and an inclined line in it labeled tau with a hatched area surrounding it. The dotted line is brought down to the evasion path of the vehicle in the illustration below. B, also depicts an s t graph but a small rectangular portion and a larger hatched portion and a smaller hatched portion with an illustration below, and others.

Reservation shapes for evasion and lane keeping, adapted from [25]

2.3.3 Lane Keeping

This reservation template is suitable, e.g., for crossing an intersection. In this case, the reservation area is also spatially static and unrestricted in time. The start of the entry must lie in the interval \([t_0,t_1]\). Figure 3b shows this area in dark green in the s-t diagram. After that, the reservation is valid for an unlimited time until the reservation area has been left.

2.4 Simulations and Driving Experiments

Several experiments were conducted in simulation and using two automated research vehicles to analyze the method in more detail. Eclipse ADORe [13] is used to run the research vehicles and the simulation. For more information, please refer to this source. The research vehicles are equipped with hardware for vehicle-to-vehicle communication, special sensors, and other devices to operate the automation. An accurate map of an actual urban intersection in Braunschweig, Germany, is used for the experiments. The map was shifted accordingly to perform the driving experiments on a test site.

Fig. 4
A curved four-lane road with vehicles and a rectangular reservation area with times t D, t C, t B, and t A depicted. Curved lanes are also represented at the top and at the bottom.

Simulation of a cooperative lane change: The reservation area depicted in green is requested by the lane changing vehicle (blue), adapted from [25]

2.4.1 Simulation: Lane Change

In the simulation, two automated vehicles start about 200 m away from a merging area. Coordination is required to drive through the area as efficiently as possible. The simulation results are shown in Fig. 4. Two vehicles are plotted at four consecutive time instants with \(t_A<t_B<t_C<t_D\). At the earliest time, \(t_A\), both vehicles approach the merging area in parallel lanes without coordination. Just before time \(t_B\), the reservation area marked in green is requested by the lane-changing vehicle, depicted in blue. The lane-following vehicle shown in red has evaluated this and agreed to the request. At time \(t_C\), the reservation area is just being activated by the lane-changing vehicle entering it. Thus, the cooperative maneuver is finished, and both vehicles continue independently on the now single-lane road. As a result, the method is shown to coordinate the situation appropriately. The lane-following vehicle brakes slightly, and the lane-changing vehicle drives through the area without braking.

Fig. 5
A curved four-lane road with two vehicles, a left turn and a right turn on the road and the left turning vehicle is depicted in a dark shade, a curved reservation area for the left turning vehicle is highlighted.

Driving experiment with three vehicles at an intersection, adapted from [25]

2.4.2 Driving Test: Three Vehicles at an Intersection

Since only two automated vehicles were available for the driving experiments, one of the three vehicles was simulated. Figure 5 shows the situation during the cooperative maneuver. The left-turning vehicle, shown in red, and the vehicle driving straight ahead, shown in blue, are the two physical vehicles. The third vehicle (green) is simulated. While approaching the intersection, the left-turning vehicle had requested the shown reservation area, and the other two conflicting vehicles had agreed to the maneuver. As a result, the left-turning vehicle can pass the intersection unimpeded. In contrast, the other two vehicles reduce speed to the required extent until the left-turning vehicle has cleared the respective lane. Although the usefulness of this experiment in terms of traffic efficiency is not apparent at first glance, there are situations in which such a cooperative maneuver is beneficial. For example, such cooperation can allow the automated vehicle to turn safely in heavy traffic, possibly including mixed traffic. Furthermore, it can enable fast and reliable priority for emergency vehicles.

2.5 Conclusion

Cooperation at the maneuver level between road users can contribute to the efficient use of road space. The presented approach uses vehicle-to-vehicle communication and a method designed for explicit negotiation and agreement of cooperative maneuvers. Various reservation templates establish the universal applicability of the technique. These templates are not limited to the traffic situations discussed in this section but may also be used for other conflicts between road users. The driving experiments and simulations conducted to research and improve the method show that it is suitable to ensure coordination in the studied situations. Furthermore, by design, the technique is inherently safe against message loss and suitable for mixed traffic scenarios. Its decentralized architecture allows flexible use at any place. The reader is referred to the publications [14, 15, 24, 25] for a deeper look at this method and more results of many simulations and driving experiments in various traffic situations.

3 Cooperation in Emergency Situations

This section discusses research on adapting the cooperation method presented in Sect. 2 to emergency situations. The effectiveness of the method to coordinate maneuvers of automated vehicles in emergency situations is evaluated by both simulations and driving tests.

3.1 Related Work

The related work regarding vehicle-to-vehicle communication-based cooperation of automated vehicles given in Sect. 2.1 is relevant here, too. In addition, a few publications concerning emergency maneuvers shall be presented here.

The authors of [16] propose a method for guaranteeing safety based on verifying the planned trajectory while the vehicle is in motion. The core of the approach is a two-step evasive strategy based on a discrete decision for an evasive maneuver and the computation of an appropriate low-level control to follow this maneuver. The method was validated in simulation.

An approach for lateral control in evasive maneuvers is proposed in [4]. The method, based on sliding mode control, calculates a steering angle taking into account, among other factors, the tire slip saturation. Simulations show that lane changes are possible within 1.1 s at speeds of up to 130 km/h under certain circumstances. Another proposal takes into account the dynamics of the steering system during evasive maneuvers [27]. The model predictive control in this publication contains two models: in addition to the vehicle model, a steering model is included.

A parameterization of a geometric path for an evasive maneuver based on reinforcement learning is proposed in [9]. The path consisting of straight lines and clothoids is then executed by means of a model predictive control loop. Simulations of a common emergency situation show that the method significantly outperforms human drivers.

3.2 Approach

The basic framework of the cooperation and negotiation method has already been stated in Sect. 2. This approach is adapted to the particular requirements in emergencies. Negotiating the cooperative maneuver in the shortest possible time without avoidable delay is of the utmost importance in emergency situations. This is because these situations are highly dynamic, and any delay reduces the ability to respond to the situation. For example, evasive maneuvers may become impossible because of the intermediate progress of the surrounding traffic. Therefore, negotiation is started immediately after a cause for an emergency response is detected. Due to the safety-critical nature of emergencies, cooperative maneuver requests are of higher priority than other requests. The receiving vehicles can consider that during the evaluation of the request.
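One way a receiving vehicle could take the higher priority of emergency requests into account is to process pending requests in priority order. The sketch below is purely illustrative: the class RequestQueue, the boolean emergency flag, and the two priority levels are assumptions made here, since the text does not specify how receivers handle priorities.

```python
import heapq
from itertools import count


class RequestQueue:
    """Pending cooperation requests, emergency requests first (hypothetical)."""

    def __init__(self):
        self._heap = []
        self._order = count()  # tie-breaker that keeps arrival order

    def push(self, request_id, emergency):
        # Lower number means higher priority; 0 is reserved for emergencies.
        priority = 0 if emergency else 1
        heapq.heappush(self._heap, (priority, next(self._order), request_id))

    def pop(self):
        """Return the id of the next request to evaluate."""
        return heapq.heappop(self._heap)[2]

    def __len__(self):
        return len(self._heap)
```

With this ordering, an emergency request that arrives after several regular requests is still evaluated first, which avoids adding delay to the time-critical negotiation.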

3.3 Simulations and Driving Experiments

To verify and investigate the method, simulations and driving experiments are conducted. The basis in each instance is the traffic situation shown in Fig. 1. The parameters, such as speeds and distances, vary in the different runs. A simulation run and a driving experiment are presented below with their evaluation and results.

3.3.1 Simulation

At time \(t=0\), the obstacle appears in the lane of the lane changing vehicle (lc-vehicle). At this point in time, the lc-vehicle approaches the obstacle at a speed of 27.73 m/s, and the lane following vehicle (lf-vehicle) in the adjacent lane to the left is driving at a speed of 27.16 m/s. The distances measured along the road from the front bumpers of the vehicles to the position of the obstacle are 43.16 m for the lf-vehicle and 38.96 m for the lc-vehicle. In this scenario, the obstacle does not block the entire width of the right lane. The blocking is limited to the right side of the lane so that only the outer 50% of the width is blocked at the longitudinal position \(d=0\). The physically maximum possible braking rate of the lc-vehicle is assumed to be 9.81 m/s\(^2\). Even with hypothetical constant deceleration at this rate, a collision would occur between the obstacle and the lc-vehicle since the braking distance exceeds the distance to the obstacle. Therefore, the lc-vehicle immediately starts negotiating a cooperative maneuver and requests a reservation area just before the obstacle. The lf-vehicle in the target lane accepts the request and brakes to allow the requesting vehicle to change lanes.
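The statement that braking alone cannot prevent the collision can be reproduced with the standard constant-deceleration stopping distance \(v^2/(2a)\). The short sketch below uses the values given above; the function and variable names are chosen here for illustration, and reaction and actuation delays are neglected, which makes the estimate optimistic.

```python
def braking_distance(speed, deceleration):
    """Stopping distance in m for constant deceleration (speed in m/s, deceleration in m/s^2)."""
    return speed ** 2 / (2.0 * deceleration)


LC_SPEED = 27.73         # m/s, speed of the lc-vehicle at t = 0
MAX_DECELERATION = 9.81  # m/s^2, assumed physical braking limit
OBSTACLE_DISTANCE = 38.96  # m, front bumper of the lc-vehicle to the obstacle

stopping = braking_distance(LC_SPEED, MAX_DECELERATION)
collision_unavoidable_by_braking = stopping > OBSTACLE_DISTANCE

print(f"stopping distance: {stopping:.2f} m")  # about 39.19 m
print(f"braking alone insufficient: {collision_unavoidable_by_braking}")
```

The stopping distance of roughly 39.19 m exceeds the available 38.96 m by a small margin, so even ideal full braking without any delay would not avoid the obstacle.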

Fig. 6
A multiple-line graph of distance along the road to the obstacle in meters versus time in seconds for the lf-vehicle and the lc-vehicle. The lf-vehicle depicts a decrease from (0, 50) to (3, 25) and the lc-vehicle depicts a decrease from (0, 47) to (3, negative 47). All points are approximated.

Simulation: Distances between the front and rear bumpers of the lf-vehicle (red) and the lc-vehicle (blue) and the obstacle located at \(d=0\), with \(t=0\) being the point in time of the obstacle appearance; the distances are measured along the lane. The hatching patterns indicate the lateral area used by the vehicle: Horizontal single hatching indicates the use of the right half of the right lane, and inclined single hatching indicates the use of the left lane

Figure 6 shows the positions of the two vehicles in the distance-time diagram. Time \(t=0\) corresponds to the point in time of the obstacle appearance. The obstacle is longitudinally located at \(d=0\). For each vehicle, two curves connected by hatching are plotted. The two curves correspond to the longitudinal positions of the front and rear bumpers of the respective vehicle. The hatching patterns indicate which lateral zone the vehicles use at the respective time. The inclined single hatching corresponds to the left lane, which is unaffected by the obstacle at \(d=0\). The horizontal single hatching indicates the usage of the lateral zone blocked from \(d=0\), i.e., the right 50% of the right lane. Directly after the appearance of the obstacle, the lc-vehicle requests a reservation, which is evaluated and accepted by the lf-vehicle. A gray box in the diagram depicts the reservation area. The lf-vehicle brakes sharply to respect the reserved area for the emergency lane change. Within the reserved interval in time and longitudinal position, the lc-vehicle leaves the blocked part of the right lane and changes towards the left lane. The cross-hatching in the diagram indicates the short period in which the lc-vehicle uses both the left lane and the blocked part of the right lane. Figure 7 shows the development of the velocities of both vehicles during the scenario. While the lf-vehicle brakes and reduces its speed by approx. 7 m/s to assist the emergency evasion of the lc-vehicle, the latter reduces its velocity only marginally.

Fig. 7
A multiple-line graph of velocity versus time for the lc-vehicle and the lf-vehicle. The lc-vehicle depicts a decrease from (0, 27.8) to (3, 27.9) with a dip in between. The lf-vehicle depicts a decrease from (0, 27) to (1.5, 20) and a small rise to (3, 21). All points are approximated.

Simulation: Velocities of lc-vehicle (blue) and lf-vehicle (red) during the simulation. \(t=0\) is the point in time when the obstacle appears

Fig. 8
A photograph of two vehicles on a lane, one of them changing the lane with hills in the background.

Automated research vehicles VIEWCar II (left) and FASCar E during the demonstration of a cooperative emergency lane change at the IEEE Intelligent Vehicles Symposium 2022

3.3.2 Driving Experiment

In addition to the simulations, physical tests were performed with automated research vehicles. These tests of the method were demonstrated at the IEEE Intelligent Vehicles Symposium 2022 in Aachen, Germany. Figure 8 shows the two automated research vehicles on the site during the driving demonstration. The results of the tests are presented in the following.

The two research vehicles, FASCar E and VIEWCar II, were used for the driving experiments and demonstrations. These vehicles are provided with the necessary hardware for communication via ITS-G5. The software framework for vehicle automation ADORe [13], further developed in the CoInCiDE project, is used in both vehicles for these tests. Currently, the maximum deceleration set by the automated research vehicles is limited to 3 m/s\(^2\). This limitation is due to the vehicle interface and cannot be influenced by the automation software. To account for that limitation, the driving test distances are larger than those used in the simulation. In this way, meaningful results can be obtained despite the restriction.

The initial situation of the scenario is again two automated vehicles traveling in the same direction on adjacent lanes. The point in time of the virtual obstacle appearance is defined as \(t=0\), and its longitudinal position is \(d=0\). In this scenario, the entire width of the right lane is blocked by the obstacle, so the vehicle must have left the lane entirely before passing this location. At time \(t=0\), the lc-vehicle in the right, blocked lane approaches the obstacle at a speed of 13.65 m/s at a distance of 73.86 m. At this time, the lf-vehicle in the adjacent lane travels in the same direction at 13.45 m/s at a distance of 81.85 m measured along the lane. Figure 9 shows the distances analogously to the evaluation in Sect. 3.3.1. The longitudinal distances from the front and rear bumpers to the obstacle are plotted for both vehicles. The hatching again gives information about the lateral position of the vehicles. Here, the inclined line hatching corresponds to the use of the unblocked left lane, and the horizontal line hatching indicates the use of the right lane, which is blocked from \(d=0\). The cross-hatching represents areas where both lanes are used at the same time.

Immediately after the virtual obstacle appears, the lc-vehicle requests a reservation area in the target lane so the obstacle can be passed without braking. After evaluating this emergency request, the lf-vehicle sends a confirmation message. Thus, the cooperative maneuver is bindingly agreed upon. The temporarily and spatially limited reservation area is indicated by a gray box in Fig. 9. Right at the beginning of this area, the lc-vehicle activates the reservation. The cross-hatching indicates the partial use of both lanes. Before reaching the obstacle at the longitudinal position \(d=0\), the lane change is completely finished, and both vehicles drive on the left lane one after the other. Figure 10 shows the speeds of the two vehicles during the experiment. The speed profile of the lc-vehicle is almost constant. The lf-vehicle, on the other hand, brakes and thus enables the cooperative maneuver.
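The sequence of request, confirmation, and activation described above can be sketched as a small state machine. This is a hypothetical illustration only: the class, field, and method names are invented for this example and do not reflect the actual STRP message format or the ADORe implementation, and the numeric values are placeholders rather than data read from Fig. 9.

```python
from dataclasses import dataclass
from enum import Enum, auto


class ReservationState(Enum):
    REQUESTED = auto()  # emergency request sent by the lc-vehicle
    CONFIRMED = auto()  # accepted by the lf-vehicle, maneuver is binding
    ACTIVE = auto()     # lc-vehicle has started to use the reserved area


@dataclass
class Reservation:
    """Temporally and spatially limited area on the target lane (hypothetical)."""
    t_start: float  # s, earliest time of use, relative to t = 0
    t_end: float    # s, latest time of use
    d_far: float    # m, boundary farther from the obstacle
    d_near: float   # m, boundary closer to the obstacle
    state: ReservationState = ReservationState.REQUESTED

    def confirm(self) -> None:
        """Called for the lf-vehicle after it has evaluated the request."""
        if self.state is not ReservationState.REQUESTED:
            raise ValueError("only a requested reservation can be confirmed")
        self.state = ReservationState.CONFIRMED

    def activate(self, t: float) -> None:
        """Called for the lc-vehicle when it starts using the area at time t."""
        if self.state is not ReservationState.CONFIRMED:
            raise ValueError("reservation must be confirmed before activation")
        if not self.t_start <= t <= self.t_end:
            raise ValueError("activation outside the reserved time window")
        self.state = ReservationState.ACTIVE


# Placeholder values for illustration, not taken from the test drive.
reservation = Reservation(t_start=1.0, t_end=5.0, d_far=60.0, d_near=0.0)
reservation.confirm()       # lf-vehicle sends the confirmation message
reservation.activate(1.0)   # lc-vehicle activates right at the start
print(reservation.state.name)
```

The ordering constraint in `activate` mirrors the statement that the maneuver is bindingly agreed upon before the lc-vehicle makes use of the reserved space.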

Fig. 9
A multiple-line graph of distance along the road to the obstacle versus time in seconds for the lf-vehicle and the lc-vehicle. The lf-vehicle curve decreases from (0, 80) to (6, 23), and the lc-vehicle curve decreases from (0, 75) to (6, -23). All values are approximated.

Test drive: Distances between the front and rear bumpers of the lf-vehicle (red) and the lc-vehicle (blue) and the obstacle located at \(d=0\), with \(t=0\) being the point in time of the obstacle appearance; the distances are measured along the lane. The hatching patterns indicate the lateral area used by the vehicle: Horizontal single hatching indicates the use of the right lane, and inclined single hatching indicates the use of the left lane

Fig. 10
A multiple-line graph of velocity versus time for the lc-vehicle and the lf-vehicle. The lc-vehicle curve is an approximately constant line at about 14, starting at (0, 14), and the lf-vehicle curve dips to (4, 10) and then rises to (6, 12.9). All values are approximated.

Test drive: Velocities of lc-vehicle (blue) and lf-vehicle (red) during the test drive. \(t=0\) is the point in time when the obstacle appears

3.4 Conclusion

Emergencies in road traffic can hardly be avoided because of the complexity of traffic and, not least, because of human behavior. Therefore, appropriate handling of such situations is of the utmost importance for developing automated vehicles. A basic example of such a hazardous situation is the sudden and unforeseen appearance of an obstacle within the planned path of movement. The most basic method of responding to such a situation is to brake the vehicle to avoid a collision or at least reduce the impact energy as much as possible. Evasive maneuvers can be used in some instances to avoid heavy braking or even to avoid collisions. The prerequisite is that the traffic space required for swerving is not in use by other road users. The method discussed in this section aims to use vehicle-to-vehicle communication to negotiate a reservation of the space required for an evasive maneuver with conflicting road users. Hence, evasive maneuvers should be possible in more situations than before, thus avoiding heavy braking maneuvers and collisions.

The evaluation of the performed simulation and test drive shows that the method is suitable for this purpose. It was shown in the test drive that the cooperative behavior reduced the impact of the obstacle. The simulation is parameterized so that a braking maneuver within the lane cannot avert a collision. Initially, a lane change is impossible because of the blocked adjacent lane. However, the cooperative behavior negotiated using the presented method can effectively resolve the emergency situation without causing a collision. Thus, the method can prevent collisions and reduce the impact of unforeseen obstacles. To further improve cooperative emergency behavior, future research can address, e.g., pre-negotiation of emergency responses and lane sharing in emergency situations. This could prevent collisions in a wider range of situations.

4 Implicitly Cooperative Decision-Making

The research presented here builds upon prior work on cooperation of automated vehicles [25] and the use of reinforcement learning for decision-making [26]. While the latter investigated deep Q-learning for the decision-making of an automated vehicle without considering interactions between road users, this section presents a method that takes these interactions into account, based on the soft actor-critic approach [10] and the proximal policy optimization algorithm [29]. For this purpose, a multi-agent system is built, and partly cooperative objective functions are designed. A common road traffic problem is selected to demonstrate and study the methodology. Figure 11 shows a traffic situation similar to a highway entrance. Two lanes run in parallel for a limited stretch, with the right lane ending at the end of the segment and the left lane continuing as part of a road with an arbitrary number of lanes. In such traffic situations, participants with different objectives interact, communicate implicitly, and sometimes even cooperate.

4.1 Related Work

In the literature, several methods have already been documented that implement parts of the decision-making of an automated vehicle using reinforcement learning. There are a few examples where end-to-end learning approaches are employed [3, 33], with the decision-making being part of the end-to-end architecture. However, the task of automated driving is usually split into subtasks that are solved by different methods. Tram et al. [30] use deep Q-learning to adjust the speed of an automated vehicle as it passes through an intersection. The surrounding traffic, which consists of simulated manually driven vehicles, is used as input for an artificial neural network and, for comparison, a recurrent artificial neural network. As a result, the automated vehicle passes the intersection without collision in the majority of cases for both networks, with better results obtained from the recurrent network.

Wang et al. [31] consider lane changing and investigate a methodology to perform it in various situations. To do this, they model the problem with a state space consisting of road information such as curvature and width and vehicle dynamics information such as acceleration, speed, and position. Here, the reinforcement learning agent serves as the lateral controller, and the action space contains the yaw acceleration of the vehicle. The results show that the lane change controller manages the control task but lacks robustness and flexibility. The principal author later reformulated the task and published an approach for the lateral control during lane changing using deep deterministic policy gradient [32]. As a result, stable lane changing is achieved with the proposed architecture.

Kurzer et al. [18] propose a method to represent the environment in a generalized way with as few restrictions as possible. This is intended to improve the capability for generalization of the methods using this representation. To do this, the path in front of the vehicle is divided into segments and properties such as time to occupancy and time to vacancy are assigned to each segment. Together, these pieces of information form the state representation. Experiments presented in the paper show the successful abstraction of the environment representation from the concrete driving situation.

Bouton et al. [1] propose a decision-making algorithm for automated vehicles to navigate at intersections. In addition to a reinforcement learning algorithm, a model checker is used to make the decisions safe. Furthermore, perception errors are addressed with the help of a recurrent neural network. As a result, the algorithm proves to be robust and safe concerning the decisions.

For relevant examples of multi-agent reinforcement learning, reference is made to the survey by Hernandez-Leal et al. [12] and the article by Canese et al. [2]. Both references provide a literature review on multi-agent reinforcement learning.

Fig. 11
An illustration of a highway entrance with two lanes merging into a single lane, with gap identifiers marked from g1 to g5.

Overview of the map and gap identifiers; gap \(g_5\) has no longitudinal lower bound in this case, adapted from [26]

4.2 Approach

The approach involves two independent agents that follow their goals defined by a reward function in a scenario. The lane change agent (lc-agent) has to change lanes in a limited time and on a limited road section while the lane following agent (lf-agent) follows the lane the lc-agent wants to change to. By adjusting the speed, the lf-agent can let the lc-agent merge cooperatively. The agents differ in terms of the algorithm used, the state space, and the action space definitions. The specific components are described in Sects. 4.2.1 and 4.2.2.

The characteristics of the map are part of the state spaces of both agents. The map is depicted in Fig. 11, with l being the longitudinal distance between the first and last possibility to change lanes. In addition, a speed limit \(v_{\text {speedlimit}}\) is defined for the area in which the examined traffic situation occurs. Other road characteristics, e.g., curvature, are not part of the state spaces to keep the definition general. Besides the map-specific part of the environment, the traffic participants themselves are part of the environment as they interact and limit the possibilities of the other participants in the scenario.

4.2.1 Lane Change Agent

The lc-agent controls the light-colored vehicle depicted in Fig. 11 and selects the gap on the target lane depending on the observed state of the environment. The proximal policy optimization algorithm [29] is used for this purpose. The state description consists of the longitudinal boundaries of the five gaps depicted in Fig. 11, the velocities of the vehicles on the target lane, the position and velocity of the ego-vehicle, and the longitudinal boundaries of the lane change area. The action space contains the discrete gap selection. For training the agents, a reward signal is used to induce the intended properties of the agents. For this purpose, a reward function is defined that rewards high values of \(v_\text {ego}\) and penalizes the use of the original lane in each time step. A systematic parameter study has been conducted to define the exact reward functions of both agents. The lc-agent’s reward function \(R_{\text {lc}}\) is defined as follows:

$$\begin{aligned} R_{\text {lc}} = \left\{ \begin{array}{rl} -0.8+\frac{1}{14}\times v_{\text {ego}}, & \text {if lane change is not finished} \\ \frac{1}{14} \times v_{\text {ego}}, & \text {if lane change is finished} \end{array}\right. \end{aligned}$$
(1)
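As a small illustration of Eq. (1), the reward of the lc-agent can be written as a function of the ego velocity and the lane change state. The sketch below is not the authors' implementation; the function names and the helper for summing rewards over an episode are assumptions made for this example, and the velocity is assumed to be given in m/s.

```python
from typing import Iterable, Tuple


def reward_lc(v_ego: float, lane_change_finished: bool) -> float:
    """Per-step reward of the lane change agent according to Eq. (1)."""
    speed_term = v_ego / 14.0  # rewards high ego velocity (v_ego in m/s)
    if lane_change_finished:
        return speed_term
    # Constant penalty for every step still spent on the original lane.
    return speed_term - 0.8


def episode_return_lc(steps: Iterable[Tuple[float, bool]]) -> float:
    """Sum of per-step rewards for (v_ego, lane_change_finished) pairs."""
    return sum(reward_lc(v, finished) for v, finished in steps)


# Speed at which the pre-lane-change reward changes sign: 0.8 * 14 = 11.2 m/s.
break_even_speed = 0.8 * 14.0

# Worst case in this sketch: standing still on the original lane for 40 steps.
worst_case = episode_return_lc([(0.0, False)] * 40)

print(f"break-even speed before the lane change: {break_even_speed:.1f} m/s")
print(f"return of 40 steps at standstill without lane change: {worst_case:.1f}")
```

Below 11.2 m/s, every step on the original lane yields a negative reward, so the agent is pushed to complete the lane change early instead of merely driving fast on the ending lane.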

4.2.2 Lane Following Agent

For the lf-agent, the soft actor-critic algorithm [11] is used. This is an off-policy algorithm that seeks to maximize both expected reward and entropy. The state description of this agent consists of the longitudinal distances to the vehicle in front, the vehicle behind, the vehicle that attempts to perform a lane change, and the boundaries of the lane change area. Moreover, the ego velocity \(v_\text {ego}\) as well as the velocities of the lane changing vehicle and the vehicles in front and behind are part of the state representation. The continuous action space consists of a set-point velocity input to the trajectory planning. The reward function \(R_{\text {lf}}\) depends on the velocity \(v_{\text {ego}}\) and, to induce partly cooperative behavior, on the lane change state of the lc-agent:

$$\begin{aligned} R_{\text {lf}} = \left\{ \begin{array}{rl} 0.7\times v_{\text {ego}}, & \text {if lane change is not finished} \\ 1\times v_{\text {ego}}, & \text {if lane change is finished} \end{array}\right. \end{aligned}$$
(2)
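Eq. (2) can be illustrated in the same way. The following sketch is an assumption-laden example rather than the original code: the function name is invented, the velocity is assumed to be given in m/s, and the episode bound assumes that the vehicle could drive at the speed limit of 13.89 m/s in all 40 steps of an episode.

```python
def reward_lf(v_ego: float, lane_change_finished: bool) -> float:
    """Per-step reward of the lane following agent according to Eq. (2)."""
    # The factor 0.7 discounts the agent's own speed while the lc-agent
    # has not yet merged, which makes early cooperation attractive.
    factor = 1.0 if lane_change_finished else 0.7
    return factor * v_ego


V_SPEEDLIMIT = 13.89      # m/s, speed limit of the scenario
STEPS_PER_EPISODE = 40    # decision steps per episode

# Idealized upper bound if the lane change were already finished throughout.
upper_bound = STEPS_PER_EPISODE * reward_lf(V_SPEEDLIMIT, True)
# Same speed, but the lc-agent never finishes its lane change.
no_merge = STEPS_PER_EPISODE * reward_lf(V_SPEEDLIMIT, False)

print(f"idealized upper bound per episode: {upper_bound:.1f}")
print(f"return at speed limit without any merge: {no_merge:.1f}")
```

The gap between the two values indicates how much the lf-agent can gain per episode when the lc-agent merges, which is the incentive for briefly reducing speed to open a gap. Note that, unlike Eq. (1), this reward is not normalized by the velocity, so the returns of the two agents are on different scales.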

4.3 Experiment

The experiment involves training the agents in dense surrounding traffic. During the training, the policies of the agents are continuously evaluated. The scenario shown in Fig. 11 is used for the experiment.

4.3.1 Configuration

The length of the segment in which a lane change is possible is \(l=200\,\textrm{m}\). The speed limit in the scenario is set to \(v_{\text {speedlimit}}=13.89\,\text {m/s}\). Each episode consists of 40 training steps and takes 40 s of simulation time. A collision between traffic participants is impossible as the action space consists of inputs for the trajectory planning, which is inherently safe. Multiple simulations are conducted to identify proper hyperparameters. Table 1 lists the most important hyperparameters selected for the training process of the agents; these performed best in a limited parameter study. For all other parameters, the standard values given in [11, 29] are used.

Table 1 Hyperparameters

The surrounding traffic on the middle and left lanes of the road shown in Fig. 11 is simulated by SUMO [23]. This traffic consists of differently parameterized vehicles, so random and busy traffic situations arise. The individual speed limit of each vehicle is taken from a normal distribution considering but not always obeying the global limit \(v_{\text {speedlimit}}\). As a measure of the density, an emission probability of 43% for each of the middle and left lanes is specified. This value determines the probability of the emission of one vehicle each second.
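To give a feeling for the traffic density implied by an emission probability of 43% per second, the following sketch treats the emission on one lane as an independent Bernoulli trial in each second. This is a simplification made for illustration: it ignores that a traffic simulator may postpone or skip an insertion when there is no room on the lane, and all names are chosen for this example.

```python
import random
from typing import List

EMISSION_PROBABILITY = 0.43  # per lane and second, as stated in the text
EPISODE_SECONDS = 40         # duration of one episode


def sample_emissions(seconds: int, probability: float,
                     rng: random.Random) -> List[bool]:
    """One Bernoulli trial per second: True means a vehicle is emitted."""
    return [rng.random() < probability for _ in range(seconds)]


# Expected values under the independence assumption.
expected_per_hour = EMISSION_PROBABILITY * 3600
expected_per_episode = EMISSION_PROBABILITY * EPISODE_SECONDS
mean_headway_s = 1.0 / EMISSION_PROBABILITY

rng = random.Random(0)  # fixed seed for reproducibility of the sketch
emissions = sample_emissions(3600, EMISSION_PROBABILITY, rng)

print(f"expected flow per lane: {expected_per_hour:.0f} vehicles/h")
print(f"expected vehicles per lane and episode: {expected_per_episode:.1f}")
print(f"mean time between emissions: {mean_headway_s:.2f} s")
print(f"sampled vehicles in one hour: {sum(emissions)}")
```

Under this simplification, each lane receives on the order of 1,500 vehicles per hour with a mean time gap of roughly 2.3 s, which matches the description of dense surrounding traffic.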

The two agents are trained continuously during the simulation. The lc-agent starts from a standstill 150 m before the beginning of the merging lane and drives towards it. After the end of the episode, as defined above, the agent is reset to the starting position, and the next episode begins. The simulation control ensures that the lf-agent always controls a vehicle that is at a position suitable for potential cooperation during training. The vehicles of the lf-agent and the lc-agent are controlled by ADORe [13]. The interaction between the agents takes place solely implicitly through their behavior and the understanding of that behavior. Each agent executes decision-making at a frequency of \(1\,\textrm{Hz}\).

Fig. 12
A line graph of accumulated reward per episode versus training steps. It depicts an increase from (0 times 10 power 3, 150) to (30 times 10 power 3, 353) followed by a fluctuating trend and decreases to (200 times 10 power 3, 350). All values are approximated.

Reward per episode during evaluation of the lane following agent

Fig. 13
A line graph of accumulated reward per episode versus training steps. The line increases from (0 times 10 power 3, negative 30) to (50 times 10 power 3, 10) fluctuates, and decreases to (200 times 10 power 3, 10). All points are approximated.

Reward per episode during evaluation of the lane change agent

4.3.2 Results

The training was conducted for 200,000 training steps. After every 10,000 training steps, twenty episodes were executed for evaluation, and the cumulative reward of each episode was logged. Figure 12 shows the accumulated reward per episode in relation to the training progress for the lf-agent, and Fig. 13 shows the same data for the lc-agent. Initially, as training progresses, the rewards per episode of both agents increase continuously and reach their maxima. The rewards then remain approximately constant until the training is stopped after 200,000 training steps. Figure 14 shows the number of evaluation episodes with and without successful lane changes depending on the training progress. While the number of episodes without a successful lane change increases at the beginning of training, its slope flattens sharply as training progresses.
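The training and evaluation schedule can be made concrete with a short calculation. The sketch below assumes that every training episode runs for its full 40 steps and that no evaluation round takes place before training starts; neither detail is stated explicitly in the text, and the variable names are chosen for this example.

```python
TOTAL_STEPS = 200_000       # total number of training steps
EVAL_INTERVAL = 10_000      # evaluation after this many training steps
EPISODES_PER_EVAL = 20      # evaluation episodes per round
STEPS_PER_EPISODE = 40      # one decision per second, 40 s per episode

# Training steps at which an evaluation round is triggered.
eval_points = list(range(EVAL_INTERVAL, TOTAL_STEPS + 1, EVAL_INTERVAL))

training_episodes = TOTAL_STEPS // STEPS_PER_EPISODE
evaluation_episodes = len(eval_points) * EPISODES_PER_EVAL
simulated_hours = TOTAL_STEPS / 3600.0  # 1 Hz decisions: one step per second

print(f"evaluation rounds: {len(eval_points)}")
print(f"training episodes: {training_episodes}")
print(f"evaluation episodes in total: {evaluation_episodes}")
print(f"simulated training time: {simulated_hours:.1f} h")
```

With these assumptions, the training comprises 5,000 episodes and 400 evaluation episodes in total, which gives a scale for the episode counts plotted over the training progress.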

Fig. 14
A multiple-line plot of the number of episodes versus training steps for the lc-vehicle and the lf-vehicle. The lc-vehicle curve shows a step-wise increase from (0 times 10 power 3, 0) to (200 times 10 power 3, 350). The lf-vehicle curve shows a slight, initially step-wise increase followed by a constant line. All points are approximated.

Number of successful and unsuccessful episodes

4.4 Conclusion

The results of the experiment show the suitability of reinforcement learning methods for partially cooperative decision-making. The agents use the soft actor-critic and the proximal policy optimization algorithms to adapt their behavior cooperatively to the other agent and to maximize the reward per episode. Regardless of whether automated or manually driven vehicles are present, understanding the other vehicles’ behavior is essential for handling such situations efficiently. Extending the state spaces by a prediction of the vehicles in the scenario may further improve the performance.

The experiment does not consider direct vehicle-to-vehicle communication. However, many cooperation methods are based on explicit communication. The combination of these two techniques can be addressed in the next steps. Furthermore, the variety of the agents’ reward functions can be increased. More objectives can then be considered, and thus a more general applicability of the method can be achieved.

5 Conclusion

The last three sections cover three important aspects of maneuver-level cooperation of automated vehicles. First, a fundamental method for defining, negotiating, and agreeing on cooperative maneuvers is presented. The STRP is based on reserving temporarily and spatially limited traffic space for exclusive use. The driving tests with automated research vehicles and simulations show that the method is suitable for effective cooperative resolutions of conflicts on the road. The different reservation templates allow universal applicability in conflict situations occurring in traffic. In the second part, research on cooperation in emergency situations is presented. The investigated approach is based on STRP and tested both in test drives and simulations. It is shown that the method makes it possible to avoid collisions and to mitigate the impact of suddenly occurring obstacles by performing cooperative emergency maneuvers. The last part presents research on a cooperative high-level decision-making method. It is based on reinforcement learning algorithms and does not require explicit communication. Simulations show that cooperative behavior can be elicited by defining suitable objective functions for the vehicles present in a traffic scenario.

The results contribute to the achievement of safe and efficient behavior of automated vehicles in the three addressed aspects of cooperative automated driving. Based on the research presented in this chapter, the cooperative behavior of automated vehicles can be further researched. For example, an integration of STRP relying on explicit communication into the method for cooperative decision-making can be investigated. This could then be used to research an integrated approach for decision-making and explicitly negotiated cooperation of automated vehicles.