1 Introduction

When automated vehicles (AVs) first enter traffic, they will not drive in isolation but share the road with predominantly human drivers. Interacting smoothly with these drivers is therefore crucial for efficient operation. This is especially important in interactive situations where the actions of multiple vehicles are tightly coupled. For instance, a driver on a highway might decide to slow down so that another driver can merge, or a driver might start to nudge into the adjacent lane, hoping that the driver behind will slow down and open a gap.

A key aspect of mastering such scenarios with AVs is to account for interactions with human drivers. However, to reduce the computational complexity of motion planning, most state-of-the-art planners follow a structure that overlooks these mechanisms. In particular, they follow a predict-then-plan scheme, which separates motion planning into a prediction step, in which the future motion of surrounding drivers is forecast, and a subsequent planning step, in which the motion of the AV is determined. During planning, surrounding vehicles are treated as moving objects with immutable trajectories.

While this separation poses a useful simplification for many traffic scenarios, it can lead to situations similar to the frozen robot problem [34], a state in which the predictions of other traffic participants block all paths, so that the planner is no longer able to find a solution towards its goal. Fig. 1 illustrates this issue for a merge scenario. Following a predict-then-plan structure, the AV, shown in blue, first predicts the future motion of surrounding vehicles and plans its trajectory in a subsequent step. Fig. 1(a) shows the result in low traffic. Following the same principle in high traffic, shown in Fig. 1(b), the planner is unable to find a collision-free trajectory onto the highway and the AV stops at the end of the lane.

Fig. 1

Illustrated are the results of a planner following a predict-then-plan structure for a merge scenario in (a) low traffic and (b) high traffic. While in low traffic, separating prediction and planning is a useful simplification, in high traffic it can lead to suboptimal, overly conservative driving behavior of the AV

Some approaches are already able to overcome the structural limitation of planners following a predict-then-plan scheme by solving the prediction and planning tasks simultaneously. These planners can be categorized into three classes: forward simulation methods, multi-agent methods, and game-theoretic methods.

Forward Simulation Methods: One technique to generate interaction-aware behavior is via forward simulation. Here, the current traffic scene is simulated for different actions of the AV. Transition models are used to describe how the environment changes due to the actions of the AV and, further, how other drivers react to these changes in the environment. We refer to such techniques as forward simulation methods. Most sampling-based planning methods that consider interactions can be associated with this category [11, 27]. An important group among the sampling-based planning methods are those based on partially observable Markov decision processes (POMDPs), e.g., as presented in [17].

The behavior of other agents is often modeled with specific driver models such as the Intelligent Driver Model (IDM) [35] or the Minimum Overall Braking Induced by Lane change (MOBIL) model [18]. An example where the IDM is used to determine the reaction of other vehicles is presented in [11]. Here, to generate the behavior for the AV, multiple candidate trajectories are simulated and evaluated based on the effect imposed on others. In forward simulation methods, the influence the AV exerts on others is not explicitly given but must be determined by trying out several actions and subsequently forward simulating the traffic scene.
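To make this concrete, the following sketch shows how a candidate plan of the AV could be forward simulated against an IDM-controlled driver. The parameter values and helper structure are illustrative and not taken from [11] or [35].

```python
import math

def idm_acceleration(v, v_lead, gap, v_des=15.0, a_max=1.5, b_comf=2.0,
                     s0=2.0, t_headway=1.5, delta=4):
    """Intelligent Driver Model: longitudinal acceleration of a simulated driver,
    given its speed v, the speed of the vehicle ahead, and the bumper-to-bumper gap."""
    s_star = s0 + max(0.0, v * t_headway
                      + v * (v - v_lead) / (2.0 * math.sqrt(a_max * b_comf)))
    return a_max * (1.0 - (v / v_des) ** delta - (s_star / max(gap, 0.1)) ** 2)

def forward_simulate(x_follow, v_follow, candidate_plan, dt=0.2, length=4.0):
    """Roll out the reaction of an IDM-controlled driver to one candidate plan of the AV.
    candidate_plan is a list of (position, velocity) samples of the AV driving ahead."""
    reaction = []
    for x_lead, v_lead in candidate_plan:
        a = idm_acceleration(v_follow, v_lead, x_lead - x_follow - length)
        v_follow = max(0.0, v_follow + a * dt)
        x_follow += v_follow * dt
        reaction.append((x_follow, v_follow))
    return reaction
```

Each candidate trajectory of the AV is rolled out and scored in this manner, which is precisely why the influence on other drivers is only available implicitly, through repeated simulation.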

Multi-Agent Methods: In multi-agent methods, the separate prediction of other vehicles is replaced by planning coupled trajectories. To this end, the traffic scene is modeled as a multi-agent planning problem with the underlying assumption that all traffic participants behave towards optimizing a joint objective [2, 4, 5, 8, 19, 22, 32]. The AV then solves the multi-agent problem assuming that other agents will also roughly follow their part of the plan. Varying weights can be used to model different levels of cooperation or to incorporate asymmetries in the traffic scene [4, 7]. To cope with uncertainties in the behavior of humans, these methods are combined with tracking approaches to estimate whether humans roughly follow the same model [5, 32].

Game-Theoretic Methods: In real traffic, the assumption that each driver behaves towards optimizing a common objective might not be valid, since some drivers are only interested in optimizing their own driving. To model interactions among agents with different objectives, a game-theoretic perspective might be more suitable. Several game-theoretic methods have already been used successfully, e.g., for lane change, merge, intersection, roundabout, and overtaking scenarios [6, 10, 12, 13, 21, 29,30,31, 38, 39]. In [30], for example, human-like driving behavior, such as slowing down before intersections or nudging into the adjacent lane during a lane change, could be generated.

Apart from these driving applications, game-theoretic methods have been used for agile maneuvering of multiple ground vehicles in close proximity [40], and automated car racing [23, 25, 37, 39], where it is shown that game-theoretic planners yield complex strategies such as blocking and faking and significantly outperform baseline MPC planners.

In game-theoretic formulations, there is no optimal solution in the traditional sense; instead, depending on the game's structure, different solutions are possible, also referred to as equilibria. Therefore, an important feature to categorize game-theoretic methods is the type of equilibrium they solve for. In the literature, a distinction is made between Nash and Stackelberg equilibria. A Nash equilibrium describes a set of strategies where no individual agent can benefit from unilaterally changing its strategy, given that all other agents stick to their strategies. This type of equilibrium has been investigated, e.g., in [3, 10, 14, 20, 37, 40].

Compared to a Nash equilibrium, a Stackelberg equilibrium involves turn-taking and, therefore, an asymmetry in the decision-making process. It is typically modeled for a two-player game, where one player is the leader and the other is the follower. The leader chooses its strategy first, and the follower then optimizes its strategy as the best response to the leader's strategy. In contrast, the Nash solution can be seen as the best response from everyone to everyone else without hierarchical turn-taking. Stackelberg equilibria are considered in [6, 12, 23, 29, 30, 33, 41, 42].
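For the two-player case and in the trajectory notation introduced in Sect. 2, the two solution concepts can be summarized in simplified, open-loop form as follows:

$$\begin{aligned} \text {Nash:} \quad & J_{L}(\xi _{L}^*,\xi _{F}^*) \le J_{L}(\xi _{L},\xi _{F}^*) \;\; \forall \xi _{L} \in \Xi _L, \qquad J_{F}(\xi _{L}^*,\xi _{F}^*) \le J_{F}(\xi _{L}^*,\xi _{F}) \;\; \forall \xi _{F} \in \Xi _F \\ \text {Stackelberg:} \quad & \xi _{L}^* \in \mathop {\text {arg}\,\text {min}}_{\xi _{L} \in \Xi _L} J_{L}\big (\xi _{L},\xi _{F}^*(\xi _{L})\big ), \qquad \xi _{F}^*(\xi _{L}) \in \mathop {\text {arg}\,\text {min}}_{\xi _{F} \in \Xi _F} J_{F}(\xi _{L},\xi _{F}) \end{aligned}$$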

In this work, we present a model based on a game-theoretic formulation that directly captures the interaction between an AV and a human driver (HD) as a Stackelberg game. This algorithm enables AVs to be aware of how their actions influence other drivers and thereby allows generating interaction-aware driving behavior. In contrast to existing works, the algorithm can account for general nonlinear state and input constraints. Additionally, we present mechanisms to integrate cooperation and courtesy into interaction-aware methods to prevent overly aggressive driving behavior, which has been reported as an issue of existing approaches.

2 Problem Statement

To derive a model to directly capture interactions, we consider a system with one AV, referred to as the leader L, and one HD, referred to as the follower F. The system’s state at time t is given by the leader’s and follower’s state \(\textbf{x}^L_t,\textbf{x}^F_t\in \mathcal {X}\), where \(\mathcal {X}\) is the set that contains all possible states. The leader’s and follower’s actions are described by their trajectories \(\xi _{L}(t),\xi _{F}(t):[0,\mathcal {T}]\rightarrow \mathcal {X}\). Further, each agent has its individual objective function denoted by \(J_{L}\) and \(J_{F}\).

The objective is minimized subject to the vehicle’s initial state \(\xi (0) = \textbf{x}_{0} \) and the evolution of the state described by the trajectory, which is only allowed to pass through the set of feasible states \(\mathcal {X}_{\text {feasible}}(t) \subseteq \mathcal {X}\). \(\mathcal {X}_{\text {feasible}}(t)\) encodes, for instance, collision avoidance. Additionally, system dynamics and bound constraints can be enforced by \(D(\xi (t), \dot{\xi }(t),\ddot{\xi }(t), \dots ) = 0\). The set of all feasible trajectories \(\xi (t)\) is denoted by \(\Xi \).

In contrast to traditional multi-agent systems, we assume a turn-taking structure, where the follower optimizes its trajectory as a response to the leader’s trajectory. To do so, the follower predicts the leader’s future motion \(\tilde{\xi }_{L}\) and then plans by minimizing its objective function \(J_{F}\) considering these predictions. Therefore, the follower’s optimal trajectory can be described as:

$$\begin{aligned} \mathop {\text {arg}\,\text {min}}_{\xi _{F} \in \Xi _F} J_{F}(\textbf{x}^L_{0},\textbf{x}^F_{0},\tilde{\xi }_{L},\xi _{F}) \end{aligned}$$
(1)

For simplicity, we assume that for short time horizons, a human can predict the trajectory of the AV sufficiently well, such that the prediction \(\tilde{\xi }_{L}\) can be assumed to be the actual trajectory \(\xi _{L}\) of the AV. Hence, the optimal trajectory of the follower as a function of the leader’s trajectory \(\xi _{L}\) is given as:

$$\begin{aligned} \xi _F^*(\textbf{x}^L_{0},\textbf{x}^F_{0},\xi _{L}) = \mathop {\text {arg}\,\text {min}}_{\xi _{F} \in \Xi _F} J_{F}(\textbf{x}^L_{0},\textbf{x}^F_{0},\xi _{L},\xi _{F}) \end{aligned}$$
(2)

With this link between the leader's actions and the follower's actions, the optimal trajectory for the AV can be stated as:

$$\begin{aligned} \xi _{L}^* = \mathop {\text {arg}\,\text {min}}_{\xi _{L} \in \Xi _L} J_{L}\left( \textbf{x}^L_{0},\textbf{x}^F_{0},\xi _{L}, \xi _F^*(\textbf{x}^L_{0},\textbf{x}^F_{0},\xi _{L})\right) \end{aligned}$$
(3)

Equation (3) gives the leader the ability to reason about how its actions will influence the follower’s response and is therefore the fundamental model which enables interaction-aware planning.

3 Bi-level Formulation

The derived model, in (3), describes a Stackelberg game, where the leader decides on its behavior first and the follower optimizes its behavior given the decision of the leader (Fig. 2). If the follower’s best response to the leader’s actions can be stated in closed form, (3) can be solved as a standard optimal control problem (OCP). However, this is, in general, not the case since \(\xi _{F}^*\) is the outcome of an OCP itself. This results in a nested or bi-level optimization problem. Further, solving the underlying Stackelberg game would require planning until \(\mathcal {T}\), which is the end of an interaction. However, the end of an interaction is not trivial to determine and requires the consideration of a varying time horizon.

Fig. 2

Structure of a bi-level optimization problem. Here, the follower optimizes its objective function as a response to the given actions of the leader

In the following, we propose an approximate solution to (3) based on model predictive control (MPC), where we solve the problem on a receding horizon with a fixed length T, execute the first action, and then replan. We utilize multiple shooting methods and discretize the time horizon \(t \in [0,T]\) into \(N = T / \tau \) intervals, where \(\tau \) denotes the duration of one time step. To improve readability, we subsume the state and input sequences of the leader and follower as \(\textbf{x}:=[\textbf{x}_0,\ldots ,\textbf{x}_N]^T\) and \(\textbf{u}:=[\textbf{u}_0,\ldots ,\textbf{u}_{N-1}]^T\). In the following, the resulting nonlinear programs (NLPs) of the follower and leader are stated. The equality constraints \(\textbf{h}\) can be used to represent constraints imposed by the system dynamics, while the inequality constraints \(\textbf{g}\) collect bound constraints, collision constraints, and dynamic constraints.
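As an illustration, for a single vehicle the multiple-shooting structure places the continuity conditions of the discretized dynamics in the equality constraints,

$$\begin{aligned} \textbf{h}(\textbf{x},\textbf{u}) = \begin{pmatrix} \textbf{x}_{1} - \textbf{f}(\textbf{x}_{0},\textbf{u}_{0}) \\ \vdots \\ \textbf{x}_{N} - \textbf{f}(\textbf{x}_{N-1},\textbf{u}_{N-1}) \end{pmatrix} = 0 , \end{aligned}$$

where \(\textbf{f}\) denotes the discrete-time dynamics used in (8b); bound, collision, and dynamic constraints are collected analogously in \(\textbf{g}\).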

3.1 NLP of the Follower

The follower’s NLP is parametrized by the leader’s states and inputs \((\textbf{x}^L,\textbf{u}^L)\) and can be formulated as:

$$\begin{aligned} \mathop {\text {arg}\,\text {min}}_{\textbf{x} ^F, \textbf{u} ^F} \quad & J_{F}(\textbf{x}^L,\textbf{x}^F,\textbf{u}^{F}) \end{aligned}$$
(4a)
$$\begin{aligned} \text {s.t.} \quad & \textbf{h}_{F}(\textbf{x}^F,\textbf{u}^F) = 0 , \end{aligned}$$
(4b)
$$\begin{aligned} & \textbf{g}_{F}(\textbf{x}^L,\textbf{x}^F,\textbf{u}^F) \le 0 \end{aligned}$$
(4c)

3.2 NLP of the Leader

The leader’s bi-level optimization problem can be stated as:

$$\begin{aligned} \mathop {\text {arg}\,\text {min}}_{\textbf{x} ^L, \textbf{x} ^F,\textbf{u} ^L, \textbf{u} ^F} \quad & J_{L}(\textbf{x}^L,\textbf{x}^F,\textbf{u}^{L}) \end{aligned}$$
(5a)
$$\begin{aligned} \text {s.t.} \quad & \textbf{h}_{L}(\textbf{x}^L,\textbf{u}^L) = 0 , \end{aligned}$$
(5b)
$$\begin{aligned} & \textbf{g}_{L}(\textbf{x}^L,\textbf{x}^F,\textbf{u}^L) \le 0 , \end{aligned}$$
(5c)
$$\begin{aligned} & (\textbf{x} ^F,\textbf{u} ^F) \in \underset{\textbf{x} ^F,\textbf{u} ^F }{\mathop {\text {arg}\,\text {min}}} \{J_{F}(\textbf{x}^L,\textbf{x}^F,\textbf{u}^{F}) : \textbf{h}_{F} =0, \, \textbf{g}_{F} \le 0\} \end{aligned}$$
(5d)

Formulating the follower’s optimization problem as a constraint, (5d), ensures that only optimal solutions for the follower are considered feasible solutions for the leader.

4 Single-Level Representation

To efficiently solve (5), we need to reformulate the bi-level optimization problem into a regular, single-level problem. Therefore, we assume that the follower will act optimally with respect to its own objective function (4). With this assumption, we can replace the inner optimization problem with its necessary conditions for optimality.

If the follower's problem is convex, the Karush-Kuhn-Tucker (KKT) conditions are necessary and sufficient for optimality. However, due to the combinatorial nature of driving, it is, in general, non-convex, e.g., due to nonlinear collision avoidance constraints or a non-convex cost function. To obtain locally optimal solutions, we convexify the follower's problem around an initial guess, which at the same time encodes the considered homotopy class. For the convexification, the constraints are linearized, and the cost function is approximated by a second-order Taylor expansion.
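Collecting the follower's variables as \(\textbf{z}^F := (\textbf{x}^F,\textbf{u}^F)\) and denoting the initial guess by \(\bar{\textbf{z}}^F\), one way to write the resulting convexified subproblem (suppressing the dependence on the leader's plan and using a positive semi-definite approximation of the Hessian where necessary) is:

$$\begin{aligned} \min _{\textbf{z}^F} \quad & \nabla J_{F}(\bar{\textbf{z}}^F)^T (\textbf{z}^F - \bar{\textbf{z}}^F) + \tfrac{1}{2}\, (\textbf{z}^F - \bar{\textbf{z}}^F)^T \nabla ^2 J_{F}(\bar{\textbf{z}}^F)\, (\textbf{z}^F - \bar{\textbf{z}}^F) \\ \text {s.t.} \quad & \textbf{h}_{F}(\bar{\textbf{z}}^F) + \nabla \textbf{h}_{F}(\bar{\textbf{z}}^F)^T (\textbf{z}^F - \bar{\textbf{z}}^F) = 0 , \\ & \textbf{g}_{F}(\bar{\textbf{z}}^F) + \nabla \textbf{g}_{F}(\bar{\textbf{z}}^F)^T (\textbf{z}^F - \bar{\textbf{z}}^F) \le 0 \end{aligned}$$

The linearized constraints correspond to \(\textbf{h}_{F_{\text {lin}}}\) and \(\textbf{g}_{F_{\text {lin}}}\) and the quadratic cost model to \(J_{F_{\text {con}}}\) used below.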

By replacing the follower’s optimization problem with its KKT conditions in (5), we obtain the following single-level optimization problem:

$$\begin{aligned} \mathop {\text {arg}\,\text {min}}_{\textbf{x} ^L, \textbf{x} ^F,\textbf{u} ^L, \textbf{u} ^F, \boldsymbol{\lambda },\boldsymbol{\mu }} \quad & J_{L}(\textbf{x}^L,\textbf{x}^F,\textbf{u}^{L}) \end{aligned}$$
(6a)
$$\begin{aligned} \text {s.t.} \quad & \textbf{h}_{L}(\textbf{x}^L,\textbf{u}^L) = 0 , \end{aligned}$$
(6b)
$$\begin{aligned} & \textbf{g}_{L}(\textbf{x}^L,\textbf{x}^F,\textbf{u}^L) \le 0 , \end{aligned}$$
(6c)
$$\begin{aligned} & \nabla _{(\textbf{x}^F,\textbf{u}^F)} L(\textbf{x}^L,\textbf{x}^F,\textbf{u}^F,\boldsymbol{\lambda },\boldsymbol{\mu }) = 0 , \end{aligned}$$
(6d)
$$\begin{aligned} & \textbf{h}_{F_{\text {lin}}}(\textbf{x}^F,\textbf{u}^F) = 0 , \end{aligned}$$
(6e)
$$\begin{aligned} & \textbf{g}_{F_{\text {lin}}}(\textbf{x}^L,\textbf{x}^F,\textbf{u}^F) \le 0 , \end{aligned}$$
(6f)
$$\begin{aligned} & \boldsymbol{\mu } \ge 0 , \end{aligned}$$
(6g)
$$\begin{aligned} & \boldsymbol{\mu } \bot \textbf{g}_{F_{\text {lin}}}(\textbf{x}^L,\textbf{x}^F,\textbf{u}^F) \end{aligned}$$
(6h)

with the Lagrangian

$$\begin{aligned} L(\textbf{x}^L,\textbf{x}^F,\textbf{u}^{F},\boldsymbol{\lambda },\boldsymbol{\mu }) & = J_{F_{\text {con}}}(\textbf{x}^L,\textbf{x}^F,\textbf{u}^{F}) \\ & \quad + \boldsymbol{\lambda }^T\textbf{h}_{F_{\text {lin}}}(\textbf{x}^F,\textbf{u}^{F}) + \boldsymbol{\mu }^T\textbf{g}_{F_{\text {lin}}}(\textbf{x}^L,\textbf{x}^F,\textbf{u}^{F}) \end{aligned}$$

Here, \(\boldsymbol{\lambda }\) and \(\boldsymbol{\mu }\) are the KKT multipliers, and \(\textbf{h}_{F_{\text {lin}}}, \textbf{g}_{F_{\text {lin}}}\) and \(J_{F_{\text {con}}}\) are the constraints and the objective after the convexification. For the reformulation, we assume sufficient regularity of the follower's NLP, differentiability of \(\textbf{h}_{F}\) and \(\textbf{g}_{F}\), and twice differentiability of the cost function \(J_{F}\).

4.1 Solving the Complementarity Constraints

The leader's NLP in (6) forms an instance of a mathematical program with complementarity constraints (MPCC). Due to the complementarity constraints \(\boldsymbol{\mu } \bot \textbf{g}_{F_{\text {lin}}}\), MPCCs are non-smooth and non-convex. MPCCs are particularly challenging to solve because, at every feasible point, ordinary constraint qualifications (CQs) such as the LICQ or the Mangasarian-Fromovitz CQ are violated [9]. Therefore, to solve (6), we reformulate the complementarity constraints using relaxation methods [16], as shown in (7).

$$\begin{aligned} - \epsilon \le \boldsymbol{\mu }^T \textbf{g}_{F_{\text {lin}}}(\textbf{x}^L,\textbf{x}^F,\textbf{u}^F) \end{aligned}$$
(7)

With \(\epsilon > 0\), a regularized NLP is obtained, and the CQs can be satisfied again. The smaller \(\epsilon \) is chosen, the closer any feasible solution is to achieving complementarity. However, if \(\epsilon \) is chosen too small, the problem may become numerically unstable and the solver may fail to find a feasible solution at all.
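A common way to handle \(\epsilon \) in practice is a simple continuation scheme that starts with a loose relaxation and tightens it while warm starting from the previous solution. The sketch below is purely illustrative; `solve_relaxed_leader_nlp` is a hypothetical wrapper around the relaxed problem (6) with (7) and not part of the implementation described later.

```python
def solve_with_relaxation(solve_relaxed_leader_nlp, initial_guess,
                          eps_start=1e-2, eps_min=1e-6, shrink=0.1):
    """Continuation on the complementarity relaxation parameter epsilon.
    solve_relaxed_leader_nlp(eps, guess) is a hypothetical callable that solves
    the relaxed MPCC and returns (solution, converged_flag)."""
    eps, guess, best = eps_start, initial_guess, None
    while eps >= eps_min:
        solution, converged = solve_relaxed_leader_nlp(eps, guess)
        if not converged:
            break                              # tightening further would likely fail as well
        best, guess = solution, solution       # warm start the next, tighter problem
        eps *= shrink
    return best
```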

5 Application to Motion Planning for AVs

So far, the derived model represents a formulation of how interactions between a robot and a human can be considered during motion planning or decision-making for robots in general. In the following, we describe how the bi-level algorithm is applied to motion planning for AVs. The section starts by stating the OCP used for trajectory optimization of an AV. This OCP contains the system dynamics, bound constraints, as well as an objective function to encode desirable driving behavior. For the purpose of the evaluation in Sect. 6, we assume that a good approximation of the human's objective function is provided. Such a function could be obtained, e.g., via inverse reinforcement learning.

5.1 Trajectory Optimization for AVs

The OCP used for trajectory optimization can be stated as:

$$\begin{aligned} & \mathop {\text {arg}\,\text {min}}_{{\textbf {x}}, {\textbf {u}}} J_{\text {base}} = J_{{\textbf {x}}} + J_{{\textbf {u}}}+ J_{\dot{{\textbf {u}}}} \end{aligned}$$
(8a)
$$\begin{aligned} \text {s.t.}: & \nonumber \\ & {\textbf {x}}_{k+1} = {\textbf {f}}({\textbf {x}}_{k},{\textbf {u}}_{k}) & & k = 0,\dots , N-1 \end{aligned}$$
(8b)
$$\begin{aligned} & {\textbf {x}}_{0} = \hat{{\textbf {x}}} & & \end{aligned}$$
(8c)
$$\begin{aligned} & \underline{{\textbf {g}}_{\text {dyn}}} \le {\textbf {g}}_{\text {dyn}}({\textbf {x}}_k) \le \overline{{\textbf {g}}_{\text {dyn}}} & & k = 1,\dots , N \end{aligned}$$
(8d)
$$\begin{aligned} & {\textbf {g}}_{\text {col}}({\textbf {x}}^F_k,{\textbf {x}}^L_k) \le 0 & & k = 1,\dots , N \end{aligned}$$
(8e)
$$\begin{aligned} & {\textbf {g}}_{\text {obs}}({\textbf {x}}_k) \le 0 & & k = 1,\dots , N \end{aligned}$$
(8f)
$$\begin{aligned} & \underline{{\textbf {x}}} \le {\textbf {x}}_k \le \overline{{\textbf {x}}} & & k = 1,\dots , N \end{aligned}$$
(8g)
$$\begin{aligned} & \underline{{\textbf {u}}} \le {\textbf {u}}_k \le \overline{{\textbf {u}}} & & k = 0,\dots , N-1 \end{aligned}$$
(8h)

The objective function \(J_{\text {base}}\) is used to generate desirable driving behavior. The equality constraints (8b) enforce the vehicle dynamics. Further, (8c) ensures that the trajectory is planned from the current state \(\hat{{\textbf {x}}}\). The inequality constraints (8d)–(8h) are used for collision avoidance and to account for the physical limitations of the real system.

5.1.1 Vehicle Model

To describe the dynamics of the vehicle (8b), the kinematic single-track model is used. The vehicle state at time k, \(\textbf{x}_k = [x_k,y_k,\psi _k, v_k]^T\), is described by the longitudinal and lateral position (x, y) of the vehicle's center of gravity, the orientation \(\psi \), and the absolute velocity v. Together with the input \(\textbf{u}_k = [\delta _k, a_k]^T\), consisting of steering angle \(\delta \) and acceleration a, the dynamics of the vehicle are given by the kinematic single-track model illustrated in Fig. 3.

Fig. 3

Kinematic bicycle model
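In continuous time, the model can be written in the standard form

$$\begin{aligned} \dot{x} = v\cos (\psi + \beta ), \quad \dot{y} = v\sin (\psi + \beta ), \quad \dot{\psi } = \frac{v}{l_r}\sin (\beta ), \quad \dot{v} = a \end{aligned}$$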

Here, \(\beta \) is the slip angle which is given by \(\beta = \arctan \left( \frac{l_r}{l}\tan (\delta )\right) \).

Further, l is the wheelbase, and \(l_r\) is the distance between the center of gravity and the rear axle. To obtain the discrete dynamics model \(\textbf{x}_{k+1} = f(\textbf{x}_{k},\textbf{u}_{k})\) in (8b), we use a fourth-order Runge-Kutta method.
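A minimal numeric sketch of this discretization is given below; the wheelbase values are placeholders rather than the parameters in Table 1.

```python
import numpy as np

L_WB, L_R = 2.7, 1.35   # wheelbase l and rear-axle distance l_r (placeholder values)

def f_cont(x, u):
    """Continuous-time kinematic single-track model.
    x and u are numpy arrays: state [x, y, psi, v], input [delta, a]."""
    _, _, psi, v = x
    delta, a = u
    beta = np.arctan(L_R / L_WB * np.tan(delta))
    return np.array([v * np.cos(psi + beta),
                     v * np.sin(psi + beta),
                     v / L_R * np.sin(beta),
                     a])

def f_discrete(x, u, tau):
    """One explicit fourth-order Runge-Kutta step of length tau, as used in (8b)."""
    k1 = f_cont(x, u)
    k2 = f_cont(x + tau / 2.0 * k1, u)
    k3 = f_cont(x + tau / 2.0 * k2, u)
    k4 = f_cont(x + tau * k3, u)
    return x + tau / 6.0 * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
```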

To ensure the validity of the kinematic single-track model [28], the constraints \({\textbf {g}}_{\text {dyn}}\) in (8d) are introduced to limit the lateral acceleration as follows:

$$\begin{aligned} |v_{k}\dot{\psi }_{k}| = |\frac{v_{k}^2}{l}\tan (\delta _{k})\cos (\beta _{k}) | \le a_{\text {lat,max}} = {4}\, \frac{{\text {m}}}{{\text {s}^2}} \end{aligned}$$
(9)

To also limit the jerk, the following constraints on the acceleration change are introduced:

$$\begin{aligned} j_{\text {min}} \le \frac{a_k - a_{k-1}}{\tau } \le j_{\text {max}} \end{aligned}$$
(10)

Here, \(j_{\text {min}}\) and \(j_{\text {max}}\) are the minimum and maximum allowed jerk values.

5.1.2 Collision Avoidance

The collision avoidance constraints (8e) are formulated pairwise between vehicles. Hereby, the shape of one vehicle is approximated by a finite number of circles, and the shape of the second vehicle is approximated with a superellipse, as illustrated in Fig. 4. Compared to regular ellipses, superellipses provide a more accurate approximation of the vehicle's rectangular shape [24].

Fig. 4

Illustrated are the shape approximations by (a) multiple circles and (b) a superellipse of order \(n=4\)

Collision avoidance between a point \(\textbf{p} = [x,y]^T\) and a superellipse defined by the semi-major axis a, the semi-minor axis b, and order \(n \in \mathbb {N}\) can be formulated as:

$$\begin{aligned} \root n \of {\left( \frac{x}{a}\right) ^n + \left( \frac{y}{b}\right) ^n} \ge 1 \end{aligned}$$
(11)

Similarly, collision avoidance between a circle with radius r and a superellipse can be formulated as a point mass constraint on the center point of the circle \(\mathbf {p_c} = [x_c,y_c]^T\). Therefore, \(\mathbf {p_c}\) needs to be outside the Minkowski sum of the superellipse and a circle with radius r, see Fig. 5.

To maintain an efficient formulation, the Minkowski sum is approximated by an enlarged superellipse. In the case of a superellipse of order \(n=4\), enlarging the semi-major and semi-minor axes by the radius r is a sufficient over-approximation, see Fig. 5.

Fig. 5

Comparison of the Minkowski sum, shown in blue, and a superellipse with the semi-major and semi-minor enlarged by r, shown in red. The original superellipse is shown in grey

Hence, the collision avoidance constraints can be stated as:

$$\begin{aligned} \root n \of {\left( \frac{x_c}{a+r}\right) ^n + \left( \frac{y_c}{b+r}\right) ^n} \ge 1 \end{aligned}$$
(12)
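As an illustrative sketch, constraint (12) can be evaluated for a circle center given in world coordinates by first rotating it into the body frame of the superellipse-approximated vehicle; the helper name and the even-order assumption are our own.

```python
import numpy as np

def superellipse_clearance(p_circle, center, heading, a, b, r, n=4):
    """Returns a value >= 0 iff the circle of radius r around p_circle stays outside
    the superellipse with semi-axes a, b and (even) order n, centered at `center`
    with orientation `heading`; corresponds to constraint (12)."""
    dx, dy = p_circle[0] - center[0], p_circle[1] - center[1]
    # rotate the circle center into the superellipse's body frame
    x_c = np.cos(heading) * dx + np.sin(heading) * dy
    y_c = -np.sin(heading) * dx + np.cos(heading) * dy
    value = (abs(x_c) / (a + r)) ** n + (abs(y_c) / (b + r)) ** n
    return value ** (1.0 / n) - 1.0
```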

5.1.3 Objective Function

The objective function (8a) consists of three components, \(J_{{\textbf {x}}}\), \(J_{{\textbf {u}}}\), and \(J_{\dot{{\textbf {u}}}}\), penalizing deviations from a desired state \(\textbf{x}_{k,\text {ref}} = [x_{k,\text {ref}}, y_{k,\text {ref}}, \psi _{k,\text {ref}}, v_{k,\text {ref}}]^T\), the control effort, and changes in the control, respectively. The function can be stated as:

$$\begin{aligned} J_\text {base}(\textbf{x},\textbf{u}) & = J_{{\textbf {x}}} + J_{{\textbf {u}}}+ J_{\dot{{\textbf {u}}}} \end{aligned}$$
(13a)
$$\begin{aligned} & = \sum _{k=1}^N \begin{pmatrix} x_k - x_{k,\text {ref}} \\ y_k - y_{k,\text {ref}} \\ \psi _k - \psi _{k,\text {ref}} \\ \Delta v \end{pmatrix}^T Q \begin{pmatrix} x_k - x_{k,\text {ref}} \\ y_k - y_{k,\text {ref}} \\ \psi _k - \psi _{k,\text {ref}} \\ \Delta v \end{pmatrix} \end{aligned}$$
(13b)
$$\begin{aligned} & + \sum _{k=0}^{N-1}{\textbf{u}_k}^T R_u {\textbf{u}_k} \end{aligned}$$
(13c)
$$\begin{aligned} & + \sum _{k=1}^{N-1}({\textbf{u}_k} - \textbf{u}_{k-1})^T R_{\dot{u}} ({\textbf{u}_k} - \textbf{u}_{k-1}) \end{aligned}$$
(13d)
$$\begin{aligned} & + ({\textbf{u}_0} - {\hat{\textbf{u}}})^T R_{\dot{u}} ({\textbf{u}_0} - {\hat{\textbf{u}}}) \end{aligned}$$
(13e)

With the velocity vector \(\textbf{v} = [v\cos (\psi +\beta ), v\sin (\psi +\beta )]^T\) and the road tangential unit vector \(\textbf{t}\), \(\Delta v = \textbf{v} \cdot \textbf{t} - v_{\text {ref}}\) measures the difference between the current velocity along the road and the reference velocity \(v_{\text {ref}}\). Further, \({\hat{\textbf{u}}}\) is the control input from the previous step. Finally, \(R_u\), \(R_{\dot{u}}\), and Q are weighting matrices used to model the desired driving behavior.

6 Evaluation

The efficacy of the proposed bi-level algorithm is evaluated in two different settings. First, the ability of the AV to deliberately influence the HD's state through its driving behavior is investigated. These experiments make use of the direct link between the AV's actions and the HD's response that the bi-level approach provides.

Since in real driving applications the goal of the AV is to drive efficiently and comfortably rather than to influence the state of other vehicles, the focus of the second part is to demonstrate how the approach can be used to plan interaction-aware, cooperative driving behavior.

Apart from the efficacy, the algorithm's runtime is analyzed, followed by a discussion highlighting the advantages and limitations of the algorithm.

6.1 Base Scenario

We evaluate our approach in multi-lane scenarios as shown in Fig. 6, where the AV is depicted in blue and the HD is depicted in gray. For the purpose of these experiments, the AV is considered the leader, and the HD is considered the follower. In the following, we will use the terms leader and AV as well as follower and HD interchangeably.

Both vehicles have a width of \({2.0\,\mathrm{\text {m}}}\) and a length of \({4.0\,\mathrm{\text {m}}}\). Collision avoidance is implemented using a superellipse of order \(n = 4\) for the leader and two circles for the follower. Further parameters are given in Table 1.

Fig. 6

Depending on the experiment, either a multi-lane or a merging scenario, where the right lane ends, is considered

Table 1 MPC parameters

The follower directly uses the cost function (13) for its trajectory optimization with the weights and vehicle characteristics given in Table 1. The leader’s NLP is also based on (13) but additionally considers the KKT conditions of the follower’s NLP as constraints, as stated in (6). Further, the leader’s objective function is augmented with additional cost terms to set scenario-specific incentives.

If not stated otherwise, the initial and reference states listed in Table 2 are used for the leader and follower.

Table 2 Leader’s and follower’s initial and reference states

6.2 Influence the Human’s State

The following two experiments investigate the leader’s ability to influence the follower’s state. To provide the appropriate incentives, the leader’s objective function is augmented with \(J_\text {influence}\). The leader’s objective is, therefore, the following weighted sum:

$$\begin{aligned} J_L = w_L J_\text {base} + w_{\text {influence}} J_{\text {influence}} \end{aligned}$$
(14)

Henceforth, a ratio of \(\frac{w_{\text {influence}}}{w_L} = 10^7\) is used.

6.2.1 Slow Down the Human

In this experiment, the leader's goal is to slow down the follower to a certain velocity, \(v_{\text {ref}}^F\). To incentivize this behavior, deviations of the follower's velocity from \(v_{\text {ref}}^F\) are penalized. The scenario-specific \(J_{\text {influence}}\) is therefore set to

$$\begin{aligned} J_{\text {influence}} = \sum _{k = 1}^{N} (\textbf{v}_{k}^F\cdot \textbf{t}_{k} - v_{\text {ref}}^F)^2 \end{aligned}$$
(15)

with \(\textbf{v} = [v\cos (\psi +\beta ), v\sin (\psi +\beta )]^T\) and \(\textbf{t}\) as the road tangential unit vector.

The results for a desired velocity of \(v_\text {ref}^F = 5\,\frac{\text {m}}{\text {s}}\) are illustrated in Fig. 7. As can be seen, the leader changes to the left lane to get in front of the follower. Despite its interest in driving fast, the leader starts to brake, forcing the follower to slow down. To prevent the follower from overtaking, the leader drives close to the center of the road.

Fig. 7

By solely penalizing the follower's velocity, the bi-level approach yields an intuitive solution: the leader has to change lanes and brake to slow down the follower

6.2.2 Push the Human to the Adjacent Lane

In this experiment, the ability to also influence the follower in the lateral direction is investigated. Therefore, a 3-lane road is considered, see Fig. 8.

The leader's goal is to enforce a lane change of the human to the adjacent left lane. This incentive is encoded by setting \(J_{\text {influence}}\) to penalize deviations of the follower's lateral position from a reference \(y_\text {ref}^F\) as:

$$\begin{aligned} J_{\text {influence}} = \sum _{k = 1}^{N} (y_{k}^F - y_{\text {ref}}^F)^2 \end{aligned}$$
(16)

Figure 8 shows the behavior for \(y_\text {ref}^F = {8.5\,\mathrm{\text {m}}}\), which corresponds to the center of the leftmost lane. To push the follower to the left, the leader changes lanes and slows down, almost coming to a full stop. The leader thereby blocks the middle lane, which forces the follower to also slow down to avoid a collision. To continue, the follower starts an overtaking maneuver. At the same time, the leader accelerates again to stay next to the follower, blocking it from changing back to its original lane.

Fig. 8

The leader changes lanes and brakes harshly to enforce an overtaking maneuver of the follower. As soon as the follower tries to overtake, the leader accelerates again, blocking the follower from changing back to the middle lane

6.3 Interaction-Aware Trajectory Optimization

In real traffic, the primary goal of the AV is to drive comfortably and efficiently rather than to change the state of surrounding vehicles in a certain way. Therefore, the behavior generated when planning trajectories with the proposed interaction-aware algorithm is investigated next for different lane change scenarios. To better show the effect of the planned behavior, the desired velocity of the follower is increased to \(v_\text {ref}^F = 15\,\frac{\text {m}}{\text {s}}\). Throughout the scenarios, the leader aims to perform a lane change to the left.

6.3.1 Efficient Planning

We start by formulating the leader’s objective in an egocentric way, similar to how it is formulated for planners following a predict-then-plan scheme. Here, the leader solely considers attributes of its own trajectory, formulated by only optimizing \(J_\text {base}\).

The resulting trajectories are shown in Fig. 9. As can be seen, the leader plans a very efficient lane change without any acceleration. However, as a response, the follower has to brake harshly to avoid a collision, see Fig. 10. This aggressive cut-in is a result of the leader knowing that the follower will react, which the leader then exploits to further optimize its own driving behavior.

Fig. 9

When only considering its own costs, the leader performs an aggressive lane change

Fig. 10

While the leader can perform a smooth lane change without accelerating, the follower has to brake harshly to avoid a collision

This example shows that interactive behavior not only occurs when the leader is incentivized to alter the state of the follower but also emerges out of efficiency.

6.3.2 Cooperative Interaction-Aware Planning

The proposed interaction-aware model gives the leader the ability to anticipate the follower’s reaction. When naively using an egocentric objective function, the leader exploits the follower’s response and generates an overly aggressive behavior, as demonstrated in the previous example.

To mitigate this effect, the impact imposed on others must be considered in the objective function of the leader. Therefore, a formulation based on a cooperative cost function, which includes the leader's and the follower's costs in the leader's objective, is considered in the following:

$$\begin{aligned} J_{\text {cooperative}} = \alpha J_{F,\text {base}} + (1-\alpha ) J_{L,\text {base}} \end{aligned}$$
(17)

In this formulation, the variable \(\alpha \in [0,1]\) determines to which extent the leader’s and the follower’s cost are considered. Therefore, \(\alpha \) provides a way to design different driving behaviors, ranging from overly aggressive to overly conservative.

The impact the parameter \(\alpha \) has on the generated behavior is investigated in the following. To this end, we consider a scenario including a mandatory lane change for the leader, see Fig. 11. The \(\alpha \)-dependent acceleration and velocity profiles for \(\alpha = 0.0\), \(\alpha = 0.5\), and \(\alpha = 0.99\) are illustrated in Fig. 12.

In detail, for \(\alpha = 0.0\), the leader does not accelerate, and all the discomfort has to be borne by the follower. This represents the aggressive, egocentric behavior presented in the previous experiment. With a larger \(\alpha \), the leader increases its acceleration until reaching the acceleration limits. In the case of \(\alpha = 0.99\), the leader mostly considers the follower's cost and tries to interfere with the follower's optimal plan as little as possible. This value of \(\alpha \) generates a very conservative behavior similar to a predict-then-plan approach. With \(\alpha = 0.5\), the leader's and the follower's costs are considered equally, which leads to an approximately equal distribution of discomfort. Note, however, that besides adjusting the acceleration during the lane change, the leader also adapts its stationary velocity depending on \(\alpha \).

Fig. 11

Illustrated is a scenario where the leader has to perform a lane change to the left. Depending on the value of \(\alpha \), different behaviors are generated, ranging from overly aggressive to overly conservative

Fig. 12

Depending on the value of \(\alpha \), different acceleration and velocity profiles are obtained. Thereby, the larger \(\alpha \) is, the more discomfort the leader accepts. Further, with different \(\alpha \) the vehicles approach different stationary velocities, which might differ significantly from their desired velocities

6.3.3 Courtesy Constraints

The cooperative cost formulation presented in the previous experiment has the side effect that, for \(\alpha > 0.0\), the leader permanently drives faster than its desired velocity \(v_\text {ref}^L\). For some scenarios, e.g., overtaking a slow-moving truck on the highway, a temporarily increased velocity might be acceptable or even desirable for traffic efficiency. However, in most situations, a vehicle in front does not adapt its velocity to the desires of rear traffic.

An alternative to the cooperative cost formulation is introducing courtesy constraints. With these constraints, the leader’s impact on others can be limited without altering the leader’s objective function.

In this experiment, we introduce a constraint such that the leader is allowed to cause at most a deceleration of \(a_{\text {limit}}\) for the follower. To enforce this, the following constraints are added to the leader's NLP:

$$\begin{aligned} \textbf{g}_{\text {courtesy},k} = a_{k}^F - a_{\text {limit}}\ge 0 \end{aligned}$$
(18)

Here, \(a_{k}^F\) is the acceleration of the follower.

The effect of the courtesy constraint with \(a_{\text {limit}} = -2.0\,\frac{\text {m}}{\text {s}^2}\) on the considered merging scenario is illustrated in Fig. 13. By introducing the constraint, the leader accelerates during the lane change, which successfully limits the induced deceleration to \(-{2.0\,\mathrm{\frac{{\text {m}}}{{\text {s}}^2}}}\). The velocity profiles are shown in Fig. 13b. Compared to the cooperative cost formulation, the leader returns to its desired velocity of \({10.0\,\mathrm{\frac{{\text {m}}}{{\text {s}}}}}\) after the successful merge.

Fig. 13

Shown are the acceleration (a) and velocity (b) profiles when planning with the courtesy constraints. Introducing these constraints into the leader's NLP generates a behavior that successfully limits the follower's deceleration to \(a_{\text {limit}} = -2.0\,\frac{\text {m}}{\text {s}^2}\). Further, after the merge is completed, the leader returns to its desired velocity

Both the cooperative cost and the courtesy constraint method have traffic scenarios for which they are particularly suited. For example, when overtaking a slower-driving vehicle on the highway, the cooperative cost formulation might be more suitable, as it leads to a temporary increase in velocity for the duration of the overtaking maneuver. In contrast, for a merging scenario or a permanent lane change, the courtesy constraint method might be the better choice, since the leader returns to its desired velocity after the merge is completed.

6.4 Runtime Experiments

The presented method for interaction-aware trajectory optimization computes an open-loop solution for the AV. More precisely, the control inputs are functions of time and not of the state. To adapt to unforeseen changes in the environment, the algorithm needs to run in an MPC fashion. For MPC, a sufficiently high update rate is crucial. Therefore, we analyze the performance of the algorithm with a proof of concept MPC implementation.

The MPC was implemented in Python. All necessary derivatives were calculated using the open-source software CasADi [1]. CasADi utilizes automatic differentiation to calculate the derivatives accurately. Compared to finite difference methods, automatic differentiation is faster and more accurate. Further, IPOPT [36] was used to solve the formulated NLP. IPOPT is a general-purpose solver for large-scale nonlinear problems. We cold started the IPOPT solver with a feasible solution of the desired driving maneuver, which we obtained by sequentially solving a single-vehicle NLP, as in (8), first for the leader and then for the follower. This initialization was only performed for the very first iteration of the planner. All subsequent iterations were warm started with the solution of the previous iteration. To get a better initial guess, the previous solution was shifted by the duration between the planning iterations.
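The overall receding-horizon loop can be sketched as follows. `build_leader_nlp`, `shift_solution`, and `apply_first_input` are hypothetical helpers standing in for the problem construction and the simulated scene update described above; the CasADi calls show the general pattern rather than the exact implementation.

```python
import casadi as ca

def run_mpc(build_leader_nlp, shift_solution, apply_first_input,
            x0_leader, x0_follower, initial_guess, n_steps):
    """Proof-of-concept MPC loop: solve the single-level NLP (6), apply the first
    input, then warm start the next iteration with the shifted previous solution."""
    nlp, lbg, ubg = build_leader_nlp()   # dict {'x': ..., 'p': ..., 'f': ..., 'g': ...}
    solver = ca.nlpsol('solver', 'ipopt', nlp, {'ipopt.print_level': 0})
    guess, sol = initial_guess, None     # feasible cold start for the first iteration
    for _ in range(n_steps):
        sol = solver(x0=guess, p=ca.vertcat(x0_leader, x0_follower),
                     lbg=lbg, ubg=ubg)
        x0_leader, x0_follower = apply_first_input(sol['x'])  # advance the simulated scene
        guess = shift_solution(sol['x'])                      # shifted warm start
    return sol
```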

The timing results were obtained by considering a merging scenario with the two methods most relevant for the application to real traffic, namely the cooperative cost function method with \(\alpha = 0.5\) and the courtesy constraint method with \(a_{\text {limit}} = -2.0\,\frac{\text {m}}{\text {s}^2}\). We simulate each method for \({9.0\,\mathrm{\text {s}}}\). A horizon length of \(N = 30\) steps is considered for the MPC. Further parameters were taken from Table 1. The runtime results are obtained by running the MPC implementation 100 times on the merging scenario with both methods. The mean solve times over the 100 simulation runs are shown in the histograms in Fig. 14. Additionally, the mean and standard deviation of the mean solve times are listed in Table 3. All timing results were obtained on an Intel Core i7-8565U CPU with a clock rate of 1.80 GHz.

Fig. 14

Mean solve times obtained by running the MPC implementation 100 times on the merging scenario with (a) the cooperative cost and (b) the courtesy constraint method

Table 3 Mean and standard deviation of the mean solve times obtained by running the MPC implementation 100 times

Even though the experiments were conducted with an MPC implementation that leaves great potential for improvements, we could already demonstrate our algorithm’s real-time capability with mean solve times of \({96.82\,\mathrm{\text {m}\text {s}}}\) and \({83.85\,\mathrm{\text {m}\text {s}}}\), respectively. The presented results can be considered a conservative estimate of the achievable performance. However, in the future, this could be greatly improved by utilizing tailored solvers and implementing the approach in a high-performance programming language, e.g., C++.

7 Algorithm Discussion

A core assumption that we made to obtain the model for interaction-aware planning, stated in (3), is that the human does not try to influence the AV but rather reacts to its actions. According to [30, 31], this is a valid assumption for a wide range of interactive scenarios. Further, compared to a Nash equilibrium, it might even be the better model for how humans act in interactive situations since humans typically do not solve games in their everyday lives when they are not playing chess [15].

The formulated NLP (6) is a non-convex and non-smooth problem. As such, one cannot expect to find globally optimal solutions. However, we use derivative-based optimization methods to find local optima. These methods require an initial guess, which sets the considered homotopy class, as solutions of local methods typically lie in the same homotopy class as the initial guess. In the context of automated driving, homotopy classes are often thought of as maneuvers. Thus, we use the initial guess to encode the desired driving maneuver. In the experiments, we empirically observe that initializing with a rough but feasible initial guess of the desired maneuver is sufficient to reliably solve the problem. To take multiple maneuvers into consideration, it is advisable to combine the presented approach with a global method. E.g., a higher-abstraction behavior planner based on an arbitration scheme as in [26] could be used to generate good initial guesses.

The focus of the experiments was to analyze the capabilities and the performance of the proposed bi-level planner. As such, the algorithm was evaluated in a tailored simulation environment, where one important modeling assumption was that the human driver is always attentive. However, in real traffic, this is not the case, and human drivers are sometimes distracted and do not respond to the actions of the AV. Therefore, the presented algorithm needs to be combined with an intention estimation, e.g., as presented in [5], to cope with inattentive drivers.

8 Conclusion

In this chapter, we presented an algorithm that is able to generate interaction-aware trajectories for AVs. The interaction between an HD and an AV is modeled as a Stackelberg game, where the human responds rationally to the AV's actions by optimizing its own objective. This leads to a nested optimization problem, which we approximate with an MPC scheme based on a bi-level optimization formulation. To solve it, we reformulated the problem into a single-level representation, exploiting the assumption that the human will act optimally with respect to its objective function. We solve the obtained NLP using derivative-based optimization methods. The presented algorithm is able to solve the interaction-aware trajectory optimization problem in a continuous state and input space. Further, in contrast to existing methods, general nonlinear state and input constraints can be considered, which allows for an accurate dynamics model.

The algorithm enables the AV to anticipate how surrounding HDs will react to its actions. This gives the AV the possibility to deliberately influence the state of the human. Here, simply encoding the desired effect into the AV's objective function is enough to generate complex, interaction-rich behavior, without the need for hand-designed decision heuristics. Further, interactive behavior does not only occur if incentivized in the AV's objective function, but also emerges from optimizing the AV's behavior.

However, care must be taken to prevent the AV from exploiting interactions to further optimize its own objective and thereby generating overly aggressive driving behavior. To prevent such aggressive behavior, the AV's objective is extended to also consider the costs of the HD.

As an alternative to modifying the AV's objective function, we presented a strategy to establish courtesy in the planning algorithm via additional constraints. These constraints allow a motion planner to utilize an egocentric objective function, provided that the negative impact imposed on other vehicles is limited.

The experiments demonstrated the efficacy of our algorithm and suggest that the algorithm can be used in challenging interactive driving scenarios. Further, we could achieve real-time performance even with an unoptimized proof-of-concept implementation.