Abstract
The state of the art in vehicle automation will lead to a mixed traffic environment in the coming years, in which connected and automated vehicles must interact with human-driven vehicles. In this context, intention prediction models are needed that can forecast how a traffic scenario will evolve over the next several seconds with respect to the physical state of the vehicles, the possible maneuvers and the interactions between traffic participants. This article presents a Bayesian approach for vehicle intention forecasting that uses a game-theoretic framework, in the form of a Mixed Strategy Nash Equilibrium (MSNE), as a prior estimate to model the reciprocal influence between traffic participants. The likelihood is then computed based on the Kullback-Leibler divergence. The game is modeled as a static nonzero-sum polymatrix game with individual preferences, a well-known strategic game. Finding the MSNE for these games lies in the PPAD \(\cap\) PLS complexity class, with polynomial-time tractability. The approach shows good results in simulations over a long-term horizon (10 s), and its computational complexity allows for online applications.
1 Introduction
Risk identification in traffic is essential for guaranteeing safe driving in automated vehicles. Dangerous scenarios could arise from an inaccurate estimation of the future trajectories of other traffic participants, given the uncertainty of human behavior. Prediction is therefore necessary in order to guarantee safe decision making. Nevertheless, a reliable estimation cannot be limited to a short-term prediction based on dynamic or kinematic models, but has to take into account possible future interactions and influence between the traffic participants in the scenario.
Many trajectory predictors are data-driven and suffer from exponentially increasing sample complexity as the number of agents increases, since they attempt to predict trajectories across the joint space of a multi-agent environment. Edge cases constitute another challenge, particularly when the training data set largely comprises straightforward traffic scenarios.
This paper presents a methodology to predict the future trajectories of vehicles in traffic, considering both the sensor information on the current state of the vehicles and the interactions and mutual influence between them that can arise from possible future traffic outcomes. The proposed model is therefore multi-agent, does not rely on data-driven learning, and is generally adaptable to any road topology. The contributions of this study are the following:
-
1.
Definition of an innovative and effective framework for vehicle intention prediction through the use of the Bayes’ theorem. The approach considers both the rational prior outcome of the traffic scenario through a Mixed Strategy Nash Equilibrium (MSNE) and the current evidence from the vehicle data as the likelihood. The game-theoretic framework takes into account the interactions and reciprocal influence between the vehicles, while the likelihood corrects the prior estimate considering the actual short-term trajectories that the agents are going to take.
-
2.
The model outputs probability distributions over a set of possible trajectories that vehicles can take. The finite action space of the agents allows the game to be modeled as a static nonzero-sum polymatrix game (network coordination game) with individual preferences. Polymatrix games with individual preferences belong to the PPAD \(\cap\) PLS complexity class, with polynomial-time tractability [1,2,3]. The computational time is compatible with real-time application. The simplification of the game does not negatively impact performance, as the results are still acceptable even for long-term horizons of 10 seconds.
The remainder of this paper is organized as follows. In Sect. 2, some background related to the different models of trajectory prediction algorithms is provided. The formulation of the problem can be found in Sect. 3. In Sect. 4, the approach is presented. Section 5 shows the details of a case study. In Sect. 6, results in a simulation environment in terms of performance and computational time are provided. Finally, the results and potential future research directions are discussed in Sect. 7.
2 Related Works
Most motion prediction approaches can be categorized following the three-level scheme proposed by Ref. [4]: (1) Physics-based motion prediction algorithms, which rely on dynamic and kinematic models of the system [5, 6]; (2) Maneuver-based motion models, which consider the maneuver intention of the traffic participant and incorporate a high-level strategic layer [7,8,9]; (3) Interaction-aware prediction algorithms, which consider the possible interactions and inter-dependencies between the vehicles in the traffic scenario [10, 11].
The physics-based motion prediction algorithms have an acceptable reliability only over a short-term horizon, since the possible maneuvers and the long-term interactions between the vehicles are ignored. The maneuver-based motion models also fall short in the context of a complex traffic scenario. Indeed, as stated by Ref. [10], many trajectory prediction approaches focus on estimating marginal prediction samples of possible future trajectories of single vehicles, failing to consider interactions and mutual influence between the traffic participants. The interaction-aware prediction algorithms try to provide a solution to this problem by modeling the multi-agent environment.
A second classification, transversal to the previous one and proposed by Ref. [8], divides the approaches into: (1) Deterministic models, which predict for each agent a unique trajectory that is considered the most probable [12,13,14]; (2) Stochastic models, which define a non-deterministic framework for the estimation, for instance predicting the likelihood of a finite set of outcome-representative trajectories [8, 15,16,17,18], or defining a probabilistic state-space occupancy grid [19]. The predictive method proposed in this article can be classified as stochastic, maneuver-based and interaction-aware. It is stochastic because the output is not a single trajectory but a probability distribution over a set of trajectories; maneuver-based, since maneuvers are predicted; and interaction-aware, because the forecast outcome includes the result of an MSNE within a multi-agent, interactive, game-theoretic framework.
Game theory is a powerful approach that can be used in prediction and decision making. Recent examples can be found in Refs. [20,21,22]. In Ref. [22], Nash and Stackelberg equilibria are applied for human-like decision making, modeling the different driving styles and social interaction characteristics. In Ref. [20], the authors define an online method of predicting multi-agent interactions by estimating their SVO (Social Value Orientation). The interactions between agents are modeled as a best-response game and the control policy is found by solving the dynamic game and finding the Nash equilibrium. In Ref. [21], the other agents’ cost function parameters are estimated online and then used to find the Nash equilibrium in a discretized dynamic game. These works model the interaction as a multi-agent dynamic game, since the action space consists of the agents’ vehicle inputs. However, the problem of finding a Nash equilibrium of a dynamic game can be computationally infeasible in real-time applications, particularly under a long time horizon. For this reason, the current study considers the high-level decisions that vehicles can take, i.e. the possible maneuvers in the scenario, as the action space. This yields a finite action space and an N-person static finite polymatrix game that can always be solved by finding a Nash equilibrium in mixed strategies [23].
The approach proposed in this article has been inspired particularly by the work of Refs. [11, 15] and [24]. In Ref. [15], the inference of a distribution of high-level, abstract driving maneuvers has been taken as reference for the desired output. Moreover, the philosophy underlying the Bayesian network proposed in Ref. [15] reflects the Bayesian approach proposed in this article. In Ref. [11], a Nash equilibrium in mixed and behavioural strategies is used to predict the future vehicles’ maneuvers. The mixed strategies for a player are a probability distribution over all of the player’s pure strategies, i.e. the possible maneuvers. Finally, the computation of the MSNE utilizes a merit function, known as the Gradient-based Nikaido-Isoda (GNI) function, introduced in Ref. [24].
3 Problem Statement
For each traffic participant in the scenario, a set of representative trajectories is computed based on map information (Fig. 1). These trajectories represent particular maneuvers and behaviours that the vehicle could adopt in the future. The definition of these trajectories has been inspired particularly by the work of Ref. [15]. The trajectory computation is performed in two stages: (1) computation of the geometrical path through a Particle Swarm Optimization algorithm; (2) definition of the acceleration profile along the path. Local uncertainty along the trajectory is introduced through a kinematic evolution model with Gaussian noise on the vehicle state and input variables.
The trajectories defined for each vehicle are:
-
Acceleration trajectory: trajectory in which \(a \sim N(a \mid \mu _a, \sigma ^{2}_{a})\), where \(\mu _a = 1.5 \ \mathrm{m/s}^{2}\) and \(\sigma ^{2}_{a} = \sigma _{0}^{2} + \alpha t\), \(\sigma _{0}^{2} = 0.5 \ \textrm{m}^{2}/\textrm{s}^{4}\), \(\alpha = 1\text{e}-3\)
-
Constant speed trajectory: trajectory in which \(a \sim N(a \mid \mu _a, \sigma ^{2}_{a})\), where \(\mu _a = 0 \ \mathrm{m/s}^{2}\) and \(\sigma ^{2}_{a} = \sigma _{0}^{2} + \alpha t\), \(\sigma _{0}^{2} = 0.5 \ \textrm{m}^{2}/\textrm{s}^{4}\), \(\alpha = 1\text{e}-3\)
-
Braking trajectory: trajectory in which the acceleration is the output of an Intelligent Driver Model (IDM) when the trajectory overlaps with trajectories of other vehicles, \(a \sim N(a \mid \mu _{a}, \sigma ^{2}_{a})\), where \(\mu _{a} = \textrm{IDM}\) or \(-0.5 \ \mathrm{m/s}^{2}\) (if there is no overlapping) and \(\sigma ^{2}_{a} = \sigma _{0}^{2} + \alpha t\), \(\sigma _{0}^{2} = 0.5 \ \textrm{m}^{2}/\textrm{s}^{4}\), \(\alpha = 1\text{e}-3\)
-
Harsh braking trajectory: trajectory in which \(a \sim N(a \mid \mu _a, \sigma ^{2}_{a})\), where \(\mu _a = -3.0 \ \mathrm{m/s}^{2}\) and \(\sigma ^{2}_{a} = \sigma _{0}^{2} + \alpha t\), \(\sigma _{0}^{2} = 0.5 \ \textrm{m}^{2}/\textrm{s}^{4}\), \(\alpha = 1{\text{e}}-3\)
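The braking trajectory above relies on the Intelligent Driver Model. As a minimal sketch, the standard IDM acceleration law can be written as follows; the parameter values here are illustrative assumptions, not the ones used in this article:

```python
import numpy as np

def idm_acceleration(v, v_lead, gap,
                     v0=10.0, T=1.5, a_max=1.5, b=2.0, s0=2.0, delta=4):
    """Standard Intelligent Driver Model (IDM) acceleration.

    v: ego speed, v_lead: leader speed, gap: bumper-to-bumper distance.
    v0 (desired speed), T (time headway), a_max, b (comfortable braking),
    s0 (standstill gap) and delta are illustrative parameter choices.
    """
    dv = v - v_lead  # closing speed
    # Desired dynamic gap s*
    s_star = s0 + v * T + v * dv / (2.0 * np.sqrt(a_max * b))
    return a_max * (1.0 - (v / v0) ** delta - (s_star / gap) ** 2)
```

On a free road the model accelerates toward the desired speed, while a small gap to a slower leader produces a braking command, which is the behavior exploited by the braking trajectory.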
Considering the state vector \({\varvec{x}} = \{ \ x, \ y, \ \psi , \ v \}^{\text{T}}\), where x and y are the Cartesian coordinates of the position, \(\psi\) is the heading and v is the speed, the state at time step t along the trajectory can be represented by the following normal distribution:
$$\begin{aligned} p({\varvec{x}}_{t} \mid \tau ) = N({\varvec{x}}_{t} \mid \varvec{\mu }_{t}^{\tau }, \varvec{\varSigma }_{t}^{\tau }) \end{aligned}$$(1)
where \(p({\varvec{x}}_{t} \mid \tau )\) is the probability density function (pdf) of the state \({\varvec{x}}\) conditioned by the choice of the trajectory \(\tau\) at time step t, \(\varvec{\mu }_{t}^{\tau }\) comes from the computation of trajectory \(\tau\) and \(\varvec{\varSigma }_{t}^{\tau }\) is the covariance matrix on the state space that gives the local uncertainty. Starting from a simple bicycle model of the system \(\varvec{\dot{x}}(t) = {\varvec{f}}({\varvec{x}}(t), {\varvec{u}}(t))\), where \({\varvec{u}}(t) = \{ a(t), \delta (t) \}^{\text{T}}\) is the vector of the inputs (acceleration a(t) and steering angle \(\delta (t)\)), the required steering angle \(\delta (t)\) is computed through inverse kinematics. Considering the linearized time-discrete version \({\varvec{x}}_{t+1} = {\varvec{A}}_{t}{\varvec{x}}_{t} + {\varvec{B}}_{t}{\varvec{u}}_{t}\), the dynamic model of the covariance matrix is defined:
$$\begin{aligned} \varvec{\varSigma }_{t+1}^{\tau } = {\varvec{A}}_{t}\varvec{\varSigma }_{t}^{\tau }{\varvec{A}}_{t}^{\textrm{T}} + {\varvec{B}}_{t}\varvec{\varGamma }_{t}{\varvec{B}}_{t}^{\textrm{T}} \end{aligned}$$(2)
where \(\varvec{\varGamma }_{t} = \textrm{diag}(\sigma _{a}^{2}(t), \sigma _{\delta }^{2}(t))\) is the diagonal covariance matrix of the inputs, with variances linearly increasing over time.
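The covariance dynamics described above can be sketched as a single propagation step. In this minimal illustration the matrices \(A_t\) and \(B_t\) would come from the linearized bicycle model; here they are left generic, and the numerical values below are made up:

```python
import numpy as np

def propagate_covariance(Sigma, A, B, sigma_a2, sigma_delta2):
    """One step of Sigma_{t+1} = A Sigma A^T + B Gamma B^T,
    with Gamma = diag(sigma_a^2, sigma_delta^2) the input covariance."""
    Gamma = np.diag([sigma_a2, sigma_delta2])
    return A @ Sigma @ A.T + B @ Gamma @ B.T

# Illustrative 4-state example (x, y, psi, v) with made-up A and B
Sigma0 = 0.1 * np.eye(4)
A = np.eye(4)
B = np.zeros((4, 2))
B[3, 0] = 1.0   # acceleration noise enters the speed state
B[2, 1] = 1.0   # steering noise enters the heading state
Sigma1 = propagate_covariance(Sigma0, A, B, sigma_a2=0.5, sigma_delta2=0.01)
```

The propagated matrix stays symmetric, and the input variances inflate the uncertainty of the states they drive, which is how the local uncertainty grows along each pre-computed trajectory.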
The policy of the i-th traffic participant \(\pi (\tau _{j} \mid {\varvec{x}}_{0})\) is a probability distribution over the finite number of available trajectories \(\tau _{j} ,\ \ j = 1,\ldots , N_{i}\) and represents the future intention of the vehicle, conditioned by the actual vehicle state \({\varvec{x}}_{0}\). Marginalizing over the agent policy, the pdf of the state vector \({\varvec{x}} = \{ \ x, \ y, \ \psi , \ v \}^{\text{T}}\) at time \({t}\) conditioned by the initial state \({\varvec{x}}_{0}\) is:
$$\begin{aligned} p({\varvec{x}}_{t} \mid {\varvec{x}}_{0}) = \sum _{j=1}^{N_{i}} \pi (\tau _{j} \mid {\varvec{x}}_{0}) \, N({\varvec{x}}_{t} \mid \varvec{\mu }_{t}^{\tau _{j}}, \varvec{\varSigma }_{t}^{\tau _{j}}) \end{aligned}$$(3)
The result is a Gaussian Mixture Model distribution, weighted by the agent policy \(\pi (\tau _{j} \mid {\varvec{x}}_{0})\). This policy represents the strategic uncertainty over the intentions of the vehicle and is the object of interest of the prediction approach presented in this article.
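The policy-weighted mixture can be illustrated with a minimal sketch; the weights, means and covariances below are made-up numbers for a 2-D position slice of the state:

```python
import numpy as np

def gaussian_pdf(x, mu, Sigma):
    """Multivariate normal density N(x | mu, Sigma)."""
    d = x - mu
    k = len(mu)
    norm = np.sqrt((2 * np.pi) ** k * np.linalg.det(Sigma))
    return np.exp(-0.5 * d @ np.linalg.solve(Sigma, d)) / norm

def state_pdf(x, policy, mus, Sigmas):
    """Gaussian mixture p(x_t | x_0) = sum_j pi(tau_j | x_0) N(x_t | mu, Sigma)."""
    return sum(w * gaussian_pdf(x, mu, S)
               for w, mu, S in zip(policy, mus, Sigmas))
```

A usage example: with a single component (policy weight 1) centered at the origin with identity covariance, the density at the mean is \(1/(2\pi)\), and mixing identical components leaves the density unchanged.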
4 Application of the Bayes’ Theorem
The Bayes’ theorem is applied:
$$\begin{aligned} \pi (\tau _{j} \mid {\varvec{x}}_{0}) = \frac{p({\varvec{x}}_{0} \mid \tau _{j})\,p(\tau _{j})}{\sum _{k=1}^{N_{i}} p({\varvec{x}}_{0} \mid \tau _{k})\,p(\tau _{k})} \end{aligned}$$(4)
where \(p(\tau _{j})\) is the prior probability of the trajectory \(\tau _{j}\) and \(p({\varvec{x}}_{0} \mid \tau _{j})\) is the likelihood, which gives a measure of the compatibility between the current evidence coming from the data \({\varvec{x}}_{0}\) and the pre-computed trajectory \(\tau _{j}\). The prior \(p(\tau _{j})\) is the result of an MSNE, while \(p({\varvec{x}}_{0} \mid \tau _{j})\) is found by defining a likelihood function.
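Operationally, the update reduces to an element-wise product of prior and likelihood followed by normalization over the trajectory set. A minimal sketch with made-up numbers:

```python
import numpy as np

def posterior(prior, likelihood):
    """Bayes update over the trajectory set:
    pi(tau_j | x_0)  is proportional to  p(x_0 | tau_j) * p(tau_j)."""
    unnorm = np.asarray(prior) * np.asarray(likelihood)
    return unnorm / unnorm.sum()
```

For example, a prior of [0.6, 0.4] over two trajectories combined with likelihoods [0.1, 0.9] shifts most of the posterior mass to the second trajectory, which is exactly the correction mechanism described above.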
4.1 Prior Probability Through Mixed Strategy Nash Equilibrium
The MSNE is the solution to a non-cooperative game involving two or more players, considering mixed strategies (probability distributions over the action space) instead of pure strategies. A mixed strategy profile is considered an MSNE if each player’s strategy is the best response to the strategies of all other players. The decision to use an MSNE as the prior distribution comes from the hypothesis that every traffic participant, as a rational player, tends to adopt its optimal strategy in the multi-agent traffic scenario.
Each vehicle (player) \(i = 1,\ldots ,M\) in the scenario is supposed to choose among a finite set of \(N_{i}\) trajectories, defined by the notation \({\mathcal {T}}^{i} = \{\tau _{1}^{i},\ldots ,\tau _{N_{i}}^{i} \}\). N indicates the total number of trajectories that the vehicles can take, i.e. \(N = \sum _{j=1}^{M}N_{j}\).
Let the notation
$$\begin{aligned} \varvec{\theta }_{i} = \left[ p(\tau _{1}^{i}),\ldots ,p(\tau _{N_{i}}^{i}) \right] ^{\textrm{T}} \in \varTheta , \quad \varTheta = \left\{ \varvec{\theta } \ : \ \theta _{j} \ge 0, \ \textstyle \sum _{j}\theta _{j} = 1 \right\} \end{aligned}$$(5)
denote the mixed-strategy vector of player i, i.e. the probability distribution over the available trajectories, while
$$\begin{aligned} \varvec{\gamma } = \left[ \varvec{\theta }_{1}^{\textrm{T}},\ldots ,\varvec{\theta }_{M}^{\textrm{T}} \right] ^{\textrm{T}} \in \varGamma \end{aligned}$$(6)
denotes the vector of the mixed strategies of all the M players, with \(\varGamma\) the corresponding product of probability simplices.
Let \(\varvec{\gamma }_{-i} = \left[ \varvec{\theta }_{1}^{\textrm{T}},\ldots ,\varvec{\theta }_{i-1}^{\textrm{T}},\varvec{\theta }_{i+1}^{\textrm{T}},\ldots ,\varvec{\theta }_{M}^{\textrm{T}} \right] ^{\textrm{T}}\) denote the set of mixed strategies of all players except i. The notation \(\varvec{\tau } = \{ \tau ^{1},\ldots , \tau ^{M} \}\) indicates the joint pure strategy among all the players (player i chooses trajectory \(\tau ^{i}\)), while \(f_{i}\) denotes the payoff function of player i, which takes into account safety, efficiency and comfort.
The following problem is considered:
$$\begin{aligned} \varvec{\theta }_{i}^{*} = \underset{\varvec{\theta }_{i} \in \varTheta }{\arg \max } \ \underset{\varvec{\tau } \sim (\varvec{\theta }_{i}, \varvec{\gamma }_{-i}^{*})}{{\mathbb {E}}} \left[ f_{i}(\varvec{\tau }) \right] , \quad i = 1,\ldots ,M \end{aligned}$$(7)
A point \(\varvec{\gamma }^{*}\) that satisfies Eq. (7) is called a Nash equilibrium (NE). Every N-person static finite game in normal form admits a noncooperative NE solution in mixed strategies [23].
4.1.1 Payoff Function
The payoff function \(f_{i}\) for each traffic participant takes into account safety (\(f_{i}^{\textrm{S}}\)), comfort (\(f_{i}^{\textrm{C}}\)) and efficiency (\(f_{i}^{\textrm{E}}\)) with weighting coefficients (\(\alpha ^{\textrm{S}}\), \(\alpha ^{\textrm{C}}\), \(\alpha ^{\textrm{E}}\)), as shown in Eq. (8). The definition of the payoff function considering pure strategies is presented here, meaning that player i chooses trajectory \(\tau ^{i}\) with probability 1.
$$\begin{aligned} f_{i}(\varvec{\tau }) = \alpha ^{\textrm{S}} f_{i}^{\textrm{S}}(\varvec{\tau }) + \alpha ^{\textrm{C}} f_{i}^{\textrm{C}}(\tau ^{i}) + \alpha ^{\textrm{E}} f_{i}^{\textrm{E}}(\tau ^{i}) \end{aligned}$$(8)
The payoff function for safety \(f_{i}^{S}\) depends on the joint pure strategy of all the players \(\varvec{\tau }\), while both the ones for efficiency \(f_{i}^{\textrm{E}}\) and comfort \(f_{i}^{\textrm{C}}\) depend only on the trajectory chosen by player i, that is \(\tau ^{i}\). The weighting coefficients have been tuned to optimize the performance in the simulation environment used in this article [25].
The safety payoff is computed following Eq. (9):
where \(f_{i,j}^{S}(\tau ^{i}, \tau ^{j})\) is the safety payoff for player i (and player j, by symmetry) considering trajectory \(\tau ^{i}\) for player i and trajectory \(\tau ^{j}\) for player j, \(w^{S}\) is the penalty for a crash, \(\gamma\) is the discount factor, and \(\varvec{\mu }_{t}^{\tau ^{j}}\) is the mean vector (x, y) of the multivariate normal distribution \(N({\varvec{x}}_{t} \mid \varvec{\mu }_{t}^{\tau ^{j}}, \varvec{\Sigma }_{t}^{\tau ^{j}} )\), which gives the pdf of the position of vehicle j on its trajectory \(\tau ^{j}\) at time step t.
The definition of the safety payoff, in particular the exponent in Eq. (9), is inspired by the Mahalanobis distance [26], with a slight modification. Indeed, the Mahalanobis distance defines a distance measure between a point and a distribution, while here a distance between two distributions is needed. It is therefore necessary to modify the covariance matrix, that is \(\varvec{\hat{\varSigma }} = 0.5(\varvec{\varSigma }_{t}^{\tau ^{j}} + \varvec{\varSigma }_{t}^{\tau ^{i}}) + \beta {\varvec{I}}\). The covariance matrix is therefore the average of the covariance matrices of the two distributions, with the additional term \(\beta {\varvec{I}}\). This term has two purposes: (1) to guarantee in any case a payoff when the two distributions are close, even with small covariance matrices; (2) to avoid \(\varvec{\hat{\Sigma }}\) being badly conditioned for the inversion. A further description is given in Fig. 2.
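The Gaussian-overlap term based on the modified Mahalanobis distance can be sketched as follows. The value of \(\beta\) is an illustrative assumption; the aggregation over time steps with the discount factor \(\gamma\) and the crash penalty \(w^{S}\) follows Eq. (9) and is omitted here:

```python
import numpy as np

def overlap_penalty(mu_i, Sigma_i, mu_j, Sigma_j, beta=1e-2):
    """Overlap term between two position distributions at one time step,
    using the modified covariance Sigma_hat = 0.5 (Sigma_i + Sigma_j) + beta I.

    Returns a value in (0, 1]: 1 when the means coincide, ~0 when far apart.
    """
    Sigma_hat = 0.5 * (Sigma_i + Sigma_j) + beta * np.eye(len(mu_i))
    d = mu_i - mu_j
    # Squared Mahalanobis-like distance with the regularized covariance
    return np.exp(-0.5 * d @ np.linalg.solve(Sigma_hat, d))
```

The \(\beta {\varvec{I}}\) term keeps the solve well conditioned even when both trajectory covariances are nearly singular, which mirrors the two purposes listed above.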
The comfort payoff is the following:
where \(a_{t}^{\textrm{long}}\) and \(a_{t}^{\textrm{lat}}\) are the longitudinal and lateral acceleration at time step t of the trajectory \(\tau ^{i}\), while \(w_{\textrm{long}}^{\textrm{C}}\) and \(w_{\textrm{lat}}^{\textrm{C}}\) are the respective penalties.
The efficiency payoff is the following:
In Eq.(11), \(v_{\textrm{lim}}\) is the speed limit and \(v_{t}\) the speed at time step t of the trajectory \(\tau ^{i}\).
Considering the previous definitions, it is now possible to extend them to the general case of mixed strategies, in which the action of player i is a probability distribution over the possible trajectories: \(\varvec{\theta }_{i} = \left[ p(\tau _{1}^{i}),\ldots ,p(\tau _{N_{i}}^{i}) \right] ^{\textrm{T}}\), where \(\tau _{k}\), with \(k = 1,\ldots ,N_{i}\), are the available trajectories of player i.
Reminding that \(\varvec{\gamma } = [ \varvec{\theta }_{1}^{\textrm{T}},\ldots ,\varvec{\theta }_{M}^{\textrm{T}} ]^{\textrm{T}}\) denotes the set of the mixed-strategies of all the M players, the objective is to find an expression for the expected payoff in case of mixed strategies, i.e. \(\underset{\varvec{\tau } \sim \varvec{\gamma }}{{\mathbb {E}}} [ f_{i}(\varvec{\tau })]\).
4.1.2 Polymatrix Coordination Game with Individual Preferences
The definition of the players’ payoff allows the game to be modeled as a polymatrix coordination game with individual preferences, a well-known strategic game. Let us define the matrix \({\varvec{P}}^{i,j} \in {\mathbb {R}}^{N_{i} \times N_{j}}\) as the matrix of safety payoffs between vehicle i and vehicle j. The \(p_{n,m}^{i,j}\) element of the matrix is:
$$\begin{aligned} p_{n,m}^{i,j} = f_{i,j}^{S}(\tau _{n}^{i}, \tau _{m}^{j}), \quad \tau _{n}^{i} \in {\mathcal {T}}^{i}, \ \tau _{m}^{j} \in {\mathcal {T}}^{j} \end{aligned}$$(12)
where \({\mathcal {T}}^{i}\) and \({\mathcal {T}}^{j}\) indicate the sets of trajectories available to vehicles i and j. Therefore, the (n, m) element of the matrix \({\varvec{P}}^{i,j}\) is the safety payoff for vehicles i and j given that vehicle i chooses trajectory \(\tau _{n}^{i}\) and vehicle j trajectory \(\tau _{m}^{j}\).
Let us recall that N indicates the total number of all the available trajectories of the vehicles in the scenario. Consider the matrix \(\varvec{\hat{Q}}^{i} \in {\mathbb {R}}^{N_{i} \times N}\), defined as the horizontal concatenation of the pairwise payoff blocks, with a zero block in position i:
$$\begin{aligned} \varvec{\hat{Q}}^{i} = \alpha ^{\textrm{S}} \left[ {\varvec{P}}^{i,1},\ldots ,{\varvec{P}}^{i,i-1}, {\varvec{0}}^{N_{i} \times N_{i}}, {\varvec{P}}^{i,i+1},\ldots ,{\varvec{P}}^{i,M} \right] \end{aligned}$$(13)
It is possible to define the matrix \({\varvec{Q}}^{i} \in {\mathbb {R}}^{N \times N}\) for each traffic participant i, whose rows corresponding to the strategies of player i contain \(\varvec{\hat{Q}}^{i}\), while all the other rows are zero:
$$\begin{aligned} {\varvec{Q}}^{i} = \left[ {\varvec{0}}^{N_{i} \times \sum _{j=1}^{i-1}N_{j}} \ {\varvec{I}}_{i} \ {\varvec{0}}^{N_{i} \times \sum _{j=i+1}^{M}N_{j}} \right] ^{\textrm{T}} \varvec{\hat{Q}}^{i} \end{aligned}$$(14)
Consider the vector \(\varvec{\hat{r}}^{i} \in {\mathbb {R}}^{N_{i}}\) for each vehicle i, whose element \(\hat{r}_{j}^{i}\) collects the individual comfort and efficiency terms:
$$\begin{aligned} \hat{r}_{j}^{i} = \alpha ^{\textrm{C}} f_{i}^{\textrm{C}}(\tau _{j}^{i}) + \alpha ^{\textrm{E}} f_{i}^{\textrm{E}}(\tau _{j}^{i}) \end{aligned}$$(15)
The vector \({\varvec{r}}^{i} \in {\mathbb {R}}^{N}\) for each vehicle i is defined analogously, by zero-padding \(\varvec{\hat{r}}^{i}\) outside the block of player i:
$$\begin{aligned} {\varvec{r}}^{i} = \left[ {\varvec{0}}^{N_{i} \times \sum _{j=1}^{i-1}N_{j}} \ {\varvec{I}}_{i} \ {\varvec{0}}^{N_{i} \times \sum _{j=i+1}^{M}N_{j}} \right] ^{\textrm{T}} \varvec{\hat{r}}^{i} \end{aligned}$$(16)
where \({\varvec{I}}_{i} \in {\mathbb {R}}^{N_{i} \times N_{i}}\) is the identity matrix.
The expected payoff in the mixed strategy game is:
$$\begin{aligned} \underset{\varvec{\tau } \sim \varvec{\gamma }}{{\mathbb {E}}} [ f_{i}(\varvec{\tau })] = \sum _{\varvec{\tau }} \left( \prod _{k=1}^{M} \theta _{k,\tau ^{k}} \right) f_{i}(\varvec{\tau }) \end{aligned}$$(17)
which, considering the definitions in Eqs. (12) and (15), becomes:
$$\begin{aligned} \underset{\varvec{\tau } \sim \varvec{\gamma }}{{\mathbb {E}}} [ f_{i}(\varvec{\tau })] = \alpha ^{\textrm{S}} \sum _{j \ne i} \varvec{\theta }_{i}^{\textrm{T}} {\varvec{P}}^{i,j} \varvec{\theta }_{j} + \varvec{\theta }_{i}^{\textrm{T}} \varvec{\hat{r}}^{i} \end{aligned}$$(18)
The expression can be made quadratic with respect to \(\varvec{\gamma } \in {\mathbb {R}}^{N}\) by means of the definitions in Eqs. (14) and (16):
$$\begin{aligned} \underset{\varvec{\tau } \sim \varvec{\gamma }}{{\mathbb {E}}} [ f_{i}(\varvec{\tau })] = \varvec{\gamma }^{\textrm{T}}{\varvec{Q}}^{i}\varvec{\gamma } + \varvec{\gamma }^{\textrm{T}}{\varvec{r}}^{i} \end{aligned}$$(19)
The component \(\varvec{\gamma }^{\textrm{T}}{\varvec{Q}}^{i}\varvec{\gamma }\) gives the payoff arising from the interaction with other drivers, and is thus linked with safety and trajectory overlapping. The component \(\varvec{\gamma }^{\textrm{T}}{\varvec{r}}^{i}\) is the payoff that depends exclusively on the choice of the player, and is therefore linked with comfort and efficiency.
Note that Eq. (18) is the definition of a static nonzero-sum polymatrix coordination game (network coordination game) with individual preferences. The matrix \({\varvec{P}}^{i,j} \in {\mathbb {R}}^{N_{i} \times N_{j}}\) is the symmetric payoff matrix of the bimatrix game \(\{i,j\}\), and the vector \(\varvec{\hat{r}}^{i} \in {\mathbb {R}}^{N_{i}}\) represents the individual preference function of player i. Polymatrix games with individual preferences belong to the PPAD \(\cap\) PLS complexity class, with polynomial-time tractability [1, 2].
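The quadratic structure of the expected payoff can be illustrated with a minimal two-player sketch; all payoff values below are made-up numbers, and the safety weight is folded into the coupling block for brevity:

```python
import numpy as np

# Two players with two trajectories each; P12 is a made-up safety-payoff
# block P^{1,2} (negative entries penalize overlapping trajectory pairs).
P12 = np.array([[-1.0, 0.0],
                [0.0, -0.5]])
N1, N2 = 2, 2
N = N1 + N2

# Q^1 embeds P^{1,2} in the block coupling player 1's and player 2's strategies;
# all other rows/columns are zero.
Q1 = np.zeros((N, N))
Q1[:N1, N1:] = P12

# Individual preference vector r^1 (comfort + efficiency terms), made-up numbers,
# nonzero only in player 1's block.
r1 = np.array([-0.2, -0.1, 0.0, 0.0])

def expected_payoff(gamma, Q, r):
    """Quadratic form of the expected payoff: gamma^T Q gamma + gamma^T r."""
    return gamma @ Q @ gamma + gamma @ r

# Uniform mixed strategies stacked into gamma = [theta_1^T, theta_2^T]^T
gamma = np.array([0.5, 0.5, 0.5, 0.5])
```

Evaluating `expected_payoff(gamma, Q1, r1)` reproduces by hand the two components discussed above: the interaction term \(\varvec{\gamma }^{\textrm{T}}{\varvec{Q}}^{1}\varvec{\gamma }\) and the individual term \(\varvec{\gamma }^{\textrm{T}}{\varvec{r}}^{1}\).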
The Gradient-based Nikaido-Isoda (GNI) function [24] is used to find the NE. With the payoff function defined in Eq. (19), the gradient can be computed analytically, with an evident gain in terms of computational efficiency.
4.1.3 Optimization via Gradient-Based Nikaido-Isoda Function
The NE of the N-player game considered in this article (Eq. (7)) is found using the GNI function, introduced in Ref. [24]. Let us indicate the expected payoff for player i with the function \(g_{i}\) for simplicity of notation:
$$\begin{aligned} g_{i}(\varvec{\gamma }) := \underset{\varvec{\tau } \sim \varvec{\gamma }}{{\mathbb {E}}} [ f_{i}(\varvec{\tau })] = \varvec{\gamma }^{\textrm{T}}{\varvec{Q}}^{i}\varvec{\gamma } + \varvec{\gamma }^{\textrm{T}}{\varvec{r}}^{i} \end{aligned}$$(20)
The GNI function \(V(\varvec{\gamma },\eta )\) is the following:
$$\begin{aligned} V(\varvec{\gamma },\eta ) = \sum _{i=1}^{M} \left[ g_{i}(\varvec{\gamma }) - g_{i}(\varvec{\tilde{\gamma }}(i, \eta )) \right] \end{aligned}$$(21)
Recalling that M is the number of players (traffic participants) in the scenario, \(\varvec{\gamma } = [ \varvec{\theta }_{1}^{\textrm{T}},\ldots ,\varvec{\theta }_{M}^{\textrm{T}} ]^{\textrm{T}} \in {\mathbb {R}}^{N}\) with \(N = \sum _{j=1}^{M}N_{j}\) is the vector of the mixed strategies of the players and \(\varvec{\theta }_{j} \in {\mathbb {R}}^{N_{j}}\) is the mixed-strategy vector of player j. \(\nabla _{j}g_{j}(\varvec{\gamma })\) denotes the gradient of \(g_{j}\) with respect to \(\varvec{\theta }_{j}\), which indicates the direction of maximum increase of the payoff for player j in its action space \(\Theta _{j}\).
The idea underlying this merit function is that every player can locally improve their objective using the steepest-descent direction, instead of computing a globally optimal solution [24].
The gradient of \(V_{i}(\varvec{\gamma },\eta ) := g_{i}(\varvec{\gamma }) - g_{i}(\varvec{\tilde{\gamma }}(i, \eta ))\) is the following:
$$\begin{aligned} \nabla V_{i}(\varvec{\gamma },\eta ) = \nabla g_{i}(\varvec{\gamma }) - \left( {\varvec{I}} - \eta {\varvec{H}}_{g_{i}}(\varvec{\gamma }){\varvec{E}}_{i} \right) \nabla g_{i}(\varvec{\tilde{\gamma }}(i, \eta )) \end{aligned}$$(22)
where \({\varvec{E}}_{i} := {\varvec{F}}_{i}{\varvec{F}}_{i}^{\textrm{T}} \in {\mathbb {R}}^{N \times N}\), \({\varvec{F}}_{i} = [ {\varvec{0}}^{N_{i} {\times} \sum _{j=1}^{i-1}N_{j}} \ {\varvec{I}}_{i} \ {\varvec{0}}^{N_{i} {\times} \sum _{j=i+1}^{M}N_{j}}]^{\text{T}} \in {\mathbb {R}}^{N {\times} N_{i}}\), \({\varvec{H}}_{g_{i}}(\varvec{\gamma }) := \nabla (\nabla g_{i}(\varvec{\gamma }))\) is the Hessian matrix, and \({\varvec{I}} \in {\mathbb {R}}^{N \times N}\) and \({\varvec{I}}_{i} \in {\mathbb {R}}^{N_{i} \times N_{i}}\) are identity matrices.
Considering the function \(g_{i}(\varvec{\gamma })\) in Eq. (20), the gradient is:
$$\begin{aligned} \nabla g_{i}(\varvec{\gamma }) = \left( {\varvec{Q}}^{i} + {\varvec{Q}}^{i\textrm{T}} \right) \varvec{\gamma } + {\varvec{r}}^{i} \end{aligned}$$(23)
The modified vector of mixed strategies \(\varvec{\tilde{\gamma }}(i, \eta ) \in {\mathbb {R}}^{N}\) is:
$$\begin{aligned} \varvec{\tilde{\gamma }}(i, \eta ) = \varvec{\gamma } - \eta {\varvec{E}}_{i} \nabla g_{i}(\varvec{\gamma }) \end{aligned}$$(24)
The Hessian \({\varvec{H}}_{g_{i}}(\varvec{\gamma }) \in {\mathbb {R}}^{N \times N}\) is:
$$\begin{aligned} {\varvec{H}}_{g_{i}}(\varvec{\gamma }) = {\varvec{Q}}^{i} + {\varvec{Q}}^{i\textrm{T}} \end{aligned}$$(25)
As is evident, \(\nabla V_{i}(\varvec{\gamma },\eta )\) can be computed analytically by using Eqs. (22), (23), (24) and (25). Finally, the gradient of \(V(\varvec{\gamma },\eta )\) is:
$$\begin{aligned} \nabla V(\varvec{\gamma },\eta ) = \sum _{i=1}^{M} \nabla V_{i}(\varvec{\gamma },\eta ) \end{aligned}$$(26)
This is used for the descent iteration:
$$\begin{aligned} \varvec{\gamma }^{(k+1)} = \varvec{\gamma }^{(k)} - \rho \nabla V(\varvec{\gamma }^{(k)},\eta ) \end{aligned}$$(27)
The fact that \(\varvec{\gamma }^{(k+1)}\) belongs to the feasible space \(\varGamma\) (defined in Eq. (6)) is not ensured by Eq. (27) alone, because the gradient \(\nabla V(\varvec{\gamma }^{(k)},\eta )\) does not in general belong to \(\varGamma\). For this reason it is necessary to project \(\nabla V(\varvec{\gamma }^{(k)},\eta )\) onto \(\varGamma\) before computing the descent step of Eq. (27); this issue is addressed in Sect. 4.1.4.
4.1.4 Gradient Projection
In the optimization problem described in Sect. 4.1.3, the gradient \(\nabla V(\varvec{\gamma }^{(k)},\eta )\) is projected onto the feasible space \(\varGamma\). This is necessary to ensure that \(\varvec{\gamma }^{(k)} \in \varGamma\) \(\ \forall k = 1,2,\ldots\) The projection procedure is based on the consideration that, from the definition of \(\varGamma\) in Eq. (6), every mixed strategy \(\varvec{\theta }_{i}\) must belong to the space \(\varTheta\), defined in Eq. (5). This definition ensures that \(\varvec{\theta }_{i}\) is actually a probability distribution over the possible trajectories of player i, i.e. \(\sum _{j=1}^{N_{i}}\theta _{i,j} = 1\). Considering that \(\nabla V = [\nabla _{1} V,\ldots , \nabla _{i} V,\ldots , \nabla _{M} V]^{\textrm{T}}\), where \(\nabla _{i}V\) is the gradient with respect to \(\varvec{\theta }_{i}\), the following steps are executed for each \(\nabla _{i}V\):
-
1.
Projection properly defined:
$$\begin{aligned} \nabla _{i}^{\parallel } V(\varvec{\gamma }^{(k)},\eta ) = \nabla _{i} V(\varvec{\gamma }^{(k)},\eta ) - <\nabla _{i} V(\varvec{\gamma }^{(k)},\eta ), \varvec{\hat{n}}>\varvec{\hat{n}} \end{aligned}$$(28)
Where \(\varvec{\hat{n}}\) is the unit normal to the hyperplane \(\sum _{j=1}^{N_{i}}\theta _{i,j} = 1\), i.e. \(\varvec{\hat{n}} = {\textbf{1}}/\Vert {\textbf{1}}\Vert\), while the notation \(<\cdot , \cdot>\) indicates the scalar product.
-
2.
Tuning of the module:
-
(a)
Computation of the step:
$$\begin{aligned} \varvec{\theta }_{i}^{(k+1)} = \varvec{\theta }_{i}^{(k)} - \rho \nabla _{i}^{\parallel } V(\varvec{\gamma }^{(k)},\eta ) \end{aligned}$$(29) -
(b)
While \(\ \exists \ \theta _{i,j} < 0\):
$$\begin{aligned} \begin{aligned}&\nabla _{i}^{\parallel } V(\varvec{\gamma }^{(k)},\eta ) = \alpha \nabla _{i}^{\parallel } V(\varvec{\gamma }^{(k)},\eta ) \qquad \qquad \alpha = 0.9 \\&\textrm{repeat} \ \mathrm {(a)} \end{aligned} \end{aligned}$$(30)
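The two projection steps above can be sketched as follows. The step size \(\rho\) is an assumed value, \(\alpha = 0.9\) as in Eq. (30), and the hyperplane normal is normalized so that the tangential projection of Eq. (28) is exact:

```python
import numpy as np

def project_and_step(theta, grad, rho=0.1, alpha=0.9, max_iter=100):
    """Project grad onto the hyperplane sum(theta) = 1, then take a descent
    step, shrinking the projected gradient by alpha until the updated mixed
    strategy stays non-negative (steps 1 and 2 of the projection procedure)."""
    # Unit normal to the hyperplane sum_j theta_j = 1
    n = np.ones_like(grad) / np.sqrt(len(grad))
    # Step 1: tangential component of the gradient (Eq. (28))
    g_par = grad - (grad @ n) * n
    # Step 2: tuning of the module (Eqs. (29)-(30))
    for _ in range(max_iter):
        theta_new = theta - rho * g_par
        if np.all(theta_new >= 0):
            return theta_new
        g_par = alpha * g_par  # shrink the step and retry
    return theta  # fallback: no feasible step found
```

Because the step lies in the tangent plane of the simplex, the components of the updated strategy still sum to one; the shrinking loop then enforces non-negativity.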
4.2 Likelihood Function
The evaluation of the likelihood term \(p({\varvec{x}}_{0} \mid \tau _{j})\), which measures how likely the current state \({\varvec{x}}_{0}\), coming from sensor data, is with respect to trajectory \(\tau _{j}\), is based on the following steps:
-
1.
A short-term trajectory \(\hat{\tau }\) is computed through a simple bicycle model \(f(\cdot )\), starting from current state \({\varvec{x}}_{0} \sim N({\varvec{x}} \mid \varvec{\mu }_{0}, \varvec{\varSigma }_{0})\) and considering the input variables \({\varvec{u}}_{t} = \{a_{t}, \delta _{t} \} \sim N({\varvec{u}} \mid {\varvec{u}}_{0}, \varvec{\varSigma }_{t}^{u})\), therefore considering acceleration and steering angle normally distributed around the initial estimated input \({\varvec{u}}_{0} = \{a_{0}, \delta _{0} \}\).
-
2.
The Kullback-Leibler divergence is measured between the pdf of the short-term trajectory \(p({\varvec{x}}_{t} \mid \hat{\tau })\) and the pdf of the reference trajectory \(p({\varvec{x}}_{t} \mid \tau _{j})\) at each time step t:
$$\begin{aligned} \mathrm {D_{KL}}\left( p({\varvec{x}}_{t} \mid \hat{\tau }) \Vert p({\varvec{x}}_{t} \mid \tau _{j})\right) = \int _{{\mathbb {R}}^{n}} N({\varvec{x}}_{t} \mid \varvec{\mu }_{t}^{\hat{\tau }}, \varvec{\Sigma }_{t}^{\hat{\tau }} )\ln {\frac{N({\varvec{x}}_{t} \mid \varvec{\mu }_{t}^{\hat{\tau }}, \varvec{\Sigma }_{t}^{\hat{\tau }} )}{N({\varvec{x}}_{t} \mid \varvec{\mu }_{t}^{\tau _{j}}, \varvec{\Sigma }_{t}^{\tau _{j}} )}}\,\text{d} {\varvec{x}} \end{aligned}$$(31)
The Kullback-Leibler divergence \(\mathrm {D_{KL}}(P \Vert Q)\) can be seen as the information lost when the distribution Q is used to approximate the distribution P; in this case, considering the whole horizon, it measures how much the trajectory \(\tau _{j} \ (Q)\) fails to represent the "true" short-term trajectory \(\hat{\tau } \ (P)\).
-
3.
\(p({\varvec{x}}_{0} \mid \tau _{j})\) is computed through a soft-max function that takes as a parameter the sum over the horizon of the Kullback-Leibler divergence between the time steps of the two trajectories:
$$\begin{aligned} p({\varvec{x}}_{0} \mid \tau _{j}) = \frac{\exp {(- \beta \sum _{t=1}^{T} \mathrm {D_{KL}}(p({\varvec{x}}_{t} \mid \hat{\tau }) \Vert p({\varvec{x}}_{t} \mid \tau _{j}))})}{\sum _{k=1}^{N_{i}}\exp {( - \beta \sum _{t=1}^{T} \mathrm {D_{KL}}(p({\varvec{x}}_{t} \mid \hat{\tau }) \Vert p({\varvec{x}}_{t} \mid \tau _{k}))})} \end{aligned}$$(32)
The philosophy underlying the computation of \(p({\varvec{x}}_{0} \mid \tau _{j})\) is to measure how coherent the current state of the vehicle \({\varvec{x}}_{0}\), together with the initial input \({\varvec{u}}_{0}\), is with the strategic choice of trajectory \(\tau _{j}\). A graphical representation is given in Fig. 3.
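The likelihood computation above can be sketched with the closed-form KL divergence between multivariate normals, whose per-trajectory sums are then passed through the soft-max of Eq. (32); the temperature \(\beta\) below is an assumed value:

```python
import numpy as np

def kl_gaussians(mu_p, Sp, mu_q, Sq):
    """Closed-form KL( N(mu_p, Sp) || N(mu_q, Sq) ) for multivariate normals."""
    k = len(mu_p)
    d = mu_q - mu_p
    Sq_inv = np.linalg.inv(Sq)
    return 0.5 * (np.trace(Sq_inv @ Sp) + d @ Sq_inv @ d - k
                  + np.log(np.linalg.det(Sq) / np.linalg.det(Sp)))

def likelihood(kl_sums, beta=1.0):
    """Soft-max over the per-trajectory sums of KL divergences:
    trajectories diverging less from the short-term trajectory get more mass."""
    w = np.exp(-beta * np.asarray(kl_sums))
    return w / w.sum()
```

A trajectory identical to the short-term rollout has zero divergence and receives the largest likelihood; larger accumulated divergence decays the weight exponentially.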
5 Case Study
In this section, an application example is reported, briefly showing how the predictor performs.
The simulation environment is Automated Driving Open Research (ADORe) [25], an open source modular software library and toolkit for decision making, planning, control and simulation of automated vehicles, developed by the Institute of Transportation Systems of the German Aerospace Center (DLR).
The trajectory predictor is applied in the intersection scenario shown in Fig. 4. In the simulation, vehicle 2 accelerates up to a speed of \(10 \ \mathrm{m/s}\) while vehicle 1 decelerates smoothly and stops at the intersection. The prediction for this example has a horizon of \(5 \ \text{s}\).
In Table 1, a point estimate is given at approximately \(t = 6 \ \text{s}\), corresponding to the situation shown in Fig. 4. The table provides an example of how the correction mechanism works: the Prior predicts with high probability that vehicle 1 stops at the intersection and vehicle 2 accelerates, but it also leaves room for the opposite possibility, which is strongly reduced by the Likelihood. Note that the Prior does not predict a crash situation: indeed, even if vehicle 1 proceeds with constant speed (16%), vehicle 2 accelerates or stops at the intersection, avoiding the collision. This is also evident in the \({\varvec{P}}^{i,j}\) matrix, defined in Eq. (12) and shown in Table 2.
The performance of the approach is measured using metrics defined in Ref. [27], in particular the minimum average displacement error (minADE):
the minimum final displacement error (minFDE):
and the Missing rate (MR):
The trajectory \(\tau _{j}^{i}\) considered for the computation is the one with the highest posterior probability \(\pi (\tau _{j}^{i} \mid x_{0}^{i})\) for player i. The expression \(\mathbbm {1}_{{\chi }^2_{0.95}} ( \cdot )\) in the MR definition indicates the indicator function, which is equal to one if the null hypothesis of the chi-squared goodness-of-fit test is not rejected and zero otherwise. In particular, the argument of \(\mathbbm {1}_{{\chi }^2_{0.95}} ( \cdot )\) is the test statistic, and the reference distribution is a chi-squared distribution with two degrees of freedom at the 0.95 confidence level. The MR basically tests at each time step whether the actual point of the trajectory belongs to the distribution of the predicted trajectory.
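The displacement-error metrics can be sketched as follows; this is a minimal implementation over K candidate trajectories (the chi-squared test of the MR is omitted):

```python
import numpy as np

def min_ade_fde(candidates, ground_truth):
    """candidates: (K, T, 2) array of predicted (x, y) trajectories,
    ground_truth: (T, 2) actual trajectory.
    Returns (minADE, minFDE) over the K candidate trajectories."""
    # Per-candidate, per-time-step Euclidean displacement errors, shape (K, T)
    errors = np.linalg.norm(candidates - ground_truth[None], axis=-1)
    ade = errors.mean(axis=1)   # average displacement error per candidate
    fde = errors[:, -1]         # final displacement error per candidate
    return ade.min(), fde.min()
```

For instance, if one candidate coincides with the ground truth, both metrics are zero; a candidate rigidly shifted by 1 m yields minADE = minFDE = 1 m.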
Regarding the episode in Fig. 4, the results are shown in Table 3. The table shows that vehicles 1 and 2 have average displacement errors along the episode of 2.79 m and 1.21 m, respectively, with a combined average of 2.0 m. At the last step of the episode, vehicle 1 exhibits a final displacement error of 3.01 m, while vehicle 2 records a final displacement error of 3.06 m, with 3.04 m considering both. This confirms the reliability of the predictions even after 5 s have elapsed. The missing rate considering both vehicles is 0.34, meaning that for 66% of the episode length the vehicles belong to the predicted distributions.
In Fig. 5, the minADE is shown for vehicle 1, vehicle 2 and both vehicles combined. For the vehicle that stops (vehicle 1, Fig. 4), the minADE is higher because of its more dynamic behavior, but it converges to around \(2.7 \ \textrm{m}\) once the vehicle starts to decelerate. For vehicle 2 the minADE is lower, since its behavior is more constant, but it grows rapidly owing to the higher position uncertainty of a vehicle in motion.
In Fig. 6, the real trajectories are shown in red and the predicted trajectories in blue, with probability indicated by the color intensity. A slight shift is applied to each trajectory for readability. The trajectories with the highest posterior probability (dark blue lines) align closely with the actual trajectories (red), confirming the quality of the prediction. The graph gives a clearer picture of the episode, showing both the candidate trajectories considered and the ones actually followed by the vehicles.
6 Results and Discussion
This section presents the results of some simulations in ADORe in terms of minADE, minFDE and MR. Figure 7 shows the scenarios simulated. Here is a brief description of the scenarios:
1. Scenario 1: the case study from the preceding section, in which the car turning left halts at the intersection, allowing the car with high priority to proceed.
2. Scenario 2: the car turning left holds precedence over the car proceeding straight, which stops at the intersection.
3. Scenario 3: the merging car pauses at the intersection, granting passage to the car with high priority.
4. Scenario 4: the car with high priority reduces its speed, permitting the other car to merge smoothly.
Each scenario has been repeated several times. The prediction horizon is 10 s. Predictions are collected during the first 8 s of the simulation, which ends at \(t = 18 \ \text{s}\).
Table 4 shows the results of the simulation scenarios in terms of minADE, minFDE and MR, while Table 5 presents the results of the \(1{\textrm{st}},\ 12{\textrm{th}},\ 24{\textrm{th}},\ 50{\textrm{th}}\) places in the WAYMO competition in 2021. The two tables cannot be compared directly, since the first shows results of predictions in a simulation environment while the second presents results of predictions on a real road dataset. However, the simulator used (ADORe) is not only a simulation environment but also a tool for decision making and control of AVs in traffic, currently used by the Institute of Transportation Systems of the German Aerospace Center, which suggests a high level of realism in the trajectories taken by the vehicles. Figure 8 shows the time required by the algorithm to compute the Nash equilibrium and to perform the complete prediction. From these data, some observations can be drawn:
- The approach shows good performance in simulation over the long-term horizon (10 s) and outperforms the acceptable standards of the WAYMO interactive dataset results. Nevertheless, a fair comparison can only be established once the model is tested on real-road datasets.
- The computation of the equilibrium always takes less than 12 ms, and the whole prediction is performed in less than 120 ms with 11 trajectories in the scenario. This allows an online application of the predictor. Other algorithms based on the computation of Nash equilibria in multi-agent dynamic games often require computation times on the order of seconds. The simulations were carried out on an Intel(R) Core(TM) i7-10850H CPU @ 2.70 GHz.
- The weak point of the algorithm is the trajectory computation, as the Particle Swarm Optimization approach is not computationally efficient. This is reflected in the large gap between the time required for the Nash equilibrium computation and the total time, which is essentially spent on the definition of the trajectories. Nonetheless, the total time still allows an online application of the algorithm.
7 Conclusions
In this paper, an innovative approach for predicting trajectories in traffic is proposed. The approach combines an interaction-aware motion model with a physics-based and maneuver-based model. Bayes' theorem is applied: the prior estimate accounts for the rational evolution of the traffic scenario by computing the Mixed Strategy Nash Equilibrium (MSNE) among the participating vehicles. The likelihood adjusts the prior estimate by incorporating data from the vehicles in terms of position, heading, speed, acceleration and steering angle. This allows for the possibility of irrational decisions by the participating vehicles that may have been discarded in the prior estimate. The output of the approach is a probability distribution over a set of representative trajectories for each vehicle.
This framework, which combines a priori game-theoretic considerations with a posteriori data from the road, constitutes a crucial contribution of this study. Another important contribution is the modeling of the interactive scenario as a polymatrix coordination game with individual preferences, a well-known strategic game with desirable computational complexity. The preliminary and indicative experiments show good results and good computational efficiency. Future areas of development include incorporating a model for online estimation of drivers' behavior and implementing a decision-making algorithm based on this predictor.
Abbreviations
- GNI: Gradient-based Nikaido-Isoda
- minADE: Minimum average displacement error
- minFDE: Minimum final displacement error
- MR: Missing rate
- MSNE: Mixed Strategy Nash Equilibrium
- NE: Nash Equilibrium
- pdf: Probability density function
- PLS: Polynomial Local Search
- PPAD: Polynomial Parity Arguments on Directed graphs
References
Cai, Y., Daskalakis, C.: On minmax theorems for multiplayer games. In: Proceedings of the Annual ACM-SIAM Symposium on Discrete Algorithms. 10, 217–234 (2011). https://doi.org/10.1137/1.9781611973082.20
Rahn, M., Schäfer, G.: Efficient equilibria in polymatrix coordination games. In: Mathematical Foundations of Computer Science. 529–541 (2015)
Fearnley, J., Goldberg, P.W., Hollender, A., Savani, R.: The complexity of gradient descent: CLS = PPAD \(\cap\) PLS. J. ACM 70(1), 1–74 (2022). https://doi.org/10.1145/3568163
Lefevre, S., Vasquez, D., Laugier, C.: A survey on motion prediction and risk assessment for intelligent vehicles. ROBOMECH J. 1, 1 (2014). https://doi.org/10.1186/s40648-014-0001-z
Brännström, M., Coelingh, E., Sjöberg, J.: Model-based threat assessment for avoiding arbitrary vehicle collisions. IEEE Trans. Intell. Transp. Syst. 11, 658–669 (2010)
Kaempchen, N., Schiele, B., Dietmayer, K.C.J.: Situation assessment of an autonomous emergency brake for arbitrary vehicle-to-vehicle collision scenarios. IEEE Trans. Intell. Transp. Syst. 10, 678–687 (2009)
Althoff, M., Stursberg, O., Buss, M.: Model-based probabilistic collision detection in autonomous driving. IEEE Trans. Intell. Transp. Syst. 10, 299–310 (2009)
Chai, Y., Sapp, B., Bansal, M., Anguelov, D.: MultiPath: multiple probabilistic anchor trajectory hypotheses for behavior prediction. In: Proceedings of the Conference on Robot Learning, PMLR. 100, 86–99 (2020)
Xie, G., Gao, H., Qian, L., Huang, B., Li, K., Wang, J.: Vehicle trajectory prediction by integrating physics- and maneuver-based approaches using interactive multiple models. IEEE Trans. Ind. Electron. 65(7), 5999–6008 (2018). https://doi.org/10.1109/TIE.2017.2782236
Sun, Q., Huang, X., Gu, J., Williams, B.C., Zhao, H.: M2I: from factored marginal trajectory prediction to interactive prediction. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 18–24 June, New Orleans, LA, USA. (2022) https://doi.org/10.48550/ARXIV.2202.11884
Hu, M., Xie, G., Gao, H., Cao, D., Li, K.: Manoeuvre prediction and planning for automated and connected vehicles based on interaction and gaming awareness under uncertainty. IET Intell. Transp. Syst. 13(6), 933–941 (2019). https://doi.org/10.1049/iet-its.2018.5353
Luo, W., Yang, B., Urtasun, R.: Fast and furious: real time end-to-end 3D detection, tracking and motion forecasting with a single convolutional net. In: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 18–23 June, Salt Lake City, UT, USA. 3569–3577 (2018)
Sadeghian, A., Legros, F., Voisin, M., Vesel, R., Alahi, A., Savarese, S.: CAR-net: clairvoyant attentive recurrent network. In: ECCV: European Conference on Computer Vision, 8–14 September, Munich, Germany. 162–180 (2018)
Barrios, C., Motai, Y.: Improving estimation of vehicle’s trajectory using the latest global positioning system with Kalman filtering. IEEE Trans. Instrum. Meas. 60(12), 3747–3755 (2011). https://doi.org/10.1109/TIM.2011.2147670
Schreier, M., Willert, V., Adamy, J.: Bayesian, maneuver-based, long-term trajectory prediction and criticality assessment for driver assistance systems. In: 17th International IEEE Conference on Intelligent Transportation Systems (ITSC), 8–11 October, Qingdao, China. 334–341 (2014) https://doi.org/10.1109/ITSC.2014.6957713
Cui, H., Radosavljevic, V., Chou, F., Lin, T., Nguyen, T., Huang, T., et al.: Multimodal trajectory predictions for autonomous driving using deep convolutional networks. In: 2019 International Conference on Robotics and Automation (ICRA), 20–24 May, Montreal, QC, Canada. 2090–2096 (2019) https://doi.org/10.1109/ICRA.2019.8793868
Hubmann, C., Becker, M., Althoff, D., Lenz, D., Stiller, C.: Decision making for autonomous driving considering interaction and uncertain prediction of surrounding vehicles. In: 2017 IEEE Intelligent Vehicles Symposium (IV), 11–14 June, Redondo Beach, CA, USA. 1671–1678 (2017). https://doi.org/10.1109/IVS.2017.7995949
Gu, J., Sun, C., Zhao, H.: DenseTNT: end-to-end trajectory prediction from dense goal sets. In: 2021 IEEE/CVF International Conference on Computer Vision (ICCV), 10–17 October, Montreal, QC, Canada. 15283–15292 (2021)
Hong, J., Sapp, B., Philbin, J.: Rules of the road: predicting driving behavior with a convolutional model of semantic interactions. In: 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 15–20 June, Long Beach, CA, USA. 8446–8454 (2019)
Schwarting, W., Pierson, A., Alonso-Mora, J., Karaman, S., Rus, D.: Social behavior for autonomous vehicles. Proc. Natl. Acad. Sci. 116(50), 24972–24978 (2019). https://doi.org/10.1073/pnas.1820676116
Le Cleac’h, S., Schwager, M., Manchester, Z.: LUCIDGames: online unscented inverse dynamic games for adaptive trajectory prediction and planning. IEEE Robot. Autom. Lett. 6(3), 5485–5492 (2021). https://doi.org/10.1109/LRA.2021.3074880
Hang, P., Lv, C., Xing, Y., Huang, C., Hu, Z.: Human-like decision making for autonomous driving: a noncooperative game theoretic approach. IEEE Trans. Intell. Transp. Syst. 22(4), 2076–2087 (2021). https://doi.org/10.1109/TITS.2020.3036984
Basar, T., Olsder, G.: Dynamic noncooperative game theory. 2nd ed. (Classics in applied mathematics 23). United States: Society for Industrial and Applied Mathematics (1999)
Raghunathan, A., Cherian, A., Jha, D.: Game Theoretic optimization via gradient-based Nikaido-Isoda function. In: Chaudhuri, K., Salakhutdinov, R. (eds.) Proceedings of the 36th International Conference on Machine Learning, 9–15 June, Long Beach, CA, USA. vol. 97 of Proceedings of Machine Learning Research. PMLR. pp. 5291–5300 (2019) Available from: https://proceedings.mlr.press/v97/raghunathan19a.html
Heß, D., Lapoehn, S., Lobig, T., Nichting, M., Markowski, R., Lauermann, J., et al.: Automated Driving Open Research (ADORe). DLR (German Aerospace Center), Institute of Transportation Systems. https://github.com/eclipse/adore#readme
Mahalanobis, P.C.: On the generalized distance in statistics. Natl. Inst. Sci. India 2, 49–55 (1936)
Ettinger, S.M., Cheng, S., Caine, B., Liu, C., Zhao, H., Pradhan, S., et al.: Large scale interactive motion forecasting for autonomous driving: the Waymo Open Motion Dataset. In: 2021 IEEE/CVF International Conference on Computer Vision (ICCV), 10–17 October, Montreal, QC, Canada. 9690–9699 (2021)
Acknowledgements
The work was supported by the Institute of Transportation Systems of the German Aerospace Center (DLR).
Funding
Open Access funding enabled and organized by Projekt DEAL.
Author information
Authors and Affiliations
Corresponding author
Ethics declarations
Conflict of interest
The authors have no competing interests to declare that are relevant to the content of this article.
Additional information
Academic Editor: Hong Wang
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
Cite this article
Lucente, G., Dariani, R., Schindler, J. et al. A Bayesian Approach with Prior Mixed Strategy Nash Equilibrium for Vehicle Intention Prediction. Automot. Innov. 6, 425–437 (2023). https://doi.org/10.1007/s42154-023-00229-0