A Bayesian Approach with Prior Mixed Strategy Nash Equilibrium for Vehicle Intention Prediction

The state-of-the-art technology in the field of vehicle automation will lead to a mixed traffic environment in the coming years, where connected and automated vehicles have to interact with human-driven vehicles. In this context, it is necessary to have intention prediction models with the capability of forecasting how the traffic scenario is going to evolve with respect to the physical state of vehicles, the possible maneuvers and the interactions between traffic participants within the seconds to come. This article presents a Bayesian approach for vehicle intention forecasting, utilizing a game-theoretic framework in the form of a Mixed Strategy Nash Equilibrium (MSNE) as a prior estimate to model the reciprocal influence between traffic participants. The likelihood is then computed based on the Kullback-Leibler divergence. The game is modeled as a static nonzero-sum polymatrix game with individual preferences, a well-known strategic game. Finding the MSNE for these games is in the PPAD ∩ PLS complexity class, with polynomial-time tractability. The approach shows good results in simulations in the long-term horizon (10 s), with its computational complexity allowing for online applications.


Introduction
Risk identification in traffic is essential for guaranteeing safe driving in automated vehicles. Dangerous scenarios could arise from an inaccurate estimation of the future trajectories of other traffic participants, given the uncertainty of human behavior. Prediction is therefore necessary in order to guarantee safe decision making. Nevertheless, a reliable estimation cannot be limited to a short-term prediction based on dynamic or kinematic models; it has to take into account possible future interactions and mutual influence between the traffic participants in the scenario.
Many trajectory predictors are data-driven and, when trying to predict trajectories across a joint space in a multi-agent environment, suffer from a sample complexity that increases exponentially with the number of agents. Edge cases constitute another challenge, particularly when the learning data set is largely comprised of straightforward traffic scenarios.
This paper presents a methodology to predict the future trajectories of vehicles in traffic, considering both the information coming from sensors, in terms of the actual state of the vehicles, and the interactions and mutual influence between them that can arise from possible future traffic outcomes. The proposed model is therefore multi-agent, not based on data-driven learning, and generally adaptable to any topology. The contributions of this study are the following:
1. Definition of an innovative and effective framework for vehicle intention prediction through the use of Bayes' theorem. The approach considers both the rational prior outcome of the traffic scenario, through a Mixed Strategy Nash Equilibrium (MSNE), and the current evidence from the vehicle data, as the likelihood. The game-theoretic framework takes into account the interactions and reciprocal influence between the vehicles, while the likelihood corrects the prior estimate considering the actual short-term trajectories that the agents are going to take.
2. The model outputs probability distributions over a set of possible trajectories that vehicles can take. The finite action space of the agents allows the game to be modeled as a static nonzero-sum polymatrix game (network coordination game) with individual preferences. Polymatrix games with individual preferences belong to the PPAD ∩ PLS complexity class, with polynomial-time tractability [1][2][3]. The computational time is compatible with real-time application. The simplification of the game does not negatively impact performance, as the results are still acceptable even for long-term horizons of 10 seconds.
The paper is organized as follows. In Sect. 2, some background on the different models of trajectory prediction algorithms is provided. The formulation of the problem can be found in Sect. 3. In Sect. 4, the approach is presented. Section 5 shows the details of a case study. In Sect. 6, some results in the simulation environment, in terms of performance and computational time, are provided. Finally, the results and potential future research directions are discussed in Sect. 7.

Related Works
Most motion prediction approaches can be categorized following the three-level scheme proposed by Ref. [4]: (1) Physics-based motion prediction algorithms, which rely on dynamic and kinematic models of the system [5,6]; (2) Maneuver-based motion models, which consider the maneuver intention of the traffic participant and incorporate a high-level strategic layer [7][8][9]; (3) Interaction-aware prediction algorithms, which consider the possible interactions and inter-dependencies between the vehicles in the traffic scenarios [10,11]. The physics-based motion prediction algorithms have an acceptable reliability only on a short-term horizon, since the possible maneuvers and the long-term interactions between the vehicles are ignored. The maneuver-based motion models also fall short in the context of a complex traffic scenario. Indeed, as stated in Ref. [10], many trajectory prediction approaches focus on estimating marginal prediction samples of possible future trajectories of single vehicles, failing to consider interactions and mutual influence between the traffic participants. The interaction-aware prediction algorithms try to provide a solution to this problem by modeling the multi-agent environment.
A second classification, transversal to the previous one and proposed by Ref. [8], divides the approaches into: (1) Deterministic models, which predict for each agent a unique trajectory that is considered the most probable [12][13][14]; (2) Stochastic models, which define a non-deterministic framework for the estimation, for instance predicting the likelihood of a finite set of outcome-representative trajectories [8][15][16][17][18], or defining a probabilistic state-space occupancy grid [19]. The predictive method proposed in this article can be classified as stochastic, maneuver-based and interaction-aware. It is stochastic because the output is not a single trajectory but a probability distribution over a set of trajectories, maneuver-based since maneuvers are predicted, and interaction-aware because the forecast outcome includes the result of an MSNE, within a multi-agent interactive and game-theoretic framework.
Game theory is a powerful approach that can be used in prediction and decision making. Recent examples can be found in Refs. [20][21][22]. In Ref. [22], Nash and Stackelberg equilibria are applied for human-like decision making, modeling the different driving styles and social interaction characteristics. In Ref. [20], the authors define an online method of predicting multi-agent interactions by estimating their SVO (Social Value Orientation). The interactions between agents are modeled as a best-response game and the control policy is found by solving the dynamic game and finding the Nash equilibrium. In Ref. [21], the other agents' cost function parameters are estimated online and then used to find the Nash equilibrium in a discretized dynamic game. These works model the interaction as a multi-agent dynamic game, since the action space consists of the agents' vehicle inputs. However, the problem of finding a Nash equilibrium of a dynamic game can be computationally infeasible in real-time applications, particularly under a long time horizon. For this reason, the current study considers the high-level decisions that vehicles can take, i.e. the possible maneuvers in the scenario, as the action space. This yields a finite action space and an N-person static finite polymatrix game, which can always be solved by finding a Nash equilibrium in mixed strategies [23].
The approach proposed in this article has been inspired particularly by the work of Refs. [11,15] and [24]. In Ref. [15], the inference of a distribution of high-level, abstract driving maneuvers has been taken as reference for the desired output. Moreover, the philosophy behind the Bayesian network proposed in Ref. [15] reflects the Bayesian approach proposed in this article. In Ref. [11], a Nash equilibrium in mixed and behavioural strategies is used to predict the future vehicles' maneuvers. The mixed strategy of a player is a probability distribution over all of the player's pure strategies, and therefore over the possible maneuvers. Finally, the computation of the MSNE utilizes a merit function, known as the Gradient-based Nikaido-Isoda (GNI) function, as introduced in Ref. [24].

Problem Statement
For each traffic participant in the scenario, a set of representative trajectories is computed based on map information (Fig. 1). These trajectories represent particular maneuvers and behaviours that the vehicle could take in the future. The definition of these trajectories has been inspired particularly by the work of Ref. [15]. The trajectory computation is performed in two stages: [...] where $u(t) = \{a(t), \delta(t)\}^T$ is the vector of the inputs (acceleration $a(t)$ and steering angle $\delta(t)$); through inverse kinematics the required steering angle $\delta(t)$ is computed. Considering the linearized time-discrete version $x_{t+1} = A_t x_t + B_t u_t$, the dynamic model of the covariance matrix is defined:

$$\Sigma_{t+1} = A_t \Sigma_t A_t^T + B_t \Sigma_t^u B_t^T \quad (2)$$

where $\Sigma_t^u = \mathrm{diag}(\sigma_a^2(t), \sigma_\delta^2(t))$ is the diagonal covariance matrix of the inputs, with variances linearly increasing over time.
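The covariance recursion for the linearized model can be illustrated with a small sketch. It assumes the standard linear-Gaussian propagation $\Sigma_{t+1} = A \Sigma_t A^T + B \Sigma^u B^T$ and, to stay short, uses a hypothetical two-state toy model (position, speed) with acceleration as the only input instead of the four-state bicycle model. The matrices, the time step and the variance slope are illustrative values, not taken from the article.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def propagate_covariance(sigma, a_mat, b_mat, sigma_u):
    """One step of Sigma_{t+1} = A Sigma_t A^T + B Sigma_u B^T."""
    state_part = matmul(matmul(a_mat, sigma), transpose(a_mat))
    input_part = matmul(matmul(b_mat, sigma_u), transpose(b_mat))
    return [[s + i for s, i in zip(rs, ri)] for rs, ri in zip(state_part, input_part)]


dt = 0.1                                  # assumed time step [s]
A = [[1.0, dt], [0.0, 1.0]]               # toy model: state = (position, speed)
B = [[0.5 * dt * dt], [dt]]               # input = acceleration
sigma = [[0.01, 0.0], [0.0, 0.01]]        # assumed initial state covariance

history = []                              # position variance at each step
for step in range(50):
    var_a = 0.5 + 0.01 * step             # input variance growing linearly (assumed slope)
    sigma = propagate_covariance(sigma, A, B, [[var_a]])
    history.append(sigma[0][0])
```

The position variance in `history` grows at every step, which is the widening of the uncertainty ellipses along a trajectory that Fig. 1 depicts.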
The policy of the i-th traffic participant $\pi(\tau_j \mid x_0)$ is a probability distribution over the finite number of available trajectories $\tau_j$, $j = 1, \dots, N_i$, and represents the future intention of the vehicle, conditioned on the actual vehicle state $x_0$. Marginalizing over the agent policy, the pdf of the state vector $x = \{x, y, \psi, v\}^T$ at time $t$ conditioned on the initial state $x_0$ is:

$$p(x_t \mid x_0) = \sum_{j=1}^{N_i} \pi(\tau_j \mid x_0)\, \mathcal{N}(x_t \mid \mu_t^j, \Sigma_t^j) \quad (3)$$

The result is a Gaussian Mixture Model distribution, weighted by the agent policy $\pi(\tau_j \mid x_0)$. This policy represents the strategic uncertainty over the intentions of the vehicle and it is the object of interest of the prediction approach presented in this article.

Application of the Bayes' Theorem
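Bayes' theorem is applied:

$$\pi(\tau_j \mid x_0) = \frac{p(x_0 \mid \tau_j)\, p(\tau_j)}{\sum_{k=1}^{N_i} p(x_0 \mid \tau_k)\, p(\tau_k)} \quad (4)$$

where $p(\tau_j)$ is the prior probability of the trajectory $\tau_j$ and $p(x_0 \mid \tau_j)$ is the likelihood, which gives a measure of compatibility between the current evidence coming from data $x_0$ and the pre-computed trajectory $\tau_j$. The prior $p(\tau_j)$ is the result of an MSNE, while $p(x_0 \mid \tau_j)$ is found by defining a likelihood function.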
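As a minimal sketch of this update for a single vehicle, the posterior over its candidate trajectories is the elementwise product of prior and likelihood, renormalized. The function name `posterior` and the numbers are illustrative assumptions, not values from the article.

```python
def posterior(prior, likelihood):
    """Posterior over trajectories: prior times likelihood, renormalized."""
    unnormalized = [p * l for p, l in zip(prior, likelihood)]
    evidence = sum(unnormalized)
    if evidence <= 0.0:
        raise ValueError("prior and likelihood have no common support")
    return [u / evidence for u in unnormalized]


# Hypothetical example with three candidate trajectories.
prior = [0.6, 0.3, 0.1]        # e.g. from the MSNE
likelihood = [0.1, 0.7, 0.2]   # e.g. from the likelihood function
post = posterior(prior, likelihood)
```

In this made-up case the likelihood overturns the prior: the second trajectory becomes the most probable one even though the prior favoured the first.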

Prior Probability Through Mixed Strategy Nash Equilibrium
The MSNE is the solution to a non-cooperative game involving two or more players, considering mixed strategies (probability distributions over the action space) instead of pure strategies. A mixed strategy profile is considered an MSNE if each player's strategy is the best response to the strategies of all other players. The decision to use the MSNE as the prior distribution comes from the hypothesis that every traffic participant, as a rational player, is prone to adopt their optimal strategy in the multi-agent traffic scenario. Each vehicle (player) $i = 1, \dots, M$ in the scenario is supposed to choose among a finite set of $N_i$ trajectories, defined by the notation $T_i = \{\tau_1^i, \dots, \tau_{N_i}^i\}$. $N$ indicates the total number of trajectories that the vehicles can take, i.e. $N = \sum_{j=1}^{M} N_j$. Let the notation $\theta_i \in \Theta_i$ denote the mixed-strategy vector of player $i$, i.e. the probability distribution over the available trajectories, while $\theta \in \Theta$ denotes the vector of the mixed strategies of all the $M$ players. Let $\tau = (\tau_1, \dots, \tau_M)$ indicate the joint pure strategy among all the players (player $i$ chooses trajectory $\tau_i$), while $f_i$ denotes the payoff function for player $i$, which takes into account safety, efficiency and comfort.
The problem considered is stated in Eq. (7). A point $\theta^*$ that satisfies Eq. (7) is called a Nash equilibrium (NE). Every N-person static finite game in normal form admits a noncooperative NE solution in mixed strategies [23].

Payoff Function
The payoff function $f_i$ for each traffic participant takes into account safety ($f_i^S$), comfort ($f_i^C$) and efficiency ($f_i^E$), each with its own weighting coefficient, as shown in Eq. (8). The definition of the payoff function considering pure strategies is presented here, which means that player $i$ chooses trajectory $\tau_i$ with probability 1.
The payoff function for safety $f_i^S$ depends on the joint pure strategy $\tau$ of all the players, while those for efficiency $f_i^E$ and comfort $f_i^C$ depend only on the trajectory chosen by player $i$, that is $\tau_i$. The weighting coefficients have been tuned to optimize the performance in the simulation environment used in this article [25].
The safety payoff is computed following Eq. (9), where $f^S_{i,j}(\tau_i, \tau_j)$ is the safety payoff for player $i$ (and, by symmetry, for player $j$) considering trajectory $\tau_i$ for player $i$ and trajectory $\tau_j$ for player $j$, $w^S$ is the penalty for a crash, a discount factor is applied over the time steps, and $\mu_t^j$ is the mean vector $(x, y)$ of the multivariate normal distribution $\mathcal{N}(x_t \mid \mu_t^j, \Sigma_t^j)$, which gives the pdf of the position of vehicle $j$ on its trajectory $\tau_j$ at time step $t$.
The definition of the safety payoff, in particular the exponent in Eq. (9), is inspired by the Mahalanobis distance [26], with a slight modification. Indeed, the Mahalanobis distance defines a distance measure between a point and a distribution, while here a distance between two distributions is needed. It is therefore necessary to modify the covariance matrix, that is $\hat{\Sigma} = 0.5(\Sigma_t^j + \Sigma_t^i) + \epsilon I$. The covariance matrix is thus the average of the covariance matrices of the two distributions, with an additional scaled identity term $\epsilon I$. This term has two purposes: (1) to guarantee in any case a payoff when the two distributions are close, even with small covariance matrices; (2) to prevent $\hat{\Sigma}$ from being ill-conditioned for the inversion. A further description is given in Fig. 2.
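The modified distance can be sketched as follows for 2-D position distributions. This only illustrates the Mahalanobis-like quadratic form with the averaged, regularized covariance; it is not the full safety payoff of Eq. (9), whose exact expression is not reproduced here. The function names and the default `eps = 1e-3` are assumptions made for the example.

```python
def inv2(m):
    """Inverse of a 2x2 matrix given as a list of rows."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def pooled_sq_distance(mu_i, sigma_i, mu_j, sigma_j, eps=1e-3):
    """Squared distance between two Gaussians, using the averaged covariance
    0.5 * (sigma_i + sigma_j) + eps * I in place of a single covariance."""
    pooled = [[0.5 * (sigma_i[r][c] + sigma_j[r][c]) + (eps if r == c else 0.0)
               for c in range(2)] for r in range(2)]
    inv = inv2(pooled)
    dx, dy = mu_i[0] - mu_j[0], mu_i[1] - mu_j[1]
    return (dx * (inv[0][0] * dx + inv[0][1] * dy)
            + dy * (inv[1][0] * dx + inv[1][1] * dy))


unit = [[1.0, 0.0], [0.0, 1.0]]
zero = [[0.0, 0.0], [0.0, 0.0]]
near = pooled_sq_distance((0.0, 0.0), unit, (1.0, 0.0), unit)
far = pooled_sq_distance((0.0, 0.0), unit, (5.0, 0.0), unit)
# With degenerate (zero) covariances, the eps term keeps the matrix invertible.
degenerate = pooled_sq_distance((0.0, 0.0), zero, (0.1, 0.0), zero)
```

The last call shows the second purpose of the identity term: without it, two trajectories with vanishing covariance would make the inversion fail.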
The comfort payoff is given in Eq. (10), where $a_t^{long}$ and $a_t^{lat}$ are the longitudinal and lateral accelerations at time step $t$ of the trajectory $\tau_i$, while $w^C_{long}$ and $w^C_{lat}$ are the respective penalties.
The efficiency payoff is given in Eq. (11), in which $v_{lim}$ is the speed limit and $v_t$ the speed at time step $t$ of the trajectory $\tau_i$. It is now possible to extend the previous definitions to the general case of mixed strategies, in which the action of player $i$ is a probability distribution over the possible trajectories. With $\theta$ denoting the set of the mixed strategies of all the $M$ players, the objective is to find an expression for the expected payoff in the case of mixed strategies.

Polymatrix Coordination Game with Individual Preferences
The definition of the players' payoff allows the game to be modeled as a polymatrix coordination game with individual preferences, a well-known strategic game. Let us define the matrix $P^{i,j} \in \mathbb{R}^{N_i \times N_j}$ as the matrix of safety payoffs between vehicle $i$ and vehicle $j$. The element $p^{i,j}_{n,m}$ of the matrix is:

$$p^{i,j}_{n,m} = f^S_{i,j}(\tau^i_n, \tau^j_m), \quad \tau^i_n \in T_i,\ \tau^j_m \in T_j \quad (12)$$

where $T_i$ and $T_j$ indicate the sets of trajectories available to vehicles $i$ and $j$. Therefore the $(n, m)$ element of the matrix $P^{i,j}$ is the safety payoff for vehicles $i$ and $j$ given that vehicle $i$ chooses the trajectory $\tau^i_n$ and vehicle $j$ the trajectory $\tau^j_m$. Recall that $N$ indicates the total number of all the available trajectories of the vehicles in the scenario. Considering the matrix $\tilde{Q}_i \in \mathbb{R}^{N_i \times N}$, defined in Eq. (13), it is possible to define the matrix $Q_i \in \mathbb{R}^{N \times N}$ for each traffic participant $i$ (Eq. (14)). Considering the vector $\tilde{r}_i \in \mathbb{R}^{N_i}$ for each vehicle $i$, whose elements are given in Eq. (15), the vector $r_i \in \mathbb{R}^{N}$ for each vehicle $i$ is defined (Eq. (16)). The expected payoff in the mixed strategy game is given in Eq. (17), which, considering the definitions in Eqs. (12) and (15), takes the form of Eq. (18). The expression can be made quadratic with respect to $\theta \in \mathbb{R}^{N}$ by means of the definitions in Eqs. (14) and (16):

$$f_i(\theta) = \theta^T Q_i \theta + \theta^T r_i \quad (19)$$

The component $\theta^T Q_i \theta$ gives the payoff arising from the interaction with other drivers, and is therefore linked with safety and the overlapping of trajectories. The component $\theta^T r_i$ is the payoff that depends exclusively on the choice of the player, and is therefore linked with comfort and efficiency. Note that Eq. (18) is the definition of a static nonzero-sum polymatrix coordination game (network coordination game) with individual preferences. The matrix $P^{i,j} \in \mathbb{R}^{N_i \times N_j}$ is the symmetric payoff of the bimatrix game $\{i, j\}$, while the vector $\tilde{r}_i \in \mathbb{R}^{N_i}$ represents the individual preference function of player $i$.

Fig. 2 The computation of the safety payoff in Eq. (9) is based on a distance measure between the multivariate normal distributions that describe the uncertainty about the future steps of the trajectories

Polymatrix games with individual preferences belong to the PPAD ∩ PLS complexity class, with polynomial-time tractability [1,2].
The Gradient-based Nikaido-Isoda (GNI) function [24] is used to find the NE. With the payoff function defined in Eq. (19), the gradient can be computed analytically, with an evident gain in terms of computational efficiency.
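The polymatrix structure can be made concrete with a small sketch. It assumes the expected payoff of player $i$ has the usual polymatrix form, i.e. the sum over the other players of $\theta_i^T P^{i,j} \theta_j$ plus the individual preference term $\theta_i^T \tilde{r}_i$, which makes the gradient with respect to $\theta_i$ available in closed form. The data layout (a dictionary keyed by player pairs) and all numbers are hypothetical.

```python
def expected_payoff(i, thetas, P, r):
    """theta_i^T r_i + sum over j != i of theta_i^T P[(i, j)] theta_j."""
    total = sum(t * ri for t, ri in zip(thetas[i], r[i]))
    for j, theta_j in enumerate(thetas):
        if j == i:
            continue
        for n, row in enumerate(P[(i, j)]):
            total += thetas[i][n] * sum(p * t for p, t in zip(row, theta_j))
    return total


def payoff_gradient(i, thetas, P, r):
    """Analytic gradient of the expected payoff with respect to theta_i."""
    grad = list(r[i])
    for j, theta_j in enumerate(thetas):
        if j == i:
            continue
        for n, row in enumerate(P[(i, j)]):
            grad[n] += sum(p * t for p, t in zip(row, theta_j))
    return grad


# Two vehicles, two trajectories each ("go", "stop"); both going means a crash.
P = {(0, 1): [[-10.0, 0.0], [0.0, 0.0]],
     (1, 0): [[-10.0, 0.0], [0.0, 0.0]]}   # P[(1, 0)] is the transpose of P[(0, 1)]
r = [[1.0, 0.0], [1.0, 0.0]]               # "go" is more efficient than "stop"
thetas = [[0.5, 0.5], [0.5, 0.5]]

payoff_0 = expected_payoff(0, thetas, P, r)
grad_0 = payoff_gradient(0, thetas, P, r)
```

Because the payoff is linear in the player's own strategy, the gradient does not depend on $\theta_i$, which is what keeps the analytic computation cheap.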

Optimization via Gradient-Based Nikaido-Isoda Function
The NE of the N-player game considered in this article, Eq. (7), is found using the GNI function, introduced in Ref. [24]. Let us indicate the expected payoff for player $i$ with the function $g_i$ for simplicity of notation (Eq. (20)). The GNI function $V(\theta, \eta)$ is the following:

$$V(\theta, \eta) = \sum_{i=1}^{M} \left[ g_i(\theta) - g_i(\gamma(i, \theta)) \right] \quad (21)$$

Recall that $M$ is the number of players (traffic participants) in the scenario, $\theta \in \mathbb{R}^{N}$, with $N = \sum_{j=1}^{M} N_j$, is the vector of the mixed strategies of the players and $\theta_j \in \mathbb{R}^{N_j}$ is the mixed strategy vector of player $j$. $\nabla_j g_j(\theta)$ denotes the gradient of $g_j$ with respect to $\theta_j$, which indicates the direction of maximum increase of the payoff for player $j$ in its action space $\Theta_j$.
The idea under this merit function is that every player can locally improve their objectives using the steepest descent direction, instead of computing a globally optimal solution [24].
The gradient of $V_i(\theta, \eta) := g_i(\theta) - g_i(\gamma(i, \theta))$ is given in Eq. (22), where $H_{g_i}(\theta)$ is the Hessian matrix, and $I \in \mathbb{R}^{N \times N}$ and $I_i \in \mathbb{R}^{N_i \times N_i}$ are identity matrices.
Considering the function $g_i(\theta)$ in Eq. (20), the gradients are given in Eq. (23). The modified vector of mixed strategies $\gamma(i, \theta) \in \mathbb{R}^{N}$ is:

$$\gamma_j(i, \theta) = \begin{cases} \hat{\theta}_j = \theta_j - \eta \nabla_j g_j(\theta), & \text{if } j = i \\ \theta_j, & \text{otherwise} \end{cases} \quad (24)$$

The Hessian $H_{g_i}(\theta) \in \mathbb{R}^{N \times N}$ is given in Eq. (25). As is evident, $\nabla V_i(\theta, \eta)$ can be computed analytically by using Eqs. (22), (23), (24) and (25). Finally, the gradient of $V(\theta, \eta)$ (Eq. (26)) is used for the descent iteration in Eq. (27). The fact that $\theta^{(k+1)}$ belongs to the allowed space $\Theta$ (defined in Eq. (6)) is not ensured by considering only Eq. (27), because the gradient $\nabla V(\theta^{(k)}, \eta)$ does not in general belong to $\Theta$. For this reason it is necessary to project $\nabla V(\theta^{(k)}, \eta)$ onto $\Theta$ before computing the descent in Eq. (27); this is addressed in Sect. 4.1.4.
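The merit function can be evaluated with a few lines of code. The sketch below rests on several assumptions: $g_i$ is treated as a cost to be minimized (for instance the negated payoff), which is what the minus sign in the definition of $\gamma$ suggests; the second argument of $V$ is taken to be a step size `eta`; and the cost is bilinear, so its gradient is known in closed form. The simplex constraint is ignored here. The matrices and names are hypothetical.

```python
# Two players, two trajectories each; costs are negated payoffs (assumption).
C = {(0, 1): [[10.0, 0.0], [0.0, 0.0]],
     (1, 0): [[10.0, 0.0], [0.0, 0.0]]}   # crash cost when both "go"
c = [[-1.0, 0.0], [-1.0, 0.0]]            # individual preference (negated)


def cost_gradient(i, thetas):
    """Gradient of g_i with respect to player i's own mixed strategy."""
    j = 1 - i
    return [ci + sum(p * t for p, t in zip(row, thetas[j]))
            for ci, row in zip(c[i], C[(i, j)])]


def cost(i, thetas):
    """g_i(theta): bilinear in (theta_i, theta_j), linear in theta_i."""
    return sum(t * g for t, g in zip(thetas[i], cost_gradient(i, thetas)))


def gni(thetas, eta):
    """V(theta, eta): sum over players of g_i(theta) - g_i(gamma(i, theta))."""
    total = 0.0
    for i in range(len(thetas)):
        grad = cost_gradient(i, thetas)
        shifted = [list(t) for t in thetas]
        # Only player i's strategy takes a gradient step of size eta.
        shifted[i] = [t - eta * g for t, g in zip(thetas[i], grad)]
        total += cost(i, thetas) - cost(i, shifted)
    return total


thetas = [[0.5, 0.5], [0.5, 0.5]]
value = gni(thetas, eta=0.1)
```

For this bilinear cost the value reduces to `eta` times the sum of the squared gradient norms, so it is non-negative and vanishes only where no player can improve locally. That is the property that makes it usable as a merit function.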

Gradient Projection
In the optimization problem described in Sect. 4.1.3, the gradient $\nabla V(\theta^{(k)}, \eta)$ is projected onto the feasible space $\Theta$. This is necessary to ensure that $\theta^{(k)} \in \Theta\ \forall k = 1, 2, \dots$ The projection procedure is based on the consideration that, from the definition of $\Theta$ in Eq. (6), every mixed strategy $\theta_i$ must belong to the space $\Theta_i$, defined in Eq. (5). This definition ensures that $\theta_i$ is actually a probability distribution over the possible trajectories of player $i$, i.e. $\theta_{i,j} \ge 0$ and $\sum_{j=1}^{N_i} \theta_{i,j} = 1$.
Writing $\nabla_i V$ for the gradient with respect to $\theta_i$, the following steps are executed for each $\nabla_i V$:
1. Projection properly defined: $\nabla_i V - \langle \nabla_i V, n \rangle\, n$, where $n$ is the unit normal vector to the hyperplane $\sum_{j=1}^{N_i} \theta_{i,j} = 1$, that is $n = \frac{1}{\sqrt{N_i}} \mathbf{1}$, and $\langle \cdot, \cdot \rangle$ denotes the scalar product.
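The projection step can be sketched as follows. Removing the component of the gradient along the unit normal $n = \mathbf{1}/\sqrt{N_i}$ amounts to subtracting the mean of its entries, so the update leaves the sum of the probabilities unchanged. The step-size handling is an assumption made for this example: the step is simply halved until no probability becomes negative, which may differ from the tuning procedure used in the article.

```python
def project_to_tangent(grad):
    """Remove the component of grad along the unit normal 1/sqrt(N):
    grad - <grad, n> n, which equals subtracting the mean entry."""
    mean = sum(grad) / len(grad)
    return [g - mean for g in grad]


def projected_step(theta, grad, step=1.0, shrink=0.5, max_halvings=60):
    """Descent step along the projected gradient, with the step halved
    (assumed rule) while any component of the new strategy is negative."""
    direction = project_to_tangent(grad)
    for _ in range(max_halvings):
        candidate = [t - step * d for t, d in zip(theta, direction)]
        if min(candidate) >= 0.0:
            return candidate
        step *= shrink
    return list(theta)  # no feasible step found: keep the current strategy


theta = [0.2, 0.3, 0.5]          # hypothetical mixed strategy of one player
grad = [1.0, -2.0, 0.5]          # hypothetical gradient block for that player
direction = project_to_tangent(grad)
new_theta = projected_step(theta, grad)
```

Here the first three step sizes (1, 0.5, 0.25) would push the first probability below zero, so the accepted step is 0.125 and `new_theta` is still a valid probability distribution.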

Likelihood Function
The evaluation of the likelihood term $p(x_0 \mid \tau_j)$, which gives a measure of how likely the current state $x_0$, coming from sensor data, is with respect to trajectory $\tau_j$, is based on the following steps:
1. A short-term trajectory $\hat{\tau}$ is computed through a simple bicycle model $f(\cdot)$, starting from the current state $x_0 \sim \mathcal{N}(x \mid \mu_0, \Sigma_0)$ and considering acceleration and steering angle normally distributed around the initial estimated input $u_0 = \{a_0, \delta_0\}$.
2. The Kullback-Leibler divergence is measured between the pdf of the short-term trajectory $p(x_t \mid \hat{\tau})$ and the pdf of the reference trajectory $p(x_t \mid \tau_j)$ at each time step $t$. The Kullback-Leibler divergence $D_{KL}(P \| Q)$ can be seen as the information lost when the distribution $Q$ is used to approximate the distribution $P$; in this case, considering the whole horizon, it measures how much the trajectory $\tau_j$ ($Q$) fails to represent the "true" trajectory $\hat{\tau}$ ($P$).
3. $p(x_0 \mid \tau_j)$ is computed through a soft-max function that takes as a parameter the sum over the horizon of the Kullback-Leibler divergence between the time steps of the two trajectories.
The philosophy behind the computation of $p(x_0 \mid \tau_j)$ is to measure how coherent the current state of the vehicle $x_0$, considering the initial input $u_0$, is with the strategic choice of trajectory $\tau_j$. A graphical representation is given in Fig. 3.
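The three steps above can be sketched for 2-D position distributions. The closed-form Kullback-Leibler divergence between two Gaussians is standard; feeding the negated sum of divergences to the soft-max (so that a smaller divergence gives a larger likelihood) is an assumption about the sign convention, and the trajectories below are made up for illustration.

```python
import math


def kl_gauss_2d(mu0, s0, mu1, s1):
    """KL( N(mu0, s0) || N(mu1, s1) ) for 2-D Gaussians with 2x2 covariances."""
    det0 = s0[0][0] * s0[1][1] - s0[0][1] * s0[1][0]
    det1 = s1[0][0] * s1[1][1] - s1[0][1] * s1[1][0]
    inv1 = [[s1[1][1] / det1, -s1[0][1] / det1],
            [-s1[1][0] / det1, s1[0][0] / det1]]
    trace = sum(inv1[r][c] * s0[c][r] for r in range(2) for c in range(2))
    dx, dy = mu1[0] - mu0[0], mu1[1] - mu0[1]
    maha = (dx * (inv1[0][0] * dx + inv1[0][1] * dy)
            + dy * (inv1[1][0] * dx + inv1[1][1] * dy))
    return 0.5 * (trace + maha - 2.0 + math.log(det1 / det0))


def likelihoods(short_term, references):
    """Soft-max over the negated summed divergences (assumed sign convention).
    Each trajectory is a list of (mean, covariance) pairs, one per time step."""
    scores = []
    for ref in references:
        total = sum(kl_gauss_2d(m0, s0, m1, s1)
                    for (m0, s0), (m1, s1) in zip(short_term, ref))
        scores.append(-total)
    top = max(scores)                      # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scores]
    norm = sum(weights)
    return [w / norm for w in weights]


unit = [[1.0, 0.0], [0.0, 1.0]]
short_term = [((float(t), 0.0), unit) for t in (1, 2, 3)]      # driving straight
straight = [((float(t), 0.0), unit) for t in (1, 2, 3)]
turning = [((float(t), 0.5 * t), unit) for t in (1, 2, 3)]     # drifting sideways
lik = likelihoods(short_term, [straight, turning])
```

The reference trajectory that coincides with the short-term one has zero divergence and therefore receives the larger likelihood.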

Case Study
In this section, an example application is reported, showing briefly how the predictor performs. The simulation environment is Automated Driving Open Research (ADORe) [25], an open-source modular software library and toolkit for decision making, planning, control and simulation of automated vehicles, developed by the Institute of Transportation Systems of the German Aerospace Center (DLR).

Fig. 3 The likelihood Eq. (4) defines a distance between the trajectories coming from the map topology ($\tau_1$ and $\tau_2$) and a trajectory obtained considering constant acceleration and steering angle ($\hat{\tau}$). In this case, $\tau_2$ is expected to be more likely than $\tau_1$, considering the initial state and input $(x_0, u_0)$

The trajectory predictor is applied in the intersection scenario shown in Fig. 4. In the simulation, vehicle 2 accelerates up to a speed of 10 m/s, while vehicle 1 decelerates smoothly and stops at the crossing. The prediction for this example has a horizon of 5 s.
In Table 1, a point estimate is given at approximately t = 6 s, corresponding to the situation shown in Fig. 4. The table provides an example of how the correction mechanism works: the Prior predicts with a high probability that vehicle 1 stops at the intersection and vehicle 2 accelerates, but it also leaves room for the opposite possibility, which is strongly reduced by the Likelihood. Note that the Prior does not predict a crash situation: even if vehicle 1 proceeds with constant speed (16%), vehicle 2 accelerates or stops at the intersection, avoiding the collision. This is also evident in the $P^{i,j}$ matrix, defined in Eq. (12) and shown in Table 2.
The performance of the approach is measured using metrics defined in Ref. [27], in particular the minimum average displacement error (minADE), the minimum final displacement error (minFDE) and the missing rate (MR). The trajectory $\tau_j^i$ considered for the computation is the one with the highest posterior probability $\pi(\tau_j^i \mid x_0^i)$ of player $i$. The expression $\mathbb{1}_{\chi^2_{0.95}}(\cdot)$ in the MR definition denotes the indicator function that is equal to one if the null hypothesis of the chi-squared goodness-of-fit test is not rejected, and zero otherwise. In particular, the argument of $\mathbb{1}_{\chi^2_{0.95}}(\cdot)$ is the test statistic, and the reference distribution is a chi-squared distribution $\chi^2$ with 2 degrees of freedom and a 0.95 confidence level. The MR essentially tests at each time step whether the actual point of the trajectory belongs to the distribution of the predicted trajectory.
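A minimal sketch of these metrics for a single vehicle follows. It assumes that the test statistic is the squared Mahalanobis distance between the actual position and the predicted 2-D Gaussian at the same time step, and that MR is the fraction of time steps at which the test rejects; the exact definitions are those of Ref. [27]. For 2 degrees of freedom the chi-squared quantile has the closed form $-2\ln(1 - 0.95) \approx 5.99$. The data are made up.

```python
import math

# 0.95 quantile of the chi-squared distribution with 2 degrees of freedom.
CHI2_2DOF_95 = -2.0 * math.log(1.0 - 0.95)


def displacement_errors(predicted, actual):
    """Return (average, final) Euclidean displacement between two paths."""
    dists = [math.hypot(px - ax, py - ay)
             for (px, py), (ax, ay) in zip(predicted, actual)]
    return sum(dists) / len(dists), dists[-1]


def missing_rate(pred_means, pred_covs, actual):
    """Fraction of steps where the actual point falls outside the 95% region
    of the predicted Gaussian (squared Mahalanobis distance, assumed statistic)."""
    misses = 0
    for mu, cov, pt in zip(pred_means, pred_covs, actual):
        det = cov[0][0] * cov[1][1] - cov[0][1] * cov[1][0]
        dx, dy = pt[0] - mu[0], pt[1] - mu[1]
        stat = (cov[1][1] * dx * dx - (cov[0][1] + cov[1][0]) * dx * dy
                + cov[0][0] * dy * dy) / det
        if stat > CHI2_2DOF_95:
            misses += 1
    return misses / len(actual)


unit = [[1.0, 0.0], [0.0, 1.0]]
predicted = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
actual = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0), (3.0, 4.0)]
ade, fde = displacement_errors(predicted, actual)
mr = missing_rate(predicted, [unit] * 4, actual)
```

In this toy case only the last point lies outside the 95% region, so one step out of four counts as a miss.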
Regarding the episode in Fig. 4, the results are shown in Table 3. The table shows that vehicle 1 and vehicle 2 have an average displacement error along the episode of 2.79 m and 1.21 m, respectively, with a combined average of 2.0 m. During the last step of the episode, vehicle 1 exhibits a final displacement error of 3.01 m, while vehicle 2 records a final displacement error of 3.06 m, with 3.04 m considering both. This reaffirms the reliability of the predictions even after 5 seconds have elapsed. The missing rate considering both vehicles is 0.34, meaning that for 66% of the episode length the vehicles belong to the predicted distributions.

Fig. 4 The intersection scenario in the simulation

In Fig. 5, the minADE for vehicles 1, 2 and for both vehicles is shown. For the vehicle that stops (vehicle 1, Fig. 4), the minADE is higher because of the more dynamic behavior of the vehicle, but it converges to around 2.7 m when the vehicle starts to decelerate. For vehicle 2 the minADE is lower, since the vehicle has a more constant behavior, but it grows rapidly; this is due to the higher uncertainty on the position of a vehicle that is in motion.
In Fig. 6, the real trajectories are represented in red.In blue are shown the predicted trajectories and the probability is given by the color intensity.A slight shift is applied to each trajectory to allow a better readability.The observation of the graph reveals that trajectories with the highest posterior probability (indicated by the dark blue lines) align more accurately with the actual trajectories (depicted in red), affirming the prediction's high quality.The graph provides a clearer comprehension of the episode, showcasing the potential trajectories considered and the actual ones followed by the vehicles.

Results and Discussion
This section presents the results of some simulations in ADORe in terms of minADE, minFDE and MR. Figure 7 shows the scenarios simulated. Here is a brief description of the scenarios:
1. Scenario 1: this scenario represents the case study from the preceding section, wherein the car turning left halts at the intersection, allowing the car with high priority to proceed.
2. Scenario 2: in this case the car turning left holds precedence over the car proceeding straight, which stops at the intersection.
3. Scenario 3: in this scenario, the merging car pauses at the intersection, granting passage to the car with high priority.
4. Scenario 4: the car with high priority reduces its speed, permitting the other car to merge smoothly.

Fig. 5 Evolution of the minADE over time

Each scenario has been repeated several times. The prediction horizon is 10 s. For the first 8 s of the simulation, predictions are collected. The simulation ends at t = 18 s.
Table 4 shows the results of the simulation scenarios in terms of minADE, minFDE and MR, while Table 5 presents the results of the 1st, 12th, 24th and 50th places in the WAYMO competition in 2021. The two tables cannot be compared directly, since the first shows the results of predictions in a simulation environment, while the second presents results of prediction on a real road dataset. However, the simulator used (ADORe) is not only a simulation environment but also a tool for decision making and control of AVs in traffic, currently used by the Institute of Transportation Systems of the German Aerospace Center. This suggests a high level of realism in the trajectories taken by the vehicles. Figure 8 shows the time required by the algorithm to compute the Nash equilibrium and to perform the complete prediction. From these data some observations can be drawn:
• The approach shows good performance in simulation in the long-term horizon (10 s) and outperforms the [...] is not computationally efficient. This is reflected in the significant difference between the time required by the Nash equilibrium computation and the total time required. The difference is essentially due to the definition of the trajectories. Nonetheless, the total time required still allows an online application of the algorithm.

Conclusions
In this paper, an innovative approach for predicting trajectories in traffic is proposed. The approach combines an interaction-aware motion model with a physics-based and maneuver-based model. Bayes' theorem is applied: the prior estimate takes into account the rational evolution of the traffic scenario by computing the Mixed Strategy Nash Equilibrium (MSNE) among the participating vehicles. The likelihood adjusts the prior estimate by incorporating data coming from the vehicles in terms of position, heading, speed, acceleration and steering angle. This makes it possible to consider irrational decisions by the participating vehicles that may have been discarded in the prior estimate. The output of the approach is a probability distribution over a set of representative trajectories for each vehicle. This innovative framework, which combines a priori game-theoretic considerations with the a posteriori data from the road, constitutes a crucial contribution of this study. Another important contribution is the modeling of the interactive scenario as a polymatrix coordination game with individual preferences, a well-known strategic game with desirable computational complexity. The preliminary and indicative experiments to test the approach show good results and good computational efficiency. Future areas of

( 1 ) 2 0 + t , 2 0
Computation of the geometrical path through a Particle Swarm Optimization algorithm.(2) Definition of the acceleration profile along the path.Local uncertainty along the trajectory is introduced through a evolution kinematic model with Gaussian noise on the vehicle state and input variables.The trajectories defined for each vehicle are: • Acceleration trajectory: trajectory in which a ∼ N(a | a , 2 a ) , where a = 1.5 m∕s 2 and 2 a = = 0.5 m 2 ∕s 4 , = 1e − 3

Fig. 1 Example of trajectories.The ellipses represent the multivariate normal distributions at each time step

2. Tuning of the module: (a) computation of the step; (b) while $\exists\, \theta_{i,j} < 0$:

Fig. 6 Fig. 7
Fig. 6 Here the real trajectories (red) and the predicted ones (blue) are shown

Fig. 8 Computational time required considering the number of trajectories in the scenario. The simulations have been carried out with a processor Intel(R) Core(TM) i7-10850H CPU @ 2.70GHz 2.71 GHz

Considering the state vector $x = \{x, y, \psi, v\}^T$, where $x$ and $y$ are the Cartesian coordinates of the position, $\psi$ is the heading and $v$ is the speed, the time step $t$ in the trajectory $\tau$ can be represented by the following normal distribution:

$$p(x_t \mid \tau) = \mathcal{N}(x_t \mid \mu_t, \Sigma_t) \quad (1)$$

where $p(x_t \mid \tau)$ is the probability density function (pdf) of the state $x$ conditioned on the choice of the trajectory $\tau$ at time step $t$, $\mu_t$ comes from the trajectory computation and $\Sigma_t$ is the covariance matrix on the state space that gives local uncertainty. Starting from a simple bicycle model of the system ẋ

Table 1
Estimation at t = 6 s, corresponding to the situation in Fig. 4

Table 2
Matrix $P^{i,j}$ at t = 6 s of the collision payoffs for vehicles 1 and 2. The definition of this matrix can be found in Eq. (12)

Table 3
minADE and minFDE of the episode shown in Fig. 4. The length of the episode is around 5 s

Table 4
minADE, minFDE and MR of the scenarios

Table 5
Results of the 1st, 12th, 24th, and 50th places in the WAYMO competition in 2021