1 Introduction

Researchers investigating the route selection of vehicles defined the rule of equilibrium long ago: vehicles select their routes so that no vehicle can change to another route to decrease its travel time (Wardrop 1952). Traffic engineers have also observed that the equilibrium route selection may result in longer travel times than the optimal route selection, as in the Braess paradox (Braess 1968). It took until 2002 for Gödel Prize winning computer scientists to investigate this question formally; they proved that, if certain conditions hold, the equilibrium travel times cannot be arbitrarily worse than the optimum (Roughgarden and Tardos 2002). There are many computer science models with different details and analytical results, but the analytical results coming from computer science models are not widely adopted by traffic engineers. The Dagstuhl Seminar 16091 “Computational Challenges in Cooperative Intelligent Urban Transport” was a good occasion to bring together traffic engineers and computer science experts to exploit the synergies between the two groups. This paper follows this line and reviews computer science models for the routing problem in order to improve the efficiency of traffic. Exploiting these synergies will become increasingly important as many autonomous vehicles become part of the traffic, because the route selection of autonomous vehicles will be done by software algorithms based on computer science models. Autonomous vehicles are situated in their environment: they perceive their environment, make decisions, and then take actions. The critical question is whether these perception-decision-action cycles actually lead to the same behaviour as the classical traffic engineering flow models predict. For example, a basic assumption of traffic engineering is that the traffic is assigned in accordance with the equilibrium. Will the perception-decision-action cycles lead to this equilibrium? Can we ensure that extremely long travel times do not occur?

The technologies to support the development of autonomous vehicles are progressing very fast. There are futuristic demo videos promising that autonomous vehicles will travel in a smooth and fast way without creating congestion in the city. However, there is no proof that algorithmically driven autonomous vehicles are so smart by themselves that their joint behaviour will be optimal. Autonomous vehicle developers usually focus on the capabilities of individual vehicles, and less attention is paid to the joint behaviour of a large group of autonomous vehicles. As far as we know, there is no survey of the computer science models of the joint behaviour of many autonomous vehicles.

We systematically review computer science models for the routing problem, highlight how they contribute to solving the problem, and evaluate how the different models can accommodate the concept of autonomous vehicles. We review the models from different aspects. We review whether and how the models can predict the joint behaviour of a large group of autonomous vehicles, assuming that each autonomous vehicle tries to optimise its own route rationally and algorithmically. We review how the predicted joint behaviour compares with the optimal joint behaviour; the aim is to find models that can prove guarantees that the joint behaviour is not much worse than the optimal joint behaviour. We review whether the predicted joint behaviour can guarantee that each vehicle, or at least the majority of the vehicles, finds its individually optimal route in the given situation. Finally, we review whether the models support individual vehicles, in the sense that each vehicle can autonomously make the route selection decision itself. These aspects are summarized in a table after the models are discussed.

The reviewed computer science models (Sect. 2) range from the static routing game model to the fully dynamic online routing game model. The models were selected to cover dynamic aspects ranging from static to dynamic over time, in order to illustrate the complexity of the routing problem. The selected models also illustrate that it is hard for a single model to combine provable guarantees at the level of joint behaviour with applicability to autonomous vehicles. The static routing game model (reviewed in Subsection 2.1) corresponds to the classic static assignment model of traffic engineering, which assumes that the traffic is always assigned in accordance with the equilibrium. The congestion sensitivity of the routing game model is represented with a cost function, which corresponds to the traffic engineering rule that the more vehicles are on a road, the slower their speed will be. The model of the evolutionary dynamics of repeated games (Subsection 2.2) adds dynamics by repeating the static game; this can be seen as a model of day-to-day commuting to work. The queuing model (Subsection 2.3) includes dynamics over time during the game. The congestion sensitivity of the queuing model corresponds to the waiting time at traffic lights and road crossings. The online routing game model (Subsection 2.4) also includes dynamics over time during the game, and it has the congestion sensitivity of both the routing game and the queuing model. The online routing game corresponds to the dynamic user assignment model of traffic engineering, but without details like lane changing, give-way intersections, traffic light timing, etc.

1.1 Road traffic as a complex multi-agent system

From a computer science point of view, autonomous vehicles are autonomous agents who make autonomous decisions based on their own information and their own preferences (d’Inverno and Luck 2004).

If there is only a single autonomous vehicle, then only one agent senses the environment and takes actions to adapt to the changing environment in order to avoid any obstacle and to reach its destination. Typical research on autonomous vehicles focuses on this, and the main issues are how to perceive the environment (Peters et al. 2010; Corke et al. 2007), how to avoid collisions (Halim et al. 2016; Haider et al. 2020), and how to control the vehicle to keep the planned trajectory (Chou and Tasy 1993; Gonzalez et al. 2016). This is a single-agent problem.

The traffic in a road network involves several autonomous vehicles, and the overall traffic emerges as the result of their collective behaviour. From an informatics point of view, the traffic system of autonomous vehicles is viewed as a multi-agent system. One of the best introductory books on multi-agent systems is (Wooldridge 2009). As Rosenschein (2013) states, game theory (Shoham and Leyton-Brown 2009) is a good foundation for multi-agent decision making. The routing problem in a road network is a multi-agent system problem, and we will focus on this.

This multi-agent system approach is mainly a computer science approach. According to the multi-agent approach, the environment of the agents of the routing problem is highly inaccessible and dynamic. This means that an agent cannot obtain complete, accurate, up-to-date information about the environment’s state, and the environment changes beyond the agent’s control (Wooldridge 2009). The decisions of the agents depend on their own incomplete information, and the overall behaviour of the system heavily depends on the decision strategy of the agents in the system. As we will see in the review, this results in a complex system which may show unwanted behaviour, like extreme swings (which in our case means, for example, excessively long travel times). Computer scientists may find this review useful for their research work towards the best model that can guarantee the avoidance of any unwanted behaviour. Software engineers base their development on models, so they may find this review useful for their development work. For example, if computer scientists are able to prove that the travel times in their model are never worse than, say, three times the optimum travel time, and software engineers develop route selection software for their autonomous vehicles based on that model, then the software engineers can verify that their software has a given guarantee on the travel times. Another example is related to environment engineering, which is an important part of multi-agent system engineering (Mascardi et al. 2019). If a model includes stigmergic coordination in the environment to improve the properties of the route selection mechanism, then software engineers might want to implement this in the infrastructural environment of the autonomous vehicles.

Computer science models are already complex, and traffic engineers usually have even more complex models (HCM 2016). Traffic engineers go into details like the throughput capacity of roads (Akçelik 2003), the lane changes of vehicles (Moridpour et al. 2010), the impact of lane structures on the throughput of give way intersections (Zhou et al. 2005; Brilon 2008), the impact of the timing of traffic lights (Dion et al. 2004), the uncertainty of driver behaviour (Treiber et al. 2000; Laval and Leclercq 2010), etc. Microscopic models describe the traffic in terms of many-particle models in which each particle corresponds to a vehicle and its driver (Treiber and Kesting 2013). Microscopic models are also related to traffic signal control and how traffic signal control can be adapted to the dynamic changes (Bazzan 2008). This review does not go into such details of the traffic engineering approaches, because our focus is on how to avoid unwanted behaviour on the global level. The added value of the models with higher abstraction is the potential to investigate and to prove properties in order to use the analytical results to improve social welfare. However, traffic engineers may find this review useful to see what kind of strategy and information are needed for traffic agents in order to avoid any unwanted behaviour of the traffic system on the global level.

1.2 The routing problem

From a computer science point of view, road traffic is a large-scale and open multi-agent system that tries to solve the routing problem. The routing problem is defined on a network with traffic flows, each going from a source node to a destination node. The vehicles of the traffic flows continuously enter the network at the sources, choose their routes to their destination, and leave the network at the destination. The traffic is routed in a congestion sensitive manner: if more vehicles enter a road, then their travel time will be longer.

Figure 1 shows a simplified routing problem where two flows (1 and 2, indicated with arrows) enter the road network and want to reach their corresponding goal nodes. Each of them can find its optimal route to its destination using a path search algorithm (Halim et al. 2019). If both flows select the shortest route to their destination, then they cause congestion on the jointly used road. Each flow can avoid the congestion by taking a longer route in this network. If one of the flows selects the longer route, then the other can select the shortest route without congestion.

Fig. 1 A simplified routing problem
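As a hedged illustration (the exact topology and travel times of Fig. 1 are not specified here, so the numbers below are invented), the following Python sketch reproduces the logic of the example: if both flows take the shared shortest road they congest it, while if one of them diverts to its longer alternative, both arrive sooner.

```python
# Illustrative sketch of the Fig. 1 situation with an assumed topology and made-up
# travel times: one shared shortest road and a longer congestion-free detour per flow.

SHARED_FREE, SHARED_CONGESTED = 10, 25   # shared road: fast for one flow, slow when shared
DETOUR = 15                              # each flow's longer, congestion-free alternative

def travel_times(route_1, route_2):
    """Return (travel time of flow 1, travel time of flow 2); routes are 'shared' or 'detour'."""
    both_on_shared = route_1 == "shared" and route_2 == "shared"
    def time(route):
        if route == "detour":
            return DETOUR
        return SHARED_CONGESTED if both_on_shared else SHARED_FREE
    return time(route_1), time(route_2)

print(travel_times("shared", "shared"))   # (25, 25): both pick the shortest road -> congestion
print(travel_times("shared", "detour"))   # (10, 15): one flow diverts, the other travels freely
```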

The participants of the traffic want to optimise their trips. We are interested in the different models for the routing problem. A model describes the possible actions of the participants, and the available information that they can use to select the best action. The participants are assumed to be rational, in the sense that each participant selects an action which is the best response to the expected actions of the other participants. A model involves a solution concept (Halpern and Moses 2014) which forecasts the outcome (or the possible outcomes) of the problem if the participants are rational. The most common solution concept is the equilibrium concept. The system is in equilibrium, if none of the agents can select another action to achieve better results from its own point of view.

A basic assumption of traffic engineering is that the traffic flows are assigned to possible paths in accordance with the equilibrium, which is either static equilibrium (Beckmann et al. 1956; Wardrop 1952) or dynamic equilibrium (Merchant and Nemhauser 1978a, b; Peeta and Ziliaskopoulos 2001). The static equilibrium concept of traffic engineers is in line with the Nash equilibrium concept of game theory (Nash 1951). The equilibrium is an important concept, because none of the agents has an incentive to deviate from the equilibrium, therefore the equilibrium seems to be a stable state of the system. If the equilibrium meets the design criteria, then we can ensure that the global behaviour of the multi-agent system meets our design goals.

The classic models assume an idealistic situation: all the agents know what the equilibrium is, all the agents know what other agents do, and all the agents know what their action needs to be to achieve the equilibrium. However, in accordance with the basic theory of multi-agent systems (Wooldridge 2009), the agent behaviour goes in cycles: the agents perceive the current state of their environment (possibly communicating with other agents), decide what action to perform, and then perform the action. The classic game theory models describe static situations, while agent behaviour involves time. If time is taken into account, then the decisions of the agents may evolve over time. This is studied in evolutionary game theory (Weibull 1997).

In addition, the collective of autonomous vehicles plays dynamically evolving games: the agents join the game in a sequence when they start their trip, they influence the game for a while when they are on the go, and then they quit the game when they finish their trip. In these games, the decisions of the agents are often intertemporal choices: the current decision of an agent may influence the costs of all the agents in the future. When an agent selects a route to follow, the agent may contribute to a congestion which will develop in the network sometime later. This means that the agents have to make decisions without exactly knowing future input data. Problems of this type are called online problems, and they are often solved with online algorithms (Borodin and El-Yaniv 1998).

We discuss four major models that incorporate some or all of the above mentioned approaches. The four models are: the algorithmic game theory model, the evolutionary dynamics of repeated games model, the queuing model, and the online routing game model. After that, we discuss how these models relate to the traffic engineering domain.

1.3 Objectives of the survey

The main focus of this survey is on how the different models support the decision making capabilities of autonomous vehicles, and how the models can be used to prove the properties of the emerging traffic. The autonomous vehicles perceive the current status of their environment, and the critical question is whether they can determine from this information the possible best route to their destination.

The different models give different answers to this question. There are models that have an answer to the existence of the possible best autonomous route selection, but cannot tell how each individual vehicle can find its own possible best route. There are models that can tell how each individual vehicle can choose its own route in the hope that it will be the possible best, but cannot guarantee that these route selections will be the possible best, or close to it.

Our survey focuses on the fundamental questions. Can the perception-decision-action cycles of autonomous agents (Wooldridge 2009) lead the agents to the possible best outcome, or is the possible best outcome just an idealistic view which can never be approached in reality? Which model can support the autonomous vehicles in finding the possible best outcome? The review stays at this fundamental level. If we can find the answers to these fundamental questions, then we can start to use these models to build real-world computer systems to support autonomous vehicles and go into the details of traffic engineering, but the traffic engineering details are out of the scope of this survey.

When we write “autonomous vehicle”, we mean both computer driven autonomous vehicles and human driven vehicles supported by navigation devices. In the case of computer driven autonomous vehicles we can expect that the algorithmic decision strategies of the models are strictly followed. Humans may not always follow the advice of the navigation device, especially if they are influenced by emotions (Halim and Rehan 2020). Humans may also have special driving habits (Halim et al. 2015). In the survey, we assume that the drivers follow the recommendations of the model. Bazzan and Klügl (2005) investigate a case study which includes the impact of the different rates of drivers adhering to the route recommendations. For an introduction to travel related behavioural modelling, see e.g. the literature review in Ben-Elia and Shiftan (2010).

When we write “possible best route”, we assume that there is a utility function which can express the preferences of the vehicles. In the simplest case the utility function is the travel duration, and the vehicles want to minimise the travel duration. The autonomous vehicles might have other objectives as well: for example, they might want to minimise fuel cost or road toll, they might want to choose routes with nice scenery, or they might want to select tranquil routes. The vehicles might want to optimise on multiple objectives. We assume that multiple objectives can be combined into a single utility function, which is the standard approach in multi-agent systems (Wooldridge 2009). Examples of multiple-objective route selection can be found in, e.g., Blue et al. (1997) or Paricio and Lopez-Carmona (2019).
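As a minimal sketch of this assumption (the objective names and weights below are hypothetical, not taken from the cited works), multiple objectives can be folded into one scalar utility, for example as a weighted sum:

```python
# Hypothetical example of combining several route objectives into a single utility;
# the attributes and weights are invented for illustration only.

def route_utility(travel_time_min, fuel_cost, toll, scenery_score,
                  weights=(1.0, 0.5, 0.8, -0.2)):
    """Lower is better: a weighted sum of costs, with a bonus (negative weight) for scenery."""
    w_time, w_fuel, w_toll, w_scenery = weights
    return (w_time * travel_time_min + w_fuel * fuel_cost
            + w_toll * toll + w_scenery * scenery_score)

# The vehicle then simply picks the route that minimises this single utility value.
print(route_utility(travel_time_min=30, fuel_cost=4.0, toll=2.0, scenery_score=7.0))
```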

A single utility function is relevant for algorithmic decision making by computers. We assume that the computers can compute the best outcome. Humans may not have enough knowledge and time to maximise their utility and they may tend to minimise the possibility that a non-chosen alternative is better than the chosen one (Chorus et al. 2008). Bounded rationality is important, but it is out of the scope of this survey.

The computer science models are defined over graphs that represent the road network. The road network might contain intercity highways and complex urban road networks as well. From the computer science point of view, these road networks are all graphs, and the models are defined for both highway and city road networks. The routing problem is more important for city networks, because in these complex networks there are more options for route selection than in intercity routes, where sometimes there might be only one relevant option.

Once we have models, we can prove properties of the emerging traffic. Some unwanted properties may derive from the structure of the road network, and traffic engineers may use this information in the design of the road infrastructure. For example, Varga (2015b) analytically investigated the online routing game model and pointed out two novel forms of paradox phenomena in the case when cars try to adapt to the real-time status of the traffic. These paradox phenomena are similar to the Braess paradox (Braess 1968) of the classic routing problem, in the sense that although the throughput capacity of the network is extended, sometimes the overall performance is reduced. These paradox phenomena have implications for the structure of the road network: traffic engineers should try to exclude parallel roads from the road network if the vehicles use navigation devices exploiting real-time information. When this result was presented to traffic engineers, they realised the connection between their empirical observations and the formal model.

The routing problem has to be solved in special road networks as well, like in container terminals and warehouses where automated guided vehicles are used (Wurman et al. 2008). In these settings the routing problem is called multi-agent path finding (MAPF) (Stern et al. 2019). The major difference between the routing problem and the MAPF problem is that the MAPF setting is usually under the control of a single authority, while the vehicles are independent autonomous agents in the routing problem. In multi-agent system terms, the MAPF agents are benevolent cooperative distributed problem solvers. There are algorithms for finding the optimal or suboptimal route selection in MAPF settings (Sharon et al. 2015; Max et al. 2014). These algorithms are computationally hard and cannot be applied in real-world problems with hundreds of agents. In addition, executing the optimal solution of the MAPF problem assumes that the agents cooperate and that none of them deviates to another route which is individually better. Because the vehicles of the routing problem are independent autonomous agents, the routing problem is wider than MAPF, and the MAPF approach does not really fit the general routing problem. If the MAPF approach were applied to autonomous vehicles in a city, then the passengers of the autonomous vehicles would feel like citizens of a utopian town under the control of a central authority.

Our survey is directed towards the decision making strategy of the individual vehicles of traffic flows. Traffic flows are continuous, therefore strategies related to changing the departure time (e.g. delaying the departure until after the peak hour) are out of the scope of this survey.

1.4 Methodology of the survey

We are interested in the route selection of autonomous vehicles, and we would like to know how the different models for the routing problem support the trustworthy decision making of autonomous agents; this paper focuses on the trustworthiness of the route selection of autonomous vehicles. If the solution concept of a model says that the rational behaviour of the agents can assure the shortest possible travel durations, then navigation software developed on that model can be trusted by the users. If a model cannot prevent the travel duration of a few of the agents from occasionally becoming extremely long, then navigation software developed on that model will not be trusted by the users. As a practical example, if a user of a navigation device experiences even once that the device routes the user to the workplace through a trip which takes ten hours instead of the usual one hour, then that user will not trust the device any more and will throw it away.

The trustworthiness of a model is somewhat similar to incentive compatibility, in the sense that it is in the best interest of the agents to follow the routing strategy of a trustworthy routing model. However, there is a difference, because incentive compatibility is related to the truthfulness of the agents, while trustworthiness is related to the complexity of the problem. Incentive compatibility means that the participants of a game can achieve the best outcome for themselves by acting according to their true preferences, and an agent cannot gain by pretending to have a different preference. The routing problem is a complex dynamic problem, and the continuous rational optimisation efforts of the agents may result in extreme swings in the system. We say that a model is trustworthy if the set of the components of the model, as discussed in the rest of this survey, can ensure that these extreme swings are avoided.

The different aspects of the survey of the models are summarised in Table 1, and they are briefly described in the following.

The agents (first row of Table 1) are the decision makers in the model, and they select an action from the available options. If the agents are the traffic flows, then each traffic flow agent can decide how to divide its flow among the possible routes to its destination. If the agents are the “particles” of the traffic flow, then, at every moment during the game, the full traffic flow can change to any of the available routes to its destination. The decision making agent is called a “particle”, because in this case the traffic flow is often regarded as a fluid flow consisting of infinitesimally small “particles”. If the agents are the subsequent vehicles of the traffic flow, then each time a member of the traffic flow starts its journey, it can select any of the available routes to the destination of the traffic flow. The next member of the traffic flow may select a different route if that route seems to be better at the starting time of that agent. We prefer models where the individual vehicles are the decision makers.

Table 1 Summary of the Survey of the Models of the Routing Problem

An important aspect is whether the model is static or dynamic (second row of Table 1). A static model focuses on flows that do not change during a game. A dynamic model focuses on the evolution of the flows over time: either on the evolution of the route selection between repeated static games, or on how the route selection of the agents evolves over time during a single game. The latter type of dynamics is either closed, when the set of active agents does not change, or open, when the set of active agents changes over time. We prefer models where the route selection of the agents evolves over time during a single game.

Another aspect is how the agents perceive their environment, shown in the third row of Table 1. If a model does not say anything about perception, then it usually assumes that every agent knows everything in the game, e.g. the level of the congestion sensitivity of each road, as well as the reasoning of the other agents. This assumes that everything is common knowledge (in the logic sense, see Fagin et al. 1995; Meyer and van der Hoek 1995). A fact is common knowledge if all the agents know the fact, they all know that they know the fact, they all know that they all know that they know the fact, and so on to infinity. If the model has repeated game dynamics, then the agents usually observe the outcome of the previous game, which usually includes the travel times of all agents in that game. If the model is closer to reality, then the agents do not have full common knowledge, and they can perceive only measurable properties of the environment, like the travel times on each road. We prefer models where each agent can perceive measurable properties of the environment.

The most important aspect of the model is the solution concept, shown in the fourth row of Table 1. The typical solution concept of static games is the Nash equilibrium, where none of the agents can select another route to achieve better results. The dynamic equilibrium solution concept, like the Nash equilibrium, also tries to find the traffic flow distribution where none of the agents can change to another route to reduce its travel time, but the dynamic equilibrium takes into account that the traffic flow may vary over time, and the best route may change over time. The solution concept of the repeated game dynamics is the \(\epsilon \)-approximate equilibrium, which captures the notion of convergence to a static equilibrium: the more times the game is repeated, the more agents experience travel times of at most \(1 + \epsilon \) times the travel time in the static equilibrium. Many dynamical systems do not necessarily converge to a fixed point like the Nash equilibrium. In this case, the dynamic solution concept may be based on a set of system states, called sink states (Goemans et al. 2005) or attraction states. Sooner or later the system enters one of the sink states and then never leaves the set of sink states. This is also called an attracting equilibrium, if the sink set attracts the system. This kind of dynamic solution concept is in the focus of current research (Papadimitriou and Piliouras 2016, 2019; Omidshafiei et al. 2019). Although an equilibrium is the outcome of rational behaviour, agents may sometimes come to another solution. As Kleinberg et al. (2011) write: “limiting ourselves to equilibria as a reference point could lead us to qualitatively incorrect conclusions about system behavior”.

The outcome of the solution concept describes what the model predicts for the state of the system if the agents behave rationally (fifth row of Table 1). The outcome of the solution concept is subject to conditions for most of the models, except for the online routing game model. In the case of the Nash equilibrium, the outcome is one or more static states of the system. In the case of the \(\epsilon \)-Nash equilibrium, the outcome is a fluctuation which converges to one of the static Nash equilibriums. In the case of the dynamic equilibrium over time, the outcome is a persistent property of the system corresponding to the dynamic equilibrium. In the case of the attracting equilibrium, the outcome is a fluctuating system. If the system is chaotic, then the fluctuation may not be predictable; in a better case, the range of the fluctuation can be bounded. We prefer models that have a unique and stable solution concept.

The last row, the “conclusion of the solution concept” row of Table 1 summarises the main results from the model and the solution concept. In the case of the routing game model, the distance between the equilibrium and the optimum has a limit, if certain conditions hold. The evolutionary dynamics model can tell the speed of convergence to the equilibrium, depending on the conditions. The queuing model can tell the existence of the dynamic equilibrium, in the case of special conditions. The online routing game model can only show in experiments that the fluctuation of the system may be limited, depending on the prediction methods. The online routing game model can prove that the system reaches and then stays close to the equilibrium only in a special case.

2 Models of the routing problem in computer science

In this section we discuss the models of Table 1 in more detail.

2.1 Routing game from algorithmic game theory

From the algorithmic game theory point of view (Nisan et al. 2007), the routing problem is a network with source routing, where end users simultaneously choose a full route to their destination and the traffic is routed in a congestion sensitive manner (Roughgarden 2007). There are non-atomic and atomic routing games. In non-atomic routing games, the change in the traffic flows can be infinitesimally small, while in atomic routing games the change in the traffic flows is considerable.

Non-atomic routing games consist of a road network, a total traffic flow, and a throughput characteristic of the road network. The road network is modelled with a directed graph; the total traffic flow is modelled with a vector of traffic flows, each flow denoting the amount of traffic on a trip from a source vertex to a target vertex; and the throughput characteristic of the road network is modelled with a cost function that maps the total traffic on the edges to the travel time on the edges. The non-atomic routing game is a triple \((G, r, c)\), where

  • G is a directed multi-graph \(G=(V, E)\) with vertex set V and edge set E where each \(e\in E\) is characterized by a cost function \(c_e\).

  • r is the total flow given by a vector of flows, with \(r_i\), \(i\in \{1, 2,...,n\}\), denoting the flow making the trip from a source vertex \(s_i\) of G to a target vertex \(t_i\) of G.

  • c is the cost function of G with \(c_e\) for each edge e of G, and it maps the total flow \(f_e\) on edge e to the travel time on the edge. The cost functions are nonnegative, continuous, and nondecreasing.

There is a one-to-one correspondence between flows and source-target pairs. The flow \(r_i\) is called a commodity, and each agent is identified with one commodity. \(P_i\) denotes the set of paths of the source-target vertex pair \(s_i\)-\(t_i\). For a flow \(r_i\) and a path \(p \in P_i\), \(f_p^i\) denotes the traffic assignment to path p, i.e. the amount of traffic of commodity i that is assigned to the path p. The traffic assignment f for the whole routing game is \(f = \{f_p^i | i\in \{1, 2,...,n\} \wedge p \in P_i \}\). The allowed strategies of the agent associated with commodity i are those traffic flow assignments to the paths where \(\sum _{p \in P_i} f_p^i = r_i\).

Atomic routing games are also modelled with the triple \((G, r, c)\), but the agents and their allowed strategies are different from the non-atomic game. Several flows can be associated with the same source-target pair. Each agent is identified with one flow (i.e. flow \(r_i\)). The allowed strategies of the agent associated with flow \(r_i\) are those traffic flow assignments where there is a single \(p \in P_i\) for which \(f_p^i = r_i\), and \(f_{p\prime }^i = 0\) for all other \(p\prime \in P_i\). In other words, the agents can select only one path and cannot divide their flow among several paths, but there can be several agents on the same source-target pair.

Routing games are a special type of congestion games (Rosenthal 1973), because the structure of the graph G determines the possible order of the edges used by the flows. In routing games, an agent associated with flow i can select the ordering of the edges only from the orderings given by the paths leading from source vertex \(s_i\) to target vertex \(t_i\). In congestion games, agents do not have this restriction, and the edges are called resources.
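To make the definition concrete, the following minimal sketch encodes a tiny non-atomic routing game \((G, r, c)\) in Python, assuming affine cost functions; the network is the standard two-link (Pigou-style) instance, and all identifiers and numbers are illustrative rather than taken from the paper.

```python
# A minimal sketch of a non-atomic routing game (G, r, c), assuming affine edge
# costs c_e(x) = a_e * x + b_e; all names and numbers are illustrative.
from collections import defaultdict

# G: parallel edges identified by (tail, head, label), with coefficients (a_e, b_e)
edges = {("s", "t", "upper"): (1.0, 0.0),   # c(x) = x  (congestion sensitive)
         ("s", "t", "lower"): (0.0, 1.0)}   # c(x) = 1  (constant)

# r: one commodity with total flow 1 from s to t; P_1 is its set of paths
r = {1: 1.0}
paths = {1: {"upper": [("s", "t", "upper")], "lower": [("s", "t", "lower")]}}

def edge_loads(assignment):
    """Total flow f_e on each edge induced by a path-flow assignment f_p^i."""
    load = defaultdict(float)
    for i, per_path in assignment.items():
        for p, flow in per_path.items():
            for e in paths[i][p]:
                load[e] += flow
    return load

def path_cost(i, p, assignment):
    """c_p(f) = sum of c_e(f_e) over the edges of path p of commodity i."""
    load = edge_loads(assignment)
    total = 0.0
    for e in paths[i][p]:
        a, b = edges[e]
        total += a * load[e] + b
    return total

# Example: the whole unit flow uses the congestion-sensitive upper path.
f = {1: {"upper": 1.0, "lower": 0.0}}
print(path_cost(1, "upper", f), path_cost(1, "lower", f))   # 1.0 1.0 -> no incentive to switch
```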

2.1.1 Solution concept

The solution concept of the routing game is the equilibrium concept. The equilibrium is a state where none of the agents has an incentive to change its behaviour unilaterally, because in all other cases the agent would be worse off. Because none of the agents has an incentive to deviate from the equilibrium, algorithmic game theory assumes that the equilibrium will be the outcome of the game.

In order to define the equilibrium formally, first we need to define the cost of the different route selections of the agents. The cost of a path p is the sum of the costs of the edges of the path: \(c_p(f_p^i)=\sum _{e \in p} c_e(f_e)\), where \(f_e\) is the total amount of traffic on edge e. The cost of a traffic assignment of a flow r is the flow-weighted cost over all paths of all commodities: \(C(r)=\sum _{i=1}^n\sum _{p \in P_i} f_p^i \cdot c_p(f_p^i)\).

A non-atomic traffic flow assignment f to all \(f_p^i\) is an equilibrium traffic flow assignment of r, if for every commodity \(i\in \{1, 2,...,n\}\) and every pair \(p,p\prime \in P_i\) with \(f_p^i>0\), \(c_p(f_p^i) \le c_{p\prime }(f_{p\prime }^i)\). In other words, all paths used by the equilibrium flow assignment have the possible minimum cost, and all paths of a given commodity used by the equilibrium flow assignment have equal cost. In the rest of the paper we will refer to this equilibrium with different names: Nash equilibrium, or Wardrop equilibrium, or static equilibrium.

An atomic traffic flow assignment f to all \(f_p^i\) is an equilibrium traffic flow assignment of r, if for every commodity \(i\in \{1, 2,...,n\}\) and every pair \(p,p\prime \in P_i\) with \(f_p^i>0\), \(c_p(f_p^i) \le c_{p\prime }({\hat{f}}_{p\prime }^i)\), where \({\hat{f}}\) is the flow assignment identical to f except that \({\hat{f}}_p^i = 0\) and \({\hat{f}}_{p\prime }^i = r_i\). In other words, none of the players can switch its traffic flow assignment to another path in order to decrease its cost, i.e. all players have the possible minimum cost at the given traffic assignment of the other players.

In order to be able to define the efficiency of the equilibrium, first we need to define what the optimal outcome of the routing game is. There are several concepts of optimum, like the Hicks optimum (Hicks 1939), the Pareto optimum (Debreu 1954) and the sum optimum. Algorithmic game theory uses the sum optimal concept: the assignment of flow r is optimal flow assignment if the assignment of the flows of the commodities minimises C(r).

The efficiency of the routing game is measured with the price of anarchy concept (Christodoulou 2008). The price of anarchy of a routing game is the ratio between the worst cost of an equilibrium flow assignment of the game and the cost of the optimal flow assignment.
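As a worked illustration (the standard Pigou example from the literature, not an instance from this paper), the following sketch computes the price of anarchy for a unit flow over two parallel links with costs \(c(x)=x\) and \(c(x)=1\), finding the equilibrium and the optimum numerically.

```python
# Pigou-style worked example (standard textbook instance, not from the paper):
# unit flow over two parallel links with costs c_upper(x) = x and c_lower(x) = 1.

def total_cost(x_upper: float) -> float:
    """Flow-weighted total cost when x_upper of the unit flow uses the upper link."""
    x_lower = 1.0 - x_upper
    return x_upper * x_upper + x_lower * 1.0    # x*c_upper(x) + (1-x)*c_lower(1-x)

cost_equilibrium = total_cost(1.0)   # at equilibrium everyone uses the upper link (its cost <= 1)
cost_optimum = min(total_cost(k / 1000) for k in range(1001))   # ~0.75 at an even split

print(cost_equilibrium / cost_optimum)   # ~1.333: matches the 4/3 bound for affine costs below
```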

2.1.2 Main results

Several properties of the equilibrium and the price of anarchy of routing games have been proved (Roughgarden 2007).

Equilibrium of non-atomic routing games: It is proved that every non-atomic routing game has at least one equilibrium flow assignment (Schmeidler 1973; Beckmann et al. 1956). If a non-atomic routing game has two different equilibrium flow assignments, then the costs of the edges are the same in both assignments on every edge, i.e. all equilibrium flow assignments have the same total cost (Milchtaich 2000).

Equilibrium of atomic routing games: There are atomic routing games that do not have an equilibrium. However, if there are different restrictions on the atomic routing game, then equilibrium exists. If the traffic flows of an atomic routing game are equal, then there is at least one equilibrium (Rosenthal 1973). If the cost functions of an atomic routing game are of the \(a_e\cdot x+ b_e\) form, then there is at least one equilibrium (Fotakis et al. 2005). If the equilibrium of an atomic routing game is not unique, then the cost of the different equilibriums may be different.

Price of anarchy of non-atomic routing games: If the cost functions of a non-atomic routing game can be non-linear, then one can create cost functions to exceed any given bound on the price of anarchy. If the cost functions of a non-atomic routing game are of the \(a_e\cdot x+ b_e\) form, then the price of anarchy is at most \(\frac{4}{3}\) (Roughgarden and Tardos 2002).

Price of anarchy of atomic routing games: If the cost functions of an atomic routing game are of the \(a_e\cdot x+ b_e\) form, then the price of anarchy is at most \(\frac{(3 + \sqrt{5})}{2}\). If the traffic flows (\(r_i\)) of an atomic routing game are equal and the cost functions are of the \(a_e\cdot x+ b_e\) form, then the price of anarchy is at most \(\frac{5}{2}\) (Awerbuch et al. 2005).

These results are summarised in Table 2. The last line in the table refers to the route selection. If the model supports individual vehicles, then each vehicle of a traffic flow may select its own route which may be different from the route of other vehicles in the same traffic flow. If the model does not support individual vehicles, then the route selection is on the traffic flow level.

Table 2 Summary of the Main Findings Related to the Routing Game Model

2.1.3 Evaluation

Algorithmic game theory (Nisan et al. 2007) investigated the routing problem where decentralised autonomous decision making is applied by the traffic flows. This game theory model is in line with the assumption of the traffic engineers, who assume that the traffic is always assigned in accordance with the equilibrium (Beckmann et al. 1956; Wardrop 1952). If some restrictions are applied to the model, then the existence of equilibria and an upper limit on the price of anarchy can be proved. The restrictions are not too restrictive for traffic engineering, because the travel time on a road is an approximately linear function of the traffic flow up to the point where the traffic flow reaches the maximum capacity of the road and the road becomes jammed. These results show how bad the overall traffic can be when decentralised autonomous decision making is applied by the traffic flows. The limit on the price of anarchy shows that autonomous decision making may result in considerable overhead, and that some kind of coordination is needed to come close to the social optimum.

The routing game model has limitations. The properties highlighted above are important findings; however, this model describes a situation where each flow (controlled by a single agent) continuously occupies its route, and if another flow assignment is considered, then the change is assumed to take effect immediately along the full route. Transition periods are ignored. It is not realistic to assume that a traffic flow of autonomous vehicles is controlled by a single agent, because the users of the autonomous vehicles would feel that they are under the control of a central agency. It is more realistic to assume that each subsequent vehicle of the traffic flow makes its own autonomous decision. It is not realistic to assume that a change in the traffic flow takes effect immediately, either. The linear cost function also does not take into account that roads have a maximum capacity.

The Nash equilibrium concept of the classic game theory model has other limitations as well (Halpern 2011). The model has several idealistic assumptions: all the agents know the current game, all the agents know what the equilibrium is, all the agents know what other agents do, and all the agents know what their role is in the equilibrium. However, in reality, the agents may not know the game exactly, because there may be changes at any time: the traffic flows may change, the cost function may change due to road works and accidents, etc. In addition, the agents may not know the equilibrium exactly at all times, if finding the equilibrium is computationally hard. Even if the agents know what the equilibrium is, they would need some kind of coordination to position themselves in the traffic assignment of the equilibrium.

The classic game theory models describe static situations, while agent behaviour involves time. Evolutionary game theory investigates a kind of time aspect.

2.2 Evolutionary dynamics of repeated games

The evolutionary dynamics of games (Weibull 1997) is usually investigated in repeated games where the agents receive feedback by observing their own and other agents’ actions and costs, and in the next game they change their own action based on these observations. The agents use an adaptation algorithm to decrease their cost. There are different types of dynamics depending on the kind of feedback received by the agents. We discuss the replicator dynamics and the no-regret dynamics of repeated routing games.

The replicator dynamics of repeated non-atomic routing games is investigated in (Fischer and Vöcking 2004; Sandholm 2001). In replicator dynamics, the agent observes its own and its opponents’ cost in each game. If an opponent achieved a considerably better result, then the agent switches its strategy to that of the opponent. The probability of the switch is proportional to the difference in the costs. This is similar to what people often do: each day on the way to work, they observe other vehicles going through a parallel route to the same destination, and if the other vehicles arrive earlier, then the next day they change to the other route. This replicator dynamics is modelled with the following equation: \(\dot{f_p^i}=\lambda _p^i \cdot f_p^i \cdot (\mathop {{{\,\mathrm{avg}\,}}}\limits _{{\hat{p}} \in P_i}(c_{{\hat{p}}}(f_{{\hat{p}}}^i)) - c_p(f_p^i))\), where \(\lambda _p^i\) is the factor for the path p of commodity i. The equation means that the speed of the relative change in the flow assignment \(f_p^i\) is proportional to the difference between the average cost of commodity i and the cost of the flow assignment \(f_p^i\). Fischer and Vöcking (2004) proved that this replicator dynamics keeps \(r_i\) constant. An important property of this replicator dynamics is that there is no exploration: if no traffic is assigned to a path (i.e. \(\exists p \in P_i: f_p^i=0\)), then traffic will never be generated on that path, and if some traffic is assigned to a path, then the traffic on that path will never become zero.
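The following sketch discretises this replicator equation for a single commodity with two paths; the cost functions, the factor \(\lambda \), the step size, and the renormalisation of the Euler step are illustrative choices rather than part of the cited model.

```python
# Discretised sketch of the replicator dynamics for one commodity with two paths.
# Cost functions, lambda, step size and the renormalisation are illustrative choices.

def replicator_step(f, costs, lam=0.5, dt=0.2):
    """One Euler step of f_p' = lambda * f_p * (average cost - c_p)."""
    avg = sum(costs) / len(costs)
    return [max(0.0, fp + dt * lam * fp * (avg - cp)) for fp, cp in zip(f, costs)]

# Path costs for a unit flow split as (f1, f2): c1(x) = 2x and c2(x) = x + 0.5.
cost = lambda f: [2.0 * f[0], f[1] + 0.5]

f = [0.9, 0.1]                        # both paths carry some initial flow (no exploration!)
for _ in range(500):
    f = replicator_step(f, cost(f))
    total = sum(f)
    f = [x / total for x in f]        # keep r_i = 1; the continuous dynamics preserves it,
                                      # the Euler step only approximately
print(f, cost(f))                     # approaches the split where both path costs are equal
```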

The no-regret dynamics of repeated non-atomic routing games is investigated in (Blum et al. 2006). The non-atomic routing game is played in a sequence 1, 2, ..., T, with traffic flow assignments \(f_{(1)},f_{(2)},...,f_{(T)}\). The agent \(a_i\) of commodity i incurs the costs \(c_{(1)i},c_{(2)i},...,c_{(T)i}\) in this game sequence. For a path \(p \in P_i\) and game step t, \(f_{(t)p}^i\) denotes the traffic assignment to path p. The regret R(T) of agent \(a_i\) at a given game is the difference between the agent's average cost in the already played games and the average cost of the best fixed path in hindsight for the agent's commodity: \(R(T) = \frac{\sum _{t=1}^T c_{(t)i}}{T} - \mathop {\min }\limits _{p \in P_i} \frac{\sum _{t=1}^T c_{p}(f_{(t)p}^i)}{T}\). An adaptation algorithm for selecting traffic assignments at each routing game is no-regret if, for any sequence of flow assignments, the expected regret goes to zero as the number of games goes to infinity. In other words, if an agent applies a no-regret adaptation algorithm, then the adaptation algorithm is a machine learning algorithm that learns the best fixed traffic assignment over all games. The regret R(T) is an expected value, and a no-regret algorithm contains randomness in order to allow for exploration: the no-regret algorithm may assign some traffic to a path that had no traffic assigned in the previous game.
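As a hedged sketch, the code below uses one standard no-regret rule (multiplicative weights, also known as Hedge) to split a unit commodity over its paths; Blum et al. (2006) allow any no-regret algorithm, so this particular rule, the cost functions, and the learning rate are only illustrative.

```python
# One standard no-regret rule (multiplicative weights / "Hedge") splitting a unit
# commodity over two paths; the rule, cost functions and learning rate are illustrative.
import math

def hedge_split(cost_history, n_paths=2, eta=0.3):
    """Weight each path by exp(-eta * cumulative cost) and normalise to a flow split."""
    cum = [sum(step[p] for step in cost_history) for p in range(n_paths)]
    w = [math.exp(-eta * c) for c in cum]
    total = sum(w)
    return [x / total for x in w]

# Repeated game: path costs depend on the current split f = (f1, f2).
cost = lambda f: [f[0], 2.0 * f[1] + 0.25]     # path costs are equal at the split (0.75, 0.25)

history = []
for t in range(200):
    f = hedge_split(history)
    history.append(cost(f))

avg_cost = [sum(step[p] for step in history) / len(history) for p in (0, 1)]
print(f, avg_cost)    # the play approaches the equal-cost split; the regret shrinks with T
```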

2.2.1 Solution concept

The solution concept of the evolutionary dynamics of repeated games is the \(\epsilon \)-approximate equilibrium, which is an approximation of an evolutionarily stable equilibrium. So first we are going to discuss what “evolutionarily stable equilibrium” means, and how “approximation” can be defined. Sometimes “approximation” is called “convergence”.

The issue of equilibrium is a bit more complex in the evolutionary dynamics of games than in static games. The Nash equilibrium states that no agent can gain by deviating from the equilibrium. However, not all Nash equilibriums are evolutionarily stable. A Nash equilibrium is not stable over the cycles of evolution if it is possible that one of the agents deviates from the Nash equilibrium without increasing its cost, and then the reaction of the other agents takes the game to another Nash equilibrium. The Nash equilibrium is an evolutionarily stable equilibrium (Maynard-Smith and Price 1973) if any agent that deviates from the equilibrium increases its cost. If the system is in an evolutionarily stable equilibrium, then none of the agents is willing to deviate from this equilibrium over the cycles of evolution.

The equilibrium of evolutionary games has an approximation aspect as well. The game evolves and converges to the equilibrium, but it may not reach the exact equilibrium in finite time. In addition, the no-regret dynamics allows the agents to sometimes select exploratory routes which may not be close to the equilibrium. An “approximate equilibrium” therefore has to be defined. The approximate equilibrium of evolutionary games is measured with the proportion of the agents close to a kind of equilibrium.

In replicator dynamics, the approximate equilibrium is defined in relation to the average cost (Fischer and Vöcking 2004). The f flow assignment is in \(\epsilon \)-approximate equilibrium if the proportion of the flow that is assigned to routes far from the average is below a threshold, namely \(f_\epsilon \le \epsilon \) where \(f_\epsilon = \sum _{p \in P_\epsilon } f_p\) where \(P_\epsilon = \bigcup _{i=1}^n \{p \in P_i | c_p(f_p^i) \ge (1+\epsilon ) \cdot \mathop {{{\,\mathrm{avg}\,}}}\limits _{{\hat{p}} \in P_i } (c_{{\hat{p}}}(f_{{\hat{p}}}^i)) \}\). Note that the traffic assignment in an \(\epsilon \)-approximate equilibrium might be quite different from the Nash equilibrium if some part of the traffic flow is assigned to very slow paths and no traffic is assigned to the fast paths. In this case, the replicator dynamics does not find such fast paths, because it does not do exploration. In contrast, the no-regret dynamics might take the system out of this kind of \(\epsilon \)-approximate equilibrium, because the no-regret dynamics does exploration. This is why the replicator dynamics usually makes the assumption that at least some small traffic is already assigned to each possible path, including the fast paths.
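The following sketch turns this definition into a simple test for a single commodity whose total flow is normalised to 1; the flow splits and path costs are made-up numbers.

```python
# Simple test of the epsilon-approximate equilibrium condition for one commodity
# whose total flow is normalised to 1; the flow splits and path costs are made up.

def is_eps_approx_equilibrium(flows, costs, eps):
    """True if at most an eps fraction of the flow uses paths whose cost is at
    least (1 + eps) times the (unweighted) average path cost of the commodity."""
    avg = sum(costs) / len(costs)
    f_eps = sum(f for f, c in zip(flows, costs) if c >= (1 + eps) * avg)
    return f_eps <= eps

print(is_eps_approx_equilibrium([0.60, 0.35, 0.05], [1.0, 1.1, 1.6], eps=0.2))  # True
print(is_eps_approx_equilibrium([0.30, 0.30, 0.40], [1.0, 1.1, 1.6], eps=0.2))  # False
```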

In no-regret dynamics, the approximate equilibrium is defined in relation to the minimum cost (Blum et al. 2006). The f flow is in \(\epsilon \)-Nash equilibrium if the total cost under this flow assignment is close to the total cost when every cost is replaced with the minimum cost of its own commodity, namely \( \sum _{i=1}^n \mathop {{{\,\mathrm{avg}\,}}}\limits _{p \in P_i } (f_p^i \cdot c_p(f_p^i)) - \sum _{i=1}^n r_i \cdot \mathop {\min }\limits _{p \in P_i } (c_p(f_p^i)) \le \epsilon \). Note that the \(\epsilon \)-Nash equilibrium is expected to come to close to a Nash equilibrium over time, because there is exploration in no-regret dynamics, and sooner or later the fast paths are discovered. The \(\epsilon \)-Nash equilibrium of no-regret dynamics is a special case of \(\epsilon \)-approximate equilibrium where the system approximates a static Nash equilibrium.

The speed of convergence is also an important issue in the evolutionary dynamics of games. We would expect the agents to come close to the static equilibrium quickly.

2.2.2 Main results

Fischer and Vöcking (2004) consider a restricted non-atomic routing game, which includes only those paths of the original game that have at least some small traffic assigned initially, and where the adaptation probability is the same on all flows of the non-atomic routing game. Then they prove that the repeated routing game with replicator dynamics converges to the Nash equilibrium of this restricted routing game. If the cost functions of the non-atomic routing game are strictly increasing, then this Nash equilibrium is evolutionarily stable with replicator dynamics. If the initial traffic flow assignment has at least some traffic flow on each path of the original routing game, then the traffic flow assignment converges to the Nash equilibrium of the original routing game.

For single-commodity non-atomic routing games, the replicator dynamics converges to an \(\epsilon \)-approximate equilibrium within time \({\mathcal {O}} (\epsilon ^{-3} \cdot ln(\frac{c_{max}}{ {\bar{c}}_{opt}}) ) \) where \(c_{max}\) is the possible maximum cost of the paths in the network and \({\bar{c}}_{opt}\) is the average cost at the global optimum. For multi-commodity routing games, the replicator dynamics converges to an \(\epsilon \)-approximate equilibrium within time \({\mathcal {O}} (\epsilon ^{-3} \cdot \frac{c_{max}}{ {\bar{c}}_{opt} } ) \). Note that the convergence to the \(\epsilon \)-approximate equilibrium is convergence to the Nash equilibrium only if the initial traffic flow assignment has at least some traffic on each path of the Nash equilibrium.

For non-atomic routing games with linear cost functions, the no-regret dynamics converges to the \(\epsilon \)-Nash equilibrium (Blum et al. 2006). However, the \(\epsilon \)-Nash equilibrium holds only for the time-averaged traffic assignment after some time, and not for every single traffic flow assignment. This is because no-regret algorithms may explore new possibilities, and may occasionally make a traffic assignment outside the \(\epsilon \)-Nash equilibrium. The speed of convergence depends on the applied no-regret algorithm. Blum et al. (2006) also proved that the cost of an \(\epsilon \)-Nash equilibrium traffic assignment is close to the cost of the Nash equilibrium, where the difference depends on \(\epsilon \). Because of this, the price of anarchy results of the static routing game apply to these no-regret dynamics games when they are in \(\epsilon \)-Nash equilibrium.

Repeated atomic routing games are investigated in (Blum et al. 2006) only for restricted games: a single commodity, only parallel paths, and only cost functions equal to the traffic flow.

These results are summarised in Table 3.

Table 3 Summary of the Main Findings Related to the Evolutionary Dynamics of Repeated Games

2.2.3 Evaluation

In repeated routing games, the decisions are made at the flow level, i.e. a flow is an agent. The repeated game approach captures the evolutionary dynamics between routing games. This might be a good model of human behaviour when the route selection is based on the experience of the previous day. However, individual decision making is not included in the model, only average human behaviour at the flow level. The repeated routing game has similar limitations to the algorithmic game theory model: each flow continuously occupies its route, changing to another route is assumed to take effect immediately along the full route, and all the agents know what the other agents do.

The evolutionary dynamics in repeated games needs to collect information to make decisions. In the case of replicator dynamics, the average cost of commodity i in the last game and the cost of the given path in the last game must be known to decide how to change the flow assignment on the given path in the next game. In the case of no-regret dynamics, the cost of all paths of commodity i in all played games must be known to determine which flow assignment would have been the best in hindsight. The no-regret dynamics aims to learn the best fixed flow assignment for all games.

The learning algorithm of the replicator dynamics simply computes the replicator equation of the model discussed above. The learning algorithm of the no-regret dynamics is complex, and different versions have been developed (Kalai and Vempala 2005). The speed of convergence to the \(\epsilon \)-Nash equilibrium depends on the applied algorithm.

Both the evolutionary and the no-regret dynamics converge to the equilibrium, therefore they can be applied in practice to find out the static equilibrium traffic assignment. However, the evolutionary dynamics assumes that the same static game is repeated in each game, i.e. the traffic flows (r in the routing game model) do not change from game to game, and the traffic flow assignment is static during the game. In addition, the computation needs to collect all the incurred costs in all games, which probably needs a service, either per commodity or for all commodities. If the equilibrium is learnt during the repeated games by the service, then the service can guide the route selection of individual autonomous vehicles. The service has to collect feedback from individual autonomous vehicles. The feedback from the individual autonomous vehicles is needed to let the service know the current traffic flow assignment, and the guidance to individual autonomous vehicles is needed to make the autonomous vehicles keep to the traffic flow assignment of the equilibrium.

Altogether, the evolutionary dynamics of repeated games is useful if we are interested in the static equilibrium of static games and, instead of computing the equilibrium, we want to learn it by repeating the game. The drawback is that the evolutionary dynamics of repeated games does not handle changing traffic conditions (e.g. increased cost on a road because of an accident), therefore it is less useful for autonomous vehicles that want to adapt to traffic that evolves over time.

2.3 Queuing model

The non-atomic queuing network (Cominetti et al. 2017) (sometimes called a fluid queuing network) is a way of investigating how non-atomic traffic flows evolve over time. Non-atomic queuing networks investigate continuous-time dynamics instead of repeated game dynamics. Continuous-time dynamics means that the traffic flows may evolve over time during the routing game. The main difference between the non-atomic queuing network and the routing game is how the edges and their cost are modelled. An edge (shown in Fig. 2) consists of a queue followed by a link which has a constant delay \(delay_e\) and a maximum capacity \(capacity_e\). The capacity is the maximum possible traffic flow on the edge. The non-atomic queuing network introduces the concept of time into the model. At time t, the length of the queue is \(queue_e(t)\). The flow entering the edge at time t is \(inflow_e(t)\), and the flow exiting the edge is \(outflow_e(t)\).

Fig. 2 Queuing model of an edge

The cost for a “particle” that enters the edge at time t is the waiting time in the queue plus the constant delay: \(c_e(t) = \frac{queue_e(t)}{ capacity_e} + delay_e\). The cost is a function of time, and the inflow influences the cost indirectly through the queue. If the queue is not empty (i.e. \(queue_e(t)>0\)) or if the queue is empty and the inflow exceeds the capacity (i.e. \((queue_e(t)=0) \wedge (inflow_e(t)>capacity_e)\)), then the speed of the growth of the queue is proportional to the difference between the inflow to the edge and the maximum capacity of the edge: \(\dot{queue_e(t)} =inflow_e(t) - capacity_e \). Otherwise the queue remains zero and never takes a negative value.

The above dynamics of the queue determines the outflow of the edge in the following way: if \(queue_e(t)>0\), then \(outflow_e(t+delay_e)=capacity_e\), and if \(queue_e(t)=0\), then \(outflow_e(t+delay_e)=\min (inflow_e(t),capacity_e)\). Flow is conserved at each vertex, i.e. the sum of the outflows on the incoming edges of the vertex is equal to the sum of the inflows on the outgoing edges of the vertex.
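As an illustration of the above dynamics, the following minimal Python sketch simulates the queue and the entry cost of a single edge in discrete time steps. It is our own illustration of the rules above, not code from the cited works; the time step, the inflow profile and all parameter values are arbitrary assumptions.

# Minimal discrete-time sketch of one edge of the non-atomic queuing network.
# This is our own illustration of the rules above; the time step, the inflow
# profile and all parameter values are arbitrary assumptions.

def simulate_edge(inflow, capacity, delay, horizon=10.0, dt=0.01):
    """inflow: function returning the inflow rate at time t."""
    queue = 0.0
    costs = []
    for step in range(int(horizon / dt)):
        t = step * dt
        # cost for a particle entering at time t: waiting time plus constant delay
        costs.append(queue / capacity + delay)
        # the queue grows when it is non-empty or the inflow exceeds the capacity,
        # and it never becomes negative
        if queue > 0.0 or inflow(t) > capacity:
            queue = max(0.0, queue + (inflow(t) - capacity) * dt)
    return costs

# Constant inflow of 3 units on an edge with capacity 2 and delay 1:
costs = simulate_edge(lambda t: 3.0, capacity=2.0, delay=1.0)
print(round(costs[0], 2), round(costs[-1], 2))  # the cost grows as the queue builds up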

Agents are associated with commodities (source-target pairs), and the allowed strategies of the player associated with commodity i are those traffic flow assignments to the paths where \(\sum _{p \in P_i} f_p^i(t) = r_i\) at every time t.

There are load-dependent transit time extensions of the non-atomic queuing network where \(delay_e\) is not constant. The delay of an edge is treated as a function of only the flow rate at the time of entry to the edge in (Carey and Subrahmanian 2000). However, this model is in most cases unrealistic, because it does not preserve the first-in-first-out (FIFO) property, which is part of most models. The FIFO property is important for traffic engineering applications, because without this property the traffic flows would create strange patterns, e.g. one vehicle could overtake another. The approach of (Köhler and Skutella 2005) assumes that at each point in time, the entire flow on an edge travels with uniform speed, and that this speed depends only on the current load of that edge (the number of vehicles on the road). This approach keeps the FIFO property, but if the flow is increasing (or decreasing) at the entrance of the edge, then it immediately slows down (or speeds up) the flow at the end of the edge, which is a bit unrealistic for vehicles.

A kind of “atomic queuing network” is introduced in (Hoefer et al. 2011). They introduce the weighted temporal network congestion game with FIFO policy, which has some similarity to the above non-atomic queuing network, but there are considerable differences. One difference is related to the way the time concept is introduced into the model. In a weighted temporal network congestion game with FIFO policy, the flow \(r_i\) is called a task, and the \(r_i\) units of task are sent along the network with a timing of the edges similar to a queuing network, i.e. tasks are put into a queue and they are processed in the order of arrival. The tasks (they could be regarded as “flow packets”) are started from their origin nodes at the same time by all players, like in the static routing game. Agents are associated with commodities, and the allowed strategies of the agent associated with commodity i are those traffic flow assignments where there is a single \(p \in P_i\) for which \(f_p^i = r_i\), and \(f_{p\prime }^i = 0\) for all other \(p\prime \in P_i\). In other words, the players can select only one path and they cannot divide their flow among several paths. Weighted temporal network congestion games with FIFO policy are in between the static atomic routing game model and the queuing network model.

2.3.1 Solution concept

The solution concept for the queuing model is the dynamic equilibrium all the time (Cominetti et al. 2017), which corresponds to the Nash equilibrium over time defined in (Koch and Skutella 2011) and to the dynamic equilibrium of traffic engineers (Merchant and Nemhauser 1978a, b; Peeta and Ziliaskopoulos 2001).

In order to be able to define the equilibrium in the non-atomic queuing network, the notion of dynamic lowest cost needs to be defined. The difficulty is that the cost of each edge is determined at the time when the flow “particle” enters the edge, and the queues may change by the time the flow “particle” gets to the edge. A “particle” starting its journey in the network on a path \(p=(e_1 e_2 ... e_k)\) at time t incurs the cost \(c_p(t)= c_{e_1}(t) + c_{e_2}(c_{e_1}(t) + t) + ... + c_{e_k}( c_{e_{k-1}}( ... ) + ... + c_{e_2}(c_{e_1}(t) + t) + c_{e_1}(t) + t ) \). The “particle” might not “know” this cost when it starts its journey.

The “particle” starting at time t travels along a dynamic lowest cost path if the actually incurred cost is the minimum possible cost among those paths of its own commodity where some traffic is assigned at starting time t. The dynamic lowest cost for commodity i at time t is \(c_{dlc_i}(t) = \mathop {\min }\limits _{p \in P_i \wedge f_{p}^i(t)>0}(c_{p}(t)) \). The dynamic lowest cost may change over time.

Commodity i is in dynamic equilibrium all the time if \(\forall p \in P_i \wedge f_{p}^i(t)>0 : c_p(t) = c_{dlc_i}(t) \) holds at every time t. The non-atomic queuing network is in dynamic equilibrium all the time if all the commodities are in dynamic equilibrium all the time.

The non-atomic queuing network reaches a steady state if there exists a time \({\hat{t}}\) such that all the queues are frozen to a constant, i.e. \(\dot{queue_e(t)} = 0 \) for all edges \(e \in E\) and for all \(t>{\hat{t}}\). The travel times of all edges are constant in a steady state. Note that the flow allocation may change in a steady state on those edges where the inflow does not exceed the capacity of the edge, and the inflow remains below the capacity of the edge with the change in the flow allocation.

2.3.2 Main results

If the total traffic flow into a non-atomic queuing network exceeds the total capacity of the network, then the queues will grow without limit with time, therefore the results below refer to those instances where this is not the case.

Koch and Skutella (2011) proved that in a non-atomic queuing network the following statements are equivalent: (1) The queuing network is continuously in dynamic equilibrium, (2) On every edge, the total inflow and the total outflow are continuously equal, and (3) No flow overtakes any other flow. Koch and Skutella (2011) also proved that the dynamic equilibrium can be computed in polynomial time in networks where every source-target path has the same total free flow transit time. The authors also derive price of anarchy properties by considering the flow at every moment as a static flow snapshot and applying the static price of anarchy results to these snapshots.

Cominetti et al. (2015) proved that a dynamic equilibrium exists in single-commodity non-atomic queuing networks with constant inflow. Cominetti et al. (2017) claim that the dynamic equilibrium of single-commodity non-atomic queuing networks reaches a steady state in finite time.

The above results are in line with the findings of Hoefer et al. (2011), who approach the temporal routing problem from temporal network congestion games. Temporal network congestion games may have different coordination policies, but if they have the FIFO coordination policy, then they are similar to the queuing model defined above. However, the queuing model has continuous flows, while the temporal routing problem has tasks which could be considered as “flow packets”, as discussed in the model section. Hoefer et al. (2011) proved that for unweighted single-commodity temporal network congestion games with the FIFO policy, a Nash equilibrium always exists, and a Nash equilibrium can be computed efficiently. The existence of a Nash equilibrium is not guaranteed if the temporal congestion game with the FIFO policy is weighted single-source, or if it is unweighted multi-commodity.

The only major result related to the load-dependent transit time extensions of the non-atomic queuing network is that finding the dynamic lowest cost path is computationally hard (Köhler and Skutella 2005).

These results are summarised in Table 4.

Table 4 Summary of the Main Findings Related to the Queuing Model

2.3.3 Evaluation

The queuing model is close to the dynamic traffic flow assignment model of traffic engineers (Peeta and Ziliaskopoulos 2001). However, the queuing model does not have the same congestion sensitivity as the classic routing model. The queuing model does not have a load-dependent edge cost when the queue is empty and the inflow is below the maximum capacity, because in this inflow range the edge has a constant delay. The queuing model has a kind of load-dependent edge cost only when the inflow exceeds the maximum capacity of the edge. However, above the maximum-capacity flow, the queue grows to infinity over time at constant inflow. Therefore the queuing model is not a full extension of the static routing game to the time dimension.

Regarding the practical applicability for autonomous vehicles, the queuing model has similar limitations as the static routing game: the model describes well the equilibrium, but does not describe how this equilibrium can be achieved by the individual agents who participate in the game. The queuing model assumes the idealistic situation where the agents have complete knowledge of the current and the future states of all queues and edges. The notion of dynamic lowest cost path is defined, but the computation of the dynamic lowest cost requires the knowledge of future states of the queues in the network. The queuing model does not say how the agents can find this out. If the agents do not know the dynamic lowest cost path, then they do not know how to assign traffic flow to it.

One of the results says that the dynamic equilibrium of single-commodity non-atomic queuing networks reaches a steady state in finite time. This is nice, because if the traffic reaches a steady state, then there are no big fluctuations in the network, and the traffic is more efficient. However, the model does not say how the steady state is achieved, and how the agents change their strategy over time. The model only assumes that the network is in dynamic equilibrium, i.e. the agents always select dynamic lowest cost paths, and if this is the case, then the queuing network reaches a steady state. Since the agents do not know which path will be the dynamic lowest cost path, they do not know how to assign traffic to it, and so the agents do not know how to bring the network into dynamic equilibrium, nor into a steady state.

In addition, the only result related to the dynamic equilibrium and the steady state of multi-commodity networks is that the equilibrium is not guaranteed in unweighted multi-commodity temporal network congestion games with FIFO policy. It seems that a general queuing network is a complex system which may not reach a dynamic equilibrium over time.

The final conclusion is that the researchers of this area defined the concept of dynamic equilibrium and tried to prove its existence in queuing networks. The proof was successful only in very special cases. General queuing networks do not seem to have this kind of equilibrium. A major challenge for future research in this area could be a shift of focus towards sink states or attracting equilibria (Goemans et al. 2005; Papadimitriou and Piliouras 2016, 2019; Omidshafiei et al. 2019).

2.4 Online routing game

The online routing game model (Varga 2015a) investigates how atomic traffic flows evolve over time. It contains elements of the routing game model, the queuing model and the concept of online mechanisms (Parkes 2007).

The online routing game model has both the concept of the maximum capacity (from the queuing model) and inflow-dependent delays below the maximum capacity (from the routing game model). Therefore, the online routing game describes the load-dependent cost of the edges in all flow ranges. The online routing game model may comprise other important aspects as well: feedback from the traffic network, intention awareness and intention-aware prediction.

In the online routing game model (Varga 2015a), the traffic flow is made up of individual agents who follow each other, and the agents of the traffic flow individually decide which path to select, depending on the real-time situation. The online routing game model is like an atomic routing game model, but played over time, and it combines the static atomic routing game model (Roughgarden 2007) with the online mechanism concept (Parkes 2007). The online routing game model resembles the static routing game model in the concepts of flow, cost, vertices and edges, and it resembles the model of online mechanisms in the sequences of time periods and decisions. A parameter T is introduced into the model to represent one time unit, because the edges have load-dependent travel times. The cost of an edge depends (among other things) on the incoming flow, i.e. the number of vehicles entering the edge in a time unit.

The online routing game model is the sextuple \(<t, T, G, c, r, k>\), where \(t=\{1, 2, ...\}\) is a sequence of time steps, T time steps give one time unit (e.g. one minute), G is a directed multi-graph representing the road network, c is the cost function of G with \(c_e\) for each edge e of G, r is a vector of flows, and \(k=(k^1, k^2, ... )\) is a sequence of decision vectors \(k^t=(k^t_1, k^t_2, ... )\) made in time step t.

Edges have the FIFO property, and there is a minimum following distance \(gap_e\) on the edges which corresponds to the maximum capacity in the queuing model. The cost function maps the flow \(f_e(\tau )\) (that enters the edge e at time \(\tau \)) to the travel time on the edge. The cost \(c_e\) for an agent entering the edge e at time step t is never less than the remaining cost of any other agent already on edge e at time step t increased by the time gap \(gap_e\), which is specific to edge e. This ensures the FIFO policy. The value of the flow \(f_e(\tau )\) is the number of agents that entered the edge e between \(\tau -T\) (inclusive) and \(\tau \) (non-inclusive). If two agents enter an edge at exactly the same time \(\tau \), then one of them (randomly selected) suffers a delay \(gap_e\), which is part of its cost on edge e, and its remaining cost is determined at the delayed time, so its cost on edge e will be \(gap_e+c_e(f_e(\tau +gap_e))\). If several agents enter an edge at the same time, then they are randomly ordered with a delay \(gap_e\) between them. The cost functions are nonnegative, continuous, and nondecreasing. Each cost function has a constant part which does not depend on the inflow to the edge, and a variable part which varies with the inflow to the edge. The variable part is not known to any agent of the model. The agents learn the variable cost only when an agent exits an edge and reports its cost to the other agents.
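To make the edge mechanics more concrete, the following simplified Python sketch shows how the entry cost and the FIFO serialisation of an edge could be implemented. It is only a sketch under our own assumptions: the linear cost function, the class and attribute names and all parameter values are illustrative and do not come from (Varga 2015a), and the handling of simultaneous entries is simplified (the entering agent is just delayed behind the previous exit).

# Simplified sketch of one edge of the online routing game.  The linear cost
# function, the names and all parameter values are our own illustrative
# assumptions, not the exact definitions of (Varga 2015a).

class Edge:
    def __init__(self, const_cost, var_coeff, gap, T):
        self.const_cost = const_cost    # constant part of the cost function
        self.var_coeff = var_coeff      # weight of the flow-dependent part
        self.gap = gap                  # minimum following distance (in time)
        self.T = T                      # length of one time unit
        self.entries = []               # entry times of the agents on the edge
        self.last_exit = float("-inf")  # exit time of the last entering agent

    def flow(self, tau):
        # number of agents that entered the edge in [tau - T, tau)
        return sum(1 for e in self.entries if tau - self.T <= e < tau)

    def enter(self, tau):
        # travel time computed when the agent enters the edge
        cost = self.const_cost + self.var_coeff * self.flow(tau)
        # FIFO: never exit earlier than the previously entered agent plus the gap
        exit_time = max(tau + cost, self.last_exit + self.gap)
        self.entries.append(tau)
        self.last_exit = exit_time
        return exit_time - tau          # cost actually suffered on the edge

edge = Edge(const_cost=5.0, var_coeff=0.5, gap=0.2, T=1.0)
print(edge.enter(0.0), edge.enter(0.0), edge.enter(0.3))  # simultaneous entries are serialised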

Figure 3 illustrates the concepts of an edge of the online routing game model. The travel time is computed when an agent enters the edge, and the computed travel time models the environment of the agent. The travel time is reported when the agent exits the edge, and the other agents perceive the travel time on the edge through the reported travel time.

Fig. 3
figure 3

Online routing game model of an edge

The decision made by the agent of the flow \(r_i\) in time period t is \(k^t_i\). The decision \(k^t_i\) is how the agent is routed along a single path among the paths leading from \(s_i\) to \(t_i\). The actual cost of a path \(p=(e_1, e_2, e_3, ...)\) for a flow starting at time \(\tau \) is \(c_p(\tau )=c_{e_1}(f_{e_1}(\tau ))+c_{e_2}(f_{e_2}(\tau +c_{e_1}(f_{e_1}(\tau ))))+c_{e_3}(f_{e_3}(\tau +c_{e_1}(f_{e_1}(\tau ))+c_{e_2}(f_{e_2}(\tau +c_{e_1}(f_{e_1}(\tau ))))))+...\), because the actual cost of an edge is determined at the time when the flow enters the edge.
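The nested structure of this path cost can be computed by walking along the path and accumulating the arrival times, as the following small sketch illustrates. The placeholder cost functions map an entry time to a travel time; in the actual model the cost depends on the inflow at that entry time.

# Sketch of the actual path cost computation: the cost of each edge is
# evaluated at the time the agent reaches that edge.  The cost functions
# below are illustrative placeholders mapping an entry time to a travel time.

def path_cost(edge_costs, start_time):
    arrival = start_time
    for c_e in edge_costs:
        arrival += c_e(arrival)     # arrival time at the end of this edge
    return arrival - start_time     # total cost c_p(start_time)

# Two edges whose travel times grow with the entry time:
print(path_cost([lambda t: 2.0 + 0.1 * t, lambda t: 3.0 + 0.05 * t], start_time=10.0))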

Intention-aware online routing games are a special type of online routing games, where the agents can perceive their environment as described above, and in addition they receive aggregated information about the intentions of the other agents in the system. As we will see in the next subsection, online routing games may behave in an undesirable way. One of the causes of this bad behaviour is that agents become aware of congestion only with a delay, after the congestion has already formed. These theoretical results formally underpin the conjecture in (Wahle et al. 2000), where the authors run traffic simulations and conclude that one of the reasons for the unwanted behaviour is that the real-time information reflects the state of the network some time ago. The agents may have to make intertemporal decisions, taking into account what the expected traffic will be by the time they get to a given road. Selecting different routes involves different future points in time, and the future traffic might be completely different from the currently observed real-time situation.

In order to enable the agents to make predictions and to include the future state of the traffic in their decisions, intention-aware prediction methods were proposed. In the intention-aware prediction methods, the agents communicate their intentions to a service. The role of the service is to support stigmergic communication among the agents, and it is often implemented by bio-inspired techniques (Zafar et al. 2011). The service aggregates the data about the agent collective and sends feedback to the agents (Claes and Holvoet 2014). The intention-aware (de Weerdt et al. 2016) and the intention propagation (Claes et al. 2011) approaches are based on this scheme. The coordination mechanism provided by these schemes can scale with the complexity of real-world application domains.

The online routing game model was extended to the intention-aware online routing game (Varga 2015a) to include intention-aware prediction. When an agent has made a decision on its planned route, it sends the selected intention to the service. The service forecasts future traffic states. The prediction is based on the current state and the intentions (planned routes) of the agents. The service provides the forecasts back to those agents who are still planning their routes; these agents use this information to make their decisions, and once they have made a decision, they also communicate their intention to the service.

2.4.1 Solution concept

We would prefer online routing games to converge to a specific state like a Nash equilibrium, but as we saw in the previous section, no guarantees are known for the existence of the dynamic equilibrium over time in general queuing networks. Results on different dynamical games show that many games exhibit complex, unpredictable behaviour or, in the best case, converge to a cycle (Palaiopanos et al. 2017). Online routing games seem to be complex games that tend to cycle around a kind of equilibrium. The solution concept of online routing games could be this convergence to a cycle, however this convergence has not yet been proved. Instead, the online routing game seems to fluctuate around an attracting equilibrium, and this fluctuation is measured empirically. Two empirical measurement concepts are defined to measure the convergence to a cycle: the benefit of online real-time data and the measure of intertemporal equilibrium.

One of the aims of the online routing game model is to investigate whether the agents are better off by making decisions based on online real-time data or not. Online real-time data are the costs reported by the agents when they exit the edges. In order to measure if the agents are better off with real-time data, the concept of the benefit of online real-time data is introduced (Varga 2014). The worst(/average/best) case benefit of online real-time data at a given flow is the ratio between the maximum(/average/minimum, correspondingly) cost of the flow and the cost of the same flow using the same decision making strategy (e.g. lowest cost) but only the constant part of the cost functions.
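As a small numerical illustration (the cost values are made up and not taken from the cited experiments), the worst-case benefit can be computed as the ratio of the maximum observed cost with real-time data to the cost obtained with only the constant part of the cost functions:

# Illustrative computation of the worst-case benefit of online real-time data.
# The cost values below are made up and do not come from the cited experiments.

costs_with_realtime = [12.0, 15.5, 14.0, 18.0]  # costs of the agents using real-time data
cost_constant_only = 16.0                       # cost with only the constant cost parts

worst_case_benefit = max(costs_with_realtime) / cost_constant_only
print(worst_case_benefit)  # a value above 1 means the agents were worse off with real-time data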

Online routing games are continuously evolving games: agents join the game in a sequence, they influence the game for a while, and then they quit the game. In this game, the decisions of the agents are intertemporal choices: the current decision of the agent may affect the cost of the agents in the future. The equilibrium of this evolving game is the intertemporal equilibrium, as described in (Varga 2018b).

Intertemporal equilibrium (Hicks 1975) has two interpretations in economic theory. One interpretation is related to the intertemporal aspect of the choice, e.g. is it better to spend now or is it better to save now and spend later. In this approach, the critical point is the expectations of the agents. The other interpretation is related to the temporal aspect of the equilibrium: at any given time, the economy is in disequilibrium, and the equilibrium can be interpreted only in the long term. The intention-aware online routing game model focuses on the latter interpretation, and takes into account that agents have intertemporal expectations.

Because it is not known whether convergence to the equilibrium of intention-aware online routing games can be guaranteed, the intertemporal equilibrium is characterised by a measurement value with three components for each commodity: 1) the worst case difference, i.e. the maximum of the difference between the maximum cost and the static equilibrium cost, normalised with the static equilibrium cost; 2) the average difference, i.e. the difference between the average travel cost of the agents and the static equilibrium cost, normalised with the static equilibrium cost; and 3) the quasi equilibrium within the disequilibrium, i.e. the average of the absolute differences between the costs of the agents consecutively finishing the game, normalised with the static equilibrium cost. Thus the quantitative measure of intertemporal equilibrium is defined as follows:

Let \(ORG=<t, T, G, c, r, k>\) be an online routing game over the finite sequence of time steps t. Let \(c_{r_i}(\tau )\) be the cost (i.e. total travel time) of the agent of trip \(r_i\in r\) when it exits the game at the destination of trip \(r_i\) at time step \(\tau \in t\). Let \(e_{r_i}\) be the static equilibrium travel time for trip \(r_i\).

The measure of intertemporal equilibrium of ORG is \(<WD,AD,QE>\) where

  • \(WD=\mathop {\max }\limits _{r_i\in r}(\mathop {\max }\limits _{\tau \in t} ( \frac{c_{r_i}(\tau )-e_{r_i}}{ e_{r_i} } ))\)

  • \(AD=\mathop {{{\,\mathrm{avg}\,}}}\limits _{r_i\in r}(\frac{ \mathop {{{\,\mathrm{avg}\,}}}\limits _{\tau \in t}(c_{r_i}(\tau )-e_{r_i}) }{ e_{r_i} } )\)

  • \(QE=\mathop {{{\,\mathrm{avg}\,}}}\limits _{r_i\in r}(\mathop {{{\,\mathrm{avg}\,}}}\limits _{\tau \in t} ( \frac{ |c_{r_i}(\tau )-c_{r_i}(\tau +1)|}{ e_{r_i}}))\) .
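The measure can be computed directly from the recorded costs, for example with the following sketch, where costs[i] holds the costs of the agents of trip \(r_i\) in the order they exit the game and eq[i] is the static equilibrium travel time of trip \(r_i\); the data layout and the example values are our own assumptions.

# Sketch of the intertemporal equilibrium measure <WD, AD, QE>.
# costs[i]: costs of the agents of trip r_i in the order they exit the game;
# eq[i]: static equilibrium travel time of trip r_i.  The data layout and the
# example values are our own assumptions.

def intertemporal_equilibrium(costs, eq):
    def avg(values):
        return sum(values) / len(values)
    WD = max(max((c - e) / e for c in cs) for cs, e in zip(costs, eq))
    AD = avg([avg([c - e for c in cs]) / e for cs, e in zip(costs, eq)])
    QE = avg([avg([abs(a - b) / e for a, b in zip(cs, cs[1:])])
              for cs, e in zip(costs, eq)])
    return WD, AD, QE

print(intertemporal_equilibrium(costs=[[11.0, 13.0, 12.0]], eq=[10.0]))  # approx. (0.3, 0.2, 0.15)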

2.4.2 Main results

It is proved that if the agents of the online routing game selfishly try to minimise their cost computed from the currently observable cost (real-time data), then equilibrium is not guaranteed, although a static equilibrium exists (Varga 2014). “Single flow intensification” may also happen: agents subsequently entering the online routing game select alternative faster routes, and they catch up with the agents already on route, and this way they cause congestion. As a result, sometimes some online routing games may produce strange behaviour, and the agents may be worse off by exploiting real-time information than without exploiting real-time information (i.e., the worst-case benefit of online real-time data is bigger than 1).

It is proved (Varga 2015a) that there is no guarantee on the equilibrium, even if intention-aware prediction is applied. The WD component of the intertemporal equilibrium can be above any bound in specific networks. This means that if the agents selfishly exploit intention-aware predictions, then in some networks and in some cases the agents may be worse off by exploiting real-time information and prediction than without. However, it is proved (Varga 2016b) that in a small but complex enough network of the Braess paradox (Braess 1968), where there is only one source–destination pair, the agents might be only slightly worse off in the worst case with real-time data and prediction (i.e., the worst-case benefit of online real-time data is just slightly bigger than 1). It is also proved (Varga 2017) that in the Braess network of Varga (2016b), the system converges to the static equilibrium within a relatively small threshold (i.e. the WD component of the intertemporal equilibrium is relatively small). The conjecture in (Varga 2016a) says that the system converges to the static equilibrium in bigger networks as well, if simultaneous decision making is prevented. This conjecture has neither been proved nor refuted analytically.

Investigations of more complex networks indicate that the prediction method applied by the prediction service has a great impact on the quantitative measures of the intertemporal equilibrium. The formal descriptions of the algorithms of two intention-aware prediction methods were presented in (Varga 2018a): the detailed prediction method and the simple prediction method.

The detailed prediction method takes into account all the intentions already submitted to the service, then it computes what will happen in the future if the agents execute the plans assigned by these intentions, and then it computes for each route in the network the predicted travel time by taking into account the predicted future travel times for each road of the route. The prediction algorithm used in (de Weerdt et al. 2016) is close to this detailed prediction method, but the main difference is that the prediction algorithm of (de Weerdt et al. 2016) uses probabilistic values, while the detailed prediction method is deterministic.

The simple prediction method also takes into account all the intentions already submitted to the service, and then it computes what will happen in the future if the agents execute the plans assigned by these intentions. However, when the simple prediction method computes the predicted travel time for each route in the network, it takes into account only the travel time prediction for each road that was computed at the last intention submission. This way, the simple prediction method needs somewhat less computation. The simple prediction method is a kind of approximation and does not try to be an exact prediction of the future. As time goes by, if no new prediction is generated for a road, then the simple prediction method “evaporates” the last prediction for that road, like the bio-inspired technique of (Claes et al. 2011).
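As a very rough sketch of the evaporation idea (not the actual algorithm of (Varga 2018a); the data structures, the evaporation rate and the decay towards a free-flow travel time are our own assumptions):

# Very rough sketch of the "evaporation" idea of the simple prediction method.
# The data structures, the evaporation rate and the decay towards the free-flow
# travel time are our own assumptions, not the algorithm of (Varga 2018a).

class SimplePrediction:
    def __init__(self, free_flow_time, evaporation_rate=0.1):
        self.free_flow_time = free_flow_time  # constant travel time per road
        self.rate = evaporation_rate
        self.last_prediction = {}             # road -> (time, predicted travel time)

    def update(self, road, time, predicted_travel_time):
        # store the travel time predicted at the last intention submission
        self.last_prediction[road] = (time, predicted_travel_time)

    def predict(self, road, now):
        if road not in self.last_prediction:
            return self.free_flow_time[road]
        t0, value = self.last_prediction[road]
        # without new intention submissions the prediction decays ("evaporates")
        weight = max(0.0, 1.0 - self.rate * (now - t0))
        return weight * value + (1.0 - weight) * self.free_flow_time[road]

sp = SimplePrediction(free_flow_time={"road_a": 5.0})
sp.update("road_a", time=0.0, predicted_travel_time=9.0)
print(sp.predict("road_a", now=2.0))  # the old prediction decays towards the free-flow time of 5.0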

It turned out in (Varga 2018a) that the detailed prediction method did not lead to a better intertemporal equilibrium in the experiments than the simple prediction method. It seems that the more detailed knowledge of the future from the current situation and intentions may not be better for the traffic.

These results are summarised in Table 5.

Table 5 Summary of the Main Findings Related to the Online Routing Game Model

2.4.3 Evaluation

The congestion sensitivity of the online routing game model in “normal” traffic flow ranges is like the congestion sensitivity of the routing game model, and the online routing game model can handle excessive flow ranges as well, because \(gap_e\) defines a maximum capacity for the edge e. The maximum capacity is essential in those cases when two or more vehicles want to enter an edge at almost the same time and these vehicles have to be “serialized”. The online routing game model includes evolutionary dynamics over time as well, because the agents of the traffic flow make decisions in a sequence. This way, the online routing game combines the features of all the models mentioned above: the routing game model, the evolutionary dynamics of repeated games, and the queuing model.

An important new feature of the online routing game model is that it includes the concept of perception. The routing game model and the queuing model do not say how the agents make decisions; these two models assume that the agents decide with full rationality, i.e. by knowing everything about the game and the other agents. The evolutionary dynamics of repeated games includes a kind of perception, but only from the previous games, which must have the same set-up. In contrast to the other models, the online routing game model includes continuous perception, and the agents can always take into account the real-time status when they make decisions. This way, the online routing game is like the navigation software available in modern vehicles, and in autonomous vehicles as well.

The results of the online routing game model are mainly negative: exploiting real-time information, in some networks, may sometimes result in unwanted traffic. This is observed in real traffic situations as well, so the online routing game model points out a real problem that needs to be solved.

The intention-aware online routing game has the same advantages and disadvantages as the online routing game model. In addition, the intention-aware online routing game allows the investigation of those cases where a service can predict the future from the current traffic situation and the intentions of the agents. Online navigation applications like Google Maps, Waze, TomTom, etc. know the intentions of the agents they serve, and they could use this information to make predictions. They could tell exactly what would happen in the near future if the agents that receive routing plans from these applications followed those plans exactly. Therefore the intention-aware online routing game model can be used to investigate such information technology solutions, and how these information technology solutions can serve autonomous vehicles.

Unfortunately the developers of the above mentioned online navigation applications do not publish how they take into account (if they do) the future traffic conditions in their route planning. The intention-aware online routing game model can be used to investigate different prediction methods in order to design prediction methods that produce better traffic for autonomous vehicles.

The intention-aware online routing game model has been used in empirical simulation investigations, and analytical proofs are still needed. The model is rather new, and it has yet to attract the attention of theoretical researchers. Nevertheless, the empirical investigations pointed out that if the autonomous vehicles do not use prediction, or if they use an improper prediction method, then they may create worse traffic than the static equilibrium would indicate.

2.5 Summary of the evaluation of the models against the objectives of the survey

The objectives of the survey were described in Subsection 1.3. The above subsections contain the surveyed models and their evaluation. The evaluation of the discussed models against the objectives of the survey is summarised in Table 6.

Table 6 Summary of the Evaluation of the Models against the Objectives

3 Relation of the models to traffic engineering

In the first part of this section (Subsection 3.1), we illustrate with a few examples the connection of the models of the previous section to traffic engineering approaches. The examples are meant to attract the attention of traffic engineers. A full survey could be the topic of another paper.

In the second part of this section (Subsection 3.2) we discuss traffic simulators. The models of the previous section are built from mathematical concepts, and they mainly focus on macro level issues like the existence of the solution concept and the limit on the price of anarchy. The traffic engineering approaches are built from a practical point of view. The traffic engineering solutions are often tested in traffic simulators before implementing them in reality. The traffic simulators are important not only for traffic engineers, but for the models in this paper as well, because the simulator approach provides a solution to find or to approximate an equilibrium traffic assignment, which can be used to test the solution concepts of the models, like Antal et al. (2020) do.

3.1 Traffic engineering examples

The Wardrop equilibrium (Beckmann et al. 1956; Wardrop 1952) rule in traffic engineering states that travellers always try to choose the path with the least travel effort from origin to destination. As a result, traffic is assigned in a way that no traveller can decrease travel effort by changing to another path. This is in line with the routing game model of Subsection 2.1. The routing game model identifies the conditions to prove the upper limit on the price of anarchy. The traffic assignment is used to design road networks. This way traffic engineers can design road networks with a reduced price of anarchy.

Smith et al. (2015) investigate day-to-day re-routing and day-to-day green-time response. Their scenario is similar to the evolutionary dynamics of repeated games discussed in Subsection 2.2, where the criteria for convergence to an equilibrium are identified. Smith et al. (2015) investigate two types of responsive control policies, and they show a counterexample for which there is no Wardrop equilibrium consistent with one of the policies. The model of Subsection 2.2 could be useful to identify the general criteria for good control policies. The model of Subsection 2.2 focuses on the equilibrium, whereas Smith et al. (2015) are interested in the throughput of the network. They show that, under one of their responsive control policies, the throughput of the evolving network meets any constant feasible demand.

Boyer et al. (2015) investigated the stability properties of traffic equilibria when online adaptive route choice is based on GPS-based decision making. The GPS-based decision making is similar to the online routing game model of Subsection 2.4. Boyer et al. (2015) identify conditions on the network topology and the latency functions of the network for stable traffic equilibria, but only for a single destination problem. Similarly, the queuing model of Subsection 2.3 could prove the existence of equilibrium and stability only for single destination problems. Using the model of Subsection 2.4, Varga (2017) was able to prove that if the GPS-based decision making is improved with prediction, then the system converges to the static equilibrium within a relatively small threshold in a Braess network where there is only a single source-destination pair.

Traffic engineers are interested in the stability and the throughput of traffic flows. Zhang et al. (2020) use the car following model, and they show that a well designed cruise control system can greatly improve traffic flow stability, which results in higher traffic throughput.

Knoop et al. (2010) showed that in the case of dynamic traffic assignment, even for a very small network, the addition of a link can increase the travel time, similarly to the static case of the Braess paradox. Knoop et al. (2010) call this the dynamic extension of the Braess paradox. Using the online routing game model of Subsection 2.4, (Varga 2015b) pointed out the same paradoxical phenomenon in autonomously self-adapting navigation.

3.2 Traffic simulators

Traffic engineers usually apply traffic simulations to evaluate planned changes in the road infrastructure before deploying the changes in the real world. The traffic engineer creates an expected traffic scenario for a planned road network, then the traffic simulation software assigns the traffic to the roads in the simulated road network, and then the simulation software executes the simulation scenario with this traffic assignment. The travel times and other parameters are measured during the run of the simulation scenario. The measured parameters are used to predict and judge the adequacy of the planned road network.

There are macroscopic and microscopic traffic simulations. Macroscopic simulations deal with aggregated elements of the traffic system, like traffic flows. Microscopic simulations deal with individual elements of the traffic system, like individual vehicles.

Traffic simulation software is usually based on a model which is very close to the real world. The main model in microscopic traffic simulation software is the car following model (Pourabdollah et al. 2017), which replicates real driving behaviours and may be defined for different vehicles like cars, trucks, buses, etc. The car following model assumes that a vehicle maintains a safe space and time gap between itself and the vehicle that precedes it. Usually this behaviour results in slower vehicle speeds if the density of the vehicles is higher. According to empirical measurements, the speed is a linear function of the density (in a reasonable range) (Notley et al. 2009). If we assume that the incoming flow is constant into the edges of the routing game model (Subsection 2.1) and the online routing game model (Subsection 2.4), then the incoming traffic flow into an edge is proportional to the following distance on the edge. Therefore the linear cost function in the above models is a good approximation of the cost function resulting from the car following model (Poissonnier et al. 2019; Nagatani 2002; Treiber et al. 2000). The routing game model is a good approximation of the macroscopic simulation, and the online routing game is a good approximation of the microscopic simulation.

Although the focus of our survey is not on traffic simulators, we give a short overview of traffic simulation systems. From the late 1990s until recently, many reports comparing simulation systems have been published, e.g. (Jones et al. 2004; Ratrout and Rahman 2009; Kotushevski and Hawick 2009; Matcha et al. 2020). Table 7 summarises the most important simulators. The summary is based on (Saidallah et al. 2016; Pell et al. 2017). The references to the simulators in the table can be found in (Barceló 2010). In the case of commercial simulators, the information can often be found on their websites.

The free and open-source SUMO (Simulation of Urban MObility) traffic simulation suite (Lopez et al. 2018) was used in the investigations of the online routing game model, because it is easily available to researchers, has a lot of functionality, and provides good statistics.

Table 7 List of Traffic Simulators

There are also other advanced 3D simulators, not included in the table, like the CARLA Simulator (Dosovitskiy et al. 2017), the LGSVL Simulator and the MATISSE (Multi-Agent based TraffIc Safety Simulation systEm) (Torabi et al. 2018; Al-Zinati and Wenkstern 2019). These simulators require powerful computers with GPU video cards, because they provide realistic 3D views which can be used to generate camera and LIDAR sensor data. The AgentDrive simulator (Schaefer et al. 2016, 2017) focuses mainly on aspects like trajectory planning, collision avoidance, and cooperative lane changing. The main focus of these simulators is on the agent-environment interactions, like the sensor data and the local control of the vehicle. Our focus is on the routing level control, and these simulators are less relevant for our survey. The focus of our survey is specifically on the evaluation of the models of the routing problem as stated in the introduction section.

3.2.1 Traffic simulation solutions

Because traffic simulation software is not a formal model, it does not have its own solution concept. However, the SUMO simulation software gives tools to find the solution concept of traffic engineering, which is the Wardrop equilibrium (Wardrop 1952). The Wardrop principle says that none of the vehicles can reduce its travel cost (usually the travel time) by using a different route. On the macro level, the Wardrop equilibrium corresponds to the Nash equilibrium of the static routing game. On the micro level, each vehicle of the traffic flow can select a different route, like in the queuing model and in the online routing game model. Using the terms of the queuing model and the online routing game model, we can say that the system is in Wardrop equilibrium on the micro level if each vehicle selects the route with the dynamic lowest cost. If this is the case during the whole simulation run, then the system is in dynamic equilibrium all the time. In traffic engineering, the “dynamic equilibrium all the time” concept is called dynamic user equilibrium.

The macro level simulation corresponds to the routing game model. If the cost functions of a routing game are linear, then at least one Nash equilibrium exists (see the routing game model in Subsection 2.1), however finding the Nash equilibrium may take a long time, because it is computationally hard. The traffic simulation software tries to find the Nash equilibrium iteratively, which is basically playing the repeated routing game with replicator dynamics (see Subsection 2.2). Replicator dynamics converges to the Nash equilibrium in the case of non-atomic games. If the vehicles can be regarded as “infinitesimally” small, then traffic flows in traffic simulation can be approximated with non-atomic games.

The micro level simulation is closest to the online routing game model among the discussed models. Unfortunately, the dynamic equilibrium all the time is not known to exist for online routing games. Nevertheless, the SUMO simulation software provides tools to find a dynamic user equilibrium in an iterative way. The first step of the iteration is to find the shortest routes for the vehicles using the currently observable travel times on the edges. The next step of the iteration is to modify the route selections of the vehicles with a given probability towards the observed faster routes. These steps are repeated a given number of times. Unfortunately, no guarantee on the convergence to a dynamic user equilibrium is known, therefore deciding on the number of iterations is an experimental process. If we want to relate this iterative process to the discussed models, then it is similar to playing the online routing game (Subsection 2.4) repeatedly with replicator dynamics. The SUMO simulation software provides two methods to adjust the probabilities during the iteration process: the Iterative Assignment (Dynamic User Equilibrium) method of Gawron (1998) and the classic logistic regression method (Hosmer 2013).
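Schematically, the iterative process can be illustrated by the following runnable toy with two parallel routes and linear travel-time functions. The cost coefficients and the switching probability are arbitrary assumptions; this is only an illustration of the iterative loop, not the actual SUMO or Gawron (1998) implementation.

# Runnable toy illustration of the iterative assignment loop on two parallel
# routes with linear travel-time functions.  The coefficients and the switching
# probability are arbitrary assumptions, not the SUMO or Gawron (1998) method.

import random

def iterative_assignment(n_vehicles=100, iterations=20, switch_prob=0.1):
    # travel time of each route as a function of the number of vehicles on it
    cost = [lambda n: 10.0 + 0.10 * n, lambda n: 12.0 + 0.05 * n]
    routes = [random.randrange(2) for _ in range(n_vehicles)]  # initial routes
    for _ in range(iterations):
        counts = [routes.count(0), routes.count(1)]
        times = [cost[0](counts[0]), cost[1](counts[1])]       # observed travel times
        faster = 0 if times[0] < times[1] else 1
        # each vehicle on the slower route switches with a given probability
        for v in range(n_vehicles):
            if routes[v] != faster and random.random() < switch_prob:
                routes[v] = faster
    return routes.count(0), routes.count(1)

print(iterative_assignment())  # the split fluctuates around the equilibrium split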

Because playing the online routing game iteratively requires a long time and a lot of resources, the SUMO simulation software can replace the online routing game with the queuing model (Subsection 2.3), which needs fewer resources. Nevertheless, the iterative method still needs too many resources and too much time to be applied in an autonomous vehicle. In addition, playing a complete scenario involves knowing the future, and the autonomous vehicle does not know the future. Therefore the iterative dynamic user assignment method is not a realistic solution for an autonomous vehicle to select its route.

3.2.2 Evaluation of the traffic simulation approach

Simulated traffic is very close to real world traffic, which is a very complex system. Traffic simulation software applies most of the concepts and models discussed in the previous Section 2. Unfortunately, the more complex a model is, the fewer guarantees there are on the existence of a useful solution of the model, and analytical solutions are missing. The best we can say is that the traffic is fluctuating and we can investigate this fluctuation through simulation. Iterative simulation with evolutionary probabilistic adaptation may find a solution, but because the existence of dynamic equilibrium is not guaranteed, there is no guarantee that the iterative simulation will reach an exact dynamic equilibrium traffic assignment.

4 Open questions and challenges

The classic routing game model has been studied for a long time, and the most important issues have been investigated. This model is largely settled, provided that the routing problem is viewed from a static point of view. The results point out the main issues of the routing problem, and the answers to these issues may serve as a reference point for further research. The main open question related to the routing game model is whether the static concepts of the model can be applied to dynamic systems. The investigation of this question needs other, non-static models, like those surveyed in this paper.

The investigation of the repeated routing game model proved that a good machine learning algorithm can make the system converge to a static equilibrium. An open question is to find the best machine learning algorithms or methods that suit the routing problem and can ensure quick convergence to the static equilibrium. If such algorithms and methods are developed, then the practical challenge of finding the best route for the daily commute in the usual cases can be supported.

The queuing model includes dynamic aspects of the routing problem, and this model can be used to investigate how the traffic evolves over time, but congestion sensitivity is included in the model only above the maximum capacity of the edges. The analytical investigation of the properties of multi-commodity games has a lot of challenges, because multi-commodity queuing games do not seem to converge to or stay in a stable equilibrium. Unfortunately, the queuing model does not fit the route selection process of individual vehicles, therefore any analytical result can only be used in the design of the road network and the traffic in the network. If a limit on the deviation from the equilibrium can be proved for certain types of multi-commodity queuing networks, then this result could be used to design road networks which could have a limiting effect on the fluctuation of the traffic in the road network.

The analytical results related to the online routing game model have only shown the lack of stable equilibrium and the lack of any guaranteed limit on the fluctuation of the traffic in a general network. Intention-aware online routing games are more promising. The big challenge is to find good intention-aware prediction methods to limit the fluctuation of the traffic, and to avoid unwanted occasional delays of the vehicles. A good intention-aware prediction method may combine intention-aware prediction with typical historical data about the traffic. If such prediction methods can be found, then these methods could be used to guide individual vehicles in a smooth and fast way.

Table 8 Summary of the Open Questions and Challenges

Table 8 summarises the open questions and challenges. The above questions and challenges are related to the fundamentals of the routing problem. If the questions of these fundamental issues are answered, then the next step is to enrich the models with traffic engineering details mentioned in the introduction section of this survey. This is again a big challenge.

5 Conclusions

In this paper we have reviewed computer science models of the routing problem. This topic is important, because more and more autonomous vehicles will be deployed, and the routing of these vehicles needs proper computer science models to be able to develop the best information technology for their control. The autonomous vehicles are situated in their environment, which means that they perceive their environment, make decisions and then take actions. Autonomous vehicles are like autonomous agents, and the collective of autonomous vehicles is a multi-agent system. The critical question is whether these perception-decision-action cycles actually lead to the same behaviour as the behaviour from the classical traffic flow models.

The least complex model is the routing game model which basically corresponds to the Wardrop macro model of traffic engineers. This model is suitable for analytical proofs of general properties, like the existence of the static equilibrium and the limit of the price of anarchy. However the routing game model assumes full information and rationality, and it does not describe well that the autonomous vehicles are situated in their environment.

The repeated routing game approach captures a kind of perception-decision-action cycle, however it assumes that the same static game is played repeatedly all the time; only the strategies of the agents may change. The evolutionary dynamics of repeated games leads to the static equilibrium if the games satisfy some conditions. This model is mainly suitable to make the agents learn the static equilibrium of routing games. It is useful to know the static equilibrium, however this model cannot directly guide the perception-decision-action cycles of autonomous vehicles.

The queuing model is more complex, and it describes to some extent the dynamic timing aspect of routing games. There are analytical results for the equilibrium of the non-atomic queuing model, but if load-dependent transit times are taken into account, or if multi-commodity games are considered, or if atomic queuing models are considered, then positive analytical results are hard to achieve. In addition, the queuing model assumes full rationality, and it does not describe well that the autonomous vehicles are situated in their environment.

The online routing game model is the most complex, and it describes best the collective behaviour of the autonomous vehicles with perception-decision-action cycles. The complexity of the model hinders the achievement of analytical results. Nevertheless, the empirical investigations pointed out that if the traffic flows are generated by autonomous vehicles that are not coordinated, or that do not have proper prediction methods, then sometimes the traffic may be worse than the traffic indicated by classical traffic flow models. The intention-aware online routing game approach could achieve similar results as the iterative traffic simulation approach. While iterative traffic simulation is not suitable for embedded autonomous vehicles, the intention-aware online routing game approach can be adjusted to the perception-decision-action cycles of autonomous vehicles. This highlights the importance of research on intention-aware traffic flow prediction methods for autonomous vehicles.