Computing Dynamic User Equilibrium on Large-Scale Networks Without Knowing Global Parameters

Dynamic user equilibrium (DUE) is a Nash-like solution concept describing an equilibrium in dynamic traffic systems over a fixed planning period. DUE is a challenging class of equilibrium problems, connecting network loading models and notions of system equilibrium in one concise mathematical framework. Recently, Friesz and Han introduced an integrated framework for DUE computation on large-scale networks, featuring a basic fixed-point algorithm for the effective computation of DUE. In the same work, they present an open-source MATLAB toolbox which allows researchers to test and validate new numerical solvers. This paper builds on this seminal contribution and extends it in several important ways. At a conceptual level, we provide new strongly convergent algorithms designed to compute a DUE directly in the infinite-dimensional space of path flows. An important feature of our algorithms is that they give provable convergence guarantees without knowledge of global parameters. In fact, the algorithms we propose are adaptive, in the sense that they do not need a priori knowledge of global parameters of the delay operator, and they are provably convergent even for delay operators which are non-monotone. We implement our numerical schemes on standard test instances, and compare them with the numerical solution strategy employed by Friesz and Han.


Introduction
This paper is concerned with a class of models known as Dynamic User Equilibrium (DUE). DUE problems have been studied within the broader context of Dynamic Traffic Assignment (DTA), which is concerned with modeling time-varying traffic flows consistent with established traffic flow theory. DTA models are greatly influenced by Wardrop's equilibrium principle [52], which is seen as a Nash-like equilibrium condition in an aggregative game: (a) Wardrop's first principle, also known as the user optimality principle, states that road segments used in an equilibrium should display the same travel costs (i.e. delays); (b) Wardrop's second principle, known as the system optimality principle, assumes that drivers behave cooperatively in making travel decisions, so that the overall system cost (aggregate delay) is minimized.
arXiv:2010.04597v2 [eess.SY] 15 Jun 2021
Logically, the behavioral maxims (a) and (b) are disconnected, and a substantial literature in transportation research is concerned with the design of computational architectures aligning these potentially conflicting principles. Since the seminal work of [33,34], dynamic extensions of Wardrop's principles have paved the way for the introduction of notions like DUE and Dynamic System Optimal (DSO) models. For comprehensive reviews of DTA models, we refer to [28,39,51]. In the last two decades there have been many efforts to develop a theoretically sound formulation of DUE, acceptable to modelers and practitioners alike. Analytical DUE models tend to be of two varieties: (1) Route Choice (RC) DUE [12,33,34,55], and (2) Simultaneous Route and Departure Choice (SRDC) DUE [13,14,15,41]. Both types of DUE rest on two pillars: 1. A mathematical notion of equilibrium; 2. A model of network performance, based on physical laws describing traffic flows.
The second pillar is known in the literature as Dynamic Network Loading (DNL). Equilibrium is usually expressed in terms of Wardrop's first principle. Mathematical approaches to describing equilibrium include variational inequalities (VI) [13,55], nonlinear complementarity problems [25,38], differential variational inequalities [11,37] and fixed point problems [15]. In this paper we choose the VI formulation of DUE, and our aim is to advance computational techniques for the practical solution of DUE. Our research builds on, and extends, recent advances in computational approaches to DUE reported in [24]. As is well known, computing a user equilibrium is a challenging task; its main complication arises because it constitutes an interconnected computational procedure, coupling equilibrium computation with DNL. The DNL, which can be understood as the first layer of the problem, aims at describing the spatial and temporal evolution of traffic flows on a network consistent with the established route and departure choices of travelers. This is done by formulating appropriate dynamics for flow propagation, flow conservation, link delay, and path delay at the network level. In general, DNL models have the following components: 1. Some form of link and/or path dynamics; 2. A computationally friendly relationship between flow/speed/density and link traversal time; 3. Flow propagation constraints; 4. A model of junction dynamics (Riemann solvers) and delays; 5. A model of path traversal times; and 6. Appropriate initial conditions. DNL generates the path delay operator, which is the key input when computing an equilibrium given the delays on user routes (travel costs). This is the second layer of the problem, and the one of main interest in this paper. At this layer one has to use an equilibrium solver, whose performance depends significantly on the information available about the structural properties of the delay operator.
However, since the delay operator is itself the result of a computational procedure, it is not available in closed form, and thus one is confronted essentially with a black-box upon which we can assume whatever we find useful, but the empirical validation of these assumptions is very hard. It is thus of utmost importance to have at our disposal efficiently implementable algorithms which are:
(i) Adaptive to arrival of new information about unknown global parameters; (ii) Provably convergent under mild monotonicity assumptions.
We argue that, up to now, none of the existing DUE solvers meets both of these criteria. To support this claim, we present Table 1, where the current state of the art in DUE computation is summarized. 1 We infer from Table 1 that known algorithmic strategies for solving the DUE problem require knowledge of the global Lipschitz constant and some form of monotonicity of the path delay operator. Since the delay operator is not given to us in closed form, neither assumption is practically verifiable. Algorithmic strategies which are provably convergent without explicit knowledge of these global properties are thus a very valuable contribution.

Our Contributions
This paper makes a significant step forward relative to the existing computational literature on DUE, by describing two numerical algorithms acting directly in infinite-dimensional Hilbert spaces. Our algorithms share the following features: (i) Strong convergence to a single user equilibrium; 1 In this table we focus on algorithms acting directly on the infinite-dimensional Hilbert space formulation of DUE. A much larger literature on this topic exists which is concerned with finite-dimensional approximations. In the parlance of numerical mathematics, the latter corresponds to a first-discretize-then-optimize strategy. As the two approaches are quite different, a direct comparison would not be fair.
(ii) Adaptive step-size choices without the need to know global Lipschitz parameters of the delay operator; (iii) Provably convergent under a plain pseudo-monotonicity assumption on the path delay operator.
(iv) Include inertial and relaxation effects to potentially speed up the convergence.
While items (ii) and (iii) do not need much motivation, our emphasis on strongly convergent methods may seem somewhat pedantic at first sight, so it deserves some words of explanation. In infinite-dimensional settings, strongly convergent iterative schemes are much more desirable than weakly convergent ones, since strong convergence reflects the physically tangible property that the energy ||h_n - h^*||^2 of the error between the iterate h_n and a solution h^* eventually becomes arbitrarily small. Of course, any numerical solution technique designed for solving a problem in infinite dimensions must be applied to a finite-dimensional approximation of the problem. Exactly in such situations strongly convergent methods are extremely powerful, because they guarantee stability with respect to numerical discretization. In fact, [17] demonstrated that strongly convergent schemes may even exhibit faster convergence rates than their weakly convergent counterparts. It therefore seems fair to say that strong convergence is an extremely desirable property of solution schemes, with clearly observable physical consequences for the performance and stability of algorithms. As a matter of fact, [15] employs a projected gradient iteration of Halpern type [3,18], which forces trajectories to converge strongly to some DUE.
Adaptivity in the step-size policy frees us from any unavailable information about the global Lipschitz constant of the delay operator. It allows us to tune the step size on-the-fly and guarantees convergence for general pseudo-monotone operators with good performance properties.
Operator splitting methods with inertia and relaxation have received quite some attention in recent years, see e.g. [1,27,32]. These schemes are motivated by Nesterov's accelerated method [35], and therefore the main motivation for inertial methods is to speed up the convergence rate. To the best of our knowledge this is the first time that inertial and relaxation effects are investigated in the context of DUE computation and under weak pseudo-monotonicity assumptions.
Remark 1.1. In previous work [10], we investigated DUE computation with a strongly convergent FBF variant. This paper replaces and significantly extends that work by the explicit consideration of inertial effects.

Organization of the paper
Sections 2 and 3 describe user equilibrium and the DNL procedure we use in our numerical experiments. In setting up these two layers we follow [24] closely. Section 4 describes the algorithms we construct and investigate in this paper. Building on the MATLAB toolbox publicly available at https://github.com/DrKeHan/DTA and documented in [24], we report the outcomes of our experiments in Section 5. Technical facts and proofs are organized in Sections 6.1 and 6.2.

Dynamic User Equilibrium
We introduce some notation and terminology to ease the presentation below.
• P: set of paths in the network.
• W: set of origin-destination (O-D) pairs in the network.
• Q w : fixed O-D demand between w ∈ W.
• P w : subset of paths that connect O-D pair w.
• t: continuous time parameter in the fixed time horizon [t 0 , t 1 ].
• h p (t): departure rate along path p at time t.
• A p (t, h): effective travel cost along path p with departure time t under the path profile h.
• ν_w(h): minimum travel cost between O-D pair w ∈ W over all paths and all departure times.

For each O-D pair w ∈ W we are given an exogenous demand Q_w > 0; this represents the number of drivers who have to travel from the origin to the destination described by w. The list Q = (Q_w)_{w ∈ W} is often called the trip table. In DUE modeling, the single most crucial ingredient is the path delay operator, which maps a given vector of departure rates (path flows) h to a vector of path travel times. We stipulate that path flows are square-integrable functions over the planning horizon, so that h_p ∈ L^2([t_0, t_1]; R_+) and h = (h_p ; p ∈ P) ∈ H := L^2([t_0, t_1]; R^{|P|}). To measure the delay of drivers on paths, we introduce the operator D : H → H, h ↦ D(h), with the interpretation that D_p(t, h) is the path travel time of a driver departing at time t from the origin of path p, and following this path throughout. This operator is the result of some DNL procedure, which is an integrated subroutine in the dynamic traffic assignment problem. See Section 3 for a description of the DNL used in our computational experiments.
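For concreteness, the function-space objects above can be discretized as follows. This is an illustrative discretization (uniform time grid, dt-weighted inner product), not the internal layout of any particular toolbox; the horizon values are made up.

```python
import numpy as np

# Discretized stand-in for h in H = L^2([t_0, t_1]; R^{|P|}): each path
# flow h_p becomes a row of departure rates on a uniform time grid, and
# the L^2 inner product is approximated by a dt-weighted dot product.
t0, t1, n_steps = 0.0, 2.0, 100   # illustrative planning horizon
dt = (t1 - t0) / n_steps

def l2_inner(h1, h2):
    """Approximates <h1, h2>_H = sum over paths of the integral of h1_p * h2_p."""
    return dt * float(np.sum(h1 * h2))

def trip_volume(h):
    """Realized trips per path: the integral of the departure rate."""
    return dt * h.sum(axis=1)
```

With `h = np.ones((2, n_steps))`, `l2_inner(h, h)` approximates ||h||^2 = 2 * (t1 - t0) = 4, and `trip_volume(h)` returns the demand realized on each of the two paths.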

Formulation of DUE as a Variational inequality
On top of path delays, we consider penalty terms of the form φ(t + D_p(t, h) − τ), penalizing all arrival times different from the target time τ > 0 (i.e. the usual time of a trip on the O-D pair w). The function φ : [−∞, ∞) → [0, ∞] should be monotonically increasing, with φ(a) > 0 for a > 0 and φ(a) = 0 for a ≤ 0. Define the effective delay operator as in (2.1). We thus obtain an operator A : H → H, mapping each profile of path departure rates h to effective delays. We follow the established DUE literature, and stipulate that Wardrop's first principle holds: users of the network aim to minimize their own travel time, given the departure rates in the system. Thus, a user equilibrium is envisaged where the delays (interpreted as costs) of all travelers in the same O-D pair are equal, and no traveler can lower his/her costs by unilaterally switching to a different route. To put this behavioral axiom into a mathematical framework, we first formulate the meaning of "minimal costs" in the present Hilbert space setting. Recall the essential infimum of a measurable function g : [t_0, t_1] → R, ess inf{g(t) : t ∈ [t_0, t_1]} = sup{x ∈ R : Leb({s ∈ [t_0, t_1] : g(s) < x}) = 0}, where Leb(·) denotes the Lebesgue measure on the real line. Given a profile h ∈ H, ν_w(h) is then defined as the essential infimum of the effective delays over all paths p ∈ P_w and all departure times. On top of minimal costs, we have to restrict the set of departure rates to functions satisfying a basic flow conservation property. Specifically, insisting that all trips are realized, we naturally define the set of feasible flows X as those h ∈ H with h_p ≥ 0 for which, for each w ∈ W, the total departures along paths in P_w match the demand Q_w. The set of feasible flows X is sequentially closed and convex, but not sequentially compact (i.e. path departure rates are not a priori assumed to be bounded, as the above definition involves Lebesgue-integrable functions). We are now ready to give our first definition of user equilibrium.
Definition (DUE). A profile h* ∈ X is a dynamic user equilibrium if, for every w ∈ W and every p ∈ P_w, h*_p(t) > 0 implies A_p(t, h*) = ν_w(h*) for almost every t ∈ [t_0, t_1]. We denote by Ω ⊂ X the (possibly empty) set of DUE.
In [13] it is observed that the definition of DUE can be formulated equivalently as a variational inequality VI(A, X): find h* ∈ X such that <A(h*), h − h*> ≥ 0 for all h ∈ X. This notion of equilibrium is very useful, since it allows us to apply a large variety of algorithms to solve VI(A, X); in fact, it can be seen as the basis of most computational approaches to DUE. We now spell out sufficient conditions guaranteeing the existence of DUE.

Assumption 1.
• The penalty function φ : [t_0, t_1] → R_+ is continuous and there exists ∆ > −1 such that (2.6) holds.
• The DNL satisfies the FIFO principle and each link has finite capacity.
• The effective delay operator is weak-to-weak continuous on bounded subsets of X.

Under Assumption 1, the set of DUE Ω is nonempty; for a proof, see [19].
The construction of the delay operator requires the specification of a DNL (i.e. traffic flow generation). In this work we focus on a macroscopic model of network loading based on fluid-dynamic approximations of traffic flow on networks, known as the Lighthill-Whitham-Richards (LWR) model [30,42]. The LWR model is able to describe the physics of kinematic waves (e.g. shock waves, rarefaction waves), and allows network extensions that capture the formation and propagation of vehicle queues as well as vehicle spill-back. We will formulate the LWR-based DNL as a system of partial differential algebraic equations (PDAE), which uses vehicle density and queues as the unknown variables, and computes link dynamics, flow propagation, and path delay for any given vector of path departure rates.

The differential variational inequality formulation
It has been observed in [14] that DUE can be equivalently formulated as a differential variational inequality (DVI) [37]. From an algorithmic point of view this relation is interesting, as it allows us to use time-stepping methods to compute approximate user equilibria [11,15]. Independent of algorithmic considerations, we regard this identification as an important conceptual insight, and it thus deserves some remarks here. The precise connection between DVI and DUE goes as follows: define x = {x_w(·) ; w ∈ W} as the state trajectory of a controlled dynamical system, with the interpretation that x_w(t) is the cumulative traffic up to time t on paths connecting the origin-destination pair w ∈ W. The definition of this state variable requires that its dynamic evolution is described by the linear differential equation (2.7). Additionally, it must satisfy the natural initial and boundary-value conditions (2.8). The differential variational inequality describing DUE then reads as follows: find h ∈ H such that (2.7), (2.8) and the instantaneous optimality condition hold. Note that this defines a time-dependent complementarity system, which has been used in a DUE model with a simplified bottleneck structure in [38]. See [15] for a formal proof of the correctness of this interpretation.

Dynamic Network Loading
The purpose of this section is to explain the dynamic network loading model used in our numerical investigation. We are considering the LWR model on networks, adopting the description in terms of a system of Differential Algebraic Equations (DAE). This formulation of the DNL procedure has the advantage over its mathematically equivalent description in terms of a system of partial differential algebraic equations that it avoids the use of partial differential operators, and thus is much more amenable to numerical discretization strategies.

The Lighthill-Whitham-Richards link model
Network loading acts on the same oriented graph G = (V, E) as in Section 2, where each link I_i ∈ E has a certain length, measured by the interval [a_i, b_i]. The within-link dynamics are captured by the scalar conservation law

∂_t ρ_i(t, x) + ∂_x f_i(ρ_i(t, x)) = 0, x ∈ [a_i, b_i]. (3.1)

The fundamental diagram f_i(ρ) = ρ · v_i(ρ) is assumed to be continuous and concave, and to vanish at ρ = 0 and at ρ = ρ_i^jam, where ρ_i^jam is the jam density on link I_i. Moreover, there exists a unique global maximum of f_i at the value ρ_i^c. We focus on the triangular fundamental diagram

f_i(ρ) = min{ v_i ρ, w_i (ρ_i^jam − ρ) },

where v_i, w_i > 0 denote the forward and backward kinematic wave speeds, respectively. At junctions we need to make sure that relevant boundary conditions are satisfied to respect basic physical principles. Consider a junction with m incoming and n outgoing links. At each such junction, the following conservation property must hold: the total flow leaving the m incoming links equals the total flow entering the n outgoing links. This condition simply means that inflow into the junction equals outflow. However, this condition alone does not determine a unique flow profile on these m + n links. Additional conditions, usually formulated in terms of Riemann solvers and demand/supply conditions, must be imposed. We refer to [5,16] for reviews.
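The triangular fundamental diagram is straightforward to code. The sketch below uses illustrative parameter values (forward wave speed 30, backward wave speed 6, jam density 200); these are not values from the paper's test networks.

```python
import numpy as np

def triangular_fd(rho, v=30.0, w=6.0, rho_jam=200.0):
    """Triangular fundamental diagram f(rho) = min{v*rho, w*(rho_jam - rho)}.

    v and w are the forward/backward kinematic wave speeds, rho_jam the
    jam density. Flow vanishes at rho = 0 and rho = rho_jam and is
    maximal at the critical density rho_c, where the two branches meet.
    """
    rho = np.asarray(rho, dtype=float)
    rho_c = w * rho_jam / (v + w)          # critical density
    return np.where(rho <= rho_c, v * rho, w * (rho_jam - rho))
```

Concavity and the unique maximum at rho_c are immediate from the min of two affine functions with slopes of opposite sign.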

The variational representation of link dynamics
While (3.1) captures within-link dynamics, the inter-link propagation of congestion requires a careful treatment of junction dynamics. The overall system of PDEs leads to a complex system of junction dynamics and conservation laws which is very hard to handle computationally. We follow a different approach here, which is more amenable to numerical computations. We briefly introduce a variational representation of the link dynamics, based on the generalized Lax-Hopf formula, originally developed in [2,6,7], which leads to a DNL procedure in terms of a system of differential algebraic equations (DAE). Compared to the flow-based approach described in Section 3.1, the DAE-based formulation has the following main advantages: (1) the primary variable is flow instead of density; (2) no partial differential operators are involved; (3) it introduces simplified boundary conditions. We only give a high-level description of this approach, detailed enough for the reader to understand the mechanics of the numerical solver. A rigorous description can be found in [16].
Consider the Moskowitz function N_i(t, x), which measures the cumulative number of vehicles that have passed location x along link I_i by time t, so that ∂_x N_i = −ρ_i and ∂_t N_i = f_i(ρ_i). It follows immediately that N_i(t, x) satisfies the Hamilton-Jacobi equation ∂_t N_i(t, x) − f_i(−∂_x N_i(t, x)) = 0. Denote by f_i^in(t) and f_i^out(t) the link I_i inflow and outflow. The cumulative link entering and exiting vehicle counts are defined as N_i^up(t) = ∫_{t_0}^t f_i^in(s) ds and N_i^down(t) = ∫_{t_0}^t f_i^out(s) ds, where "up" and "down" represent the upstream and downstream boundaries of the link, respectively. [23] derive explicit formulae for the link demand and supply based on a variational formulation known as the Lax-Hopf formula [2,6,7]. These two relations express the link demand and supply, which are inputs of the junction model, in terms of N_up and N_down. This means that one no longer has to compute the dynamics within the link, but can focus instead on the cumulative counts at the two boundaries of the link. Note that, when discretizing the DNL in time, we immediately obtain the link transmission model [54]. In general, the approach just described gives rise to the link-based formulation of DNL [23].
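In discrete time, the demand/supply computation from cumulative boundary counts looks roughly as follows. This is a sketch of a link-transmission-style update under simplifying assumptions (uniform time steps, integer wave travel times, constant capacity); the exact formulae of [23] differ in details, and all parameter values are illustrative.

```python
def link_demand_supply(N_up, N_down, t_idx, dt, length,
                       v=30.0, w=6.0, rho_jam=0.2, cap=0.5):
    """Link demand and supply at time step t_idx from the cumulative
    entering (N_up) and exiting (N_down) vehicle counts per time step."""
    lag_f = int(round(length / (v * dt)))   # free-flow travel time in steps
    lag_b = int(round(length / (w * dt)))   # backward-wave travel time in steps
    # demand: vehicles that entered at least one free-flow travel time
    # ago and have not yet exited, capped by the link capacity
    sending = max(N_up[max(t_idx - lag_f, 0)] - N_down[t_idx], 0.0) / dt
    demand = min(sending, cap)
    # supply: storage space signalled back by the congestion wave
    storage = N_down[max(t_idx - lag_b, 0)] + rho_jam * length - N_up[t_idx]
    supply = min(max(storage, 0.0) / dt, cap)
    return demand, supply
```

Both quantities depend only on boundary counts, never on the interior density profile, which is exactly the computational advantage described above.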

Junction Dynamics
In a path-based DNL procedure one must incorporate established routing information into the junction model. Such information is usually formulated via some behavioral assumption on drivers' preferences. In the numerical scheme we consider, such information is provided in terms of an endogenously given flow distribution matrix, whose (i, j) entry is the proportion of flow arriving on incoming link i that continues onto outgoing link j at a given junction. Abstractly, if Θ represents some junction model, we have a functional relationship determining f^out(t) = (f_i^out(t))_{i=1,...,m} and f^in(t) = (f_j^in(t))_{j=1,...,n}, the computed flows leaving the incoming links and entering the outgoing links.
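A toy merge/diverge model illustrates how a flow distribution matrix, link demands and link supplies combine into junction flows. The sketch below resolves supply shortages by uniform demand scaling only; the Riemann solvers referenced above are more refined, and all names are our own.

```python
import numpy as np

def merge_diverge_flows(demand, supply, theta):
    """Junction flows from incoming-link demands (length m), outgoing-link
    supplies (length n) and a distribution matrix theta (m x n, rows sum
    to 1), via the largest uniform demand scaling respecting supplies."""
    demand = np.asarray(demand, dtype=float)
    supply = np.asarray(supply, dtype=float)
    theta = np.asarray(theta, dtype=float)
    desired = theta.T @ demand              # flow wanted on each out-link
    with np.errstate(divide="ignore", invalid="ignore"):
        ratios = np.where(desired > 0, supply / desired, np.inf)
    s = min(1.0, float(ratios.min()))       # uniform scaling factor
    f_in = s * demand                       # flow leaving each in-link
    f_out = theta.T @ f_in                  # flow entering each out-link
    return f_in, f_out
```

Because each row of theta sums to 1, total inflow equals total outflow, so the junction conservation property holds by construction.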
Dynamics at the origin nodes. At the origin nodes, we employ a simple point-queue model, in the spirit of Vickrey [49]. Let o be a given origin node, denote by q_o(t) the volume of the point queue, and let link j be the link connected to the origin. We assume that the queue evolves according to a balance law in which, with P_o denoting the set of paths originating from o, the first term on the right-hand side represents the inflow into the queue, while the second term represents the flow leaving the queue, modeling the demand at the origin. Here ∘ denotes the composition of two functions, and λ_o(t) is the exit time function for the potential queuing at the origin o.
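The origin point queue admits a one-line explicit discretization. The update below is our own simplification of a Vickrey-style queue, not the exact discretization used in the toolbox.

```python
def origin_queue_step(q, inflow, downstream_supply, dt):
    """One explicit Euler step of a point queue at an origin node.

    q: current queue volume; inflow: total departure rate of the paths
    starting here; downstream_supply: maximal flow the first link accepts.
    """
    # outflow is limited by the downstream supply and by what is present
    # (current inflow plus the queue emptied over one time step)
    outflow = min(downstream_supply, inflow + q / dt)
    q_next = max(q + dt * (inflow - outflow), 0.0)
    return q_next, outflow
```

When departures exceed the supply of the first link, the excess accumulates in q; the queue drains again as soon as the supply exceeds the inflow.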

Fixed Point formulation of DUE
Once a DNL procedure has been fixed, the effective delay operator A(h) can be evaluated. The definition of DUE allows us to construct a suitable fixed-point problem, which is the basis for the design of iterative numerical schemes for computing DUE. In fact, it is easy to see that h* ∈ H is a path-departure rate profile corresponding to a DUE if and only if the residual r_τ(h) := h − P_X(h − τ A(h)) vanishes at h*, i.e. r_τ(h*) = 0 for τ > 0. Here, P_X(x) denotes the orthogonal projection in L^2 onto the set X ⊂ H. A classical iterative scheme for finding the roots of a nonlinear map is the Picard fixed-point iteration applied to h ↦ P_X(h − τ A(h)). Under strong a priori continuity and monotonicity assumptions on the effective delay operator A, the projected gradient (a.k.a. forward-backward) method, Algorithm 1, generates a sequence {h_n}_{n∈N} which converges weakly to some DUE. This iterative solver is used in the software package developed in [24], and has also been employed in many studies before. Weak convergence (see Definition 6.1 in Section 6.1) of the thus constructed sequence {h_n}_{n∈N} is known when the operator A is inverse strongly monotone (co-coercive) with modulus µ > 0, provided that the step size satisfies τ ∈ (0, 2µ). Note that co-coercivity with modulus µ implies Lipschitz continuity with Lipschitz constant 1/µ. Thus, to make method (4.1) a provably convergent algorithm, we need to know the Lipschitz constant in order to pin down an upper bound on the step sizes. Strong convergence of {h_n}_{n∈N} requires the even stronger assumption of uniform monotonicity of the operator A over the set X (Theorem 25.8 in [3]), 2 or other modifications of the basic template (4.1) are needed. [15] present a strongly convergent variant of (4.1) using a Halpern-type modification of the basic scheme above. Both assumptions, Lipschitz continuity and uniform monotonicity, are very restrictive in the context of computing DUE.
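A discretized version of iteration (4.1) can be sketched in a few lines. For illustration we take a single O-D pair, so that the feasible set becomes the scaled simplex {h >= 0 : dt * sum(h) = Q}; the operator A is a user-supplied stand-in for the DNL-based effective delay operator, and all names are our own.

```python
import numpy as np

def project_demand(h, Q, dt):
    """Euclidean projection onto {h >= 0 : dt * sum(h) = Q}
    (projection onto a scaled simplex via the standard sorting method)."""
    x = dt * np.asarray(h, dtype=float)
    u = np.sort(x)[::-1]                    # sorted in decreasing order
    css = np.cumsum(u)
    idx = np.arange(1, x.size + 1)
    cond = u - (css - Q) / idx > 0
    rho = idx[cond][-1]                     # number of active coordinates
    theta = (css[cond][-1] - Q) / rho       # uniform shift
    return np.maximum(x - theta, 0.0) / dt

def forward_backward(A, h0, Q, dt, tau, iters=100):
    """Projected gradient iteration (4.1): h <- P_X(h - tau * A(h))."""
    h = np.asarray(h0, dtype=float)
    for _ in range(iters):
        h = project_demand(h - tau * A(h), Q, dt)
    return h
```

For the affine, co-coercive toy operator A(h) = h − c the iterates converge to the projection of c onto the feasible set, exactly as the theory for step sizes in (0, 2µ) predicts.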
While continuity of the effective delay operator has been established in the context of the LWR network loading procedure [22], monotonicity estimates are hardly available for realistic DNL procedures and are unlikely to hold in practice. Therefore, strongly convergent algorithms which are provably convergent to a solution under mild monotonicity assumptions are highly desirable for the modeling, optimization and simulation of traffic networks.

Computing DUE under weak assumptions
Our aim is to design and study alternative numerical schemes for computing DUE, which require significantly less stringent a priori assumptions on the delay operator, but still come with rigorous convergence guarantees. We summarize our working assumptions below, while Section 6.1 gathers the precise mathematical definitions for the reader's convenience. To cope with the unavailable information about the Lipschitz constant, we construct adaptive algorithms, which do not need this hardly available global parameter. Instead, we propose a simple and efficient update procedure for the step sizes, which depends on pointwise variations of the delay operator rather than global ones. This is a major advantage of the methods we propose here, both from a conceptual and a practical point of view, as it allows us to choose step sizes "online".
The next assumption concerns the structural properties we impose on the delay operator.

Assumption 3. The delay operator A is pseudo-monotone on H:
For all h_1, h_2 ∈ H, we have: <A(h_1), h_2 − h_1> ≥ 0 implies <A(h_2), h_2 − h_1> ≥ 0. Pseudo-monotonicity is a significant weakening of the (strict) monotonicity required when applying the fixed-point iteration scheme (4.1). Some intuition for this concept can be gained by considering the simpler case when the operator is integrable. Any smooth real-valued function f : H → R induces an operator A : H → H via its gradient A(h) = ∇f(h) (unique thanks to the Riesz representation theorem). Note that f is (strictly) convex if and only if the gradient map is a (strictly) monotone operator. If f is merely pseudo-convex, the gradient operator is pseudo-monotone, and vice versa. Assumptions 1-3 are the standing hypotheses for the rest of this paper. Building on them, we now describe the numerical schemes we analyze.
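The gradient intuition is easy to exhibit on a scalar example: A(x) = 1/(1 + x^2), the derivative of the pseudo-convex function arctan, is strictly positive and hence pseudo-monotone, but it is not monotone. A quick numerical check (our own illustration, unrelated to any delay operator):

```python
import random

A = lambda x: 1.0 / (1.0 + x * x)   # derivative of arctan(x), always > 0

def violates_monotone(x, y):
    # monotonicity fails if (A(y) - A(x)) * (y - x) < 0
    return (A(y) - A(x)) * (y - x) < 0

def violates_pseudo(x, y):
    # pseudo-monotonicity fails if A(x)(y - x) >= 0 but A(y)(y - x) < 0
    return A(x) * (y - x) >= 0 and A(y) * (y - x) < 0

random.seed(0)
pairs = [(random.uniform(-2, 2), random.uniform(-2, 2)) for _ in range(10000)]
mono_fails = any(violates_monotone(x, y) for x, y in pairs)
pseudo_fails = any(violates_pseudo(x, y) for x, y in pairs)
```

Since A > 0, the premise A(x)(y − x) ≥ 0 forces y ≥ x, which makes the conclusion automatic; monotonicity, by contrast, fails for any pair with y > x > 0, because A is decreasing on the positive axis.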
Our basic algorithmic design principle follows the forward-backward-forward (FBF) splitting scheme, originally due to [48]: in each iteration, as long as the stopping condition is not satisfied, one computes y_n = P_X(h_n − τ_n A(h_n)) and updates the step size sequence. In its original form, this scheme ensures that the path flows converge weakly
to a DUE, provided that the delay operator is monotone and Lipschitz continuous in the L 2 norm. In the special case of variational inequalities, it has been shown that pseudo-monotonicity suffices for weak convergence [4]. Actually, one can easily see that weak convergence holds for a large class of non-monotone VIs satisfying an angle property at the solution set [9], and this is the main reason why FBF is an attractive numerical solution scheme for DUE. FBF updates a current path departure rate profile h by first applying (4.1), in order to produce the extrapolated search point y = P X (h − τA(h)) (first forward-backward step). It then performs another forward step in path space, by calling the DNL procedure at the just constructed extrapolation point y, and shifts density into directions where the difference in the "travel costs" between the current path flow h and the new search point y is large. Algebraically, this leads to the correction step h + = y + τ(A(h) − A(y)). We would like to emphasize that this correction step does not involve an additional projection onto the feasible set X. This reduces the computational complexity of FBF relative to its close cousin the extragradient algorithm due to [29], and speeds up the computations in practice whenever projections are expensive to implement (see [4] for extensive numerical evidence supporting this claim).
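One FBF iteration with an adaptive step size can be sketched as follows. The update rule follows the spirit described above (shrink the step whenever a local Lipschitz estimate is large), but the constant mu and the exact formula are illustrative, not the paper's rule (4.3).

```python
import numpy as np

def fbf_step(h, tau, A, project, mu=0.9):
    """One forward-backward-forward step: extrapolation, correction
    without a second projection, and an adaptive step-size update."""
    Ah = A(h)
    y = project(h - tau * Ah)        # forward-backward step
    Ay = A(y)
    h_next = y + tau * (Ah - Ay)     # forward correction, no projection
    # local Lipschitz estimate ||A(h) - A(y)|| / ||h - y|| drives the step
    num = mu * np.linalg.norm(h - y)
    den = np.linalg.norm(Ah - Ay)
    tau_next = min(tau, num / den) if den > 0 else tau
    return h_next, tau_next
```

On a toy monotone operator over a box, the iteration converges without ever being told the global Lipschitz constant, which is the whole point of the adaptive rule.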
In order to force strong convergence of the sequence of path departure rates {h n } n∈N , we augment the scheme by an Halpern-type relaxation procedure. The pseudo-code of the resulting DUE solver is displayed in Algorithm 2.
Algorithm 2 has been analyzed in [10] in detail. In particular, we demonstrated strong convergence of the numerical scheme by proving Theorem 4.1 below.
Theorem 4.1. Suppose the step sizes and inertia parameters are updated as prescribed in Algorithm 2. Then the sequence {h_n} generated by Algorithm 2 converges strongly to h* = arg min{||z|| : z ∈ Ω}.
Departing from here, our aim in this paper is to significantly extend our previous work by designing a new FBF-based inertial algorithm, which meets all the desiderata spelled out in the introduction: i) Adaptive step-sizes, ii) Weak monotonicity, and iii) strong convergence.
To achieve a possible convergence acceleration and to meet conditions i)-iii), we include relaxation and inertial effects into our algorithm. To the best of our knowledge, this is the first available, provably convergent, relaxed-inertial splitting algorithm for computing DUE.
The basic idea behind inertial algorithms is to use information accumulated from past iterations in order to introduce momentum. This is achieved by computing the extrapolated point z = h + α(h − h ) in the first step of each iteration. The introduction of momentum is classical, and can be traced back to the heavy-ball method of Polyak [40]. We adapt momentum by injecting relaxations steps in a disciplined way to force the trajectory to converge strongly to a DUE. The so-constructed new strongly convergent method, to be called the inertial forward-backward-forward (IFBF) algorithm, is displayed in Algorithm 3.
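A compact sketch of a relaxed-inertial FBF loop is given below. The inertial extrapolation z = h_n + α(h_n − h_{n−1}) supplies momentum, while a Halpern-style anchoring factor pulls the iterate toward the origin, which is what enforces strong convergence to the minimum-norm equilibrium. The concrete choices of alpha, lam and eps are illustrative, not the exact parameter rules of Algorithm 3.

```python
import numpy as np

def ifbf(A, project, h0, tau=0.5, alpha=0.3, lam=0.8,
         eps=lambda n: 1.0 / (n + 2), iters=200):
    """Sketch of an inertial forward-backward-forward loop with
    relaxation (lam) and Halpern anchoring toward 0 (eps)."""
    h_prev, h = h0.copy(), h0.copy()
    for n in range(iters):
        z = h + alpha * (h - h_prev)       # inertial (momentum) step
        Az = A(z)
        y = project(z - tau * Az)          # forward-backward step
        x = y + tau * (Az - A(y))          # forward correction
        h_prev = h
        h = (1 - lam) * z + lam * (1 - eps(n)) * x   # relax + anchor
    return h
```

On a toy problem with a unique equilibrium, the anchoring bias vanishes as eps(n) → 0 and the iterates approach the solution; with several equilibria, the anchor selects the one of minimal norm.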
In the convergence analysis of Algorithm 3 it turns out that any admissible positive sequences {ε_n}_{n∈N}, {β_n}_{n∈N} yield strong convergence of the sequence of path flows {h_n}_{n∈N} generated by this algorithm. The sequence {τ_n}_{n∈N} plays the same role as the step size in the basic fixed-point iteration (4.1). Hence, we have to choose it small enough to ensure convergence (theoretically, smaller than the reciprocal of the Lipschitz constant of the delay operator). Nevertheless, we can realize IFBF without any a priori knowledge of the Lipschitz constant by implementing the adaptive step-size policy (4.6). As we will see in Lemma 6.7, the step size sequence {τ_n}_{n∈N} has a limit, and lim_{n→∞} τ_n = τ ≥ min{µ/L, τ_0}.
The parameter α can be any constant in (0, 1). The main theoretical result of this paper reads as follows.

Numerical Experiments
We present preliminary computational examples of the simultaneous route-and-departure-time dynamic user equilibrium on the Nguyen network [36] and the Sioux Falls network. Detailed network parameters, including coordinates of nodes and link attributes, are sourced and adapted from [24]. Given that our DUE and DNL formulations are path-based, path enumeration via the Frank-Wolfe algorithm was applied to generate the path set. We apply Algorithm 2 and Algorithm 3 with the embedded DNL procedure based on a time-stepping scheme discretizing the PDAE formulation described in Section 3. We compare our methods with the projected gradient algorithm (4.1), as implemented in the MATLAB toolbox documented in [24]. 3 As remarked in [24], projected gradient requires strong monotonicity to converge. All the computations reported in this section were performed in MATLAB (R2017b) on a standard desktop with an Intel i5 processor and 8 GB of RAM.

Performance of the fixed-point algorithm
The termination criterion for the fixed-point algorithm is set as follows:

Performance of the Algorithms
We run all three methods for a fixed number of iterations and report the last iterate of the algorithm (h_Final), the corresponding effective delay operator (A(h_Final)), a numerical merit function (GAP), as well as a measure for the speed of convergence. The construction of our numerical merit function follows [24]. It is designed to measure the distance to equilibrium via a gap function GAP_w, defined for all w ∈ W. GAP_w represents the range of travel costs experienced by all drivers in the given O-D pair w ∈ W. Clearly GAP_w ≥ 0, and in an exact DUE the gap is zero for all O-D pairs, justifying the interpretation of GAP as a numerical merit function. A mesh parameter regulates the size of the time grid in the numerical solution of the DNL; max. Iterations is the total number of iterations we let all three algorithms run on each test instance, and λ is the relaxation parameter in Algorithm 3. The local parameters were chosen in a simple way, without extensive search over the parameter space, which would very likely improve the reported results. The FB Algorithm 1 has been implemented as in [24], using the constant step size τ = 10 on the Nguyen network and τ = 2 on the Sioux Falls test network. FBF and IFBF (Algorithms 2 and 3) are implemented with the adaptive step sizes (4.3) and (4.6), respectively. The relaxation and inertial sequences employed in FBF and IFBF are reported in Table 4.

Table 4: List of method-specific parameters. N.N. stands for "not needed".
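In discretized form, the merit function can be computed per O-D pair as the spread of experienced costs. The array layout and names below are our own illustration, not the toolbox's implementation of (5.1).

```python
import numpy as np

def od_gap(h, costs, od_paths, flow_tol=1e-6):
    """Per-O-D-pair gap: the range (max - min) of effective delays over
    path/time cells that actually carry departures.

    h, costs: arrays of shape (n_paths, n_timesteps); od_paths maps each
    O-D pair label to the row indices of its paths.
    """
    gaps = {}
    for w, paths in od_paths.items():
        used = h[paths] > flow_tol            # cells with positive flow
        experienced = costs[paths][used]
        gaps[w] = float(experienced.max() - experienced.min()) if experienced.size else 0.0
    return gaps
```

At an exact equilibrium all used cells of an O-D pair carry the same cost, so the gap is zero, matching the interpretation of GAP as a merit function.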
We display the distribution of the values of our merit function (5.1) on the Nguyen network, and Figure 3 displays the same statistic for the Sioux Falls network.
We see that in all our experiments the distribution of the O-D gaps is concentrated around 0.2 across all networks for FB and IFBF. This suggests that these algorithms perform similarly in terms of producing approximate equilibrium solutions. The decisive advantage of IFBF, however, is that it is guaranteed to converge strongly to the minimum-norm solution, without requiring strict monotonicity of the delay operator. Figure 4 displays the path departure rates as well as the corresponding effective path delays on randomly selected paths in the considered test networks. We see from these plots that the path departure rates peak around the minima of the effective delay, which reflects equilibrium behavior on the routes.
Finally, we display a figure which gives some indication of the relative speed of convergence of each of the tested algorithms. For each method we compute the "relative energy" sequence, which measures the decay of energy of the path departure rates generated by the algorithm. [24] call this the relative gap, and we follow this terminology in the labeling of the figures. For all our methods this sequence must converge to 0, and we can consider one method faster than another if its energy sequence decays at a dominating rate. Figure 5 shows the evolution of the relative energy sequences for each method. We see that even non-optimized step-size parameters lead to some acceleration in the IFBF scheme when compared to the other solvers.
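The precise definition of the relative gap follows [24]; as a rough illustration, one plausible normalization measures successive iterate differences relative to the current iterate's norm (this particular normalization is our assumption, not necessarily the one implemented in the toolbox):

```python
import numpy as np

def relative_energy(iterates):
    # Decay of energy of successive path-departure-rate iterates,
    # normalized by the current iterate's norm. (One plausible
    # choice; the precise definition follows [24].)
    return [float(np.linalg.norm(h1 - h0) / max(np.linalg.norm(h0), 1e-12))
            for h0, h1 in zip(iterates, iterates[1:])]
```

For a convergent iteration this sequence tends to 0, and a faster solver produces a sequence that decays at a dominating rate.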
Convergence Analysis

Preliminaries
The purpose of this section is to collect some standard concepts from real Hilbert spaces. Throughout this section we let H be a real Hilbert space with scalar product ⟨·, ·⟩ and associated norm ∥·∥. In order to prove our main convergence results, we need the following standard facts. For all x, y ∈ H and α ∈ R, we have ∥αx + (1 − α)y∥² = α∥x∥² + (1 − α)∥y∥² − α(1 − α)∥x − y∥² and ∥x + y∥² ≤ ∥x∥² + 2⟨y, x + y⟩. Moreover, for a nonempty closed convex set C ⊆ H, the metric projection P_C satisfies, for all x ∈ H: (i) ⟨x − P_C(x), y − P_C(x)⟩ ≤ 0 for all y ∈ C; (ii) ∥P_C(x) − y∥² ≤ ∥x − y∥² − ∥x − P_C(x)∥² for all y ∈ C.
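Property (ii) of the projection holds for any nonempty closed convex set; the following short numerical check illustrates it for the toy choice C = {x : x ≥ 0}, whose projection is a componentwise clipping:

```python
import numpy as np

rng = np.random.default_rng(0)

def proj_orthant(x):
    # Euclidean projection onto C = {x : x >= 0} (a toy closed convex set)
    return np.maximum(x, 0.0)

# Check (ii): ||P_C(x) - y||^2 <= ||x - y||^2 - ||x - P_C(x)||^2 for y in C
for _ in range(1000):
    x = rng.normal(size=5)
    y = np.abs(rng.normal(size=5))   # y lies in C
    Px = proj_orthant(x)
    lhs = np.sum((Px - y) ** 2)
    rhs = np.sum((x - y) ** 2) - np.sum((x - Px) ** 2)
    assert lhs <= rhs + 1e-12
```

Equality is attained, e.g., when y = P_C(x), which is why (ii) is often called the firm nonexpansiveness inequality of the projection.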
An operator A : H → H is called sequentially weakly continuous if x_n ⇀ x implies A(x_n) ⇀ A(x).
The next classical fact shows that solutions of VI(X, A) defined in terms of pseudo-monotone operators can be characterized via a "weak formulation".
The next technical results are basic convergence guarantees for real-valued sequences.

Lemma 6.5 ([53]). Let {a_n}_{n∈N} be a sequence of nonnegative real numbers, and {β_n}_{n∈N} a sequence in (0, 1) such that ∑_n β_n = ∞. Suppose that {b_n}_{n∈N} is a sequence with lim sup_n b_n ≤ 0. If

a_{n+1} ≤ (1 − β_n) a_n + β_n b_n for all n ∈ N,

then lim_{n→∞} a_n = 0.

Lemma 6.6 ([43]). Let {a_n}_{n∈N} be a sequence of nonnegative real numbers, {β_n}_{n∈N} a sequence of real numbers in (0, 1) with ∑_n β_n = ∞, and {b_n}_{n∈N} a sequence of real numbers. Assume that a_{n+1} ≤ (1 − β_n) a_n + β_n b_n for all n ≥ 1. If lim sup_{k→∞} b_{n_k} ≤ 0 for every subsequence {a_{n_k}} of {a_n} satisfying lim inf_{k→∞} (a_{n_k+1} − a_{n_k}) ≥ 0, then lim_{n→∞} a_n = 0.
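A quick numerical sanity check of Lemma 6.5 with the toy choices β_n = 1/(n+2) (so that ∑_n β_n = ∞) and b_n = 1/(n+1) (so that lim sup_n b_n ≤ 0), taking the recursion with equality:

```python
# Illustrate Lemma 6.5: a_{n+1} = (1 - beta_n) a_n + beta_n b_n with
# beta_n = 1/(n+2) and b_n = 1/(n+1) -> 0.
# The lemma predicts a_n -> 0, albeit slowly (divergence of
# sum beta_n is what forgets the initial value a_0).
a = 1.0
for n in range(100000):
    beta = 1.0 / (n + 2)
    b = 1.0 / (n + 1)
    a = (1.0 - beta) * a + beta * b
```

After the loop, a has decayed from 1.0 to a small positive value, consistent with the predicted convergence to 0.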

Proof of Theorem 4.2
We start with an auxiliary technical result, which guarantees that weak cluster points of the sequence generated by the algorithm are solutions of DUE. It is based on techniques from [50]. Lemma 6.8. Let {w_n} be a sequence generated by Algorithm 3. If there exists a subsequence {w_{n_k}} converging weakly to z ∈ H with lim_{k→∞} ∥w_{n_k} − y_{n_k}∥ = 0, then, under Assumptions 1-3, z ∈ Ω.
Proof. Recall that y_n = P_X(w_n − τ_n A(w_n)). By Lemma 6.2(i), we have

⟨w_{n_k} − τ_{n_k} A(w_{n_k}) − y_{n_k}, x − y_{n_k}⟩ ≤ 0, ∀x ∈ X,

or, equivalently,

(1/τ_{n_k}) ⟨w_{n_k} − y_{n_k}, x − y_{n_k}⟩ ≤ ⟨A(w_{n_k}), x − y_{n_k}⟩, ∀x ∈ X.
Consequently, we have

(1/τ_{n_k}) ⟨w_{n_k} − y_{n_k}, x − y_{n_k}⟩ + ⟨A(w_{n_k}), y_{n_k} − w_{n_k}⟩ ≤ ⟨A(w_{n_k}), x − w_{n_k}⟩, ∀x ∈ X. (6.5)

Since {w_{n_k}} is weakly convergent, it is bounded. Then, by the Lipschitz continuity of A, {A(w_{n_k})} is bounded. As ∥w_{n_k} − y_{n_k}∥ → 0, {y_{n_k}} is bounded as well, and τ_{n_k} ≥ min{τ_0, µ/L}. Passing (6.5) to the limit as k → ∞, we get

lim inf_{k→∞} ⟨A(w_{n_k}), x − w_{n_k}⟩ ≥ 0, ∀x ∈ X. (6.6)

Moreover, we have
⟨A(y_{n_k}), x − y_{n_k}⟩ = ⟨A(y_{n_k}) − A(w_{n_k}), x − w_{n_k}⟩ + ⟨A(w_{n_k}), x − w_{n_k}⟩ + ⟨A(y_{n_k}), w_{n_k} − y_{n_k}⟩
≥ −∥A(y_{n_k}) − A(w_{n_k})∥ · ∥x − w_{n_k}∥ + ⟨A(w_{n_k}), x − w_{n_k}⟩ − ∥A(y_{n_k})∥ · ∥w_{n_k} − y_{n_k}∥.
Since lim_{k→∞} ∥w_{n_k} − y_{n_k}∥ = 0 and A is L-Lipschitz continuous on H, we get lim_{k→∞} ∥A(w_{n_k}) − A(y_{n_k})∥ = 0. Combining this with (6.6) and the inequality above yields lim inf_{k→∞} ⟨A(y_{n_k}), x − y_{n_k}⟩ ≥ 0 for all x ∈ X.
Next, we show that z ∈ Ω. We choose a decreasing sequence {ε_k} of positive numbers tending to 0. For each k ≥ 1, we denote by N_k the smallest positive integer such that

⟨A(y_{n_j}), x − y_{n_j}⟩ + ε_k ≥ 0, ∀j ≥ N_k. (6.7)

Since {ε_k} is decreasing, it is easy to see that the sequence {N_k} is increasing. Furthermore, for each k ≥ 1, since y_{N_k} ∈ X, we can suppose A(y_{N_k}) ≠ 0 (otherwise, y_{N_k} is a solution) and, setting

v_{N_k} = A(y_{N_k}) / ∥A(y_{N_k})∥²,

we have ⟨A(y_{N_k}), v_{N_k}⟩ = 1 for each k ≥ 1. Now, we can deduce from (6.7) that, for each k ≥ 1,

⟨A(y_{N_k}), x + ε_k v_{N_k} − y_{N_k}⟩ ≥ 0.

Since A is pseudo-monotone (Assumption 3) on H, we get

⟨A(x + ε_k v_{N_k}), x + ε_k v_{N_k} − y_{N_k}⟩ ≥ 0.

This implies that

⟨A(x), x − y_{N_k}⟩ ≥ ⟨A(x) − A(x + ε_k v_{N_k}), x + ε_k v_{N_k} − y_{N_k}⟩ − ε_k ⟨A(x), v_{N_k}⟩. (6.8)

Now, we show that lim_{k→∞} ∥ε_k v_{N_k}∥ = 0. Indeed, since w_{n_k} ⇀ z and lim_{k→∞} ∥w_{n_k} − y_{n_k}∥ = 0, we obtain y_{N_k} ⇀ z as k → ∞. Since {y_n} ⊂ X, we clearly have z ∈ X as well. Since A is sequentially weakly continuous on X, {A(y_{n_k})} converges weakly to A(z). We can suppose A(z) ≠ 0 (otherwise, z is a solution). Since the norm mapping is sequentially weakly lower semi-continuous, we have

0 < ∥A(z)∥ ≤ lim inf_{k→∞} ∥A(y_{n_k})∥.

Together with {y_{N_k}} ⊂ {y_{n_k}} and ε_k → 0 as k → ∞, we readily conclude

0 ≤ lim sup_{k→∞} ∥ε_k v_{N_k}∥ = lim sup_{k→∞} (ε_k / ∥A(y_{N_k})∥) ≤ (lim sup_{k→∞} ε_k) / (lim inf_{k→∞} ∥A(y_{n_k})∥) = 0.
Hence, lim_{k→∞} ∥ε_k v_{N_k}∥ = 0. Now, letting k → ∞, the right-hand side of (6.8) tends to zero, since A is uniformly continuous, {y_{N_k}} and {v_{N_k}} are bounded, and lim_{k→∞} ∥ε_k v_{N_k}∥ = 0. Thus, we get

lim inf_{k→∞} ⟨A(x), x − y_{N_k}⟩ ≥ 0.
Hence, for all x ∈ X, we have

⟨A(x), x − z⟩ = lim_{k→∞} ⟨A(x), x − y_{N_k}⟩ = lim inf_{k→∞} ⟨A(x), x − y_{N_k}⟩ ≥ 0.

By Lemma 6.4, z ∈ Ω. This completes the proof.

Conclusion
This paper builds on recent advances in the computational theory of dynamic user equilibrium, in particular on the network extension of the LWR model and its formulation as a system of differential-algebraic equations. Our aim is to advocate the use of strongly convergent fixed-point iterations for computing dynamic user equilibrium which are provably convergent under mild a priori monotonicity assumptions on the path delay operator, and which are adaptive in the sense that no global bound on the Lipschitz constant needs to be known. We focussed on the construction of new strongly convergent forward-backward-forward algorithms, augmented by relaxation and inertial modifications. We tested the performance of our algorithms on the Nguyen and the Sioux Falls networks, thereby providing evidence that our methods improve upon pre-implemented solvers. In future research we aim to improve the fixed-point iteration by reducing its complexity in terms of calls of the DNL subroutine. Indeed, the price to pay for provable convergence under weaker assumptions is that under FBF splitting we have to evaluate the delay operator twice per iteration, which is computationally costly. The FB iteration needs only a single call of the DNL, but converges only under strong monotonicity assumptions. In a future publication we will describe a single-call variant of the FBF which still guarantees strong convergence under the same monotonicity assumptions as used in this paper. Another very interesting direction of research we plan to pursue is to replace the costly DNL procedure by alternative approximation schemes motivated by machine learning approaches, as commenced in [44].