Abstract
The paper addresses an optimal ensemble control problem for nonlocal continuity equations on the space of probability measures. We admit a general nonlinear cost functional and an option to directly control the nonlocal terms of the driving vector field. For this problem, we design a descent method based on Pontryagin’s maximum principle (PMP). To this end, we derive a new form of PMP with a decoupled Hamiltonian system. Specifically, we extract the adjoint system of linear nonlocal balance laws on the space of signed measures and prove its well-posedness. As an implementation of the designed descent method, we propose an indirect deterministic numerical algorithm with backtracking. We prove the convergence of the algorithm and illustrate its modus operandi by treating a simple case involving a Kuramoto-type model of a population of interacting oscillators.
1 Introduction
Nonlocal continuity equations on the spaces of probability measures arise as macroscopic mathematical models of multi-agent dynamical systems describing the time evolution of large ensembles (beams, crowds, swarms, populations, networks) of structurally identical objects (e.g., elementary particles, people, animals, “neurons” of natural or artificial neural networks etc.). The main idea is to treat the many-particle dynamics as a whole by focusing on its “statistical” behavior assuming that the agents are homotypic and, therefore, indistinguishable.
Passing to the limit in the number of agents, a large set of individuals (described by a system of many similar ODEs) is replaced by their continual probability distribution, named the “mean field” (driven by a single transport PDE). This idea, rooted in statistical mechanics [28], has been found useful in different areas of applied mathematics such as mathematical biology [20, 21, 27, 39], modeling of pedestrian and urban traffic [25, 26, 42], mathematical neuroscience [37] and even theoretical foundations of artificial intelligence [13, 40, 49, 50], just to name a few.
Recent results in analysis on the space of measures, achieved in the works of Ambrosio, Gigli, Lott, Otto, Santambrogio, Savaré, Villani, and others, have proved fruitful for mathematical control theory, largely spurred by the variety of mentioned applications and the needs of control engineering. The starting point was the derivation of a mathematically rigorous “mean field limit” of the classical multi-agent optimal control problem [29, 30] (see also [12, 31]). In the subsequent years, the cornerstones of classical optimal control theory—such as Pontryagin’s maximum principle (PMP) [7, 9,10,11, 25, 44,45,46] and the dynamic programming method [5, 6, 23, 38]—were extended to the area of mean field control.
The mean field PMP, which is the focus of the present paper, was obtained at different levels of generality by various mathematical strategies. A particular version was first derived in [44] for a specific “shepherd’s” problem over the local continuity equation and, subsequently, for a general linear problem with relaxed controls [45]; these versions of PMP are mainly reconstructed from the differential properties of flows of the driving vector field by standard analytical methods such as Filippov’s lemma. A result in a similar spirit for another particular local problem was recently obtained in [13] as a specification of a more general PMP [11] by an original technique of generalized Lagrange multipliers on the convex subset of Radon measures with unit mass. Notice that, in the local case, PMP takes the familiar form, as it is formulated in terms of a certain decoupled optimality system with an explicit backward adjoint equation—a non-conservative transport PDE.
The first result in this line was obtained in [8] for a particular “bi-level” optimization problem; a natural strategy was to pass to the limit in the usual PMP conditions for conventional control problems obtained by the “finite-agent” approximations in Dobrushin’s framework. Similar arguments, based on a finite-dimensional approximation and Ekeland’s variational principle, were used in [47] to prove an impulsive version of PMP for a nonlocal transport equation with states being measure-valued curves of bounded variation. For general (non-impulsive) nonlocal transport equations, PMP was first proved by an appropriate extension of the classical technique of needle-shaped control variations for problems without [11] and with [9] additional state constraints. Another approach, relying on an appropriate linearization of the nonlocal dynamics, was further proposed in [10]. A different method to derive the necessary optimality conditions for mean-field control problems was suggested in [16], exploiting an appropriate generalization of the Karush–Kuhn–Tucker conditions. Also, in [19, 34, 50], alternative versions of the mean-field PMP were obtained for stochastic optimal control problems.
1.1 Numerical Solution: Mainstream Approaches and Their Pitfalls
The use of existing analytical methods is limited to the simplest mean-field control problems, while the transition of these results to the numerical context is fraught with critical technical difficulties. Here, PMP would be a promising footing if it were not for a number of significant flaws. The key drawback is due to the mentioned coupling in the Hamiltonian system. The state of such a Hamiltonian equation—a measure on the cotangent bundle of the state space—is always singular, even if the solution of the primal continuity equation—a measure on the state space—has a density. This makes it impossible to solve the Hamiltonian system by the standard numerical schemes and, consequently, the existing forms of PMP do not provide a descent algorithm.
In the finite-dimensional case, a wide range of direct and indirect numerical methods is described in numerous works. For nonlocal continuity equations, the numerical solution of optimal control problems remains an open question, which is crucial for transferring mean-field control theory to the practice of control engineering. The mainstream approaches are represented by the following two families:
-
1.
Semi-direct (finite-particle) method: Approximation of the initial distribution by a discrete measure and transformation of a distributed control system to a high-dimensional ODE. The resulting finite-dimensional control problem is solved directly or using special techniques such as, e.g., “random batch” methods [35].
-
2.
Direct method: Total discretization of a nonlocal equation and reduction of a variational problem to mathematical programming.
In practice, both of the mentioned approaches typically lead to unsatisfactory results. The first one returns us to a high-dimensional classical optimal control problem accompanied by the “curse of dimensionality”; in fact, this approach rejects the very heart of the mean-field approximation along with all the profits of statistical averaging, as it draws us back to the need to keep track of every individual representative of a large population. The second approach leads to a complex (high-dimensional, nonlinear and non-convex) mathematical programming problem, which is not always satisfactorily solved even by commercial solvers. Here, the main difficulty is the presence of nonlocal terms depending on the density distribution over the entire spatial grid, which makes the computations much more demanding. This feature also leads to a dramatic loss in the efficiency of parallelization, since integration steps require interprocessor communications of the “all-to-all” pattern.
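To make the discussion concrete, the semi-direct (finite-particle) reduction can be sketched for a Kuramoto-type interaction field of the form \( V_t(x,\mu ,u) = u + \int \sin (y-x)\,d\mu (y) \). This specific field and the explicit Euler discretization are illustrative assumptions, not the scheme studied below; the point is only to show how the nonlocal term becomes an all-to-all particle sum.

```python
import math

def kuramoto_rhs(theta, u, coupling=1.0):
    """Right-hand side of the N-particle ODE obtained by replacing mu in
    V_t(x, mu, u) = u + coupling * int sin(y - x) dmu(y)
    with the empirical measure of the particles theta_1, ..., theta_N."""
    n = len(theta)
    return [u + coupling / n * sum(math.sin(y - x) for y in theta)
            for x in theta]

def simulate(theta0, u_of_t, T=1.0, steps=200):
    """Explicit Euler integration of the particle system on [0, T]."""
    theta, dt = list(theta0), T / steps
    for k in range(steps):
        rhs = kuramoto_rhs(theta, u_of_t(k * dt))
        theta = [x + dt * v for x, v in zip(theta, rhs)]
    return theta
```

Note that each Euler step costs \( O(N^2) \) kernel evaluations: this is precisely the “all-to-all” communication pattern mentioned above, which penalizes parallelization.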
In contrast to the classical setting, the bibliography on indirect numerical algorithms for optimal mean-field control is scarce. There are only a few results [2, 13, 43, 48, 49], all focusing on particular problems and relying on adequate necessary optimality conditions. The work [43] deals with the so-called “shepherd’s problem”, where one has to steer a population of non-interacting individuals to a given target set; the proposed numerical algorithm is based on a specific form of PMP. On the conceptual level, the algorithm of [13] (named in the cited paper a “shooting method”) is a variant of the classical Krylov–Chernous’ko algorithm—probably the first indirect algorithm based on PMP in the history of optimal control. The convergence of the algorithm essentially depends on the convexity of the cost functional and is not guaranteed in general, even in the finite-dimensional case \(\mu _t = \delta _{x(t)}\). An alternative algorithm was proposed in [49] for the linear problem of ensemble control, employing an exact formula for the increment of the cost functional and feedback control variations. In [48], a version of the gradient descent method was constructed for a mean-field optimal control problem over a nonlocal Fokker–Planck–Kolmogorov equation modeling interactions in a Kuramoto-type model: the first variation of the objective functional and the adjoint equation are obtained by a formal Lagrange method due to the model specifics. Finally, to the best of our knowledge, there are no results of this sort for the general \(\mu \)-nonlinear problem.
1.2 Goals, Contribution, and Organization of the Paper
In the present work, we put forth an indirect numerical method for optimal mean-field control. Namely, we design a PMP-based indirect deterministic numerical algorithm with backtracking line search for a class of optimal ensemble control problems involving nonlocal continuity equations in the space of probability measures. The method can be viewed as an adequate version of the classical gradient descent method and demonstrates encouraging results in a series of numerical experiments. To our knowledge, this is the first indirect descent algorithm for mean-field control problems nonlinear in measure.
The derivation of the algorithm is based on a set of new theoretical results, which are of independent interest. First, we derive the linearized form of the original nonlocal transport PDE. In contrast to [10], our arguments apply to nonlocal perturbations of the vector field and, therefore, cover the case when the control is injected into the nonlocal term of the dynamics. As a byproduct, we compute the first variation of the cost functional within the class of weak variations of the control function. Another contribution is a new, equivalent articulation of PMP, where the Hamiltonian equation on the cotangent bundle of the state space is decoupled into the primal (forward) and dual (backward) parts; the dual system turns out to be a system of nonlocal linear balance laws (continuity equations with sources).
The rest of the paper is organized as follows: A statement of the optimal control problem is presented in Sect. 1.3. Section 2 collects the necessary notation and several noteworthy facts from topology, analysis, and differential calculus over the space of probability measures. In Sect. 2.6, we introduce the concept of a flow of a nonlocal vector field and calculate a “directional derivative” of the flow along a nonlocal vector field. Sections 3–5 dwell on a simplified version of the stated optimization problem, where the running cost is dropped and the driving vector field is affine in the control variable; this technical simplification is not critical but enables us to shorten the presentation of the main results.
In Sect. 3, we exhibit two standard representations of the increment of the cost functional. The first one is formulated in the language of flows of nonlocal vector fields, while the second formula is written down in terms of the mentioned Hamiltonian system. In Sect. 4, noting that neither of these representations is suitable for numerical purposes, we derive a third version of the cost increment, which relies on the notion of the adjoint equation. The corresponding numerical algorithm is presented in Sect. 5. We study the convergence of the algorithm, discuss certain principal aspects of its technical implementation and demonstrate its modus operandi by treating a simple but illustrative case, namely, an aggregation problem for a mean-field Kuramoto-type oscillatory model. Finally, in Sect. 6, the obtained results are extended to the general problem, involving the running cost and the nonlinear dependence on the control variable.
1.3 Problem Statement
Given the data \( V:I\times {{\mathbb {R}}}^{n}\times \mathscr {P}_{2}({{\mathbb {R}}}^{n})\times U\rightarrow {{\mathbb {R}}}^{n}\), \(L:I\times {{\mathbb {R}}}^{n}\times \mathscr {P}_{2}({{\mathbb {R}}}^{n})\times U\rightarrow {{\mathbb {R}}}\), \(\ell :\mathscr {P}_{2}({{\mathbb {R}}}^{n})\rightarrow {{\mathbb {R}}},\) consider the following optimal control problem (P) on a fixed finite time interval \(I\doteq [0,T]\):
We assume that control signals are functions \(t \mapsto u(t)\) of time variable only, and take values in a given set \(U \subseteq {{\mathbb {R}}}^m\), i.e., \({\mathscr {U}} \doteq L^\infty (I; U)\), where \(L^\infty \) is equipped with the weak* topology \(\sigma (L^\infty , L^1)\).
Optimization problems of this sort appear in the framework of multi-agent dynamical systems, where the measure \(\mu _t\) represents the spatial distribution of agents at time t. The specified class of controls implies that u acts simultaneously on all agents (one can imagine that we are able to influence the agents’ common environment rather than each agent in person). An important example of the nonlocal vector field is
where f models an external force pushing the agents and K stands for their internal interaction. Typical terminal cost functionals are
Here, \( \ell _{1} \) represents the potential (l) and interaction (W) energy terms, while \( \ell _{2} \) is related to the averaged control problem [51], where the goal is to bring the expectation of the distribution \(\mu \) to some target position \(m_T\). Finally, common versions of the running cost term are
\( L_1 \) represents the “total energy” of the control action, and \( L_2 \) captures the problem of following a desired path \( t\mapsto m(t) \).
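On a particle (empirical-measure) approximation, cost terms of the kind just described can be evaluated directly. The concrete one-dimensional forms below—\( \ell _2(\mu ) = |\mathbb {E}_\mu [x] - m_T|^2 \) for the averaged-control terminal cost, \( L_1 = |u|^2 \) for the control energy rate, and \( L_2 = |\mathbb {E}_\mu [x] - m(t)|^2 \) for path following—are plausible instances assumed purely for illustration:

```python
def mean(xs):
    return sum(xs) / len(xs)

def ell2(particles, m_T):
    """Averaged-control terminal cost |E_mu[x] - m_T|^2 (assumed form),
    evaluated on a 1-D empirical measure of equally weighted particles."""
    return (mean(particles) - m_T) ** 2

def running_cost(u_values, m_path, particle_paths, dt):
    """Left-endpoint discretization of int (L1 + L2) dt along a trajectory,
    with L1 = |u|^2 and L2 = |E_mu[x] - m(t)|^2 (both assumed forms)."""
    total = 0.0
    for u, m, xs in zip(u_values, m_path, particle_paths):
        total += (u ** 2 + (mean(xs) - m) ** 2) * dt
    return total
```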
2 Preliminaries
In this section, we introduce some notations, and recall several useful facts from analysis on the metric space of probability measures.
2.1 Notation
Throughout the paper, we use the following notation:
-
\( |\cdot |\) the Euclidean norm on \( \mathbb {R}^{n} \).
-
\( {\varvec{B}}_r\subset \mathbb {R}^{n} \) the closed ball of radius r centered at the origin.
-
\( f_{\sharp }\mu \) the pushforward of a measure \( \mu \in {\mathscr {P}}(\mathbb {R}^n) \) under a Borel map \( f:\mathbb {R}^n\rightarrow \mathbb {R}^m \).
-
\(\textrm{spt}\mu \) the support of a measure \(\mu \).
-
\( \mathbb {M}^{m,n} \) the space of matrices A with m rows and n columns.
-
\(x =\begin{pmatrix} x^{1}\\ \vdots \\ x^{n} \end{pmatrix}\) an n-dimensional column vector, i.e., \( x\in \mathbb {M}^{n,1} = {{\mathbb {R}}}^{n}\).
-
\(p = \begin{pmatrix} p_{1}&\cdots&p_{n} \end{pmatrix}\) an n-dimensional row vector, i.e., \( p \in \mathbb {M}^{1,n}=({{\mathbb {R}}}^{n})^{*} \).
-
A vector field f on \( {{\mathbb {R}}}^{n} \) is a family of n real-valued functions \( f^{i}=f^{i}(t,x) \), \( i=1,\ldots ,n \).
-
A vector field f on \( {{\mathbb {R}}}^{n}\times ({{\mathbb {R}}}^{n})^{*} \) is a family of 2n real-valued functions \( f^{i}=f^{i}(t,x,p) \), \( f_{i}=f_{i}(t,x,p) \), \( i=1,\ldots ,n \).
-
\( \textrm{div}_x f = \sum _{i=1}^{n} \partial _{x^{i}}f^{i} \) divergence of the vector field \( f=f(t,x) \) in x.
-
\( \textrm{div}_{(x,p)} f=\sum _{i=1}^{n}\left( \partial _{x^{i}}f^{i}+\partial _{p_{i}}f_{i}\right) \) divergence of the vector field \( f=f(t,x,p) \) in (x, p) .
-
\(D_xf = \begin{pmatrix} \partial _{x^{1}}f^{1}&{}\cdots &{}\partial _{x^{n}}f^{1}\\ \vdots &{}\ddots &{}\vdots \\ \partial _{x^{1}}f^{n}&{}\cdots &{}\partial _{x^{n}}f^{n} \end{pmatrix}\) derivative of the vector field \( f=f(t,x) \) in x.
-
\(\nabla _{x}\psi = \begin{pmatrix} \partial _{x^{1}}\psi&\cdots&\partial _{x^{n}}\psi \end{pmatrix}\) gradient of a real-valued function \( \psi =\psi (t,x,p) \) in x.
-
\(\nabla _{p}\psi = \begin{pmatrix} \partial _{p_{1}}\psi \\ \vdots \\ \partial _{p_{n}}\psi \end{pmatrix}\) gradient of a real-valued function \( \psi =\psi (t,x,p) \) in p.
Below, we will also deal with vector measures whose values belong to \( \mathbb {M}^{1,n} \), i.e., \(\nu = \begin{pmatrix} \nu _{1}&\cdots&\nu _{n} \end{pmatrix}\), where \( \nu _{1},\ldots ,\nu _{n} \) are Radon measures on \( {{\mathbb {R}}}^{n} \). Given \( \varphi :{{\mathbb {R}}}^{n}\rightarrow {{\mathbb {R}}}^{n} \), we set \( \displaystyle \langle \nu ,\varphi \rangle = \int \varphi \cdot \,d\nu \doteq \sum _{i=1}^{n} \int \varphi ^{i}\,d\nu _{i}\).
Let X be a Polish space. From measures on X one can construct several important topological spaces: \( {\mathscr {M}}(X)\supset {\mathscr {P}}(X) \supset {\mathscr {P}}_{2}(X) \supset {\mathscr {P}}_{c}(X). \) Here \( \mathscr {M}(X) \) consists of all signed Radon measures, \( {\mathscr {P}}(X) \) of all probability measures, \( {\mathscr {P}}_{2}(X) \) of all probability measures with finite second moments, \( {\mathscr {P}}_{c}(X) \) of all compactly supported probability measures. Below, the Wasserstein distance [1] on \( {\mathscr {P}}_{2}(X) \) is always denoted by \( W_{2} \).
Given a Radon measure \( \mu \) on \( \mathbb {R}^n \), denote by \( L_{\mu }^{p}(\mathbb {R}^{n};\mathbb {R}^{m}) \) the space of all \( \mu \)-measurable maps (equivalence classes) \( f:{{\mathbb {R}}}^{n}\rightarrow {{\mathbb {R}}}^{m} \) such that \( \Vert f\Vert _{L^p_\mu }\doteq \left( \int \vert f \vert ^{p}\,d\mu \right) ^{1/p}<\infty \). If \( \mu \) is the n-dimensional Lebesgue measure \( {\mathscr {L}}^{n} \), we simply write \( L^p(\mathbb {R}^{n};\mathbb {R}^{m}) \).
2.2 The Space \(\mathscr {P}_{c}({\mathbb {R}}^{n})\) and Functions of Probability Measures
The role of the main arena of our paper will be played by the space \( \mathscr {P}_{c}({{\mathbb {R}}}^{n}) \) endowed with the so-called final topology.
Definition 2.1
Let \(({\mathscr {X}}_n, \tau _n)\) be a sequence of topological spaces such that \({\mathscr {X}}_n\subset {\mathscr {X}}_{n+1}\) with continuous inclusion for every n. Let \({\mathscr {X}} = \cup _n{\mathscr {X}}_n\). The final topology is the finest topology \(\tau \) on \({\mathscr {X}}\) for which the inclusions \(\textrm{id}_n:{\mathscr {X}}_n\rightarrow {\mathscr {X}}\) are continuous for every n.
In our case, \(({\mathscr {X}}_n,\tau _n) \doteq ({\mathscr {P}}({\varvec{B}}_n),W_2)\) and \({\mathscr {X}} \doteq {\mathscr {P}}_c({\mathbb {R}}^n)\). The final topology \( \tau \) on \( \mathscr {P}_{c}({{\mathbb {R}}}^{n}) \) enjoys the following properties [33]:
-
\(\mu _n \xrightarrow {\tau } \mu \) if and only if \(\mu _n\xrightarrow {W_{2}} \mu \) in \(\mathscr {P}({\varvec{B}}_N)\) for some N,
-
if \({\mathscr {K}} \subset {\mathscr {P}}_c({\mathbb {R}}^n)\) is compact, then \({\mathscr {K}}\subset \mathscr {P}({\varvec{B}}_N)\) for some N,
-
\(\tau \) is a Hausdorff topology but it is not induced by any distance.
Below, we will constantly deal with mappings \( \Phi :I\times {{\mathbb {R}}}^{n}\times \mathscr {P}_{c}({{\mathbb {R}}}^{n})\rightarrow {{\mathbb {R}}}^{m} \) of a particular regularity. Recall the respective
Definition 2.2
Let \( \Phi \) be a map \(I\times {{\mathbb {R}}}^{n}\times \mathscr {P}_{c}({{\mathbb {R}}}^{n})\rightarrow {{\mathbb {R}}}^{m} \). We say that
-
1.
\( \Phi \) is a Carathéodory map if and only if \( t\mapsto \Phi (t,x,\mu ) \) is measurable for each \( (x,\mu ) \), and \( (x,\mu )\mapsto \Phi (t,x,\mu ) \) is sequentially continuous for each t.
-
2.
\( \Phi \) is locally bounded if its restriction on any compact subset of \( I\times {{\mathbb {R}}}^{n}\times \mathscr {P}_{c}({{\mathbb {R}}}^{n}) \) is bounded.
-
3.
\( \Phi \) is locally Lipschitz if and only if, for each t, the restriction of \( (x,\mu )\mapsto \Phi (t,x,\mu ) \) to any compact set \( \mathscr {K} \subset {{\mathbb {R}}}^{n}\times \mathscr {P}_{c}({{\mathbb {R}}}^{n}) \) is Lipschitz with some constant \( L_{\mathscr {K}} \), independent of t.
-
4.
\( \Phi \) is sublinear if and only if there exists \( C>0 \) such that \( \left|\Phi (t,x,\mu )\right|\le C \left( 1+\vert x\vert \right) \) for all t, x, \( \mu \).
Thanks to the outlined properties of the final topology, the definitions of the local boundedness and local Lipschitzianity can be given in the following equivalent way:
-
2\('\).
\( \Phi \) is locally bounded if and only if, for any compact \( \Omega \subset {{\mathbb {R}}}^{n} \), there exists \( C_{\Omega }>0 \) such that \( \left|\Phi (t,x,\mu )\right|\le C_{\Omega } \) for all \( t\in I \), \( x\in \Omega \), \( \mu \in \mathscr {P}(\Omega ) \);
-
3\('\).
\( \Phi \) is locally Lipschitz if and only if, for any compact \( \Omega \subset {{\mathbb {R}}}^{n} \), there exists \( L_{\Omega }>0 \) such that \( \left|\Phi (t,x,\mu ) - \Phi (t,x',\mu ')\right|\le L_{\Omega } \left( \vert x-x'\vert + W_{2}(\mu ,\mu ') \right) \) for all \( t\in I \), \( x,x'\in \Omega \), \( \mu ,\mu '\in \mathscr {P}(\Omega ) \).
2.3 Derivatives in the Space of Probability Measures
There are several concepts of derivative of a function \({\mathscr {P}} \rightarrow {{\mathbb {R}}}\). In this paper, we shall employ the notion of “intrinsic derivative” [18].
Definition 2.3
(\(\mathscr {C}^{1}\) maps) A function \( F :\mathscr {P}_{c}({{\mathbb {R}}}^{n})\rightarrow {{\mathbb {R}}}\) is said to be of class \( \mathscr {C}^{1} \) if and only if there exists a sequentially continuous, locally bounded map \( \frac{\delta F}{\delta \mu }:{\mathscr {P}}_{c}({{\mathbb {R}}}^{n})\times {{\mathbb {R}}}^{n}\rightarrow {{\mathbb {R}}}\) such that
Since \( \frac{\delta F}{\delta \mu } \) is defined up to an additive constant, we adopt the normalization convention
Definition 2.4
Let \( \frac{\delta F}{\delta \mu } \) be \( \mathscr {C}^{1} \) in y. Then the intrinsic derivative \( D_{\mu }F:\mathscr {P}_{c}({{\mathbb {R}}}^{n})\times {{\mathbb {R}}}^{n}\rightarrow {{\mathbb {R}}}^{n} \) is defined by \( D_{\mu }F \doteq D_{y}\frac{\delta F}{\delta \mu } \).
Some important properties of the intrinsic derivative are gathered in the following proposition, which combines the statements of Propositions 2.2–2.4 from [17].
Proposition 2.1
Let \( F:\mathscr {P}_{c}({{\mathbb {R}}}^{n})\rightarrow {{\mathbb {R}}}\) be \( \mathscr {C}^{1} \), \( \frac{\delta F}{\delta \mu } \) be \( \mathscr {C}^{1} \) in y, and \( D_{\mu }F \) be sequentially continuous and locally bounded. Then, the following holds:
-
1.
For any Borel measurable, locally bounded map \( \varphi :{{\mathbb {R}}}^{n}\rightarrow {{\mathbb {R}}}^{n} \), the function \( s\mapsto F\left( (\textrm{id}+s\varphi )_{\sharp }\mu \right) \) is differentiable at zero, and
$$\begin{aligned} \frac{d}{ds}\Big \vert _{s=0}F\left( (\textrm{id}+s\varphi )_{\sharp }\mu \right) = \int D_{\mu }F(\mu ,y)\cdot \varphi (y)\,d\mu (y). \end{aligned}$$(5)
-
2.
Given a compact set \( \Omega \subset {\mathbb {R}}^n \), the restriction of F to \( \mathscr {P}(\Omega ) \) satisfies
$$\begin{aligned} \left|F(\mu ') - F(\mu ) - \iint D_{\mu }F(\mu ,y)\cdot (y-x)\,d\Pi (x,y) \right| \le o \left( \left( \iint \vert x-y\vert ^{2}\,d\Pi (x,y) \right) ^{1/2} \right) , \end{aligned}$$
for any \( \mu ,\mu '\in \mathscr {P}(\Omega ) \) and any transport plan \( \Pi \) between \( \mu \) and \( \mu ' \).
-
3.
The quantity \( \frac{\delta F}{\delta \mu } \) can be calculated as follows:
$$\begin{aligned} \frac{\delta F}{\delta \mu }(\mu ,y) = \lim _{h\rightarrow 0+}\frac{1}{h}\left( F\left( (1-h)\mu +h\delta _{y}\right) -F(\mu )\right) . \end{aligned}$$
The first property links the intrinsic derivative with a “directional” derivative, where \( \varphi \) plays the role of direction. The second one relates the notion of intrinsic derivative with the so-called localized Wasserstein derivative [10]:
Definition 2.5
(localized Wasserstein derivative) We say that \( F:\mathscr {P}_{2}({{\mathbb {R}}}^{n})\rightarrow {{\mathbb {R}}}\) is locally differentiable at \( \mu \in \mathscr {P}_{2}({{\mathbb {R}}}^{n}) \) if there exists a tangent vector \( \xi \in \textrm{Tan}_{\mu }{\mathscr {P}}_{2}({{\mathbb {R}}}^{n}) \) such that, for any compact set \( \Omega \supset \textrm{spt}\mu \), the restriction of F to \( \mathscr {P}(\Omega ) \) satisfies
for any \( \mu '\in {\mathscr {P}}(\Omega ) \) and any transport plan \( \Pi \) between \( \mu \) and \( \mu ' \). Such \( \xi \) is uniquely defined and called the localized Wasserstein derivative of F at \( \mu \).
Recall that the tangent space \(\textrm{Tan}_{\mu }{\mathscr {P}}_{2}({{\mathbb {R}}}^{n})\) to \({\mathscr {P}}_{2}({{\mathbb {R}}}^{n})\) at \( \mu \in \mathscr {P}_{2}({{\mathbb {R}}}^{n}) \) is introduced as
Proposition 2.1 says that any \( \mathscr {C}^{1} \) functional on the space of probability measures with sequentially continuous and locally bounded intrinsic derivative \( D_{\mu }F \) is locally differentiable at any \( \mu \in \mathscr {P}_{c}({{\mathbb {R}}}^{n}) \), and the projection of \( D_{\mu }F(\mu ,\cdot ) \) onto \( \textrm{Tan}_{\mu }{\mathscr {P}}_{2}({{\mathbb {R}}}^{n}) \) coincides with the corresponding localized Wasserstein derivative.
The third assertion of Proposition 2.1 offers a convenient tool for the practical calculation of the intrinsic derivative. We illustrate this machinery with the following paradigmatic example.
Example 1
Let \( K:{{\mathbb {R}}}^{n}\rightarrow {{\mathbb {R}}}^{n} \) be a \( \mathscr {C}^{1} \) map. For fixed \( x\in {{\mathbb {R}}}^{n} \), let us compute the intrinsic derivative of the functional \(\displaystyle \mu \mapsto F(x,\mu ) \doteq (K*\mu )(x) \doteq \int K(x-y)\,d\mu (y).\) By observing that
the flat derivative is easily found as
which gives: \( D_{\mu }F(x,\mu ,y) = D_{y}\frac{\delta F}{\delta \mu }(x,\mu ,y) = -DK(x-y). \)
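Since \( F \) is linear in \( \mu \), the difference quotient from the limit formula of Proposition 2.1(3) equals \( K(x-y) - F(x,\mu ) \) for every \( h>0 \), which makes Example 1 easy to verify numerically on a discrete measure. The sketch below does this in one dimension; the kernel \( K(z)=z^2 \) is an arbitrary smooth choice made for illustration.

```python
def K(z):           # smooth interaction kernel (illustrative choice)
    return z * z

def dK(z):          # its derivative
    return 2 * z

def F(x, atoms, weights):
    """F(x, mu) = (K * mu)(x) for a discrete measure mu = sum_i w_i delta_{y_i}."""
    return sum(w * K(x - y) for y, w in zip(atoms, weights))

def flat_derivative(x, atoms, weights, y, h=1e-6):
    """Difference quotient (1/h)(F(x, (1-h)mu + h delta_y) - F(x, mu)) from
    Proposition 2.1(3); by linearity of F in mu it equals K(x-y) - F(x, mu)."""
    mixed = F(x, atoms + [y], [(1 - h) * w for w in weights] + [h])
    return (mixed - F(x, atoms, weights)) / h
```

Differentiating the computed flat derivative in y (e.g., by a central difference) recovers the intrinsic derivative \( -K'(x-y) \), in agreement with the formula above.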
Recall another useful fact:
Lemma 2.1
Let F be the same as in Proposition 2.1. Then F is locally Lipschitz.
Proof
Fix a compact set \( \Omega \) and two measures \( \mu ,\mu '\in \mathscr {P}(\Omega ) \). Denote by \( \Pi \) an optimal plan between \( \mu \) and \( \mu ' \) and let \( \mu _{t}=(1-t)\mu +t\mu ' \). Then, we have
The difference in the square brackets is
Hence the statement follows from the local boundedness of \( D_{\mu }F = D_{y}\frac{\delta F}{\delta \mu } \). \(\square \)
Definition 2.6
We say that \( F:\mathscr {P}_{c}({{\mathbb {R}}}^{n})\rightarrow {{\mathbb {R}}}\) is of class \( \mathscr {C}^{1,1} \) if F is \( \mathscr {C}^{1} \), \( \frac{\delta F}{\delta \mu } \) is \( \mathscr {C}^{1} \) in y, and the intrinsic derivative \( D_{\mu }F \) is locally Lipschitz and locally bounded.
2.4 Nonlocal Vector Fields and Their Flows
A time-dependent nonlocal vector field is a map \(V:I\times {{\mathbb {R}}}^{n}\times \mathscr {P}_{c}({{\mathbb {R}}}^{n})\rightarrow {{\mathbb {R}}}^{n} \). If the dependence on \(\mu \in \mathscr {P}_{c}({{\mathbb {R}}}^{n})\) is fictitious, we say that V is a local vector field (or simply a “vector field”). The basic regularity of nonlocal vector fields is understood in the sense of Definition 2.2.
It is well known that local transport PDEs can be studied using their characteristic flows. Recall the following
Definition 2.7
We say that a vector field \( v:I\times {{\mathbb {R}}}^{n}\rightarrow {{\mathbb {R}}}^{n} \) is of class \( \mathscr {C}^{1,1} \) if
-
1.
v is a locally bounded Carathéodory map;
-
2.
v is \( \mathscr {C}^{1} \) in x for each t;
-
3.
\( D_{x}v:I\times {{\mathbb {R}}}^{n}\rightarrow \mathbb {M}^{n,n} \) is Carathéodory, locally bounded and locally Lipschitz.
Any sublinear \( {\mathscr {C}}^{1,1} \) vector field v generates a unique continuous map \( P:I\times I\times {{\mathbb {R}}}^{n}\rightarrow {{\mathbb {R}}}^{n} \) called the flow of \( v_{t} \); this map is defined such that, for each \( t_{0}\in I \) and \( x\in {{\mathbb {R}}}^{n} \), \(t \mapsto P_{t_{0},t}(x)\) is a solution of the Cauchy problem
For any \( t_{0},t\in I \) the map \( P_{t_{0},t}:{{\mathbb {R}}}^{n}\rightarrow {{\mathbb {R}}}^{n} \) is a \( \mathscr {C}^{2} \) diffeomorphism. Moreover, it satisfies the semigroup property: \(P_{t_{1},t_{2}}\circ P_{t_{0},t_{1}} = P_{t_{0},t_{2}}\) for all \(t_{0},t_{1},t_{2}\in I.\)
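The semigroup identity can be checked directly on a toy example: for the illustrative linear field \( v_t(x) = ax \) (an assumption, not a field from the paper), the flow is known in closed form, \( P_{t_0,t}(x) = e^{a(t-t_0)}x \), and composing the flows over \([t_0,t_1]\) and \([t_1,t_2]\) reproduces the flow over \([t_0,t_2]\):

```python
import math

def flow(t0, t, x, a=-1.0):
    """Closed-form flow P_{t0,t} of the toy linear field v_t(x) = a*x:
    the solution of x' = a*x with x(t0) = x, evaluated at time t."""
    return x * math.exp(a * (t - t0))
```

In particular, `flow(t1, t2, flow(t0, t1, x))` coincides with `flow(t0, t2, x)`, and `flow(t0, t0, x)` returns `x`, mirroring the semigroup property stated above.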
In fact, the concept of flow can be extended to the case of nonlocal vector fields. To this end, we modify Definition 2.7 as follows:
Definition 2.8
We say that a nonlocal vector field \( V:I\times {{\mathbb {R}}}^{n}\times \mathscr {P}_{2}({{\mathbb {R}}}^{n})\rightarrow {{\mathbb {R}}}^{n} \) is of class \( \mathscr {C}^{1,1} \) if
-
1)
V is a locally bounded Carathéodory map;
-
2)
V is \( \mathscr {C}^{1} \) in x for each t and \( \mu \), and \( \mathscr {C}^{1} \) in \( \mu \) for each t and x;
-
3)
both \( D_{x}V:I\times {{\mathbb {R}}}^{n}\times \mathscr {P}_{c}({{\mathbb {R}}}^{n})\rightarrow \mathbb {M}^{n,n} \) and \( D_{\mu }V:I\times {{\mathbb {R}}}^{n}\times \mathscr {P}_{c}({{\mathbb {R}}}^{n})\times {{\mathbb {R}}}^{n}\rightarrow \mathbb {M}^{n,n} \) are Carathéodory, locally bounded and locally Lipschitz.
Now, observe that any sublinear \( \mathscr {C}^{1,1} \) nonlocal vector field V generates a unique sequentially continuous function \( X:I \times I\times {{\mathbb {R}}}^{n}\times \mathscr {P}_{c}({{\mathbb {R}}}^{n})\rightarrow {{\mathbb {R}}}^{n} \) such that, for each \( x\in {{\mathbb {R}}}^{n} \) and \( \vartheta \in \mathscr {P}_{2}({{\mathbb {R}}}^{n}) \), \(t \mapsto X^{\vartheta }_{t_0,t}(x)\) is a solution of the ODE
We abbreviate \(X^\vartheta _t = X^\vartheta _{0,t}\) and stress that \( \mu _{t}=(X^{\vartheta }_{t})_\sharp \vartheta \) is the unique solution of the nonlocal continuity equation
We call the map X the flow of the nonlocal vector field V.
Notice that, for a given \( \vartheta \), we can define \( v_{t}(x)\doteq V_{t}(x,(X^{\vartheta }_{t})_{\sharp }\vartheta ) \) and denote by P the flow of v. It is clear that \( X^{\vartheta }_{0,t} = P_{0,t} \). We will use this fact several times below.
The outlined facts (existence of the flow, well-posedness of the nonlocal continuity equation, and the representation formula for its solution) are well-known, refer, e.g., to [11, 29, 41].
2.5 \(\mathscr {O}_\textrm{loc}(\lambda ^{2})\) Families of Vector Fields
In this section, we discuss some differential properties of nonlocal vector fields and their flows.
Definition 2.9
Let \( \Phi ^{\lambda }:\mathscr {X} \rightarrow {{\mathbb {R}}}^{m} \), \( \lambda \in [0,1] \), be a family of functions on a topological space \( \mathscr {X} \). We say that \( \Phi ^{\lambda } \) is an \( \mathscr {O}_\textrm{loc}(\lambda ^{2}) \) family and write
if for any compact set \(\mathscr {K} \subset \mathscr {X} \) there exists \( C_{\mathscr {K}}>0 \) such that \( \left|\Phi ^{\lambda }(x)\right|\le C_{\mathscr {K}}\lambda ^{2} \) for all \( \lambda \in [0,1] \) and \( x\in \mathscr {K} \).
In particular, a family of nonlocal vector fields \( V^{\lambda }:I\times {{\mathbb {R}}}^{n}\times \mathscr {P}_{c}({{\mathbb {R}}}^{n})\rightarrow {{\mathbb {R}}}^{n}\) is an \( \mathscr {O}_\textrm{loc}(\lambda ^{2}) \) family if, for any compact \( \Omega \subset {{\mathbb {R}}}^{n} \), there exists \( C_{\Omega }>0 \) such that \( \left| V^{\lambda }_{t}(x,\mu )\right| \le C_{\Omega }\lambda ^{2} \), for all \( \lambda \in [0,1] \), \( t\in I \), \( x\in \Omega \), \( \mu \in \mathscr {P}(\Omega ) \).
Lemma 2.2
Let V be a nonlocal vector field of class \( \mathscr {C}^{1,1} \). Then, for any locally bounded \( \varphi ,\psi :{{\mathbb {R}}}^{n}\rightarrow {{\mathbb {R}}}\), one has
Moreover, the constant \( C_{\Omega } \) that guarantees the estimate
depends only on the data
Proof
We split the proof into several steps.
1. Fix a compact set \( \Omega \) and a triple \( (t,x,\mu )\in I\times \Omega \times \mathscr {P}(\Omega ) \). Consider the identity:
By the mean value theorem, the first difference in the right-hand side takes the form
and the second one yields
where \( \mu _{\lambda ,\tau } = (1-\tau )\mu +\tau (\textrm{id}+\lambda \psi )_{\sharp }\mu \).
2. Let r be as in (6). Then \( x +\lambda \varphi (x) \in {\varvec{B}}_{r} \) and \( (\textrm{id}+\lambda \psi )_{\sharp }\mu \in \mathscr {P}({\varvec{B}}_{r}) \) for all \( \lambda \in [0,1] \), \( x\in \Omega \), \( \mu \in \mathscr {P}(\Omega ) \). Since \( D_{x}V \) and \( D_{\mu }V \) are locally Lipschitz,
3. Let us estimate \(W_{2} \left( \mu ,\mu _{\lambda ,\tau } \right) \). To this end, recall that
for all \( \mu _{0},\mu _{1},\nu \in \mathscr {P}_{2}({{\mathbb {R}}}^{n})\) and all \( \tau \in [0,1] \). This inequality becomes evident if we note that, for any \( \Pi _{0}\in \Gamma _{o}(\mu _{0},\nu ) \) and \( \Pi _{1}\in \Gamma _{o}(\mu _{1},\nu ) \), the convex combination \( (1-\tau )\Pi _{0}+\tau \Pi _{1} \) is a transport plan between \( (1- \tau )\mu _{0}+\tau \mu _{1}\) and \( \nu \). In our case, (9) implies that
The statement now follows from (7), (8) and the inequalities \( \vert \varphi \vert \le r \), \( \vert \psi \vert \le r \) on \( \Omega \). \(\square \)
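The convexity inequality (9) used in Step 3 can be checked numerically on small discrete measures. The following sketch (purely illustrative, not part of the formal development; it assumes NumPy and SciPy, and the helper name `w2_squared` is ours) computes \( W_2^2 \) exactly by solving the discrete optimal transport linear program:

```python
import numpy as np
from scipy.optimize import linprog

def w2_squared(xs, a, ys, b):
    """Exact squared 2-Wasserstein distance between the discrete measures
    sum_i a_i delta_{xs_i} and sum_j b_j delta_{ys_j} on the real line,
    computed via the optimal transport linear program."""
    cost = (xs[:, None] - ys[None, :]) ** 2        # c_ij = |x_i - y_j|^2
    m, n = cost.shape
    A_eq = np.zeros((m + n, m * n))
    for i in range(m):                             # row marginals: sum_j P_ij = a_i
        A_eq[i, i * n:(i + 1) * n] = 1.0
    for j in range(n):                             # column marginals: sum_i P_ij = b_j
        A_eq[m + j, j::n] = 1.0
    res = linprog(cost.ravel(), A_eq=A_eq, b_eq=np.concatenate([a, b]),
                  bounds=(0, None), method="highs")
    return res.fun

rng = np.random.default_rng(0)
xs0, xs1, ys = rng.normal(size=4), rng.normal(size=4), rng.normal(size=5)
a, b = np.full(4, 0.25), np.full(5, 0.2)
for tau in (0.25, 0.5, 0.75):
    # The mixture (1 - tau) mu0 + tau mu1 lives on the union of the two supports.
    lhs = w2_squared(np.concatenate([xs0, xs1]),
                     np.concatenate([(1 - tau) * a, tau * a]), ys, b)
    rhs = (1 - tau) * w2_squared(xs0, a, ys, b) + tau * w2_squared(xs1, a, ys, b)
    assert lhs <= rhs + 1e-9                       # convexity inequality (9)
```

The check reflects the argument of the proof: the convex combination of two optimal plans is a feasible (generally suboptimal) plan for the mixture, which is precisely why the LP value on the left cannot exceed the right-hand side.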
Arguments similar to those of the previous proof lead to the following slight modification of Lemma 2.2.
Lemma 2.3
Let V be a nonlocal vector field of class \( \mathscr {C}^{1,1} \) and \( X:I\times {{\mathbb {R}}}^{n}\times \mathscr {P}_{c}({{\mathbb {R}}}^{n})\rightarrow {{\mathbb {R}}}^{n}\) be a sequentially continuous and locally bounded map such that \( x\mapsto X^{\mu }_{t}(x)\) is bijective for all t and \( \mu \). Then, for any locally bounded Carathéodory maps \( \varphi ,\psi :I\times {{\mathbb {R}}}^{n}\times \mathscr {P}_{c}({{\mathbb {R}}}^{n})\rightarrow {{\mathbb {R}}}^{n} \), we have
Moreover, the constant \( C_{\Omega } \) which guarantees the estimate
depends only on
and \( L_{r} \) which bounds, for all t, the Lipschitz constants of \( D_{x}V_{t} \) and \( D_{\mu }V_{t} \):
The following lemma presents a refined version of formula (5) for the intrinsic derivative.
Lemma 2.4
Let \( F:\mathscr {P}_{c}({{\mathbb {R}}}^{n})\rightarrow {{\mathbb {R}}}\) be of class \( \mathscr {C}^{1,1} \), and \( \Phi ^{\lambda }:{{\mathbb {R}}}^{n}\rightarrow {{\mathbb {R}}}^{n} \), \( \lambda \in [0,1] \), be a family of Borel maps which can be expanded as follows:
for some \( \varphi :\mathbb {R}^n\rightarrow \mathbb {R}^n \). Then,
Proof
In view of Lemma 2.3, it suffices to show that
According to Lemma 2.1, F is locally Lipschitz. Hence, for any compact \( \Omega \subset {{\mathbb {R}}}^{n} \), there exists \( L_{\Omega }>0 \) such that
for all \( \mu \in \mathscr {P}(\Omega ) \). Now, by using (11), we complete the proof. \(\square \)
2.6 Derivative of the Flow
Recall that, for a fixed initial measure, any sublinear \( \mathscr {C}^{1,1} \) nonlocal vector field (n.v.f.) V generates a map X that can be thought of as its flow. We shall study the flow \( X^{\lambda } \) of the perturbed n.v.f. \( V^{\lambda } = V + \lambda W \), where W is also sublinear \( \mathscr {C}^{1,1} \), and \( \lambda \in [0,1] \).
The results of this section, which provide the linearization of the nonlocal flow, are largely similar to those of [10, sec. 3.2] (both in their statements and their proofs). However, in contrast to [10], we allow here nonlocal perturbations of the vector field. On the other hand, we impose slightly more restrictive assumptions, enabling us to expand the nonlocal flow up to a term of order \(O(\lambda ^2)\) rather than \(o(\lambda )\) as demonstrated in [10]. This fact will play a crucial role in establishing the convergence of our numerical algorithm in Sect. 5.1.
Theorem 2.1
Let V, W be sublinear \( \mathscr {C}^{1,1} \) nonlocal vector fields, X be the flow of V and \( X^{\lambda } \) be the flow of \( V^{\lambda }\doteq V+\lambda W \), where \( \lambda \in [0,1] \). Then
where \( w:I\times \mathbb {R}^n \times {\mathscr {P}}_c(\mathbb {R}^n)\rightarrow \mathbb {R}^n \) satisfies the differential equation
and the initial condition
Moreover, the constant \( C_{\Omega } \) which guaranties the estimate
depends only on the constants \( \rho \), \( C_{\rho } \), r, \( L_{r} \) defined below by (14), (15), (17), (18).
Remark 2.1
First, notice that \( \vartheta \) in (12) can be considered as a parameter. Thus, (12) can be thought of as a “linear transport equation with a nonlocal source term”. One can easily show (for example, by fixed-point arguments) that (12), (13) has a unique continuous solution w (see also [10, 11], where such a solution is constructed explicitly for the case \( W_{t}(x,\mu )\equiv W_{t}(x) \)). Moreover, w is sequentially continuous as a function of t, x, \( \vartheta \).
Before presenting the proof, note that our assumptions on V and W imply that there exists \( C>0 \) such that \( \left| V^{\lambda }_{t}(x,\mu )\right| \le C\left( 1+\vert x\vert \right) \), for all t, x, \( \mu \), \( \lambda \). This means that \( \left| X^{\lambda ,\vartheta }_{t}(x) \right| \le e^{Ct}(Ct+\vert x\vert ) \) for all t, x, \( \vartheta \), \( \lambda \). As a consequence, \( (t,x,\vartheta )\mapsto \left( t,X^{\vartheta ,\lambda }_{t}(x), X^{\vartheta ,\lambda }_{t\sharp }\vartheta \right) \) maps \( I\times \Omega \times \mathscr {P}(\Omega ) \) into \( I\times {\varvec{B}}_{\rho }\times \mathscr {P}({\varvec{B}}_{\rho }) \), where
Using the local boundedness of \( D_{x}V \), \( D_{\mu }V \) and W, we can find \( C_{\rho }>0 \) such that
Now, it follows from (12), (13) that
This implies that \( (t,x,\vartheta )\mapsto \left( t,(X^{\lambda ,\vartheta }_{t}+\lambda w^{\vartheta }_{t})(x), (X^{\lambda ,\vartheta }_{t}+\lambda w^{\vartheta }_{t})_{\sharp }\vartheta \right) \) maps \( I\times \Omega \times \mathscr {P}(\Omega ) \) into \( I\times {\varvec{B}}_{r}\times \mathscr {P}({\varvec{B}}_{r}) \), where
Finally, since \( V^{\lambda } \), \( D_{x}V^{\lambda } \) and \( D_{\mu }V^{\lambda } \) are locally Lipschitz, we choose \( L_{r}>0 \) such that
for all \( t\in I \) and \( \lambda \in [0,1] \).
Fix a compact set \( \Omega \subset {{\mathbb {R}}}^{n} \) and a measure \( \vartheta \in \mathscr {P}(\Omega ) \). From now on, we will omit the index \( \vartheta \) in \( X^{\lambda ,\vartheta }_{t} \) and \( w_{t}^{\vartheta } \). Consider the following set:
and equip it with the norm \( \Vert \varphi \Vert _{\sigma } = \max _{I\times \Omega } e^{-\sigma t}\vert \varphi _{t}(x)\vert \), \(\sigma >0\). Since \( \Vert \cdot \Vert _{\sigma } \) is equivalent to the standard \( \sup \) norm, \( \mathscr {X}(\Omega ) \) becomes a complete metric space.
Finally, for any \( \lambda \in [0,1] \) and \( \varphi \in \mathscr {X}(\Omega ) \), we define
One can easily check that \( \mathscr {F} \) maps \( [0,1]\times \mathscr {X}(\Omega ) \) to \( \mathscr {X}(\Omega ) \).
Lemma 2.5
The map \( \varphi \mapsto \mathscr {F}(\lambda ,\varphi ) \) is contractive in the \( \sigma \)-norm for all sufficiently large \( \sigma \). Moreover, the corresponding Lipschitz constant \( \kappa < 1 \) does not depend on \( \lambda \).
Proof
Let r be defined by (17). Given \( \varphi ,\psi \in \mathscr {X}(\Omega ) \), we have
for any \( t\in I \), \( x\in \Omega \), \( \lambda \in [0,1] \). Since \(\left\| \varphi _{\tau }-\psi _{\tau }\right\| _{L^{2}_{\vartheta }}\le \left\| \varphi _{\tau }-\psi _{\tau }\right\| _{\mathscr {C}^{0}(\Omega ;{{\mathbb {R}}}^{n})},\) we obtain:
Then, for all \( t\in I \),
which means that \( \mathscr {F}(\lambda ,\cdot ) \) is contractive for any \( \sigma >2L_{r} \). \(\square \)
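The proof above is an instance of the classical Bielecki-norm argument: the exponential weight turns the Picard-type operator into a contraction once \( \sigma \) dominates the relevant Lipschitz constant, without shrinking the time interval. A minimal numerical illustration (a scalar integral operator of ours; all constants below are arbitrary choices, not taken from the text) is:

```python
import numpy as np
from scipy.integrate import cumulative_trapezoid

L_f = 2.0                            # Lipschitz constant of the right-hand side f
f = lambda x: L_f * np.sin(x)        # a globally L_f-Lipschitz nonlinearity
T, N = 1.0, 2001
t = np.linspace(0.0, T, N)

def picard(phi):
    """Integral operator F(phi)(t) = x0 + int_0^t f(phi(s)) ds (trapezoidal rule)."""
    return 1.0 + cumulative_trapezoid(f(phi), t, initial=0.0)

def bielecki_norm(phi, sigma):
    """Weighted sup norm ||phi||_sigma = max_t e^{-sigma t} |phi(t)|."""
    return np.max(np.exp(-sigma * t) * np.abs(phi))

rng = np.random.default_rng(1)
phi, psi = rng.normal(size=N), rng.normal(size=N)
for sigma in (3.0, 5.0, 10.0):       # any sigma > L_f yields a contraction
    ratio = (bielecki_norm(picard(phi) - picard(psi), sigma)
             / bielecki_norm(phi - psi, sigma))
    assert ratio <= L_f / sigma + 1e-6   # theoretical contraction factor L_f / sigma
```

In the unweighted sup norm (\( \sigma = 0 \)) the same operator need not contract on all of [0, T]; the exponential weight restores contractivity on the whole interval, exactly as in Lemma 2.5.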
Proof of Theorem 2.1
Let \(\sigma \) be chosen so that \(\sigma >2L_r\) as in the proof of Lemma 2.5. By definition, \( X^{\lambda } \) is a fixed point of \( \mathscr {F}(\lambda ,\cdot ) \) for any \( \lambda \in [0,1] \). Therefore, by Theorem A.2.1 in [15],
where \( \kappa \doteq 2L_{r}/\sigma < 1 \).
It remains to estimate the right-hand side of (19). Since \( X = \mathscr {F}(0,X) \), we obtain
Lemma 2.3 demonstrates that the first integrand is equal to
and the second one can be rewritten as \( \lambda W_{\tau } \left( X_{\tau }(x), X_{\tau \sharp }\vartheta \right) + \mathscr {O}_\textrm{loc}(t,x,\vartheta ;\lambda ^{2})\). Now, the statement follows from (12). The fact that \( C_{\Omega } \) depends only on \( \rho \), \( C_{\rho } \), r, \( L_{r} \) is the consequence of (17), (18) and the second part of Lemma 2.3. \(\square \)
3 Increment Formula
Now, we turn to the analysis of the increment of the cost functional along an adequate class of control variations. The theory of Pontryagin’s maximum principle is commonly built around the class of needle-shaped variations. However, for the specified control-affine case, the latter can be replaced by a simpler class of weak control variations.
3.1 Problem Specification
In this section, in order to simplify the presentation, we assume that the driving vector field V is affine in the control variable u, i.e.,
and the running cost is identically zero, i.e., \(L \equiv 0\). Later, in Sect. 6 we will discuss how to deal with the general case. We begin by listing our basic assumptions.
Assumption \(({\varvec{A}}_{1})\):
-
1.
V takes the form (20), where all \( V^{j} \) with \( 0\le j\le m \) are of class \( \mathscr {C}^{1,1} \);
-
2.
\( U\subset {{\mathbb {R}}}^{m} \) is compact and convex;
-
3.
\( \ell :\mathscr {P}_{c}({{\mathbb {R}}}^{n})\rightarrow {{\mathbb {R}}}\) is of class \( \mathscr {C}^{1,1} \).
Assumption \( ({\varvec{A}}_{2}) \): all maps \( D_{x}V^{j} \) with \( 0\le j\le m \) are continuously differentiable in x and their derivatives are locally bounded.
Remark 3.1
We use Assumption \( ({\varvec{A}}_{2}) \) only once: to show that, for any fixed control function \( u\in \mathscr {U} \), the solution w of (12), (13) which corresponds to \( V_{t}(x,\mu )\doteq V_{t}(x,\mu ,u(t)) \) is \( \mathscr {C}^{1} \) in x. Indeed, \( t\mapsto w^{\vartheta }_{t}(x) \) satisfies the ODE: \(\displaystyle \frac{d}{dt}w_{t} = A(t,x)w_{t} + b(t,x),\) where both functions
are continuously differentiable. Hence, w is \( {\mathscr {C}}^1 \) in x, according to the standard ODE theory.
3.2 Increment Formula I
Further in this section, \( \vartheta \) is supposed to be fixed, so we will omit it when writing the arguments of X and w.
Let us fix a pair of control functions \( u,{\bar{u}}\in \mathscr {U} \), \(u \ne {\bar{u}}\). We call u a reference control and \( {\bar{u}} \) a target control. A weak variation of u towards \( {\bar{u}} \) is the convex combination
In view of (20), the variation (21) implies the following perturbation of the reference vector field \( V_{t}(x,\mu ) \doteq V_{t}\left( x,\mu ,u(t)\right) \):
Note that, by Assumption \( ({\varvec{A}}_{1}) \), there exists \( C>0 \) such that \( \left| V^{\lambda }_{t}(x,\mu ) \right| \le C(1+\vert x\vert ) \), for all t, x, \( \mu \), \( \lambda \), u, \( {\bar{u}} \). This means that \( \rho \) from (14) can be chosen independently of \( u, {\bar{u}}\in \mathscr {U} \). Again, by Assumption \( ({\varvec{A}}_{1}) \), we can find \( C_{\rho } \) which guarantees, for all \( u,{\bar{u}}\in \mathscr {U} \), the estimate (15), then construct r by (17) and find \( L_{r} \) such that (18) holds for all \( u,{\bar{u}}\in \mathscr {U} \). Now, Theorem 2.1 implies that
where w is a solution of (12), (13). Here, we think of \( \mathscr {U} \) as a compact topological space equipped with the weak-\( * \) topology \( \sigma (L^{\infty },L^{1}) \).
Since \( \mathscr {I}[u] = \ell (X_{T\sharp }\vartheta ) \) and \( \mathscr {I}[u^{\lambda }] = \ell (X^{\lambda }_{T\sharp }\vartheta ) \) and \( \vartheta \) is fixed, we can use Lemma 2.4 to get
Proposition 3.1
Under assumptions \( ({\varvec{A}}_{1}) \), \( ({\varvec{A}}_{2}) \) one has
where w is a solution of the linear problem (12), (13).
Here we write \( \mathscr {O} \) instead of \( \mathscr {O}_\textrm{loc}\) because \( \mathscr {U} \) is already compact.
Our next goal is to rewrite this formula in a “constructive” form, namely, in terms of a Hamiltonian system associated to our optimal control problem.
3.3 Hamiltonian System
The Hamiltonian system associated with Problem (P) (see (1)-(3)) is merely a continuity equation on the cotangent bundle of \( {{\mathbb {R}}}^n \), i.e., on the space \({{\mathbb {R}}}^{n}\times ({{\mathbb {R}}}^n)^* \simeq {{\mathbb {R}}}^{2n}\) composed of pairs (x, p), where x is the primal and p is the dual state variable. In our case, this equation takes the form
This equation is supplemented with the terminal condition
where \( \mu _{t} \) satisfies (2). The standard well-posedness result for nonlocal continuity equations (see, e.g., [46]) guarantees that (23), (25) has a unique solution \( \gamma _{t} \). Moreover, the projection of \( \gamma _{t} \) onto the x space coincides with \( \mu _{t} \):
3.4 Increment Formula II
Let us go back to (22). First, recalling that \(\mu _T\doteq X_{T\sharp }\vartheta \), we express the integral entering in its right-hand side as follows:
By Lemma 8.1.2 [1], the following version of the classical Newton-Leibniz formula holds for any function \(\psi \in \mathscr {C}^1(I\times {{\mathbb {R}}}^{2n})\):
Remark 3.1 allows us to take \( \psi _t(x,p) \doteq p \cdot w_t\left( X_{t}^{-1}(x)\right) \) in the above expression. Recall that \( X_{t}(x) = P_{0,t}(x) \), where P is the flow of the nonautonomous vector field \( v_{t}(x) = V_{t}\left( x,\mu _{t},u(t)\right) \); in particular, \( X_{t}^{-1} = P_{t,0} \), and we can use the standard rules of flow differentiation (Theorem 2.3.3 of [15]) to perform the calculations:
Then,
In view of (12), the right-hand side reduces to
Renaming the variables \((x, p) \leftrightarrow (y, q)\) in the latter term shows that the last two terms cancel out. Hence,
Finally, noticing that \( \psi _0 \doteq p \, w_0 \equiv 0 \) and
then using (27) and the definition of \(W_t\), we have
Proposition 3.2
Under assumptions \( ({\varvec{A}}_{1}) \), \( ({\varvec{A}}_{2}) \) it holds
where
and \(t \mapsto \gamma _t\) is a solution of the Hamiltonian system (23)–(25).
3.5 Pontryagin’s Maximum Principle
A consequence of the increment formula (29) is the following version of Pontryagin’s maximum principle.
Theorem 3.1
(PMP in terms of Hamiltonian system) Assume that \(({\varvec{A}}_{1,2})\) hold, and \( \vartheta \in {\mathscr {P}}_{c}(\mathbb {R}^{n}) \). Let \((\mu ,u)\) be an optimal pair for (P). Then u(t) satisfies, for a.e. \(t\in I\), the maximum condition
where \(\gamma \) is a unique solution of the Hamiltonian system (23)–(25) and \( H_{t} \) is defined by (30).
Proof
Since u is optimal, we have \( \mathscr {I}[u^{\lambda }] - \mathscr {I}[u] \ge 0 \) for any target control \( {\bar{u}} \). Now, the increment formula implies that
On the other hand,
Let \( \psi (t,\upsilon ) \doteq H_{t}(\gamma _{t},\upsilon ) \) and \( \alpha (t) \doteq \max _{\upsilon \in U}\psi (t,\upsilon ) \). It is easy to check that \( \psi \) is a Carathéodory map. Since \( \alpha (t)\in \psi (t,U) \) for a.e. \( t\in I \), we deduce from Filippov’s lemma [4, Theorem 8.2.10] that there exists \( {\tilde{u}}\in \mathscr {U} \) satisfying \( \alpha (t) = \psi (t,{\tilde{u}}(t)) \) for a.e. \( t\in I \). Hence, the inequality in (32) can be replaced by the equality, which completes the proof. \(\square \)
Remark 3.2
Pontryagin’s maximum principle displayed by Theorem 3.1 is essentially the same as in [10, 12]. However, in these papers, the driving vector field has a specific form: it can be represented as the sum of a nonlocal drift term and an external Lipschitz vector field \(u=u(t,x)\) playing the role of control action. In our case, the control is a measurable function of the time variable only, \(u=u(t)\), which may enter the nonlocal term itself, thus enabling us, e.g., to govern convolution kernels as in (4). Finally, note that Theorem 3.1 can be derived from the (most general) version of PMP recently obtained in [7], which relies on the so-called Lagrangian interpretation [24] of the mean-field control problem (P).
Remark 3.3
We conclude this section by stressing two obvious drawbacks of the presented form of the necessary optimality condition, which are critical for its numerical implementation.
-
1.
Equation (23) is defined on the space of dimension 2n, which makes its numerical solution computationally demanding even for \(n=2\).
-
2.
Even if \(\mu \) is absolutely continuous, \(\gamma \) is not. In other words, \(\gamma \) never takes the form \(\rho _t \, {\mathscr {L}}^{2n}\) with a density function \((t,x,p) \mapsto \rho _t(x,p)\). This is due to the fact that \(\gamma _T\) is supported on the graph of the map \(x \mapsto -D_{\mu }\ell (\mu _T)(x)\), which is always an \({\mathscr {L}}^{2n}\)-null set. This means that system (23) cannot be solved by standard numerical methods for hyperbolic PDEs, which can be used only when densities exist.
These issues motivate the development of a new version of Theorem 3.1, which is obtained by extracting the “adjoint system” from the Hamiltonian PDE (23).
4 Adjoint Equation
In this section, we shall see that the Hamiltonian system (23) can be decoupled into primal and dual parts, just as is customary in classical optimal control theory. This fact will allow us to rewrite the increment formula and Pontryagin’s maximum principle in an equivalent form suitable for numerics.
4.1 Derivation
After reflecting upon formula (29), one arrives at the idea of taking, as the adjoint trajectory, the family of signed vector (namely, row-vector) measures defined by
where \( \gamma _{t} \) is the solution of (23)–(25). Indeed, return to representation (27), (28) and specify the class of test functions \(\psi \) as follows:
In this case, the left-hand side of (27) vanishes, which implies
where \(\Xi \) is defined in (28). In terms of \(\nu \), the parts of the integral in the left-hand side of (34) can be represented as follows:
and, according to (26),
Substituting these expressions into (34), we obtain
The choice \( \varphi (x) = (0,\ldots ,\varphi ^{i}(x),\ldots 0)^{{T}} \), where only the i-th component of \( \varphi \) is nonzero, shows that this is merely the weak formulation of the following system of balance laws:
Here, for the sake of readability, we omit the lower index t of \(\nu _t\) and abbreviate
where \( m^{j}_{i} = m^{j}_{i}(t,x,y) \) are elements of the matrix \( D_{\mu }V_{t}\left( y,\mu _{t},u(t),x\right) \).
At the final time instant T, one has
which can be rewritten in terms of the Radon-Nikodym derivative as
Definition 4.1
We call the backward system (36), (37) of nonlocal linear PDEs the adjoint system associated to the optimal control problem (P).
4.2 Well-Posedness
We observe that there exists a solution of the adjoint system, namely, the one defined by (33). Let us show that this solution is unique. Basically, the adjoint system (36) is a system of linear balance laws with sources of the form
To proceed, recall basic properties [47] of the linear balance law
with a Carathéodory, locally Lipschitz, sublinear vector field \( f_{t} \) and an integrable source \( \varsigma _{t} \).
Definition 4.2
A curve \(\varsigma :I\rightarrow \mathscr {M}({{\mathbb {R}}}^n)\) is called integrable if for any Borel set \( A\subset {{\mathbb {R}}}^{n} \) the map \(t\mapsto \varsigma _{t}(A)\) is measurable and \( \int _{0}^{T}\Vert \varsigma _{t}\Vert _{TV}\,d t < +\infty \), where \( \Vert \cdot \Vert _{TV} \) denotes the total variation norm on \( \mathscr {M}({{\mathbb {R}}}^{n}) \).
For integrable curves we can define a notion of integral in the usual way: \(\displaystyle \left( \int _{0}^{t}\varsigma _{s}\,ds\right) (A) \doteq \int _{0}^{t}\varsigma _{s}(A)\,ds\), for all Borel sets \( A\subset {{\mathbb {R}}}^{n} \).
Definition 4.3
A curve \( \rho \in C\big (I;{\mathscr {M}}({{\mathbb {R}}}^{n})\big ) \) is called a solution of (39) if and only if, for any test function \( \varphi \in C_{c}^{\infty }({{\mathbb {R}}}^{n})\) and a.e. \( t\in I \), one has
Theorem 4.1
Under our assumptions, there exists a unique solution of (39) with the initial condition \(\rho _0=\xi \). Moreover, it can be expressed by
where P is the flow of \( f_{t} \).
The following lemma collects several well-known properties of the total variation norm (since the proofs are quite standard, we omit them for brevity).
Lemma 4.1
Let \( \rho \in \mathscr {M}({{\mathbb {R}}}^{n}) \) and \( \varsigma :I\rightarrow \mathscr {M}({{\mathbb {R}}}^{n}) \) be an integrable curve. Then,
-
1.
for any Borel measurable bijective map \( f:{{\mathbb {R}}}^{n}\rightarrow {{\mathbb {R}}}^{n} \),
$$\begin{aligned} \Vert f_{\sharp }\rho \Vert _{TV} = \Vert \rho \Vert _{TV}; \end{aligned}$$ -
2.
for all \( t\in I \),
$$\begin{aligned} \Big \Vert \int _{0}^{t}\varsigma _{s}\,ds \Big \Vert _{TV}\le \int _{0}^{t}\Vert \varsigma _{s}\Vert _{TV}\,ds; \end{aligned}$$ -
3.
if \( \textrm{spt}\rho \) is contained in a compact set \( \Omega \), then for any \( \varphi \in \mathscr {C}^{0}({{\mathbb {R}}}^{n}) \)
$$\begin{aligned} \Vert \varphi \rho \Vert _{TV} \le \Vert \varphi \Vert _{\mathscr {C}^{0}(\Omega )}\,\Vert \rho \Vert _{TV}; \end{aligned}$$ -
4.
if \( \textrm{spt}\rho \) is contained in a compact set \( \Omega \), then for any \( K \in \mathscr {C}^{0}({{\mathbb {R}}}^{2n}) \)
$$\begin{aligned} \left| \int K(x,y)\,d\rho (y)\right| \le \Vert K\Vert _{{\mathscr {C}^{0}}{(\Omega ^{2})}}\,\Vert \rho \Vert _{TV}\quad \forall x\in \Omega . \end{aligned}$$
The well-posedness of the adjoint system is established by the following result, where \(\mathscr {M}_{c}({{\mathbb {R}}}^{n})\) denotes the subset of \(\mathscr {M}({{\mathbb {R}}}^{n})\) composed of signed measures with compact support.
Proposition 4.1
Under assumptions \(({\varvec{A}}_{1,2})\), the adjoint system (36) with the terminal condition \( \nu _{T} = \xi \), \( \xi \in \left[ \mathscr {M}_{c}({{\mathbb {R}}}^{n})\right] ^n\), has a unique solution.
Proof
Take two terminal measures \( \xi , \xi ' \in \left[ \mathscr {M}_{c}({{\mathbb {R}}}^{n})\right] ^n \) and denote by \( \nu _{t} \) and \( \nu _{t}' \) the corresponding (potentially non-unique) trajectories of (36). Then, from Theorem 4.1 and Lemma 4.1, it follows that
where \( \varsigma \) and \( \varsigma ' \) are the corresponding sources defined by (38). Since
we obtain, again by Lemma 4.1,
where \( \Omega \) is a compact set containing the supports of the measures \( \nu _{t} \), \( \nu _{t}' \), \( \mu _{t} \), \( t\in I \) (one can show that there is such a set by reasoning as in [47]), \( C^{1}_{\Omega } \) is an upper bound of \( \sum _{i,j}\vert \partial _{x_{i}}v^{j}\vert \) on \( I\times \Omega \) and \( C^{2}_{\Omega } \) is an upper bound of \( \sum _{i,j}\vert m^{j}_{i}\vert \) on \( I\times \Omega \times \Omega \).
By letting \( r(t) = \displaystyle \sum _{i=1}^{n}\Vert ( \nu _{i}-\nu _{i}')_{t}\Vert _{TV} \), we obtain
Now, Grönwall’s lemma gives the uniqueness. \(\square \)
4.3 Increment Formula III
The increment formula (29) and Pontryagin’s maximum principle (Theorem 3.1) are readily reformulated in terms of a solution to the adjoint system.
Theorem 4.2
(Increment formula) Assume that \(({\varvec{A}}_{1,2})\) hold, and \( \vartheta \in {\mathscr {P}}_{c}(\mathbb {R}^{n}) \). Let \( u,{\bar{u}}\in \mathscr {U} \) and \( u^{\lambda } = u+\lambda ({\bar{u}} - u) \), \( \lambda \in [0,1] \), be the weak variation of u. Then,
where
Theorem 4.3
(PMP in terms of the adjoint system) Assume that \(({\varvec{A}}_{1,2})\) hold, and \( \vartheta \in {\mathscr {P}}_{c}(\mathbb {R}^{n}) \). Let \((\mu ,u)\) be an optimal pair for (P). Then u satisfies, for a.e. \(t\in I\), the maximum condition
where \(\nu \) is a unique solution of the adjoint system (36), (37) and \( {{\textbf {H}}}_{t} \) is defined by (42).
Remark 4.1
Since the adjoint system (36), (37) has a unique solution \( \nu _{t} \), it must coincide with the one given by (33). In particular, \( \nu _{t} \) acts on test functions \( \varphi \in \mathscr {C}^{1}({{\mathbb {R}}}^{n};{{\mathbb {R}}}^{n}) \) by the rule
where \( \gamma ^{x}_{t} \) is the disintegration of \( \gamma _{t} \) with respect to \( \mu _{t} \) (see [1, Theorem 5.3.1]). If the initial measure \( \vartheta \) is absolutely continuous with respect to the Lebesgue measure \( \mathscr {L}^{n} \), then so are all \( \mu _{t} \), \( t\in I \) (thanks to the representation \( \mu _{t} = X_{t\sharp }\vartheta \)). Now, (44) implies that every \( \nu _{t} \), \( t\in I \), must be absolutely continuous as well.
The discussed fact has important consequences, which answer the challenges outlined by Remark 3.3:
-
1.
In contrast to the Hamiltonian continuity equation (23) as a whole, the adjoint system is solvable numerically.
-
2.
While handling the adjoint equation, we deal with a system of \(n+1\) first-order hyperbolic PDEs, each one “living” on \({{\mathbb {R}}}^n\). Solving this system is less computationally expensive than treating a single equation on \({{\mathbb {R}}}^{2n}\).
4.4 Linear Case
Now, we establish a connection between Theorem 4.3 and the well-known version of PMP for \(\mu \)-independent vector fields \(V_{t}(x,\mu ,u)=V_{t}(x,u)\) (see, e.g., [13, 45]). For such fields, the role of the adjoint state is played by a solution \((t, x)\mapsto \psi _t(x)\) of a single non-conservative transport equation
It is reasonable to expect that, under sufficient regularity, the adjoint system (36) boils down to (45). This ansatz is confirmed by the following proposition.
Proposition 4.2
Assume that \((\mathbf {A_{1,2}})\) hold, \( x\mapsto \frac{\delta \ell }{\delta \mu }(\mu ,x) \) is of class \( \mathscr {C}^{2} \), \( \vartheta \in {\mathscr {P}}_{c}(\mathbb {R}^{n}) \) and \( V_{t}(x,\mu ,u) = V_{t}(x,u) \). Let \(u \in {\mathscr {U}}\), and \( \mu _{t} \) and \( \nu _{t} \) be the corresponding solutions of (2) and (36), (37), respectively. Then, for a.e. \(t\in I\),
where \((t, x) \mapsto \psi _t(x)\) is a solution of the transport equation (45) with the terminal condition
Proof
It is clear that the representation (46), (47) does agree with the terminal condition (37), since
Due to the uniqueness of a solution to (36), we only need to formally check that \(\nu _i \doteq \partial _{x_{i}} \psi \, \mu \), \( 1 \le i \le n \), meets the identity (35) with the vector field \(V_t(x, \mu _{t}, u(t)) = v_t(x)\).
A solution of (45), (47) can be written explicitly as \( \psi _{t}(x) = -\frac{\delta \ell }{\delta \mu }(\mu _{T}, P_{t,T}(x)) \), where P is the flow of v. This formula, together with our assumptions, implies that \( \psi \) admits the partial derivatives \( \partial _{x_i}\psi \), \( \partial _{x_i}\partial _{x_j}\psi \) and \( \partial _{x_i}\partial _{t}\psi \) for all \( 1\le i,j \le n \). These derivatives are at least measurable in t, continuous in x and locally bounded. Take the standard mollification kernel \( \eta _{\varepsilon }:\mathbb {R}\rightarrow \mathbb {R} \) and consider the convolution
It is easy to see that \( \partial _{x_i}\partial _t\psi ^{\varepsilon } = \partial _t\partial _{x_i}\psi ^{\varepsilon } \) and \( \partial _{\alpha }\psi ^{\varepsilon } \rightarrow \partial _{\alpha }\psi \) as \( \varepsilon \rightarrow 0 \) in the sense that
where \( \partial _{\alpha } \) denotes any of the derivatives \( \partial _{x_i} \), \( \partial _{x_i}\partial _{x_j} \), \( \partial _{x_i}\partial _t \). Let \( \nu ^{\varepsilon }_i \doteq \partial _{x_i}\psi ^{\varepsilon }\mu \). Then, we can formally write
More precisely, for any test function \( \varphi \in C^\infty _c(I \times \mathbb {R}^n;\mathbb {R}) \), we have
Therefore,
It remains to use (48) for passing to the limit as \( \varepsilon \rightarrow 0 \). The first term in the right-hand side vanishes thanks to (45), so we get
in the sense of distributions. Since \( D_{\mu }V = 0 \), we conclude that \( \nu \) does satisfy (36). \(\square \)
5 Descent Method
Now, we are able to construct an algorithm for the numerical solution of Problem (P) with the vector field as in (20). Note that similar algorithms were proposed earlier for solving classical [3] and stochastic [2] optimal control problems.
5.1 Algorithm
Let u be a reference control, \( \mu _{t} \) and \( \nu _{t} \) be the corresponding trajectory and co-trajectory. Construct the target control as follows:
The increment formula (41) shows that \( {\bar{u}} - u \) is a descent direction. Let us introduce the functional
It is clear that \(\mathscr {E}[u] \ge 0\) and \(\mathscr {E}[u] = 0\) implies that the pair \((\mu [u], u)\) satisfies the PMP. In other words, \( \mathscr {E} \) measures the “non-extremality” of u.
Now, we can use the descent direction \(\bar{u}-u\) for developing the following version of the classical backtracking algorithm.
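In structure, the resulting scheme is a conditional-gradient (Frank–Wolfe) method with Armijo backtracking: the target control maximizes the linearized cost, and the step \(\lambda \) is shrunk by the factor \(\theta \) until the sufficient-decrease test with parameter c holds. The following finite-dimensional toy analogue (ours, for illustration only: U is the box \([-1,1]^m\), and the roles of \(\mathscr {I}\) and \(\mathscr {E}\) are played by the callables `cost` and `gap`) mimics the loop:

```python
import numpy as np

def descent_with_backtracking(grad, cost, u0, n_iter=2000, c=0.1, theta=0.5, tol=1e-9):
    """Conditional-gradient descent over the box U = [-1, 1]^m with
    Armijo backtracking; `gap` plays the role of the gauge E[u]."""
    u = u0.copy()
    gap_min = np.inf
    for _ in range(n_iter):
        g = grad(u)
        u_bar = -np.sign(g)                  # target control: argmax of <-g, v> over U
        gap = float(np.dot(u_bar - u, -g))   # non-extremality gauge, always >= 0
        gap_min = min(gap_min, gap)
        if gap <= tol:
            break
        lam = 1.0                            # backtracking: shrink lam by theta until
        while cost(u + lam * (u_bar - u)) > cost(u) - c * lam * gap:
            lam *= theta                     # sufficient decrease is achieved
        u = u + lam * (u_bar - u)
    return u, gap_min

target = np.array([0.3, -0.7, 0.1])          # interior minimizer of the toy cost
cost = lambda u: 0.5 * np.sum((u - target) ** 2)
grad = lambda u: u - target
u_opt, gap_min = descent_with_backtracking(grad, cost, np.zeros(3))
assert gap_min <= 1e-2 and cost(u_opt) <= gap_min + 1e-12
```

The two features visible here, monotone decrease of the cost and vanishing of the gauge, are exactly the two claims of Theorem 5.1.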
The convergence analysis of the algorithm is provided by the following theorem.
Theorem 5.1
For any initial control \(u^0\in \mathscr {U}\), the sequence \(\{u^k\}\) generated by the algorithm
-
1.
is monotone in the sense that
$$\begin{aligned} \mathscr {E}[u^k]\ne 0 \ \Rightarrow \ \mathscr {I}[u^{k+1}] < \mathscr {I}[u^{k}], \quad \text{ and } \quad \mathscr {E}[u^k]= 0 \ \Rightarrow \ \mathscr {I}[u^{k+1}] = \mathscr {I}[u^{k}] \end{aligned}$$ -
2.
converges in the sense that \( \mathscr {E}[u^k]\rightarrow 0\) as \(k\rightarrow \infty . \)
Proof
Let \( e^{k}:= \mathscr {E}[u^{k}] \) and assume that \( e^{k}\not \rightarrow 0 \). In this case, there exists \( \varepsilon >0 \) such that \( e^{k}\ge \varepsilon \) for all indices k from some infinite set \( K\subset {\mathbb {N}} \). By the choice of \( \lambda ^k \), we obtain, for all \( k\in K \),
This shows that \( (\lambda ^{k})_{k\in K} \rightarrow 0 \), because otherwise \( \mathscr {I}[u^{k}] \rightarrow -\infty \) up to a subsequence. For all sufficiently large \( k\in K \) we have \( \lambda ^{k}\le \theta \). Hence \( \lambda := \lambda ^{k}/\theta \) is an admissible step. On the other hand, by Step 4 of the algorithm, for such \( \lambda \) we have: \( \mathscr {I}\left[ u^{k}+\lambda ({\bar{u}}^{k}-u^{k})\right] -\mathscr {I}[u^{k}]\ge c\lambda \langle u^{k}-{\bar{u}}^{k}, d^{k} \rangle _{L^2}, \) that is,
where we use the increment formula (41) to get the last equality. Hence \( \frac{(1-c)\lambda ^{k}}{\theta }e^{k}\le C\left( \frac{\lambda ^{k}}{\theta }\right) ^{2},\) for some \( C>0 \), or equivalently,
Since the right-hand side tends to zero, we come to a contradiction. \(\square \)
5.2 Implementation
In the algorithm described in Sect. 5.1, the primal and adjoint equations are solved numerically. If the original problem is periodic in space and the driving vector field has a convolutional structure (4), then, for the numerical integration, one gives preference to so-called spectral methods [14].
Assume that the initial measure \( \vartheta \in \mathscr {P}_{c}(\mathbb {R}) \) is absolutely continuous. This implies that the corresponding trajectories \( \mu _t \) and \( \nu _t \) are absolutely continuous as well, and all ingredients of the algorithm can be recast in terms of their densities \(\rho _t\) and \(\zeta _t\), respectively. Moreover, since \( \vartheta \) is compactly supported, there exists a segment [a, b] such that \( \textrm{spt}\mu _t \subset [a,b]\), and \(\textrm{spt}\nu _t\subset [a,b]\) for all \(t\in I. \) This implies that \( \mu _{t} \) and \( \nu _t \) can be considered as measures on the circle \( \mathbb {S}^{1} \) (i.e., the measures can be viewed as \(2\pi \)-periodic in x).
The primal and the adjoint equations can be written in the form:
where \( f,g,h,K,M:I \times \mathbb {S}^{1}\rightarrow \mathbb {R} \) are given functions.
Suppose that all the densities are of the class \( L^{2}(\mathbb {S}^{1}) \). Upon substitution of the truncated Fourier series
in (51), the partial differential equation transforms into the system of ODEs
where the hat over nonlinear terms denotes their Fourier coefficients, and
stands for the Fourier coefficients of \(\rho (x,t)\).
The system (53) can be integrated by any appropriate numerical method (e.g., a Runge–Kutta method). Transformations between the physical and spectral (Fourier) spaces are computed using the Fast Fourier Transform (FFT). Products of fields are typically computed in the physical space, while derivatives and convolutions are evaluated in the Fourier space.
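This multiply-in-physical-space, differentiate-in-Fourier-space pattern can be sketched in a few lines of Python. The snippet below is a minimal illustration (using numpy.fft in place of FFTW; the function and variable names are ours, not the paper's) of the right-hand side of a continuity equation \( \partial _t \rho + \partial _x (v\rho ) = 0 \) on a \(2\pi \)-periodic grid:

```python
import numpy as np

def continuity_rhs(rho_hat, v_phys):
    """Pseudospectral evaluation of -d/dx (v * rho).

    The product v*rho is formed in the physical space, and the
    derivative of the flux is taken in the Fourier space by
    multiplication with i*k.
    """
    n = rho_hat.size
    k = np.fft.fftfreq(n, d=1.0 / n)          # integer wavenumbers 0,1,...,-1
    rho_phys = np.fft.ifft(rho_hat)           # spectral -> physical
    flux_hat = np.fft.fft(v_phys * rho_phys)  # physical product -> spectral
    return -1j * k * flux_hat                 # spectral derivative of the flux
```

As a sanity check, with a constant velocity \( v\equiv 1 \) the right-hand side reduces to the spectral derivative \( -\partial _x \rho \).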
5.3 Numerical Experiment
As an example, we consider the paradigmatic model of Kuramoto [36], which describes an assembly of pairwise interacting homotypic oscillators. Specifically, we consider an optimization problem in the spirit of [48], in which the goal is to synchronize a continuous oscillatory network by a given time T.
The prototypic ODE representing the dynamics of N oscillators takes the form
Here, \(x_i(t) \in {\mathbb {S}}^1\) and \(\omega _i \in \mathbb {R}\) are the phase and the natural frequency of the ith oscillator, respectively, and \(\alpha \) is the phase shift. The control inputs are \(t \mapsto u(t) \doteq (u_1(t), u_2(t))\), where \(u_1\) affects the angular velocity and \(u_2\) modulates the connectivity of the network.
As in [48], we assume that all oscillators have a common natural frequency \(\omega \), which, in this case, can be specified as \(\omega =0\). As the number of oscillators \(N \rightarrow \infty \), the limiting mean-field version of (55) is described by the curve \(t \mapsto \mu _t \in \mathscr {P}({\mathbb {S}}^1)\) satisfying the nonlocal continuity equation driven by the vector field
Consider the problem of steering the ensemble to a given phase \(x_{0} + 2\pi n\), \(n \in {\mathbb {Z}}\):
To specify the adjoint equation, we compute (see Example 1 in Sect. 2.3):
and \( D_{\mu }\ell (\mu ;x) = \sin (x-x_0) \). Then, (36) becomes
where \( v_{t}(x) = V_{t}(x,\mu _{t},u(t)) \), \( K_1(x) = \cos (-x+\alpha ) \), \( K_2(x) = \cos (x+\alpha ) \).
Let us associate \( \mu \) and \( \nu \) with their densities represented as in (52) in terms of the Fourier coefficients \( \widehat{a}_n \) and \( \widehat{b}_n \), respectively. To represent the PDE (2) in the Fourier space (i.e. in the form (53)), notice that the only non-vanishing Fourier coefficients of (56) are \(\widehat{V}_0=u_1\) and \(\widehat{V}_1=i \pi u_2\widehat{\mu }_1\exp (i\alpha )\), while \(\widehat{V}_{-1}\) is the complex conjugate of the latter. This form of \(V(x,\mu ,u)\) enables us to compute the r.h.s. of (53) exclusively in the Fourier space with no recourse to the physical space, in contrast with the application of pseudospectral methods to a system with a generic \(V(x,\mu ,u)\).
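For concreteness, the assembly of these coefficients can be sketched as follows. The helper below is a hypothetical illustration using only the identities stated above: \(\widehat{V}_0=u_1\), \(\widehat{V}_1=i \pi u_2\widehat{\mu }_1 e^{i\alpha }\), and \(\widehat{V}_{-1}=\overline{\widehat{V}_1}\).

```python
import numpy as np

def kuramoto_v_hat(u1, u2, mu_hat_1, alpha, n_modes):
    """Fourier coefficients of the Kuramoto vector field (56).

    Only the modes 0 and +/-1 are non-vanishing, so the r.h.s. of (53)
    can be assembled entirely in the Fourier space.
    """
    v_hat = np.zeros(n_modes, dtype=complex)
    v_hat[0] = u1                                               # mean drift u_1
    v_hat[1] = 1j * np.pi * u2 * mu_hat_1 * np.exp(1j * alpha)  # coupling mode
    v_hat[-1] = np.conj(v_hat[1])                               # reality of V
    return v_hat
```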
In the Fourier space, the nonlocal continuity equation reads
while the adjoint equation (36) and the terminal condition (37) become:
In order to compute the transformation \(\rho (x_j,t)\mapsto \widehat{\rho }_k(t)\) and its inverse, we employ the forward and backward FFTs implemented in the library FFTW [32].
The problem is considered under the control constraint \( u_1^2+u_2^2\le 2 \); for the kth iteration, the corresponding target control \(({\bar{u}}_1^k, {\bar{u}}_2^k)\) provided by (50) takes the form: \(\frac{\sqrt{2}}{\sqrt{(d_1^k)^2 + (d_2^k)^2}}(d_1^k, d_2^k)\), and the control-update rule reads: \(u^{k+1} = u^k+\lambda ^k({\bar{u}}^k - u^k)\).
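The projection onto the control ball and the convex control update can be sketched as follows (the names are illustrative; the radius \( \sqrt{2} \) matches the constraint \( u_1^2+u_2^2\le 2 \)):

```python
import numpy as np

def target_control(d1, d2, r_sq=2.0):
    """Pointwise maximizer of <u, d> over the disc u1^2 + u2^2 <= r_sq,
    mirroring the form of (50); the output is zero wherever d vanishes."""
    norm = np.hypot(d1, d2)
    scale = np.sqrt(r_sq) / np.where(norm > 0, norm, 1.0)
    return scale * d1, scale * d2

def control_update(u, u_bar, lam):
    """Control-update rule u^{k+1} = u^k + lam * (u_bar^k - u^k)."""
    return u + lam * (u_bar - u)
```

By construction, the target control saturates the constraint: its squared norm equals \( r\_sq \) wherever \( d\ne 0 \).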
Some computational results are presented in Fig. 1.
Remark 5.1
Let us stress several differences between the problem that we solve here and the one addressed in [48]. First of all, in [48] the authors consider so-called mean-field type controls, i.e., they assume that u depends not only on t but also on x. It is clear that this choice greatly improves the controllability of the system. Moreover, the system in [48] is subject to common noise, which also contributes to the controllability. Indeed, let the initial density be given by \( \rho _{0} = 1 + \sin kx \), \( k \ge 2 \). Then, the convolution in (56) vanishes, which means that our control options reduce to shifting the wave \( \rho _0 \) back and forth. On the other hand, in the presence of common noise, the Fourier coefficient corresponding to \( \sin x \) immediately becomes nonzero and, as a result, the system is self-synchronizing for any positive \( u_2 \). A similar effect can be observed if we try to solve (2), (56) with a discretization scheme that involves a numerical diffusion (such as the classical Lax–Friedrichs method).
6 General Case
In this section, we discuss a natural extension of the obtained results to the case of a nonlinear dependence on the control and the general cost functional (1).
6.1 Nonlinear Dependence on Control
To handle the case of nonlinear dependence \(u \mapsto V_t(x, \mu , u)\), we shall resort to the standard technique based on the extension of the original class \({\mathscr {U}}\) of control signals to a broader space \( \widetilde{{\mathscr {U}}} \doteq \left\{ \eta \in {\mathscr {P}}(I\times U): \, \left[ (t, u) \mapsto t\right] _{\sharp }\eta = \frac{1}{T}{\mathscr {L}}^1\right\} \) of Young measures [22]. It is well-known that such an extension provides the linearization of the vector field w.r.t. the driving signal and, in a certain sense, reduces the general model to the above control-affine case. Recall that i) \({\mathscr {U}}\) is dense in \(\widetilde{{\mathscr {U}}}\) due to the embedding \(u \mapsto \eta \), \(\eta _t=\delta _{u(t)}\), where \(t \mapsto \eta _t \in {\mathscr {P}}(U)\) is the weakly measurable family of probability measures obtained by disintegration of \(\eta \) w.r.t. \(\frac{1}{T}{\mathscr {L}}^1\); and ii) \(\widetilde{{\mathscr {U}}}\) is compact in the topology of weak convergence of probability measures (and therefore, in any metric \(W_p\), \(p\ge 1\)) as soon as U is compact, thanks to the classical Prohorov theorem.
This passage, which is routine in mathematical control theory, leads to the following relaxation of the original dynamics (2):
Accordingly, the original cost is reformulated as \( \widetilde{{\mathscr {I}}} = \ell ({\tilde{\mu }}_T), \) where \({\tilde{\mu }}_t\) is a solution of (60).
Observing that the dependence \(\omega \mapsto {\widetilde{V}}_{t}\left( x,\mu ,\omega \right) \) is linear, we invite the reader to consider the weak variation \( \eta ^\lambda = \eta + \lambda ({\bar{\eta }} - \eta ) \) and the respective cost increment \(\widetilde{{\mathscr {I}}}[\eta ^\lambda ] - \widetilde{{\mathscr {I}}}[\eta ]\) in place of (21) and (22), and reproduce the arguments of Sect. 3 and 4. By doing this, one ensures that the resulting increment formula and necessary condition for the optimality of a Young measure \(\eta \) keep the form of Theorems 4.2 and 4.3, where V and \({\textbf{H}}_{t}\) are replaced by \({\widetilde{V}}\) and \(\widetilde{{\textbf{H}}}_{t}\), respectively, \( \displaystyle \widetilde{{\textbf{H}}}_{t}(\mu , \nu , \omega ) \doteq \int _{U}{\textbf{H}}_{t}(\mu , \nu , u) \,d\omega (u), \) and the maximum condition (43) becomes
where \({\tilde{\nu }}_t\) is the backward adjoint solution associated with \(\eta \). Now, if the addressed control-nonlinear problem (P) does admit a usual minimizer \(u\in \mathscr {U}\), then the PMP for u is recovered by taking \(\eta \) such that \(\eta _t =\delta _{u(t)}\).
6.2 Running Cost
If the map \(u \mapsto L_t(x, \mu , u)\) is affine, one easily adapts PMP by reformulating the dynamics (24) of the Hamiltonian PDE and the Hamiltonian (42) as
and \(\displaystyle {\textbf{H}}_{t} \doteq \int V_{t}\cdot \,d\nu -L_{t}.\) Further details can be found, e.g., in [10]. The general u-nonlinear case is handled by the relaxation technique exhibited in Sect. 6.1.
References
Ambrosio, L., Gigli, N., Savaré, G.: Gradient Flows in Metric Spaces and in the Space of Probability Measures. Lectures in Mathematics ETH Zürich. Birkhäuser, Boston (2005)
Annunziato, M., Borzì, A.: A Fokker-Planck control framework for multidimensional stochastic processes. J. Comput. Appl. Math. 237(1), 487–507 (2013)
Arguchintsev, A.V., Dykhta, V.A., Srochko, V.A.: Optimal control: nonlocal conditions, computational methods, and the variational principle of maximum. Russ. Math. 53(1), 1–35 (2009)
Aubin, J.P., Frankowska, H.: Set-Valued Analysis. Modern Birkhauser Classics. Birkhäuser, Boston (2009)
Averboukh, Y.: Krasovskii-Subbotin approach to mean field type differential games. Dyn. Games Appl. 9, 573–593 (2018)
Averboukh, Y.: Viability theorem for deterministic mean field type control systems. Set-Valued Var. Anal. 26(4), 993–1008 (2018)
Averboukh, Y., Khlopin, D.: Pontryagin maximum principle for the deterministic mean field type optimal control problem via the Lagrangian approach (2022). arXiv:2207.01892
Bongini, M., Fornasier, M., Rossi, F., Solombrino, F.: Mean-field Pontryagin maximum principle. J. Optim. Theory Appl. 175(1), 1–38 (2017)
Bonnet, B.: A Pontryagin maximum principle in Wasserstein spaces for constrained optimal control problems. ESAIM 25, 52 (2019)
Bonnet, B., Frankowska, H.: Necessary optimality conditions for optimal control problems in Wasserstein spaces. Appl. Math. Optim. 84(S2), 1281–1330 (2021)
Bonnet, B., Rossi, F.: The Pontryagin maximum principle in the Wasserstein space. Calc. Var. Partial. Differ. Equ. 58(1), 11 (2019)
Bonnet, B., Rossi, F.: Intrinsic Lipschitz regularity of mean-field optimal controls. SIAM J. Control. Optim. 59(3), 2011–2046 (2021)
Bonnet, B., Cipriani, C., Fornasier, M., Huang, H.: A measure theoretical approach to the mean-field maximum principle for training neurodes. Nonlinear Anal. 227, 113161 (2023)
Boyd, J.P.: Chebyshev and Fourier Spectral Methods, 2nd edn. Dover Publications Inc., New York (2001)
Bressan, A., Piccoli, B.: Introduction to the Mathematical Theory of Control. AIMS Series on Applied Mathematics, vol. 2. American Institute of Mathematical Sciences, Springfield (2007)
Burger, M., Pinnau, R., Totzeck, C., Tse, O.: Mean-field optimal control and optimality conditions in the space of probability measures. SIAM J. Control. Optim. 59(2), 977–1006 (2021)
Cardaliaguet, P.: Analysis in the space of measures (2019). https://dottorato.math.unipd.it/sites/default/files/Pierre_Cardaliaguet.pdf
Cardaliaguet, P., Delarue, F., Lasry, J.-M., Lions, P.-L.: The Master Equation and the Convergence Problem in Mean Field Games. Ann. Math. Stud., vol. 201. Princeton University Press, Princeton (2019)
Carmona, R., Delarue, F.: Forward-backward stochastic differential equations and controlled McKean-Vlasov dynamics. Ann. Probab. 43(5), 2647–2700 (2015)
Carrillo, J.A., Fornasier, M., Toscani, G., Vecil, F.: Particle, kinetic, and hydrodynamic models of swarming. In: Mathematical Modeling of Collective Behavior in Socio-economic and Life Sciences, pp. 297–336. Birkhäuser, Boston (2010)
Carrillo, J.A., Choi, Y.-P., Hauray, M.: The Derivation of Swarming Models: Mean-Field Limit and Wasserstein Distances, pp. 1–46. Springer, Vienna (2014)
Castaing, C., de Fitte, P., Valadier, M.: Young Measures on Topological Spaces: With Applications in Control Theory and Probability Theory. Mathematics and Its Applications. Springer, Dordrecht (2004)
Cavagnari, G., Marigonda, A., Nguyen, K.T., Priuli, F.S.: Generalized control systems in the space of probability measures. Set-Valued Var. Anal. 26(3), 663–691 (2018)
Cavagnari, G., Lisini, S., Orrieri, C., Savaré, G.: Lagrangian, Eulerian and Kantorovich formulations of multi-agent optimal control problems: Equivalence and gamma-convergence. J. Differ. Equ. 322, 268–364 (2022)
Colombo, R.M., Herty, M., Mercier, M.: Control of the continuity equation with a non local flow. ESAIM 17(2), 353–379 (2011)
Cristiani, E., Frasca, P., Piccoli, B.: Effects of anisotropic interactions on the structure of animal groups. J. Math. Biol. 62(4), 569–588 (2011)
Cucker, F., Smale, S.: Emergent behavior in flocks. IEEE Trans. Autom. Control 52(5), 852–862 (2007)
Dobrushin, R.L.: Vlasov equations. Funct. Anal. Appl. 13(2), 115–123 (1979)
Fornasier, M., Solombrino, F.: Mean-field optimal control. ESAIM 20(4), 1123–1152 (2014)
Fornasier, M., Piccoli, B., Rossi, F.: Mean-field sparse optimal control. Philos. Trans. R. Soc. A 372, 20130400 (2014)
Fornasier, M., Lisini, S., Orrieri, C., Savaré, G.: Mean-field optimal control as Gamma-limit of finite agent controls. Eur. J. Appl. Math. 30(6), 1153–1186 (2019)
Frigo, M., Johnson, S.G.: The design and implementation of FFTW3. Proc. IEEE 93(2), 216–231 (2005)
Gigli, N.: On the geometry of the space of probability measures endowed with the quadratic optimal transport distance. Ph.D. thesis (2008)
Jabir, J.-F., Siska, D., Szpruch, L.: Mean-field neural odes via relaxed optimal control (2019). arXiv:1912.05475
Ko, D., Zuazua, E.: Model predictive control with random batch methods for a guiding problem. Math. Models Methods Appl. Sci. 31(08), 1569–1592 (2021)
Kuramoto, Y.: Chemical Oscillations, Waves, and Turbulence. Dover Books on Chemistry. Dover Publications Inc., Mineola (2003)
Laing, C.R.: The dynamics of networks of identical theta neurons. J. Math. Neurosci. 8(1), 4 (2018)
Marigonda, A., Quincampoix, M.: Mayer control problem with probabilistic uncertainty on initial positions. J. Differ. Equ. 264(5), 3212–3252 (2018)
Mogilner, A., Edelstein-Keshet, L.: A non-local model for a swarm. J. Math. Biol. 38(6), 534–570 (1999)
Pham, H., Warin, X.: Mean-field neural networks: learning mappings on Wasserstein space (2022). arXiv:2210.15179
Piccoli, B., Rossi, F.: Transport equation with nonlocal velocity in Wasserstein spaces: convergence of numerical schemes. Acta Appl. Math. 124(1), 73–105 (2013)
Piccoli, B., Rossi, F.: Measure-theoretic models for crowd dynamics. In: Modeling and Simulation in Science. Engineering and Technology, pp. 137–165. Springer, Basel (2018)
Pogodaev, N.: Numerical algorithm for optimal control of continuity equations. CEUR Workshop Proc. 1987, 467–474 (2017)
Pogodaev, N.: Optimal control of continuity equations. NoDEA Nonlinear Differ. Equ. Appl. 23(2), 21–24 (2016)
Pogodaev, N.: Program strategies for a dynamic game in the space of measures. Optim. Lett. 13(8), 1913–1925 (2019)
Pogodaev, N., Staritsyn, M.: Impulsive control of nonlocal transport equations. J. Differ. Equ. 269(4), 3585–3623 (2020)
Pogodaev, N.I., Staritsyn, M.V.: Nonlocal balance equations with parameters in the space of signed measures. Sbornik. Math. 213(1), 63–87 (2022)
Sinigaglia, C., Braghin, F., Berman, S.: Optimal control of velocity and nonlocal interactions in the mean-field Kuramoto model. In: 2022 American Control Conference (ACC), pp. 290–295 (2022)
Staritsyn, M., Pogodaev, N., Chertovskih, R., Pereira, F.L.: Feedback maximum principle for ensemble control of local continuity equations: an application to supervised machine learning. IEEE Control Syst. Lett. 6, 1046–1051 (2022)
Weinan, E., Han, J., Li, Q.: A mean-field optimal control formulation of deep learning. Res. Math. Sci. 6(1), 10 (2018)
Zuazua, E.: Averaged control. Automatica 50(12), 3077–3087 (2014)
Acknowledgements
We are grateful to the anonymous referees for their valuable comments enabling us to significantly improve the paper.
Funding
Open access funding provided by FCT|FCCN (b-on). RC and MS acknowledge the financial support of the Foundation for Science and Technology (FCT/MCTES) in the framework of the Associated Laboratory – Advanced Production and Intelligent Systems (AL ARISE, Ref. LA/P/0112/2020), the R&D Unit SYSTEC (Base UIDB/00147/2020 and Programmatic UIDP/00147/2020 funds), and projects RELIABLE – Advances in control design methodologies for safety critical systems applied to robotics (Ref. PTDC/EEI-AUT/3522/2020) and MLDLCOV – Impact of confinement measures related to COVID-19 on mobility, air pollution and macroeconomic indicators in Portugal: an approach in Machine Learning (Ref. DSAIPA/CS/0086/2020), the latter through the program INCO.2030 – National Initiative for Digital Competences e.2030. A part of the simulations was carried out with the OBLIVION Supercomputer (at the High Performance Computing Center, University of Évora) funded by the ENGAGE SKA Research Infrastructure (reference POCI-01-0145-FEDER-022217 – COMPETE 2020 and the FCT, Portugal) in the framework of the FCT calls for computational projects (Refs. 2021.09815.CPCA and 2022.15706.CPCA.A2).
Ethics declarations
Conflict of interest
The authors have not disclosed any competing interests.
To the blessed memory of Professor Fernando Lobo Pereira.
Chertovskih, R., Pogodaev, N. & Staritsyn, M. Optimal Control of Nonlocal Continuity Equations: Numerical Solution. Appl Math Optim 88, 86 (2023). https://doi.org/10.1007/s00245-023-10062-w
Keywords
- Optimal control
- Nonlocal continuity equations
- Pontryagin’s maximum principle
- Descent method
- Indirect algorithms for optimal control