The Pontryagin maximum principle for solving Fokker–Planck optimal control problems

The characterization and numerical solution of two non-smooth optimal control problems governed by a Fokker–Planck (FP) equation are investigated in the framework of the Pontryagin maximum principle (PMP). The two FP control problems are related to the problem of determining open- and closed-loop controls for a stochastic process whose probability density function is modelled by the FP equation. In both cases, existence and PMP characterization of optimal controls are proved, and PMP-based numerical optimization schemes are implemented that solve the PMP optimality conditions to determine the controls sought. Results of experiments are presented that successfully validate the proposed computational framework and allow a comparison of the two control strategies.


Introduction
In the framework of stochastic optimal control theory [9,23,24], given a stochastic process X(t) subject to a control function u, a control problem is defined by introducing a general objective functional to be minimized that has the structure (1) below, where 𝔼[⋅] represents the expectation with respect to the probability measure induced by the process X(t) (nevertheless, for clarity we explicitly write X in the integral). On the other hand, a fundamental tool for analysing stochastic processes is the fact that the evolution of the probability density function (PDF) associated to X(t) is governed by the so-called Fokker-Planck (FP) equation (or forward Kolmogorov equation), which is a time-dependent partial differential equation (PDE) with an initial PDF configuration; see, e.g., [6] and references therein. Thus, assuming that the stochastic process X(t) possesses an absolutely continuous probability measure, one can express the expectation in (1) explicitly in terms of the PDF governed by the FP problem. Now, to illustrate these facts, and to formulate the class of problems that we investigate in this paper, we introduce the n-dimensional controlled Itō stochastic process (2) below, where the state variable X(t) ∈ Ω ⊆ ℝⁿ is subject to deterministic infinitesimal increments driven by the vector-valued drift function b, and to random increments proportional to a multi-dimensional Wiener process dW(t) ∈ ℝᵐ, with stochastically independent components, and σ is the dispersion matrix coefficient. In this stochastic differential equation (SDE) modelling the stochastic process, we assume that the state configuration of the stochastic process at t₀ is given by X₀, and we suppose that the control function u ∈ U, where U represents the set of Markovian controls containing all jointly measurable functions u with u(x, t) ∈ K_U ⊂ ℝⁿ, and K_U is a compact set, which for simplicity is chosen as a subset of ℝⁿ.
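To make the controlled dynamics (2) concrete, the following minimal Python sketch simulates sample paths with the Euler–Maruyama method. The scalar drift b(x, u) = u − x, the constant dispersion, and all parameter values are illustrative assumptions for this sketch, not choices made in this paper.

```python
import math
import random

def euler_maruyama(x0, u, sigma, T, n_steps, rng):
    """Simulate dX = b(X, u) dt + sigma dW on [0, T] with the
    illustrative drift b(x, u) = u - x (Euler-Maruyama method)."""
    dt = T / n_steps
    x = x0
    for _ in range(n_steps):
        b = u - x                            # drift at the current state
        dw = rng.gauss(0.0, math.sqrt(dt))   # Wiener increment ~ N(0, dt)
        x = x + b * dt + sigma * dw
    return x

rng = random.Random(0)
# For small dispersion, the mean over many paths approaches the
# deterministic limit x'(t) = u - x(t), i.e. x(T) = 1 - exp(-T) for u = 1.
samples = [euler_maruyama(0.0, 1.0, 0.05, 5.0, 200, rng) for _ in range(2000)]
mean_T = sum(samples) / len(samples)
```

Averaging over realizations in this way is exactly the kind of expectation that the FP equation lets one compute deterministically through the PDF.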
In applications, the model (2) is of central importance in statistical physics, e.g., in the study of Brownian processes, and in the study of biological systems [5]. Recently, it has attracted attention in the framework of modelling pedestrians' motion; see, e.g., [33][34][35] and references therein. Notice that in all these cases, a constant dispersion coefficient is usually considered.
At this point, we remark that there is a difference in our understanding of the control function depending on whether its functional dependence on the state of the process X(t) is given a priori or is sought in order to solve the given control problem. In the former case, we have an open-loop control, and in the latter case we have a closed-loop control function, in the sense that a sudden change of the state of the process X(t) instantaneously provides (as feedback) the optimal control for the new state configuration. For a discussion on the significance of these two settings, we refer to [15,16] and the discussion that follows.
The objective functional and the controlled SDE referred to above are given by

J(X, u) = 𝔼[ ∫_{t₀}^{T} G(X(t), t, u(X(t), t)) dt + F(X(T)) ],    (1)

and

dX(t) = b(X(t), u(X(t), t)) dt + σ(X(t)) dW(t),  t ∈ (t₀, T],   X(t₀) = X₀.    (2)

Corresponding to (2) and a closed-loop control setting, we consider the functional (3) below, which is a conditional expectation given that the process X(t) takes the value x₀ at time t₀. We refer to the functions G and F as the running cost and the terminal cost functions, respectively; see the above references for more details.
The optimal control ū that minimizes C_{t₀,x₀}(u) for the process (2) is given by (4) below. Correspondingly, one defines the value function

q(x, t) := min_{u∈U} C_{t,x}(u).    (5)

A fundamental result in stochastic optimal control theory is that the function q (subject to appropriate conditions) is the solution to the so-called Hamilton-Jacobi-Bellman (HJB) equation given by

∂_t q + min_{v∈K_U} { Σ_{i,j} a_{ij} ∂²_{x_i x_j} q + b(x, v) ⋅ ∇q + G(x, t, v) } = 0,   q(x, T) = F(x),    (6)

with the HJB Hamiltonian function

H(x, t, ∇q, D²q) := min_{v∈K_U} { Σ_{i,j} a_{ij} ∂²_{x_i x_j} q + b(x, v) ⋅ ∇q + G(x, t, v) },    (7)

where a_{ij} represents the ij-th element of the matrix a = σσ⊤∕2. Notice that, in our case, since the diffusion coefficient a does not depend on the control, the second-order differential term can be taken outside the minimization in (7).
The HJB framework represents the essential tool to compute closed-loop controls. This framework poses the challenging task of analysing existence and uniqueness of solutions to the nonlinear HJB equation; see [20] for a fundamental work in this field. However, this task is facilitated in the case of uniform parabolicity, which, in the simplest case, is guaranteed by assuming that a is the identity matrix in ℝⁿ multiplied by a positive number σ > 0. This setting is considered later on to simplify our analysis.
We are now ready to introduce the FP equation for (2) and then formulate our optimal control problems while outlining the connection to the HJB framework mentioned above. The conditional cost functional and the optimal control mentioned above are given by

C_{t₀,x₀}(u) = 𝔼[ ∫_{t₀}^T G(X(s), s, u(X(s), s)) ds + F(X(T)) | X(t₀) = x₀ ],    (3)

ū = argmin_{u∈U} C_{t₀,x₀}(u).    (4)

Further, the FP equation corresponding to (2) reads

∂_t f(x, t) − Σ_{i,j} ∂²_{x_i x_j} ( a_{ij}(x) f(x, t) ) + ∇ ⋅ ( b(x, u) f(x, t) ) = 0,    (8)

f(x, t₀) = f₀(x),    (9)
where f denotes the PDF of the stochastic process, and f₀ represents the PDF of the initial state X₀ of the process; hence we require f₀(x) ≥ 0 and ∫_Ω f₀(x) dx = 1. In the following, both the stochastic process (2) and the FP equation (8) are considered in the time interval [0, T], i.e. t₀ = 0. We consider the stochastic process in the convex domain Ω ⊂ ℝⁿ, with measure |Ω|, in the sense that X(t) ∈ Ω and, consequently, Ω is the domain where f is defined. We assume that the boundary ∂Ω is Lipschitz, and denote with Q := Ω × (0, T) the space-time cylinder.
One can see that the coefficients of the FP equation are directly determined by the coefficients of the SDE. Moreover, the choice of the barriers that limit the value of X(t) in Ω translates to boundary conditions for the PDF. Specifically, in the case of absorbing barriers, we have homogeneous Dirichlet boundary conditions for f on ∂Ω, t ∈ [0, T]. On the other hand, reflecting barriers correspond to zero-flux boundary conditions. In fact, notice that (8) can be written in the flux form ∂_t f = ∇ ⋅ F(f); thus, reflecting barriers require F(f) ⋅ ν = 0 on ∂Ω, where ν represents the outward normal to ∂Ω. For reasons that are explained below, later on we focus on Dirichlet boundary conditions, and, in correspondence to this choice, we discuss existence and regularity of solutions to (8)-(9), and the properties of the control-to-state map u ↦ f = f(u).
However, for both choices of boundary conditions, we can make the following discussion that aims at clarifying our framework. Let us assume that f₀(x) = δ(x − x₀) (the Dirac delta) at t = t₀ fixed, and notice that the expectation in (3) can be explicitly written in terms of the PDF solving the FP problem with the given initial density distribution. Thus, the functional (3) becomes

J(f, u) = ∫_{t₀}^T ∫_Ω G(x, t, u(x, t)) f(x, t) dx dt + ∫_Ω F(x) f(x, T) dx.    (10)

Therefore the optimization problem (4) can be equivalently stated as a FP optimal control problem where a function u in the space U is sought that minimizes (10). Now, in the Lagrange framework [38] and assuming Fréchet differentiability of all components of the FP optimal control problem, we can derive the following first-order necessary optimality conditions that characterize a solution to the FP optimal control problem. We have the state equation

∂_t f − Σ_{i,j} ∂²_{x_i x_j} ( a_{ij} f ) + ∇ ⋅ ( b(⋅, u) f ) = 0,   f(⋅, t₀) = f₀,    (11)

the adjoint equation

∂_t p + Σ_{i,j} a_{ij} ∂²_{x_i x_j} p + b(⋅, u) ⋅ ∇p + G(⋅, ⋅, u) = 0,   p(⋅, T) = F,    (12)

and the optimality condition

∫_Q ( ∇_u G(x, t, u) + ( ∇_u b(x, u) )⊤ ∇p ) ⋅ ( v − u ) f dx dt ≥ 0,    (13)

for all v ∈ U. This optimality system is completed with the specification of the boundary conditions. We have homogeneous Dirichlet boundary conditions for p if such are the boundary conditions for f. In the case of zero-flux boundary conditions for f, variational calculus gives homogeneous Neumann boundary conditions for the adjoint variable. We remark that, if a uniform parabolicity condition for the FP equation holds, then the solution of the FP problem, with f₀(x) ≥ 0 (strictly positive in some open set, and also in the case f₀(x) = δ(x − x₀)) and with the given boundary and initial conditions, remains positive in the sense that f(x, t) > 0 for t > t₀ and almost everywhere in Ω. Based on this fact, we see that (13) represents the first-order optimality condition for the minimization problem for the control function in (7). This remark is the starting point to establish a formal connection between the HJB equation and the adjoint equation (12) at optimality [6,7]. This connection has already been used within a Lagrangian framework to construct closed-loop controls for different application problems [33,37].
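The identity behind the reformulation of (3) as a PDF-weighted integral can be checked numerically in the simplest possible setting: pure diffusion on the whole line (no drift, no boundary), where the FP solution is a Gaussian. The terminal cost F(x) = x² and all numerical values below are illustrative assumptions for this sketch.

```python
import math
import random

x0, sigma, T = 0.5, 1.0, 2.0

def f_pdf(x):
    """FP solution for dX = sigma dW, X(0) = x0: Gaussian with variance sigma^2 T."""
    var = sigma ** 2 * T
    return math.exp(-(x - x0) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

# Expectation of F(X(T)) = X(T)^2 computed two ways.
# (i) Integral against the PDF (midpoint rule on a truncated domain):
n, lo, hi = 4000, x0 - 10.0, x0 + 10.0
h = (hi - lo) / n
pdf_value = sum(((lo + (i + 0.5) * h) ** 2) * f_pdf(lo + (i + 0.5) * h) * h
                for i in range(n))

# (ii) Monte Carlo mean over realizations of the process:
rng = random.Random(1)
mc_value = sum((x0 + sigma * rng.gauss(0.0, math.sqrt(T))) ** 2
               for _ in range(20000)) / 20000
```

Both values approximate 𝔼[X(T)²] = x₀² + σ²T; the deterministic PDF integral replaces the sampling over stochastic paths, which is exactly the passage from (3) to (10).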
However, comparison of (7) with (13) shows that differentiability with respect to u is not required in the HJB problem, and the appropriate theoretical optimization framework to establish the HJB-FP optimal control connection appears to be provided by Pontryagin's maximum principle (PMP). This also implies that a consistent numerical optimization procedure for solving FP optimal control problems (and thus HJB problems) should be formulated in terms of the PMP framework.
In this paper, we would like to contribute to the investigation of both issues by pursuing a theoretical analysis of two specific FP optimal control problems in the PMP framework, and by developing a numerical PMP-based methodology. For both aims, we rely on our previous work in [13,14] and on the fundamental references [27,30], while our numerical PMP-based approach already proposed in [13,14] represents a further development of previous methods developed in the field of optimal control of ordinary differential equation models [25,36]. Now, to explain the challenge of our work, we anticipate that the necessary optimality conditions provided by the PMP consist of the FP equation (11), the adjoint equation (12), and the condition

ū(x, t) ∈ argmin_{v∈K_U} H(x, t, f̄(x, t), v, ∇p̄(x, t)),    (14)

for almost all (x, t) ∈ Q, where the PMP Hamiltonian function is given by

H(x, t, f, v, ζ) := ( b(x, v) ⋅ ζ + G(x, t, v) ) f.

In (14), the pair (f̄, ū) denotes the solution of the FP optimal control problem, and p̄ is the corresponding adjoint variable. We say that (11), (12), and (14) provide the PMP characterization of the solution to our FP optimal control problem. (We formulate (14) in terms of a minimum for convenience; however, an equivalent formulation in terms of maximization of H could be chosen.) Notice that the terminal conditions of our FP adjoint problem and of the HJB problem above are identical, and whenever f̄(x, t) > 0 the minimizer ū(x, t) of H at (x, t) coincides with that of (7). Therefore at optimality, the adjoint equation can be written as ∂_t p + H(x, t, Dp, D²p) = 0, which allows us to identify p with the value function q. Notice that, by the very notion of absorbing boundary for the stochastic process, in this case the value function, which now can be identified with p, must be zero at the boundary of the domain Ω.
However, although the HJB-FP connection is clear at a formal level, we have to guarantee that all components of the PMP optimality system are well defined. Specifically, on the one hand we have to discuss existence and regularity of solutions to (11) and (12); on the other hand, we need to guarantee the well posedness of (14) and to provide a methodology to implement it. These requirements are the main challenges that we face in this work.
We remark that, in order to achieve the above-mentioned goals, some L∞ estimates on the FP solution f and on the adjoint variable p are required. The latter is needed in the proof of the PMP characterization of an optimal control, see also [30], whereas an additional L∞ estimate for f is required when we analyse the well-posedness of our PMP-based optimization method.
Notice that these estimates are usually not considered in existing works focusing on a Lagrange framework for FP optimal control problems; see [6] for a list of these references. Indeed, these estimates are available in the case of PDEs with a linear control mechanism [13,14], but not in the FP case, where the control (drift) multiplies f and is subject to differentiation. In fact, proving these estimates is a delicate issue that we address in the following sections, and for this purpose we make some assumptions that allow us to focus on the most relevant problems. In particular, we assume that the diffusion coefficient a = σσ⊤∕2 is the identity in ℝⁿ multiplied by a scalar σ > 0 (we use the same symbol), and we choose homogeneous Dirichlet boundary conditions for the PDF. The case of zero-flux boundary conditions requires a different analysis, especially concerning the already mentioned L∞ estimates, and, in order to keep this paper at a reasonable size, it is not discussed here.
We focus, on the one hand, on a specific open-loop control structure and, on the other hand, on a closed-loop setting, having in mind that, in applications, an open-loop control is usually much easier to implement than a closed-loop one, while the latter provides, in principle, the optimal control 'per antonomasia'. Our choice of a specific open-loop control structure is motivated by the discussion in [15,16] in the framework of ensemble controls, where it is pointed out that a composite linear-bilinear open-loop control mechanism in the SDE may provide a reasonable approximation of a closed-loop control; we investigate this fact with numerical experiments. We also refer to [15,16] and the recent work [8] concerning the choice of the functions G and F in the cost functional. These functions will be specified and discussed below.
The other challenge that we face in our work is the numerical solution of our FP optimal control problems within the PMP framework. This is a main focus of our research, and in previous works we have proposed the so-called sequential quadratic Hamiltonian (SQH) scheme to solve nonsmooth elliptic and parabolic optimal control problems with linear and bilinear control mechanisms [13,14]. However, the bilinear control structure of the FP equation poses additional difficulties that we address in this work. Furthermore, the convergence analysis of the SQH scheme in [13,14] needs to be extended to accommodate the FP structure, and this requires additional estimates that we present below.
As already mentioned, our theoretical and numerical investigation focuses on the following two stochastic processes. In the first case, we take b(x, u) = v(t) + w(t) • x, where the pair (v, w) represents our open-loop control function. This is a linear-affine control-drift for the SDE, where x appears linearly and is modulated by w. In this case, our controlled stochastic process is modelled as follows

dX(t) = ( v(t) + w(t) • X(t) ) dt + σ dW(t),    (15)

where • denotes the Hadamard product.
Our second SDE model is given by

dX(t) = u(X(t), t) dt + σ dW(t),    (16)

where the control function u : Ω × [0, T] → ℝⁿ is intended to define a closed-loop control mechanism for the stochastic process. (In this case, the dependence of the control function on x has to be determined.) Corresponding to these two cases, we consider FP optimal control problems that require to minimize (10) subject to the differential constraint given by (8)-(9), where t₀ = 0 is fixed.
In the next section, we investigate the FP equation and its adjoint concerning the required L∞ estimates, considering both control strategies given above. In Sect. 3, we present the PMP characterization of our FP control problems. In Sect. 4, we illustrate the numerical approximation and the PMP-based optimization procedure for solving the proposed optimal control problems. For this purpose, we illustrate the numerical approximation of the FP PMP optimality system, using the Chang-Cooper scheme [34] combined with implicit first- and second-order Euler schemes. These schemes provide accurate and positivity-preserving approximations that are essential in the FP computation. Further, in this section we present the SQH method and discuss its convergence properties. Also in this section, we discuss a modification of the SQH procedure that implements the PMP optimality in the case of feedback controls and resembles the HJB approach. In Sect. 5, we present results of numerical experiments that demonstrate the ability of the FP framework to determine the two different control strategies for stochastic processes. In particular, we present results of Monte Carlo simulations that show the ability of the control mechanisms to drive the ensemble of stochastic paths along a desired trajectory. A conclusion completes this work.

Analysis of the FP equation and of its adjoint
We start this section by providing the weak formulation of our FP problem (8)-(9) with homogeneous Dirichlet boundary conditions. We have

( f′(t), φ ) + σ ( ∇f(t), ∇φ ) − ( f(t) b(⋅, t), ∇φ ) = 0,   f(0) = f₀,    (17)

for all φ ∈ H¹₀(Ω) and for almost all t ∈ (0, T), where the dot ⋅ denotes the Euclidean scalar product in ℝⁿ, ∇ denotes the gradient in ℝⁿ, the divergence of a vector-valued function y = (y₁, …, y_n)⊤ is given by ∇ ⋅ y, and the partial derivative with respect to t is denoted with f′ := ∂_t f. We remark that, for any function y ∈ (L^q(0, T))ⁿ, we have ‖y‖_{L^q(0,T)} := max_{i=1,…,n} ‖y_i‖_{L^q(0,T)}, and analogously, for any function y ∈ (L^∞(0, T))ⁿ, we have ‖y‖_{L^∞(0,T)} := max_{i=1,…,n} ‖y_i‖_{L^∞(0,T)}. We assume that Ω ⊂ ℝⁿ is bounded and convex, and q > n∕2 + 1 for n ≥ 2 and q ≥ 2 for n = 1. Further, (⋅, ⋅) denotes the L²(Ω)-scalar product.
We first consider the case of a drift function b(x, t) = v(t) + w(t) • x; thereafter we focus on the case b(x, t) = u(x, t). We anticipate that, in our FP control problems, these functions are sought in the following sets of admissible controls. For the former case, we have

V_ad := { v ∈ (L^∞(0, T))ⁿ : v_i(t) ∈ K_V^i for a.a. t ∈ (0, T) },
W_ad := { w ∈ (L^∞(0, T))ⁿ : w_i(t) ∈ K_W^i for a.a. t ∈ (0, T) },

where i ∈ {1, …, n}, and K_V^i, K_W^i are compact subsets of ℝ. Hence, we have that K_V := K_V^1 × ⋯ × K_V^n and K_W := K_W^1 × ⋯ × K_W^n are compact subsets of ℝⁿ.

For the latter case, we choose the following admissible set of controls

U_ad := { u ∈ (L^∞(Q))ⁿ : u(x, t) ∈ K_U for a.a. (x, t) ∈ Q },

where K_U represents a compact subset of ℝⁿ. However, to ease our discussion and simplify notation, we consider the case K_U := [u_min, u_max]ⁿ, with u_min, u_max ∈ ℝ, u_min < u_max.

The 'open-loop' case
We consider the case of the drift b(x, u) = v(t) + w(t) • x. Since this drift is differentiable with respect to x, we have ∇ ⋅ ( b f ) = b ⋅ ∇f + ( ∇ ⋅ b ) f, with ∇ ⋅ b(x, t) = Σ_{i=1}^n w_i(t). The next theorem states a specific boundedness result that is required in our PMP framework.

Theorem 1  Consider the following parabolic problem

y′ − σΔy + b ⋅ ∇y + c y = g  in Q,   y = 0  on ∂Ω × (0, T),   y(⋅, 0) = y₀,    (19)

with b ∈ (L^∞(Q))ⁿ, c ∈ L^∞(Q), g ∈ L^∞(Q), and y₀ ∈ L^∞(Ω) ∩ H¹₀(Ω). Then this problem admits a unique weak solution y ∈ L²(0, T; H¹₀(Ω)) ∩ L^∞(0, T; L²(Ω)), and it satisfies the following L^∞ bound

‖y‖_{L^∞(Q)} ≤ C ( ‖y₀‖_{L^∞(Ω)} + ‖g‖_{L^∞(Q)} ),    (20)

with a constant C > 0 depending on T, ‖b‖_{L^∞(Q)} and ‖c‖_{L^∞(Q)}.
Proof  The existence and uniqueness of the solution y ∈ L²(0, T; H¹₀(Ω)) ∩ L^∞(0, T; L²(Ω)) is proved in [21]. Our concern is to prove (20), for which purpose we use results in [14, Theorem A.1]. Now, we define the bilinear map associated with (19). In order to apply [14, Theorem A.1], we construct an auxiliary problem whose corresponding bilinear map fulfils the coercivity condition. For this purpose, we set ŷ(x, t) := e^{−λt} y(x, t) for some λ ≥ 0, where y solves (19). Then, we multiply both sides of the equation in (19) with e^{−λt} and, by inserting the definition of ŷ, we obtain the following (21)

with the corresponding bilinear map, where we use the Cauchy inequality, see [21, page 622], for ε > 0. From (24), we obtain that the coercivity condition (25) holds for λ sufficiently large. If ‖b‖_{L^∞(Q)} = 0, then from (24) we obtain that (25) holds for λ ≥ ‖c‖_{L^∞(Q)}. Consequently, we choose λ accordingly, and the claim follows by applying [14, Theorem A.1] to the auxiliary problem. ◻

In the framework of Theorem 1, and with v ∈ V_ad and w ∈ W_ad, the FP equation (17) admits a unique solution f, which satisfies an L^∞ bound analogous to (20) (Theorem 2); see [22, Theorem 5], where the corresponding part of the proof also holds for our case to prove the desired regularity. From these results and Theorem 1, we prove the following theorem; compare with [10].

Theorem 3  For the solution p to (28), it holds that p ∈ L^∞(Q).

Proof  As F ∈ L^∞(Ω) ∩ H¹₀(Ω), and due to the pointwise boundedness of v, w, and v_i(t) + x_i w_i(t) ∈ L^∞(Q) for all i ∈ {1, …, n}, we can apply Theorem 1 to obtain the desired result. ◻

Additionally, we have that p ∈ L^q(0, T; W^{1,q}₀(Ω)); see [11] and [32, Proposition 8.35]. Notice that in the adjoint FP problem (28) the solution of the forward FP problem does not appear. This is due to the linearity of our cost functional with respect to the PDF.

The 'closed-loop' case
In the case where u ∈ (L^q(Q))ⁿ, the well-posedness of the FP problem with homogeneous Dirichlet boundary conditions is discussed in [5,22], and also in this case an L^∞ bound for the PDF, analogous to Theorem 2, can be shown based on [10, Theorem 3.1]. As discussed in detail below, the adjoint FP problem for this case is given, in weak form, by

−( p′(t), φ ) + σ ( ∇p(t), ∇φ ) − ( u(⋅, t) ⋅ ∇p(t), φ ) = ( G(⋅, t, u(⋅, t)), φ ),   p(T) = F,    (29)

for all φ ∈ H¹₀(Ω). Further, since u ∈ U_ad ⊂ (L^∞(Q))ⁿ, we can apply Theorem 1 to obtain an L^∞ bound for the solution of the adjoint problem that is analogous to that of Theorem 3. For brevity, and to keep this paper at a reasonable size, we refrain from putting these statements in the form of theorems. Notice that, also in this case, the PDF does not appear in the adjoint FP problem.

Analysis of FP optimal control problems
In this section, we discuss our optimal control problems governed by the FP model (8)-(9) with homogeneous Dirichlet boundary conditions and with the cost functional defined in (10). Also in this section, we first discuss the open-loop case where b(x, t) = v(t) + x•w(t) and identify u with the pair (v, w). Thereafter, we focus on the closed-loop case.
In the first case, we consider the following optimal control problem

min_{(v,w) ∈ V_ad × W_ad} J(f, v, w)  subject to (17),    (30)

with J given by (10). Suppose that the purpose of the control is that all realizations of the stochastic process (2) follow a desired trajectory x_d(t). This goal can be modelled by a tracking term in the running cost, where A plays the role of an attracting potential (i.e. a well centred at a desired minimum point, such that the negative of the gradient of the potential is directed towards this minimum). For example, as in [15,16], one could choose A(x, t) = |x − x_d(t)|², and similarly F(x) = |x − x_T|². Moreover, as in [8], different choices of these tracking terms are possible. In both cases, the minimum of the tracking part of the functional corresponds to having the PDF concentrated on the desired path. In general, we assume A ∈ L^q(0, T; W^{1,q}(Ω)) ∩ L^∞(Q) bounded from below. On the other hand, we consider costs of the controls in the form

G(x, t, v, w) := A(x, t) + α g_{s₁}(v) + β g_{s₂}(w),    (31)

where g_s(z) := Σ_{i=1}^n g^i_s(z), the g^i_s are non-negative functions, and in (31) we have s₁, s₂ ≥ 0 and α, β ≥ 0. The lower boundedness of the cost functional is ensured by the boundedness of A, g_{s₁}, g_{s₂} and the fact that f ≥ 0. Clearly, the choice g^i_s(z) = (z_i)² corresponds in the functional to a mean L² cost. On the other hand, the choice g^i_s(z) = |z_i| corresponds to a mean L¹ cost. In general, we assume that the g^i_s are convex and Lipschitz continuous.
With this setting, existence of a solution to (30) can be proved as in [22,5 Existence of optimal controls] with the following additional arguments. The function G is bounded by a constant, and the control-to-state map is sequentially continuous as a map from the admissible set of controls to L 2 (Q) , see [22,Remark 5.1]. This means that any weakly converging sequence results in a strongly converging subsequence of the state, which ensures that we can proceed similarly as in the proof of [22, Corollary 5.1] in our case.
Our purpose is to discuss in detail the characterization of solutions to (30) in the PMP framework. For this reason, we present some preparatory results that are required for proving Theorem 5 below.
Central to the formulation of the PMP is the so-called Hamiltonian function

H(x, t, f, v, w, ζ) := ( ( v + w • x ) ⋅ ζ + G(x, t, v, w) ) f.

Notice that, when f, v, w, ζ are functions, we shorten notation and write

H(x, t, f, v, w, ζ) := H(x, t, f(x, t), v(t), w(t), ζ(x, t)).

Later, the placeholder ζ will be filled with the spatial derivative ∇p of the solution to the adjoint FP problem.
The next step in order to characterise a solution to (30) in the PMP framework is the following lemma, which provides a direct relationship between the values of the cost functional at different triples (f, v, w) and the values of the corresponding Hamiltonian. We say that (f₁, v₁, w₁) solves (17) if f₁ solves (17) with v₁ in place of v and with w₁ in place of w. (We also write f ← f₁, v ← v₁ and w ← w₁.) We have

Lemma 4  Let (f₁, v₁, w₁) and (f₂, v₂, w₂) each be solutions to (17). Then, it holds

J(f₂, v₂, w₂) − J(f₁, v₁, w₁) = ∫₀^T ∫_Ω [ H(x, t, f₂, v₂, w₂, ∇p₁) − H(x, t, f₂, v₁, w₁, ∇p₁) ] dx dt,

where p₁ is given by (28) for v ← v₁ and w ← w₁.
Proof  In order to save notational effort, we drop the functions' dependency with respect to x and t. We have
by partial integration with respect to t [38, Satz 3.11], partial integration with respect to x (second line), and with (17) (fourth line). Combining (32) and (33), we obtain the claim. ◻

Next, we recall that the standard step for the PMP characterisation of a solution to the FP optimal control problem (30) is to introduce the concept of needle variation. For this purpose, for any ṽ ∈ V_ad and w̃ ∈ W_ad, and for t₀ ∈ (0, T) and S_k(t₀) a ball centred in t₀ whose measure |S_k(t₀)| satisfies lim_{k→∞} |S_k(t₀)| = 0, we define the needle variations

v_k(t) := v  if t ∈ S_k(t₀),   v_k(t) := ṽ(t)  otherwise;
w_k(t) := w  if t ∈ S_k(t₀),   w_k(t) := w̃(t)  otherwise;

where v ∈ K_V and w ∈ K_W. These variations should be understood componentwise for all components of v and w.

Now, we can state the PMP characterisation of an optimal control to (30) as follows.
Theorem 5  Let (f̄, v̄, w̄) be a solution to (30). Then it holds that

∫_Ω H(x, t, f̄(x, t), v̄(t), w̄(t), ∇p̄(x, t)) dx ≤ ∫_Ω H(x, t, f̄(x, t), v, w, ∇p̄(x, t)) dx,    (34)

for all v ∈ K_V and w ∈ K_W, and for almost all t ∈ (0, T), where p̄ is the solution to (28) for v ← v̄ and w ← w̄.

Proof
Since v_k ∈ V_ad and w_k ∈ W_ad for all k ∈ ℕ, we have with Lemma 4 that for any k ∈ ℕ the inequality (35) holds. Next, we prove the convergence of the corresponding adjoint variables. We subtract (28) for v ← v_k and w ← w_k from (28) for v ← v̄ and w ← w̄, and obtain a problem for δp := p̄ − p_k, and thus the estimate (36). From (36) and Theorem 3, we have the claimed convergence if the two terms on the right-hand side of (36) vanish in the limit. For the first term, we have a bound with a constant c > 0, due to p̄ ∈ L^q(0, T; W^{1,q}(Ω)), q > n∕2 + 1, see [11]. This means that the corresponding integrand is integrable, see [4, Theorem 6.11, Theorem 6.9], and we can apply the Average Value Theorem [29, Theorem 51] to obtain the limit for almost all t₀ ∈ (0, T). Further, since

for all v ∈ V_ad and w ∈ W_ad, we analogously have the corresponding limit for almost all t₀ ∈ (0, T). Now, taking the limit over k on both sides of the inequality (35), and using [29, Theorem 51] for the last line in (35), we obtain (34) for all v ∈ K_V and w ∈ K_W and for almost all t ∈ (0, T), renaming t₀ into t. ◻

We remark that the (unusual) integral form of the PMP given in (34) results from the fact that the controls depend only on the time variable, and so does the needle variation. This is in contrast to the case where the control depends on both variables (x, t); see, e.g., [13,14,30] and references therein, in which case the needle variation is defined in Q.
We also see that (34) involves the PDF, which is consistent with the fact that we are characterizing an open-loop control for (15). The situation is different for our second FP optimal control problem corresponding to the stochastic process (16). In this case, the drift has the closed-loop structure that leads to important consequences that we illustrate below.
Our second FP optimal control problem is given by

min_{u ∈ U_ad} J(f, u)  subject to (17) with b(x, t) = u(x, t).    (37)

Similar to (31), we assume the structure G(x, t, u(x, t)) := A(x, t) + g_s(u(x, t)). The PMP Hamiltonian corresponding to (37) is given by

H(x, t, f, u, ζ) := ( u ⋅ ζ + G(x, t, u) ) f.    (38)

Next, we prove that a solution to (37) has the following PMP characterization.
Theorem 6  Let (f̄, ū) be a solution to (37). Then it holds that

H(x, t, f̄(x, t), ū(x, t), ∇p̄(x, t)) ≤ H(x, t, f̄(x, t), v, ∇p̄(x, t))  for all v ∈ K_U,    (39)

for almost all (x, t) ∈ Q, where p̄ is the solution to (29) for u ← ū.
Proof  We have that p̄ ∈ L^q(0, T; W^{1,q}₀(Ω)) due to the regularity of the right-hand side of (29); see [11] and [32, Proposition 8.35]. By [22, Theorem 3.1], we have that f̄ ∈ L²(0, T; H¹₀(Ω)). Then the proofs of Lemma 4 and Theorem 5 can be carried out analogously, where the corresponding control terms are replaced by the control of (37), and the needle variation is now defined in Q. Going step by step through the mentioned proofs, we can apply the same arguments to the control u. ◻

We remark that Theorem 6 is analogous to [14, Theorem 3.3] or [13, Theorem 2], with similar proofs. The main difference in the proof of Theorem 5 with respect to that of Theorem 6 is that in the former the needle variation is performed for functions in (0, T), whereas in the latter variations of functions in Q are considered. Now, we focus on (38) and (39), and notice that our FP equation is uniformly parabolic. Therefore the PDF is almost everywhere non-negative, and we can write the PMP condition (39) in the following form

ū(x, t) ∈ argmin_{v∈K_U} { v ⋅ ∇p̄(x, t) + G(x, t, v) },    (40)

for almost all (x, t) ∈ Q.
This last result and the fact that f does not enter in the formulation of the adjoint FP problem imply that the optimal control ū is independent of the PDF, and thus independent of the initial condition f₀ and of the initial condition of the stochastic process (16). Therefore the optimal control ū has the structure and the significance of a closed-loop control (feedback law).
Notice that, since the Hamiltonian is continuous in the control argument, the control obtained in (40) as a result of the argmin operation is measurable; see [31, 14.29 Example, 14.37 Theorem].
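For typical cost choices, the pointwise minimization in (40) is available in closed form. The following sketch, for a scalar control with K_U = [u_min, u_max], u_min < 0 < u_max, and an L¹ cost g_s(v) = α|v| (all values illustrative assumptions), compares the resulting soft-thresholding rule with a brute-force minimization over a grid:

```python
def pmp_pointwise_min(zeta, alpha, u_min, u_max):
    """Closed-form minimizer of v*zeta + alpha*|v| over [u_min, u_max],
    assuming u_min < 0 < u_max (a soft-thresholding rule)."""
    if zeta > alpha:
        return u_min
    if zeta < -alpha:
        return u_max
    return 0.0

def brute_force_min(zeta, alpha, u_min, u_max, n=2001):
    """Reference minimizer computed by exhaustive search on a grid of K_U."""
    grid = [u_min + i * (u_max - u_min) / (n - 1) for i in range(n)]
    return min(grid, key=lambda v: v * zeta + alpha * abs(v))

# The two agree (up to grid resolution) for a sweep of adjoint-gradient values.
alpha, u_min, u_max = 0.5, -1.0, 2.0
checks = [abs(pmp_pointwise_min(z, alpha, u_min, u_max)
              - brute_force_min(z, alpha, u_min, u_max)) < 2e-3
          for z in [-3.0, -0.51, -0.2, 0.0, 0.3, 0.51, 4.0]]
```

Here zeta plays the role of (a component of) ∇p̄(x, t); the nonsmooth L¹ cost produces exactly the sparse, bang-off-bang structure that a derivative-based (Lagrange) optimality condition cannot express directly.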

Numerical approximation and optimization methods
In this section, starting from the PMP characterisation of solutions to our FP control problems (30) and (37), we discuss two numerical solution procedures. In the first case, we implement the SQH method that was proposed in [14]. In the second case, we exploit the special structure of the resulting optimality system and the connection to the HJB problem to formulate a variant of the SQH solution procedure for determining the optimal control. This variant has similarity with a well-known implementation of the HJB equation [39]. However, to avoid confusion we call this procedure the SQH direct Hamiltonian (SQH-DH) method.
For the implementation of both methods, we illustrate the numerical approximation of the FP and adjoint FP problems. For this purpose, we consider the two-dimensional case, n = 2, and a square domain Ω = (−ℓ, ℓ) × (−ℓ, ℓ). We define a uniform grid

Ω_h := { (x¹_i, x²_j) = (−ℓ + i h, −ℓ + j h) : i, j = 0, …, N_x },

where N_x represents the number of subintervals in each direction, and h = 2ℓ∕N_x. Further, let δt = T∕N_t be the time step size, where N_t denotes the number of time steps, and set t_m := m δt. On this grid, f^m_{i,j} represents the value of a grid function in Ω_h at (x¹_i, x²_j) and time t_m. For the space discretization of the FP equation, we consider the Chang-Cooper (CC) scheme, which is a second-order accurate spatial discretization scheme that guarantees positivity of the PDF [5,17,28]. For the formulation of the CC scheme, one considers the flux form of the FP equation, ∂_t f = ∇ ⋅ F(f), where the ith component of the flux is given by

F_i(f) = σ ∂_{x_i} f − b_i f.

In the CC method, one then has a finite-volume approximation in which the numerical fluxes at the midpoints of the cell faces, F^m_{i±1∕2,j} and F^m_{i,j±1∕2}, are given by convex combinations of the values of the grid function at the neighbouring grid points, with the CC weights chosen such that the scheme is positivity preserving and reproduces exponential equilibrium solutions; see [5,17,28] for the detailed formulas. For the time discretization, we consider a combination of the first- and second-order implicit Euler schemes. Specifically, for the first time step, m = 1, we implement the backward Euler scheme (45), while for m = 2, …, N_t, we use the second-order backward differentiation (BDF2) scheme (46). The numerical analysis of these schemes is presented in [28], where it is proved that the resulting numerical solution is O(δt + h²) accurate with (45), and O(δt² + h²) accurate with (46). Therefore, in order to guarantee a globally second-order accurate solution, the first scheme is used to solve the FP problem on the interval [0, δt] with step size δ̃t = δt². Now, concerning the adjoint FP equation, it has been proved in [5,34] that the transpose of (45) provides an O(δt + h²) accurate approximation of the adjoint FP equation.
This approximation is obtained by transposing the matrix of the discrete FP operator in (45) and marching backwards in time. A similar result also holds for the scheme (46); see, e.g., [5,28,34].
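The CC numerical flux described above can be sketched in code. The following is a minimal one-dimensional sketch, assuming a constant diffusion coefficient D and drift values given at the cell interfaces; the function name and interface are illustrative, not from the paper.

```python
import numpy as np

def cc_flux_1d(f, b_half, D, h):
    """Chang-Cooper numerical fluxes F_{i+1/2} (1D sketch).

    f      : PDF values at cell centres, shape (N,)
    b_half : drift evaluated at the interfaces x_{i+1/2}, shape (N-1,)
    D      : constant diffusion coefficient
    h      : mesh size
    Returns the fluxes F_{i+1/2}, shape (N-1,).
    """
    w = h * b_half / D                      # cell Peclet-like number
    small = np.abs(w) < 1e-12
    w_safe = np.where(small, 1.0, w)
    # CC weight delta = 1/w - 1/(e^w - 1); its limit is 1/2 as w -> 0,
    # which recovers the centred (second-order) discretization
    delta = np.where(small, 0.5, 1.0 / w_safe - 1.0 / np.expm1(w_safe))
    # F = D (f_{i+1} - f_i)/h - b (delta f_{i+1} + (1 - delta) f_i)
    return (D * (f[1:] - f[:-1]) / h
            - b_half * (delta * f[1:] + (1.0 - delta) * f[:-1]))
```

Note that with this choice of weights the discrete flux vanishes exactly on the Maxwellian-type equilibrium f ∝ exp(bx∕D), which is the defining property of the CC scheme.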
Next, we discuss our numerical SQH optimization scheme [13,14]. The main feature of this method is to consider the following augmented Hamiltonian K_ε(x, t, v, w, v_k, w_k, f, p) := H(x, t, v, w, f, p) + ε (|v − v_k|² + |w − w_k|²), (47) where |v|² = Σ_{i=1}^n v_i² for any vector v ∈ ℝⁿ. The quadratic term, which augments the Hamiltonian H, aims at penalising large updates of the control in a sweep that improves the control on all grid points where it is defined. In this way, the current values of the state and adjoint variables are approximately valid during the iteration sweep on all grid points and do not need to be updated during this process, but only after its completion. Thus, the SQH scheme provides an efficient and robust optimization procedure. The SQH method for the open-loop setting is schematically illustrated with the following algorithm.

Algorithm 1 SQH method
1. Choose ε > 0, κ ≥ 0, ζ̂ > 1, ζ ∈ (0, 1), η ∈ (0, ∞), v⁰, w⁰; compute f⁰ by (17) and p⁰ by (28); set k ← 0.
2. Find (v, w) ∈ K_V × K_W such that K_ε(x, t, v, w, v_k, w_k, f_k, p_k) ≤ K_ε(x, t, v̂, ŵ, v_k, w_k, f_k, p_k) for all v̂ ∈ K_V and ŵ ∈ K_W, and for all (x, t) ∈ Ω × [0, T].
3. Calculate f by (17) with v, w, and set τ₁ := ‖v − v_k‖², τ₂ := ‖w − w_k‖².
4. If J(f, v, w) − J(f_k, v_k, w_k) > −η (τ₁ + τ₂): choose ε ← ζ̂ ε and go to 2. Else: choose ε ← ζ ε, set v_{k+1} ← v, w_{k+1} ← w, compute p_{k+1} by (28) with v_{k+1} and w_{k+1}, set k ← k + 1.
5. If τ₁ + τ₂ < κ: STOP and return v_k and w_k. Else go to 2.
The well-posedness of this SQH scheme means that Steps 2 to 4 are mathematically well defined and can always be performed. Specifically, the minimization in Step 2 can always be performed; see [14, Lemma 4.1]. Further, if a control is attained that is PMP optimal, then the algorithm stops; see [14, Lemma 4.3]. On the other hand, in Step 4, if no sufficient decrease of the value of J is attained, a larger value of ε can always be found in finitely many steps such that we obtain an update of the control that satisfies the decrease condition given with η; see [12, Lemma 50] and [14, (16)].
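The acceptance test and ε-adaptation of Steps 2 to 5 can be sketched as follows. This is a schematic loop, not the paper's implementation: the callables solve_fp, solve_adjoint, minimize_K and the cost J stand in for the discretized problems (17), (28), the pointwise Step-2 minimization, and the objective, and all parameter names and defaults are illustrative.

```python
import numpy as np

def sqh(u0, solve_fp, solve_adjoint, minimize_K, J,
        eps=1.0, kappa=1e-8, zeta_hat=1.5, zeta=0.9, eta=1e-6, max_iter=500):
    """Schematic SQH loop with adaptive quadratic penalization weight eps."""
    u = u0
    f = solve_fp(u)
    p = solve_adjoint(u, f)
    for _ in range(max_iter):
        # Step 2: pointwise minimization of the eps-augmented Hamiltonian
        u_new = minimize_K(u, f, p, eps)
        tau = float(np.sum((u_new - u) ** 2))    # squared control update
        f_new = solve_fp(u_new)
        # Step 4: sufficient-decrease test on the cost functional
        if J(f_new, u_new) - J(f, u) > -eta * tau:
            eps *= zeta_hat                      # reject: increase penalty
            continue
        eps *= zeta                              # accept: relax penalty
        u, f = u_new, f_new
        p = solve_adjoint(u, f)
        if tau < kappa:                          # Step 5: stopping criterion
            break
    return u
```

On a toy problem with a quadratic cost and a box constraint, the loop converges to the projected minimizer, which illustrates the role of the acceptance test without any PDE solves.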
We remark that the convergence analysis of the SQH method for our FP control problems can be carried out analogously to [13,14]. However, for this purpose, we need to prove bounds for ‖f‖_{L∞(0,T;H¹₀(Ω))} and ‖p‖_{L∞(0,T;H¹₀(Ω))}, which are not discussed in this paper.
The SQH scheme given above is applied to solve the open-loop FP optimal control problem (30).
In the case of the closed-loop control (37), we consider a variant of the SQH scheme that consistently implements the equivalence of the FP control problem with the HJB formulation, assuming that the PDF is everywhere positive. This variant is obtained according to (40) by introducing the following augmented Hamiltonian K_ε(x, t, u, u_k, ∇p) := H(x, t, u, ∇p) + ε |u − u_k|². This choice of augmented Hamiltonian results naturally from the positivity of the PDF and from considering the expected value of the quadratic penalization: since f > 0, the factor f can be dropped in the pointwise minimization.
Therefore we implement the following iterative scheme

Algorithm 2 SQH-DH method
1. Choose ε > 0, κ ≥ 0, ζ̂ > 1, ζ ∈ (0, 1), η ∈ (0, ∞), u⁰; compute f⁰ by (17) and p⁰ by (28) with u ← u⁰; set k ← 0.
2. Find u ∈ K_U such that K_ε(x, t, u, u_k, ∇p_k) ≤ K_ε(x, t, û, u_k, ∇p_k) for all û ∈ K_U and for all (x, t) ∈ Ω × [0, T].
3. Calculate f by (17) with u, and set τ := ‖u − u_k‖².
4.–5. Proceed as in Steps 4 and 5 of Algorithm 1, with τ in place of τ₁ + τ₂.

Notice that in the SQH-DH scheme the calculation of f is only required to evaluate the cost functional. Furthermore, at convergence, the solution of the adjoint problem corresponds to the solution of the HJB equation, as discussed in [39]. However, the gradual update of the control, thanks to the quadratic penalization, makes the SQH-DH approach more robust, and it does not suffer from instabilities for large time-step sizes.
We see that, in both SQH optimization schemes, Step 2 requires the pointwise solution of minimization problems for the augmented Hamiltonian on low-dimensional compact sets (in our case of dimension 4, resp. 2, counting all control components). These problems can be solved by many methods based on a discretization of these sets. In particular, we refer to derivative-free optimization methods [19,26]; see [13] for an application of the secant method in our context. However, one great advantage of the SQH approach is that a case study of the minimization of the augmented Hamiltonian can be performed beforehand by simple analytical tools, which delivers a small set of possible minimizers depending on the input of the problem. In this case, one is only required to evaluate K_ε (or its closed-loop counterpart) at these points and choose the actual minimizer.
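When no analytical case study is available, the pointwise minimization over a discretized compact set can be performed by brute-force search, as the sets are low-dimensional. The following sketch assumes a box K_U given by per-component bounds; the function name and grid resolution are illustrative.

```python
import numpy as np

def minimize_hamiltonian(K_eps, K_U, n_grid=21):
    """Brute-force minimizer of the augmented Hamiltonian over a box K_U.

    K_eps  : callable u -> value of the augmented Hamiltonian at fixed (x, t)
    K_U    : list of (lo, hi) bounds, one per control component
    Returns the grid point with the smallest value of K_eps.
    """
    axes = [np.linspace(lo, hi, n_grid) for lo, hi in K_U]
    grids = np.meshgrid(*axes, indexing="ij")
    # all candidate control values as rows of shape (n_grid**dim, dim)
    candidates = np.stack([g.ravel() for g in grids], axis=-1)
    values = np.array([K_eps(u) for u in candidates])
    return candidates[values.argmin()]
```

Since the search is repeated on every grid point of Ω_h × [0, T], the analytical pre-computation of candidate minimizers discussed above is considerably cheaper in practice.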
In our experience, this approach has wide applicability, as demonstrated in [13], where different cost functionals and PDE models are considered. In the following, we illustrate this fact for our specific setting. We focus on the FP optimal control problem (30), where A is a continuous function that we specify in the numerical experiments section. We consider the following (generalised) L¹ control costs g_s, with s₁ = 3∕5 and s₂ = 3∕10. The admissible values of the controls are given by the intervals [v_min, v_max] and [w_min, w_max], where v_min = −2, v_max = 2, w_min = −1 and w_max = 1.
By a case study, distinguishing the signs of the control components, we obtain a small set of closed-form candidate minimizers of the augmented Hamiltonian. In the following section, we validate and compare our SQH schemes, where in Step 2 we take advantage of these pre-determined minimizers.
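To illustrate such a case study, consider the pointwise Step-2 problem for one control component when the Hamiltonian is linear in the control with coefficient c; this linearity is an assumption of the sketch, and all names are illustrative. With an L¹ cost weight gamma and the SQH penalization weight eps, the minimizer on an interval is found by comparing a handful of closed-form candidates:

```python
import numpy as np

def argmin_l1_quadratic(c, gamma, eps, v_k, v_min, v_max):
    """Minimize  phi(v) = c*v + gamma*|v| + eps*(v - v_k)**2  on [v_min, v_max].

    This is the pointwise Step-2 problem under the (assumed) linearity of
    the Hamiltonian in the control, with coefficient c.
    """
    phi = lambda v: c * v + gamma * abs(v) + eps * (v - v_k) ** 2
    # stationary points of the two smooth branches (v >= 0 and v <= 0),
    # the kink at v = 0, and the interval endpoints
    cands = [np.clip((2 * eps * v_k - c - gamma) / (2 * eps), v_min, v_max),
             np.clip((2 * eps * v_k - c + gamma) / (2 * eps), v_min, v_max),
             np.clip(0.0, v_min, v_max),
             v_min, v_max]
    return min(cands, key=phi)
```

The kink candidate v = 0 is what produces the sparsity-promoting effect of L¹ control costs: for small |c| the minimizer switches the control off.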

Numerical experiments
In this section, we report results of numerical experiments that validate our FP optimization framework and the ability of the resulting controls to drive the related stochastic processes to perform given tasks. Concerning the first goal, we would like to demonstrate that our optimization procedure is able to provide a solution that satisfies the PMP optimality conditions discussed in the previous sections. For this purpose, we define a measure ΔH(t) of the PMP optimality of the numerical solution in the open-loop setting, assuming that f, v, w and p represent the output of Algorithm 1 applied to our FP problem (30). Similarly, we define a measure of PMP optimality for the closed-loop problem (37).
We also report values of the quantities N_l%, l ∈ ℕ, which give the percentage of grid points where 0 ≤ ΔH(t) ≤ 10⁻ˡ is fulfilled.
Our computational setting for both control problems is as follows. We choose Ω = (−2, 2) × (−2, 2), and consider a uniform space and time discretisation with N_x = 40, N_t = 80, and T = 2.
The initial condition for the FP problem is given by a normalized Gaussian distribution centred at x₀ = (−1, 0) with width parameter r = 0.3, where |⋅| denotes the Euclidean norm in ℝ². In the FP equation, we choose the diffusion coefficient D = σ²∕2 = 10⁻². In our cost functional for the open-loop problem, we specify the function A as an attracting potential, discussed below.
Further, we have g_s given in (48), with both control-cost weights equal to 10⁻⁴ and s₁ = s₂ = 0. The same function A and the same weights are chosen for our closed-loop problem. In both cases, the terminal function F is chosen analogously to A. The choice of A and F is motivated by the concept of ensemble control proposed in [15,16] and analysed in [8]. To illustrate this setting, suppose that the purpose of the control is that the trajectories of our stochastic models track a desired trajectory x_d(t), in the sense that minimizing the ensemble cost corresponds to having all trajectories of the ensemble close to x_d. For this purpose, the function A(x, t) represents an attracting potential that monotonically increases as a function of the distance |x − x_d(t)|. Therefore, by minimization, f results mainly concentrated at the minimum of A, which is attained along x_d(t).

The main purpose of our experiments is to validate the resulting optimal controls at the level of the stochastic dynamics. Therefore, once the controls are computed, we use them in Monte Carlo simulations to verify the ability of these controls to perform the given tasks. Moreover, by taking different initial conditions in the simulation of the controlled SDEs, we can test whether the controls have the closed-loop ability to drive the system to perform the given tasks from any initial configuration. In the following, we report results of numerical experiments with the setting above and for both control problems.
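The Monte Carlo validation described above amounts to simulating the controlled SDE with an Euler–Maruyama discretization. The sketch below assumes a constant scalar dispersion sigma and a drift closure combining the computed control with the model drift; the callable drift and all names are illustrative placeholders, not the paper's implementation.

```python
import numpy as np

def euler_maruyama(drift, x0, sigma, T, n_steps, n_paths, rng=None):
    """Simulate dX = drift(X, t) dt + sigma dW for an ensemble of paths.

    drift : callable (x, t) -> array of shape (n_paths, dim)
    x0    : initial state, shape (dim,)
    Returns trajectories of shape (n_steps + 1, n_paths, dim).
    """
    rng = np.random.default_rng(rng)
    dt = T / n_steps
    X = np.empty((n_steps + 1, n_paths, len(x0)))
    X[0] = x0
    for m in range(n_steps):
        # independent Wiener increments with variance dt per component
        dW = rng.normal(0.0, np.sqrt(dt), size=(n_paths, len(x0)))
        X[m + 1] = X[m] + drift(X[m], m * dt) * dt + sigma * dW
    return X
```

Averaging the paths approximates the mean 𝔼[X(t)], and restarting from an initial state different from x₀ tests the feedback capability of the computed controls.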
In Fig. 1a, we plot the history of the cost functional along the SQH iterations. Notice that the functional is monotonically decreasing. The numerical PMP test gives N₁% = 100%, N₂% = 100%, N₃% = 100%, and N₄% = 16%. These results indicate that the solution obtained with the SQH method is PMP optimal in the sense of (34) with a tolerance of 10⁻³.
In order to allow a more direct comparison of the controls obtained with the two settings, in Fig. 3a, we plot again the closed-loop optimal control function u = (u₁, u₂) in Ω at t = 1 and t = 1.5, and compare it with Fig. 3b, where we depict (v + w ⋅ x) in Ω at t = 1 and t = 1.5. We see that, although some similarities can be recognized, the two controls differ substantially. Notice that with these controls and the setting above, the total probability at final time differs from the initial one by less than 10⁻⁴.

Next, we show that the two controls perform similarly well when the initial conditions for the two stochastic models coincide with x₀, where f₀ is centred. For this purpose, in Fig. 4 we plot the evolution of 𝔼[X(t)] for the two stochastic processes in [0, T]. In the same figure, a more detailed comparison is given by plotting a few (10) stochastic trajectories of the two models. We see that the distribution of these trajectories confirms the plots of the mean 𝔼[X(t)]. On the other hand, we notice that the closed-loop control is more effective in attaining the tracking objective. Now, we validate the ability of the computed controls to provide a feedback law. In the case of our closed-loop control, this feature is expected by construction, as discussed in this paper. On the other hand, we would like to investigate the claim in [15,16] that our open-loop control provides a valuable approximation to a feedback control mechanism. In fact, the results in Fig. 3a, b in part support this claim. However, we perform a stricter validation by using these controls as drifts of our stochastic models and choosing an initial condition X₀ that is far away from x₀. The resulting trajectories are plotted in Fig. 5 and compared with the previous ones obtained with X₀ = x₀. We see that the closed-loop control is able to drive the SDE to follow the desired trajectory and attain the target configuration. On the other hand, the open-loop control mechanism appears unable to perform these tasks properly. Results of additional experiments confirm this conclusion.

Conclusion
A theoretical and computational framework to investigate open- and closed-loop control strategies for stochastic models was presented. This framework is based on the Pontryagin maximum principle (PMP) applied to optimal control problems governed by the Fokker–Planck (FP) equation, which models the evolution of the probability density function of these models. In this work, existence and PMP characterisation of optimal controls for the FP control problems were discussed. Further, PMP-based numerical optimization schemes were implemented to solve these problems. Results of experiments were presented that successfully validated the effectiveness of the PMP FP optimization framework and the ability of the resulting controls to drive the stochastic models to perform given tasks.