PDE-Constrained Optimization Models and Pseudospectral Methods for Multiscale Particle Dynamics

We derive novel algorithms for optimization problems constrained by partial differential equations describing multiscale particle dynamics, including non-local integral terms representing interactions between particles. In particular, we investigate problems where the control acts as an advection 'flow' vector or a source term of the partial differential equation, and the constraint is equipped with boundary conditions of Dirichlet or no-flux type. After deriving continuous first-order optimality conditions for such problems, we solve the resulting systems by developing a link with computational methods for statistical mechanics, deriving pseudospectral methods in both space and time variables, and utilizing variants of existing fixed point methods. Numerical experiments indicate the effectiveness of our approach for a range of problem set-ups, boundary conditions, as well as regularization and model parameters.

1. Introduction. In this work we describe a novel approach for tackling partial differential equation (PDE)-constrained optimization problems for systems in which the underlying dynamics are described by multiscale, interacting particle systems. Our methods are widely applicable to the optimization of many systems described by non-local, non-linear PDEs, some cases of which have recently received significant attention in the literature [1,3,4,6,13,15,26].
In particular, we aim to provide a link between such optimization problems and state-of-the-art methods in statistical mechanics (known as Dynamic Density Functional Theory, or DDFT) [23,30,52,56,67,75,81,82], before devising numerical methods for such problems using a pseudospectral method in space and time, allowing highly efficient and accurate solution of both the forward and optimization problems [14,36,60,76]. Having derived first-order optimality conditions using the formal Lagrange method, we modify existing 'sweeping', or fixed-point, algorithms [4,17] to reliably solve such systems, and apply a recently developed Newton-Krylov method [37,38] to tackle nonlinear optimization problems to higher order. The combination of these approaches has not been applied to particle dynamics problems to our knowledge, and enables the accurate resolution of complex optimization problems. We demonstrate how to efficiently implement such methods for problems with both Dirichlet and Robin (no-flux) boundary conditions, provide a number of exact and validation test cases, and accompany the paper with an open-source software implementation [2], based on 2DChebClass [36,60].
Our use of pseudospectral methods has three main advantages over existing implementations: (i) due to a novel implementation of spatial convolutions, we are not restricted to periodic domains or the use of Fourier grids and can also tackle convolutions with bounded support; (ii) for problems of the types studied here, in particular when the solutions are expected to be smooth and we require accurate solutions, pseudospectral methods provide significant computational gains over finite difference or finite element methods; (iii) using pseudospectral interpolation in time allows us to move beyond fixed timestepping methods, employing more accurate and efficient ordinary differential equation (ODE) and differential-algebraic equation (DAE) solvers, and a spectral-in-time Newton-Krylov method.
This paper is structured as follows. In Section 2 we provide relevant background to multiscale particle dynamics, pseudospectral methods, and PDE-constrained optimization. In Section 3 we give the associated first-order optimality conditions, followed by a description of the numerical methods in Section 4. The results of our numerical experiments are reported in Section 5, followed by some concluding remarks in Section 6.
2. Background. In this section we detail the necessary background required for the development of our algorithms. In Section 2.1 we describe relevant material on multiscale particle dynamics, in Section 2.2 we outline pseudospectral methods, in Section 2.3 we state the PDE-constrained optimization problems of particle dynamics problems that we will consider, and in Section 2.4 we survey concerned with systems in which the particles interact, e.g., through electrostatic forces, volume exclusion, or exchange of information. Typical DDFTs can be thought of as generalized diffusion equations of the form Here F is the Helmholtz free energy of the system. For the non-interacting case, at equilibrium, it is from which it follows that ∇ δF id [ρ] δρ = ∇ρ ρ , resulting in the diffusion equation. For more general systems, the exact free energy is unknown (except in the special case of hard rods in one dimension [74]). As such, much effort has been devoted to determine accurate approximations of the free energy for a wide range of systems, but particular focus is given to hard spheres [68] and particles with soft interactions [39]; these cases may be combined in a perturbative manner [31]. Here we will focus on a relatively simple DDFT, which closes the equation for ρ by considering that the particles are, on average, uncorrelated. For particles which interact through an even pairwise potential V 2 , in an external potential field V 1 , the (approximate) free energy is modelled by This is known as the mean-field approximation, which has been shown to be surprisingly accurate for a range of systems [10], and is known to be exact in the limit of dense systems of particles with soft interactions [58]. We note that this should be considered as the first stepping stone on a path to treating PDE-constrained optimal control systems for general DDFTs. Such systems are highly challenging, not only due to the non-local, non-linear nature of the PDEs, but also due to the complexity of the free energy functionals. For example, Fundamental Measure Theory, which describes the interactions of systems of hard particles, requires the computation of weighted densities through convolution integrals, followed by a further integral of a complicated function of these weighted densities [68]. As such, these challenges are postponed to future work. A final challenge we will address here is the implementation of (spatial) boundary conditions. Most physical systems are constrained in some way, often in a 'box' with impassable walls, such that the number of particles is conserved. For DDFTs, the corresponding boundary condition is j · n = 0 on the boundary, where j is the flux, as in (2.1), and n is the unit normal to the boundary. Whilst this is a standard Neumann boundary condition, we note that the difficulty lies in the form of j; for interacting problems, j is non-local and, as such, so is the corresponding boundary condition. This results in an equation which is challenging to solve numerically; see Section 4.

Pseudospectral methods.
There are a number of standard methods for solving DDFTlike problems. The two most common are the finite element method (FEM) and pseudospectral methods. Here we focus on the latter, but note that the algorithm presented below (see Section 4) is general and may be easily adapted to other numerical methods. The main challenge in using FEM for DDFT problems lies in their non-locality. Heuristically, the principal benefits of FEM are that it (i) produces large, but sparse matrices, leading to systems which may be efficiently solved, for example through the implementation of standard timestepping schemes and carefully-chosen preconditioners (see e.g., [57,61,62,66,71,84] for PDE-constrained optimization problems); and (ii) may be applied to complex domains through standard triangulation/meshing routines. In contrast, for non-local problems such as DDFT the corresponding matrices are not only large, but also dense. This prevents the use of standard numerical schemes and significantly increases the computational cost.
Recently, accurate and efficient pseudospectral methods have been developed to tackle these non-local, non-linear DDFTs [60]. Some details of the implementation will be discussed in Section 4; here we highlight the benefits and challenges. As is widely known [14,76], pseudospectral methods are extremely accurate for problems with smooth solutions on 'nice' domains; here 'nice' roughly corresponds to domains which may be mapped to the unit square in a simple (e.g., conformal) manner. They are more challenging to apply on complex domains (although spectral elements can be seen as a compromise between FEM and pseudospectral methods [14]), and are also of poor accuracy when the solutions are not smooth (the accuracy is order (1/N ) p when the solution has p sufficiently nice derivatives [76], but still at the cost of dense matrices).
Their use to treat DDFT problems stems from three main observations: (i) at least in principle, the diffusion term present in all DDFTs should lead to smoothing of solutions for sufficiently smooth particle interactions; (ii) the pseudospectral matrices are always dense and, as such, treating nonlocal terms does not formally affect the numerical cost; (iii) the implementation of non-local boundary conditions may be treated via standard algebraic-differential equations solvers, thus removing the need for bespoke treatments of different boundary conditions.
2.3. PDE-constrained optimization. In this section we introduce the two main PDE-constrained optimization problem structures that we consider within a multiscale particle dynamics setting. A significant additional complication compared to a standard PDE-constrained optimization problem is the addition of an integral, interaction term. In the following, the terms 'flow control' and 'source control' refer to the application of the control in the PDE constraint either non-linearly, as a vector field within an advection operator, or linearly, as a scalar source term in the PDE.
2.3.1. Flow control problem. We commence with the following problem involving minimizing a cost functional containing a sum of L 2 -norm terms within the entire space-time interval Ω × (0, T ), constrained by a non-linear time-dependent advection-diffusion equation with additional non-local integral term. The control is applied non-linearly in the form of a vector 'flow' term: Here, Ω ⊂ R d , d ∈ {1, 2, 3}, is some given domain with boundary ∂Ω, and T is a prescribed 'final time' up to which the process is modelled. The scalar function ρ and the vector-valued function w are the state and control variables, respectively, β > 0 is a given regularization parameter, and ρ( x, t), V ext ( x, t), f ( x, t), ρ 0 ( x) are prescribed functions corresponding to the desired state, external potential, PDE source term, and initial condition, respectively. We highlight that frequently f ( x, t) = 0, which results in conservation of mass; one reason we allow the case f ( x, t) = 0 is to enable us to more readily construct analytic test problems for (2.2). Additionally, the non-local integral term models interactions between individual particles, where K denotes some vector function. We are particularly interested in the case where K is odd, i.e., K( r, r ) = − K( r , r ); this is the case when K( r, r ) = ∇ r V 2 ( r − r ) with V 2 ( x) = V 2 ( x ) an even potential. However, for now we present the results for a general K. For V 2 ( x ) decreasing as x → ∞, the integral term models repulsive (attractive) interactions when κ is positive (negative). Of course, much more general choices of V 2 are possible. The parameter κ models the particle interaction strength. If κ is set to zero, the model reduces to a standard non-linear advection-diffusion equation control problem. We consider two types of boundary conditions imposed on ρ, specifically the Dirichlet condition: for a given constant c ∈ R, and the 'no-flux type' condition: N (ρ, w) + I(ρ) · n = 0 on ∂Ω × (0, T ). (2.4) Here, with ∂ ∂n denoting the derivative with respect to the normal n. The latter is a no-flux boundary condition in the classical sense if f ( x, t) = 0.

Source control problem.
We also consider the following problem, with an analogous cost functional to the flow control problem, but now with a scalar function for the control variable, which is applied linearly in the form of a PDE source term. This is again minimized subject to a non-linear time-dependent advection-diffusion equation with an additional integral term: This is posed along with the Dirichlet boundary condition (2.3), or the 'no-flux type' condition: We highlight that this paper is focused on fast and effective numerical methods for solving problems of the form (2.2) and (2.5), as opposed to theoretical questions such as existence, uniqueness, and regularity. We refer to [1,4,13,15] for discussion of the first two questions for optimization problems of similar structure, such as those arising from mean-field optimal control. For PDEconstrained optimization problems of the structure examined here, it is typical to demand at least H 1 regularity in space for the state variable, with at least L 2 for the control variable [77]. We note that if 'only' this degree of regularity is to be expected, for example if functions such as ρ, f , V ext , and K themselves have low regularity, the pseudospectral methods examined in this work may not perform substantially better than a finite difference method or a FEM, for instance. However, a key feature of the pseudospectral discretization is that it will exploit whatever regularity does exist within the solution, ensuring far superior convergence compared to alternative methods if the solution has a higher degree of regularity. The particle interaction term within the PDE constraints generally introduces dense matrices under discretization, removing one typical advantage of FEMs or finite difference methods over pseudospectral methods, specifically the presence of sparse matrices for local differential operators. Additionally, due to the potentially poor scaling of optimization methods in the number of grid points, it is highly beneficial to use a method which requires fewer discretized points in space and time, motivating the novel methodology presented in the forthcoming sections. We highlight that the software accompanying this work [2] is designed such that the pseudospectral discretization may readily be replaced with matrices arising from alternative spatial discretizations; we base the work on the pseudospectral method specifically as it is a relatively unexplored class of techniques for PDE-constrained optimization problems, including those arising from particle dynamics, which possesses the significant advantages outlined above.
2.4. Mean-field optimal control. Mean-field games were first introduced by Lasry and Lions [45,46,47,48], and independently by Huang, Caines, and Malhamé [54] under the name Nash certainty equivalence, and have been widely studied since then. The main challenge over typical PDE-constrained optimization problems arises from the additional non-linear, non-local interaction term. Therefore, standard results in optimal control theory cannot readily be applied, and new approaches have to be developed to address theoretical and numerical challenges.
The most commonly studied controls are through the flow, e.g., [4]; interaction term, e.g., [34]; or external agents, e.g., [18]. A common assumption is that the particle distribution has compact support [18,19,33], which eliminates the need for boundary conditions. No-flux boundary conditions, which are a principal focus of our work, have been considered in limited settings [4,21].
The two main avenues of research focus on Vlasov-type PDEs arising from the mean-field limit of Cucker-Smale-like [28,29] models of flocking, and Fokker-Planck equations from the same limit of Langevin dynamics. For the former, Fornasier et al. provided theoretical results on the convergence of the microscopic sparse optimal control problem to a corresponding macroscopic problem, using methods of optimal transport and a Γ-limit argument, proving existence of optimal controls in the mean-field setting, see [33,34,35]. Additional work on sparse control strategies can be found in [63], as well as in the review paper [32]. In [19], convergence results are proved for systems in which the control is applied through interacting, external agents. For the Fokker-Planck case, analytical research has focused on the derivation of first-order optimality conditions [4], existence and regularity of optimal controls [22], and convergence of the microscopic optimal control problem to the mean-field limit [21,64].
In terms of numerical implementations, Strang splitting schemes [24,72] are commonly used, in particular for control strategies which employ external agents [18,20,64], in which the numerical results are used to verify convergence in the mean-field limit. In [7], different selective control strategies were considered, and an iterative numerical method was chosen, where the interaction term is approximated stochastically. Other approaches involve combining a Chang-Cooper scheme for the forward equation, finite differences for the adjoint equation, and Monte-Carlo integration [4] to solve the PDEs. The optimization step was performed with a sweeping algorithm, with updates through the gradient equation, which is similar to the gradient descent method in [17]. Other related numerical work applies to porous media Fokker-Planck equations [21], as well as the determination of steady state solutions [5,8].
As described in Section 4, one of our recommended approaches is an optimization scheme that is inspired by existing sweeping algorithms [4,17], but with a novel coupling to pseudospectral methods used to discretize the space and time domains. This composition of methods offers an efficient and accurate solver for a wide class of problems. To our knowledge, it is the first time that pseudospectral methods have been applied to non-local optimal control problems of this form.
3. First-Order Optimality Conditions for Particle Dynamics Models. In this section we derive the system of PDEs that we need to solve in order to tackle the models (2.2) and (2.5). In order to obtain first-order optimality conditions for (2.2) and (2.5), we apply an optimize-then-discretize method, meaning we derive appropriate conditions on the continuous level and then consider suitable discretization strategies. The alternative to this approach is the discretize-then-optimize method, however we select the former in order to obtain numerical solutions that better reflect the solutions to the continuous first-order optimality conditions. We highlight that an area of active interest in the PDE-constrained optimization community is to construct discretization schemes such that the two approaches coincide (see [25] for a fundamental example of a problem for which different results are obtained using the two methods). Below we briefly describe how the first-order optimality conditions are formed using the formal Lagrange method, for both flow control and source control problems with different boundary conditions, and refer to [77], for instance, for a rigorous justification of how such conditions are formed.
3.1. Flow control with Dirichlet boundary condition. We first consider the advectiondiffusion constrained optimization problem (2.2) with the Dirichlet boundary condition (2.3). This leads to the continuous Lagrangian: where q 1 and q 2 correspond to the portions of the adjoint variable q arising in the interior of the spatial domain Ω and its boundary ∂Ω, respectively. To obtain first-order optimality conditions, we first follow the formal Lagrange method for deriving the adjoint equation for time-dependent PDE-constrained optimization, see [77,Chapter 3] for instance. We obtain that the Fréchet derivative of L in the direction of ρ must satisfy D ρ L(ρ,w, q 1 , q 2 )ρ = 0 for all appropriate functions ρ. Integrating the relevant terms of (3.1) by parts and applying Green's formula, any sufficiently smooth ρ such that ρ( x, 0) = 0 satisfies Noting first that (3.2) must hold for all ρ ∈ C ∞ 0 (Ω × (0, T )) (i.e., where ρ( x, T ), ρ( x, 0) vanish on Ω, and ρ, ∂ρ ∂n vanish on ∂Ω), and observing that C ∞ 0 (Ω × (0, T )) is dense on L 2 (Ω × (0, T )), we obtain the adjoint PDE: Removing the restriction that ρ( x, T ) vanishes on Ω, and arguing similarly, leads to the adjoint boundary condition q 1 ( x, T ) = 0. From here, we may similarly remove the condition that ∂ρ ∂n vanishes on ∂Ω to conclude that q 1 = 0 on ∂Ω × (0, T ). Setting the final integral term in (3.2) to zero then gives the relation between q 1 and q 2 . Putting all the pieces together, and relabelling q 1 as q, we obtain the complete adjoint problem: Searching for the stationary point upon differentiation with respect to each component of w, using similar working as above, gives: whereupon considering the derivatives with respect to the all entries of w, and applying Green's formula, leads to the gradient equation: To summarize, the complete first-order optimality system for the problem (2.2) with the Dirichlet boundary condition ρ = c includes the PDE constraint itself (often referred to as the state equation), the adjoint problem (3.3), and the gradient equation (3.4).
Note that the adjoint terms arising from the particle interactions agrees with the representation of the interaction term in [4], where κ K( r, r ) = P ( r, r )( r − r). For the special case when K( r, r ) = ∇ r V 2 ( r − r ), we have that K is an odd function in the sense that K( r, r ) = − K( r , r) and 3.2. Flow control with no-flux type boundary condition. To provide an illustration of how the same working may be applied to problem (2.2) with the no-flux boundary condition (2.4), we briefly consider the Lagrangian given by: Solving D ρ L(ρ,w, q 1 , q 2 )ρ = 0 for all ρ such that ρ( x, 0) = 0 gives that: Applying the same reasoning as above then leads to the adjoint problem: along with the state equation as in (2.2), and the gradient equation (3.4).
3.3. Source control with Dirichlet boundary condition. We next consider the problem (2.5) with the Dirichlet boundary condition (2.3). This leads to the continuous Lagrangian: Solving D ρ L(ρ,w, q 1 , q 2 )ρ = 0 for all ρ such that ρ( x, 0) = 0 gives that: with further boundary terms which are eliminated through relating q 1 and q 2 . This then leads to the adjoint problem: Searching for the stationary point upon differentiation with respect to w, using similar working as above, gives: leading to the gradient equation: To summarize, the complete first-order optimality system for the problem (2.5), with the Dirichlet boundary condition ρ = c, includes the PDE constraint itself, the adjoint problem (3.5), and the gradient equation (3.6).
3.4. Source control with no-flux type boundary condition. Applying the same working to problem (2.5) with no-flux boundary condition (2.6), the Lagrangian is given by: Applying the same reasoning as above then leads to the adjoint problem: along with the state equation as in (2.5), and the gradient equation (3.6).
4. Numerical Method for the Optimization Model. In this section we describe the structure of our algorithm for the PDE-constrained optimization models under consideration. After describing a pseudospectral method for the PDE constraints (the forward problem), and the adjoint equations, we outline the optimization solvers to be applied numerically, and detail the measures of accuracy that we will employ in our numerical tests. We emphasize that the structure of our algorithm is independent of the choice of solvers in each step, for example, the pseudospectral method in space may be replaced by finite differences or finite elements for problems with non-smooth solutions. To highlight this we will describe two different choices of solver for the optimization stage. Additionally, through the combination of 2DChebClass [36] and a fixed-point or (spectral-in-time) Newton-Krylov solver, one may enforce essentially arbitrary boundary conditions, such as non-local Robin type, with no additional cost to the user. This contrasts with traditional 'boundary bordering' approaches [14], for which significant analytical work is often required to derive the correct matrices to impose the boundary conditions (see below). These properties make the approach highly versatile.

4.1.
Pseudospectral method for the forward problem. As described in Section 2.2, we solve the forward problem using Chebyshev pseudospectral methods, in particular implemented in matlab using 2DChebClass [36,60]. The principal novelties of the method concern the computation of convolution integrals and the implementation of spatial boundary conditions; the boundary conditions in time will be discussed in the following section. This makes the method particularly well-suited to problems on finite, non-periodic domains in which the interaction term involves a convolution on a region with finite support. Such applications arise in diverse fields such as hard-sphere DDFT using Fundamental Measure Theory [60,68,75], and opinion dynamics [51].
As described in [60], the convolution integrals are computed in real space, in contrast to many implementations in which they are computed via Fourier transforms. The principal advantage of Fourier methods is that they are computationally cheap, requiring only fast Fourier transforms and multiplication of functions. The main disadvantage is that for finite, non-periodic domains, one needs to pad the domain, which both increases computational cost for no accuracy gain and introduces difficulties when applying boundary conditions. Convolution integrals, including those with bounded support, can be implemented by a single matrix-vector multiplication in the spatial method, with the matrix precomputed for all time steps. Use of the physical domain allows efficient implementation of the boundary conditions.
As is standard, after discretization, in this case through the use of (mapped) Chebyshev pseudospectral points, the forward PDE(s) are converted into a system of ODEs. For example, the diffusion equation becomes where ρ is a vector of values of the solution at each of the Chebyshev points, and D 2 is the Chebyshev second-order differentiation matrix. In the interior of the domain, this can be solved using standard time-stepping solvers for ODEs. The challenge lies in imposing the correct spatial boundary conditions. One standard approach is to modify the matrix on the right hand side of (4.1) so that the boundary conditions are automatically satisfied. This is known as 'boundary-bordering' [14]. For simple boundary conditions, such as homogenenous Dirichlet or (local) Neumann, such an approach is relatively straightforward. For example, for homogeneous Dirichlet conditions, assuming that the initial conditions satisfy the boundary conditions, it is sufficient to set the rows and columns of D 2 that correspond to points on the boundary of the domain to zero. For homogeneous Neumann, there is a similar approach (see [76]), which becomes more involved with more complex right-hand sides of the PDE. Another approach is to restrict the computation to interpolants (solutions) which satisfy the boundary conditions; we do not discuss this here as it is highly non-trivial for the non-linear, non-local problems that we are interested in.
Here we take a more general approach. The imposition of spatial boundary conditions can be seen as extending the discretized system of ODEs to a system of differential-algebraic equations, where the discretized PDE is solved on the interior of the domain, and the boundary conditions correspond to algebraic equations. There are various numerical methods for solving such differentialalgebraic equations, see e.g., [70] for a Runge-Kutta scheme with algebraic constraints, or [38] for a Newton-Krylov scheme which allows the inclusion of algebraic constraints alongside the PDE. The main advantage here is that the numerical method does not have to be explicitly adapted when one changes the boundary conditions; one simply has to specify different algebraic constraints that correspond to the boundary conditions. In fact, the 2DChebClass code automatically identifies the boundary of various geometries, allowing a simple implementation of this approach. One possible approach is to apply a backward Euler method for the time derivative in the state equation, with the adjoint operator applied to the adjoint equation, whereupon a huge-scale coupled system of equations is obtained from matrices arising at each time-step. These may be tackled using a preconditioned iterative method, following methodology in e.g., [61,62,71], but note that the systems considered were sparse whereas our systems are dense. As above, as well as boundary conditions in time, there are also boundary conditions in space. In contrast, in order to utilize our efficient and accurate forward solver, for our fixed-point approach we reverse time in the adjoint problem, resulting in a set of well-posed equations with initial conditions. For this approach, the forward and adjoint equations are coupled non-locally in time; the adjoint equation requires the value of the state variable at later times, so the two equations cannot be solved simultaneously. By contrast, the Newton-Krylov approach allows us to tackle state and adjoint equations simultaneously.

4.3.
Optimization solver. Now that we have presented an accurate and efficient numerical scheme for the solution of a set of PDEs which includes the forward and adjoint equations, the remaining challenges are to: (i) determine a suitable time discretization for the optimality system; (ii) choose a suitable optimization scheme. For (i), we again choose a Chebyshev pseudospectral scheme (1D in time), which, assuming that the solutions are smooth in time, leads to exponentially accurate interpolation; it is also the foundation of the spectral-in-time Newton-Krylov scheme presented in [38]. For (ii), we note that the choice of optimization solver depends strongly on the nature of the solution, and the amount of information available. We consider: (a) a general fixed-point or sweeping method [4,17], with an adaptive line search framework to determine a mixing rate [55], to solve the system of equations iteratively, which does not require the analytic computation of the Jacobian, and is also applicable to problems with box constraints as well as other systems for which the regularity of the solution is not sufficient to be exploited by the spectral-in-time nature of the Newton-Krylov approach; (b) a higher-order, more efficient Newton-Krylov scheme, which does require the computation of the Jacobian, and could potentially be more challenging to adapt to more general problems. For the fixed-point method, after applying the pseudospectral discretization, we require the solution of a system of algebraic-differential equations. As in Section 4.1, these can be solved using a standard DAE solver. In this paper, the matlab inbuilt ODE solver ode15s is used. However, our approach is highly modular and it is straightforward to replace our chosen solvers with any other optimization routine,including space or time discretization of the Newton-Krylov approach.
In the following, we denote the discretized versions of the variables ρ, q, and w by P , Q, and W , respectively. Each of these matrices is of the form A = [a 0 , a 1 , ..., a n ], where the vectors a k represent the solutions at the discretized times k ∈ {0, 1, ..., n}, where n is the number of time steps. In particular, the first column of P , denoted by ρ 0 , corresponds to the initial condition ρ( x, 0). If the spatial domain is one-dimensional, P , Q, and W are of size N × (n + 1), where N is the number of spatial points. In the two-dimensional case, P and Q are of size (N 1 N 2 ) × (n + 1), where N j is the number of spatial points in the direction of x j . The discretized control W for linear (source) control problems is also (N 1 N 2 ) × (n + 1) dimensional, while it is (2N 1 N 2 ) × (n + 1) dimensional for non-linear (flow) control problems.
4.4. Fixed-point, sweeping method. We first present a first-order fixed-point method, based on [4,17], modified to include a mixing rate which is standard in density functional theory problems of the type considered here [68]. We emphasize that this method is included for its simplicity and generality; we have also implemented a higher order Newton-Krylov method -see Section 4.5. The optimization algorithm is initialized with a guess for the control, W (0) . Then, in each iteration, denoted by i, the following steps are computed: 1. Starting with a guess for the control W (i) as input variable, the corresponding state P (i) is found by solving the state equation. g ; see Section 4.6. If E is smaller than a set tolerance, the algorithm terminates, otherwise we proceed to Step 5. 5. We update W (i+1) as a linear combination of the current guess W (i) , and the value obtained in step 3, W (i) g , employing a mixing rate λ ∈ [0, 1]: Typical values of λ, which provide stable convergence in the cases we study here, lie between 0.001 and 0.1. Note that, while the solutions P (i) and Q (i) change in each iteration, the initial condition ρ 0 and final time condition q n remain unchanged throughout the process; the updates are induced by changing W (i) . It is also feasible to vary the mixing rate λ, based on viewing W (i) as approximate solutions of a fixed-point problem: an unchanged numerical solution for the control variable w at successive iterates indicates that a solution of the PDE-constrained optimization problem has been found. Work in [55] proposes an adaptive line search framework that can determine λ which satisfies an Armijo-type condition and hence converge faster to a fixed-point of a system compared to using a constant mixing rate. Based on this, [41] uses a potential function E (i) defined as follows based on an iterative scheme: For the fixed-point scheme (4.2), we have that d (i) := W (i) g −W (i) . In this notation, W (i) (λ) coincides with that of the subsequent fixed-point iterate, W (i+1) . As in [41], we consider an adaptive mixing rate which at each iteration satisfies the minimization problem of E (i) over [0, 1]: Since the solution of (4.3) cannot be found easily in general, we may seek an approximate minimum which satisfies the Wolfe-type (or Armijo-Wolfe-type) conditions based on those of [79,80]. Specifically, for some δ, σ ∈ (0, 1) with δ < σ, we wish that It is possible that with only the Armijo-type condition (4.4) satisfied, the fixed-point algorithm would not achieve reliable convergence. Hence the condition (4.5), based on the curvature condition discussed in [59, Section 3.1], is used additionally to ensure that λ (i) is not too small and hence unacceptably short steps are ruled out. Note that for the iterative scheme being applied: Based on discussion in [50], we select δ = 0.3 and σ = 0.5 for our tests. The selection of mixing rate λ at each fixed-point iteration is then based on Algorithm 1, whilst also ensuring that λ ≥ 0.01. We set λ 0 = 0.2. We note that it would also be possible to test the classical (Armijo-Wolfe) conditions using the value of J (ρ, w) (see [79,80] and [59,Chapter 3]), but the fixed-point approach described here is computationally cheaper and is found to be effective for our problems.

Newton-Krylov method.
In addition to the first-order fixed-point method, we also wish to consider a higher-order, Newton-type method, with the aim of achieving satisfactory convergence in many fewer iterations than the fixed-point method. The usual disadvantage of such a method is that one typically needs to solve a number of very large linear systems of equations, unless we design a highly efficient discretization procedure. Further, the linear systems are certainly dense for the particle dynamics problems considered, due to the integral particle interaction terms in the problem.
To circumvent this key difficulty, and exploit the faster convergence achieved by higher-order optimization methods, we employ a recently devised Newton-Krylov method for PDE-constrained optimization problems [38] (see also [42,43] for more general descriptions of such methods), and tailor this to the problem at hand by efficiently describing the PDEs and the associated Jacobian on the discrete level, as well as solving the Newton system efficiently. We highlight that such a method has not previously been applied to PDE-constrained optimization problems which involve integral terms, or problems in which the control variable is applied non-linearly. We now briefly describe how the Newton-Krylov method may be applied to both flow control and source control problems. For both problems the state and adjoint equations may be described in the following general form (see [38]), by separating the spatial and temporal derivatives in each case: with the vector-valued functions u, v : [0, T ] → R N denoting the state and adjoint variables ρ and q evaluated at each Chebyshev point in the time variable, and u 0 corresponding to the initial condition ρ 0 ( x). The vector functions F and G arise from a method of lines discretization of the state and adjoint PDEs at each time-step, and correspond to the following spatial (derivative and linear) terms: F(t, u, v) ← ∇ 2 ρ + 1 β ∇ · (ρ 2 ∇q) + ∇ · (ρ∇V ext ) + ∇ r · I(ρ) + f for flow control, ∇ 2 ρ + ∇ · (ρ∇V ext ) + ∇ r · I(ρ) − 1 β q + f for source control, G(t, u, v) ← −∇ 2 q + 1 β ρ |∇q| 2 + ∇V ext · ∇q + I * (ρ, q) − ρ + ρ for flow control, −∇ 2 q + ∇V ext · ∇q + I * (ρ, q) − ρ + ρ for source control.
Note that the gradient equation (3.4) (for the flow control problem) or (3.6) (for the source control problem) has been substituted into the state and adjoint equations where applicable. Following the working in [38], we may then consider approximations u k , v k to u, v at the kth timestep t k , k ∈ {0, 1, ..., n}, and define Chebyshev interpolants u(t), v(t) based on these approximations. The residual functions: can then be approximated at each time-step, along with the exact imposition of initial/final-time conditions, to obtain the expressions: Here, r u,k and r v,k approximate r u (t k ) and r v (t k ), F k and G k denote the functions F and G evaluated at time t k , and Q = [q i,j ] i,j=1,...,n+1 is a (n + 1) × (n + 1) collocation matrix arising from cumulative integration. Based on this, we then wish to (approximately) solve R = 0, where the global residual function R : Applying Newton iteration for this problem leads to an iterative procedure of the form x (k+1) = x (k) − [J(x (k) )] −1 R(x (k) ), with J denoting the Jacobian matrix of the residual function R. Although Jacobian-free Newton-Krylov methods have been studied [43], we elect to form the blocks of the Jacobian matrix explicitly due to the availability of this information for the PDE systems under consideration, in order to achieve rapid convergence of the Newton scheme. This requires us to accurately form the functions F and G, as well as the derivatives of these functions in the directions u and v. In the code below, for the two-dimensional flow control problem with Dirichlet boundary conditions, these quantities are denoted JFu, JFv, JGu, JGv, and our software [2] allows us to compute these quantities to spectral accuracy: 1 % Definition of state and adjoint PDE operators 2 K1 = @(t,u,v) L + 2/bet * scalarOperator(dotVectors(grad * u,grad * v)) ...              In more detail, Dx1 and Dx2 are matrices applying spatial derivatives in each direction, with grad corresponding to the gradient function, and L the spectral discretization of the Laplacian operator. The function Conv applies a convolution integral with the function K, gradVextDotGrad applies an operator of the form ∇V ext · ∇, with LapVext evaluating ∇ 2 V ext to spectral accuracy. The function scalarOperator forms a scalar function, with dotVectors taking an inner product of two vectors, and dotVectorOperator similarly taking an inner product of the first argument with the second argument applied to a subsequent term. Finally, f and g describe the source term of the state equation f and the desired state ρ within the PDE operators.
For no-flux type boundary conditions the interior and boundary nodes need to be separated within the code, with the Jacobians defined separately for the boundary conditions. We refer to the open-source software [2] (which also makes use of [37]) for the full implementation with different boundary conditions, as well as for source control problems. By devising routines to compute all derivatives and integration terms to spectral accuracy for particle dynamics systems, we are able to achieve rapid Newton convergence for a range of problems. Having formed the appropriate terms of the Newton system at each iteration, these are solved inexactly using an inner Krylov method, specifically the Generalized Minimal Residual (GMRES) algorithm [69]. Column operations may be applied to J so that the leading 2N × 2N block of the Jacobian matrix is invertible, at which point the Kronecker-product based preconditioner described in [38, Section 2.3] may be applied.
We believe there are advantages to both the fixed-point and Newton-Krylov methods we have described above. For a range of problems the higher-order Newton-Krylov method is expected to yield more rapid convergence, due to the inclusion of Jacobian information, and the spectral-intime representation of the residual along with a pseudospectral discretization in space leads to an efficient solver. By contrast, the fixed-point method does not require the Jacobian matrix (or an approximation to it), and is likely to be applicable to more general problems such as those with additional algebraic constraints, which may have lower regularity and therefore may not be amenable to the spectral-in-time approximation. In Section 5 we carry out a number of experiments using both fixed-point and Newton-Krylov methods, to demonstrate and compare their effectiveness. 4.6. Measures of accuracy. All errors in Section 5 are calculated as a measure of the difference between a variable of interest, y, and a reference value y R , e.g., a previous value of W (i) , or an analytic solution to a test problem. The error measure E is composed of an L 2 error in space and an L ∞ error in time. We define absolute and relative L 2 spatial errors where the small additional term on the denominator prevents division by zero, which are used in the full error measure: The minimum between absolute and relative spatial error is taken to avoid choosing an erroneously large relative error, caused by division of one numerically very small term by another.  Table 5.1 Flow Control, No-Flux: Cost when w = 0 (Juc) and optimal control cost (Jc) for a range of κ and β.
We have benchmarked the fixed-point scheme against matlab's inbuilt fsolve function. The latter uses the trust-region-dogleg algorithm, see [65], to solve the optimality system of interest. While it is very robust, it is also much slower than the fixed-point method, which works reliably for the types of problems considered in this paper.

Numerical Experiments.
The optimal control problems (2.2) and (2.5) require inputs in terms of the desired state ρ, the PDE source term f , and the external potential V ext , alongside initial and final time conditions for ρ and q, respectively. Additionally, an initial guess for the control w is needed when using the fixed-point method. These are given for each of the examples below. We also require an interaction kernel, which here we fix as We note that quality of our results is robust with respect to the precise choice of the interaction kernel; the example here is chosen for illustration. Interest lies in how the solution to the optimization problems changes upon varying the interaction strength, κ. Here we consider three representative values: κ = 0 (no interaction), κ = −1 (attraction), and κ = 1 (repulsion). As a baseline for the cost, we solve the forward PDE using w = 0. We evaluate the associated cost functional J , the value of which is denoted by J uc . We then expect that applying the optimization method lowers the value of the cost functional, which we then aim to minimize by optimizing w, resulting in a cost J c . This cost depends on the value of the regularization parameter β and it is expected that the norm of the optimal control applied will increase with decreasing β. When an initial guess for the control is required, i.e., in the fixed-point method, we take w = 0, corresponding to the reference system. The Newton-Krylov solver requires an initial guess for the state and adjoint variables at all times. In the examples below, it suffices to choose the initial and final time conditions for the state and adjoint, respectively, as an initial guess at all time points.
In the following examples, the domain considered is Ω × (0, T ) = (−1, 1) d × (0, 1); our results are robust to changes in the domain. The numbers of spatial (Chebyshev) points are N 1 = N 2 = 20 for the two-dimensional examples, and N 1 = N 2 = N 3 = 20 for the three dimensional example. The number of time points is n = 11, unless stated otherwise. The absolute and relative tolerances of the ODE solver (matlab's ode15s [70]) for the forward problems are set to 10 −9 . The tolerance for the Newton-Krylov and fixed-point solvers are 10 −16 and 10 −4 , respectively. The mixing parameter λ for each iteration of the fixed-point method is determined using Algorithm 1, with δ = 0.3 and σ = 0.5.

Two-Dimensional Examples.
We now present four examples, applying no-flux and Dirichlet boundary conditions to both flow and source control problems. The precise initial condition, external potential, and target chosen in each case are given below. We recall that the two-body interaction is given by (5.1). For our first example (flow control with no-flux boundary conditions, Section 5.1.1), we show results using both Newton-Krylov and fixed-point solvers. However, since they produce very similar results, as further validated in Appendix A, for the remaining examples we restrict our results to the more efficient Newton-Krylov scheme.   where Z ≈ 1.3791 is a normalization constant.
In Table 5.1, the value of the cost functional for the initial configuration (J uc ), where w = 0, is compared with the optimized case (J c ) for different values of β and for each of the interaction strengths. As expected, in all cases J c ≤ J uc and the lowest values of J c occur for the smallest β values. For large values of β, applying control is heavily penalized and the optimal control approaches zero, which coincides with the uncontrolled case. The numerical solution takes between 200 and 500 seconds for the Newton-Krylov solver, and between 500 and 11500 seconds for the fixed-point solver.
The results (from the Newton-Krylov scheme) for β = 10 −3 and various interaction strengths, κ, are shown in Figures 5.1 and 5.2, which display the optimal states and controls, respectively. At earlier times, the density accumulates in regions with potential wells and the areas where the potential is large are avoided. It is clear that the control acts to drive the particle distribution towards the desired state. However, it does not act uniformly around the peak of the desired state, but rather acts strongly in the area between the location of the desired peak and the point (−1, 1). This is due to the external potential being large in this area, which requires more control to overcome. It is also evident that more control has to be applied to the repulsive particles, since the desired state requires the particles to accumulate in one part of the domain. In the attractive configuration, the effect of the attraction supports the control action, so less control is needed to reach the desired state.    Table 5.2 shows the difference E as defined in Section 4.6 between the solutions of the Newton-Krylov and fixed-point methods. Here we use N 1 = N 2 = 30 and n = 21 to ensure accurate solutions with the fixed-point method. As expected, the resulting plots of the density and control for the fixed-point method are very similar to those for the Newton-Krylov solver, and hence we do not show them here. Note that, when the non-local interaction term is turned off (κ = 0), the error is of order 10 −3 or better, which is reflective of the tolerance chosen for the fixed-point solver. Turning on the interaction term clearly introduces more complexity, and the two solvers disagree slightly more in terms of the optimized state and control.
The resulting costs are in Table 5.3. The solution of each example takes between 100 and 200 seconds. The desired state prescribes the density to move from one uniform bump in the middle of the domain, to accumulate in a steeper, elongated shape across the x 1 -axis. In this example, the effect  Table 5.3 Flow Control, Dirichlet: Cost when w = 0 (Juc) and optimal control cost (Jc) for a range of κ and β. For β = 10, the cost functionals differ by 10 −4 for κ = 0 and κ = 1, and by 10 −2 for κ = −1. For β = 10 3 , the cost functionals differ by 10 −7 for κ = 0, by 10 −6 for κ = 1, and by 10 −4 for κ = −1.
of the different interaction strengths and the external potential on the control is clearly visible; see Figure 5.5. The external potential is large on the left side of the domain, which naturally drives the density away from this region, so that most effort of the control variable is concentrated on the right side of the domain. However, for attractive particles, additional control must be applied in the right side of the domain, since the attractive particles oppose the density spreading out along the x 1 -axis. For these attractive particles, the initially clumped density is a more 'natural' state; see Figure 5.4. In contrast, for repulsive particles, most of the work of the control is done to push the particles together. These controls can be seen in Figure 5.5. 5.1.3. Linear (source) control problem with no-flux boundary conditions. We consider problem (2.5) with no-flux boundary conditions (2.6). The chosen inputs for this example are  Table 5.4 Source Control, No-Flux: Cost with w = 0 (Juc) and with the optimal control (Jc) for a range of κ and β. Note that for β = 10, the cost functionals differ by 10 −4 , while for β = 10 3 they differ by 10 −7 .  Table 5.5 Source Control, Dirichlet: Cost with w = 0 (Juc) and with the optimal control (Jc) for a range of κ and β. Note that for β = 10 the cost functionals differ by 10 −5 , while for β = 10 3 they differ by 10 −7 for κ = 0 and κ = 1, and by 10 −8 for κ = −1.
The resulting costs for different β and κ are in Table 5.4. We note that the solution of each example takes between 100 and 200 seconds, apart from the β = 10 −5 case which takes around 25 seconds.
In Figure 5.6 we show (for β = 10 −3 ) the optimal states for different interaction strengths. Since β is small, the optimal state is very close to the desired state ρ (not shown). We can observe clear effects on the optimal state and the control from the external potential V ext . Since V ext is large around x 2 = 1, more control has to be applied in this area to force the density towards ρ. It can also be seen that the state is slightly asymmetric because of this effect, despite ρ being symmetric.
The effect of the different interaction strengths on the state can be observed in Figure 5.6, by inspecting the shape of the particle distribution. The desired state ρ prescribes higher density near the two corners (−1, 1) and (1, 1). Without control or an external potential, repulsive particles accumulate on the boundary of the domain, whilst attractive particles favour the centre of the domain. Hence in this example, where the target density is higher near the boundary, less control needs to be applied for repulsive particles. For attractive particles, the accumulated particles are arranged in a rounder shape, while the repulsive particles are more spread out, as would be expected from their interactions.   Note, in particular, that the external potential is time dependent. Since it decays over time, this results in the strongest effect of V ext being visible at earlier times. The resulting costs for different β and κ can be seen in Table 5.5. The solution of each example takes between 70 and 100 seconds, apart from the β = 10 −5 case which takes around 25 seconds.
We show the optimal state for β = 10 −3 and varying κ in Figure 5.7, which is once again close to the target state, ρ, irrespective of the interaction strength. The corresponding optimal controls are shown in Figure 5.8. Since the external potential is large around the bottom half of the domain, the density is not centred in the middle of the domain, but shifted slightly upwards. At the same time it can be observed that at t = 0.1, the control is applied where the external potential is steep. At later times, the control is mostly applied where the density is prescribed to accumulate approximately in the form of the desired state ρ, which is at the left half of the domain. While the qualitative behaviour of the control is similar in each case, it can be seen that less control has to be applied for attractive particles compared to repulsive ones, since the attraction causes the particles to clump together, which supports the shape of the desired state ρ.

Three-Dimensional Example.
Our final example is three-dimensional, featuring the non-linear flow control problem (2.2) with no-flux boundary conditions (2.4). This has been chosen as an illustrative example in three dimensions since it is both the most challenging combination of control type and boundary conditions and also the most physically relevant for our applications. The chosen inputs are This example is only run for β = 10 −3 , due to a running time of approximately 35 hours per problem. This is a simple consequence of the 'curse of dimensionality'. The effect of the different interaction strengths is clearly displayed in Figure 5.9 and is particularly obvious in earlier times of the particle evolution. It is evident that attractive particles enhance the control in pushing the density into a cluster in the middle of the domain, as prescribed by the desired state ρ, while more control is needed for a similar effect in the repulsive setup. We get, for κ = 0, J c = 0.0078. This can be compared to J uc = 0.0195 from the computed forward problem with w = 0. For κ = 1 we obtain J c = 0.0102, compared to J uc = 0.0232 in the uncontrolled case, and for κ = −1 we have J c = 0.0059, with J uc = 0.0477. As expected, the optimal control leads to a cost which is significantly lower than in the uncontrolled case.
6. Concluding Remarks. We have derived an accurate and efficient algorithmic strategy for solving the first-order optimality conditions arising from PDE-constrained optimization problems, along with additional integral terms, describing multiscale particle dynamics problems. Our approach, linked to the DDFT approach applied to (non-optimized) systems in statistical mechanics, applies a pseudospectral method in space and time, and utilizes fixed-point and Newton-Krylov schemes within the optimization solver. This novel methodology is more general in scope than existing numerical implementations for similar problems, and exhibits the substantial computational benefits of applying such methods for non-local, non-linear systems of PDEs. Numerical tests indicate the potency of our approach for a range of examples, boundary conditions, and problem parameters. An open-source software implementation of our methodology is available at [2]. There are many possible extensions to our approach: for instance, one may apply our methodology to problems where the misfit between state and desired state is measured at some final time only, models with different cost functionals, and boundary control problems. Furthermore, methods of this type may be tailored to specific particle dynamics applications, in fields such as opinion dynamics, flocking, swarming, and optimal control problems in robotics, and such applications will be tackled in future work.

Solver
Error β = 10 −5 β = 10 −3 β = 10 −1 β = 10 1 β = 10 3  solve a problem with a known exact solution. The methods are also compared for the flow control problem with no-flux boundary conditions in Section 5.1.1; see Table 5.2 in particular. The example here is also of this form; the differences here are that there are no interactions, and we include an additional source term such that the problem has an analytic solution.
The results for this example can be seen in Table A.1, which displays the error of the computed state and adjoint variables using the Newton-Krylov solver in reference to the exact solution after 10 iterations, with n = 10 and N 1 = N 2 = 20 points, and the fixed-point solver after 1 iteration having been supplied with the exact solution for w as an initial guess, when using n = 21 and N 1 = N 2 = 30 points. Each example takes between 38 and 88 seconds to run for the fixed-point solver, while it takes 80 seconds to solve the exact problem with the Newton-Krylov algorithm. It should be noted however that the Newton-Krylov solver requires an initial guess for ρ and q at all time points, while the fixed-point solver requires an initial guess for w at all times. The Newton-Krylov initial guess at all time points consists of the initial and final time conditions ρ 0 and q T , while the initial guess for the fixed-point algorithm is w ex . The errors E (as defined in Section 4.6) displayed in Table A.1 are small for both methods. However, the errors made by the Newton-Krylov solver are several orders smaller than those of the fixed-point algorithm, demonstrating the superiority of the higher-order method over the first-order method. It is therefore the natural choice for solving the optimization problems considered in this paper, although the accuracy achieved by the fixed-point scheme is sufficient to retrieve good results in cases where the Newton-Krylov scheme cannot easily be applied, such as in optimal control problems with box constraints.