Dynamic Programming for Finite Ensembles of Nanomagnetic Particles

We use optimal control via a distributed exterior field to steer the dynamics of an ensemble of N interacting ferromagnetic particles which are immersed into a heat bath by minimizing a quadratic functional. By using dynamic programing principle, we show the existence of a unique strong solution of the optimal control problem. By the Hopf-Cole transformation, the related Hamilton-Jacobi-Bellman equation from dynamic programming principle may be re-cast into a linear PDE on the manifold M = (S^2)^N, whose classical solution may be represented via Feynman-Kac formula. We use this probabilistic representation for Monte-Carlo simulations to illustrate optimal switching dynamics.


Introduction
We control a ferromagnetic N -spin system which is exposed to thermal fluctuations via an exterior forcing u := (u 1 , . . . , u N ) : [0, T ] × Ω → (R 3 ) N . A relevant application includes data storage devices, for which it is crucial to control the dynamics of the magnetization m := (m 1 , . . . , m N ) : [0, T ] × Ω → (S 2 ) N in order to ensure a reliable transport of data which are represented by those magnetic structures. Besides the control H ext,i (u i ) = C ext u i , the i-th spin of the ensemble with magnetization m i is exposed to different forces H ani,i (m i ), H d,i (m i ), and H exch,i (m): • the anisotropic force H ani,i (m i ) = Am i for some A ∈ R 3×3 diag , which favors alignment of m i with the given material dependent easy axis e ∈ R 3 , as the effective field. The dynamics of m for times [0, T ] is then governed by the following SDE system (1 ≤ i ≤ N ): (1.2) Here, W = (W 1 , . . . , W N ) is a (R 3 ) N -valued Wiener process on a filtered probability space (Ω, F, {F t } t , P) satisfying the usual conditions to represent thermal fluctuations from the surrounding heat bath. The leading term in the drift part in (1.2) causes a precessional motion of m i around H eff,i (m, u), while the dissipative second term scaled by α > 0 favors a timeasymptotic alignment of m i with H eff,i (m, u). The Stratonovich type of the stochastic integral of the diffusion term ensures that each state process m i takes values in S 2 ; see e.g. [1] for further details.
Our aim is to find a control process u * such that the solution process m * from (1.2) approximates a given deterministic profile m ≡ ( m 1 , . . . , m N ) ∈ L 2 0, T ; (S 2 ) N . More precisely, we aim to solve the following problem. Problem 1.1. Let the parameters δ, ν, α ≥ 0, and T, λ > 0, N ∈ N as well as h ∈ C 2 (S 2 ) N ; R be given. Let Ω, P, F, {F t } 0≤t≤T be a given stochastic basis with the usual conditions, and W be a {F t } 0≤t≤T -adapted (R 3 ) N -valued Wiener process on it. Find a pair 1 (m * , u * ) ∈ L 2 {Ft} Ω; C [0, T ]; (S 2 ) N × L 2 (0, T ; (R 3 ) N ) which minimizes the cost functional subject to (1.2). We call such a minimiser a strong solution of the optimally controlled Landau-Lifschitz-Gilbert equation.
In [5], some of the present authors have studied Problem 1.1 in its weak form and constructed a weak optimal solution Π * := Ω * , P * , F * , {F * t } 0≤t≤T , m * , u * , W * of the underlying problem via Young measure (relaxed control) approach for a compact set U ⊂ (R 3 ) N , a control space, such that 0 ∈ U, which may be generalized to the case U = (R 3 ) N thanks to the coercivity of the cost functional with respect to u; see also [4] for an extension to infinite spin ensembles. To approximate it numerically, implementable strategies may be developed that rest on Pontryagin's maximum principle which characterizes minimizers. In [5], a stochastic gradient method is proposed to generate a sequence of functional-decreasing approximate feedback controls, where the update requires to solve a coupled forward-backward SDE system. A relevant part here is to simulate a (time-discrete) backward SDE via the least-squares Monte-Carlo method, which requires significant data storage resources [4,5], and thus limits the complexity of practically approachable Problems 1.1.
In this work, we use an alternative strategy which rests on the dynamic programming principle. This allows us to prove the existence of a unique strong solution of Problem 1.1, which sharpens results of [5]. Since the solution of the underlying SDE lies on M = (S 2 ) N , the Hamilton-Jacobi-Bellman equation is defined on the manifold [0, T ] × M. In Section 3 we verify that the value function is the unique solution of that Bellman equation and that it belongs to C 1,2 [0, T ] × M . To solve this semi-linear PDE by deterministic numerical strategies seems non-accessible due to the high dimension of the underlying manifold M; we also want to avoid a direct probabilistic representation of its solution which would involve a backward SDE. Indeed, we demonstrate how the nonlinear HJB equation may be replaced with a linear parabolic PDE (3.9) by applying the Hopf-Cole transformation. The quadratic form resp. linearity of the control in the cost functional resp. in the equation (1.2) together with the geometric character of the problem then lead to an isotropic quadratic term in the HJB equation (3.8), which is crucial for this transformation. The regularity of the value function and the optimal policy mapping is explicitly expressed through the regularity of the terminal condition h. Furthermore, the solution w of the linear parabolic PDE can now be represented via a Feynman-Kac formula. This is the starting point for the numerical scheme proposed in Section 5. To approximate the optimal pair (m * , u * ) numerically, a Monte-Carlo method for the solution w of the linear equation (3.8) and its tangential gradient ∇ M w is proposed, from which the optimal feedback functionū can be obtained directly via (3.12). To approximate ∇ M w through a difference quotient with needed accuracy, we choose a stencil diameterh = O 1/ √ M for a sufficiently large number of Monte-Carlo realizations M ; see Remark 6.1. Importantly, this approach does not require larger data storage resources as [4,5] does, but an ample calculation of iterates from related SDEs. Computational studies for the switching dynamics of single and multiple ferromagnetic particles are reported in Section 6.

The stochastic Landau-Lifschitz-Gilbert equation
The solution process m of (1.2) attains values in M = (S 2 ) N . For any For any m m m = (m 1 , . . . , Then, using σ σ σ also here, and combining H ani, For m i ∈ S 2 , one has where P(m i ) is the orthogonal projection onto the tangent plane of S 2 at m i . Note that the diffusion term ν m i (s) × (•dW i (s)) in (1.2) can be re-written as ν σ(m i (s)) • dW i (s). To state the dynamic programming equation, we introduce a family of stochastic Landau-Lifschitz-Gilbert equations with different initial times t ∈ [0, T ] and states m m m ∈ M: where f is defined in (2.1). The solutions m = m t,m m m of (2.3) thus depend on t and m m m; however, we shall drop the superscript of m t,m m m in the subsequent text for the ease of notation. For every 0 ≤ t ≤ T , m(t) = m m m, and u ∈ L 2 {Fs} Ω; L 2 (t, T ; (R 3 ) N ) , there exists a unique strong solution m = (m 1 , . . . , m N ) ∈ L 2 {Fs} Ω; C(t, T ; (R 3 ) N ) of (2.3). Indeed by considering the truncation of the control and then using the stochastic version of the Arzela-Ascoli theorem, Prohorov's lemma and Jakubowski-Skorokhod representation theorem (cf. [13]), we have existence of a weak solution of (2.3). Moreover, an application of Gyöngy-Krylov's characterization of convergence in probability introduced in [11] along with pathwise uniqueness of weak martingale solutions gives existence of a unique strong solution, see [4,Appendix]. Furthermore, by applying Itô's formula to the functional x → x 2 R 3 for m i and using the vector identity a, a × b = 0 for any a, b ∈ R 3 , we have P-a.s., Since m i (t) ∈ S 2 , we see that P-a.s., each m i is S 2 -valued, and thus m ∈ L 2 {Fs} Ω; C(t, T ; (S 2 ) N ) . recalling that the Itô integral is a martingale.

Dynamic programming and HJB equation
For any (t, m m m) ∈ [0, T ] × M, we consider problem (2.3) to now construct the associated Hamilton-Jacobi-Bellman equation, following the formal rules of dynamic programming. We then use the Hopf-Cole transformation to replace the nonlinear HJB equation by a linear PDE and show the existence of a unique classical solution, which then implies existence of a unique classical solution of the original nonlinear HJB equation. Next, we present a verification theorem which shows that the value function is indeed equal to the solution of the HJB equation. We describe the optimal control through an optimal feedback function which is written explicitly in terms of the value function.

Let us define the Lagrangian
where the parameters δ, λ are given in Problem 1.1.
Let Ω, P, F, {F s } t≤s≤T be a given filtered probability space satisfying the usual hypotheses, and W is a {F s } t≤s≤T -adapted (R 3 ) N -valued Wiener process on it. We denote by U s s s m m m [t, T ] the set of all admissible pairs (m, u) such that u ∈ L 2 {Fs} Ω; L 2 (t, T ; (R 3 ) N ) , and m(·) is the unique {F s } t≤s≤T -adapted M-valued strong solution of (2.3). In fact, the superscript s s s refers to the fact that we search an optimal admissible pair on the given filtered probability space. It follows for admissible (m, u) that L m(·), u(·) ∈ L 1 {Fs} Ω; L 1 (t, T ; R) and h(m(T )) ∈ L 1 F T (Ω; R), recalling that h is a continuous function on a compact manifold. As a further consequence, Dynkin's formula (2.6) then holds for all ψ ∈ Note that in the definition of H, the letter m m m stands for the identity map m m m → m m m. Using (2.1), we evaluate the supremum analytically. We have for the tangential gradient ∇ M ψ where * denotes the convex conjugate function, and Q is the 3N × 3N matrix given by Then, in summary, we have We point out that (3.6) ensures that the quadratic term in the Hamiltonian is isotropic, which is crucial for the Hopf-Cole transformation below. We now state the Hamilton-Jacobi-Bellman equation, whose solution we denote by W : The Hopf-Cole transformation: The HJB equation (3.8) is a second-order nonlinear PDE on a high-dimensional domain and therefore without further understanding of its structure computationally expensive to solve; the study of its wellposedness as well as the regularity of its solution is non-trivial. We use the Hopf-Cole transformation w = exp −β W , β ∈ R given below, to substitute (3.8) by the linear PDE (3.9).
We span the tangent space of M at any point m m m by the orthonormal tangent vectors It is convenient to conceptually let ∂ i,1 , ∂ i,2 span the local coordinates associated to the ith sphere. Then we have the following relations: for j ∈ {1, 2} and i ∈ {1, 2, · · · , N }, We see that Therefore, we have the following correspondences: to obtain a cancellation of the quadratic nonlinearity via (b). Substituting the respective terms in (3.8) and multiplying by −β w = 0, we arrive at the following second-order linear equation The following theorem shows that weak solutions w may be examined in the Sobolev-Bochner space is the unique classical solution of the Bellman equation (3.8).
Proof. The existence of a unique solution w of (3.10) in W 1,2,2 [0, T ]; H 1 (M), H −1 (M) is for instance given in [19, p. 224]. Using charts of M and a partition of unity to represent (3.10) locally on the flat space, and an application of parabolic regularity theory ensures that w belongs to C 1,2 [0, T ] × M . This implies that the nonlinear HJB equation (3.8) has a classical solution W . Moreover, the above construction of the Hopf-Cole transformation directly ensures that a function w ∈ C 1,2 [0, T ]×M is a classical solution of (3.9) if and only if W of (3.11) solves (3.8) classically, which then implies the existence of a unique classical solution W , given by (3.11), of the HJB equation (3.8).
It is easy to see that additional smoothness of the terminal condition h directly translates into additional regularity of w and W .
3.2. The verification theorem. In this subsection, we show that V = W , i.e., the value function of the optimal control problem is equal to the solution of the above HJB equation (3.8). The following verification theorem also provides an explicit formula for the optimal control, which inherits the smoothness of ∇ M W .
Proof. The proof is divided into two steps.
Step 1: and therefore one has Because W is a smooth classical solution, we may substitute ψ by W , in which case the left-hand side of (3.14) vanishes. In other words, Indeed, the existence of a classical solution W avoids a more complicated construction to arrive at a bound like (3.15), which would be necessary in a setting with viscosity solutions. Applying now Dynkin's formula (2.6) with W in place of ψ and recalling the final time conditions, we conclude from (3.15), Since W is a C 1,2 -solution of (3.8), we see thatū is continuously differentiable and bounded on Since W ∈ C 1,2 [0, T ] × M , by the mean-value theorem, either applied in combination with Whitney's extension theorem [6, Section 6.5] in the ambient space (R 3 ) N or directly to the geodesics of M, we have Thus, Now, on the given stochastic basis (Ω, P, F, {F s } t≤s≤T ) and for the {F s } t≤s≤T -adapted Brownian motion W(s), the stochastic differential equation Recalling that by Theorem 3.1 the HJB equation (3.8) has a unique solution W , we conclude from the above that V = W . Now we show the uniqueness of the optimal admissible pair (m * , u * ), and thus in particular of the strong solution of Problem 1.1. We remark that the uniqueness of the optimal admissible pair is not automatically provided from the uniqueness of solutions of the HJB equation, which instead was used to characterize the value function through the differential equation (2.3). T ] be an optimal pair. Similar to [20, Chapter 5, Theorem 5.1], we note that then (3.16) holds as equality with (m * , u * ) in place of (m, u). This implies that also (3.13) holds with equality for a.e. s ∈ [t, T ], P-a.s. Hence (m * , u * ) and thus every optimal pair satisfies for a.e. s ∈ [t, T ], P-a.s.
Remark 3.1. In [5], P-a.s. the orthogonality of an optimal state and control was shown both theoretically and numerically. In our approach, we also have P-a.s. the orthogonality of m * and u * . Indeed, by using the vector identity a, a × b = 0 for any a, b ∈ (R 3 ) N , and the fact that For its computational evidence, see Figure 3 for an ensemble of N = 3 particles.

Probabilistic representation of the value function
To solve the linear PDE (3.9) numerically by deterministic methods is still demanding since it is posed on M ⊂ (R 3 ) N . Therefore, we choose a probabilistic representation of the solution of (3.9) which requires to solve the following forward stochastic differential equation, defined on a given stochastic basis (Ω, P, F, {F s } t≤s≤T ) with 3N -dimensional Brownian motion W, We note that because of the linearity of (3.9) the Feynman-Kac representation can be used in place of a backward SDE.

4.1.
A numerical scheme for (4.1). To approximate the solution m m m of (4.1), we use the semi-implicit method proposed in [16]. Note that (4.3a)-(4.3b) is a system of linear equations, leading to short simulation times. Furthermore, the numerical schemes ensures m m m j takes values on M. This is exploited when applying arguments from [1] in this finite ensemble setting to conclude convergence of iterates {m m m j } J j= +1 towards a weak solution of (4.1) for τ → 0.

HJB solution.
The classical solution w of (3.9) is given by (4.2). In order to approximate it, and hence the classical solution W of the nonlinear HJB equation  Thus, we simulate the transformation functionū(t , m m m ) from (3.12) by using one of the above methods, where the sequence (t , m m m ) is described in Subsection 4.1.

Optimal state.
We use again the semi-implicit method proposed in [16] to approximate the solution in ( in (4.3a) resp. (4.3b) is replaced by a i m j ,ū(t j , m j ) resp. a i m j + e e e j 2 ,ū t j , g j such that the iterates {m j } J j= converge towards a weak solution of (2.3) for τ → 0. Moreover, the iterates {u j =ū(t j , m j )} J j= defines the discrete optimal control along I J . In summary, we have the following algorithm to compute the optimal solution and control along with Method B. (V) Define g := (VI) For i = 1, . . . , N do: For l = 1, 2, 3 do: (4) Based on g := g g g, compute g g g + h,i,l and g g g − h,i,l as in (5.2). using {+ξ ξ ξ j (ω k )} J j= resp. {−ξ ξ ξ j (ω k )} J j= with initial condition m m m = g g g.
(d) Compute H(t , g g g + h,i,l , ω k ) resp. H(t , g g g − h,i,l , ω k ) in ( to determine d l,i H(t , g g g, ω k ) in (5.5) and store it. (6) Approximate E[d l,i H(t , g g g)] in (5.5) via Monte-Carlo estimation along with the variance reduction method of antithetic variates. (VII) Set ∇ M w(t , g g g) ≈ d 1 H(t , g g g), . . . , d N H(t , g g g) with d i H(t , g g g) as in (5.5), and computē u(t , g ) as in (3.12), and m +1 via scheme (4.3b) with a i g ,ū(t , g ) .

Computational Experiments
We computationally study the behavior of the optimal state and control for the switching dynamics of an ensemble of N particles by using the algorithm from Section 5. For this purpose, we employ discretely distributed random numbers from the GNU Scientific Library [9]. All computations are performed on an Intel Core i5-4670 3.40GHz processor with 16GB RAM in double precision arithmetic. The arising linear algebraic systems are solved by the Gaussian elimination method [9]. 6.1. Test studies. We start with some test problems to compare the two methods from Subsection 5.2. For this purpose, we omit certain energy contributions in (1.1), and allow only one or two spins such that an exact solution of (3.9) becomes available. Moreover, we have the explicit formula for ∇ S 2 w(t, m m m), and hence for ∇ S 2 W (t, m m m): Letū exct (t, m m m) resp.ū app (t, m m m) be the function defined in (3.12) associated to (6.2) resp. (4.2), and m * exct (t) resp. m * app (t) be the solution of (2.3) withū exct t, m * exct (t) resp.ū app t, m * app (t) . By denoting the error Test problem 2: We study the interaction of two isotropic (D = 0) spins for α = 0 = δ, and all other parameters are equal to 1. Let us first recall how the spherical harmonics on a single sphere S 2 generalizes to the manifold M = (S 2 ) N = (S 2 ) 2 most naturally. Indeed, because M is a tensor product of spheres, and the spherical harmonics form an orthogonal basis on the single sphere, the tensor products of spherical harmonics form an orthogonal basis on M. It is therefore reasonable to expect that the simplest meaningful test problems on (S 2 ) 2 can be constructed with terminal time conditions which are products of low-order spherical harmonics and which are eigen-functions of the transformed Bellman equation. Because of the spin interaction, the first order coefficient b in (3.9) does not vanish any more. Therefore, we combine the functions with the positive semi-definite matrix J: for any µ > 0, We compute the tangential gradient of the functions w 00,00 , w 01,00 and w 00,01 : We observe that the error err(t) for the both test problems 1 and 2 is of the same magnitude as the error made in the approximation of ∇ M w (hence ∇ M W ).
Optimal control of two interacting isotropic spins. In the setting of test problem 2 we next study the time evolution of a single trajectory of the optimal state, as well as the magnitude and direction of the optimal control. In this case, the trajectory of the optimal control lies in x 1 -x 2 plane to balance the random influences; see Figure 2.
Remark 6.2. Computational studies for both test problems suggest stability of the scheme (4.3a)-(4.3b). However, convergence resp. termination of the scheme depends crucially on the given parameters in Problem 1.1. For choices λν 2 min{δ, 1}C 2 ext (1 + α 2 ), an exponential overflow occurs during truncation in simulations, and therefore the computed value of w t, m m m in (4.2) is set to zero then. Hence, in this case, log w(t, m m m) is not defined, and thus the approximation procedure to approximate ∇ M W (t, m m m) terminates. This is the reason that Examples 5.1 and 5.2 from [5] may not directly be simulated here. Notice that no exponential overflow occurs for both test problems above, since δ = 0. 6.2. Optimal control of three interacting spins. We now study an ensemble of N = 3 particles, which additionally are subjected to exchange forces. We are mainly interested in the switching control for one (i = 2) of these particles fromm 2 (at initial time) to −m 2 at given  Figure 2. Test problem 2: time evolution of a single trajectory of the optimal state t → m * i (t) (red), the direction of the optimal control t → u * i (t) u * i (t) −1 R 3 (blue), and the magnitude of the optimal control t → u * i (t) 2 where the deterministic target profile m : [0, T ] → (S 2 ) 3 is given by We use again Method B to approximate ∇ M W . To simulate the optimal pair of the underlying problem, we have used the methodology described in Subsections 4.1 and 5.1-5.3, along with the following set-up of parameters: In this case, the minimum value of the cost functional is J * sto ≡ 0.9078. Though the first and third spins start already at the desired state, it is due to the noise, and the exchange forces in particular, that the optimal control is acting on the whole time interval and on all spins. For the second spin, we observe that at the beginning and end, less control is needed opposed to the applied control at the mean time; see Figure 3. The orthogonality of the optimal pair (m * , u * ) (e.g. Remark 3.1) is shown in Figure 3, (G)-(I) by displaying the temporal evolution 6.3. Optimal control of four interacting spins. We consider here the switching control for an ensemble of N = 4 particles.
Set-up 1: We use the parameters as in Subsection 6.2 withm = e 1 , −e 1 , e 1 , −e 1 , and m(t) = e 1 , m 2 (t), e 1 , m 2 (t) . In this case, the first and third spins start already at the desired state; the associated optimal controls are acting on the whole time interval. Moreover, for the second and fourth spins, significant controls are required to approximately meet the terminal state profile. The time evolution of t → u * 2 (t) 2 R 3 is similar to the results for N = 3 spin constellations (see Figure 3, (E)), while u * 4 (t) 2 R 3 is delayed in time for the fourth spin. We (i) t → u * 3 (t), m * 3 (t) Figure 3. Time evolution of a single trajectory of the optimal state t → m * i (t) (red), the direction of the optimal control t → u * i (t) u * i (t) −1 R 3 (blue), the magnitude of the optimal control t → u * i (t) 2 R 3 , and the angle between optimal pair t → u * i (t), m * i (t) for i = 1, 2, 3.
observe a loop of the orientation of u * i (t) u * i (t) −1 R 3 (i = 2, 4) close to the terminal time; see Figure 4. Set-up 2: We use same parameters as in set-up 1 withm = e 1 , −e 1 , −e 1 , e 1 and m(t) = e 1 , m 2 (t), m 2 (t), e 1 . For the second and third spins, significantly synchronous controls at mean times are required to meet approximately the desired target profile. Like in Set-up 1, we also observe the formation of loops of the orientation of u * i (t) u * i (t) −1 R 3 (i = 2, 3) close to the terminal time; see Figure 5. (h) t → u * 4 (t) 2 R 3 Figure 4. Time evolution of a single trajectory of the optimal state t → m * i (t) (red), the direction of the optimal control t → u * i (t) u * i (t) −1 R 3 (blue), and the magnitude of the optimal control t → u * i (t) 2 R 3 with set-up 1 for i = 1, 2, 3, 4. (h) t → u * 4 (t) 2 R 3 Figure 5. Time evolution of a single trajectory of the optimal state t → m * i (t) (red), the direction of the optimal control t → u * i (t) u * i (t) −1 R 3 (blue), and the magnitude of the optimal control t → u * i (t) 2 R 3 with set-up 2 for i = 1, 2, 3, 4.  (e) t → u * 7 (t) 2 R 3 Figure 6. A single realization of the optimal state m * (red) at final time T , the time evolution of the optimal state t → m * i (t) (red), the direction of the optimal control t → u * i (t) u * i (t) −1 R 3 (blue), and the magnitude of the optimal control t → u * i (t) 2 R 3 for i = 3, 7.
To simulate the optimal pair of the underlying problem, we take again D = 0, J as in (6. In Figure 6, (A), we visualize the behavior of the optimal state m * . Due to the large damping coefficient α = 1.0, we observe fast switching dynamics of the optimal state. With the choice λ = 1 the control is penalised more strongly than in the previous experiments, which has a noticeable effect on the magnitude of u * , compare Figures 5 (E) -(H) and 6 (C) & (E). At the beginning a stronger control is applied to move towards the desired target profile. Because of the large noise intensity ν, and the less control, some particles of this ensemble do not reach the target profile appropriately. For illustration, we plotted the behavior of the optimal state t → m * i (t) (red), the direction of the optimal control t → u * i (t) u * i (t) −1 R 3 (blue), and the magnitude of the optimal control t → u * i (t) 2 R 3 for i = 3, 7; see Figure 6.