Analysis and Optimal Velocity Control of a Stochastic Convective Cahn–Hilliard Equation

A Cahn–Hilliard equation with stochastic multiplicative noise and a random convection term is considered. The model describes isothermal phase-separation occurring in a moving fluid, and accounts for the randomness appearing at the microscopic level both in the phase-separation itself and in the flow-inducing process. The call for a random component in the convection term stems naturally from applications, as the fluid’s stirring procedure is usually caused by mechanical or magnetic devices. Well-posedness of the state system is addressed, and optimisation of a standard tracking type cost with respect to the velocity control is then studied. Existence of optimal controls is proved, and the Gâteaux–Fréchet differentiability of the control-to-state map is shown. Lastly, the corresponding adjoint backward problem is analysed, and the first-order necessary conditions for optimality are derived in terms of a variational inequality involving the intrinsic adjoint variables.

n · ∇ϕ = n · ∇μ = 0 in (0, T) × ∂O, where O is a smooth bounded domain in R^d, d = 2, 3, T > 0 is a fixed final time, and n denotes the outward unit normal vector on ∂O. The system (1.1)-(1.4) models isothermal phase-separation occurring in a moving fluid occupying the space region O during the time interval [0, T]. The order parameter, or phase-variable, ϕ represents the relative concentration between the pure phases, the variable μ represents the chemical potential of the system, and the nonlinearity Ψ′ : R → R is the derivative of a double-well potential Ψ with two global minima. The term u is an external random velocity field acting on the system, modelling possible stirring and mixing processes of the fluid which may affect phase-separation itself. The stochastic forcing describing the thermal fluctuations affecting phase-separation is modelled by means of a cylindrical Wiener process W on a given probability space and a W-integrable coefficient B, possibly depending on the phase variable itself, which calibrates the intensity of the noise.
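For orientation, the system referred to as (1.1)-(1.4) can be sketched as follows. This is a reconstruction from the verbal description above, not a verbatim copy of the displayed system: the symbol \Psi for the double-well potential, the symbol \varphi_0 for the initial datum, and the unit mobility are assumptions.

```latex
% Sketch of the state system (1.1)-(1.4).
% Assumed (not taken from the text): the names \Psi and \varphi_0, and unit mobility.
\begin{align}
  \mathrm{d}\varphi - \Delta\mu\,\mathrm{d}t + u\cdot\nabla\varphi\,\mathrm{d}t
      &= B(\varphi)\,\mathrm{d}W && \text{in } (0,T)\times O, \tag{1.1}\\
  \mu &= -\Delta\varphi + \Psi'(\varphi) && \text{in } (0,T)\times O, \tag{1.2}\\
  n\cdot\nabla\varphi = n\cdot\nabla\mu &= 0 && \text{in } (0,T)\times\partial O, \tag{1.3}\\
  \varphi(0) &= \varphi_0 && \text{in } O. \tag{1.4}
\end{align}
```

In this form, the convection term u · ∇ϕ is the only place where the velocity enters, and the noise acts on the phase variable through B(ϕ) dW.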
The Cahn-Hilliard equation is a classical model employed in phase-separation, and has nowadays numerous applications to physics, biology, and engineering. Its introduction dates back to the pioneering work by Cahn and Hilliard (1958), where it was proposed, in the deterministic version, to adequately describe spinodal decomposition in binary metallic alloys. In the last decades, the model has been extensively refined in several directions. For example, the description of possible viscous behaviours has been originally presented in Elliott and Stuart (1996), Elliott and Songmu (1986), Novick-Cohen (1988), and then generalised in Gurtin (1996). The presence of a further evolution close to boundary due to the interaction with the hard walls has been accounted for by proposing several choices of dynamic boundary conditions, for which we refer to (Fischer et al. 1997;Kenzler et al. 2001;Gal 2012). The deterministic Cahn-Hilliard equation has been proven to be extremely effective in describing phase-separation phenomena. Nevertheless, it presents some drawbacks. Indeed, the phase-separation process inevitably presents some disruptions, acting at a microscopic level. These are due to unpredictable movements at the atomistic level, which may be caused, for example, by temperature oscillations, magnetic effects, or configurational interactions. As such, the classical Cahn-Hilliard system is unable to capture the erratic nature of the separation process. The most natural way to overcome this problem is to switch to a random setting instead, by introducing a suitable noise term in the equation that could effectively describe the unpredictability of the phenomenon at a small scale. This was proposed by Cook (1970) for Wiener-type noises and gave rise to the well-known Cahn-Hilliard-Cook stochastic model for phase-separation. 
The stochastic version of the model was then confirmed multiple times (Binder 1981; Pego 1989) to be the only one that can genuinely describe phase-separation in alloys. Since then, the random version of the equation has been increasingly studied, both in the physics literature (Elder et al. 1988; Grant et al. 1985; Langer et al. 1975; Milchev et al. 1988) and in the direction of model validation and numerical simulations (Blömker et al. 2001, 2008, 2016; Hawick 2008, 2010; Hawick and Playne 2010; Lee et al. 2014).
The classical Cahn-Hilliard equation is the gradient flow of the associated free energy functional with respect to the metric of H^1(O)^*. The gradient term in the free energy penalises the oscillation of the order parameter, while the double-well potential models the tendency of each phase to concentrate. The form of the chemical potential in (1.2) then appears naturally from the differentiation of the free energy. Typical examples of Ψ are given by the logarithmic potential (1.5) and the polynomial potential (1.6), namely Ψ_pol(r) := (1/4)(r^2 − 1)^2, r ∈ R. Although (1.5) is the most relevant choice in terms of thermodynamical consistency, its singular behaviour at ±1 can be hard to tackle from the mathematical viewpoint, and in several models the polynomial approximation (1.6) is employed instead. The velocity field u models the transport effects due to convection acting on the system. In our analysis, this will be a prescribed external forcing field which will play the role of velocity control in a typical optimisation problem. Optimisation involving phase-separating fluids where the velocity is the control arises naturally in applications. For example, this is the case of block solidification of silicon crystals in photovoltaic applications. Here, the flow of the fluid acts as a control to optimise the distribution of certain impurities, at the atomistic level, in a process of solidification of silicon melt. For more details about the applications of optimal velocity control problems in phase-separating fluids, we refer to (Kudla et al. 2013; Rocca and Sprekels 2015). In practice, the motion of the fluid can be achieved in several ways: as pointed out in Colli et al. (2018a) and Rocca and Sprekels (2015), the most common choices consist in employing either mechanical stirring devices or ultrasound emitters directly in the container. Another possibility is to prescribe a velocity on the fluid by means of magnetic fields: this is widely employed, for example, in the case of molten metals (Kudla et al. 2013) or bulk semiconductor crystals.
Nevertheless, it is worthwhile noting that in all these scenarios, the velocity field is usually obtained in an indirect way, meaning that the motion of the fluid is achieved only as a consequence of more direct controls, such as mechanical devices or magnetic effects. This being noticed, it is clear then that the external prescription of a given velocity is strongly affected by microscopic noises, which may be caused, depending on the type of motion-inducing devices, by configurational or electromagnetic disturbances occurring in the flow-creating process. Also, the effective induction of the flow is strongly affected by the imprecision of the above-mentioned devices.
From the modelling point of view, this strongly calls for the introduction of a further source of randomness in the velocity field u and for abandoning the classical deterministic setting of the problem. Let us stress that the random component of the velocity field prescinds from the stochastic nature of the noise in equation (1.1): while the Wiener process W models microscopic turbulences occurring in phase-separation, the random nature of u takes into account the imprecision of the flow-inducing mechanisms. For example, in typical situations u would satisfy a further stochastic equation involving a further Wiener process, independent of W . Clearly, this extra equation would specifically depend on the model in consideration: here, in order to make the treatment as general and light as possible, we only require u to be a stochastic process. Let us point out that this choice implies that the microscopic fluctuations in u coming from a possible further noise are not taken into account explicitly here. Indeed, the box constraint for the controls (see Sect. 2 below) only requires some general measurability and integrability conditions on u, and does not prescribe any specific requirement on the microscopic fluctuations of u. To fix the ideas, the reader can naturally think about focusing only on macroscopic controls, e.g. controls which are C 1 in time and W 1, p in space, and neglecting thus the microscopic turbulence in u. Here, since the methodology can be directly adapted to more general controls, we preferred to consider a broader class of admissible controls, for sake of mathematical generality.
The importance of allowing the control variable to be random is crucial when dealing with a controlled stochastic equation (see, for example, Yong and Zhou 1999). Indeed, bearing in mind the typical perspective of Monte Carlo simulations, restricting to deterministic controls would mean to choose a priori a control which is independent of the possible outcomes of the evolution according to the prescribed underlying probability space. By contrast, stochastic controls ensure more freedom from the point of view of the controller, as they allow to adapt the control to the random outcomes of the phenomenon itself. With this in mind, in our analysis u will be a prescribed stochastic process satisfying some natural box-constraints, possibly taking into account the random imprecision of the velocity-inducing devices. The model that we study presents then two main sources of randomness: the first one is given by the Wiener noise in equation (1.1), taking into account the microscopic turbulence affecting phase-separation, and the second one is the stochastic component of the convection term, modelling the imprecision of the stirring procedure. Hence, one can think the two random forcings as acting on two separate levels: a microscopic scale described by W , and a different uncorrelated scale rendered by u.
The mathematical literature dealing with the Cahn-Hilliard equation is extremely developed. In the deterministic case, attention has been widely devoted to the study of well-posedness, regularity, long-time behaviour of solutions, and asymptotics. Due to the considerable size of the literature, we prefer to quote the detailed overview by Miranville (2019) and the references therein for completeness. Let us only point out the contributions (Colli et al. 2014; Cherfils et al. 2011; Gilardi et al. 2009) dealing with well-posedness and (Colli et al. 2015a, b, 2016; Hintermüller and Wegner 2012) in the direction of distributed and boundary control problems. Possible relaxations and asymptotics of the Cahn-Hilliard equation have been recently studied in Bonetti et al. (2017, 2018, 2020), Colli and Scarpa (2016), and Scarpa (2019a), also with nonlinear viscosity terms.
In the stochastic case, the original contribution dealing with the Cahn-Hilliard equation is (Da Prato and Debussche 1996), on the existence of mild solutions in the case of polynomial potentials. Further studies have then been carried out in the works (Cornalba 2016; Elezović and Mikelić 1991), again in the polynomial setting, and in Scarpa (2018, 2020) in the case of more general potentials in a variational framework. The stochastic Cahn-Hilliard equation with logarithmic potential has been studied in Debussche and Zambotti (2007), Debussche and Goudenège (2011), and Goudenège (2009) in relation to reflection measures; the case of degenerate mobility has also been studied. In the context of phase-field modelling with stochastic forcing, it is worthwhile mentioning the contributions (Antonopoulou et al. 2016; Feireisl and Petcu 2019a, b), as well as (Bauzet et al. 2017; Bertacco 2020; Orrieri and Scarpa 2019) on the stochastic Allen-Cahn equation. In the direction of optimal control, we point out (Scarpa 2019b), dealing with a distributed optimal control problem of the stochastic Cahn-Hilliard equation, and the recent work (Orrieri et al. 2020) on a stochastic phase-field model for tumour growth.
Concerning specifically the Cahn-Hilliard equation with convection, in the deterministic case well-posedness has been studied in Colli et al. (2018a) under general choices of dynamic boundary conditions and in Porta and Grasselli (2015) in a local version with reaction terms, while some related optimal velocity control problems have been analysed in Colli et al. (2018b, 2019), Rocca and Sprekels (2015), and Zhao and Liu (2013, 2014). Also, the relationship between the behaviour of the convection term and phase-separation has been analysed in the recent work (Feng et al. 2020): here, the authors show that if the velocity field is sufficiently mixing, then no phase-separation occurs, and the solutions of the respective advective Cahn-Hilliard equation converge exponentially to a homogeneous mixed state instead. This may have important connections to related optimal control problems with a target distribution at a final time: in particular, the above-mentioned result makes the optimisation problem meaningful also when the final target state is not necessarily separated, but is a homogeneous mixed state. Also, it points out how powerful the action of the convection term is on the phase-separation, and motivates the study of phase-optimisation problems where the control is the velocity itself. The convective Cahn-Hilliard equation has also been considered in coupled systems, with a further equation for the velocity field: this is the case, for example, of Cahn-Hilliard-Navier-Stokes systems, studied in Abels (2009) and Frigeri et al. (2016, 2019, 2020). By contrast, despite its strong relevance in application to stochastic optimal velocity control, the stochastic convective Cahn-Hilliard equation has not been analysed yet. The only results available in the stochastic setting deal with coupled systems, for example in the context of stochastic Cahn-Hilliard-Navier-Stokes models (Deugoué and Medjo 2018a, b; Medjo 2017).
This paper constitutes a first contribution to optimal velocity control for the stochastic convective Cahn-Hilliard equation.
The literature on stochastic optimal control is also quite extensive: for a general overview we refer to the monograph (Yong and Zhou 1999). Stochastic optimal control is also studied in Fuhrman et al. (2012, 2013, 2018), Fuhrman and Orrieri (2016), and Guatteri et al. (2017) in the context of the heat equation and reaction-diffusion systems.
For completeness, we refer also to the works (Du and Meng 2013; Lü and Zhang 2014) concerning the stochastic maximum principle. Relaxation of the optimality conditions has been addressed in Brzeźniak and Serrano (2013) and Barbu et al. (2018) for dissipative SPDEs and the Schrödinger equation, respectively. Deterministic optimal control problems of stochastic reaction-diffusion equations have been analysed in Stannat and Wessels (2019).
Let us now describe the main points that will be addressed in this work. First of all, we concentrate on the well-posedness of the state-system (1.1)-(1.4), where the control u is arbitrary but fixed. Using a Yosida approximation on the nonlinearity and a time-regularisation on the velocity field, we show existence and uniqueness of solutions by means of variational techniques and stochastic compactness arguments. Thanks to monotone analysis tools, we are able to cover very general potentials, not necessarily of polynomial growth. Also, we prove continuous dependence of the variables with respect to the control, and this allows us to define a suitable control-to-state map S : u ↦ (ϕ, μ). Secondly, we focus on the optimisation problem, which consists in minimising a tracking-type cost functional J, subject to the state-system (1.1)-(1.4) and the constraint that u is an admissible control, meaning that u ∈ U_ad, with U_ad being a suitable bounded, closed subset of the space of p-integrable progressively measurable processes with values in L^3(O)^d. The cost involves a running target ϕ_Q and a final target ϕ_T, together with nonnegative weights α_1, α_2, α_3. Cost functionals in this form arise very naturally from applications. Roughly speaking, the optimisation problem amounts to identifying the optimal way of stirring and mixing the fluid in such a way that the state variable ϕ is as close as possible to the running target ϕ_Q during the evolution and to the final target ϕ_T at the end of the evolution, without wasting too much energy in inducing the flow u. As anticipated above, a typical example that we have in mind appears in the solidification process of silicon crystals in the context of industrial photovoltaic applications (Kudla et al. 2013; Rocca and Sprekels 2015). Here, a certain mixture of impurities needs to be moved by convection from within the silicon melt to its boundary, in order to refine the quality of the final silicon block.
The flow u of the fluid behaves then as a control on the silicon melt in order to make the relative distribution of impurities ϕ be close enough to some prescribed targets. In particular, the final target distribution ϕ T of impurities can be seen here as concentrated on the boundary and diluted in the interior. Analogous applications arise more generally in optimal distribution problems of melting materials: the local distribution of some substance contained in the separating fluid is optimised close to some desired targets by inducing a flow in the material itself.
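In formulas, a quadratic tracking-type cost of the kind just described could take the following form. This is only a sketch under assumptions: the factors 1/2, the symbol Q := (0,T) × O for the space-time cylinder, and the precise norm used for the control term are not taken from the text.

```latex
% Sketch of a quadratic tracking-type cost.
% Assumed: the factors 1/2, the symbol Q, and the L^2-type penalisation of u.
\[
  J(\varphi, u) :=
    \frac{\alpha_1}{2}\,\mathbb{E}\int_Q |\varphi - \varphi_Q|^2
  + \frac{\alpha_2}{2}\,\mathbb{E}\int_O |\varphi(T) - \varphi_T|^2
  + \frac{\alpha_3}{2}\,\mathbb{E}\int_Q |u|^2 ,
  \qquad Q := (0,T)\times O .
\]
```

The first two terms measure the distance of the state from the running and final targets, while the third one accounts for the energy spent in inducing the flow.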
The starting point in the analysis consists in addressing existence of optimal controls. This is one of the main differences with respect to the deterministic optimal control problem. Indeed, in the deterministic setting existence of optimal controls follows with no particular effort from the direct method of calculus of variations, since one is able to obtain enough compactness from the well-posedness of the state system and the boundedness of the set of admissible controls. By contrast, in the stochastic case these uniform estimates on the minimising sequence of controls do not ensure enough compactness in probability, due to the stochastic nature of the problem itself. Also, classical stochastic tools that are usually employed to bypass this problem, such as the well-known criterion à la Gyöngy-Krylov, do not work here: this is due to the non-uniqueness of optimal controls, which is caused by the highly nonlinear nature of the minimisation problem. To overcome this issue, we propose instead a relaxed notion of optimality, which may be considered as optimality in law, i.e. requiring that the stochastic basis and the Wiener process are part of the definition of optimal control themselves. This technique mimics the definition of probabilistically weak solution for stochastic evolution equations, and has been employed in other settings such as (Barbu et al. 2018;Orrieri et al. 2020). In this framework, we prove existence of relaxed optimal controls, and we show that when one restricts the attention only to deterministic controls, then it is possible to get existence in the classical (probabilistically strong) sense.
We then move to the study of the differentiability properties of the control-to-state map S. More specifically, we prove that S is Gâteaux and Fréchet differentiable between suitable Banach spaces. This is done by showing well-posedness of the so-called linearised system, obtained from (1.1)-(1.4) by formally differentiating with respect to u, and by carefully proving that the unique linearised solution actually coincides with the derivative of S. This will allow us to characterise explicitly, thanks to the chain rule in Banach spaces, the derivative of the reduced cost functional J ∘ S, so that the optimisation problem can be seen only in terms of the control u. Consequently, it is possible to obtain a first rudimentary version of necessary conditions for optimality, by imposing the classical first-order variational inequality D(J ∘ S)(u) ≥ 0 on a given optimal control.
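To illustrate the chain-rule step, suppose that the cost has the quadratic form written in the comment below and that the admissible set is convex (both assumptions). Writing \bar u for an optimal control, \bar\varphi for the corresponding state, and \theta_h for the derivative of the first component of S at \bar u in the direction h (hypothetical notation), one would formally obtain:

```latex
% Assumed cost (a sketch, not taken verbatim from the text):
%   J(\varphi,u) = \frac{\alpha_1}{2}\mathbb{E}\int_Q |\varphi-\varphi_Q|^2
%                + \frac{\alpha_2}{2}\mathbb{E}\int_O |\varphi(T)-\varphi_T|^2
%                + \frac{\alpha_3}{2}\mathbb{E}\int_Q |u|^2 .
% Chain rule for the reduced cost J \circ S at \bar u in the direction h:
\[
  D(J\circ S)(\bar u)[h]
  = \alpha_1\,\mathbb{E}\int_Q (\bar\varphi - \varphi_Q)\,\theta_h
  + \alpha_2\,\mathbb{E}\int_O \big(\bar\varphi(T) - \varphi_T\big)\,\theta_h(T)
  + \alpha_3\,\mathbb{E}\int_Q \bar u\cdot h .
\]
% First-order condition on a convex admissible set: test with h = v - \bar u.
\[
  D(J\circ S)(\bar u)[v-\bar u] \;\ge\; 0
  \qquad \text{for every admissible control } v .
\]
```

This version of the condition still depends on the linearised variable \theta_h, which is what the adjoint problem is later used to eliminate.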
The last part of the paper aims at refining the first version of necessary conditions, by removing any explicit dependence on the linearised variables. This is done by introducing and studying a suitable adjoint problem, which is formally related to the dual problem of the linearised system. The adjoint problem consists of a backward-in-time stochastic partial differential equation, and its analysis is the most challenging point of the work. The first main difficulty is indeed the backward nature of the equation: although this is not a great limitation in deterministic problems, in the stochastic case it calls for the introduction of an extra variable, in order to preserve adaptedness of the processes in play, and requires different analytical techniques such as martingale representation theorems. The second and most crucial difficulty depends instead on the nonlinear nature of the system. Indeed, the presence of the nonlinear term Ψ″(ϕ) and the dual structure of the equation prevent us from obtaining uniform estimates directly on the adjoint system. Consequently, well-posedness cannot be obtained classically by tackling the adjoint problem straightaway, and a different idea is needed. In this regard, we use a duality method. We consider a more general version of the linearised system, where an arbitrary forcing term is added, and we show that this is well posed and that the solutions depend continuously on the forcing term. Then, we prove that such a system is in duality with the adjoint problem that we want to study, and this allows us to recover by comparison some first uniform estimates on the adjoint variables. This tool is extremely powerful, as it allows us to bound the adjoint variables without even working on the adjoint system itself: the main intuition behind this is that the linearised system is usually much simpler to study, and the duality between the linearised and adjoint systems allows us to "transfer" uniform bounds on the solutions from one problem to the other.
Once these first crucial estimates are obtained, using classical techniques we are then able to prove well-posedness of the adjoint problem. Lastly, the duality relation is employed to refine the first-order conditions for optimality and to write them as a variational inequality only depending on the intrinsic adjoint variables.
The main novelty of the work is the presence of two sources of randomness in equation (1.1), accounting for noises both in the phase-separation process and in the flow-inducing procedure. As interesting as it may be from the applied point of view, certainly this novel framework does not come without effort on the mathematical side. Indeed, let us stress that the fact that u is assumed to be a stochastic process, and not a deterministic function, causes several non-trivial issues in estimating the solutions: this is due to a lack of satisfactory computational tools of Gronwall type in the genuinely pure stochastic case. Such difficulties are evident especially in the study of the forward problems, i.e. in the state system (1.1)-(1.4) and in the corresponding linearised system. Here, the idea is to argue instead combining carefully the Hölder inequality and several iterative patching arguments, in order to avoid applying the Gronwall lemma, which does not work. In the adjoint problem, the situation is slightly better: we will show that the backward nature of the equation allows indeed to use a very general and recent backward-in-time version of the stochastic Gronwall lemma (see Lemma 6.1).
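One way to see why Gronwall-type arguments become delicate when the velocity is random is the following heuristic. It is an illustration only and is not taken from the text: the functions y and a and the constant y_0 are hypothetical, with a standing for a coefficient depending on some norm of u.

```latex
% Classical Gronwall lemma (pathwise): y continuous, a >= 0 integrable, y_0 >= 0 constant.
\[
  y(t) \le y_0 + \int_0^t a(s)\,y(s)\,\mathrm{d}s \quad \forall\, t\in[0,T]
  \qquad\Longrightarrow\qquad
  y(t) \le y_0 \exp\Big(\int_0^t a(s)\,\mathrm{d}s\Big) \quad \forall\, t\in[0,T].
\]
% Deterministic a: taking expectations preserves the structure, and the lemma applies to E y.
\[
  \mathbb{E}\,y(t) \le y_0 + \int_0^t a(s)\,\mathbb{E}\,y(s)\,\mathrm{d}s .
\]
% Random a: the expectation does not factor, so the inequality for E y is no longer closed.
\[
  \mathbb{E}\,y(t) \le y_0 + \mathbb{E}\int_0^t a(s)\,y(s)\,\mathrm{d}s ,
  \qquad
  \mathbb{E}\big[a(s)\,y(s)\big] \neq \mathbb{E}\big[a(s)\big]\,\mathbb{E}\big[y(s)\big]
  \ \text{in general}.
\]
```

In stochastic estimates, expectations typically have to be taken in order to control the stochastic integral, so the pathwise form is not directly available; moreover, a pathwise bound would involve the exponential of a random quantity, whose moments need not be finite under mere p-integrability of the coefficient.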
We conclude by summarising here the structure of the paper. Section 2 contains the description of the setting of the work, the precise assumptions, and the main results that we prove. In Sect. 3, we prove well-posedness of the state-system, while Sect. 4 focuses on the existence of optimal controls. Then, in Sects. 5 and 6, we study the linearised system and the adjoint system, respectively. Finally, in Sect. 7, we prove the two versions of the first-order conditions for optimality.

Setting and Assumptions
In this section, we specify the general setting, notation, and assumptions of the work. We then present the main results of the paper.
Let (Ω, F, (F_t)_{t∈[0,T]}, ℙ) be a filtered probability space satisfying the usual conditions, where T > 0 is a fixed final time, and let W be a cylindrical Wiener process on a separable Hilbert space K. For convenience, let us fix now once and for all a complete orthonormal system (e_j)_j of K. The progressive σ-algebra on Ω × [0, T] is denoted by 𝒫.
As far as notation is concerned, the dual of a given real Banach space E is denoted by E^*, and the duality pairing between E^* and E is denoted by ⟨·, ·⟩_{E^*,E}. Weak convergence in E and weak* convergence in E^* will be denoted by the symbols ⇀ and ⇀*, respectively. For q ∈ [1, +∞], the symbols L^q(Ω; E) and L^q(0, T; E) denote the usual Bochner spaces of random variables and of functions from [0, T] to E, respectively. For spaces of stochastic processes, we use the notation L^{q_1}_𝒫(Ω; L^{q_2}(0, T; E)) to further specify that measurability is also intended with respect to the progressive σ-algebra 𝒫. In the case that q > 1 and E is separable, we explicitly set L^q_w(Ω; L^∞(0, T; E^*)) as the dual space of L^{q/(q−1)}(Ω; L^1(0, T; E)), which we recall can be characterised (Edwards 1965, Thm. 8.20.3) as the space of weak*-measurable random variables y : Ω → L^∞(0, T; E^*) with finite q-moment in Ω. Finally, if E_1 and E_2 are separable Hilbert spaces, we use the notation ℒ^2(E_1, E_2) for the space of Hilbert-Schmidt operators from E_1 to E_2.
In the proofs, the symbol c denotes a generic positive constant, whose value depends on the structure of the problem and may be updated from line to line.
Let O ⊂ R^d (d ≥ 2) be a smooth bounded domain. We use the classical notation for the space-time cylinder over O. The outward unit normal vector on the boundary ∂O is denoted by n. We introduce the functional spaces H, V_1, V_2, and V_3, endowed with their natural norms ‖·‖_H, ‖·‖_{V_1}, ‖·‖_{V_2}, and ‖·‖_{V_3}, respectively. We identify H with its dual, so that the inclusions among these spaces and their duals are continuous and dense. For all y ∈ V_1^*, we use the notation y_O := (1/|O|) ⟨y, 1⟩ for the spatial mean of y, and define the corresponding subspaces of zero-mean elements, denoted by an additional subscript 0, as in V_{1,0} and V^*_{1,0}.

Let us recall that the variational formulation of the Laplace operator with Neumann conditions is a well-defined linear operator, and its restriction to V_{1,0} is an isomorphism onto the space V^*_{1,0}. Its inverse N : V^*_{1,0} → V_{1,0} is the resolvent operator associated with the abstract elliptic problem on O with homogeneous Neumann conditions, meaning that for all y ∈ V^*_{1,0} the element z := N y ∈ V_{1,0} is the unique solution with null mean to the Neumann problem with datum y. As a consequence of the Poincaré-Wirtinger inequality, it is immediate to check that N induces an equivalent norm on V_1^*. In particular, a compactness inequality follows.

We introduce the space of velocity fields, where the divergence is intended in the sense of distributions on O. The space of velocity controls u that we focus on is denoted by U. Let us note that this includes as a special case the choice of deterministic controls, which has also received a strong mathematical interest on its own: see, for instance, Stannat and Wessels (2019). Indeed, one can consider the subspace of U consisting of deterministic controls. The following assumptions on the problem will be in force throughout the paper.

A1:
Let us point out that the classical polynomial double-well potential Ψ_pol satisfies these assumptions with γ = 2. Nonetheless, by also allowing the smaller values γ ∈ [1, 2], we are able to include possibly more singular potentials, such as first-order exponentials. We set β : r ↦ Ψ′(r) + C_Ψ r, r ∈ R: then β : R → R is a C^2 nondecreasing function; hence, it can be identified with a maximal monotone (single-valued) graph in R × R. Let us also denote by β̂ : R → [0, +∞) the convex lower semicontinuous function associated with β. Moreover, we prescribe a further condition in case of multiplicative noise.
Let us note that in case of additive noise B ∈ ℒ^2(K, V_1), these conditions are trivially satisfied for all γ ∈ (1, 2] if d = 2 and for all γ ∈ [3/2, 2] if d = 3: in particular, the classical polynomial case in dimension two and three is always covered. In the genuine multiplicative noise case, i.e. when B is not constant in V_1, we also suppose that B is ℒ^2(K, V_{1,0})-valued: this amounts to requiring that the noise is conservative, in the sense that it preserves the mean ϕ_O of the phase-variable. A direct consequence is the conservation of mass, which is a fundamental feature of Cahn-Hilliard-type evolutions. This hypothesis on the noise is very classical and natural in the literature, and relevant multiplicative choices of B satisfying it can be given explicitly. It is not difficult to show that such an example allows for all values of γ ∈ [1, 2] in every space-dimension d = 2, 3.
In the context of the optimal velocity control, it will be useful to introduce a polynomial-growth assumption on Ψ. This will be necessary only in the study of the optimisation problem, and is not needed for the well-posedness of the state system.

C1
it holds that γ = 2 in A1, together with a polynomial-growth condition on Ψ. Such a requirement is very natural in the Cahn-Hilliard context, since it is satisfied by the classical polynomial double-well potential Ψ_pol of degree 4.
The first main result of the paper states existence and uniqueness of strong solutions, and their continuous dependence with respect to the velocity field.
for all s ∈ (0, 1/2), and such that the variational formulation of the state system holds for every t ∈ [0, T], ℙ-almost surely. Furthermore, there exists a constant K > 0, only depending on the structure of the problem, such that for all u ∈ U, the respective solution (ϕ, μ) satisfies a corresponding uniform estimate. Lastly, if also C1 holds, then the further estimate (2.4) holds.

Once the analysis of well-posedness of the state system has been addressed, we can turn our attention to the optimal velocity control problem. As far as the controls are concerned, we consider classical box-constraints on the velocity controls, by defining the set of admissible controls U_ad through a bound involving a prescribed constant L > 0. The prescription of a box-constraint on the admissible controls is classical on the mathematical side. In applications, the constant L is typically related to the maximum capacity of the flow-inducing devices that convey the velocity field. It will be useful to introduce an enlarged bounded open set Ũ_ad in U containing U_ad. Analogously, we introduce the corresponding sets of admissible deterministic controls. The cost functional J that we study is of quadratic tracking-type, where α_1, α_2, α_3 are non-negative constants with α_1 + α_2 + α_3 > 0 and the targets ϕ_Q and ϕ_T are fixed. The optimal velocity control problem consists in the following:

(CP) minimise the cost functional J with the constraints that u belongs to U_ad and ϕ is the unique corresponding solution component to the state system (1.1)-(1.4).
By virtue of the well-posedness result, Theorem 2.1, the control-to-state map S is well defined. With this notation, we can state the exact definition of optimal control as follows. As anticipated, we also give some relaxed notions of optimality, one based on the concept of optimality-in-law and the other obtained by minimising only over the deterministic controls.
Definition 2.3 An optimal control for (CP) is an element u ∈ U_ad that minimises the reduced cost functional over U_ad. A deterministic optimal control is defined analogously, with the minimisation restricted to the admissible deterministic controls. A relaxed optimal control for (CP) is a family consisting of a filtered probability space, whose filtration satisfies the usual conditions, a K-cylindrical Wiener process W′ on it, targets α_1 ϕ′_Q ∈ L^2_𝒫(Ω′; L^2(0, T; H)) and α_2 ϕ′_T ∈ L^2(Ω′, F′_T; H) that have the same laws as α_1 ϕ_Q and α_2 ϕ_T, respectively, and an admissible control u′ that satisfies the corresponding minimality condition on the new stochastic basis.

Our first result in the analysis of the optimisation problem (CP) concerns the existence of optimal controls. It is worthwhile noting that, due to the non-uniqueness of optimal controls, in the genuinely stochastic case one can only show existence of relaxed optimal controls: this is typical in highly nonlinear stochastic optimal control problems, see, for example, (Barbu et al. 2018; Scarpa 2019b). By contrast, we show that deterministic optimal controls always exist.
Theorem 2.4 Assume A1-A3. Then, there exist a relaxed optimal control u and a deterministic optimal control u_det for problem (CP).
Once existence of minimisers for (CP) is proved, we can turn to the main focus of the work, i.e. the investigation of necessary conditions for optimality. The first main step in this direction is the study of the differentiability of the control-to-state map S, along with the characterisation of its derivative through the analysis of the linearised state system. This will allow us to obtain a first version of the first-order conditions for optimality by means of a suitable variational inequality involving the derivative of the reduced cost functional. In this direction, we introduce the assumptions C2: the map B : V 1 → L 2 (K , H ) is of class C 1 . Let us point out that this implies together with A3 that ‖D B(y)ζ‖ L 2 (K ,H ) ≤ C B ‖ζ‖ H for all y, ζ ∈ V 1 . Moreover, let us stress that this requirement is very natural, and it is satisfied, for instance, in the relevant example described in A3, provided that one replaces H ))), and it holds that This is a refinement of assumptions C1-C2 and ensures, as we will see, better differentiability properties for S. Still, C3 is satisfied by the polynomial potential Ψ pol and the relevant noise coefficient described in A3, provided that one replaces The linearised system can be formally obtained by differentiating the state system (1.1)-(1.4) with respect to the control u in a given direction h ∈ U, and reads The next result ensures exactly that the linearised system (2.6)-(2.9) is well posed in a suitable variational sense, and that the unique solution to (2.6)-(2.9) coincides with the derivative of the control-to-state map S at the point u along the direction h.
Theorem 2.5 Assume A1-A3, C1-C2, and p > 3. Then, for all u ∈ U ad and h ∈ U, there exists a unique pair (θ h , ν h ), with such that, for every t ∈ [0, T ], P-almost surely, Furthermore, the control-to-state map S 1 is Gâteaux-differentiable in the following sense: for all u ∈ U ad and h ∈ U, as δ → 0, it holds that Moreover, if p ≥ 7 and C3 holds, then S 1 is also Fréchet-differentiable as a map The second step in the analysis of necessary conditions for optimality consists in studying the so-called adjoint system and in proving a suitable duality relation with respect to the linearised system. The adjoint system can be formally obtained as the dual system of (2.6)-(2.9), and reads Let us point out that the adjoint system is backward in time: due to the stochastic framework of the problem, this necessarily requires the introduction of the additional variable Z in view of the classical martingale representation theorems. The situation here is then much more complex than the deterministic one: the variable of the adjoint system is indeed the couple (P, Z ), with P̃ being an auxiliary variable. Due to the difficulty of the analysis of the adjoint system, we will need to require more regularity on the targets, namely C4: p ≥ 6 and it holds that The next result ensures that the adjoint system (2.10)-(2.13) is well posed in a suitable variational sense, and states a duality relation between (2.6)-(2.9) and (2.10)-(2.13).
Theorem 2.6 Assume A1-A3, C1-C2, and C4. Then, for all u ∈ U ad , setting ϕ := S 1 (u), there exists a triplet (P, P̃, Z ), with such that, for every t ∈ [0, T ], P-almost surely, Furthermore, the solution components ∇ P, P̃, and ∇ Z are unique in the spaces At this point, we are finally ready to state the necessary conditions for optimality: more specifically, we present here two different versions. The first one is deduced directly from the characterisation of the derivative of S 1 in Theorem 2.5, and consists of a variational inequality depending also on the linearised variables. The second one is a refinement of this, as it employs the adjoint problem and only depends on the intrinsic adjoint variables (P, P̃, Z ), not on the linearised ones.
Theorem 2.7 Assume A1-A3, C1-C2, and p ≥ 6. If u ∈ U ad is an optimal control for (CP) and ϕ := S 1 (u) is its respective optimal state, then (2.14) where θ v−u is the unique first solution component of the linearised system (2.6)-(2.9) with the choice h := v − u, in the sense of Theorem 2.5.
Theorem 2.8 Assume A1-A3, C1-C2, and C4. If u ∈ U ad is an optimal control for (CP) and ϕ := S 1 (u) is its respective optimal state, then (2.15) where ∇ P is the uniquely determined solution component of the adjoint system (2.10)-(2.13) in the sense of Theorem 2.6. In particular, if α 3 > 0, then u is the orthogonal projection of −(1/α 3 ) ϕ∇ P on the closed convex set U ad in the Hilbert space L 2 P (Ω; L 2 (0, T ; H d )).
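The projection statement at the end of Theorem 2.8 follows from the variational inequality by a standard Hilbert-space argument. The lines below are only a sketch: the displayed form of (2.15) is inferred here from the projection formula and should be read as an assumption, and the shorthand $\mathcal H$ for the control Hilbert space is introduced purely for illustration.

```latex
% Assumed form of (2.15), inferred from the projection statement in Theorem 2.8:
\[
  \mathbb{E}\int_Q \bigl(\alpha_3 u + \varphi\nabla P\bigr)\cdot(v-u) \;\ge\; 0
  \qquad \forall\, v\in\mathcal U_{ad}.
\]
% Set \mathcal H := L^2_{\mathcal P}(\Omega; L^2(0,T;H^d)) (illustrative shorthand).
% If \alpha_3 > 0, dividing by \alpha_3 and changing sign gives
\[
  \Bigl(-\tfrac{1}{\alpha_3}\,\varphi\nabla P - u,\; v-u\Bigr)_{\mathcal H} \;\le\; 0
  \qquad \forall\, v\in\mathcal U_{ad},
\]
% which is the usual characterisation of u as the orthogonal projection of
% -\alpha_3^{-1}\varphi\nabla P onto the closed convex set \mathcal U_{ad} in \mathcal H.
```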

Remark 2.9
Let us comment on the necessary condition for optimality. When handling the optimisation problem in practice, the main role of condition (2.15) is to restrict the class of possible candidates for optimal controls. Roughly speaking, the optimisation analysis begins with the identification of some natural candidates u for the role of optimal control. Secondly, for such controls u the forward and the backward systems are solved, so that the respective variables ϕ = ϕ(u) and ∇ P = ∇ P(u) are identified. Finally, if condition (2.15) is not met, then the candidate u is discarded from the analysis; otherwise it is retained. Nonetheless, let us stress again that condition (2.15) is only a necessary requirement, and can only help to restrict the class of potential optimal controls. In order to further refine the analysis, sufficient conditions for optimality should be investigated. The mathematical idea behind this is very natural: if the reduced cost functional J can be shown to be twice (Fréchet or Gâteaux) differentiable, then any control u satisfying the first-order stationary condition (2.15) and the positive definiteness condition D 2 J (u) > 0 is an optimal control. Such second-order analysis is extremely challenging, and to the best of the author's knowledge, it has been performed so far only in relation to some selected optimal control problems in the deterministic setting (Colli et al. 2015b). In the stochastic case, the second-order analysis is open and is currently being investigated in a work in preparation.
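The screening procedure described in this remark can be illustrated in a finite-dimensional surrogate. The sketch below is not the method of the paper: it assumes that α 3 > 0, that the control has been discretised into a vector, that the admissible set is a Euclidean ball of radius L (a stand-in for the box-constraint), and that an array g representing the discretised quantity ϕ∇ P has already been computed by some forward and backward solver. The names project_ball and passes_first_order_test are hypothetical.

```python
import numpy as np

def project_ball(w, radius):
    """Orthogonal projection of w onto the closed Euclidean ball of the given radius."""
    norm = np.linalg.norm(w)
    if norm <= radius:
        return w.copy()
    return w * (radius / norm)

def passes_first_order_test(u, g, alpha3, radius, tol=1e-10):
    """Check the projection form of the necessary condition in the surrogate setting.

    u      : candidate control (vector), assumed admissible
    g      : stand-in for the discretised quantity phi * grad P (assumption)
    alpha3 : weight of the control cost, assumed strictly positive
    The candidate is retained only if u equals the projection of -g/alpha3.
    """
    target = -g / alpha3
    return np.linalg.norm(u - project_ball(target, radius)) <= tol
```

A candidate that fails the test is discarded; one that passes is only a possible optimum, in line with the remark that the condition is necessary but not sufficient.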

Well-posedness of the State System
This section is devoted to the proof of Theorem 2.1 about well-posedness of the state system.

Uniqueness
Let {u i } i=1,2 ⊂ U and let us denote by {(ϕ i , μ i )} i=1,2 any respective solutions to (1.1)-(1.4) in the sense of Theorem 2.1. Let us set for brevity of notation ϕ := ϕ 1 − ϕ 2 , μ := μ 1 − μ 2 , u := u 1 − u 2 : then we have where the equality is intended in the usual variational sense of Theorem 2.1. Taking 1/|O| ∈ V 1 as test function yields directly by assumption A3 that Now, the mean value theorem and assumption A1 give while the inclusion V 1 ↪ L 6 (O), the Hölder and the Poincaré-Wirtinger inequalities yield Using the compactness inequality (2.1) and rearranging the terms, we are left with On the right-hand side, we have, by the Hölder inequality in time, and, thanks to the Burkholder-Davis-Gundy and the Young inequalities, assumption A3, and again the compactness inequality (2.1), Consequently, raising both sides of (3.1) to the power p/2 and rearranging the terms yields Hence, setting Since T 0 is independent of the initial time, we can iterate the procedure and close the estimate on each subinterval [kT 0 , (k + 1)T 0 ] for all k ∈ N until (k + 1)T 0 > T : summing up, noting that the number of such subintervals is less than T /T 0 + 1, and renaming c independently of u 2 , we get then from which uniqueness of solutions follows.

Approximation
We turn now to existence of solutions. First of all, for every λ > 0 let β λ : R → R be the Yosida approximation of β and β̂ λ : R → [0, +∞) be the Moreau-Yosida regularisation of β̂, which are defined, respectively, as: Let us recall that β λ is 1/λ-Lipschitz continuous, β̂ λ is convex and quadratic at ∞, and as λ ↘ 0 it holds that β λ (r ) → β(r ) and β̂ λ (r ) ↗ β̂(r ) for all r ∈ R. For further details about the properties of β λ and β̂ λ , we refer to the monograph (Barbu 2010, Ch. 2). We define the approximated double-well potential as: so that in particular we have Ψ′ λ (r ) = β λ (r ) − C r for r ∈ R. Secondly, we define where (ρ λ ) λ ⊂ C ∞ c (R) is a classical non-anticipative sequence of mollifiers in time. In particular, let us point out that it holds The approximated system is obtained by replacing Ψ with Ψ λ and u with u λ in (1.1)-(1.4): We formulate (3.2)-(3.5) in an abstract way as where the variational operators are defined as: Since Ψ′ λ is Lipschitz-continuous, it is not difficult to show (see, for example, Scarpa 2018, Lem. 3.1) that A λ is weakly monotone, weakly coercive, and linearly bounded, in the sense that there are two constants c λ , c′ λ > 0 such that As far as the convection operator C λ is concerned, since div u λ = 0, thanks to the divergence theorem we have and, thanks to the Hölder inequality and the inclusion V 1 ↪ L 6 (O), Hence, the operator A λ + C λ : Ω × [0, T ] × V 2 → V * 2 is weakly monotone, weakly coercive, and linearly bounded. Besides, due to the Lipschitz-continuity of Ψ′ λ and the regularity of u λ , it is immediate to check that it is also hemicontinuous. Moreover, assumption A3 ensures that B : H → L 2 (K , H ) is Lipschitz-continuous. It follows then by the classical variational approach to SPDEs by Pardoux (1975) and Krylov and Rozovskiȋ (1979) that the evolution equation (3.6) admits a unique variational solution Let us set μ λ := −Δϕ λ + Ψ′ λ (ϕ λ ) as the approximated chemical potential.
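The properties of the Yosida approximation recalled at the beginning of this subsection can be checked numerically on a concrete example. The sketch below assumes the monotone nonlinearity β(r) = r³ (as for a quartic double-well potential; this choice is an illustration, not taken from the text) and uses the standard definition β_λ(r) = (r − J_λ(r))/λ, where the resolvent J_λ(r) is the unique solution s of s + λβ(s) = r. The function names are hypothetical.

```python
def resolvent(r, lam, iters=200):
    """Solve s + lam * s**3 = r by bisection; the left-hand side is strictly increasing in s.

    The root always lies in [-|r|, |r|], which is used as the initial bracket.
    """
    lo, hi = -abs(r), abs(r)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if mid + lam * mid**3 < r:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def yosida(r, lam):
    """Yosida approximation of beta(s) = s**3 (assumed example) at the point r."""
    return (r - resolvent(r, lam)) / lam

if __name__ == "__main__":
    # beta_lam(2) approaches beta(2) = 8 as lam decreases to 0
    for lam in (1.0, 0.1, 0.01, 0.001):
        print(lam, yosida(2.0, lam))
```

On sample points one can verify that the increments of yosida(·, lam) are bounded by 1/lam times the increments of the argument, and that the values approach r³ as lam decreases, in agreement with the properties recalled above.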

Uniform Estimates
Itô's formula for the square of the H -norm yields Now, on the left-hand side, we have, thanks to the monotonicity of β λ , Also, by the Hölder inequality and the inclusion V 1 ↪ L 6 (O), it holds Thanks to the elliptic regularity theory for the Neumann problem (see, for example, Brezis 2011, §9.6) there is c > 0 independent of λ such that ‖ζ‖ V 2 ≤ c(‖ζ‖ H + ‖Δζ‖ H ) for every ζ ∈ V 2 : consequently, renaming c and using the Young inequality we get Furthermore, noting that 2γ/(γ − 1) ≥ 4 since γ ∈ [1, 2], assumption A3 yields Putting this information together and using assumption on the right-hand side we get, possibly updating the value of c, Raising now both sides to the power p/2, the stochastic integral on the right-hand side can be treated again thanks to A3, using classical computations based on the Burkholder-Davis-Gundy inequality (see, for example, Marinelli and Scarpa 2018, Lem. 4.3). Consequently, the same iterative argument used in Sect. 3.1 ensures that (3.7) In order to deduce further estimates on ϕ λ and μ λ , we rely on the free-energy estimate. Namely, we consider the approximated energy Clearly, E λ is well defined and of class C 1 in V 1 , with derivative so that in particular we have DE λ (ϕ λ ) = μ λ . Moreover, the Lipschitz-continuity of Now, we would like to write Itô's formula for E λ (ϕ λ ): in order to do this, we need to show first that ϕ λ and μ λ enjoy more regularity. This can be shown by performing a further approximation on the problem (for example, the classical Faedo-Galerkin approximation of the abstract evolution equation (3.6)).
Indeed, by the classical variational theory on stochastic evolution equations (Liu and Röckner 2015), there is a sequence (H n ) n of finite-dimensional subspaces of H , included in V 2 and with ∪ n H n dense in H , such that, setting P n : V * 2 → H n as the orthogonal projection onto H n , the unique solution (ϕ n λ , μ n λ ) of the finite-dimensional system dϕ n λ − Δμ n λ dt + P n (u λ · ∇ϕ n λ ) dt = P n B(ϕ n λ ) dW At this point, the finite-dimensional Itô formula for E λ|H n yields for every t ∈ [0, T ], P-almost surely. We show now uniform estimates on the terms on the right-hand side, independent of both λ and n. These will show a posteriori that ϕ λ and μ λ are actually more regular. For this reason and for brevity of notation, we omit from now on the dependence on n and refer to (Scarpa 2018, 2020) for more details.
To this end, noting that the definition of μ λ and assumption A1 imply On the right-hand side, thanks to the Hölder and Young inequalities, the inclusion V 1 → L 6 (O), and the estimate (3.7), proceeding as in Sect. 3.1, we have Moreover, assumptions A3 and A1 yield, together with the Hölder inequality and (3.7), Finally, the Burkholder-Davis-Gundy and the Poincaré-Wirtinger inequalities give, together with assumption A3, for every δ > 0, where we have updated the value of c and c δ step-by-step, independently of λ. Putting all this information together, choosing δ sufficiently small, rearranging the terms, and updating again the value of c, we infer that Consequently, we can close the estimate on a certain subinterval [0, T 0 ], where T 0 is chosen sufficiently small in order to incorporate the terms on the right-hand side into the corresponding ones on the left. Also, a patching argument as in Sect. 3.1 allows then to extend the estimate to the whole interval [0, T ], and we obtain ϕ λ L p ( ;L ∞ (0,T ;V 1 )) + μ λ L p/2 P ( ;L 2 (0,T ;V 1 )) + ∇μ λ L p P ( ;L 2 (0,T ;H )) which by comparison in μ λ = − ϕ λ + λ (ϕ λ ) and estimate (3.7) gives also λ (ϕ λ ) L p/2 P ( ;L 2 (0,T ;H )) ≤ c 1 + u 2 p p−2 U .
Let us show now that, possibly on a further subsequence, we have also the strong convergence To this end, we use the following lemma due to Gyöngy and Krylov (1996, Lem. 1.1), which characterises the convergence in probability in a Polish space.

Lemma 3.1 Let X be a Polish space and (Z n ) n be a sequence of X -valued random variables. Then, (Z n ) n converges in probability if and only if for any pair of subsequences
(Z n k ) k and (Z n j ) j , there exists a joint sub-subsequence (Z n k , Z n j ) converging in law to a probability measure ν on X × X such that ν({(z 1 , z 2 ) ∈ X × X : z 1 = z 2 }) = 1.
We apply this lemma to X = C 0 ([0, T ]; H ) ∩ L 2 (0, T ; V 1 ) and (ϕ λ ) λ . Given two arbitrary subsequences (ϕ λ k ) k and (ϕ λ j ) j , since the laws of the pairs (ϕ λ k , ϕ λ j ) k, j are tight on for some measurable random variables Similarly, we have for some measurable random variables and from which u 1 = u 2 P -almost surely due to the arbitrariness of f . Let us set then u := u 1 = u 2 and (μ λ k i , μ λ j i ) := (μ λ k i , μ λ j i ) • φ i : since the maps φ i preserve the laws, from the uniform estimates (3.7)-(3.9) we deduce also that for some measurable random variables Now, if we introduce the filtration (F i,t ) t∈[0,T ] as: using classical representation theorems for martingales (see Flandoli and Gatarek 1995 and Zabczyk 2014, § 8.4) we have that W i is a cylindrical Wiener process on ( , F , (F t ) t∈[0,T ] , P ) and so that on the new probability space ( , F , P ) we have where the equations are intended in the usual variational sense (3.6). Now, the strong convergences of (ϕ λ k i , ϕ λ j i ) i imply, together with the Lipschitz-continuity of B, that

Introducing then the limiting filtration (F t ) t∈[0,T ] as
a classical argument based again on the martingale representation theorem (see Flandoli and Gatarek 1995 and Zabczyk 2014, § 8.4) yields the identification Moreover, the strong convergences of (ϕ λ k i , ϕ λ j i ) i together with the uniform estimate (3.9) on the nonlinearities also give Putting all this information together, we deduce that (ϕ 1 , ϕ 2 ) solves the limit problem (1.1)-(1.4) in the sense of Theorem 2.1 on the new probability space ( , F , P ), namely Since we have already proved uniqueness of solutions in Sect. 3.1, we deduce that so that Lemma 3.1 ensures the strong convergence (3.12) also on the original probability space (Ω, F , P). Proceeding now in exactly the same way on (Ω, F , P) instead, it is a standard matter to show that (ϕ, μ) is the unique solution to the state system (1.1)-(1.4). Clearly, the global estimate (2.2) follows directly by the computations in Sect. 3.3 and assumption A3,

Continuous Dependence
Here we conclude the proof of Theorem 2.1 by showing the continuous dependence property (2.4). To this end, we use the same notation of Sect. 3.1 and use Itô's formula for the square of the H -norm instead, getting The third term on the left-hand side can be handled thanks to assumption A1, the Hölder and Young inequalities, and the embedding V 1 ↪ L 6 (O), as The convection terms on the right-hand side can be treated similarly using the divergence theorem, the Hölder and Young inequalities, and the inclusion V 1 ↪ L 6 (O) as Hence, we rearrange the terms and raise both sides to the power p/6, obtaining, thanks to the Hölder and Young inequalities, where the Burkholder-Davis-Gundy inequality and the Lipschitz-continuity of B yield for all σ > 0. Hence, choosing σ sufficiently small and rearranging the terms, the continuous dependence (2.4) follows from the already proved estimates (2.2)-(2.3). This concludes the proof of Theorem 2.1.

Existence of Optimal Controls
In this section, we prove Theorem 2.4 showing that the optimisation problem (CP) always admits a relaxed optimal control u ∈ U ad and a deterministic optimal control u det ∈ U det ad . The main idea is to use the direct method from calculus of variations, combined with a stochastic compactness argument.
Let (u n ) n ⊂ U ad be a minimising sequence for the functional J , in the sense that and define (ϕ n , μ n ) n as the unique respective solutions to the state system (1.1)-(1.4), in the sense of Theorem 2.1. Thanks to the definition of U ad and the estimate (2.2), we deduce that there exist u ∈ U ad and a triplet (ϕ, μ, ξ) with such that, as n → ∞, possibly on a subsequence, Assumption A3 and the uniform estimates on (ϕ n ) n ensure also that so that in particular By comparison in the equation (1.1), we infer then which ensures that the laws of (ϕ n ) n are tight on the space C 0 ([0, T ]; H ) ∩ L 2 (0, T ; V 1 ). We argue now along the same lines as in Sect. 3.4. As a consequence of the Skorokhod theorem, there is a probability space ( , F , P ) and measurable maps Furthermore, on the new probability space we have where the stochastic integral is intended with respect to a suitably defined filtration (F i,t ) t∈[0,T ] . Proceeding as in Sect. 3.4, we infer that so that by assumption A3 and the martingale representation theorem we can pass to the limit as i → ∞ on the new probability space and get This shows that u ∈ U ad and that (ϕ , μ ) = S (u ). To conclude that u is a relaxed optimal control for the optimisation problem (CP), we note that by the weak lower semicontinuity of the cost functional J we have so that u ∈ U ad is a relaxed optimal control in the sense of Definition 2.3. In order to show existence of a deterministic optimal control, the argument is similar. We start by taking a minimising sequence (u n ) n ⊂ U det ad such that Arguing exactly as above, thanks to the fact that (u n ) n are deterministic, in this case we have that u n i = u n i for every i ∈ N. Consequently, in this case (ϕ n ) n inherits some strong compactness properties on the original probability space, using a similar argument to the one of Sect. 3.4, by employing Lemma 3.1. Namely, we infer the strong convergence on the original probability space (Ω, F , P).
It follows then that ξ = Ψ′ (ϕ) almost everywhere, and letting n → ∞ yields so that (ϕ, μ) = S(u). At this point, the conclusion follows as above by lower semicontinuity of the cost functional.

Linearised System and Differentiability of the Control-to-State Map
The aim of this section is to prove that the linearised state system (2.6)-(2.9) is well posed and to characterise its solution as the derivative of the control-to-state map. Namely, we prove here Theorem 2.5.

Existence
Let u ∈ U ad and h ∈ U be arbitrary and fixed. Using the notation of Sect. 3.2, we consider the approximated linearised problem Noting that Ψ′′ λ (ϕ) ∈ L ∞ (Ω × Q), the classical variational approach ensures existence and uniqueness of the approximated solution in the sense that, for every ζ ∈ V 2 , for every t ∈ [0, T ], P-almost surely, Noting that (θ h,λ ) O = 0, we can write Itô's formula for (1/2) ‖∇N θ h,λ ‖ 2 H , getting Now, assumption A1, the Hölder-Young inequalities and the compactness inequality (2.1), and the embedding V 1 ↪ L 6 (O) give, for all ε > 0, Similarly, by C2 and again the compactness inequality (2.1), we have As for the stochastic integral, the Burkholder-Davis-Gundy and Young inequalities give (see, for example, Marinelli and Scarpa 2020, Lem. 4.1), together with (2.1) and C2 E sup Consequently, using the same iterative-patching argument of Sect. 3.1, raising to power p/2, taking supremum in time and expectations, we infer that (5.6) Now, Itô's formula for (1/2) ‖θ h,λ ‖ 2 H yields where by the divergence theorem we have Hence, it is not difficult to see that, using again the Hölder, Young and Burkholder-Davis-Gundy inequalities, assumption C2, and the estimate (5.6), all the terms on the right-hand side can be handled, except the one containing Ψ′′ λ . For this one, we proceed using C1, the embedding V 1 ↪ L 6 (O), as where, thanks to (5.6) and the Hölder inequality, Consequently, we deduce that from which, by comparison in (5.2), We infer the existence of (θ h , ν h ) with such that, as λ ↘ 0 (possibly on a subsequence), Since the systems (5.1)-(5.4) and (2.6)-(2.9) are linear, the passage to the limit is straightforward. Indeed, by assumption C2 and the dominated convergence theorem, it follows that ; L 2 (0, T ; L 2 (K , H )) .
Moreover, thanks to C1 and the regularity of ϕ, we have Ψ′′ (ϕ) ∈ L 3 (Ω; L ∞ (0, T ; L 3 (O))), so in particular and also, thanks to (5.10), We deduce that letting λ ↘ 0 in (5.5) we get that (θ h , ν h ) is a solution to (2.6)-(2.9) in the sense of Theorem 2.5. The strong continuity in H of θ h follows a posteriori with a classical method by Itô's formula on the limit equation (2.6).

Uniqueness
We show here that the linearised system (2.6)-(2.9) admits at most one solution. By linearity, it is enough to check that if (θ, ν) is a solution to (2.6)-(2.9) in the sense of Theorem 2.5 with h = 0, then θ = ν = 0. To this end, we note that (2.6) yields θ O = 0, so that Itô's formula gives Now, we can argue along the same lines as in Sect. 5.1 by using assumption A1 on Ψ′′, C2 on D B, together with the Burkholder-Davis-Gundy and Young inequalities to get from which θ = 0, and also ν = 0 by comparison in (2.7). This shows that the linearised system (2.6)-(2.9) admits at most one solution.

Gâteaux-Differentiability
We prove here that S 1 is Gâteaux-differentiable. Let u ∈ Ũ ad and h ∈ U be arbitrary and fixed: since Ũ ad is open in U, there exists δ 0 > 0 such that u + δh ∈ Ũ ad for all δ ∈ [−δ 0 , δ 0 ]. For every such δ, setting (ϕ δ , μ δ ) := S(u + δh) and (ϕ, μ) := S(u), the difference of the respective equations (for δ ≠ 0) gives whose natural variational formulation reads -a.s. (5.12) Now, by the continuous dependence estimate (2.4), we deduce that there exists a constant c > 0 independent of δ such that such that, as δ → 0 possibly on a subsequence, It follows in particular that (5.16) Furthermore, since u ∈ U, by the inclusion V 1 ↪ L 6 (O), the Hölder inequality, and the convergence (5.14), it holds that As far as the nonlinear term is concerned, thanks to the mean-value theorem we have Now, by the strong convergence (5.16) and the continuity of Ψ′′, we have where, recalling that by C1 Ψ′′ has quadratic growth, thanks to the embedding V 1 ↪ L 6 (O) the left-hand side is uniformly bounded in the space for every ∈ [1, p/2) and ∈ [1, +∞). Taking (5.14) into account, we infer in particular that for every ∈ [1, p/3) and ∈ [1, 2). Similarly, thanks to C1 and the regularity of ϕ, we have Ψ′′ (ϕ) ∈ L p/2 (Ω; L ∞ (0, T ; L 3 (O))), and the same argument as above yields for every ∈ [1, p/3) and ∈ [1, 2). It follows that (5.18) Lastly, let us handle the stochastic integral. By the Lipschitz-continuity of B in A3, we have Now, the strong convergence (5.16), the continuity and boundedness of D B in C2 imply together with the dominated convergence theorem that for every ∈ [1, +∞). Since (ϕ δ − ϕ)/δ is bounded in L p/3 (Ω; L 4 (0, T ; V 1 )) by interpolation of (5.13)-(5.14), it follows that for every ∈ [1, p/3).
Similarly, by the boundedness of D B in C2 and the convergence (5.14), we have also Hence, we obtain that Finally, letting δ → 0 in (5.12) using convergences (5.13)-(5.19), we deduce that actually (θ h , ν h ) is the unique solution of the linearised system (2.6)-(2.9) in the sense of Theorem 2.5. It remains to show now the strong convergence of (ϕ δ − ϕ)/δ. To this end, note that by the Lipschitz-continuity of B in A3 and (5.14), we have from which, thanks to the classical result (Flandoli and Gatarek 1995, Lem. 2 ≤ c r ∀ r ∈ (0, 1/2).
By comparison in the equation (5.12) and the estimates proved above, we infer then that ≤ c r ∀ r ∈ (0, 1/2). Now, recalling that by (Simon 1987, Cor. 5), we have so that the laws of ((ϕ δ − ϕ)/δ) δ are tight on L 2 (0, T ; V 1 ). By using again Lemma 3.1 together with the uniqueness of the limit problem at δ = 0, proceeding as in Sect. 3.4, we also get the strong convergence which in turn yields, together with (5.14), the strong convergence of Theorem 2.5. This proves that S 1 is Gâteaux-differentiable, and its derivative is a solution to the linearised system, in the sense of Theorem 2.5.
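The statement that the difference quotients (ϕ δ − ϕ)/δ converge to the solution of the linearised system can be illustrated on a deterministic toy problem. The sketch below is an assumption-laden surrogate and not the system of the paper: it replaces the stochastic PDE by the scalar ODE ϕ' = ϕ − ϕ³ − uϕ, in which the control enters bilinearly (loosely mimicking the convection term), with linearisation θ' = (1 − 3ϕ² − u)θ − hϕ, and discretises both by the explicit Euler scheme. All function names and parameter values are hypothetical.

```python
import math

def solve_state(u, phi0=0.5, T=1.0, n=1000):
    """Explicit Euler for the toy state equation phi' = phi - phi**3 - u(t) * phi."""
    dt = T / n
    phi = [phi0]
    for k in range(n):
        p = phi[-1]
        phi.append(p + dt * (p - p**3 - u(k * dt) * p))
    return phi

def solve_linearised(phi, u, h, T=1.0):
    """Explicit Euler for theta' = (1 - 3 phi^2 - u) theta - h phi with theta(0) = 0."""
    n = len(phi) - 1
    dt = T / n
    theta = [0.0]
    for k in range(n):
        t = k * dt
        th = theta[-1]
        theta.append(th + dt * ((1.0 - 3.0 * phi[k] ** 2 - u(t)) * th - h(t) * phi[k]))
    return theta

def quotient_error(delta):
    """Largest gap over the time grid between the difference quotient and theta."""
    u = lambda t: 0.5 * math.sin(t)   # reference control (arbitrary choice)
    h = lambda t: 1.0                 # direction of variation (arbitrary choice)
    phi = solve_state(u)
    theta = solve_linearised(phi, u, h)
    phi_delta = solve_state(lambda t: u(t) + delta * h(t))
    return max(abs((pd - p) / delta - th) for pd, p, th in zip(phi_delta, phi, theta))

if __name__ == "__main__":
    for delta in (1e-1, 1e-2, 1e-3, 1e-4):
        print(delta, quotient_error(delta))
```

In this toy setting the printed gap is expected to shrink roughly proportionally to δ, which is the finite-dimensional counterpart of the Gâteaux-differentiability statement.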

Fréchet-Differentiability
We are only left to show the Fréchet-differentiability of S 1 . To this end, since Ũ ad is open in U, there is a U-ball B U r (u) of radius r = r u > 0 centred at u such that Noting that (y h ) O = 0, Itô's formula yields , P-a.s. Now, the Young and Hölder inequalities give, together with the embedding V 1 ↪ L 6 (O), Moreover, note that by the mean value theorem and assumption A1 we have where, by the Hölder inequality, the compactness inequality (2.1), the embedding V 1 ↪ L 6 (O), and assumption C1, Lastly, we have so that by A3, C2-C3, and the compactness inequality (2.1), Consequently, taking all this information into account, we can choose ε small enough and rearrange the terms to get Thanks to the embedding L ∞ (0, T ; H ) ∩ L 2 (0, T ; V 2 ) ↪ L 4 (0, T ; V 1 ), by (2.4) and (5.13)-(5.14), we have where the constant c is independent of h. Raising both sides to the power p/14, and taking the supremum in time and expectations, on the right-hand side we use the Hölder inequality with exponents 1/7 + 3/7 + 3/7 = 1 to get and similarly Consequently, arguing again as in Sect. 3.1, using an iterative argument and the Burkholder-Davis-Gundy and Young inequalities (see also Marinelli and Scarpa 2020, Lem. 4.1) gives then This proves the Fréchet-differentiability of S 1 and concludes the proof of Theorem 2.5.

Adjoint System
In this section, we study the adjoint problem (2.10)-(2.13), proving that it is well posed in the sense of Theorem 2.6. As we have anticipated in the Introduction, the presence of the extra random component in the convection term calls for non-trivial mathematical tools when deriving estimates on the solutions. Let us recall here a general backward version of the stochastic Gronwall lemma that will be used in this section: for details we refer to (Hun et al. 2020, Thm. 1) and (Wang and Fan 2018).
) be a nonnegative process such that Then, for every t ∈ [0, T ] it holds that

Approximation
For every λ > 0, using the approximations on Ψ and u as in Sect. 3.2, we consider the approximated problem This can be written in abstract form as: By construction it holds that Ψ′′ λ (ϕ) ∈ L ∞ (Ω × Q) and u λ ∈ L ∞ P (Ω × (0, T ); U ), so that using similar arguments to the ones in Sect. 3.2, we have that the operator F λ is progressively measurable, hemicontinuous, weakly monotone, weakly coercive, and linearly bounded. Moreover, the Lipschitz-continuity of B in A3 implies that D B(ϕ) * is uniformly bounded as well. The classical variational theory for backward SPDEs (Du and Meng 2010, Sec. 3) ensures then that such approximated problem admits a unique variational solution (P λ , Q λ ), with Actually, let us note that thanks to the assumption on the target ϕ T and the regularity of ϕ, the final value satisfies α 2 (ϕ(T ) − ϕ T ) ∈ L 2 (Ω, F T ; V 1 ). Consequently, by a standard finite dimensional approximation of the approximated problem with λ > 0 fixed, it follows that the approximated solution actually inherits more regularity, namely We can then set so that (P λ , P̃ λ , Z λ ) satisfy, for every t ∈ [0, T ], P-almost surely, for every ζ ∈ V 1 ,

An Estimate by Duality Method
The first estimate that we prove is based on a duality method between the approximated adjoint system (6.1)-(6.4) and a suitably introduced approximated linearised system. This step is fundamental as it allows us to obtain some preliminary estimates on the adjoint variables without working explicitly on the adjoint system, which may not be trivial. Such a duality method is extremely powerful, and it will be crucial in showing well-posedness of the adjoint system. The idea is the following: we consider the λ-approximated version of the linearised system (2.6)-(2.9), in a more general version where the forcing term is given by an arbitrary term Since Ψ′′ λ (ϕ) ∈ L ∞ (Ω × Q), the classical variational approach (see again Sects. 3.2 and 5.1) ensures that the system (6.5)-(6.8) admits a unique solution Moreover, we can show that the system (6.5)-(6.8) is in duality with the approximated adjoint system (6.1)-(6.4). To this end, by Itô's formula we have that which readily implies by comparison in the two systems that (6.9) Let us set now for brevity of notation θ Using the fact that Ψ′′ λ ≥ −C and the boundedness of D B(ϕ) in L (V 1 , L 2 (K , H )), thanks to the Hölder-Young inequalities and the compactness inequality (2.1) we get, for all ε > 0, We now raise both sides to the power p/(p + 4), and take the supremum in time and expectations. Thanks to the Burkholder-Davis-Gundy inequality (see Marinelli and Scarpa 2020, Lem. 4.1), assumption C2, and (2.1), we get Moreover, since u ∈ U ad , by the Hölder inequality we have Since p/(p + 4) > 0 and (p − 2)/(p + 4) > 0, we can close the estimate rearranging all the terms on [0, T 0 ] for T 0 sufficiently small (independent of both λ and g). Using once more a classical iterative procedure on every subinterval until T , we infer that there exists a constant c > 0, independent of both λ and g, such that . (6.10) Now, by assumption C4 and the regularity of ϕ (since 2p/(p − 4) ≤ p for p ≥ 6), it holds so that the duality relation (6.9) (with h = 0) and the estimate (6.10) yield .
By the arbitrariness of g we obtain ≤ c. (6.11)
Lastly, convergence (6.17) readily implies that while by the linearity and continuity of the stochastic integral we have Consequently, we can let λ ↘ 0 in the variational formulation of the approximated system (6.1)-(6.4) and deduce that (P, P̃, Z ) solve the limit adjoint problem (2.10)-(2.13). The pathwise continuity of P, hence by comparison also of P̃, follows by classical methods using Itô's formula on the limit equation.

Uniqueness
and similarly, since D B(ϕ)P̃ is L 2 (K , H 0 )-valued by A3, by the Poincaré-Wirtinger inequality and C2 we have Q T t (D B(ϕ) * Z )P̃ = t 0 (Z (s), D B(ϕ(s))P̃(s)) L 2 (K ,H ) ds Rearranging the terms and taking conditional expectations with respect to F t , we get that so that applying the backward stochastic Gronwall Lemma 6.1 and then taking expectations yield ∇P̃ = 0 almost everywhere in Ω × Q, hence also P̃ = 0 almost everywhere in Ω × Q since P̃ O = 0. Consequently, the stochastic integral appearing in the estimate above vanishes, and we deduce also ∇ P = 0 in L 2 P (Ω; C 0 ([0, T ]; H d )), from which P = 0 in L 2 P (Ω; C 0 ([0, T ]; V * 1 )). Also, ∇ Z = 0 in L 2 P (Ω; L 2 (0, T ; L 2 (K , H d ))). This concludes the proof of Theorem 2.6.
Lastly, we note that (2.15) follows directly from (2.14) provided that we show the duality relation In order to prove this, we can take g = 0 and h = v − u in the duality relation (6.9), and then let λ ↘ 0 thanks to the convergences (5.9)-(5.10). This concludes the proof of Theorem 2.8.