Strong Solvability of a Variational Data Assimilation Problem for the Primitive Equations of Large-Scale Atmosphere and Ocean Dynamics

For the primitive equations of large-scale atmosphere and ocean dynamics, we study the problem of determining by means of a variational data assimilation algorithm initial conditions that generate strong solutions which minimize the distance to a given set of time-distributed observations. We suggest a modification of the adjoint algorithm whose novel element is to use norms in the variational cost functional that reflect the H^1-regularity of strong solutions of the primitive equations. For such a cost functional, we prove the existence of minima and a first-order adjoint condition for strong solutions that provides the basis for computing these minima. We prove the local convergence of a gradient-based descent algorithm to optimal initial conditions using the second-order adjoint primitive equations. The algorithmic modifications due to the H^1-norms are straightforward to implement into a variational algorithm that employs the standard L^2-metrics.

Data assimilation combines dynamical models with observational information. It constitutes a fundamental technique for modeling real-world phenomena. Numerical weather prediction and ocean state estimation are examples of scientific disciplines that rely on data assimilation (Kalnay 2003; Wunsch 1997; Forget et al. 2015). Data assimilation determines a best estimate of the unknown current state of the atmosphere or the ocean by steering the model trajectory toward time-distributed observations of the past. The estimate of the current state is used as initial condition for a prediction. The rationale is that the better the estimate of the present state, the better the prediction of the future evolution.
The data assimilation algorithm that we consider here is known as "4D-var" or the "adjoint method" of optimal control and belongs to the class of variational algorithms. Variational data assimilation algorithms strive to minimize the distance between observations and model solution by solving an optimization problem for an appropriately chosen cost functional. The initial condition acts as a control variable of the optimization problem that fits the model trajectory over a finite time horizon to the observations. In contrast, feedback control algorithms such as "continuous data assimilation" or "nudging" drive the model over an infinite time horizon toward observations by means of a feedback term (Kalnay 2003). Recently, these algorithms have been improved and rigorous results have been proven for a range of fluid dynamical equations, including the Navier-Stokes or the planetary geostrophic equations (see, e.g., Azouani and Titi 2014; Foias et al. 2016; Desamsetti et al. 2019; Farhat et al. 2016; Korn 2009) and the primitive equations (Pei 2019). A complementary class of assimilation algorithms is given by stochastic data assimilation such as the Kalman filter and its variants (Evensen 2003; Reich and Cotter 2015; Houtekamer and Zhang 2016); for mathematical aspects we refer to Stuart (2017, 2018). Variational as well as stochastic algorithms are of great relevance in atmosphere-ocean science. Research in this area focuses on the improvement and extension of computational algorithms in order to improve numerical weather prediction or ocean state estimation. The complexity of data assimilation algorithms and their applications has reached an impressive level (see, e.g., Bonavita et al. 2018; Isaksen et al. 2010; Gauthier et al. 2007). It is therefore important to understand fundamental properties of these algorithms.
Results that are similar in spirit to our work can be found in Agoshkov and Ipatova (2007), where a variational algorithm was analyzed that uses observations of sea surface temperature and sea surface elevation to determine the control variables of heat flux and water flux that occur in the surface boundary conditions of the oceanic primitive equations. The authors used a cost functional based on the L^2-norm and showed the existence of minimizers. In Shutyaev (1997), the more general setting of a data assimilation problem for quasi-linear evolution equations in Hilbert spaces was studied. To illustrate the general theory, the existence of minimizers and the convergence of an iterative adjoint algorithm were established for a two-dimensional barotropic fluid. For fluid dynamical equations such as the 2D Navier-Stokes or the 3D Navier-Stokes-α equations, similar results on optimal control using initial conditions, boundary conditions or external forcing as control variables can be found in Fursikov (1999), Abergel and Temam (1990) and Korn (2009). In Korn (2019), an idealized coupled atmosphere-ocean model was introduced, and based on the regularity of the coupled equations, the existence of optimal initial conditions and the convergence of a steepest descent method were proven.
In the 4D-var algorithm, the crucial modeling decision is how the distance to the observational data and to the background state is measured, i.e., how the cost functional is specified. The classical choice is the L^2-norm. In this paper, we suggest using the H^1-norm for the primitive equations. This norm in the cost functional reflects the H^1-regularity of strong solutions of the primitive equations. The optimization determines initial conditions for strong solutions of the primitive equations. With such a choice, we take advantage of a priori knowledge about the dynamics of the primitive equations.
The use of alternative metrics beyond L^2([0, T], L^2(Ω)) is not new in optimal control of fluids [for a review see Medjo et al. (2008)]. In Bewley et al. (2001), for example, enstrophy-based metrics and Sobolev-type cost functionals were discussed in the context of turbulence control via boundary forcing for a channel flow. The purpose there is to obtain an additional regularizing effect due to the derivatives in the cost functional. In contrast, the goal of our work is not to introduce a regularization of the data assimilation problem but to demonstrate that determining optimal initial conditions for strong solutions of the primitive equations necessitates the use of H^1-norms. We refer to this property as strong solvability of the data assimilation problem. The new but decisive element in the suggested formulation of the data assimilation algorithm is to use in the cost functional metrics that are tailored to the regularity of the primitive equations. This paper describes the consequences of such a choice for the primitive equations.

The Primitive Equations
The primitive equations are a central set of model equations for atmosphere and ocean dynamics. These equations approximate the large-scale dynamics of ocean and atmosphere and describe ocean or atmosphere as a thin layer of a rotating, incompressible fluid, together with the Boussinesq approximation and the hydrostatic approximation. The primitive equations are derived as an approximation of the rotating Navier-Stokes equations (Li and Titi 1992). The state vector of the primitive equations consists of a horizontal velocity field v = (u, v) and a tracer such as the temperature θ. More specifically, denote by Ω := M × (−h, 0) a cylindrical domain, where h > 0 denotes the depth of the ocean and M ⊆ R^2 is bounded, with a smooth boundary ∂M. The primitive equations are given by

∂_t v + (v · ∇)v + w ∂_z v + f v^⊥ + (1/ρ_0) ∇p − (1/Re_1) Δv − (1/Re_2) ∂_zz v = F_v,   (1a)
∂_z p = −ρ g,   (1b)
∇ · v + ∂_z w = 0,   (1c)
∂_t θ + v · ∇θ + w ∂_z θ − (1/Rt_1) Δθ − (1/Rt_2) ∂_zz θ = F_θ,   (1d)
ρ = ρ_0 − α(θ − θ_0),   (1e)

where v^⊥ := (−v, u), w denotes the vertical velocity, f the Coriolis parameter, g the gravitational acceleration, ρ denotes an equation of state with a mean density ρ_0 > 0, a mean temperature θ_0 > 0 and a coefficient α > 0 of thermal expansion, and F_v, F_θ are forcing terms. The numbers Re_1, Re_2 > 0 denote the horizontal and vertical Reynolds numbers that represent the viscosity coefficients. The numbers Rt_1, Rt_2 > 0 are the horizontal and vertical mixing coefficients for temperature. The operators ∇, ∇· and Δ denote the horizontal gradient, divergence and Laplacian, respectively. The partial derivative with respect to the vertical direction is denoted by ∂_z. The boundary ∂Ω consists of a lateral boundary Γ_s, a bottom boundary Γ_b and a surface boundary Γ_u, i.e., ∂Ω = Γ_s ∪ Γ_b ∪ Γ_u. The corresponding boundary conditions (2)-(8) prescribe at the surface Γ_u a Neumann condition for the horizontal velocity involving a given 2D wind stress field τ, the condition w = 0, and a Neumann condition for the temperature involving a given temperature field θ^* at the sea surface; at the bottom Γ_b, whose normal is the vertical Cartesian unit vector n_3, and at the lateral boundary Γ_s with normal vector n, homogeneous Dirichlet conditions are imposed for the velocity and homogeneous Neumann conditions ∇_3 θ · n = 0 for the temperature. The velocity v and the potential temperature θ can be modified by adding a τ- and θ^*-dependent contribution such that we can assume without loss of generality homogeneous boundary conditions at the ocean surface (see Lions et al. 1992, Section 2.4; Titi 2007, p. 248; Cao and Titi 2003, p. 202).

The mathematical analysis of the primitive equations dates back to Lions et al. (1992), where the existence of global-in-time weak solutions was proven for square integrable initial conditions. Weak solutions satisfy the regularity properties (v, θ) ∈ L^∞([0, T], L^2(Ω)) ∩ L^2([0, T], H^1(Ω)). The uniqueness of weak solutions remains an open question (for weak solutions see also Remark 1 in Section 2). The local existence of strong solutions was shown by Guillén-González et al. (2001). In a seminal paper, Cao and Titi (2007) proved the global existence and uniqueness of strong solutions for arbitrary initial data in the Sobolev space H^1. Strong solutions possess more regularity than weak solutions, namely (v, θ) ∈ C([0, T], H^1(Ω)) ∩ L^2([0, T], H^2(Ω)). The notion of "strong solution" is introduced in Definition 2.1 below. The well-posedness result of Cao and Titi (2007) uses Neumann boundary conditions for the velocity on top and bottom of the domain, and a slip boundary condition at the lateral sides. A different proof was given by Kobelkov (2007). This breakthrough has been followed by well-posedness results in H^1 for Neumann boundary conditions at the surface and Dirichlet boundary conditions at the bottom and at the lateral boundary by Kukavica and Ziane (2007). The existence of an attractor of the primitive equations was shown in Chueshov (2014) and Ju (2007). The well-posedness of the primitive equations in L^p-spaces is proven in Hieber and Kashiwabara (2016). In this work, we use Dirichlet boundary conditions at the lateral sides and bottom of the domain and Neumann boundary conditions at the surface, because these are the most common boundary conditions in ocean general circulation models, but our results translate to the other boundary conditions mentioned above.
The H^1-regularity forms the mathematical basis for the specification of the data assimilation problem. We state now the central theorem for later reference. The space V occurring in the theorem consists of pairs (v, θ) of velocity and temperature in H^1(Ω) that satisfy the boundary conditions (2)-(7) and is defined in Sect. 2 [see (16)].
Theorem 1.1 (Global Well-Posedness (Cao and Titi 2007; Kukavica and Ziane 2007)) Let a time interval [0, T], with T > 0, be given. Let the forcing satisfy F_v, F_θ ∈ L^2([0, T], L^2(Ω)). If the initial conditions for velocity and temperature satisfy (v_0, θ_0) ∈ V, then there exists a unique strong solution (v, θ) ∈ C([0, T], H^1(Ω)) ∩ L^2([0, T], H^2(Ω)) of the primitive equations that depends continuously on the initial conditions.

The regularity of solutions of the primitive equations, described in Theorem 1.1, still leaves room for a broad spectrum of spatiotemporal scales that comprises local and sudden events with large gradients such as fronts. For an analysis of the admissible scales in terms of Nusselt, Rayleigh and Reynolds numbers, we refer to Holm (2011, 2013).

The Data Assimilation Problem
We now proceed by describing the data assimilation problem for the primitive equations. Let X = L^∞([0, T], H^1(Ω)) be the "state space" of strong solutions of the primitive equations with initial conditions in X_0 = H^1(Ω). Let time-distributed observations ψ_obs be given. In general, the observations reside in an "observational space" Y, and an observation operator H maps ψ_obs ∈ Y into the model space X. In this work, we assume that the observations are already mapped to the model space, i.e., ψ_obs ∈ X.
The data assimilation problem consists in determining, for given observations ψ_obs ∈ X, an initial condition ψ̂_0 ∈ X_0 such that

J(ψ̂_0) = min_{ψ_0 ∈ X_0} J(ψ_0),   (9)

and the trajectory ψ(ψ̂_0) ∈ X satisfies the primitive equations (1) with initial condition ψ̂_0.
The cost functional J in (9) is defined as a sum of a background and an observational term

J(ψ_0) = J_b(ψ_0) + J_obs(ψ_0).   (10)

The observational part J_obs of the cost functional measures the distance between the model state and the observations and is defined by

J_obs(ψ_0) = (1/2) ∫_0^T ⟨R^{-1}(M[ψ_0](t) − ψ_obs(t)), M[ψ_0](t) − ψ_obs(t)⟩ dt.   (11)

Here, M denotes the model operator that advances an initial condition ψ_0 to the model state ψ(t) at time t, i.e., ψ(t) = M[ψ_0](t). The observational error covariance operator R provides a statistical weighting of the model-observation misfit, according to the quality of the observations. The scalar product in (11) acts on the spatial variable. The background term J_b of the cost functional in (10) is defined by

J_b(ψ_0) = (1/2) ⟨B^{-1}(ψ_0 − ψ_back), ψ_0 − ψ_back⟩,   (12)

where ψ_back ∈ X_0 is a given background state, such as the outcome of a previous forecast, and B denotes the model error covariance operator. The background term J_b incorporates prior information about the system, but it can also be interpreted as a Tikhonov regularization of J. The model error covariance operator B provides a weighting according to the estimated model error.
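The structure of the cost functional (10)-(12) can be illustrated with a small numerical sketch. The following Python fragment is purely illustrative: the diffusion "model" `model_step`, the scalar covariances `b_var` and `r_var`, and all grid choices are hypothetical stand-ins for M, B and R, not constructions from this paper.

```python
import numpy as np

# Hypothetical toy version of the cost functional (10)-(12): the "model" M is a
# 1D periodic diffusion step, and B, R are scalar multiples of the identity.
# All names and parameter values here are illustrative assumptions.

def model_step(psi, dt=0.01, nu=1.0):
    """One explicit diffusion step with periodic boundaries (stands in for M)."""
    lap = np.roll(psi, -1) - 2.0 * psi + np.roll(psi, 1)
    return psi + dt * nu * lap

def cost(psi0, psi_back, obs_traj, b_var=1.0, r_var=0.1):
    """J(psi0) = J_b + J_obs with B = b_var * I and R = r_var * I."""
    d = psi0 - psi_back
    j_b = 0.5 * np.dot(d / b_var, d)             # background term J_b, cf. (12)
    j_obs, psi = 0.0, psi0.copy()
    for y in obs_traj:                           # time-distributed observations
        psi = model_step(psi)
        m = psi - y                              # model-observation misfit
        j_obs += 0.5 * np.dot(m / r_var, m)      # observational term J_obs, cf. (11)
    return j_b + j_obs

# Synthetic observations generated from a known "true" initial condition
x = np.linspace(0.0, 2.0 * np.pi, 32, endpoint=False)
truth0 = np.sin(x)
obs, psi = [], truth0.copy()
for _ in range(10):
    psi = model_step(psi)
    obs.append(psi.copy())

print(cost(truth0, np.zeros_like(x), obs))
print(cost(np.zeros_like(x), np.zeros_like(x), obs))
```

In this synthetic setting, the true initial state reproduces the observations exactly, so its cost reduces to the background term, while a wrong initial state pays through J_obs.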
As a consequence of the nonlinearity of the primitive equations, encoded in the model operator M in (11), we cannot expect a global minimum for the data assimilation problem (9) but at most a multitude of local minima. In principle, the uniqueness of the optimization can be enforced by weighting the background and observational parts of the cost functional appropriately, but this may come at the expense of discarding the observational information [see Theorem 7 in Korn (2009) for a result in this respect].

Overview of Results
The relevance of the primitive equations for atmosphere/ocean science and climate research, together with the fact that these equations constitute one of the rare examples of a 3D fluid dynamical equation with a global well-posedness result, raises the question of how this knowledge can be incorporated into a 4D-var data assimilation algorithm. We advocate a formulation of the variational cost functional (10) that reflects the regularity of the underlying dynamical equations. This philosophy has already been demonstrated in the context of an idealized coupled atmosphere-ocean model (Korn 2019).
The classical formulation of 4D-Var based on the L^2-norm in space and with initial conditions in L^2(Ω) is associated with the notion of weak solutions. These solutions describe the dynamics in the L^2-space. For weak solutions, the uniqueness as well as the continuous dependence on the initial conditions is unknown, but the adjoint approach of variational data assimilation requires not only continuity but differentiability with respect to the initial condition. The situation changes completely if we employ the notion of strong solutions of the primitive equations, for which uniqueness and continuous dependence of the solution on the initial condition are established.
In view of the data assimilation problem, strong solutions require optimal initial conditions with respect to the observations in the space H^1(Ω). These initial conditions generate dynamics in L^∞([0, T], H^1(Ω)) ∩ L^2([0, T], H^2(Ω)), and consequently we measure the distance to observations in this space, i.e., we use in the background term J_b of the cost functional (10) the H^1-norm and in the observational term J_obs the L^2([0, T], H^1(Ω))-norm. This imposes stronger constraints through the higher derivatives than the L^2-norm. The L^2([0, T], H^2(Ω))-norm cannot be used in the observational term of the cost functional, because the degree of the spatial derivative in this term is linked via the Gâteaux derivative of the cost functional to a differential operator that is part of the forcing of the adjoint equations. The second-order derivative in the spatial H^2-norm would result in an adjoint forcing that is no longer square-integrable. For details, we refer to Theorem 4.4 and Remark 2 in Sect. 4.2.
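The stronger constraint imposed by the H^1-norm can be seen in a simple finite-difference computation. The grid, the two misfit profiles and the discrete gradient below are illustrative assumptions, not constructions from the paper.

```python
import numpy as np

# Finite-difference comparison of the L^2 and H^1 distances between model and
# observation for two misfits of equal amplitude: a smooth one and one with a
# sharp front. Grid, profiles and the discrete gradient are illustrative.

def l2_sq(e, dx):
    """Squared L^2 norm of the misfit e on a uniform grid."""
    return np.sum(e**2) * dx

def h1_sq(e, dx):
    """Squared H^1 norm: L^2 part plus L^2 norm of the discrete gradient."""
    return l2_sq(e, dx) + l2_sq(np.gradient(e, dx), dx)

x = np.linspace(0.0, 1.0, 1001)
dx = x[1] - x[0]
smooth = 0.1 * np.sin(2.0 * np.pi * x)        # smooth misfit
front = 0.1 * np.tanh((x - 0.5) / 0.005)      # same amplitude, sharp gradient

# The two misfits have comparable L^2 size, but the H^1 norm penalizes the
# front much more strongly through its large gradient.
print(l2_sq(smooth, dx), h1_sq(smooth, dx))
print(l2_sq(front, dx), h1_sq(front, dx))
```

The computation shows the mechanism stated above: fields that differ only through sharp gradients are nearly indistinguishable in L^2 but are strongly penalized in H^1.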
In Lemma 3.2, we prove that strong solutions are differentiable with respect to the initial conditions. For strong solutions of the primitive equations, we prove in Theorem 4.1 that local minimizers of the cost functional exist. Theorem 4.4 gives a first-order necessary adjoint condition for these minimizers that provides the basis for their computation by means of an adjoint-based optimization. For strong solutions, we prove the convergence of a steepest-descent algorithm in Theorem 5.3, provided one uses a suitable starting point for the iterative process. Pure Newton methods that allow global convergence have never been used for realistic applications in numerical weather prediction or ocean state estimation due to their prohibitive computational costs. Variants such as quasi-Newton methods are not considered in this paper.
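The type of steepest-descent iteration analyzed in Theorem 5.3 can be sketched generically. The test functional, its gradient and all algorithmic constants below are hypothetical stand-ins for J and its adjoint-computed gradient.

```python
import numpy as np

# Generic steepest-descent sketch with Armijo backtracking line search.
# cost/grad are a hypothetical smooth functional and its gradient, standing in
# for the data assimilation cost J and the gradient obtained from the adjoint.

def cost(p):
    return 0.5 * p[0]**2 + 2.0 * p[1]**2 + 0.25 * p[0]**4

def grad(p):
    return np.array([p[0] + p[0]**3, 4.0 * p[1]])

def steepest_descent(p0, tol=1e-8, max_iter=500):
    p = np.asarray(p0, dtype=float)
    for _ in range(max_iter):
        g = grad(p)
        if np.linalg.norm(g) < tol:
            break
        step = 1.0
        # Armijo backtracking: shrink the step until sufficient decrease holds
        while cost(p - step * g) > cost(p) - 1e-4 * step * np.dot(g, g):
            step *= 0.5
        p = p - step * g
    return p

p_opt = steepest_descent([1.0, 1.0])
print(p_opt)   # converges to the minimizer at the origin
```

The iteration converges here from a suitable starting point, mirroring the local (rather than global) convergence statement of Theorem 5.3.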
Our formulation of the variational data assimilation for the primitive equations modifies only the optimization framework. Affected are the forcing of the adjoint equation and the gradient that results from an integration of the adjoint equation, while the dynamics of the primitive equations remain untouched. The overall algorithmic structure of an existing 4D-Var code is also left intact. For a computational gradient-based algorithm, the choice of the H^1-norm in the background term results in a filtering of the gradient through an inverse Helmholtz operator. The L^2([0, T], H^1(Ω))-norm in the observational term implies a modification of the forcing of the adjoint equation with a Helmholtz operator applied to the model-data misfit. Both of these changes are easy to integrate into an existing 4D-Var algorithm (see Fig. 1).
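The two modifications can be mimicked in one spatial dimension with periodic finite differences. The grid, the example misfit and the matrix discretization below are illustrative assumptions rather than the paper's operators.

```python
import numpy as np

# 1D periodic sketch of the two H^1 modifications described above:
# (i) filter the gradient with the inverse Helmholtz operator (I - Lap)^{-1},
# (ii) force the adjoint with the Helmholtz operator applied to the misfit.
# Grid, misfit field and discretization are illustrative assumptions.

n = 64
x = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)
dx = x[1] - x[0]

# Discrete periodic Laplacian assembled as a dense matrix
lap = (np.roll(np.eye(n), -1, axis=1) - 2.0 * np.eye(n)
       + np.roll(np.eye(n), 1, axis=1)) / dx**2
helmholtz = np.eye(n) - lap                 # I - Lap

misfit = np.sin(x) + 0.3 * np.sin(8.0 * x)  # misfit with a small-scale component

# (ii) Helmholtz-modified forcing of the adjoint equation
adj_forcing = helmholtz @ misfit

# (i) H^1 gradient: solve (I - Lap) g_h1 = g_l2; mode k is damped by
# roughly a factor 1/(1 + k^2)
g_l2 = misfit.copy()                        # stands in for the L^2 gradient
g_h1 = np.linalg.solve(helmholtz, g_l2)

spec_l2 = np.abs(np.fft.rfft(g_l2))
spec_h1 = np.abs(np.fft.rfft(g_h1))
print(spec_l2[8] / spec_l2[1], spec_h1[8] / spec_h1[1])
```

The spectral ratios show the filtering effect: relative to the large-scale component, the small-scale part of the gradient is strongly damped by the inverse Helmholtz operator.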
The impact of regularity-tailored cost functionals on the quality as well as on the performance of the 4D-Var algorithm has to be assessed in numerical simulations. In particular, the potential of the method for controlling turbulent flows has to be investigated. This comprises the comparison of the quality of the H^1- with the L^2-4D-Var data assimilation and the potential to resolve issues of (L^2-)4D-Var such as strong adjoint sensitivities or a large number of local minima (see, e.g., Hoteit et al. 2005; Köhl and Willebrand 2002; Lorenc 2003). The experimental investigation of these topics is beyond the scope of this paper.

Functional Analytic Setup
Let us recall the following basic notation:
• v := (v, w): velocity field with horizontal velocity v := (u, v) and vertical velocity w.
We define the fundamental function spaces Ṽ_v and Ṽ_θ for velocity and temperature, consisting of smooth fields that satisfy the boundary conditions. Denote by V_v, V_θ the closure of Ṽ_v in the Sobolev space H^1(Ω) and of Ṽ_θ in H^1(Ω). The norms of V_v and V_θ are

||v||²_{H^1} := ∫_Ω |v(x, y, z)|² dx dy dz + ∫_Ω |∇_3 v(x, y, z)|² dx dy dz

and

||θ||²_{H^1} := ∫_Ω |θ(x, y, z)|² dx dy dz + ∫_Ω |∇_3 θ(x, y, z)|² dx dy dz.
The product space V := V_v × V_θ is equipped with the sum of the norms || · ||_{H^1} := || · ||_{H^1} + || · ||_{H^1}. We denote the L^2-norm in the velocity and temperature space by || · ||_{L^2} and || · ||_{L^2}, and in the velocity-temperature product space by || · ||_{L^2} := || · ||_{L^2} + || · ||_{L^2}. The H^2-norm in the velocity-temperature product space is denoted by || · ||_{H^2} := || · ||_{H^2} + || · ||_{H^2}. In (1), the vertical velocity w is determined by the constraint (1c) and can, by virtue of the boundary conditions, be expressed as

w(t, x, y, z) = −∫_{−h}^{z} ∇ · v(t, x, y, z′) dz′.

The pressure term can, with the hydrostatic balance, be reformulated as

p(t, x, y, z) = p_s(t, x, y) + g ∫_{z}^{0} ρ(θ(t, x, y, z′)) dz′,

where p_s is the surface pressure and has to be determined.
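The diagnostic formula for w can be checked numerically. In the sketch below, the 2D x-z slice, the grid sizes and the test velocity u(x) = cos(x) are hypothetical choices for illustration.

```python
import numpy as np

# Numerical check of the diagnostic formula w(z) = -int_{-h}^{z} div v dz',
# which uses w = 0 at the bottom z = -h. The 2D x-z slice, the grids and the
# test velocity u(x) = cos(x) are illustrative assumptions.

nx, nz, h = 32, 16, 1.0
x = np.linspace(0.0, 2.0 * np.pi, nx, endpoint=False)
z = np.linspace(-h, 0.0, nz)
dx = x[1] - x[0]
dz = z[1] - z[0]

# Horizontal velocity u(x, z), constant in z; shape (nx, nz)
u = np.outer(np.cos(x), np.ones(nz))

# Horizontal divergence via centered periodic differences (here just du/dx)
div = (np.roll(u, -1, axis=0) - np.roll(u, 1, axis=0)) / (2.0 * dx)

# Cumulative trapezoidal integration upward from the bottom, w(-h) = 0
w = np.zeros_like(u)
for k in range(1, nz):
    w[:, k] = w[:, k - 1] - 0.5 * dz * (div[:, k] + div[:, k - 1])

# Analytically, w at the surface is sin(x) * h for this velocity field
print(np.max(np.abs(w[:, -1] - np.sin(x) * h)))
```

The recovered surface values of w agree with the analytic expression up to discretization error, confirming that w is fully diagnosed by the horizontal velocity and the bottom boundary condition.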
and if it satisfies (1) in the sense of distributions for all test functions Φ ∈ (C^∞(Ω̄))² and φ ∈ C^∞(Ω̄), where w denotes the vertical velocity, f the Coriolis parameter, ρ_0, θ_0 > 0 are a mean density and a mean temperature, respectively, α > 0 is the thermal expansion coefficient, and F_v, F_θ denote the external forcing. The numbers Re_1, Re_2 > 0 denote the horizontal and vertical Reynolds numbers that represent the viscosity coefficients. The numbers Rt_1, Rt_2 > 0 are the horizontal and vertical mixing coefficients for temperature.
(ii) We define the state vector ψ of the primitive equations as a pair of velocity and temperature variables, ψ := (v, θ), where (v, θ) denotes a strong solution of the primitive equations. Furthermore, we define the nonlinear terms, the dissipation operators D and the linear terms L accordingly. With this notation, we write the primitive equations (1) in the compact operator form (19) and interpret this equation in the distributional sense.
Remark 1 (Weak Solutions) Weak solutions of the primitive equations are defined with less restrictive regularity requirements. They satisfy (19) in the sense of distributions but with (v, θ) ∈ L^∞([0, T], L^2(Ω)) ∩ L^2([0, T], H^1(Ω)). The global existence of weak solutions with initial data (v_0, θ_0) ∈ L^2(Ω) has been shown in Lions et al. (1992). The uniqueness is still open. Uniqueness of weak solutions with initial conditions in the space of continuous functions was proven in Kukavica et al. (2014).
We recall a set of inequalities that are used in the sequel. The Ladyzhenskaya inequality in R² is valid for φ ∈ H^1(M) with a non-dimensional constant C_0:

||φ||_{L^4(M)} ≤ C_0 ||φ||^{1/2}_{L^2(M)} ||φ||^{1/2}_{H^1(M)}.

In R³, it holds for u ∈ H^1(Ω) that

||u||_{L^4(Ω)} ≤ C_0 ||u||^{1/4}_{L^2(Ω)} ||u||^{3/4}_{H^1(Ω)}.

Lemma 2.2 (Cao and Titi 2003)

Lemma 2.3 (Gronwall Inequality) Let f be an absolutely continuous function that satisfies

df/dt ≤ g f + h

for some real integrable functions g(t) and h(t). Then,

f(t) ≤ f(0) e^{∫_0^t g(s) ds} + ∫_0^t e^{∫_s^t g(r) dr} h(s) ds.

Linearized and Adjoint Primitive Equations, and Differentiability
In this section, we prove regularity results for the linearized and adjoint primitive equations and establish the differentiability of the model solution with respect to the initial condition.

Linearized Equations and Differentiability
We linearize the primitive model equations around a strong solution ψ = (v, θ) of (1). The resulting equations are also referred to as the "tangent linear model." The linearized equations in terms of the velocity and temperature variables (V, Θ) are given by (32), with surface pressure P_S and with the linearized equation of state. The boundary conditions (33)-(38) for (32) are analogous to the boundary conditions (2)-(8) for the primitive equations (1). We write the linearized primitive equations for Φ = (V, Θ) in the form (39), where D and L are defined in (24), (27). The solution satisfies for t ∈ [0, T] the energy estimate stated in Theorem 3.1.

Proof The proof is given in Appendix 1.
The regularity result of the linearized primitive equations in Theorem 3.1 is instrumental for the differentiability result of the next Lemma.
Proof Let a := (a_v, a_θ) ∈ V. Denote by ψ_0, ψ_0 + τa ∈ H^1(Ω), τ > 0, two initial conditions and by ψ and ψ_τa the corresponding strong solutions of the primitive equations (1). Let Φ be the solution of the linearized equations (32), linearized around ψ, with zero forcing, initial condition Φ(t_0) = a and boundary conditions (33)-(38). In order to prove the lemma, we have to show (45). From the definition of y, it follows that it solves equation (46) with initial condition y_0 = 0. We introduce the forcing k = (k_v, k_θ). Then, Eq. (46) for y becomes (47). Note that (47) is a linearized primitive equation with initial condition y_0 = 0 and an external forcing k that depends on the model solution itself. We now prove three inequalities (i)-(iii): there exist constants K, C > 0 such that the corresponding bounds hold. Assertion (i) follows from (175) in the proof of Theorem 3.1 (see Appendix 1) after integration with respect to time over a time interval, where K(t) is a bounded function on [0, T]. This proves assertion (i). For assertion (ii), we derive the bound with the Agmon inequality. Assertion (iii) follows analogously. By combining (i)-(iii), we conclude the convergence of y. Denote the difference between the two strong solutions by v̄. This difference satisfies an equation with a structure similar to the linearized equation (32), with a zero forcing term and homogeneous boundary conditions. Using the regularity properties of the two strong solutions (v_τa, θ_τa) and (v, θ), we can repeat all steps that lead to (175) (cf. Appendix 1), and the resulting inequality implies (45). With the definition of the Gateaux derivative, it follows that the solution Φ of the linearized equations satisfies Φ(a) = (Dψ/Dψ_0)a.
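The differentiability statement proved here admits a standard numerical verification: the remainder of the first-order Taylor expansion of the solution map must decay quadratically in τ. As a stand-in for the primitive equations, the sketch below uses a hypothetical scalar ODE and its exact tangent linear model; all discretization parameters are illustrative.

```python
import numpy as np

# Numerical version of the differentiability statement: for the solution map M,
# || M(psi0 + tau*a) - M(psi0) - tau * (DM/Dpsi0) a || = O(tau^2).
# Stand-in model: dpsi/dt = -psi^2 (explicit Euler); its tangent linear model
# is dPsi/dt = -2 psi Psi, linearized around the trajectory psi.

def model(psi0, dt=1e-3, n=1000):
    """Explicit Euler trajectory of dpsi/dt = -psi^2 (stands in for M)."""
    psi, traj = psi0, [psi0]
    for _ in range(n):
        psi = psi + dt * (-psi**2)
        traj.append(psi)
    return np.array(traj)

def tangent(psi_traj, a, dt=1e-3, n=1000):
    """Exact linearization of the Euler steps around the trajectory psi_traj."""
    lin, traj = a, [a]
    for k in range(n):
        lin = lin + dt * (-2.0 * psi_traj[k] * lin)
        traj.append(lin)
    return np.array(traj)

psi0, a = 1.0, 0.5
base = model(psi0)
lin = tangent(base, a)

errs = []
for tau in (1e-1, 1e-2, 1e-3):
    errs.append(np.max(np.abs(model(psi0 + tau * a) - base - tau * lin)))
print(errs[0] / errs[1], errs[1] / errs[2])   # ratios near 100 indicate O(tau^2)
```

Reducing τ by a factor of 10 reduces the remainder by roughly a factor of 100, which is the discrete analogue of the Gateaux differentiability established in the lemma.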

Adjoint Primitive Equations and Duality Relation
In this section, we study properties of the adjoint primitive equations. Adjoint equations allow us to formulate a first-order necessary condition for minima of the data assimilation cost functional, and this condition provides the basis for the actual computation of minima within an optimization algorithm. The adjoint equations are derived by taking the L^2([0, T], L^2(Ω))-scalar product of the linearized primitive equations with a so-called "adjoint state variable". Then, all differential operators are moved from the linearized variables to the adjoint variables by means of integration by parts. More details are given in Appendix 1. For an extensive treatment of adjoint equations, we refer to Marchuk et al. (1996).
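In a discrete 4D-Var implementation, the duality relation behind this construction is commonly verified with a "dot-product test": the tangent propagator L and its adjoint L* must satisfy ⟨L a, b⟩ = ⟨a, L* b⟩ for all a, b. The toy nonlinear map, its hand-coded Jacobian and the problem size below are hypothetical.

```python
import numpy as np

# Discrete "dot-product test" for the duality relation behind the adjoint
# equations: <L a, b> = <a, L* b> for all vectors a, b. In a 4D-Var code this
# test validates the hand-derived adjoint against the tangent linear model.

rng = np.random.default_rng(0)

def step(psi, dt=0.1):
    """One step of a toy nonlinear model (stands in for the primitive equations)."""
    return psi + dt * (np.roll(psi, 1) * psi - psi**3)

def jacobian(psi, dt=0.1):
    """Exact Jacobian of step() at the state psi, column by column."""
    n = psi.size
    J = np.eye(n)
    for j in range(n):
        e = np.zeros(n)
        e[j] = 1.0
        J[:, j] += dt * (np.roll(e, 1) * psi + np.roll(psi, 1) * e
                         - 3.0 * psi**2 * e)
    return J

psi = rng.standard_normal(8)
L = jacobian(psi)               # tangent linear propagator
a, b = rng.standard_normal(8), rng.standard_normal(8)

lhs = np.dot(L @ a, b)          # <L a, b>
rhs = np.dot(a, L.T @ b)        # <a, L* b> with L* realized as the transpose
print(abs(lhs - rhs))           # zero up to rounding
```

In the discrete setting, the adjoint with respect to the Euclidean scalar product is simply the matrix transpose; the continuous integration by parts plays the role of this transposition for the primitive equations.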
The resulting adjoint primitive equations are given by (54). The "initial" condition of the adjoint equation is specified at t = T, and the boundary conditions for (54) coincide with the corresponding boundary conditions (33)-(38) of the linearized equations. The following result about the adjoint primitive equations follows immediately from the previous result on the linearized equations and the definition of the adjoint equations.
The solution satisfies for t ∈ [0, T] an estimate analogous to that of Theorem 3.1.

Proof The proof follows essentially from the corresponding result on the linearized equations, i.e., Theorem 3.1, and the duality relation between the linearized and adjoint equations.
The next lemma follows from the definition of linearized and adjoint equations and integration by parts.

Existence of Local Minima of the Data Assimilation Problem
Based on the regularity results of the previous section, we now formulate the specific data assimilation cost functional for the primitive equations. The model dynamics that we are considering consist of trajectories in C([0, T], H^1(Ω)) that emerge from initial conditions in H^1(Ω). Let observations ψ_obs ∈ C([0, T], H^1(Ω)) be given that reside in the same space as the dynamics of the ocean primitive equations. Suppose furthermore that a background guess ψ_back ∈ H^1(Ω) at initial time is given. We define the cost functional by

J(ψ_0) = J_b(ψ_0) + J_obs(ψ_0).   (68)

The background term is given by

J_b(ψ_0) = (1/2) Σ_{|α|≤1} ⟨B^{-1} D^α(ψ_0 − ψ_back), D^α(ψ_0 − ψ_back)⟩,   (69)

where D^α denotes a derivative with multi-index α. The observational term is defined as

J_obs(ψ_0) = (1/2) ∫_0^T Σ_{|α|≤1} ⟨R^{-1} D^α(M[ψ_0](t) − ψ_obs(t)), D^α(M[ψ_0](t) − ψ_obs(t))⟩ dt.   (70)

The model error covariance operator B provides a weighting according to the estimated background error and is assumed to be linear, bounded and positive definite. We do not address here the important problem of modeling error statistics but assume these error covariance operators as given. Furthermore, we assume that the error covariance operators preserve the functional space of model solutions and observations, i.e., we make the

General Assumption on B, R: F ∈ X implies BF ∈ X and RF ∈ X.   (71)

We note that the notion of strong solution, given in Definition 2.1, contains the L^2([0, T], H^2(Ω))-norm for the regularity of solutions to the primitive equations, while the observational part J_obs of our cost functional in (70) imposes only the weaker L^2([0, T], H^1(Ω))-norm. The reason for this is the adjoint characterization of optimal initial conditions in Theorem 4.4. Details are explained in Remark 2 after Theorem 4.4 is stated. The proof of Theorem 4.1 also works if the stronger L^2([0, T], H^2(Ω))-norm is used in J_obs. In the proof, we use the regularity that is given by the well-posedness Theorem 1.1 and not the regularity that is implied by the boundedness of the observational part of the cost functional. We state this as a separate result in Corollary 4.2.
Proof Let (ψ_{0,n})_n = (v_{0,n}, θ_{0,n})_n ⊆ V ⊆ H^1(Ω) be a minimizing sequence of initial conditions for the data assimilation problem. From the definition of the cost functional in (68)-(70), we infer that the cost functional is well-defined. From the boundedness of the background term in (68), it follows that the sequence of initial conditions (ψ_{0,n})_n is bounded in H^1(Ω), i.e., there exists a c > 0 such that uniformly for all n ∈ N, ||ψ_{0,n}||_{H^1} ≤ c.
We show now that the limit ψ̄(t) is a strong solution of the primitive equations, i.e., ψ̄ = M[ψ̄_0]. Consider the equation (19) in Definition 2.1 of strong solutions of (1) for the elements of the sequence (ψ_n)_n = (v_n, θ_n)_n with initial conditions (ψ_{0,n})_n = (v_{0,n}, θ_{0,n})_n. We integrate over the time interval [0, T] and use the equation of state ρ(θ_n) = ρ_0 − α(θ_n − θ_0) (cf. (1e)). With the convergence of (ψ_n)_n to ψ̄ in L^2([0, T], H^1(Ω)), the required convergence properties follow. For the nonlinear term in the velocity equation (74a), we obtain, after integration by parts and with the inequalities of Cauchy-Schwarz, Hölder and Ladyzhenskaya (31), the bound (76). The first three terms on the right-hand side of (76) vanish due to the convergence of (v_n)_n to v̄ in L^2([0, T], H^1(Ω)) and the boundedness of the remaining terms in the respective integral. For the last term in (76), we note that the strong convergence (73) in L^2([0, T], H^1(Ω)) shows that it converges to zero. The convergence of the nonlinear terms in the temperature equation follows analogously. The pointwise convergence of (v_n)_n to v̄ in H^1 for almost every t ∈ [0, T] implies that v̄ satisfies the divergence-free constraint for almost every t ∈ [0, T]. This proves that the limit ψ̄ is a solution of the primitive equations. It remains to show that ψ̄ ∈ C([0, T], V). We show first that ∂_t ψ̄ ∈ L^2([0, T], L^2(Ω)). In order to establish this assertion, we take the L^2-scalar product of the primitive equations (1a)-(1e) for ψ = ψ̄ with ∂_t ψ̄ = ∂_t(v̄, θ̄). For the velocity equation, we obtain (77), where ρ̄ is the density computed from θ̄ and where we omitted the external forcing for simplicity. We consider the nonlinear terms on the right-hand side. With the inequalities of Hölder, Ladyzhenskaya and Young, the horizontal advection term is bounded as in (78), where ε_1 > 0 is arbitrary. For the vertical advection, a bound of the form (79) follows with Lemma 2.2 and the Young inequality with ε_2 > 0.
The remaining terms in (77) can be bounded with the inequalities of Cauchy-Schwarz and Young. In each of these upper bounds, a term occurs that involves the L^2-norm of the time derivative ∂_t v̄ multiplied by an ε_i as in (78) and (79). If we choose the ε_i appropriately, we can compensate the time derivative on the right-hand side with the time derivative on the left-hand side. In summary, we obtain the inequality (80). Using the regularity ψ̄ = (v̄, θ̄) ∈ L^2([0, T], H^2(Ω)), we observe that the right-hand side of (80) is finite, and this shows that ∂_t v̄ ∈ L^2([0, T], L^2(Ω)). From the same arguments, it follows for the temperature equation that ∂_t θ̄ ∈ L^2([0, T], L^2(Ω)). We show now that the initial condition ψ̄_0 minimizes the cost functional J in (68). Lower semi-continuity of the scalar product implies (81) and (82) for the limit ψ̄_0 and for the associated sequence of model solutions. As a consequence of (81) and (82), the initial condition ψ̄_0 of ψ̄ is a minimizer of J, defined in (68).
From the proof of Theorem 4.1 follows immediately the next corollary, which replaces the H^1-scalar product in the observational part of the cost functional by the H^2-scalar product. This choice reflects the regularity of the solutions of the primitive equations. With regard to this aspect, see also Remark 2 below.

Adjoint Characterization of Local Minima
The existence of local minima of the data assimilation problem is guaranteed by Theorem 4.1. We address now the question of how these optimal initial conditions can be computed. For this purpose, we prove a necessary first-order condition in terms of the adjoint equation. As preparation, we need the following lemma on the regularity of Helmholtz-type equations for temperature and velocity.

Lemma 4.3 (i)
Let f ∈ L²(Ω) be given. Equation (85) with the homogeneous Neumann boundary condition ∇₃θ · n = 0 on ∂Ω has a unique solution θ ∈ H²(Ω) that satisfies the corresponding H²-estimate. (ii) Let F ∈ L²(Ω)² be given. Equation (87) with the mixed homogeneous boundary conditions (88) has a unique solution v ∈ H²(Ω) that satisfies the analogous estimate.

Proof For equation (85) with homogeneous Neumann boundary conditions, the assertion follows from Theorem 9.26 in Brezis (2010). For equation (87) with boundary condition (88), we consider the associated bilinear form; the assertion then follows from a classical result on elliptic equations (see Necas 2012, Theorem 3.1 on p. 30).

Theorem 4.4 (Adjoint Characterization of Optimal Initial Conditions) Let observations
ψ_obs ∈ L²([0, T], H²(Ω)) and a background state ψ_back ∈ H¹(Ω) be given. Denote by ψ̄₀ = (v̄₀, θ̄₀) ∈ H¹(Ω) an optimal initial condition of the data assimilation problem (9). Then ψ̄₀ satisfies (90), where ψ̃ is the solution of the adjoint equation (54).

The order of the derivatives that appear in S is linked to the order of the derivatives in the observational part of the cost functional. This becomes evident when the Gateaux derivative of the cost functional is calculated (see the proof of Theorem 4.4, in particular the integration by parts in the second equation of (92)). As this calculation shows, raising the spatial regularity in J_obs from H¹ to H² raises the order of the derivatives in the operator S from two to four, such that the highest-order term is then given by the biharmonic operator Δ². For such a fourth-order operator, we can no longer guarantee that the adjoint forcing in (91) is in L²([0, T], L²(Ω)). The final calculation of the optimal initial state ψ̄₀ in (90) restores regularity through the smoothing effect of the inverse S⁻¹, but this requires a square-integrable forcing in the adjoint equation.
Proof For a minimizer ψ̄₀ := (v̄₀, θ̄₀) ∈ V, which exists according to Theorem 4.1, the Gateaux derivative vanishes, i.e., J′(ψ̄₀; a) = 0 for all perturbations a = (a_v, a_θ) ∈ V. We calculate the Gateaux derivative of J at an arbitrary state ψ in direction a; integration by parts yields (92), where we have used that Ψ := (DM[ψ₀]/Dψ₀) a satisfies, according to Lemma 3.2, the linearized equation with forcing F = 0, and that the boundary integral vanishes as a consequence of the boundary conditions. Now we integrate an adjoint equation whose forcing is given by the model–observation difference in the second term on the right-hand side of (92), see (93). The terminal condition of the adjoint equation is ψ̃(T) = 0. From (92) and the adjoint relation in Lemma 3.4 follows (94). For a minimum, we have J′(ψ̄₀; a) = 0 for all a, and with (94) we obtain (95). According to Lemma 4.3, this equation can be solved for ψ̄₀ = (v̄₀, θ̄₀) by inverting the operator S = (S_v, S_θ). Since ψ̃₀ ∈ L²(Ω), we obtain B⁻¹S⁻¹ψ̃₀ ∈ H²(Ω). This implies, with ψ_back ∈ H¹(Ω), that ψ̄₀ ∈ H¹(Ω) is given by (96).

We consider now the case of observations that are only square integrable. Consequently, the H¹-norm in the observational part of the cost functional (70) has to be replaced by the L²-norm. This results in a different adjoint forcing term (see also Remark 2 in the next section). The background term J_b in (69) remains unchanged, because the H¹-norm is indispensable for obtaining strong solvability of the data assimilation problem.

Convergence of Gradient-Based Descent Algorithm
The goal of this section is to prove the local convergence of an iterative gradient-based method for determining the optimal initial condition of the data assimilation problem; that is, the descent method converges provided the starting point is sufficiently close to the optimal initial condition. Globally convergent algorithms such as the Newton method come with prohibitively high computational costs for the large-scale problems of numerical weather prediction and ocean state estimation and are not considered here. To prove convergence, we use a general condition that is valid in Hilbert spaces and that relies on the Hessian of the cost functional. The Hessian is calculated through its relation to the second-order adjoint equations. The following lemma provides the basis of our convergence result.
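To make the adjoint descent loop concrete before the abstract convergence lemma, here is a deliberately minimal sketch on a scalar linear toy model x_{k+1} = M x_k. All names and the model itself are illustrative assumptions, not the primitive equations: the gradient of the time-distributed misfit is obtained from one backward adjoint sweep, and a fixed-step descent recovers the optimal initial condition.

```python
import numpy as np

def forward(x0, M, N):
    """Run the linear toy model x_{k+1} = M x_k for N steps; return trajectory."""
    xs = [x0]
    for _ in range(N):
        xs.append(M * xs[-1])
    return np.array(xs)

def adjoint_gradient(x0, y, M):
    """Gradient of J(x0) = 0.5 * sum_k (x_k - y_k)^2 via one backward adjoint sweep:
    lam_k = (x_k - y_k) + M * lam_{k+1}, with lam at k = 0 equal to dJ/dx0."""
    xs = forward(x0, M, len(y) - 1)
    lam = 0.0
    for k in range(len(y) - 1, -1, -1):
        lam = (xs[k] - y[k]) + M * lam
    return lam

M, N = 0.9, 20
x0_true = 2.0
y = forward(x0_true, M, N)          # synthetic, noise-free observations

x0 = 0.0                            # first guess for the initial condition
for _ in range(200):                # steepest descent with a fixed step size
    x0 -= 0.1 * adjoint_gradient(x0, y, M)
```

Because the toy cost is quadratic in x0, the fixed-step iteration contracts linearly toward the unique minimizer x0_true, which is the discrete analogue of the local convergence proved below.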
Lemma 5.1 (Abergel and Temam 1990) Let J be a real-valued function on a Hilbert space X with norm | · |. We make the following assumptions on J. In our application of the lemma (see Fig. 1), the function J corresponds to the cost functional J in Fig. 1, the local minimum x̄ to ψ̄₀, and the initial value x₀ to ψ₀⁰.
The adjoint gradient-based algorithm for which we want to establish convergence is illustrated in Fig. 1.
The second derivative of the cost functional J is related to the Hessian H_J by (100).

Fig. 2 Algorithm for calculating the Hessian via the second-order adjoint equations
The calculation of the Hessian H_J of the cost functional uses the second-order adjoint equations. The second-order adjoint equations are given by (101) (for details, see Appendix 1), with forcing F̂ := (F̂_U, F̂_V, F̂_Θ) and with G := (G_U, G_V, G_Θ) given by (102). The following theorem provides the information about the regularity of the second-order adjoint equations that we need to prove our convergence result.

Proof (Sketch)
The second-order adjoint equations (101) formally resemble the first-order adjoint equations (54). If one identifies the first-order adjoint variable ψ̃ with the second-order adjoint variable ψ̂, then the difference between the two systems consists of the additional G-terms defined in (102). These terms are products of the linearized variable Ψ := (V, Θ) and derivatives of the second-order adjoint ψ̂. Using the regularity of the linearized equations in Theorem 3.1, we can estimate G_U, G_V, G_Θ in the same manner as the products of the state vector ψ and derivatives of the second-order adjoint ψ̂ on the left-hand side of (101). With the Agmon inequality, we obtain an L²-estimate of the G-terms. This shows that (G_U, G_V, G_Θ) ∈ L²([0, T], L²). If we now define F̄ := G + F̂, we can cast the second-order adjoint equations in the form of the first-order adjoint equations and apply the estimates from the proof of Theorem 3.3 to show that a unique solution ψ̂ ∈ C([0, T], H¹(Ω)) ∩ L²([0, T], H²(Ω)) of (101) exists.
Equations (100) and (103) are used to verify the boundedness of the second derivative of the cost functional. From Theorem 5.2 we infer that the right-hand side of (103) is well-defined in H¹(Ω), i.e., it is finite for W ∈ H¹(Ω). We show now that the algorithm described in Fig. 1 converges to an optimal initial condition of the data assimilation problem.
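The role of the second-order adjoint in producing Hessian-vector products can be illustrated on a scalar discrete-time toy model x_{k+1} = f(x_k). This is a hedged sketch: f, the notation and the recursions are illustrative stand-ins, not the primitive-equation system. Differentiating the first-order adjoint sweep in a direction w yields a second backward recursion whose value at k = 0 is the Hessian-vector product H(x0) w.

```python
import numpy as np

def grad_and_hvp(x0, w, y, f, df, d2f):
    """Gradient of J(x0) = 0.5 * sum_k (x_k - y_k)^2 for x_{k+1} = f(x_k),
    together with the Hessian-vector product H(x0) w from the second-order
    adjoint recursion (mu differentiates lam in direction w)."""
    N = len(y) - 1
    xs = [x0]
    for _ in range(N):
        xs.append(f(xs[-1]))
    us = [w]                       # tangent-linear sweep: u_k = (dx_k/dx0) w
    for k in range(N):
        us.append(df(xs[k]) * us[k])
    lam, mu = 0.0, 0.0             # lam_{N+1} = mu_{N+1} = 0
    for k in range(N, -1, -1):
        # update mu first: it uses lam_{k+1} and mu_{k+1}
        mu = us[k] + d2f(xs[k]) * us[k] * lam + df(xs[k]) * mu
        lam = (xs[k] - y[k]) + df(xs[k]) * lam
    return lam, mu                 # lam = gradient, mu = H(x0) w

# illustrative nonlinear toy model f(x) = sin(x) with synthetic observations
f, df, d2f = np.sin, np.cos, lambda x: -np.sin(x)
y = [0.8]
for _ in range(10):
    y.append(np.sin(y[-1]))
y = np.array(y)

x0, w = 1.1, 1.0
g, hvp = grad_and_hvp(x0, w, y, f, df, d2f)
```

A standard consistency check is to compare the second-order adjoint output against a centered finite difference of the first-order gradient, which mirrors the role of (100) and (103) in the proof.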

Theorem 5.3 (Convergence)
Suppose that the assumptions of Theorem 4.1 are satisfied. Let ψ̄₀ ∈ V be an optimal initial condition for the data assimilation problem (9) with the cost functional specified by (68). Let ψ₀⁰ ∈ V be an initial value for the descent algorithm that lies within a ball B(ψ̄₀) ⊆ H¹(Ω) around ψ̄₀. Define the sequence (ψ₀ⁿ)ₙ by (98). Then (ψ₀ⁿ)ₙ converges to ψ̄₀ in H¹(Ω).

Proof The proof of the theorem is based on Lemma 5.1. We have to establish a lower and an upper bound on the Hessian of the cost functional. This is accomplished via bounds on the second-order state variable. In order to ease notation, we suppress the index "n" of the sequence; the state variable ψ in this proof corresponds to ψⁿ. The occurring equations, their initial conditions and forcing terms are summarized in Fig. 2.
We infer from Theorem 5.2, with vanishing initial condition for the second-order state, the bound (108), with K₁, K₂ defined in (44) and with the forcing F̂ given in terms of the covariance operator R applied to the linearized state Ψ. We estimate now the right-hand side of (108), beginning with the forcing. With assumption (71) on the covariance operators, we obtain the estimate (109), where ||R|| denotes the operator norm of the linear operator R and Ψ ∈ L∞([0, T], H¹(Ω)) ∩ L²([0, T], H²(Ω)) is the solution of the linearized primitive equations with initial condition Ψ₀ = W and vanishing forcing. According to (42), this solution satisfies for t ∈ [0, T] the inequalities (110) and (111). The estimates (110) and (111) imply for (109) the bound (112).

We continue with the estimation of the G-terms in (108). Analogously to (106), it follows (113). The adjoint state in (113), which results from integration of the adjoint equations with zero terminal condition and forcing F̃ given by the weighted model–observation difference R(ψ − ψ_obs) and its first derivatives, satisfies according to Theorem 3.3 for t ∈ [0, T] the H¹-estimate (114). For the H²-bound on the adjoint state in (113), Theorem 3.3 combined with (114) yields (115). For the adjoint forcing in (114) and (115), the triangle inequality yields (116), where L₄(t) > 0 is bounded on [0, T], since ψ*, ψⁿ, ψ_obs ∈ L²([0, T], H²(Ω)). The function L₄ is bounded uniformly in n, because (ψⁿ)ₙ ⊆ B(ψ₀*). From (114)–(116) follows for (113) the bound (117). With (109) and (117), we derive for the second-order state in (108) the upper bound (118) for t ∈ [0, T].

We are now in the position to derive upper and lower bounds on the Hessian. From (118) follows (119). Equations (119) and (100) establish the bounds on the second derivative of the cost functional, and the application of Lemma 5.1 proves the assertion of the theorem.
Funding Open Access funding enabled and organized by Projekt DEAL.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Consider now a linearized perturbation
Ψ := (U, V, Θ) of the state vector ψ = (u, v, θ) that results from a perturbation of the initial condition ψ₀ := (u₀, v₀, θ₀) of the state vector. Note that we only perturb variables that require an initial condition; i.e., for the primitive equations, and in contrast to the Navier–Stokes equations, we do not perturb the vertical velocity. The perturbation of u, for example, is defined as the Gateaux derivative of u at u₀ in direction h, and analogously for v and θ. The equation governing the dynamics of the linearized perturbation can be found by taking the Gateaux derivative of the primitive equations (1). The product of the Jacobi matrix F′, consisting of the partial Gateaux derivatives of F with respect to the components (u, v, θ) of the state vector ψ, applied to the linearized state vector Ψ = (U, V, Θ), is given below. The linearized primitive equations can then be written in a compact operator form, subject to the divergence-free constraint ∇₃ · V = 0. The adjoint primitive equations with respect to the L²-scalar product read as follows. The expression for the adjoint operator F′[ψ]* is derived by taking the L²-scalar product of the linear operator F′[ψ]Ψ with the adjoint state vector Ψ̃ = (Ṽ, Θ̃), Ṽ := (Ũ, Ṽ), performing integration by parts, and rearranging the resulting terms such that the linear state vector takes the role of a test function and the other component of the scalar product consists of adjoint variables Ψ̃ and state variables ψ. We illustrate this with the pressure gradient in the velocity equation. Integrating by parts, using the homogeneous Dirichlet boundary conditions and the incompressibility of the adjoint velocity field, yields the identity below, where n_H denotes the horizontal outer normal vector. The multiplication by the linear temperature variable as a test function implies that the vertical integral over the horizontal divergence is an element of the weak formulation of the adjoint temperature equation.
This results in the following expression for F′[ψ]*, subject to the constraint ∂ₓU + ∂ᵧV + ∂_z W = 0. In order to derive the second-order adjoint primitive equations, we linearize the combined system of primitive and adjoint equations. The resulting second-order adjoint primitive equations are given by the following system.
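At the discrete level, the integration-by-parts step that produces F′[ψ]* amounts to a matrix transpose. The following minimal sketch, with illustrative finite-difference operators and boundary closures on a uniform grid, checks that the transpose of a forward-difference operator is the negative backward-difference operator, mirroring ∫(∂ₓu) v = −∫ u (∂ₓv) when the boundary terms vanish.

```python
import numpy as np

n, h = 50, 0.02
# forward-difference approximation of d/dx (zero closure at the last row)
D = (np.eye(n, k=1) - np.eye(n)) / h
# backward-difference approximation of d/dx (zero closure at the first row)
B = (np.eye(n) - np.eye(n, k=-1)) / h

rng = np.random.default_rng(0)
u = rng.standard_normal(n)
v = rng.standard_normal(n)

# discrete integration by parts: <D u, v> = <u, D^T v>  and  D^T = -B,
# i.e., the adjoint of the discrete derivative is minus the derivative
# with the boundary closure moved to the other end of the grid
lhs = np.dot(D @ u, v)
rhs = -np.dot(u, B @ v)
```

The same bookkeeping, applied term by term to the discretized primitive equations, is what an implementation of the adjoint model automates.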

Appendix B: Proof of Theorem 3.1
Proof The following proof is formal; it can easily be made rigorous by using a Galerkin expansion of the linearized variables (V, Θ). The a priori estimates of the following proof, applied to this Galerkin expansion, then allow passing to the limit with the Lions–Aubin compactness lemma. Techniques originally developed in Cao and Titi (2007) are instrumental for the proof.
Taking the L²-inner product of the linearized velocity equation (32a) with the velocity V yields (129). The term with the Coriolis force vanishes, and for the surface pressure the corresponding term vanishes after integration by parts. For the linearized temperature equation, we obtain (130) after taking the L²-inner product of (32c) with the temperature Θ. Using that certain advective terms vanish due to the incompressibility (1c) of the velocity v, we obtain (131) for the velocity equation and (132) for the temperature equation.

We estimate now the terms on the right-hand side of (131) and (132). The first terms on the right-hand side of (131) can be estimated with the inequalities of Hölder, Ladyzhenskaya and Young, see (133), and for the first term on the right-hand side of (132) we have (134), where the εᵢ > 0 are arbitrary and will be fixed later. For the second term on the right-hand side of (131), we use the fact that the domain is cylindrical, Ω = M × (−h, 0), and apply the triangle inequality and the inequalities of Cauchy–Schwarz and Hölder (with norms L², L⁴, L⁴) to obtain (135). For the first term on the right-hand side of (135), we obtain (136) with Cauchy–Schwarz. For the second term, we obtain (137) with the Minkowski inequality and the Ladyzhenskaya inequality in 2D. For the last term on the right-hand side of (135), (138) holds. From (136)–(138), we obtain (139) for (135). The same argument yields the estimate (140) for the second term on the right-hand side of (132). For the pressure-gradient term in the velocity equation (129), the inequality of Cauchy–Schwarz and the Young inequality yield (141), where we have used the equation of state (32d) in the last step. The forcing terms on the right-hand side of (129) and (130) are estimated in (142). Collecting the estimates (129)–(141) results in the bound (143) for the linearized velocity equation and (144) for the temperature equation. We combine now the two estimates (143) and (144). We define 1/Re := min{1/Re₁, 1/Re₂} and 1/Rt := min{1/Rt₁, 1/Rt₂} and choose the εᵢ appropriately.
This yields (145). The Gronwall inequality implies (146). Since (v, θ) is a strong solution, this shows that V and Θ are bounded in L∞([0, T], L²(Ω)).
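For reference, the differential form of the Gronwall lemma used in this step reads (stated schematically, with generic coefficients a, b):

```latex
\[
  y'(t) \;\le\; a(t)\, y(t) + b(t)
  \quad\Longrightarrow\quad
  y(t) \;\le\; e^{\int_0^t a(s)\,\mathrm{d}s}\, y(0)
        \;+\; \int_0^t e^{\int_s^t a(r)\,\mathrm{d}r}\, b(s)\,\mathrm{d}s .
\]
% Schematically, y(t) collects the L^2-norms of V and \Theta, with a
% determined by the strong solution (v, \theta) and b by the forcing terms.
```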

Step 2a: L∞([0, T], L²(Ω))-bound on ∇V
Taking the L²-inner product of the linearized velocity equation (32a) with AV yields (147), where we have used that the Stokes operator annihilates gradients and that the Coriolis term also vanishes. The forcing term is estimated with the inequalities of Cauchy–Schwarz and Young in (148). For the first and second terms on the right-hand side of (147), the inequalities of Hölder, Ladyzhenskaya and Young yield (149) and (150). For the third term on the right-hand side of (147), we obtain (151) with Lemma 2.2 (f = V, g = ∂_z V) and the Young inequality. Applying Lemma 2.2 to the fourth term on the right-hand side of (147) yields (152). Summarizing our estimates yields (153). We need a bound on ||∂_z V||²_{L²} in (153). This is derived in the next step, before we complete the L∞([0, T], H¹(Ω))-bound on the velocity.
Step 2b: L∞([0, T], L²(Ω))-bound on ∂_z V

The vertical derivative U := ∂_z V of the linearized velocity satisfies equation (154). Taking the L²-scalar product of (154) with U yields (155). For the first term on the right-hand side of (155), we obtain a bound with the inequalities of Hölder, Ladyzhenskaya and Young. For the third term on the right-hand side of (155), a bound follows from the Hölder and Young inequalities; analogously for the fifth term. We observe that on the right-hand side of (155) the second and seventh as well as the fourth and sixth terms cancel. The pressure gradient (the eighth term in (155)) is bounded using (32d). For the forcing term, we obtain the analogous estimate. In summary, we obtain a differential inequality from (155).

Step 2c: Completing the L∞([0, T], H¹(Ω))-bound on the velocity.