A Regularity-Aware Algorithm for Variational Data Assimilation of an Idealized Coupled Atmosphere–Ocean Model

We study the problem of determining through a variational data assimilation approach the initial condition for a coupled set of nonlinear partial differential equations from which a model trajectory emerges in agreement with a given set of time-distributed observations. The partial differential equations describe an idealized coupled atmospheric–ocean model on a rotating torus. The model consist of the viscous shallow-water equations in geophysical scaling that represents the large-scale atmospheric dynamics coupled via a simplified but physically plausible coupling to a model that represents the large-scale ocean dynamics and consists of the incompressible two-dimensional Navier–Stokes equations and an advection–diffusion equation. We propose a variational algorithm (4D-Var) of the coupled data assimilation problem that is solvable and computable. This algorithm relies on the use of a variational cost functional that is tailored to the regularity of the coupled model as well as to the regularity of the observations by means of derivative-based norms. We support this proposal by developing regularity results for an idealized coupled atmospheric–ocean model using the concept of classical solutions. Based on these results we formulate a suitable cost functional. For this cost functional we prove the existence of optimal initial conditions in the sense of minimizers of the cost functional and characterize the minimizers by a first-order necessary condition involving the coupled adjoint equations. We prove local convergence of a gradient-based descent algorithm to optimal initial conditions using second-order adjoint information. Instrumental for our results is the use of suitable Sobolev norms instead of the standard Lebesgue norms in the cost functional. The index of the actual Sobolev space provides an additional scale selective mechanisms in the variational algorithm.


Introduction
Data assimilation aims at blending observational data with a dynamical model of a physical process. It constitutes a fundamental technique for modelling real-world phenomena. Numerical weather prediction and ocean state estimation are examples of scientific disciplines that rely on data assimilation. Data assimilation for coupled atmosphere-ocean models, also known as coupled data assimilation, has recently gained attention (see the report of the World Meteorogical Organisation WMO [29] and references therein). The main goal of coupled data assimilation is to extend the predictability horizon of weather and climate forecasts. An important example of such a coupled assimilation problem is the initialization of a coupled atmosphere-ocean model for the purpose of climate prediction on decadal time scales. The challenges in coupled data assimilation are manifold and comprise for example the formulation of error covariance matrices, or the treatment of vastly different spatio-temporal scales of the respective model components. The current state of knowledge is summarized consisely in [29]. The majority of actual approaches applies a strategy based on numerical experiments using a fully discrete configuration (see e.g. [15,16,30]). Our approach is complementary to this and focuses on the Partial Differential Equations of an idealized coupled atmosphere-ocean model.
The purpose of this paper is to describe fundamental mathematical properties that need to be fulfilled by a variational data assimilation algorithm for a coupled atmosphere-ocean system such that the corresponding minimization problem is solvable and computable. We aim to formulate a sound mathematical basis that guarantees a controlled behaviour of a coupled variational data assimilation algorithm. For this purpose we focus on the formulation of the 4D-Var cost functional. This cost functional is minimized and its critical points determine the optimal initial conditions of variational data assimilation. The key idea for implementing our goal is to connect the norms in the cost functional to the regularity theory for the coupled equations and to the regularity of the observations. Not respecting this connection may lead to an unpredictable behaviour of the algorithm. If for example the function spaces used in the background term of the cost functional does not match the function space for which the coupled equations are well-posed then one searches in the variational optimization for initial conditions that potentially create an ill-defined model trajectory. Similar reasoning applies to the observational term of the cost functional. If the gradient of the cost functional, obtained from a reverse in time integration of the adjoint equations forced by the model-observation difference, does not reside in the same space as the initial condition this creates ill-defined gradients that may results in an erratic behaviour in the gradient-based optimization process. An ill-posed behaviour of the optimization procedure can in principle be compensated by regularization techniques. Therefore it is important to derive criteria for well-posed behaviour and to understand the possible sources of ill-posed optimization in order to design minimally invasive regularization techniques. To substantiate our arguments we investigate a set of coupled nonlinear PDE's and carry out a mathematical analysis of an idealized atmosphereocean model that consists of an atmospheric component, described by viscous shall-water equations in a geophysical scaling, following [25], and of an ocean component that is given by the incompressible 2D-Navier-Stokes equations, supplemented by an advection diffusion equation. The coupling between the atmosphere and the ocean is simplified and done via a forcing term and not via boundary conditions as it is done in coupled (three-dimensional) general circulation models. For the idealized coupled atmosphere-ocean PDE considered here we develop a regularity theory (see Theorem 1). Based on this result we formulate a variational cost functional that is tailored to the regularity of the underlying equations. For this cost functional we prove the existence of minimizers, i.e. the existence of optimal initial conditions (Theorem 4). We characterize the optimal initial conditions by a necessary adjoint criterion (Theorem 5) and prove convergence (Theorem 7) of a gradient-based algorithm via an estimate of the Hessian of the cost functional utilizing second-order adjoint equations of the coupled model.
For the equations of the idealized atmosphere-ocean model we carry out an elementary mathematical analysis consisting of a well-posedness and regularity result in the Sobolev space H s with s := (s i ) 4 i=1 ∈ Z 4 , s i ≥ 3 (see Theorem 1). With this analysis we stay within the framework of classical solutions and do not use the concept of weak and strong solutions. The use of classical solutions facilitates the analysis, but excludes relevant phenomena such as shocks and turbulence, including them requires more sophisticated solution concepts (e.g [5,35]). Since the focus of this work is not well-posedness per se or optimal regularity but the relation between regularity/well-posedness of coupled equations to solvability/computability of the variational data assimilation problem we prefer to stay in the regularity regime of classical solutions before we relax this condition in later work. We study the regularity of the associated linearized and adjoint equations and show that the solution of the coupled equations is differentiable with respect to the initial condition. In accordance with the aforementioned regularity results we formulate a cost functional which uses Sobolev spaces L 2 (T , H s ) 1 with positive as well as negative Sobolev space indices s i ∈ Z, (i = 1 . . . 4). The variational cost functional consists of a observational term and a background term. In the language of Inverse Problems the background term provides a Tikhonov regularization. For this term we use a Sobolev space H s with s i ≥ 3, in consistency with the regularity of the underlying equations. For the observational term the range of-positive or negative-Sobolev space order is determined by the regularity of model solution and of observations. The physical effect of including positive Sobolev space indices in the observational term of the cost functional is to fit the model to the observations on small scales, while negative indices exclude the small scales from the fitting process by means of a filtering process. This implements a regularity-aware filter capability within the data assimilation algorithm that regulates which spatial scales are seen by the data assimilation algorithm, while leaving the model dynamics unaffected. The filter can be adjusted individually for different components of the coupled state vector via the degree s i of the respective Sobolev space. The proposed formulation of the cost functional is the key to prove existence of minimizers for the variational data assimilation problem of the coupled equations (Theorem 4) and to show in Theorem 5 that critical points of the cost functional satisfy an adjoint equation with an observation-based forcing term that depends also on the specific Sobolev space used. It allows also to prove convergence of a steepest descent algorithm to an optimal initial condition, provided the initial guess of the iterative algorithm is good enough (Theorem 7). This result uses the regularity of the second-order adjoint equations. This numerical minimization algorithm shows also that our Sobolev-metric based approach can easily be incorporated into the standard L 2 -based variational data assimilation algorithms.
Derivative-based metrics provide an additional spatial filter mechanism in the data assimilation algorithm with the objective of emphasizing/de-emphasizing specific scales of individual components of a multi-component state vector within the minimization procedure. Which choice of Sobolev space indices for background and observational term of the cost functional may improve the data assimilation results has to be determined by numerical experiments. For ocean data assimilation based on the Primitive Equations the results about the global well-posedness of the Primitive Equations with initial conditions in H 1 (see [9]) suggest to apply at least the H 1 -norm in the background term rather than the usually chosen L 2 -norm (cf. [31]). The choice of the appropriate Sobolev norms for the observational data should depend on the regularity of the observations. We furthermore note that the Sobolev embedding theorem implies that for sufficiently high-order, in our case of two spatial dimensions, s ≥ 2, one controls additionally also the L 2 (T , L ∞ )-norm while at the same time a Hilbert-space structure is retained that allows to stay in the framework of a least-square problem.
The approach described here has to be supplemented by numerical experiments and our results provide the basis for such experiments. The purpose of such experiments is to illustrate first, what can go wrong if the connection between regularity of model equations and observations and the norms in the cost functional is lost and what goes right if this connection is respected. Second, one has to explore the potential of the filter capacity that is provided by the derivative-based metrics. These experiments are beyond the scope of this paper.
The use of alternative metrics beyond L 2 (T , L 2 ) is not new in the literature. In [6] enstrophy-based metrics and Sobolev-type cost functionals were discussed in the context of turbulence control via boundary forcing for a channel flow with a focus on numerical experiments (see also [33]). We are not aware of a similar study for atmospheric or oceanic data assimilation. The data assimilation community in Atmosphere and Ocean sciences focuses on another important element of the cost functional, namely the modelling of the observational and model error covariance matrix (see e.g. [39]) that are often modelled with the intention of implementing a kind of filtering or scale selection.
Structure of the paper In Sect. 2 we introduce the coupled model and generic data assimilation cost functionals. Section 3 describes the functional setting. We proceed in Sect. 4 with the analysis of the model equations, its linearization and the adjoint equations. Based on this analysis we propose in Sect. 5 a specific formulation of the cost functional. We show that the use of higher-order cost functionals lead to solvable data assimilation problems (Theorem 4) and demonstrate how the modification can be incorporated into a iterative gradient based optimization algorithms (Theorem 5) and prove in Sect. 6.2 a convergence result for a descent algorithm (Theorem 7) by second-order adjoint methods. The paper ends with a conclusion in Sect. 7.

The Coupled Model and the Associated Data Assimilation Problem
We are studying an idealized coupled atmosphere-ocean model that is described by the following equations Atmosphere: Ocean: with initial conditions The state of the system is described by a state vector ψ := (ψ a , ψ o ) with atmospheric component ψ a := (u a , θ a ) and oceanic component ψ o := (u o , θ o ), where each component consists of a (vectorial) velocity field u a , u o and a scalar variable θ a , θ o , respectively. All functions depend on a two-dimensional space variable and time. For the Coriolis term we use the notation u a⊥ := (u a 1 , u a 2 ) ⊥ = (−u a 2 , u a 1 ). The coupling functions γ, σ regulate the strength of the interaction between the two components. Both coupling functions depend on space and time variables, and vary smoothly in space and time. We write the coupled model Eqs. (1)-(5) also in the following notation where N contains the advective terms of the equations, the operator L represents the linear terms, except the Laplace operator which is described by D. The coupling is described by the coupling operator C. The physical picture of the atmosphere-ocean coupling is as follows. The ocean model component consists of a twodimensional velocity field and an equation that describes the sea-surface temperature of the ocean, while the atmospheric component contains a two-dimensional velocity field and an equation describing the heat content. Both systems are linked together via a simple coupling: changes in the oceanic sea surface temperature heat or cool the atmosphere, this changes the atmospheric circulation and consequently the atmospheric winds that drive the ocean. The atmosphere is affected by the ocean through a thermodynamic effect, while the ocean is influenced by the atmosphere through a dynamic effect. The strength of this coupling is regulated by the space-and time-dependent coupling functions γ, σ . This simple coupling procedure is motivated by practice in large-scale climate modelling (see e.g. [14]). This type of coupling has for example been used to study the El-Nino-Southern Oscillation phenomenon, a climate mechanism in the tropical Pacific that crucially depends on the atmosphere-ocean interaction (see e.g. chpt. 7 in [11], or [28]).
The numbersRe, Pe with superindex for the atmospheric and oceanic component denote the Reynolds and Péclet numbers. The numbers Ro andFr in the atmospheric component denote the Rossby and the Froude number. The Froude number measuring the degree of compressibility occurs in the atmospheric component only. The atmospheric Eqs. (1)-(5) arise from the (2D-) shallow water equations after non-dimensionalization as in chpt. 4 of [25]. If U and L denote a typical velocity and length scale, respectively and H 0 represents the mean height of the fluid and N 0 the typical size of the height perturbation, then one can introduce the following non-dimensional variables to the atmospheric equations x := x L , t := t L U , u := u U , θ := θ N 0 . We introduce the non-dimensional parameters where ν u , ν θ denote viscosity and diffusivity parameters, f is the rotation frequency, g the gravity constant. The Eqs. (1)-(5) are derived after applying the scaling above to the shallow water equations and using the assumption Ro, Fr << 1, with the notationFr := √ Fr Ro. More details, in particular the convergence of the uncoupled shallow water equations to the quasi-geostrophic equations in the vanishing Rossby number limit can be found in [7,25]. A review of the mathematical theory of shallow-waters equations can be found in [8], for application in the context Geophysical Fluid Dynamics we refer to [36]. Through an appropriate choice of the non-dimensional parameters a coupled system (1)-(5) can be created that consists of a fast and a slow component.
The model Eqs. (1)-(5) do not describe general circulation models of atmosphere and ocean but comprise several idealizations and simplifications. First, the fact that the model is purely horizontal and the vertical axis is absent, excludes vertical dynamics of the atmospheric boundary layer and the oceanic mixed layer that are important for atmosphere-ocean interaction. Representing the rich dynamics of the two fluids by a state vector that consists in each case of a velocity field and a tracer variable represents another severe simplification. Furthermore we do not consider subgrid scale closures and physical parametrizations, and work with constant coefficients. As a consequence of our model configuration we work with an idealized coupling that focuses on first-order feedbacks between the two components, namely the thermal effect of the ocean on the atmosphere and the mechanical energy input of the atmosphere to the ocean. The heat transfer from the atmosphere to the ocean is on large scales negligible due to the larger heat capacity of the ocean. The momentum transfer from the ocean to the atmosphere can also be neglected on the scales under consideration here (see Remark 1). The coupling takes place via forcing terms and not, as in general circulation models, via boundary conditions. The complex and highly nonlinear physical behaviour of the atmosphere-ocean interface is poorly understood (see e.g. [32]). The coupling used here is clearly not realistic for describing the atmosphere-ocean interface on small scales but it has proven to be a useful approximation for large scales dynamics. It is physically plausible within the range of validity of our idealized atmosphere-ocean model, which has approximation quality for large-scale dynamics but not for small scales. For a description of the basic physical processes of atmosphere-ocean interaction we refer to [38]. A concise mathematical description of coupled general circulation models can be found in [21][22][23].
We now proceed by describing the data assimilation problem for the system (2) in an abstract manner. Let X be a generic "state space" of solutions of the coupled equations with a set of initial conditions X 0 . Let time distributed observations ψ obs ∈ Y of the atmosphere and the ocean be given. These formal definitions will be specified in Sect. 5. Our aim is to fit a trajectory ψ ∈ X of the atmosphere-ocean model to the observations ψ obs ∈ Y by minimizing the distance between coupled model trajectory and observations using the initial conditions ψ 0 = (u a 0 , θ a 0 , u o 0 , θ o 0 ) ∈ X 0 as a control variable. The coupled data assimilation problem consists in determining, for given observations ψ obs ∈ Y, an initial conditionψ 0 ∈ X 0 such that and ψ(ψ 0 ) ∈ X satisfies coupled model equations (1)- (5) with initial conditionψ 0 .
The cost functional J in (8) is defined as a sum of a background and an observational term The observational part J obs of the cost functional measures the distance between the coupled model state and the observations and is defined by Here M denotes the model operator that advances an initial condition ψ 0 to the model state ψ(t) at time t. The observational error covariance operator R provides a statistical weighting of the model-observation misfit, according to the quality of the observations. We subsume this weighting into the Lebesgue measure and denote the resulting measure by dμ R and the model space supplemented with this measure by X (dμ R ). The norm in (10) acts on the spatial variable. The background term J b of the cost functional in (9) is defined by where ψ back is a given background state. The background state incorporates prior information about the system and can for example be provided by a previous forecast. The model error covariance operator B provides a weighting according to the estimated background error and is assumed to be linear, bounded and positive definite. We do not address the important problem of modelling error statistics but assume these error covariance operators as given and that the error covariance operators preserve the functional space of model solutions and observations [see 26]. The data assimilation problem (8) and the cost functional (9)- (11) has been stated in a formal manner. In order to arrive at a sensible definition that provides the basis of a computational algorithm we have to give the abstract notion of "state space" X , "space of initial conditions" X 0 and "observation space" Y a precise meaning. This will be done in Sect. 5, relying on the mathematical analysis of the coupled model in Sect. 4, for which the next section provides that mathematical framework.

Functional Setting
Domain and Boundary Conditions The spatial domain is a two dimensional square Ω := (0, L) × (0, L) with L ∈ R + . We assume periodic boundary conditions. By W s (Ω) we denote the L 2 -Sobolev space of order s ∈ Z + ∪ {0} that is defined as the set of functions f ∈ L 2 (Ω) such that its derivatives in the distributional sense D α f (x, y) = ∂ α 1 x ∂ α 2 y f (x, y) are in L 2 (Ω) for all |α| ≤ s, with multi-index α = (α 1 , α 2 ) ∈ Z 2 + , and degree |α| := α 1 + α 2 . The scalar product in W s (Ω) is defined by The vectorial counterpart of the Sobolev space W s (Ω) is denoted by W s (Ω). More information about Sobolev spaces can be found for example in [2,12,27]. We define and its vector-valued equivalent V := {u : R 2 → R 2 :u is a vector-valued trigonometric polynomial with period L, and We define now the following function spaces and the dual pairing between f ∈ H −s (Ω), h ∈ H s (Ω) can be expressed by Functions θ ∈ H s (Ω) and u ∈ H s (Ω) or u ∈ H s div (Ω) can be decomposed into an orthonormal basis of eigenfunctions w n (x) := e 2πin·x L of the Laplace operator such that (see e.g. [10]) An equivalent characterization for θ ∈ H s (Ω) in terms of Fourier coefficients that is valid for integer exponents s ∈ Z, i.e. including the dual space of H s (Ω), reads as follows A scalar product in H s (Ω) in terms of the Laplacian that is equivalent to (12), is given by and for the dual pairing between With this preparations we define now for with the norm of ψ = (u a , θ a , u o , θ o ) ∈ H s given by The notation s ≤ t for two multi-indices is to be understood as relation between the components, i.e. s i ≤ t i for all i. Similarly are expressions such as s + 1 are defined as s i + 1 for all components i. We use an analogous notation for the Lebesgue spaces and denote by L 2 , L 2 , L 2 the sets of square-integrable scalar functions, vector fields and state vectors, respectively.

Assume that f is absolutely continuous on T and that for almost every t
Then f ∈ L ∞ loc (T , R) and Young inequality: ab ≤ 1 2 a 2 + 2 b 2 for a, b ∈ R and > 0. General Assumption on Covariance Operators: The error covariance operators B, R preserve the space of their respective arguments, i.e.
if ξ ∈ H s then Bξ ∈ H s , and Rξ ∈ H s .

Mathematical Analysis of the Coupled Model
In this section we provide the regularity results of the coupled model Eqs.
(1)-(5) and of the associated linearized and adjoint equations. This enables us to prove the differentiability of the model solution with respect to the initial conditions (Lemma 7). These results are instrumental for the formulation of the data assimilation cost functional in Sect. 5.

Well-Posedness of the Coupled Equations
is said to be a regular solution of(1)-(5) (or equivalently of (6)) on the time interval T : The following theorem is the main result of this section.
. Then there exists on the time interval T a regular solution ψ of (1)-(5) in the sense of Definition 1. The solution is unique and depends continuously on the initial condition.

Proof Step 1: Existence and Uniqueness of Galerkin Approximation.
The Galerkin approximation to the coupled Eqs. (1)-(5) reads as follows here the Galerkin projections u a m , θ a m for the atmospheric component are given as truncation of the expansions (18) with w n given by (4). The Galerkin system (28) has periodic boundary conditions and its initial condition is given by For the oceanic component an analogous expansion holds. The corresponding operator equation for the state vector The system (28) is a ordinary differential equation system with constant coefficients and a quadratic nonlinearity. By standard arguments one can show that this system is a Lipschitz continuous mapping from H s m into itself and then it follows from the Picard Theorem that a unique solution Step 2: H s -Estimate Applying the derivative operator D α to the Galerkin system (28) yields for the atmospheric component where the nonlinear terms on the right-hand side are defined by Taking the L 2 -inner product of (31) with (D α u a m , D α θ a m ), integrating by parts and adding the two equations yields where the Coriolis term vanishes due to D α u a⊥ m D α u a m L 2 = 0. For the coupling term we derive with the inequalities of Cauchy-Schwarz, Poincaré and with Lemma 2 After integrating the second term on the right hand side of (32) by parts and invoking the periodic boundary condition, this term cancels with the third term on the right hand side and we obtain with (33) and the Young inequality the following estimate and From (35), (36) we obtain for (34) We sum over all derivatives D α such that the degree |α| of any derivative is less or equal to the degree of the corresponding s-component. This implies with the Young inequality the following upper bound for the atmospheric state For the oceanic state we proceed analogously to the atmosphere. After applying D α to (28) , integrate by parts and add the two equations. This yields where the pressure and the two gradient terms in the second line vanish after integration by part due to incompressibility and the periodic boundary conditions. The coupling term in the velocity equations has been treated analogously to (33) with the temperature variable θ a m replaced by the velocity variable u o With the estimates (35) and (36) for G o u (α) and G o θ (α) respectively we arrive analogously to the atmospheric estimate at the following inequality Adding (38) and (41) results in The inequality is still true if we neglect the positive gradient-terms on the left hand side (43) We now make use of our assumption that all components of the state vector belong to a Sobolev space H s and that all component of s are greater or equal to 3. This allows to apply Lemma 2 (with s = 2, k = 1) to the divergence term in (42) and it follows where the constant C s from Lemma 2 does depend on s but not on m. Hence We chose now t 1 > t 0 such that the following condition is satisfied A t 1 with this property exist, because the right-hand-side of (47) is positive. From (46) follows With (30) follows Since the upper bound is independent on m, we have thus proven that the sequence where the endpoint t 1 satisfies (47). It follows that for each m the solutions ψ m of the Galerkin system do have a joint interval of existence [t 0 , t 1 ]. The boundedness of (ψ m ) m establishes the existence of a subsequence (ψ k ) k that converges weakly in L 2 (T ; H s ) to a ψ ∈ L 2 (T ; H s ). According to (49) it holds that ψ ∈ L ∞ (T ; H s ).
Step 3: Estimate on the time derivative.
The uniform boundedness of (ψ m ) m in L 2 (T ; H s ) implies together with (45) that ( dψ m dt ) m is uniformly bounded in L 2 (T ; H s ). The sequence ( dψ m dt ) m is in particular uniformly bounded in L 2 (T , L 2 ) and in L 2 (T ; H −s ). The last fact follows from the continuous injection H s ⊆ L 2 ⊂ H −s .
The boundedness of (ψ m ) m in L 2 (T ; H s ) and of ( dψ m dt ) m in L 2 (T , L 2 ) implies with the Aubin compactness theorem (cf. [10], Lemma 8.2) that a subsequence (ψ k ) k of (ψ m ) m exists that converges strongly in L 2 (T , L 2 ) and weakly in L 2 (T ; H s ) to the limit ψ ∈ L 2 (T , H s ). We consider the (ψ k − ψ), and denote an arbitrary component of this difference by ( f From (50) we derive with the convergence of (ψ k ) k in L 2 (T , L 2 ) and with the boundedness This proves continuity in the weak sense, i.e. ψ ∈ C w (T ; H s ). The weak continuity implies for τ ∈ [t 0 , t 1 ] lim From (49) follows lim From (52) and (53) we obtain continuity of the H s -norm of the solution at initial time From (42) we get after integration over T = [t 0 , t 1 ] From (49) follows that the right hand side of (55) is bounded independent from m. This implies that ψ ∈ L 2 (T , H s+1 ). Consequently there exists a set E ⊆ T of Lebesgue-measure zero such that for all τ ∈ T \E it holds that ψ(·, τ ) ∈ H s+1 . This implies that for all δ > 0 there exists a t * 0 < δ such that ψ(·, t * 0 ) ∈ H s+1 . If we use ψ t * 0 := ψ(·, t * 0 ) as initial condition we can repeat all the arguments of our proof to establish the existence of a solutioñ We obviously have for the two endpoints t 1 ≤ t * 1 and hence ψ,ψ coincide on [t * 0 , t 1 ]. Since δ > 0 was arbitrary we have ψ ∈ C((t 0 , t 1 ], H s ) and combined with the continuity at t 0 (see (54)) it follows that ψ ∈ C([t 0 , t 1 ], H s ).
Step 5: Existence Global in Time Let [t 0 , t 1 ] be the maximal interval of existence of the solution ψ. If we assume that t 1 < ∞ then this implies that This contradicts ψ ∈ C(T , H s ). Therefore t 1 = ∞ and the solution exists globally in time.
Remark 1 (Coupling Variants) Some variants of coupling our atmosphere and ocean equations can be included in our analysis. The atmospheric wind forcing term σ u a in the oceanic velocity equation in (1)-(5) can be modified into σ (u a − u o ) without significantly changing the proof of Theorem 1. This can immediately be seen from (40).
A second variant that can easily be incorporated is to include heat transfer from atmosphere to ocean and momentum transfer from ocean to atmosphere. The additional term κθ a for heat transfer can be added to the right-hand side of the ocean heat equation in (1)-(5) and the term λu o for the momentum transfer to the right-hand side of the atmospheric velocity equation, where κ, λ are sufficiently smooth coupling functions. These changes can be included without altering essentially the proof of Theorem 1.

Linearized, Adjoint Coupled Equations and Differentiability
We linearize the coupled model equations around a solution ψ = (ψ a , ψ o ) of (1)-(5). The resulting model equations (also referred to as "tangent linear model") are given by Atmosphere: Ocean: denotes the forcing of the linearized equations. In analogy to (6) we write the linearized equations of the (linear) state vector Ψ := (U a , Θ a , U o , Θ o ) in the following form where the linear operators L includes pressure gradient and Coriolis force, D the dissipation and N the linearization of the advective operator. The next theorem establishes a regularity result about the linearized equations. The proof is based on the energy method and classical inequalities.

Then (56) has a unique solution on the time interval T with the properties
The state vector Ψ of (56) satisfies For the H s -estimate we apply D α to (59), multiply by D α Ψ m and integrate over the spatial domain. If we now add velocity and scalar equations and integrate by parts, the gradient term of the velocity equation cancels with the divergence term in the θ -equation This implies that it is sufficient to consider the following inequality We proceed by estimating all terms on the right hand side of (60) in terms of the corresponding Sobolev norm. The first term on the right-hand side is estimated with the inequality of Cauchy-Schwarz, Lemmas 1, 2 and Theorem 1 where N s := C s K s ||u a m || H s a u and u a m ∈ L ∞ (T ; H s a u ) according to Theorem 1. For the second term on the right-hand-side of (60) we use integration by parts and the periodic boundary conditions, and we find with the Cauchy-Schwarz inequality, Lemmas 1 and 2 For the forcing terms in (60) we integrate by parts 3 and use the inequality of Cauchy-Schwarz where we have used that the boundary terms vanish due to the periodic boundary conditions. For the coupling term we obtain analogously to (33) with the inequalities of Cauchy-Schwarz, Poincaré and with Lemma 2 We collect (61)-(64) and sum up over all derivatives D α such that their degree is smaller or equal than the corresponding component of s. This yields for (60) the following estimate After applying Young's inequality (see Sect. 3) with = 1 Re a and = 1 Pe a to the two terms on the right-hand side, the resulting quadratic terms of the form (a + b) 2 are estimated by 2(a 2 + b 2 ) and finally we obtain where M a s := M s (C s , K s , Re a , Pe a , ||u a (t)|| H s a u ) is bounded on T. For the ocean component in (56) we proceed similarly and apply D α to the Galerkin system and take the L 2 -inner product with D α U o m , D α Θ o m and arrive at For the coupling term in the velocity equation we obtain analogously to (40) with the inequalities of Cauchy-Schwarz, Poincaré and with Lemma 2 The other terms in the velocity equation can be estimated analogously to the atmospheric case (see (61), (62)) We have to estimate the additional terms (U o m · ∇)θ o and Θ o m ∇u o in the third line of (68). For the first term we find after integration by parts with the inequality of Cauchy-Schwarz (71) For the second term in the third line of (68) we have similarly with integration by parts, and the inequality of Cauchy-Schwarz The estimates (71), (72) imply with Lemma 15 where ν * := min{ 1 R , 1 P }. Using Gronwall's inequality it follows

Proof of continuous dependency on the initial conditions and uniqueness of the coupled nonlinear Eqs. (1)-(5)
Let Then the difference for the atmospheric component satisfies the equations and analogously for the ocean. A comparison with the linearized equations (56) shows that the system above has an identical structure. Analogously to the linear equations (cf. (77) with vanishing forcing) one can derive the inequality The Gronwall lemma implies The above inequality proves the continuous dependency on the initial conditions. In particular, if the two solutions ψ 1 , ψ 2 have the same initial conditions, i.e. δψ(t 0 ) = 0, uniqueness follows. The 4D-Var algorithm applies a gradient-based minimization such as steepest descent, which relies on the derivative of the model state with respect to the initial conditions. For this purpose we have to assure that the mapping of the initial state to the model state at a certain time instant is differentiable. This is the content of the following Lemma. Proof Let h ∈ H s . Denote by ψ 0 , ψ 0 + τ h ∈ H s two initial conditions and by ψ and ψ τ h the corresponding solutions of the coupled model equations and Let Ψ be the solution of the linearized equations, which is linearized around ψ. Then Ψ satisfies ∂Ψ ∂t with zero forcing and initial condition Ψ (t 0 ) = h. The assertion of the lemma is proven if we have shown that y(τ ) The function y = (y a u , y a h , y o u , y o h ) solves the equation dy dt with initial condition y 0 = 0. If we introduce The expression for k o u is analogous to k a u but with u a replaced by u o . The forcing k o θ is given by We prove now the following two inequalities: where K is a exponentially growing but bounded function on T . This proves assertion i). For assertion ii) it follows from Lemma 1 that Analogous estimates apply to k o u , k o θ and this proves ii). Combining i) and ii) we conclude that We show now an upper bound on the right hand side of (82) with an analogous estimate for the ocean term in (82). Defineψ := ψ τ h − ψ, i.e. for the atmospheric component we haveû a := u a τ h − u a andθ a := θ a τ h − θ a . According to Theorem with C(t) bounded on T . With the definition ofψ this implies The adjoint model of the coupled system (6) is defined as adjoint of the equations that are linearized around a model trajectory (56). The adjoint equations are explicitly given by the following equations (see e.g. [20]) Pe a Θ a +F a Θ .
and with forcing termsF . Observe the "inverse coupling" in the adjoint equations. While in the coupled Eqs. (1)-(5) and in the linearized equations (56) the ocean is influenced through the atmospheric component via the velocity equation and the atmosphere is coupled to the ocean via the advection-diffusion equation, these roles are reversed in the adjoint equations above. In analogy to (6) we write the linearized equations in the following form the cost functional (9)-(11) will no filled with a precise mathematical definition that is consistent with the analysis of Sect. 4. The model dynamics that we are considering consist of trajectories in C(T , H s ) and are controlled by initial conditions in H s . Let observations ψ obs and a background guess ψ back be given. We define the cost functional by The background term is given by is a non-negative index set for the order of the Sobolev spaces in the background term. We introduce the following notation for index sets of the background term The definition of the observational term unfolds to (cf. (20, (21)) with the index set I t obs s obs given as follows. Let s obs := (s obs u a , s obs θ a , s obs u o , s obs θ o ) ∈ Z 4 and t obs := (t obs u a , t obs θ a , t obs u o , t obs θ o ) ∈ Z 4 be given such that t obs ≤ s obs . Define now Note that the indices in the observational term are allowed to be negative, while the indices for the background term are non-negative.
The following theorem establishes the existence of stationary points of the cost functional. We refer to these points as "optimal initial conditions".

an index set for the order of the Sobolev spaces in the background component of the cost functional that is chosen such that
Let s obs := (s obs u a , s obs θ a , s obs u o , s obs θ o ) ∈ Z 4 be an index set for the order of the Sobolev spaces in observational component of the cost functional that is chosen such that Then there exist optimal initial conditionsψ 0 ∈ H s b for the coupled data assimilation problem (8) using the cost functional (88).
Remark 4 (Cost functional for smooth and non-smooth observations) Let observations be given that are more regular than the model dynamics, such that s * ≥ s b . Following (94) in Theorem 4 we choose the observational norm s obs = s b . Then the observational term of the cost functional will involve derivatives up to degree s b . This will not use the full regularity of the observations in case that s * > s b .The lower bound of the degree of derivatives that enter the cost functional is given by the lower index of the observational index set t obs ∈ Z 4 and can be chosen independently.
For less smooth observations that are for example square integrable only, we have s * = 0. Consequently the observational norm is to be chosen as s obs = s * . Now only the L 2 -norm is used in the observational term of the cost functional. The lower bound t obs ∈ Z 4 can be chosen such that the desired scale selectivity is implemented.
These examples illustrate that the highest degree of the derivatives that appear in the cost functional J obs , measured by the upper index s obs in I[t obs , s obs ] (see (92)), is determined by the regularity of the model and by the regularity of the observations. The lowest degree of derivatives that enter J obs is given by the second multi-index t obs . This index can be chosen freely and it can even by negative. With negative indices one filters out small scale features (see Remark 5 below).
where the components referring to velocities are vector Laplacians, while the remaining two are scalar Laplacians. For negative Sobolev space indices the small scales are filtered out via the inverse Laplacian and are excluded from the minimization of the cost functional. The decision whether positive or negative indices are appropriate is a design decision of the data assimilation process and depends on the problem under investigation.

Proof of Theorem 4
Let (ψ 0,n ) n ⊆ H s b (Ω) be a minimizing sequence of initial conditions for the data assimilation problem. We denote by ψ n := M[ψ 0,n ] ∈ C(T , H s b (Ω)) ∩ L 2 (T , H s b +1 (Ω)) the corresponding solutions of equations (1)- (5). The model-observation difference satisfies (M[ψ 0 ] − ψ obs ) ∈ L 2 (T , H s obs (Ω)), with s obs satisfying (92). Since the model error covariance operator R preserves the space (cf. (26)) it follows that R(M[ψ 0 ] − ψ obs ) ∈ L 2 (T , H s obs (Ω)) and from (91) we infer that the cost functional is well-defined. From (88) follows that the sequence of initial conditions (ψ 0,n ) n is bounded in H s b , i.e, there exists a c > 0 such that uniformly for all n ∈ N From Theorem 1 we see that the sequence of associated solutions (ψ n ) n is bounded in Since H s b +1 is compactly embedded in H s b we conclude that a subsequence, still denoted (ψ n ) n , exists and a limitψ, such that (ψ n ) n converges strongly in For the limitψ lower semi-continuity implies for dt. Consequently We show now that the limitψ is a regular solution of (1)-(5) in the sense of Definition 1. We can adopt the arguments from the proof of Theorem 1 (cf. Step 4) to show thatψ ∈ C(T , H s b ).
We consider the components of (ψ n −ψ), and denote them by ( f This proves the convergence of (ψ n ) n in C(T ; H s ) toψ for s < s b , because (ψ n ) n converges in L 2 (T , L 2 ) and is bounded in L 2 (T , H s b ). From the strong convergence in C(T ; H s ) for s < s b and the density of The weak continuity implies for τ ∈ [t 0 , t 1 ] From (49) follows lim This proves the continuity of the H s b -norm of the solution at initial time From (96) and the weak convergence of (ψ n ) n toψ in This implies thatψ ∈ L 2 (T , H s b +1 ). Consequently there exists a set E ⊆ T of Lebesguemeasure zero such that for all τ ∈ T \E it holds thatψ(·, τ ) ∈ H s b +1 . This implies that for all δ > 0 there exists a t * 0 < δ such thatψ(·, t * 0 ) ∈ H s b +1 . If we useψ t * 0 :=ψ(·, t * 0 ) as initial condition we can repeat all the arguments of our proof to establish the existence of a solutionψ ∈ C([t * 0 , t * 1 ], H s * ), with s * < s b + 1. The two solutionsψ,ψ coincide on their joint interval of existence [t 0 , t 1 ] ∩ [t * 0 , t * 1 ]. We obviously have for the two endpoints t 1 ≤ t * 1 and henceψ,ψ coincide on [t * 0 , t 1 ]. Since δ > 0 was arbitrary we haveψ ∈ C((t 0 , t 1 ], H s b ) and combined with the continuity at t 0 (see (54)

Remark 6
The geostrophic balance u ⊥ ∼ ∇θ constitutes an important constraint on large scale Atmosphere-and Ocean dynamics. The H s -approach drives the gradient ∇θ towards ∇θ obs and u ⊥ towards u ⊥,obs and thereby automatically accounts for geostrophic balance, provided the observational data are in approximate geostrophic balance. Therefore an addition of a penalty term to the cost functional to prevent the deviation from geostrophic balance is not needed.
where M s , ν * is defined in Theorem 2.
Proof Subtracting the two model solutions ψ and φ leads to a difference equation that resembles the linear equation (56) with linear variables U a = u a 2 − u a 1 , Θ a := θ a 2 − θ a 1 and The assertion follows now from Theorem 2.

Calculation of Minimizers and Convergence of Gradient Algorithm
In this section we study the convergence of a gradient based algorithm to calculate the optimal initial conditions for the data assimilation problem (8) with cost functional specified by (88). We demonstrate convergence by invoking a classical result about convergence of gradient algorithms (see below Lemma 9). This Lemma involves the second derivative of the cost functional. We calculate this derivative with the help of the second-order adjoint equations of the model.

Characterization of Local Minima
The existence of a minimizer allows us to investigate the problem of establishing an algorithm for the calculation of this minimum.
Proof For a minimizerψ 0 , which exists according to Theorem 4, the Gateaux derivative vanishes such that J (ψ 0 ; h) = 0 for all perturbations h. We calculate the Gateaux derivative of J at an arbitrary state ψ in direction h, by using (19), as follows where we have applied the chain rule and the fact that Ψ := DM[ψ 0 ] Dψ 0 h, according to Lemma 7, satisfies the linearized equation. Now we define the forcing of the adjoint equation bỹ F := α∈I[t obs ,s obs ] α R M[ψ 0 ] − ψ obs and its initial condition byψ(t 1 ) = 0. Then Lemmas 7 and 8 imply that For a minimum we have J (ψ 0 ; h) = 0 for all h, which implies with (107) and we finally deriveψ

Remark 7
On the left-hand-side of (106) we find the linear operator L ψΨ := ∂Ψ ∂t + N * [ψ](Ψ )+LΨ −C(Ψ a ,Ψ o ). In order to solve the equation L ψΨ =F, the right-hand-side with the observational information has to be in the range of L ψ .

Remark 8
The appearance of the Sobolev-norm H s in the background term is equivalent to a smoothing of the adjoint field at time t = t 0 via the smoothing operator S −1 s b . Since the adjoint field at time t = t 0 is identified with the gradient of the cost functional with respect to the initial conditions (cf. Theorem 5) this implies a smoothing of the gradient. The Sobolevnorm with negative index H −s in the observational term leads to a smoothing of the adjoint forcing, which potentially may results in a more regular adjoint field at time t = t 0 .
Remark 9 (The case of two uncoupled models) The results on data assimilation in sections 5 and (6) apply also to the two uncoupled models that is constructed if once replaces the coupling term by an appropriate external forcing (cf. Remark 3). Then our proofs imply the existence of optimal conditions (Theorem 4) and the characterization of the optimal initial condition by an adjoint condition (Theorem 5).

Convergence of Gradient-Based Descent Algorithm
In this section we always assume that the assumptions of Theorem 4 are satisfied. The goal of this section is to prove the convergence of an iterative gradient based method for determining the optimal initial condition. In order to prove convergence we investigate the Hessian of the cost functional and its computation via the second-order adjoint equations. This iterative gradient algorithm reads as follows: be a start value for the initial condition that lies within a ball around the optimal initial condition. Choose λ 0 > 0 and iterate for n = 1 . . . as follows: 1. Integrate the coupled equations (1)-(5) with initial condition ψ n 0 to obtain ψ n (t  5. Increase n, update the stepsize λ n such that and go back to (1) until a stopping criterion is satisfied.

Remark 10
The difference between the iterative algorithm above and classical data assimilation algorithms are the presence of the Laplace operator in the adjoint forcing in step 2, and the occurrence of the smoothing operator S −1 s b in steps 3 and 4. The smoothing operator takes care that the adjoint stateψ n resides in the same space as the initial state ψ n+1 0 . Both modifications are a consequence of the Sobolev-norms in the background and observational term of the cost functional.
The algorithm above shows also that the required modifications can without fundamental difficulties be integrated into an existing data assimilation framework.
The next lemma gives a conditions on the convergence of gradient algorithms in a Hilbert space in terms of the second derivative of the cost functional.

Second-Order Adjoint Equations
The second-order adjoint equations are derived by linearizing the system of model and adjoint equations (1)- (5) and (85). For more information we refer to [20,37]. The evolution of second-order adjoint variablesΨ : where the additional terms on the right-hand side are defined by The oceanic terms G oŪ , G oV , G oV are defined analogously. The forcing is denoted bȳ For the data assimilation problem under investigations the forcing is given in terms of the linearized statē From the previous theorem we infer that the right-hand side of (111) is well-defined in H s , i.e. for W ∈ H s (Ω) we have The convergence of the descent algorithm is the content of the following Theorem.
Proof To ease notation we denote the Sobolev index of the background term by s := s b . We establish the convergence of the descent algorithm in Sect. 6.2 by invoking Lemma 9. The necessary bounds on the cost functionals derivative are obtained from the regularity of the second-order adjoint by means of equations (110) For an idealized coupled atmosphere-ocean model we could show that that this formulation leads to a solvable optimization problem for which we can prove the existence of optimal initial conditions. The coupled optimization problem is also computable in the sense of local convergence of a gradient-based descent algorithm to optimal initial conditions. This work can be extended and continued in several respects. Most importantly, the impact of the Sobolev-type of cost functional has to be investigated in numerical experiments. These experiments need to study the difference between a welldefined (Sobolev-type) cost functional suggested here and a standard cost functional with square-integrable norms only. The scale-selective filtering property of the Sobolev norms has to be investigated experimentally with respect to its physical impacts on the optimal initial condition that is determined by coupled data assimilation. The filtering has also to be compared against regularization approaches such as using increased dissipation in the adjoint equations as compared to the nonlinear equations to regularize the computation of the gradient of the cost functional (see e.g. [17]). Most notably it has to study if our scale selective filtering is beneficial in avoiding local minima during the cost functional minimization.
For both coupled and uncoupled data assimilation algorithm the modelling of multivariate covariance matrices for model error and observational error is of paramount importance. Modelling the error covariance between different components of the state vector of the coupled system poses a fundamental challenge. In our approach derivatives are additionally taken of the model-background difference, weighted by the model error covariance matrix B. An analogous procedure is applied to the model-data difference (see (89) and (91)). The interaction between the derivatives, created by the Sobolev norm, and the two error covariance matrices constitutes a new constituent of the data assimilation algorithm that was not addressed in this paper and remains to be understood. Since we have not focused on model error covariance modelling we have used for simplicity a single model error covariance operator that is applied to all derivatives of the model-background difference. This does not necessarily have to be the optimal choice and one can imagine to use different error covariance operators for different derivative terms. The standard cost functional with the L 2 -norm has an interpretation as statistical least-square estimation technique (see e.g. [18], [40]). If there is a corresponding interpretation of the Sobolev-norm cost functional as a Bayesian estimator is an open question.
We mention also a complementary research path towards understanding and practical implementation of coupled data assimilation. Here one makes a compromise on the complexity of the assimilation algorithm but not on the coupled model, i.e. to apply less sophisticated data assimilation algorithms to complex three-dimensional coupled atmosphere ocean equations. An example of such an assimilation technique are downscaling and continuous data assimilation algorithms that have been used in a series of papers (see e.g. [4,13] and references therein) to study fundamental aspects of data assimilation for a variety of (uncoupled) models.
The final frontier of coupled data assimilation is of course the data assimilation for coupled models consisting of three-dimensional general circulation models of atmosphere and ocean. For such models well-posedness theorems are not available and the knowledge about the regularity of the solution is limited. The deepest results is the series of papers by Lions-Temam-Wang [21][22][23], who established for a coupled atmosphere-ocean model based on the (hydrostatic) primitive equations the existence of weak solutions, with uniqueness remaining an open question and only weakly continuous dependency on initial conditions. For uncoupled primitive equations the breakthrough result of [9] establishes well-posedness for initial data in H 1 . For these equations the solvability of the variational data assimilation problem using weak solutions has been shown in [3]. The extension to strong solutions remains open. For coupled general circulation model we are not aware of any solvability and computability result of the data assimilation problem. In order to make progress towards general circulation models we have to abandon our notion of classical solutions of the atmosphere-ocean equations and extend our approach to more general notions such as weak and strong solutions.