Stochastic model reduction: convergence and applications to climate equations

We study stochastic model reduction for evolution equations in infinite dimensional Hilbert spaces, and show the convergence to the reduced equations via abstract results of Wong-Zakai type for stochastic equations driven by a scaled Ornstein-Uhlenbeck process. Both weak and strong convergence are investigated, depending on the presence of quadratic interactions between reduced variables and driving noise. Finally, we are able to apply our results to a class of equations used in climate modeling.


Introduction
In this paper we study stochastic model reduction for a system of nonlinear evolution equations in infinite dimensional Hilbert spaces which is general enough to cover well-established systems of equations used in climate modeling. The big advantage of such a procedure is the lower complexity of the reduced equations, since complexity is still one of the major issues when predicting the evolution of systems over time spans which are typical for climate rather than meteorology.
Following [14], we assume that the climate variables of the system, i.e. those more relevant to climate prediction, evolve on longer time scales than the unresolved variables, which can be modelled stochastically and have a typical time scale much shorter than that of the climate variables. To be able to close the equation for the climate variables, the task is to understand the effects of the unresolved variables when stretching time to climate-time. In what follows, we also refer to climate variables as resolved variables.
Climate modeling typically starts with equations containing quadratic nonlinearities which can describe many features of oceanic and atmospheric dynamics at meteorological time; see [15,20]. In abstract mathematical terms, such equations take the form
(1) $\frac{dZ_t}{dt} = f_t + A Z_t + B(Z_t, Z_t)$,
where $A : H \to H$ is a linear operator, $B : H \times H \to H$ is a bilinear operator, and $f$ is an external forcing term.
Here, the variable $Z$ taking values in $H$ is supposed to be a complex mix of climate and unresolved variables, and hence the space $H$ has to be 'big enough' to 'host' variables of that type. We therefore choose $H$ to be a separable infinite-dimensional Hilbert space. Now, there is a variety of procedures to identify climate variables in practice, which we will not discuss in this paper. We rather assume that climate variables have been identified spanning a Hilbert subspace $H_d \subset H$, and we further assume that the orthogonal complement $H_\infty$, $H = H_d \oplus H_\infty$, gives the space of unresolved variables. Projecting $Z$ onto $H_d$, $H_\infty$ via the projection maps $\pi_d$, $\pi_\infty$, equation (1) gives rise to two coupled equations, (2) for the collection of climate variables $X = \pi_d(Z)$ and (3) for the unresolved variables $Y = \pi_\infty(Z)$, respectively. The next step, called stochastic climate modeling, consists in replacing the complicated nonlinear self-interaction term in (3) by a linear random term. Such a replacement can be justified by the assumption that quickly varying fluctuations of small-scale unresolved variables are more or less indistinguishable from the combined effect of a large number of weakly coupled factors, usually leading to Gaussian driving forces via the Central Limit Theorem. But such effects would only become visible at climate time and not at the meteorological time used in (2) & (3), so that we are looking to replace $B^2_{22}(Y_{\varepsilon^{-1}t}, Y_{\varepsilon^{-1}t})$ by a linear random term, stretching meteorological time to $\varepsilon^{-1}t$ using a small parameter $\varepsilon \ll 1$.
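The displays (2) and (3) were lost in extraction. A plausible sketch of the projected system, under the convention (consistent with the coefficient names $A^2_1$, $B^2_{11}$, $B^1_{22}$ appearing in section 5) that the superscript indicates the component ($1$ for $H_d$, $2$ for $H_\infty$) and the subscripts indicate the arguments, reads:

```latex
% Sketch of the projected system (2)-(3); the precise display was lost,
% but the coefficient names match those used in section 5.
\frac{dX_t}{dt} = f^1_t + A^1_1 X_t + A^1_2 Y_t
                + B^1_{11}(X_t, X_t) + B^1_{12}(X_t, Y_t) + B^1_{22}(Y_t, Y_t), \tag{2}
\qquad
\frac{dY_t}{dt} = f^2_t + A^2_1 X_t + A^2_2 Y_t
                + B^2_{11}(X_t, X_t) + B^2_{12}(X_t, Y_t) + B^2_{22}(Y_t, Y_t). \tag{3}
```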
In this work, following [14,17], we suppose that $B^2_{22}(Y_t, Y_t)$ is replaced by a damped random term of the form $-\frac{\mu}{\varepsilon}\, Y_t + \frac{\sigma}{\sqrt{\varepsilon}}\, \dot W_t$, where $\mu, \sigma$ are positive constants, and $\dot W$ is Gaussian noise, white in time and coloured in space. This way, the parameter $\varepsilon$ is used to scale time, but also to adjust for the size of the involved variables when scaling time.
All in all, introducing the notation $X^\varepsilon_t = X_{\varepsilon^{-1}t}$ for climate variables at climate time, and $Y^\varepsilon_t = \varepsilon^{-1} Y_{\varepsilon^{-1}t}$ for the effect of unresolved variables at climate time, equations (2) & (3) turn into the rescaled equations (4) & (5). The hope is now that, when $\varepsilon$ tends to zero, climate variables at climate time can be approximated by a random variable $\bar X$ which solves a closed stochastic equation with new coefficients not depending on unresolved variables any more. Of course, these new coefficients will be functions of the coefficients of equations (4) & (5), and the process of finding these new coefficients is called stochastic model reduction.
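The effect of this time stretching on the replaced self-interaction term can be traced explicitly. The following computation is our own sketch (with $\mu = \sigma = 1$ for simplicity) of why the scaled unresolved variables become an Ornstein-Uhlenbeck process with rate $\varepsilon^{-2}$, consistent with the initial condition $y_0$ used in section 5:

```latex
% If dY_t = -\varepsilon^{-1} Y_t\,dt + \varepsilon^{-1/2}\,dW_t at meteorological
% time, set Y^\varepsilon_t := \varepsilon^{-1} Y_{\varepsilon^{-1} t}. Then
dY^\varepsilon_t
  = \varepsilon^{-1}\, dY_{\varepsilon^{-1}t}
  = -\varepsilon^{-2}\, Y^\varepsilon_t\, dt
    + \varepsilon^{-3/2}\, dW_{\varepsilon^{-1}t},
% and since (W_{\varepsilon^{-1}t})_t equals in law (\varepsilon^{-1/2}\tilde W_t)_t
% for a Wiener process \tilde W, this is the scaled OU equation
dY^\varepsilon_t = -\varepsilon^{-2}\, Y^\varepsilon_t\, dt + \varepsilon^{-2}\, d\tilde W_t .
```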
Stochastic model reduction of finite-dimensional systems similar to (4), (5) was extensively discussed in [14]. However, one of the key steps, i.e. proving the convergence $X^\varepsilon \to \bar X$, $\varepsilon \downarrow 0$, was kept rather short. Indeed, the authors first sketch a perturbation method based on a theorem by T.G. Kurtz, [13], which is their general method, and then briefly describe a so-called direct averaging method for special cases based on limits of solutions to stochastic differential equations. In particular, the latter method lacks a certain amount of rigour because the convergence of the involved stochastic processes is not shown, and this gap has not been closed in follow-up papers; see [6,5,10] for example.
In this paper we not only close this gap, but also develop a new method of proof. We first identify $\bar X$, and then study in great detail the convergence $X^\varepsilon \to \bar X$, $\varepsilon \downarrow 0$, when $X^\varepsilon$ solves an evolution equation of type (6), where $Y^\varepsilon$ is a decoupled infinite-dimensional Ornstein-Uhlenbeck process satisfying (7). Since equation (6) is more general than (4), once stochastic model reduction is established for the system (6),(7) with decoupled unresolved variables, it also follows for an interesting subclass of systems of type (4), (5) with coupled unresolved variables; see Theorem 5.3. Part (ii) of this theorem deals with the case of linear scattering, that is $B^1_{22} = 0$, and in this case we achieve 'strong' convergence in probability, see (8), uniformly on a given climate time interval $[0, T]$. When the quadratic interaction term $B^1_{22}$ is non-trivial, we can only show convergence in law, as stated in Theorem 5.3(i). We refer to Remark 4.2(ii) for an argument which suggests that one cannot expect much more than weak-type convergence in the general case. This insight of course sheds new light on the results given in [14] and follow-up papers.
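The displays (6)-(8) themselves were lost in extraction. From the integral form (9), the scaled Ornstein-Uhlenbeck dynamics, and the statement of Theorem 2.2, they plausibly read as follows (a sketch, not a verbatim restoration; in particular, the factor $\varepsilon$ in front of $\beta$ is our guess, chosen so that the integrated bilinear term stays of order one, cf. the remark on the Itô correction in section 4):

```latex
\frac{dX^\varepsilon_t}{dt}
  = F(t, X^\varepsilon_t) + \sigma(t, X^\varepsilon_t)\, Y^\varepsilon_t
    + \varepsilon\, \beta(Y^\varepsilon_t, Y^\varepsilon_t), \tag{6}
\qquad
dY^\varepsilon_t = -\varepsilon^{-2}\, Y^\varepsilon_t\, dt + \varepsilon^{-2}\, dW_t, \tag{7}
% and the 'strong' convergence of part (ii):
\sup_{0 \le t \le T} \big| X^\varepsilon_t - \bar X_t \big|_{H_d}
  \;\longrightarrow\; 0 \quad \text{in probability, as } \varepsilon \downarrow 0. \tag{8}
```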
At this point it should be mentioned that throughout this paper we assume that $H_d$ is finite-dimensional, which seems to be a natural choice when it comes to climate modeling. However, our arguments are general and can be adapted to infinite-dimensional subspaces, see [4].
In the case of the more abstract system (6),(7), the process $Y^\varepsilon$ eventually behaves like white noise, as $\varepsilon \downarrow 0$. This limiting behaviour is fundamental for finding the limit of equation (6) because it opens the door to using arguments similar to those of Wong & Zakai in [21]. Of course, Wong & Zakai formulated their results in a finite-dimensional setting. There have been earlier attempts at proving similar results in infinite dimensions; we refer to [1,19,18], for example. However, we would like to emphasise that these earlier attempts dealt with piecewise linear approximations of noise rather than an infinite-dimensional Ornstein-Uhlenbeck process. Note that it is typical for Wong-Zakai results that stochastic integral terms of limiting equations are interpreted in the sense of Stratonovich.
The paper is structured as follows.
In section 2, we formulate our main results on the convergence of solutions to (6), (7). First, the limiting equation for $\bar X$ is identified, and then conditions for weak convergence $X^\varepsilon \to \bar X$ are stated in Theorem 2.2(i). However, when (6) is a simpler equation, i.e. $\beta = 0$, even the stronger convergence (8) can be shown under the same conditions; see Theorem 2.2(ii).
In section 3, we give the proof of Theorem 2.2(ii). The proof relies on preliminary localization and discretization arguments which allow us to consider, instead of (8), its discrete version for only finitely many $t_k \in [0, T]$.
In section 4, we give the proof of Theorem 2.2(i) which, at the beginning, requires a careful analysis of the quadratic term $\beta(Y^\varepsilon_t, Y^\varepsilon_t)$, but otherwise is an adaptation of the proof given in the previous section. In section 5, we finally use the results of section 2 to prove Theorem 5.3 under quite natural conditions, thus making the connection to our main applications in climate modeling.

Notation and main result
Let H d , H ∞ be real separable Hilbert spaces. Assume that H d is finite-dimensional, dim H d = d, with given orthonormal basis e 1 , . . . , e d , and that H ∞ is infinite-dimensional with given orthonormal basis f 1 , f 2 , . . .
Given two Banach spaces U, V , let L(U, V ) denote the Banach space of continuous linear operators mapping U to V , endowed with the operator norm.
For each $\varepsilon > 0$, consider the pair of stochastic processes $(X^\varepsilon, Y^\varepsilon)$, taking values in $H_d \times H_\infty$, where $X^\varepsilon$ satisfies (6) over a fixed finite time interval $[0, T]$, and $Y^\varepsilon$ is the stationary solution of (7), given by $Y^\varepsilon_t = \int_{-\infty}^t \varepsilon^{-2} e^{-\varepsilon^{-2}(t-s)}\, dW_s$, where $W$ is a Wiener process in $H_\infty$, with real-valued time parameter and self-adjoint trace-class covariance operator $Q \in L(H_\infty, H_\infty)$.
(iii) Since $Q$ is trace class, both $W$ and $Y^\varepsilon$ take values in $H_\infty$. Without loss of generality, we can assume that $Q$ is diagonal with respect to the chosen basis $\{f_m\}_{m\in\mathbb N}$ of $H_\infty$, that the eigenvalues of $Q$ form a sequence $\{q_m\}_{m\in\mathbb N}$ satisfying $\sum_m q_m < \infty$, and that $\mathbb E\, \langle W_t, f_m\rangle_{H_\infty}^2 = |t|\, q_m$, for all $m$. Adopting the useful notation $W^\varepsilon_t = \int_0^t Y^\varepsilon_s\, ds$, we can write (6) in integral form as (9). We make the following assumptions on these coefficients. Of course, by standard theory (see [2] for example), equation (9) admits a unique local strong solution, for each $\varepsilon > 0$.
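The heuristic behind the notation $W^\varepsilon_t = \int_0^t Y^\varepsilon_s\, ds$ is that this integrated process approximates the driving Wiener path as $\varepsilon \downarrow 0$. The following one-mode numerical illustration is our own sketch (not from the paper), using the exact pathwise identity $W^\varepsilon_t - W_t = -\varepsilon^2 (Y^\varepsilon_t - Y^\varepsilon_0)$, which follows from $\varepsilon^2\, dY^\varepsilon = -Y^\varepsilon\, dt + dW$:

```python
import numpy as np

# One-mode illustration (our own sketch): for the scaled OU process
# dY = -eps^{-2} Y dt + eps^{-2} dW, the process W^eps_t = \int_0^t Y_s ds
# should approximate the driving Wiener path W_t as eps -> 0, since
# eps^2 dY = -Y dt + dW gives W^eps_t - W_t = -eps^2 (Y_t - Y_0),
# a quantity of order eps^2 * sup|Y| ~ eps.
rng = np.random.default_rng(0)

eps = 0.05
a = eps**-2                # OU rate (and noise intensity)
T, n = 1.0, 20_000
dt = T / n                 # dt << eps^2 so the Euler scheme resolves the OU

dW = rng.normal(0.0, np.sqrt(dt), size=n)
W = np.concatenate(([0.0], np.cumsum(dW)))          # Wiener path

Y = np.empty(n + 1)
Y[0] = rng.normal(0.0, np.sqrt(a / 2))              # stationary initial value
for k in range(n):
    Y[k + 1] = Y[k] - a * Y[k] * dt + a * dW[k]     # Euler-Maruyama step

W_eps = np.concatenate(([0.0], np.cumsum(Y[:-1] * dt)))  # W^eps_t
max_dev = np.max(np.abs(W_eps - W))
print(f"sup_t |W^eps_t - W_t| = {max_dev:.4f}")     # small, of order eps
```

For the Euler scheme the identity is even exact: summing the update rule gives $W^\varepsilon_{t_k} - W_{t_k} = -\varepsilon^2 (Y_{t_k} - Y_0)$ at every grid point, so the deviation shrinks like $\varepsilon^2 \sup_k |Y_{t_k} - Y_0|$.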
Next, we introduce the limiting equation for the wanted limit $\bar X$ of the processes $X^\varepsilon$, when $\varepsilon \downarrow 0$. First, define the so-called Stratonovich correction term $C : [0,T] \times H_d \to H_d$, where $(\sigma_{i,m}(s,x))_{i,m}$ is matrix notation for the linear map $\sigma(s, x) \in L(H_\infty, H_d)$ with respect to our chosen basis vectors; second, introduce the coefficients $b_{\ell,m}$ associated with the bilinear term $\beta$. Then, our limiting equation (12) can be stated, where $W$ is the same Wiener process used to define $Y^\varepsilon$ in Remark 2.1, while $\{W^{\ell,m}\}_{\ell,m\in\mathbb N}$ is a family of independent one-dimensional standard Wiener processes, which are also independent of $W$. Again by standard theory, this equation admits a unique local strong solution, too. However, in view of the interpretation of our results with respect to climate modeling, it is natural to further assume that

(A4) both equations (9) and (12) admit global strong solutions on $[0,T]$.

Another assumption specific to climate modeling, which has been advocated in [14], for example, would be that the mean of $\beta(Y^\varepsilon_s, Y^\varepsilon_s)$ is zero, for any $s$, with respect to the invariant measure of the corresponding Ornstein-Uhlenbeck process. Since all $Y^\varepsilon$ are stationary under $\mathbb P$, see Remark 2.1(ii), this assumption translates into a condition on the coordinates $Y^{\varepsilon,\ell}_s$, where $Y^{\varepsilon,\ell}_s$ is short notation for $\langle Y^\varepsilon_s, f_\ell\rangle_{H_\infty}$, $\ell = 1, 2, \dots$, $s \in [0, T]$. As a consequence, we also impose the zero-mean condition

(A5) $\sum_{\ell\in\mathbb N} \langle \beta(f_\ell, f_\ell), e_i\rangle_{H_d}\, q_\ell = 0$, for all $i = 1, \dots, d$,

which is usually true for equations from fluid dynamics and can in general be understood as a renormalization procedure for the quadratic term.
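The displays defining $C$ and the limiting equation were lost in extraction. Based on the martingale computation in section 3 (which involves $\sum_j D_j\sigma_{i,m}\,\sigma_{j,\ell}$ and the eigenvalues $q_\ell$) and the covariance of the limiting noise appearing in section 4, a plausible sketch (our reconstruction, not a verbatim restoration) reads:

```latex
% Stratonovich correction (sketch; consistent with the conditional-expectation
% computation for the 2nd sum in section 3):
C_i(s, x) = \tfrac{1}{2} \sum_{\ell \in \mathbb N} q_\ell
            \sum_{j=1}^{d} D_j \sigma_{i,\ell}(s, x)\, \sigma_{j,\ell}(s, x),
\qquad i = 1, \dots, d.
% Limiting equation (sketch):
d\bar X_t = \big[ F(t, \bar X_t) + C(t, \bar X_t) \big]\, dt
          + \sigma(t, \bar X_t)\, dW_t
          + \sum_{\ell, m \in \mathbb N} b_{\ell,m}\, dW^{\ell,m}_t . \tag{12}
```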
The following theorem is the main result of this paper. In what follows, to keep notation light in proofs, when no confusion may occur, the norms in both spaces $H_d$ and $H_\infty$ will be denoted by $|\cdot|$, and their scalar products by $\langle\cdot,\cdot\rangle$. The symbol $\lesssim$ means inequality up to a multiplicative constant, possibly depending on the parameters of our equations, but not on $\varepsilon$.

Strong convergence
In this section we give the proof of Theorem 2.2(ii), which is divided into several steps. First, by localization, we argue that we can restrict ourselves to $|X^\varepsilon_t|, |\bar X_t| \le R$, for some large $R$, which effectively leads to Lipschitz continuity of the coefficients of (9).
Second, we discretize the problem, which allows us to reduce the proof of Theorem 2.2(ii) to its discrete version, in which the supremum is taken over only finitely many $t_k \in [0, T]$. Here, we choose $t_k = k\Delta$, where $\Delta = \Delta_\varepsilon$ is a positive parameter whose $\varepsilon$-dependence has to be carefully chosen in the proof; see Remark 3.8. Third, we prove the above discretized version.

3.2. Discretization. Fix $\varepsilon > 0$. We show that the expectation on the right-hand side of (14) can be replaced by an expectation of the same quantity, but with the supremum taken over a finite number (diverging to $\infty$, as $\varepsilon \downarrow 0$) of times $t_k$, see Corollary 3.6 below.
To start with, we have the following useful a priori estimate.
Proof. First, the result is true in one dimension; see [12, Theorem 2.2].
In the infinite-dimensional case, by Hölder's inequality, we can suppose $p > 2$. Therefore, since $Q$ is trace class with eigenvalues satisfying $\sum_{m\in\mathbb N} q_m < \infty$, with $\alpha = (p-2)/p$ we obtain the claim.

Now, we introduce the discretization of the time interval $[0, T]$. Let $\Delta > 0$, and let $[T/\Delta]$ be the largest integer less than or equal to $T/\Delta$. In what follows, $\Delta$ will also depend on $\varepsilon$, in a way to be determined later. Also, to make it easier to bound terms by powers of $\varepsilon$ or $\Delta$, without loss of generality, we will always assume that both $\varepsilon, \Delta$ are less than one.
The next two lemmas control the excursion of X ε between adjacent nodes in terms of the ratio ∆/ε.
Proof. The claim easily follows from Lemma 3.4 with $\tau = \Delta$, and an elementary inequality.

Proof. First, by Hölder's inequality with $q > 1$ and Corollary 3.5, we obtain the required bound. Thus, the proof can easily be completed by combining the above and Lemma 3.2, taking into account that $[t/\Delta]$ is again our notation for the floor of $t/\Delta$.

3.3. Proof of the discretized version. By (14) and Corollary 3.6, it suffices to prove the discrete estimate (16). To start with, by (9) without the $\beta$-term, (A2), and (15), we obtain the expansion (17) of $X^\varepsilon$. Similarly, using (12) instead of (9), the process $\bar X$ satisfies the analogous identity (18). Having in mind to apply Gronwall's lemma, it turns out to be useful to summarise the contributions of the right-hand sides of (17), (18) as the decomposition (19), for any $h = 1, \dots, [T/\Delta]$, which splits the difference $X^\varepsilon_{h\Delta} - \bar X_{h\Delta}$ into five sums.
We first prove that the 2nd and the 5th sum can be neglected when proving (16). The summands of the 5th sum are discussed in Lemma 3.7 below. The contribution of the 2nd sum, though, is more delicate and requires a martingale argument similar to that of [8, Theorem VI.7.1].
The remaining sums will be controlled in terms of the difference X ε −X itself, which allows them to be estimated via Gronwall's lemma.
Of course, under assumption (A1), the function $F$ is uniformly continuous when restricted to $[0,T] \times \bar B_R(0)$. In what follows, we will denote by $\omega_F : [0, T] \to [0, \infty)$ the (local) modulus of continuity of $F(\cdot, x)$: $|F(t,x) - F(s,x)| \le \omega_F(|t-s|)$, for every $t, s \in [0, T]$, and $x \in B_R(0)$.
Obviously, the function ω F vanishes at zero, and without loss of generality, it can be chosen to be both non-decreasing and continuous.
Lemma 3.7. For any $p > 1$, the following estimates hold.

Proof. Throughout this proof, we will frequently make use of (A1),(A2) without explicit mention. For $I^k_1$, we apply Hölder's inequality and Lemma 3.2. We now consider $I^k_8$. Here, the idea is to convert $Y^\varepsilon$-increments into $X^\varepsilon$-increments via integration by parts, since $X^\varepsilon$-increments are easier to control; this way, we can apply Lemma 3.1 and Lemma 3.3. In a similar way, for $J^k_1$ and $J^k_3$, we now apply Lemma 3.4. For the last sum $J^k_5$, we combine the Burkholder-Davis-Gundy inequality and Lemma 3.4.

Remark 3.8. The estimates given in Lemma 3.7 motivate the following choice of how $\Delta = \Delta_\varepsilon$ should behave when $\varepsilon$ goes to zero. Such a choice is always possible: indeed, without loss of generality, we can suppose $\omega_\sigma(t) > t^{1/2}$, for every $t \in [0, T]$, and then define $\Delta = \Delta_\varepsilon$ via $\Delta\, \omega_\sigma(\Delta) = \varepsilon^2$.
We now discuss the 2nd sum on the right-hand side of (19). Taking the conditional expectation of $c^k_{\ell,m}(\Delta, \varepsilon)$ with respect to $\mathcal F_{k\Delta}$, using the stationary representation of $Y^\varepsilon$, this conditional expectation can easily be calculated. Now, since $\sum_{j=1,\dots,d} D_j \sigma_{i,m}(k\Delta, \bar X_{\tau^\varepsilon \wedge (k\Delta)})\, \sigma_{j,\ell}(k\Delta, \bar X_{\tau^\varepsilon \wedge (k\Delta)})$ is $\mathcal F_{k\Delta}$-measurable, for every $\ell, m \in \mathbb N$, $i = 1, \dots, d$, each process $M^i_h$, $h = 1, \dots, [T/\Delta]$, is a discrete martingale with respect to the filtration $(\mathcal F_{h\Delta})_{h=1,\dots,[T/\Delta]}$.

Proof. Combining Doob's maximal inequality and the martingale property gives the claim.

To eventually cover the remainder of the 2nd sum on the right-hand side of (19), after subtracting the martingale term $M_h$, we introduce a correction process.

Proof. The proof is an easy consequence of (20).

All in all, Lemma 3.9 and Lemma 3.10 together imply that the 2nd sum on the right-hand side of (19) can be neglected, like the 5th one, when $\varepsilon \downarrow 0$, and $\Delta = \Delta_\varepsilon$ behaves as described in Remark 3.8.
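The conditional-expectation computation referred to above was lost in extraction. From the representation of $Y^\varepsilon$ as a stationary Ornstein-Uhlenbeck process with rate $\varepsilon^{-2}$, a plausible sketch (our reconstruction) reads:

```latex
% For s, s' \ge k\Delta, write a := \varepsilon^{-2} and
Y^\varepsilon_s = e^{-a (s - k\Delta)}\, Y^\varepsilon_{k\Delta}
              + \int_{k\Delta}^{s} a\, e^{-a (s - r)}\, dW_r ,
% so that, by independence of the increments of W after time k\Delta,
\mathbb E\big[ Y^{\varepsilon,\ell}_s Y^{\varepsilon,m}_{s'} \,\big|\, \mathcal F_{k\Delta} \big]
 = e^{-a (s + s' - 2 k\Delta)}\, Y^{\varepsilon,\ell}_{k\Delta} Y^{\varepsilon,m}_{k\Delta}
 + \delta_{\ell,m}\, \frac{q_\ell\, a}{2}
   \Big( e^{-a |s - s'|} - e^{-a (s + s' - 2 k\Delta)} \Big).
```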
Recall that we wanted to control the remaining sums in terms of the difference $X^\varepsilon - \bar X$ itself, which is obvious for the first and third sum on the right-hand side of (19). However, in the case of the fourth sum, applying almost the same martingale argument used for the 2nd sum, each term $I^k_5$ can formally be replaced by $\int_{k\Delta}^{(k+1)\Delta} C(k\Delta, X^\varepsilon_{k\Delta}) - C(k\Delta, \bar X_{k\Delta})\, ds$, subject to a sufficiently small $\varepsilon$-correction, eventually leading to the wanted contraction argument in this case, too.

Weak convergence
In this section we prove part (i) of Theorem 2.2. The idea of proof is similar to the one of part (ii), except that now $\beta \neq 0$ is possible. It is the presence of this bilinear term which prevents us from proving convergence in probability; we only succeed in showing convergence in law (see Remark 4.2(ii)).
First, we prove weak convergence of the bilinear term. Second, we prove convergence in law of X ε , ε ↓ 0, using bounds similar to those obtained in section 3.

4.1. Weak convergence of the bilinear term. For any $\varepsilon > 0$, define the process $U^\varepsilon_t = \varepsilon \int_0^t \beta(Y^\varepsilon_s, Y^\varepsilon_s)\, ds$, where $Y^\varepsilon$ is the stationary Ornstein-Uhlenbeck process introduced in Remark 2.1. By (A5), the process $U^\varepsilon$ has zero mean, and, using (A3), its second moments can be calculated explicitly for $i, j = 1, \dots, d$, and $\ell, m \in \mathbb N$. Recalling (11), and using the above short notation, we also obtain, for any $\ell, m \in \mathbb N$, a semimartingale decomposition $U^\varepsilon = M^\varepsilon + V^\varepsilon$, where $M^\varepsilon$ is a $d$-dimensional continuous local martingale, while the process $V^\varepsilon$ vanishes in the limit, by combining (A3) and Lemma 3.1.
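The second-moment formula itself was lost in extraction. For the coordinates of the stationary Ornstein-Uhlenbeck process, however, the covariance is standard, and a plausible sketch of the starting point of the computation (our reconstruction) reads:

```latex
% Stationary covariance of the coordinates of Y^\varepsilon:
\mathbb E\big[ Y^{\varepsilon,\ell}_s\, Y^{\varepsilon,m}_t \big]
  = \delta_{\ell,m}\, \frac{q_\ell}{2}\, \varepsilon^{-2}\, e^{-\varepsilon^{-2} |t - s|} .
% For U^\varepsilon_t = \varepsilon \int_0^t \beta(Y^\varepsilon_s, Y^\varepsilon_s)\,ds,
% products of such covariances (via Isserlis' theorem for Gaussian vectors)
% then yield second moments of U^\varepsilon that stay bounded as \varepsilon \downarrow 0.
```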
The above representation of $U^\varepsilon$, though very simple, has been used in a variety of cases in a fruitful way, see for instance [16] or [7]. Observe that, by (A5), the Itô correction actually cancels out, being otherwise a contribution of order $\varepsilon^{-1}$. The process $U^\varepsilon$, nevertheless, has an interesting limit in law: $(U^\varepsilon, W) \to (\eta, \omega)$, $\varepsilon \downarrow 0$, where $\eta$ is a $d$-dimensional Wiener process with covariance $(\sum_{\ell,m\in\mathbb N} b^i_{\ell,m} b^j_{\ell,m})_{i,j=1}^d$, and $\omega$ is a $Q$-Wiener process, like $W$. Furthermore, $\eta$ and $\omega$ are independent.
Proof. First, by (22), it is sufficient to prove the proposition for (M ε , W ) instead of (U ε , W ).
Next, taking into account Lemma 3.4 & Corollary 3.5, respectively, analogous a priori bounds hold true when replacing $\bar X$ by $\tilde X^\varepsilon$. Therefore, when expanding $X^\varepsilon$ and $\tilde X^\varepsilon$ as in (17) & (18), but including the $\beta$-term, and then arguing as in the proof of Theorem 2.2(ii) in section 3, it would immediately follow that $X^\varepsilon_{\cdot\wedge\tau^\varepsilon_R} - \tilde X^\varepsilon_{\cdot\wedge\tau^\varepsilon_R} \to 0$, in probability, $\varepsilon \downarrow 0$, for any $R > 0$, once the following lemma is also available.

Lemma 4.3. Assume that $\Delta = \Delta_\varepsilon$ behaves as described in Remark 3.8. Then, the corresponding $\beta$-contributions vanish in the limit.

Proof. To start with, write the term of interest as two summands, for any fixed $0 \le k \le [T/\Delta]$. We estimate the impact of each summand separately. First, we use the bound on $|D\sigma(r, \cdot)|$. Second, we follow the method used when discussing the 2nd sum on the right-hand side of (19) in the proof of Theorem 2.2(ii), but now for triple moments of $Y^\varepsilon$. Indeed, take the conditional expectation with respect to $\mathcal F_{k\Delta}$, which produces terms of the form $Y^{\varepsilon,\ell}_{k\Delta} \delta_{m,n} q_n + Y^{\varepsilon,m}_{k\Delta} \delta_{\ell,n} q_n + Y^{\varepsilon,n}_{k\Delta} \delta_{\ell,m} q_\ell$. Next, for each $i = 1, \dots, d$, the process $M^i_h$, $h = 1, \dots, [T/\Delta]$, is a martingale with respect to the filtration $(\mathcal F_{h\Delta})_{h=1,\dots,[T/\Delta]}$, and arguing as in the proof of Lemma 3.9 yields the desired estimate. So, it remains to prove that the remainder, after subtracting the martingale term $M_h$ from (25), also vanishes, when $\varepsilon \downarrow 0$. For $i = 1, \dots, d$, the $i$th coordinate of this remainder can easily be bounded, finishing the proof of the lemma.
Corollary 4.4. For any $R > 0$, if $\Delta = \Delta_\varepsilon$ behaves as described in Remark 3.8, then $X^\varepsilon_{\cdot\wedge\tau^\varepsilon_R} - \tilde X^\varepsilon_{\cdot\wedge\tau^\varepsilon_R} \to 0$, in probability, $\varepsilon \downarrow 0$, in particular.

The above corollary suggests that it would be sufficient to show that $\tilde X^\varepsilon_{\cdot\wedge\tau^\varepsilon_R} \to \bar X_{\cdot\wedge\tau^\varepsilon_R}$, in law, when $\varepsilon \downarrow 0$, subject to some procedure allowing us to let $R$ go to infinity afterwards. So, we first prove the weak convergence for fixed $R$, and then discuss the limit procedure for $R \to \infty$.
Modify the coefficients F, σ outside the set {(t, x) : |x| < R} in such a way that the new coefficients F R , σ R , but also Dσ R , are globally bounded, and that both functions F R (t, ·) and Dσ R (t, ·) are globally Lipschitz, uniformly in t ∈ [0, T ].
Of course, $\tilde X^\varepsilon_{\cdot\wedge\tau^\varepsilon_R}$ coincides with $\tilde X^{\varepsilon,R}_{\cdot\wedge\tau^\varepsilon_R}$, where $\tilde X^{\varepsilon,R}$ denotes the solution to the equation obtained when replacing the coefficients of (24) by $F_R$, $\sigma_R$, and the Stratonovich correction $C_R$ associated with $\sigma_R$. Also, let $\bar X^R$ denote the solution to the equation obtained when replacing the coefficients of (12) by $F_R$, $\sigma_R$, $C_R$.

Proposition 4.5. Fix $R > 0$. Then, $\tilde X^{\varepsilon,R}$ converges to $\bar X^R$, in law, when $\varepsilon \downarrow 0$.
By boundedness of the coefficients on the above right-hand side, we obtain the required moment bounds, where the Burkholder-Davis-Gundy inequality gives $\mathbb E \sup_{t\le T} \big| \int_0^t \sigma_R(s, \tilde X^{\varepsilon,R}_s)\, dW_s \big| \lesssim T^{1/2}$.
Let $\tilde{\mathcal F}$ be the $P^R$-completion of $\mathcal B$, and let $(\tilde{\mathcal F}_t)_{t\in[0,T]}$ be the smallest filtration the process $(\xi, \eta, \omega)$ is adapted to, on the one hand, and which satisfies the usual conditions with respect to $P^R$, on the other. Also, introduce $\tilde{\mathcal F}^n, (\tilde{\mathcal F}^n_t)_{t\in[0,T]}$ in a similar way with respect to $P^{R,\varepsilon_n}$, $n \in \mathbb N$. Now, it easily follows from Proposition 4.1 that, on $(\tilde\Omega, \tilde{\mathcal F}, P^R)$, the following distributional properties must hold for the pair of processes $(\eta, \omega)$: $\eta$ is a $d$-dimensional Wiener process with covariance $(\sum_{\ell,m\in\mathbb N} b^i_{\ell,m} b^j_{\ell,m})_{i,j=1}^d$, $\omega$ is a $Q$-Wiener process, and $\eta$ and $\omega$ are independent. Introduce the martingale $M^R$. Therefore, applying [2, Theorem 8.2] to the pair of processes $(M^R, \omega)$ yields a representation on $(\tilde\Omega, \tilde{\mathcal F}, P^R)$, or an enlargement of this space we still denote by $(\tilde\Omega, \tilde{\mathcal F}, P^R)$, where $W^R$ is another $Q$-Wiener process, which, by the above representation, even $P^R$-almost surely coincides with $\omega$. Thus, equation (26) can be written in terms of $\omega$, where $\omega$ is a $Q$-Wiener process, while $\eta$ is a $d$-dimensional Wiener process, independent of $\omega$, and with covariance $(\sum_{\ell,m\in\mathbb N} b^i_{\ell,m} b^j_{\ell,m})_{i,j=1}^d$. Observe that the process $\bar X^R$ satisfies the same type of equation, as $\sum_{\ell,m\in\mathbb N} b_{\ell,m} W^{\ell,m}$ from (12) is a $d$-dimensional Wiener process with covariance $(\sum_{\ell,m\in\mathbb N} b^i_{\ell,m} b^j_{\ell,m})_{i,j=1}^d$, too. But, since this type of equation admits a unique strong solution, the laws of $\xi$ and $\bar X^R$ must be the same, proving $\tilde X^{\varepsilon_n,R} \to \bar X^R$, in law, when $n \uparrow \infty$. However, the same argument applies to any converging subsequence, and the limit will always be the same, finally proving $\tilde X^{\varepsilon,R} \to \bar X^R$, in law, when $\varepsilon \downarrow 0$.
It remains to discuss how $R$ can be taken to infinity. Recall that $\bar X$ is the solution of (12), and it is not difficult to see that $\bar X^R$ converges to $\bar X$, in law, as $R \to \infty$. Now take a suitable function. Then, because $\tilde X^{\varepsilon,R} \to \bar X^R$, in law, when $\varepsilon \downarrow 0$, we deduce a bound in which the last probability converges to zero, when $R \to \infty$, because $\bar X$ is a global solution.
As a consequence, the weak convergence $\tilde X^\varepsilon \to \bar X$, $\varepsilon \downarrow 0$, holds without localization.

Applications to climate equations

In what follows, the above equations will always have initial conditions $(x_0, y_0)$, where $x_0 \in H_d$ can be chosen arbitrarily, while $y_0 = \int_{-\infty}^0 \varepsilon^{-2} e^{\varepsilon^{-2} s}\, dW_s$ will be fixed to ensure pseudo-stationarity of the scaled unresolved variables. Note that fixing $y_0 \in H_\infty$ this way does not restrict the initial data of the reduced equations.
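Why this choice of $y_0$ makes the scaled unresolved variables stationary can be checked directly; the following is our own sketch:

```latex
% With y_0 = \int_{-\infty}^{0} \varepsilon^{-2} e^{\varepsilon^{-2} s}\, dW_s,
% the solution of (7) becomes the two-sided stochastic convolution
Y^\varepsilon_t
  = e^{-\varepsilon^{-2} t}\, y_0
    + \int_0^t \varepsilon^{-2} e^{-\varepsilon^{-2}(t-s)}\, dW_s
  = \int_{-\infty}^{t} \varepsilon^{-2} e^{-\varepsilon^{-2}(t-s)}\, dW_s ,
% whose law does not depend on t; in particular, for every mode m,
\mathbb E\,\langle Y^\varepsilon_t, f_m\rangle_{H_\infty}^2
  = q_m \int_{-\infty}^{t} \varepsilon^{-4} e^{-2\varepsilon^{-2}(t-s)}\, ds
  = \frac{q_m\, \varepsilon^{-2}}{2}.
```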
In fluid-dynamics settings like (1), it is customary to assume that $A$ is self-adjoint, and that the full nonlinearity is skew-symmetric: $\langle B(z', z), z\rangle_H = 0$, $z, z' \in H$, see [15]. We therefore make the corresponding assumptions on the projected coefficients, and finally we will need the analogue of (A5), that is

(C4) $\sum_{\ell\in\mathbb N} \langle B^1_{22}(f_\ell, f_\ell), e_i\rangle_{H_d}\, q_\ell = 0$, for all $i = 1, \dots, d$.

Note that the latter condition is indeed satisfied for many fluid-dynamics models; it usually holds independently of the structure of the noise because $\langle B^1_{22}(f_\ell, f_m), e_i\rangle_{H_d}$ would be zero on the diagonal $\ell = m$, for all $i$. Next, we bring equations (27),(28) into a form which makes them comparable to (6), (7). Using the definition of $y_0$, we obtain the mild formulation (29) of (28), in which $\tilde Y^\varepsilon$ is a stationary Ornstein-Uhlenbeck process. Plugging (29) into (27), $X^\varepsilon$ alternatively satisfies an equation driven by $\tilde Y^\varepsilon$, when using the abbreviation $Z^\varepsilon$. Since $Z^\varepsilon_s$ is close to $A^2_1 X^\varepsilon_s + B^2_{11}(X^\varepsilon_s, X^\varepsilon_s)$, for small $\varepsilon$, and since both terms $B^1_{22}(\tilde Y^\varepsilon_s, Z^\varepsilon_s)$, $B^1_{22}(Z^\varepsilon_s, Z^\varepsilon_s)$ will be shown to vanish with $\varepsilon$, too, the process $X^\varepsilon$ should be close to $\tilde X^\varepsilon$ satisfying an equation of type (6). Theorem 5.3 then reads as follows.

(i) If (C4) holds, then $X^\varepsilon$ converges in law, $\varepsilon \downarrow 0$, to the unique process $\bar X$ satisfying (32).
All in all, Theorem 2.2 implies that both parts (i) & (ii) of Theorem 5.3 hold true when replacing $X^\varepsilon$ by $\tilde X^\varepsilon$. Thus, it is sufficient to prove convergence in probability of $X^\varepsilon - \tilde X^\varepsilon$ to zero, $\varepsilon \downarrow 0$, uniformly on compact subsets of a localising stochastic interval, which can easily be shown following the lines of the proof of Theorem 2.2.