A pathwise parameterisation for stochastic transport

In this work we set the stage for a new probabilistic pathwise approach to effectively calibrate a general class of stochastic nonlinear fluid dynamics models. We focus on a 2D Euler SALT equation, showing that the driving stochastic parameter can be calibrated in an optimal way to match a set of given data. Moreover, we show that this model is robust with respect to the stochastic parameters.


Introduction
A fundamental challenge in observational sciences, such as weather forecasting and climate change predictions, is the modelling of uncertainty due, for example, to unknown or neglected physical effects, and incomplete information in both the data and the formulation of the theoretical models for prediction. Various dynamical parameterisation approaches have been proposed to tackle this challenge, see e.g. [7], [4], [12], [5], [1]. Of particular interest are the recently developed Data Driven models, that accommodate uncertainty by predicting both the expected future measurement values and their uncertainties, based on input from measurements and statistical analysis of the initial data. To effectively incorporate uncertainty in the data driven approach, such predictions are made in a probabilistic sense. Additionally, a data assimilation procedure is used to take into account the time integrated information obtained from the data being observed along the solution path during the forecast interval as "in flight corrections".
In the geoscience community, data assimilation (DA) refers to a set of methodologies designed to efficiently combine past knowledge of a geophysical system (in the form of a numerical model ) with new information about that system (in the form of observations). DA is a central component of Numerical Weather Prediction where it is used to improve forecasting by adjusting the model parameters and reducing the uncertainties. To achieve this, a stochastic feedback loop between the model and the observation may be introduced: the assimilation of more data during the prediction interval will then decrease the uncertainty of the forecasts based on the initial data, by selecting the more likely paths as more observational data is collected. This is the basis of the so-called ensemble data assimilation which uses a set of model trajectories that are intermittently updated according to data.
A key step for ensuring the successful application of the combined stochastic parameterisation and data assimilation procedure, is the "correct" calibration of stochastic model parameters. For Stochastic Advection by Lie Transport (SALT) and Location Uncertainty (LU) models, current numerical methods for calibration, see [4], [1], [5], have largely been inspired by the physical interpretation of the models derivations. More specifically on the assumption that the flow map is decoupled into a slow scale mean part and a fast scale fluctuating part. In the references mentioned before, it was shown that these methods are effective and led to successful combination of data driven models and state of the art data assimilation techniques.
In this work, we wish to investigate the feasibility and viability of probabilistic pathwise approach for calibration. Our general aim is to explore such ideas for a wide class of nonlinear stochastic transport models. This will be very useful in data assimilation problems, as in real world applications the signal is usually observed through discrete observations, but no results of this type for SALT or LU models have been obtained before. Currently, Lagrangian particle trajectories are simulated starting from each point on both the physical grid and its refined version, then the differences between the particle positions are used to calibrate the noise. This is computationally expensive and not fully justified from a theoretical perspective. In the same spirit as [3] but with a more complicated noise term and without any smoothing effects of a Laplacian, we propose an approach which uses high-frequency in time and low-frequency in space observations of a single path of the solution, to rigorously infer properties of the stochastic parameters. The knowledge of the noise is crucial for determining the behaviour of the solution and for assessing to what degree the solution of the coarse resolution SPDE deviates from the solution of the fine resolution PDE in the model reduction procedure, so an optimal calibration of the noise parameters is relevant from both a theoretical and an applied perspective.
In this work we look at stochastic calibration for the two-dimensional incompressible Euler equation in vorticity form. This stochastic equation models the local rotation of a fluid flow in the presence of spatial uncertainties and it has been derived from fundamental principles in [7]. This equation is a key ingredient in modelling phenomena in oceanography and in order to ensure that it efficiently encodes the small-scale variablity in the upper part of the ocean, one needs to specify the stochastic parameters based on real observations. One of main issues in parameter estimation using real data is the fact that the model parameters do not map to observations in a unique way (model identifiability problem, see e.g. [2]). For this reason, we believe that a probabilistic approach is much more suitable.
The 2D Euler equation in the form derived in [7] and studied in [4], [5] and [9] reads: and (W i ) i∈N is a sequence of independent Brownian motions. Global well-posedness for equation (1) has been studied in [9] and the numerical and data assimilation perspective has been studied in [4] and [5]. In [9] the authors have proven that equation (1) admits a uniques pathwise solution which lives in the Sobolev space W k,2 (T 2 ) (k ≥ 2) when ω 0 ∈ W k,2 (T 2 ) and can be extended to L ∞ (T 2 ) when ω 0 ∈ L ∞ (T 2 ).
In this paper we consider the following SPDE on the two-dimensional torus T 2 = R 2 /Z 2 , driven by a 1-dimensional Brownian motion W : where u and ω are as above and • denotes Stratonovich integration. We impose the following condition on the stochastic parameter ξ, in the same spirit as (2): with k ≥ 2. This condition ensures that for any f ∈ W 2,2 ( Remark 1. We can view the stochastic part as a space-time noise (ξ, W ) where the spatial component is given by ξ and the time component is a standard Brownian motion. This perspective is many times useful in numerical applications where (ξ • dW t ) · ∇ is implemented as a random operator applied to the solution ω.
The problem of parameter estimation, known also as statistical inference, is technically challenging for such (infinite-dimensional) SPDEs driven by transport noise, as most methods used in the literature benefit from a diagonalizable structure of the underlying space-covariance matrices. This structure is specific for additive noise and therefore it does not apply in our case. Also, most results are obtained for stochastic variations of the heat equation, which contain a smoothing Laplace operator (see for instance [3]). Our model does not contain a Laplacian a priori, and therefore we cannot exploit the properties of a heat kernel. These makes the analysis much harder.

Contributions of the paper
In this work, we focus on equation (3) from two perspectives: • First, we show that the driving stochastic parameter ξ can be calibrated in an optimal way to match a set of high-frequency in time given data. This is done using a forced and damped version of the equation and a parametric form of the stream function and the corresponding stochastic parameter which is implemented using an orthonormal basis. Our technique can be explicitly applied to calibrate the 2D Euler model using real oceanic data and we intend to do this in coming work.
• Second, we show that the original 2D Euler model is robust with respect to the stochastic parameters ξ in the sense that if we consider two couples (ω 1 , ξ 1 ) and (ω 2 , ξ 2 ) which solve equation (3), then the L 2 distance between ω 1 and ω 2 can be controlled using the initial conditions and the difference between ξ 1 and ξ 2 only (see Section 4). This is important in applications as it shows that if we consider approximate values for ξ, the corresponding model solution remains close to the true solution.

Structure of the paper
In Section 2 below we present the problem formulation. In Section 3 we introduce the methodology. In Section 4 we prove the robustness of the original model and in Section 5 we present the numerical results.

Problem formulation
Let (Ω, F, (F t ) t≥0 , P) be a filtered probability space and W a one-dimensional Brownian motion adapted to the complete and right-continuous filtration (F t ) t≥0 . We assume we are given a finite sequence of high frequency in time vorticity fields, that are denoted by ω * ti (x), i = 1, . . . , N , and are adapted to (F t ) t≥0 . We take the view that the ω * ti 's are the given observation data. We further assume that ω * ti ∈ W k,2 (T 2 ). Writing ω ξ to denote solutions to the model (3) for a given vector field ξ, the generic problem we are interested in is to find a ξ so that solutions to (3) matches the data as best as possible, i.e. arg min for some suitable norm. The dimension of the observations currently coincides with the number of sources of noise, that is we have a determined system. However, in practice this is not always a realistic assumption and in future work we will look at underdetermined or overcomplete systems i.e. when the number of noise sources is larger than the dimension of the observation operator.
In general, the infinite dimensional optimisation problem (7) may be too hard to solve in practice. We thus make concrete the form of ξ. Let (e j ) j∈N be an orthonormal basis in L 2 (T 2 ). We assume the following parametric form for the stream function of ξ, which is henceforth denoted by ζ, where α j are reals. Then and the optimisation problem (7) then reduces to finding the coefficients α j .

Methodology
We will first introduce a couple of known results for vorticity equations, for further discussions on the topic see [10] or [11]. The link between the vorticity and the velocity vector field in equation (3) is uniquely established using the Biot-Savart operator K: with where G is the Green function of the operator −∆ on T 2 and k = (k 1 , k 2 ), k ⊥ = (k 2 , −k 1 ). It is known that, for any k ≥ 0, there exists a constant C k,2 , independent of u such that u k+1,2 ≤ C k,2 ω k,2 .
If ψ : The reconstruction of u from ω is ensured by the incompressibility condition ∇ · u = 0 and a periodic, distributional solution of ∆ψ = −ω is given by From (3) we have in which for simplicity of notation, we defined B s (x; ω) := u s (x) · ∇ω s (x). Combining (12) with the Biot-Savart law (10) we obtain Now consider the "kinetic energy" which is not conserved in the SALT case. Using Itô's lemma, we obtain where ., . is the standard L 2 (T 2 ) pairing. For a stochastic process X t defined on a filtered probability space, its quadratic variation is defined by where t 0 = 0 < t 1 < · · · < t n = t is a partition of the interval [0, t], ∆t i := |t i − t i−1 |, and the convergence holds in probability (see e.g. [8]). In our case, for the semimartingale e t , we have Substituting in the parametric form for ξ, we obtain the following quadratic form Due to global existence and uniqueness of solutions to (3), [e] t exists globally P-almost surely. Thus the right hand side of (18) can be arbitrarily well approximated by its truncation for all t i.e. for a given > 0, there exists M such that Additionally, from the computational perspective, for any fixed M , the linear map that defines the truncated quadratic form is symmetric and positive definite 1 , and thus can be diagonalised by a unitary linear map. Doing so, we obtain the following linear problem where denotes the truncation error of (18), λ j are the eigenvalues of the associated linear map, andα j 's are the original α values which get rescaled by the unitary matrix from the diagonalisation. We can estimate [e] t using the high frequency in time data ω * and (16), assuming the discrete sum converges fast enough, The estimate[ e] t,N could then be used in (21) to get an estimate for theα. One could then recover the original α's by applying the unitary linear map that's associated with the diagonalisation of A ij .

Remark 2.
The calculations shown in this section could also be directly applied to the vorticity equation (12) with little modification. In the numerics section of this work, this is what we did. The linear system for estimation, however, would would also depend on the space variable. Recall that the system (3) conserves spatial integrals of ω t (it is a Casimir of the system, see [4]), the quadratic variation of the spatially integrated ω would be trivial.

Robustness
Theorem 3. Let ω 1 , ω 2 be two solutions of the 2D Euler equation (3) and ξ 1 , ξ 2 the corresponding stochastic parameters for each of these two solutions. More precisely, (ω , ξ ) for = 1, 2 solves Then for any p ≥ 2 there exist non-negative constants C = C(p, T ), C 1,p , C 2,p , such that Proof of Theorem 3. Letω := ω 1 − ω 2 ,ū = u 1 − u 2 ,ξ = ξ 1 − ξ 2 . Thenω satisfies By the Itô formula: The difference of the nonlinear terms is analysed explicitly in [9] pp. 9: We used here that ∇ω 1 t 4 ≤ C ω 1 t k,2 and ū t 4 ≤ C ū t 1,2 ≤ C ω t 2 . Also, since u 2 is divergence-free, We estimate the difference terms which include ξ 1 and ξ 2 in Lemma 5 below. Note here that the term ω t , ξ 2 · ∇ ξ 2 · ∇ω t is negative. To estimate the stochastic term let Z := ξ 2 2 . By the Burkholder-Davis-Gundy inequality, for arbitrary p ≥ 2 there exists a constant C p such that 2 p/2 t where [D] t is the quadratic variation of the martingale D t . Note that [D] t can be controlled using the control for Q in Lemma 5 so we have k,2 Z with k ≥ 3. We then have, for t ∈ [0, T ]: since D t is a martingale so its expectation is zero. Therefore for any p ≥ 2, with γ as above. We used here the fact that the 2D Euler equation (3) has a unique global solution in W k,2 (T 2 ) for k ≥ 2, as proven in [9]. Moreoverξ is deterministic and time-independent so Proof. For the difference terms which include ξ 1 and ξ 2 we use that with k ≥ 3.

Numerical results
We implemented the main equation (3) with added forcing and damping, on a unit square domain with doubly periodic boundary conditions, where we chose r = 0.001 and Q(x) = 0.01(cos(8πy) + sin(8πx)). We considered a ξ whose parametric form with respect to the Fourier basis consists of only one α. The stream function of our chosen ξ is given by ζ(x, y) = α (cos(k 1 2πx) cos(k 2 2πy) − sin(k 1 2πx) sin(k 2 2πy)) .
Note that and To discretise (27), we followed the methods documented in [4] -a mixed Finite Element method was used for the spatial derivatives, and an explicit strong stability preserving Runge-Kutta scheme of order 3 was used for the time derivative. We added the forcing and damping terms to balance out the energy dissipation caused by discretisation. This helped with maintaining the statistical homogeneity of the numerical solution, once it has reached a spun-up state from some set initial state. Our choice for the set initial state was ω(0, x, y) = sin(8πx) sin(8πy) + 0.4 cos(6πx) cos(6πy) + 0.3 cos(10πx) cos(4πy) + 0.02 sin(2πy) + 0.02 sin(2πx).
Spatially, we chose a grid of size 64×64 cells. We first spun-up the system until it reached a statistical equilibrium state. This statistical equilibrium state was then set as the initial condition for our experiment. Figure 1 shows a snapshot of the obtained initial condition. Over the spin-up phase, we used a smaller α = 0.000001 value and k = (2, 4). The time horizon for the experiment data was chosen to be the unit interval, i.e. we generated data ω * (t i , x) for 0 = t 0 < t 1 < · · · < t N = 1. See Figure 1 for snapshots of ω * (0, x) and ω * (1, x). When generating the data, we used the larger value of α = 0.001. This was so to avoid any possible numerical issues 4 when we attempted to recover α from data.
For these concrete experiment parameter choices, we chose to follow Remark 2, and directly worked with the quadratic variation of ω.
Thus our estimate for α is given byα Remark 6. In (35), we applied spatial averaging before dividing to avoid possible division by zero issues, and to help with stablising estimation errors.  Shown on the right is a snapshot of the basis element B t (k, x) (cos(k 1 2πx) sin(k 2 2πy) + sin(k 1 2πx) cos(k 2 2πy)) 2 , which was approximated using the same N number of data samples.

Remark 7.
The assumption that we know k in advance is of course too strong from the applications viewpoint. The aim of this experiment is to test the strength of the pathwise approach under the assumption of "perfect knowledge". If we cannot accurately recover α in this case, then getting a good estimate for α using the pathwise approach may be too difficult or impractical in more realistic scenarios. Figure 2 shows snapshots of[ ω] t,N (x) and B(t, k, x)e k (x). We applied (35) for different values of N . In each case, the time integral that constitutes B(t, k, x) was approximated using a simple trapezoidal rule, for which the same N number of data snapshots were used. Figure 3 shows the results for the relative error for the different values of N . The results show that, in the worst case of N = 2500, the relative error was no greater than 0.89. This translates to an absolute error of range of 0.001 ± 0.00089. The best case was when all 200000 data samples were used to estimate α, the relative error in that case was 0.00135. This suggests convergence and stabilisation of the sum for[ ω] t . For future work, we aim to test the pathwise approach for cases in which we do not know the exact selection of basis elements for ξ. Further, we wish to extend and test these ideas on coarse grained PDE data and compare with the results that were obtained in [4] using previously developed calibration methods.