Time-Limited Balanced Truncation for Data Assimilation Problems

Balanced truncation is a well-established model order reduction method which has been applied to a variety of problems. Recently, a connection between linear Gaussian Bayesian inference problems and the system-theoretic concept of balanced truncation has been drawn (Qian et al in Sci Comput 91:29, 2022). Although this connection is new, the application of balanced truncation to data assimilation is not a novel idea: it has already been used in four-dimensional variational data assimilation (4D-Var). This paper discusses the application of balanced truncation to linear Gaussian Bayesian inference, and, in particular, the 4D-Var method, thereby strengthening the link between systems theory and data assimilation further. Similarities between both types of data assimilation problems enable a generalisation of the state-of-the-art approach to the use of arbitrary prior covariances as reachability Gramians. Furthermore, we propose an enhanced approach using time-limited balanced truncation that allows to balance Bayesian inference for unstable systems and in addition improves the numerical results for short observation periods.


Introduction
One of the greatest challenges of 21st century mathematics is integrating large amounts of data into computational models to obtain novel insights into the state and dynamics of a system.This is the typical framework for data assimilation.Data assimilation combines available, mostly high-dimensional, models with observations at certain time steps, often through time-dependent PDEs.These models occur in various domains of application, such as meteorology [18,25,26], geosciences [15,21,43], and medicine [20,30].They are often unstable.
The typical approach to data assimilation problems is Bayesian inference; see, for instance, [17,35,53].We consider the parameters x x x ∈ R d to be the realisations of a random variable.The distribution of said random variable is what we want to infer.Some prior information is given.The time-dependent observations m m m are related to the parameters via a given forward model operator G G G with additive noise ǫ ǫ ǫ: m m m = G G G(x x x) + ǫ ǫ ǫ. ( From ( 1), the likelihood m m m|x x x is derived.Combining this with the measurements, the prior is updated to the posterior distribution of interest x x x|m m m.Usually, the initial state of a dynamical system, or rather its distribution, is to be inferred.Often, the state is high-dimensional and therefore forward model simulations are expensive.This problem is well known from systems and control theory, where reduced-order modelling has been established; see, e.g., [1,6,7].Both non-intrusive data-based methods [5,44] and intrusive model-based methods [2,28] have been in the spotlight of past and current research.Model-based approaches are promising in the data assimilation context.One such method is balanced truncation (BT) reducing a stable, linear time-invariant system by projecting it onto a subspace of simultaneously easily reachable and observable states.Variants for time-varying systems [34,50], inhomogeneous systems [3,29], bilinear systems [4,19] and even nonlinear systems [31,32] have been developed.The application of BT to data assimilation has appeared first in the context of 4D-Var [11,27,37,38].
Recently, Qian et al. [46] have reinvigorated interest in the connection between system-theoretic model order reduction and Bayesian inference.Balanced truncation for linear Gaussian Bayesian inference has been considered.In this paper, we relate the results of Qian et al. to the application of BT to 4D-Var.We generalise the existing results on BT for linear Gaussian Bayesian inference to arbitrary Gaussian priors and unstable systems by proposing time-limited balanced truncation (TLBT) for Bayesian inference.The proposed method improves on the state-of-the-art approach, especially for small observation intervals appearing in many data assimilation applications.
The remainder of the paper is structured as follows: At the beginning, the frameworks for two data assimilation problems and the basics of BT are presented; Sect. 2 Sect. 3 introduces how time-limited balanced truncation is applied in these situations.Sect. 4 presents numerical results before the paper is concluded in Sect. 5 with the discussion of directions for future work.

Setting and Background
We consider the linear dynamical system where x x x(t) ∈ R d is the state of the system at time t ∈ R, and A A A ∈ R d×d is a linear model operator describing the state evolution.The initial state x x x(0) = x x x 0 is unknown.Noise-polluted outputs m m m k ∈ R dout of the system are only available at discrete time points 0 < t 1 < . . .< t n via the measurement model with C C C ∈ R dout×d the state-to-output operator, typically with d out ≪ d.The noise ǫ ǫ ǫ k ∼ N (0 0 0, Γ Γ Γ ǫ ) is independently and identically distributed additive Gaussian with known positive definite covariance Γ Γ Γ ǫ ∈ R dout×dout .The observation interval of the system is [0, t n ].We always start at time 0, with the unknown initial condition, and set the end time to t e = t n .
The data assimilation task is to infer the unknown initial state from the noisy measurements.To this end, the Bayesian statistical approach is applied and the initial condition is assumed to be a-priori Gaussian distributed: with Γ Γ Γ pr ∈ R d×d a given positive definite prior covariance matrix.Substituting the Gaussian prior (2c) in the linear Gaussian measurement model (2b) yields a Gaussian likelihood for the measurements conditioned on the initial condition and the measurement times where m m m ∈ R ndout , G G G ∈ R ndout×d , and Γ Γ Γ obs ∈ R ndout×ndout as follows: For a Gaussian prior and a Gaussian likelihood, the posterior is again Gaussian [14]: The matrix H H H ∈ R d×d is called the Fisher information of m m m [39] and defined as The advantage of this data assimilation setting is that the posterior statistics (3) can be obtained in a closed form.The challenge is the computation of the posterior mean and covariance for a high-dimensional state space, that is for d large.This implies several multiplications with the high-dimensional forward map G G G and its transpose, which might only be available through costly evolution of the underlying dynamical system.Benner, Qiu, and Stoll have proposed a low-rank compression of the posterior covariance [8].Other than this, the usually small dimension d out of the observations suggests that the dynamics of interest may be modelled accurately when restricted to a low-dimensional subspace.Model order reduction approaches, such as those presented in this paper, aim to reduce the dimensionality of the problem to make computations more efficient while maintaining sufficient accuracy.

Incremental Four Dimensional Variational Data Assimilation (4D-Var)
4D-Var is a variational data assimilation technique originating in weather forecasting [51].In 4D-Var, instead of a continuous model evolution, a discrete time-varying system for k = 0, 1, . . ., n is considered: where x x x k ∈ R d is the state of the system at time t k ∈ {0, . . ., t n }, and A k : R d → R d is the nonlinear model operator that describes the evolution of the state from time t k to t k+1 .The measurement model suffers from centered Gaussian observational noise The goal is again to infer the initial state x x x 0 from the noisy measurements {m m m k } k=0,...,n .We assume that an a-priori estimate x x x b 0 of the initial state is given-the so-called background state-which is error-prone: Linear Gaussian Bayesian inference and 4D-Var are closely related: discretising (2) with equispaced points in time t k = k • h, k = 0, . . ., n and t n = t e , we obtain a system of the more general type (4) with A k = A A A disc = e A A Ah and C k = C C C for all k.A closed form solution to the posterior statistics for the initial state can only be computed for systems of type (4) if A k is linear and time-invariant.In 4D-Var, for arbitrary A k , the maximum a posteriori estimate for x x x 0 , i.e., x x x * 0 = arg min x x x0 J(x x x 0 ), is obtained by minimising the cost functional derived from the posterior distribution; see, for instance, [23]: subject to the forward model dynamics where x x x k is the state at time step Evaluating the functional J(x x x 0 ) is very costly because of the nonlinearity of the operators A k and C k .Incremental 4D-Var overcomes this by using consecutive linearised versions of the operators and minimising a quadratic functional [16].This approach to minimising ( 5) is an inexact Gauss-Newton method [36].Linearisations of A k and C k about the model state x x x k lead to Jacobians A A A k and C C C k in the so-called tangent linear model: where ) for all k and δx x x k denotes a state perturbation.A discussion of how the observation and prior uncertainties influence (6) follows in Sect.3.1 The corresponding quadratic cost functional that has to be minimised for δx x x 0 is given by subject to the forward model dynamics inding the minimum δx x x * 0 of ( 7) is called inner loop.The result of the inner loop is the next increment δx x x n ] and expected observations are obtained by evaluating the nonlinear model based on the current step's initial condition x x x (i) 0 .Despite the promising idea of obtaining x x x * 0 incrementally, the computation of ( 7) is still very costly in high-dimensional settings, even in the linear case.For this reason, the system (6) should be reduced to obtain lower-dimensional matrices replacing A A A k , C C C k and Γ Γ Γ pr in (7).Details on how balanced truncation is used to make the minimisation of this cost functional tractable is discussed in Sect.3.1

Model Reduction by Balanced Truncation
Projection-based methods are popular in model order reduction for dynamical systems [6,7].The system-theoretic concept of balanced truncation [41,42] is one of them.We provide some background from systems theory in Sect.2.2.1 before explaining the proposed method in Sect.2.2.2The theoretical explanations in these sections are based on a book by Antoulas [1].The time-limited approach to balanced truncation is described in detail in Sect.2.2.3

System-Theoretic Basics
Balanced truncation (BT) is usually considered for a linear time invariant (LTI) system with continuous (t ∈ R + ) or discrete (t ∈ Z + ) state equation as follows: x x x(0) = 0 0 0.
The output equation and the initial condition are unified for both cases.Here, x x x, A A A, and C C C are as in (2).The input of the system u u u(t) ∈ R din is through the port B B B ∈ R d×din .The observer output y y y(t) ∈ R dout is without any noise.We, exceptionally, use the notation x x x(t + 1) instead of x x x k+1 for discrete time in this subsection to unify the concept of BT for discrete and continuous time.It is important to notice, however, that the physical meaning of the system matrices A A A and B B B differs from continuous time to discrete time: when we discretise the continuous-time system dx x x(t) The idea of BT is to truncate the system by removing the states that are difficult to reach and difficult to observe.The concepts of reachability and observability rely on the so-called reachability and observability Gramians, spanning the (sub)spaces of reachable and observable states.The Gramians are functions of time and the timelimited reachability Gramian P P P (t) ∈ R d×d at time 0 < t < ∞ is defined by (9a) The notion of observability of states at a fixed time t is dual to reachability.The time-limited observability Gramian The Gramians are associated with the reachability energy and the observability energy of a state, respectively: where x x x 2 [P P P (t)] −1 is the minimal energy necessary to reach the state x x x at time t and x x x is the maximal energy produced by the output of an observable state x x x at time t.The time-limited Gramians are, thus, functions of time informing us how much energy is necessary to reach or observe a state at a fixed, finite time t.
By monitoring the system from initial time to infinity, it is possible to determine how much energy is required to reach or observe a state at any point in time.The limits of the finite Gramians ( 9) and (10) exist only under certain restrictive conditions: the system is required to be stable, i.e., the eigenvalues of A A A lie in the open left half plane for the continuous-time setting or lie inside the unit circle for the discrete-time setting.Then, e A A Aτ is bounded for τ → ∞ and, hence, P P P (t) and Q Q Q(t) are bounded for t → ∞ and their limits P P P ∞ = lim t→∞ P P P (t) and Q Q Q ∞ = lim t→∞ Q Q Q(t) exist; similarly for the discrete-time case.The stable LTI system (8) has infinite reachability Gramian P P P ∞ ∈ R d×d defined by (12a) The advantage of infinite Gramians is that they can be computed efficiently as the solutions to the continuous-time Lyapunov equations A A AP P P ∞ + P P P The drawback is that their use limits the application of BT to stable systems.This is our motivation to discuss an approach to BT for unstable systems, which are common in data assimilation applications.

Balanced Truncation
Projection-based model reduction methods aim to find a suitable low-dimensional subspace of the state space containing the important features.A projection operator projects the state variables and matrices of the system (8) onto this low-dimensional subspace.In our case, the low-dimensional subspace consists of the states that are both easily reachable and easily observable, quantified by the energies (11).The reachability energy is low and the observability energy is high.The directions obtained by BT, therefore, maximise the generalised Rayleigh quotient The dominant eigendirections of the matrix pencil After having determined the dominant directions v v v k , the goal is to build a balancing transformation T T T such that T T T −1 P P PT T T −T = P are expressed in a common basis and can be considered simultaneously.The balancing transformation is used as the projector to reduce the system dimensions.Balanced truncation for stable LTI systems is based on the singular value decomposition and given by Algorithm 1.
are given by P

Time-limited Balanced Truncation
Time-and frequency-limited balanced truncation has been developed by Gawronski and Juan [24].This section is based on their work.The idea of time-limited balanced truncation is to use the time-limited Gramians ( 9) and ( 10) for a time interval of interest instead of the infinite Gramians ( 12) and ( 13).In the data assimilation problems, we consider the interval T = [0, . . ., t e = t n ] and use the notation P P P T := P P P (t e ) and The time-limited Gramians are the solutions of (modified) Lyapunov equations: Theorem 2.1 The time-limited Gramians P P P T and Q Q Q T for a continuous LTI system (8) with stable A A A are the unique, positive-semidefinite solutions to the modified continuous-time Lyapunov equations

B, and
The time-limited Gramians P P P T and Q Q Q T for a discrete system (8) with stable A A A are the unique, positive-semidefinite solutions to the modified discrete-time Lyapunov equations The balanced truncation Algorithm 1 is applied to the system (8) using the Gramians P P P = P P P T and Their matrix square roots may be obtained as the solutions of modified Lyapunov equations from Theorem 2.1.We call this procedure time-limited balanced truncation (TLBT).Using TLBT is arguably the way to go in the setting of data assimilation, since there are only observations in some fixed time-interval T = [0, t e ].The end time t e is often comparatively small and we are not interested in the behaviour at an infinite time horizon.
The assumption that the system (8) and hence A A A is stable is only necessary for the existence of the infinite Gramians P P P ∞ and Q Q Q ∞ .The time-limited Gramians P P P (t) and Q Q Q(t) are defined anyways, see ( 9) and (10).Hence, we drop the stability assumption for TLBT.Redmann and Kürschner [49] have shown that for continuous systems with unstable A A A and Λ(A A A) ∩ Λ(−A A A) = ∅ the Lyapunov equations from Theorem 2.1 can still be used for the computation of the time-limited Gramians, which is beneficial from a numerical perspective.A drawback is that, even for stable A A A, stability is not preserved for systems (14), which were reduced by TLBT, but additional assumptions or modifications are necessary [24].
In this section, two data assimilation approaches and the basics of balanced truncation and its time-limited version have been explained.In the next Sect.3, the details of (TL)BT in these two methods are discussed and compared.

Time-limited Balanced Truncation for Data Assimilation
The idea of balanced truncation (BT) from Sect.2.2 has been applied to 4D-Var for discrete time [11,27,38] and to linear Gaussian (LG) Bayesian inference for continuous time [46].An adaptation of (time-limited) BT to data assimilation is only possible if the underlying dynamical system can be considered as an LTI system for which Gramians are defined.In the following, we describe how this adaptation is done for the inner loop of 4D-Var (Sect.3.1) and LG Bayesian inference (Sect.3.2) and introduce time-limited BT for both cases.

TLBT within the Inner Loop of Incremental 4D-Var
BT for the 4D-Var method was first proposed by Lawless et al. [37] and then applied to settings with [27] or without [11,38] model error in the state dynamics (4a).In Sect.2.1 we have explained why a reduced order model is needed within the inner loop of incremental 4D-Var.
The system to be truncated is the tangent linear model (6).Let us assume that This is an important assumption and restriction, because BT as given in Sect.2.2 is only defined for LTI systems.In (6), the noise has been ignored to give the basic idea of incremental 4D-Var.With noise, the tangent model in the inner loop of 4D-Var taking into account the prior is: making sure that the covariance of the increment δx x x 0 corresponds to the prior covariance [27].Again, we assume the observation error to be time-invariant, i.e., R R R k = Γ Γ Γ ǫ for all k.A reduced version of ( 15) is: where the increment δx x x ∈ R d is projected by T T T −1 δx x x =: δx x x ∈ R r , with r ≪ d.System (15) is almost of type ( 8), but the input comes with a covariance and thus a different input port at time k = 0. Green [27] proposes to begin the summation for the time-limited Gramians with zero input port at k = 1.For time step k = 0 the summand A A A 0 Γ Γ Γ pr (A A A T ) 0 = Γ Γ Γ pr is added.With these adjustments, the time-limited Gramians (9b) and (10b) of ( 15) are: The relevant literature is inconsistent regarding the inclusion of the observation error from (4b) in the observability Gramian for the tangent linear model (15).From comparison to (4b), it holds that In the theses of Boess and Green [11,27] the observability Gramian for (15) was given by , which we do not consider correct.Instead, we prefer a time-limited formulation of Lawless et al. [38], which corresponds exactly to (17b).Bernstein et al. [10] state that a model reduction by BT with observability Gramian (17b) minimises the distance error between the ouput d d d k of the full tangent linear model (15) and the output d d d k of the reduced model (16), weighted by the ob- Weighting by the inverse of the covariance matrix means "penalizing errors in the approximation [...] more strongly in directions of lower [...] variance."[52, p. 11] This also makes sense here: In directions of low observational covariance, we are fairly confident that our observation is correct.With this explanation, we will from now on use (17b) as observability Gramian for TLBT in incremental 4D-Var.The time-limited observability Gramian for 4D-Var satisfies a modified discrete Lyapunov equation, The proof of this equation follows along the same lines as for the deterministic case; see, e.g., [1].Time-limited balanced truncation is applied to system (15) for the inner loop of incremental 4D-Var with Gramians (17).The balancing transformation T T T and its inverse T T T −1 from Algorithm 1 are the projection matrices to obtain the reduced system (16).If the linear tangent model (15) does not change during the outer loop iterations, the reduced system ( 16) may be computed once with TLBT for the entire incremental 4D-Var algorithm, making this technique particularly effective.
Standard BT in data assimilation is limited to stable systems.We overcome this by considering time-limited BT.Another approach for unstable discrete systems is αbounded balanced truncation, which has been developed for 4D-Var by Boess [12].The idea of α-bounded BT is to shift the system matrices by a parameter α so that the eigenvalues of A A A lie inside a disc of radius α around the origin.α-bounded BT requires experimentation on the choice of the parameter α [27] and is only applicable to discrete systems.We have, therefore, focused on a more classical system-theoretic concept and have introduced time-limited BT.In Sect. 4 we demonstrate that this approach additionally improves the accuracy for low end times.

Balancing Linear Gaussian Bayesian Inference
BT has been extended to continuous linear Gaussian (LG) Bayesian inference by Qian et al. [46].We generalise the concept to arbitrary prior covariances and unstable system matrices A A A. There are two main differences between the LTI system (8) and the LG Bayesian inference setting (2) that need to be resolved before applying BT: 1.In the LG Bayesian inference setting, there is no input function u u u(t).2. Unlike for the LTI system, the observations m m m k in Bayesian inference are noisy.

From Compatible to Arbitrary Prior Covariances as Reachability Gramians
Qian et al. [46] have solved the problem of the missing input in Bayesian inference by introducing the notion of prior-compatibility.It is not necessary to know the input port B B B explicitly but only to ensure that there is a B B B ∈ R d×din with A prior covariance matrix Γ Γ Γ pr fulfilling this is called prior-compatible.The prior-compatible Γ Γ Γ pr is used as infinite reachability Gramian for the LTI system with given A A A and possibly unknown B B B. It is also the time-limited reachability Gramian: P P P LG T = P P P LG ∞ = Γ Γ Γ pr .Prior-compatibility is not ensured in general.The spectral decomposition of A A AΓ Γ Γ pr + Γ Γ Γ pr A A A T may be used to manipulate the prior covariance and make it priorcompatible [46,Sect. 4.1.2].In practice, however, the prior distribution is usually not chosen arbitrarily, but based on expert knowledge or accumulated information.Changing the prior to make it compatible and artificially introducing an input port B B B may involve a loss of information.We want to justify that any Gaussian prior covariance is a time-limited reachability Gramian.
For the tangent linear model in 4D-Var the time-limited reachability Gramian is given by (17a).By comparison of the discrete linear tangent model (15) and the continuous LG Bayesian inference setting, we can use the continuous version of (17a) to compute the time-limited reachability Gramian for (2), which is The above comparison with established results from 4D-Var (see Sect. 3.1) overcomes the need for prior-compatibility for Γ Γ Γ pr and makes the application of BT to LG Bayesian inference more general.A numerical example to emphasise that the original prior should be used instead of a modified compatible one whenever possible is given in Sect.4.1 This approach is also well-founded in literature: time-limited reachability Gramians are the covariance matrices of the LTI system state at a fixed time [48] and the infinite Gramian is, thus, the total state covariance.The state equation (2a) evolves linearly without model error.Hence, the state evolution is deterministic and the full state covariance is introduced by the prior covariance as reachability Gramian.The POD-Gramian of an input-free system is constructed similarly [1].
Remark 3.1 The unknown initial condition to be inferred is not zero.It has a centered Gaussian prior and a Gaussian posterior distribution.BT is defined for a zero initial condition.It may therefore be questioned whether this method is well suited for application to data assimilation problems.Other systems theory techniques, however, require an input matrix; see, e.g., [1,6,7].This input would have to be artificially constructed by modifying prior knowledge, which is not necessary for BT.Therefore, we consider balanced truncation to be the best method so far.

A Noisy Time-Limited Observability Gramian
Regarding the pollution of the measurement m m m k by noise ǫ ǫ ǫ k ∼ N (0 0 0, Γ Γ Γ ǫ ) in the LG Bayesian inference setting (2) compared to the LTI system (8), the following is proposed [46, p. 28] defines the maximal energy produced by an observable state x x x.It is the squared distance of the output signal y y y to 0 0 0 in the L 2 (R)-norm.For noisy observations, instead of • • • 2 L 2 (R) , the Mahalanobis distance [40] from the equilibrium state 0 0 0 to the conditional distribution of a single measurement m m m We use the same reasoning for the definition of the noisy time-limited observability Gramian Q Q Q LG T ∈ R d×d for linear Gaussian Bayesian inference as: It is the unique, positive-definite solution to the dual modified Lyapunov equation similar to the discrete Lyapunov equation ( 18) for 4D-Var.Remark that T is also a noisy (discrete) time-limited observability Gramian.The Fisher matrix is a discrete summation approach to this same Gramian and could be used as such [46].This resembles an optimal low-rank method for LG Bayesian inference [52] not based on systems theory.
For (TL)BT, Algorithm 1 is applied to the LG Bayesian inference problem by using the Gramians (19) in Step 1.This leads to the reduced system with reduced forward map G G G (TL)BT and approximate Fisher information H H H (TL)BT : and gives the posterior mean and covariance approximations: The notation (µ µ µ pos,TLBT , Γ Γ Γ pos,TLBT ) denotes a reduced posterior obtained by TLBT and (µ µ µ pos,BT , Γ Γ Γ pos,BT ) a reduced posterior obtained by BT with infinite Gramians.
Remark 3.2 For initial state x x x(0) = 0 0 0 there exists an error bound for the output of the BT-reduced system (21) compared to the full system [1].For TLBT, error bounds have been derived for continuous time [47,49] and discrete time [45].These results are hard to adapt to Bayesian inference, since we infer an unknown non-zero initial state x x x(0) = x x x 0 .

Remark 3.3 BT for inhomogeneous, i.e., non-zero, initial conditions has been investigated by Heinkenschloss et al. proposing to augment the input matrix of the LTI system (8) by the initial condition before BT is applied [29]. This technique seems to be less useful for model reduction with changing or unknown initial conditions. Beattie et al. use splitting: the output is approximated by the simultaneous reduction
of two systems, one depending on the initial state x x x 0 = 0 0 0 and one with homogeneous initial condition [3].Regarding the application to Bayesian inference, we have a different objective: not the output approximation, but the reduced posterior statistics.

Computational Aspects of TLBT in Data Assimilation
The use of the time-limited noisy Gramians ( 17) and ( 19) in Algorithm 1 is straightforward if in Step 1 there are (low-rank) factors for P P P T = RR RR RR T and Q Q Q T = LL LL LL T .In LG Bayesian inference and 4D-Var, the time-limited reachability Gramian is equal to the prior covariance.Therefore, no Lyapunov equation needs to be solved.The time-limited noisy observability Gramian computation requires the solution of the modified Lyapunov equation (18) or (20).The main computational obstacle in solving the discrete or continuous modified Lyapunov equations is the computation of the matrix power A A A te or the matrix exponential e A A Ate for the right-hand sides.There is no direct square root approach.But direct approaches quickly reach their limit for high-dimensional problems.Slower iterative algorithms, like (block-)rational Krylov algorithms, may still compute a result.Shift selection is a typical bottleneck in these algorithms and they take longer than solving the Lyapunov equations for the infinite Gramians in low-dimensional settings; see, e.g., [33,45].Time-limited Gramians, however, allow the application of the BT Algorithm 1 to unstable systems, where Lyapunov equation solvers cannot be used.

System TL reachability Gramian TL observability Gramian
Standard TLBT ( 8) LG Infer.( 2) 4D-Var ( 15) In this section, time-limited balanced truncation has been introduced for the linear tangent model of 4D-Var and for LG Bayesian inference.Table 1 summarises the resulting Gramians for TLBT in data assimilation.By comparing the two systems in question, we have generalised the application of BT in linear Gaussian Bayesian inference to arbitrary prior covariances.We have provided a unifying definition for the observability Gramian in incremental 4D-Var.Our main contribution is that our approach can handle unstable system matrices A A A, which are common in data assimilation problems.In Sect. 4 we will see that TLBT further outperforms the standard BT approach for short end-times.

Numerical Experiments for TLBT in Data Assimilation
This section is dedicated to demonstrating the performance of TLBT in linear Gaussian Bayesian inference with numerical experiments.The MATLAB code reproducing our results is available at https://github.com/joskoUP/TLBTforDA.

Comparison of Compatible and Non-Compatible Prior Covariance
In the previous section we have argued that it is reasonable to set P P P LG T = Γ Γ Γ pr regardless of the prior-compatibility.The behaviour of non-compatible versus compatible priors in BT for LG Bayesian inference will be demonstrated by a numerical experiment.The reduced posterior statistics are given by (22).We include the optimal low-rank approach (OLR) to the posterior statistics (3) by Spantini et al. [52]: Consider the linear Gaussian Bayesian inference problem m m m = G G Gx x x + ǫ ǫ ǫ with forward matrix G G G ∈ R ndout×d , Fischer matrix H H H ∈ R d×d and a vector of measurements m m m = [m m m T 1 , . . ., m m m T n ] T .Prior and observation errors are assumed to be independent, centered Gaussians of the right dimensions with covariance matrices Γ Γ Γ pr and Γ Γ Γ ǫ , respectively.The posterior statistics of N (µ µ µ pos , Γ Γ Γ pos ) are approximated by μ µ µ pos and Γ Γ Γ pos as follows: Let (τ 2 i , v v v i ) d i=1 be the d generalised eigenvalue-eigenvector pairs of the matrix pencil (H H H, Γ Γ Γ −1 pr ) and let (w w w i ) d i=1 be the d generalised eigenvectors of the matrix pencil (G G GΓ Γ Γ pr G G G T , Γ Γ Γ obs ).Both sequences of generalised eigenvectors are assumed to be associated with a non-increasing sequence of eigenvalues and the w w w i are normalised w.r.t.Γ Γ Γ obs .The optimal posterior covariance approximation Γ Γ Γ pos of Γ Γ Γ pos in the Förstner-metric (out of class The optimal mean approximation μ µ µ pos of µ µ µ pos in the See the paper by Spantini et al. [52] for more details and explanations of the measures.
The expectation is taken over the joint distribution of x x x and m m m [39].
For the numerical experiments in this and the next subsection we have reproduced the benchmark examples by Qian et al. [46, cf. p. 22]. 1 In the first presented experiment, we have modified the code for the additional computations with the noncompatible prior.We always consider LTI systems starting at t = 0 and running up to different end times t e .Measurements are taken at discrete equidistant times t i = ih, (i = 1, . . ., n), t n = t e .The goal is to infer the initial condition x x x 0 by using reduced models of different ranks r.For our experiments, a true initial condition is drawn from N (0 0 0, Γ Γ Γ pr ) and measurements are generated from evolving the underlying dynamical system exactly and adding N (0 0 0, Γ Γ Γ ǫ )-Gaussian measurement noise.We consider a given prior non-compatible with the system matrix A A A. Using the code from [46, Appendix D], we create a modified compatible prior.
The reduced posterior statistics (22) for this problem are computed using BT with infinite Gramians for both the given non-compatible (NC) and the modified compatible (C) prior.We compare them with the full posterior statistics (3) using the Förstner metric (for the covariance) and the • Γ Γ Γ −1 pos -norm Bayes risk (for the mean) derived from the true non-compatible prior (-NC) and derived with the modified compatible prior (-C).OLR for both priors and both measures is also considered.For the Bayes risk computation, we run 100 independent experiments to obtain 100 different posterior mean estimates and compute the empirical Bayes risk among them.
We study the heat equation 2 [46] and the optimal low-rank approach (OLR) [52] of LG Bayesian inference for the heat equation.The first letters denote the prior used for the approx.posterior computation and the second the one used for the reference.Measurements are spaced h = 0.005 apart inside T = [0, 10] The results of the experiments are depicted in Fig. 1 Similarly to the results of Qian et al. [46], OLR (in blue/cyan) is superior to BT (pink/orange) for the same prior.BT with the true NC prior (orange circles) performs almost as good as OLR, but the modified compatible prior gives bad approximations in the original (NC) measures (pink circles).Considering these approximations in the measures with regard to the modified compatible prior (C), they are almost as good as the original ones (pink and cyan crosses).It is thus not useful to enforce prior-compatibility for noncompatible priors containing valuable knowledge as the modification changes the results significantly.Compatible priors, however, perform well in the measures derived from them.Building compatible priors may therefore be a way of constructing a prior that fits the system dynamics when good prior knowledge is not available otherwise.
Non-compatible priors are particularly important when dealing with unstable system matrices A A A. Time-limited BT is especially suitable for such systems in data assimilation, as will be demonstrated in the next experiments.

Comparison of TLBT and Standard BT with Infinite Gramians
We apply TLBT to benchmark examples and compare the approximate posterior statistics with full balancing (BT) and with the optimal low-rank approach (OLR).For the experiments in this section we consider the same setup as before with added code for the time-limited Gramian computation and with the following change: Γ Γ Γ pr is obtained as the solution of A A AΓ Γ Γ pr + Γ Γ Γ pr A A A T = −B B BB B B T for B B B according to the experiment; see [46].The TLBT-reduced posterior statistics (22) are computed as described in Sect.3.2 The true posterior statistics (3) are compared in the • Γ Γ Γ −1 pos -norm Bayes risk and the Förstner-metric with (µ µ µ pos,TLBT , Γ Γ Γ pos,TLBT ), (µ µ µ pos,BT , Γ Γ Γ pos,BT ), and (μ µ µ pos , Γ Γ Γ pos ) from OLR.The time-limited Gramians for TLBT are computed by the rational Krylov method by Kürschner [33, p. 1830]. 3We have to choose parameters: the approximation tolerance (tol) for the low-rank solution of f (A A A)B B B and the maximum number of rational Krylov iterations (maxit).The Lyapunov equation for the recent low-rank approximation is solved at every s-th step.We studied both the heat equation and the ISS1R module. 4All parameters are given in Table 2 The results depicted in Fig. 2 for the heat equation and in Fig. 3 for the ISS model are as expected: for low end times, especially t e = 1, TLBT performs significantly better than BT with infinite Gramians.This is caused by the faster decay of the Hankel singular values [33].With increasing end time t e the TLBT approximation results approach the BT approximation results.For the heat equation, OLR outperforms TLBT, whereas for the ISS model, optimality can be attained by TLBT, especially for lower reduction ranks r.For low end times, TLBT is able to reach the optimal OLR approximation even when standard BT cannot.This is a huge advantage of using time-limited Gramians.Especially in the ISS1R model for t e = 1 this behaviour can be observed.
OLR is superior to TLBT regarding approximation quality, but not regarding speed.The main advantage of (TL)BT over OLR is that G G G and H H H are computed in reduced versions of lower dimensionality.They are obtained by forwarding a reduced  [46] and the optimal low-rank approach (OLR) [52] of LG Bayesian inference for the heat equation model.
In the left panel, the normalised square roots δ i and δ T L i of the generalised eigenvalues of the matrix pencils ) are plotted, corresponding to the Hankel singular values rather than a full dynamical system (2).Computations based on a reduced forward operator G G G run faster.This is particulary relevant during the online computation of multiple posterior means for a series of measurements.OLR is optimal for linear Gaussian Bayesian inference.For BT, there are generalisations to time-varying [50] and nonlinear [4,31,32] systems that should be further explored in the context of data assimilation.

TLBT for Unstable Systems
Time-limited BT for unstable systems is applied to advection and diffusion for an instantaneous point release.For simplicity, we assume the system is one-dimensional and consider isotropic and homogeneous diffusion.For K(z, t), the location-and time-dependent concentration, the process is described by The diffusion coefficient is D = 0.02 and the mean flow is u = 0.01.The system is discretised using finite differences for state dimensions d = 200 and d = 1200, respectively.The system matrix A A A has both positive and negative eigenvalues and is unstable.The observation operator C C C with y y y = Cx Cx Cx = 1 d . . .σ obs = 0.008.The prior covariance is chosen to be Γ Γ Γ pr = I I I d and is used as timelimited reachability Gramian, P P P LG T = Γ Γ Γ pr .This Gramian is not prior-compatible, i.e., A A AΓ Γ Γ pr +Γ Γ Γ pr A A A T = A A A+A A A T is not negative-semidefinite, but as discussed in Sect.3.2.1 it is not necessary to assume prior-compatibility. 1), the algorithm is the same as for stable systems.To account for the instability of the system, which makes it unsuitable for long-term predictions, we use smaller end times than in the stable examples.The BT approach with infinite Gramians is not applicable to unstable systems.We, therefore, use a different approach by Qian et al. [46] for comparison: the Fisher matrix H H H is set as the observability Gramian.No Lyapunov equation solution is required, but the computation of the full Fisher matrix is expensive.We refer to this approach as BT-H.Fig. 4 illustrates that in our case with measurements spread quite far apart (h = 0.001), TLBT surpasses BT-H.With higher measurement frequency, the Fisher matrix approaches the time-limited observability Gramian and the results become similar.OLR ignores system stability and can be applied regardless of the system matrix A A A. As demonstrated in Fig. 4, TLBT recovers the optimal posterior mean Bayesian risk well, but OLR outperforms TLBT for the posterior covariance approximation in the Förstner metric.The error rates of TLBT (better than 10 −12 ) are still impressive and sufficient for applications.

After computing the observability Gramian
For Fig. 5 we increase the system dimension to d = 1200.This introduces positive eigenvalues λ + into the unstable system matrix A A A, which are about an order of magnitude higher than before.The increasing instability affects the numerical procedure: all posterior predictions become less reliable, especially for the largest end time i te = 1 δ T L i τ i Fig. 4 Comparison of time-limited balanced truncation (TLBT) and balanced truncation with Fisher matrix H H H as observability Gramian (BT-H) [46] and the optimal low-rank approach (OLR) [52] of LG Bayesian inference for the unstable discretised advection-diffusion equation; d = 200.Measurements are spaced h = 0.001 apart inside T = [0, te] for three different end times te = 0.1, 0.5, 1 t e = 1.This is caused by the blow-up of the values of e λ+τ in e A A Aτ for Q Q Q LG T , already for moderate end times t e .In the mean prediction, TLBT outperforms the state-ofthe-art BT-H, which is not robust to instabilities since it uses exponentials of A A A. OLR mean prediction also fails for unstable systems and only achieves the same level of error as TLBT.These results emphasise the importance of using a time-limited approach and short observation intervals for unstable systems.
The generalisation of BT to TLBT works well, but has a main disadvantage: the numerical algorithms require the solution of a Lyapunov equation.This is only numerically feasible for (anti)stable A A A. In TLBT for unstable systems, the integral for Q Q Q LG T has to be approximated or calculated according to its definition [33].To make TLBT faster and more generally applicable to large problems, further work in this direction is needed.
In this section we have demonstrated how TLBT extends the idea of Gramianbased model reduction for Bayesian inference to unstable systems.For short observation intervals TLBT significantly outperforms BT with infinite Gramians, despite the increased numerical cost of approximating the posterior statistics.

Conclusion
Model order reduction is central to data assimilation because models and data are typically high-dimensional and expensive to process.To this end, the adaptation of model reduction techniques from systems theory to data assimilation problems is essential.This work has further strengthened the link between the two research areas.
Time-limited balanced truncation (TLBT) enabled us to significantly generalise the theory of balancing Bayesian inference to systems with an unstable system matrix A A A and non-compatible priors.TLBT is, thus, particularly suited to data assimilation applications such as numerical weather prediction, where short-term forecasts are required for unstable systems [12,13].One of the popular techniques is incremental 4D-Var and TLBT is perfectly tailored to model reduction of unstable linear tangent models in this context.The numerical efficiency of TLBT needs further investigation, e.g., by using more advanced rational Krylov methods.
Interesting research questions in data assimilation include inhomogeneous initial conditions, errors in the dynamical forward model or time-varying systems.Similarities with systems theory may be instructive.Literature on such problems in the deterministic setting is available, but not extensive; see, e.g., [3,29,34,50].
Although our approach of using TLBT in data assimilation is more general than the existing ones, it is so far limited to linear Gaussian Bayesian inference.The next step in the exploration of model order reduction for Bayesian inference is the generalisation to nonlinear settings.The 4D-Var method could be a step in the right direction, as it already deals with nonlinear operators.Concepts from systems theory for the reduction of nonlinear dynamical systems may also prove useful here.This opens up a broad field of investigation at the interface between the two research communities.
The model and observation operators are reduced by Â A A := T T T −1 A A AT T T and Ĉ C C := C C CT T T .The projection also affects the state variables, hence x x x := T T T −1 x x x.This justifies the projected version of the prior covariance matrix Γ Γ Γ pr := T T T −1 Γ Γ Γ pr T T T −T ; the observations m m m k and the observation error covariance Γ Γ Γ ǫ remain unchanged.We propose to compute the projection operator T T T via time-limited balanced truncation.
The observation errors d d d k of the full 4D-Var system (4) are low and so should their approximations d d d k .If |d d d k − d d d k | are large, our approximation is very inaccurate and has to be penalised.In contrast, if we are uncertain about the observations, the errors d d d k have larger variance.If the approximations d d d k are relatively inaccurate but within the standard deviation, the reduced system (16) approximates (15) reasonably well.

Fig. 1
Fig.1Comparison of a given non-compatible prior (NC) and a modified compatible prior (C) in balanced truncation (BT)[46] and the optimal low-rank approach (OLR)[52] of LG Bayesian inference for the heat equation.The first letters denote the prior used for the approx.posterior computation and the second the one used for the reference.Measurements are spaced h = 0.005 apart inside T = [0,10]

Fig. 2
Fig.2Comparison of time-limited balanced truncation (TLBT) with standard balanced truncation (BT)[46] and the optimal low-rank approach (OLR)[52] of LG Bayesian inference for the heat equation model.Measurements are spaced h = 0.005 apart inside T = [0, te] for three different end times te = 1, 3, 10.In the left panel, the normalised square roots δ i and δ T L

1 d x x x gives the mean concentration and d out = 1 .Fig. 3
Fig.3Quantities plotted as in Figure2; here for the ISS1R model.Measurements are equispaced with h = 0.1 inside T = [0, te] for three different end times te = 1, 3, 10

Table 2
. Parameters for the LTI examples