Abstract
A method of data assimilation that is complementary to traditional four-dimensional variational data assimilation (4D-Var) has been developed. 4D-Var has appealed to scientists because of the efficiency with which it determines the cost function gradient with respect to control, given the available observations. Then, through use of any of the gradient-based optimization algorithms, the minimum is iteratively found. The alternate methodology does not depend on the available observations; rather, it determines a placement of observations that avoids flatness of the cost functional about the operating point in control space. Avoidance of flat patches, by bounding the norm of the gradient away from zero, fundamentally depends on the dynamics of forecast sensitivities to control that are found through differentiation of the governing constraint equations and the coupled solution of these sensitivity equations and the basic constraint equations. These sensitivities are used to define a linear transformation, which turns out to be the observability Gramian (a symmetric positive semi-definite matrix) G that maps the control error (initially unknown) to the cost-function gradient (as a function of space and time and an arbitrary starting operating point). With observations taken at optimal locations defined by (a) the maxima of the diagonal elements of G or (b) those of the trace of G, gradient-based optimization schemes are used to locate the cost-function minimum. The methodology is tested on an air-sea interaction model, where results indicate that judicious placement of observations avoiding flatness in control space gives good results, whereas placement that leads to small absolute-valued gradients produces poor results. The theory also gives guidance on the minimum number of observations necessary to achieve success in locating the cost-function minimum.
References
Bapat RB (2012) Linear algebra and linear models, 3rd edn. Hindustan Book Agency, New Delhi
Casti JL (1977) Dynamical systems and their applications: linear theory. Academic Press, New York
Casti JL (1985) Nonlinear system theory. Academic Press, New York
Crisan D, Rozovskii B (2011) The Oxford handbook of nonlinear filtering. Oxford University Press, London
Isidori A (1985) Nonlinear control systems. Springer, New York
Kalman RE (1960a) On the general theory of control systems. In: Proceedings of the first IFAC congress, Moscow, pp 481–492
Kalman RE (1960b) A new approach to linear filtering and prediction problem. Trans Am Soc Mech Eng, J Basic Eng Ser D 83:35–45
Kalman RE (1963) Mathematical description of linear dynamical system. SIAM J Control 1:152–192
Kalman RE, Falb P, Arbib M (1969) Topics in mathematical system theory. McGraw Hill, New York
Kang W, Xu L (2012) Optimal placement of mobile sensors for data assimilation. Tellus A 64:17133
Kang W, Xu L (2014) Partial observability for some distributed parameter systems. Int J Dyn Control 2(4):587–596
King S, Kang W, Xu L (2015) Observability for optimal sensor locations in data assimilation. Int J Dyn Control 3:416–424
Krener AJ (2008a) Observability of vortex flows. In: Proceedings of the forty seventh IEEE conference on decision and control, Cancun, Mexico
Krener AJ (2008b) Eulerian and Lagrangian observability of point vortex flows. Tellus A 60:1089–1102
Krener AJ, Kayo I (2009) A quantitative measure of observability. In: Proceedings of the IEEE conference on decision and control, Shanghai, China, pp 6413–6418
Kushner HJ (1964a) On the dynamical equations of conditional probability density functions with applications to optimal stochastic control. J Math Anal Appl 8:332–344
Kushner HJ (1964b) On the differential equations satisfied by conditional probability densities of Markov process. SIAM J Control 2:106–119
Kushner HJ (1967) Dynamical equations for optimal nonlinear filtering. J Diff Equations 3:179–190
Lakshmivarahan S, Honda Y, Lewis JM (2003) Second-order approximation to 3-D VAR cost function: applications to analysis/forecast. Tellus 55A:371–384
Lakshmivarahan S, Lewis JM (2010) Forward sensitivity method for dynamic data assimilation. Advances in meteorology, vol 2010, Article ID 375615, 12 pp
Lakshmivarahan S (2016) Convergence of a class of weak solutions to the strong solution of a linear constrained quadratic minimization problem: a direct proof using matrix identities. In: Park SK, Xu L (eds) Data assimilation for atmospheric, oceanic and hydrologic applications, vol III. Springer, pp 115–119
Lakshmivarahan S, Lewis JM, Jabrzemski R (2017) Forecast error correction using dynamic data assimilation. Springer, New York
Lakshmivarahan S, Lewis JM, Hu J (2020a) On controlling the shape of the cost functional in dynamic data assimilation: guidelines for placement of observations and application to Saltzman’s model of convection. J Atmos Sci 77:2969–2989
LeDimet FX, Talagrand O (1986) Variational algorithms for analysis and assimilation of meteorological observations. Tellus 38A:97–110
Lewis JM, Derber J (1985) The use of adjoint equations to solve a variational adjustment problem with advective constraints. Tellus 37A:309–322
Lewis JM, Lakshmivarahan S, Dhall SK (2006) Dynamic data assimilation: a least squares approach. Encyclopedia of mathematics and its applications, vol 104. Cambridge University Press, Cambridge
Lewis JM, Lakshmivarahan S (2008) Sasaki's pivotal contributions: calculus of variations applied to weather map analysis. Mon Weather Rev 136:3553–3567
Lewis JM, Lakshmivarahan S, Hu J, Rabin R (2020a) Placement of observations to correct return flow forecasts. E-J Severe Storms Meteorol 15(4):1–20
Lewis JM, Lakshmivarahan S, Maryada S (2020b) Placement of observations for variational data assimilation: application to Burgers' equation and seiche phenomenon. This volume
Meyer CD (2000) Matrix analysis and applied linear algebra. SIAM Publications, Philadelphia
Nijmeijer H, van der Schaft A (1990) Nonlinear dynamical control systems. Springer, New York
Vidyasagar M (2002) Nonlinear system analysis, 2nd edn. SIAM Publication, Philadelphia
Yoshimura R, Yakeno A, Misaka T, Obayashi S (2020) Application of observability Gramian to targeted observation in WRF data assimilation. Tellus A 72(1):1–11
Appendices
Appendix A Role of observability in Estimation
A.1 Historical Background
Kalman, in a series of papers (Kalman 1960a, 1963; Kalman et al. 1969), laid the foundations of the state space approach to modern control/systems theory by introducing several basic concepts - controllability/reachability, observability/constructability, realizability and stability - all related to the analysis, design and (optimal) control of engineering systems. This Appendix provides a short summary - a bird's eye view - of the role of observability in the state/parameter estimation problem that is critical to both control theory and dynamic data assimilation. For a more elaborate treatment of these concepts and their applications, refer to the two volumes by J. L. Casti: Casti (1977) for the linear analysis and Casti (1985) for its nonlinear counterpart.
It is useful to broadly divide the problems in control theory into two classes: open-loop and closed-loop/feedback control. Many household appliances - the washer/dryer, microwave oven, light bulb, and bread toaster, to name a few - implement the open-loop control strategy, where the control action is limited to a simple on or off switch that executes a preprogrammed task.
Feedback control, on the other hand, involves comparing the current state of a system with a prespecified reference value. If the error = (reference - current state) is positive, the controller generates an extra input/forcing that drives the current state towards the reference. If the error is negative, the controller lets the system relax to the reference without any extra forcing. Examples of feedback-controlled devices are too numerous to list: the flyball governor in steam engines, the pressure cooker in the kitchen, cruise control in automobiles, thermostat control of temperature in a building, the sophisticated avionics in aircraft flight control, and so on.
From the above discussion, it should now be obvious that a fundamental requirement in the design of feedback control is the ability to measure the current state of the system being controlled. However, except in special cases, the current state cannot be observed directly; one can measure only certain (scalar or vector valued) functions of the state - called outputs in engineering and observations in the geosciences. Roughly speaking, observability relates to the ability to estimate/reconstruct a past state from future observations or outputs.
Despite its origin in Control/Systems theory, observability plays an important role in the estimation of the initial conditions and parameters of a dynamical model that arise within the context of the 4-dimensional variational (4-D VAR) approach to dynamic data assimilation, which is our primary interest.
A.2 Observability: Linear, Deterministic, Time Invariant Model
We follow the notations laid out in the main body of the paper. Consider a linear, deterministic, time invariant, discrete time model given by
$$\begin{aligned} x(k+1) = Mx(k), \quad k \ge 0, \end{aligned}$$(66)
where x(0) is the unknown initial condition and \(M \in R^{n \times n}\) is the one step state transition matrix, assumed to be non-singular. Solving (66), it is obvious that
$$\begin{aligned} x(k) = M^k x(0). \end{aligned}$$(67)
Let \(H \in R^{m \times n}\) and let
$$\begin{aligned} z(k) = Hx(k) = HM^k x(0) \end{aligned}$$(68)
be the noiseless observations of the state x(k).
It is assumed that we have a set \(S = \{z(1), z(2), \ldots , z(N)\}\) of N outputs, and our goal is to estimate x(0) using S. To this end, we stack the N output vectors in a column to create a new vector \(z(1:N) \in R^{Nm}\) given by
$$\begin{aligned} z(1:N) = \begin{bmatrix} z(1) \\ z(2) \\ \vdots \\ z(N) \end{bmatrix} = \begin{bmatrix} HM \\ HM^2 \\ \vdots \\ HM^N \end{bmatrix} x(0) = L\, x(0), \end{aligned}$$(69)
where \(L \in R^{Nm \times n}\). By the Cayley-Hamilton theorem, since \(M^n\) can be expressed as a linear combination of \(M^k\) for \(0 \le k < n\), we only need to consider \(N \le n\).
A necessary and sufficient condition for the existence and uniqueness of the solution x(0) satisfying (69) is that the matrix L must be of full rank, that is, rank(L) = n. In this case, we say that the matrix pair (M, H) is observable, and we can recover x(0) exactly by solving
$$\begin{aligned} G(N)\, x(0) = L^T z(1:N), \end{aligned}$$(70)
where \(G(N) = (L^TL) \in R^{n \times n}\) is called the observability Gramian and is given by
$$\begin{aligned} G(N) = \sum _{k=1}^{N} (M^T)^k H^T H M^k. \end{aligned}$$(71)
Indeed, this Gramian G(N) is symmetric and positive definite when L is of full rank, and the solution x(0) is given by
$$\begin{aligned} x(0) = G^{-1}(N)\, L^T z(1:N). \end{aligned}$$(72)
We leave it to the reader to verify the following claims by computing \(L = \begin{bmatrix} HM \\ HM^2 \end{bmatrix}\) in each case. Claim 1: Let \(M_1 = \begin{bmatrix} 1 & 1 \\ 0 & 2 \end{bmatrix}\) and \(H_1 = \begin{bmatrix} 1&0\end{bmatrix}\). Then \(H_1M_1 = (1, 1)\) and \(H_1M_1^2 = (1, 3)\), so \(\begin{bmatrix} H_1M_1 \\ H_1M_1^2 \end{bmatrix} = \begin{bmatrix} 1 & 1 \\ 1 & 3 \end{bmatrix}\), which is of rank 2; hence \((M_1, H_1)\) is observable. Refer to Example 4.1 in Sect. 4 for more details.
Claim 2: Let \(M_2 = \begin{bmatrix} 1 & 0 \\ 1 & 2 \end{bmatrix}\) and \(H_2 = (1, 0)\). Then \(H_2M_2 = (1, 0)\) and \(H_2M_2^2 = (1, 0)\), so \(\begin{bmatrix} H_2M_2 \\ H_2M_2^2 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 1 & 0 \end{bmatrix}\), which is of rank 1. Hence, \((M_2, H_2)\) is not observable. We leave it to the reader to verify that with \(H_3 = (0, 1)\), the pair \((M_2, H_3)\) is observable.
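These rank tests are easy to reproduce numerically. The following sketch (using NumPy; the function and variable names are our own) checks both claims and also demonstrates exact recovery of x(0) from noiseless outputs via the Gramian \(G = L^TL\):

```python
import numpy as np

def observability_matrix(M, H, N):
    """Stack H M, H M^2, ..., H M^N, as in the matrix L of the text."""
    return np.vstack([H @ np.linalg.matrix_power(M, k) for k in range(1, N + 1)])

# Claim 1: (M1, H1) is observable.
M1 = np.array([[1.0, 1.0], [0.0, 2.0]])
H1 = np.array([[1.0, 0.0]])
L1 = observability_matrix(M1, H1, 2)
print(np.linalg.matrix_rank(L1))        # 2 -> observable

# Claim 2: (M2, H2) is not observable, but (M2, H3) is.
M2 = np.array([[1.0, 0.0], [1.0, 2.0]])
H2 = np.array([[1.0, 0.0]])
H3 = np.array([[0.0, 1.0]])
print(np.linalg.matrix_rank(observability_matrix(M2, H2, 2)))  # 1
print(np.linalg.matrix_rank(observability_matrix(M2, H3, 2)))  # 2

# Exact recovery of x(0): solve G x(0) = L^T z(1:N).
x0 = np.array([0.5, -1.5])
z = L1 @ x0
G = L1.T @ L1                            # observability Gramian
x0_hat = np.linalg.solve(G, L1.T @ z)
print(np.allclose(x0_hat, x0))           # True
```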
A.3 Generalizations
For completeness, we now list several extensions of the above result as remarks, with citations to the appropriate literature.
Remark 3
Linear time invariant model with noisy observations: Consider the model in (66), but with the observations subject to additive Gaussian noise:
$$\begin{aligned} z(k) = Hx(k) + \xi _k, \end{aligned}$$(73)
where \(\xi _k \sim N(0, R_k)\) and \(\xi _k\) is temporally uncorrelated. In this case, the least squares solution is obtained by minimizing the weighted sum of the squared errors, given by
$$\begin{aligned} J(x(0)) = \frac{1}{2} \sum _{k=1}^{N} (z(k) - HM^kx(0))^T R_k^{-1} (z(k) - HM^kx(0)). \end{aligned}$$(74)
It can be verified (Chap. 5, LLD (2006)) that the minimizer \(\hat{x}(0)\) is obtained as the solution of the linear system
$$\begin{aligned} G(N)\, \hat{x}(0) = \sum _{k=1}^{N} (M^T)^k H^T R_k^{-1} z(k), \end{aligned}$$(75)
when the observability Gramian G(N), given by
$$\begin{aligned} G(N) = \sum _{k=1}^{N} (M^T)^k H^T R_k^{-1} H M^k, \end{aligned}$$(76)
is positive definite, where \((M^k)^T = (M^T)^k\).
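This weighted least squares estimate can be sketched numerically as follows; the model, observation operator, and noise level below are illustrative choices of our own, not taken from the text:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative two-state model observed through its first component;
# the noise covariance R is taken constant in time for simplicity.
n, m, N = 2, 1, 8
M = np.array([[0.9, 0.2], [0.0, 0.8]])
H = np.array([[1.0, 0.0]])
R = 1e-4 * np.eye(m)
Rinv = np.linalg.inv(R)
x0_true = np.array([1.0, -2.0])

# Accumulate the normal equations G(N) x(0) = sum_k (M^k)^T H^T R^{-1} z(k).
G = np.zeros((n, n))
rhs = np.zeros(n)
for k in range(1, N + 1):
    Mk = np.linalg.matrix_power(M, k)
    zk = H @ Mk @ x0_true + rng.multivariate_normal(np.zeros(m), R)
    G += Mk.T @ H.T @ Rinv @ H @ Mk
    rhs += Mk.T @ H.T @ Rinv @ zk

x0_hat = np.linalg.solve(G, rhs)
print(np.allclose(x0_hat, x0_true, atol=0.5))  # True for this noise level
```

Note that the second state component is recovered even though it is never observed directly: the off-diagonal coupling in M makes the pair (M, H) observable.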
Remark 4
Linear, deterministic, time varying system with noiseless observations: This case is treated in full in Chap. 4 of Casti (1977), where several examples are also given. Extension to noisy observations can be easily obtained by following the strategy described in Remark 3.
Remark 5
Non-linear deterministic systems - local observability: While the observability analysis of a linear model is intrinsically global (there is no constraint on x(0)), that of a nonlinear system can be viewed from a local or a global point of view. Both of these cases are treated in Chap. 5 of Casti (1985). Local analysis relies on the observability of the first-order variational equation and on applying the conditions in Chap. 4 of Casti (1977) referred to in Remark 4. To wit, let
$$\begin{aligned} x(k+1) = M(x(k)) \end{aligned}$$(77)
be the nonlinear model with x(0) as the initial condition, and let
$$\begin{aligned} z(k) = h(x(k)) \end{aligned}$$(78)
be the observation.
One approach is to linearize (77)-(78) about a base trajectory starting from an arbitrarily chosen initial state y(0). Let \(\delta x(0) = x(0) - y(0)\) be the perturbation superimposed on y(0). Then the dynamics of \(\delta x(k) = x(k) - y(k)\) is given by the variational equation, which is a linear, time varying system:
$$\begin{aligned} \delta x(k+1) = D_M(k)\, \delta x(k), \end{aligned}$$(79)
with \(\delta x(0)\) as its initial condition, and the induced variation in z(k) is given by
$$\begin{aligned} \delta z(k) = D_h(k)\, \delta x(k), \end{aligned}$$(80)
where \(D_M(k)\) and \(D_h(k)\) are the Jacobians of M(y(k)) and h(y(k)), respectively. It can be verified that \(\delta x(0)\) can be estimated by minimizing a sum of squared errors criterion similar to (74). Writing \(\Phi (k) = D_M(k-1) D_M(k-2) \cdots D_M(0)\) for the transition matrix of (79), the resulting \(\delta x(0)\) is obtained by solving
$$\begin{aligned} G(N)\, \delta x(0) = \sum _{k=1}^{N} \Phi ^T(k)\, D_h^T(k)\, \delta z(k), \end{aligned}$$(81)
where
$$\begin{aligned} G(N) = \sum _{k=1}^{N} \Phi ^T(k)\, D_h^T(k)\, D_h(k)\, \Phi (k) \end{aligned}$$(82)
is the required Gramian. Indeed, we can recover \(\delta x(0)\) provided G(N) in (82) is positive definite.
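A minimal numerical sketch of this local analysis follows; the nonlinear model, the observation operator \(h(x) = x_1^2\), and the base state are illustrative choices of our own. The tangent linear map is accumulated along the base trajectory and the resulting Gramian is tested for positive definiteness:

```python
import numpy as np

# Toy nonlinear model x(k+1) = M(x(k)) and its Jacobian (illustrative only).
def model(x):
    return np.array([x[0] + 0.1 * x[1], x[1] - 0.1 * np.sin(x[0])])

def model_jacobian(x):
    return np.array([[1.0, 0.1],
                     [-0.1 * np.cos(x[0]), 1.0]])

def obs_jacobian(x):
    return np.array([[2.0 * x[0], 0.0]])   # Jacobian of h(x) = x1^2

# Propagate the base trajectory y(k) while accumulating the Gramian
# G(N) = sum_k Phi(k)^T Dh(k)^T Dh(k) Phi(k), with Phi(k) the product of
# the model Jacobians mapping delta x(0) to delta x(k).
y = np.array([1.0, 0.5])
N = 5
Phi = np.eye(2)
G = np.zeros((2, 2))
for _ in range(N):
    Phi = model_jacobian(y) @ Phi   # advance the tangent linear map
    y = model(y)                    # advance the base trajectory
    S = obs_jacobian(y) @ Phi       # sensitivity of z(k) to delta x(0)
    G += S.T @ S

# Positive definiteness of G signals local observability about y(0).
print(np.linalg.matrix_rank(G))          # 2
print(np.all(np.linalg.eigvalsh(G) > 0)) # True
```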
Remark 6
Nonlinear deterministic system - global observability: Analysis of the global observability of a nonlinear model is considerably more involved and requires concepts and tools from differential geometry. An exquisite exposé of this topic is contained in Casti (1985) and in Chap. 7 of Vidyasagar (2002). For a more detailed treatment, refer to Isidori (1985) and Nijmeijer and van der Schaft (1990).
Remark 7
Linear and non-linear filtering: Kalman, in another epoch-making paper (Kalman 1960b), developed a sequential method of estimating the state of a stochastic, linear, dynamical model when the observations are linear but noisy, now called Kalman filtering. Extensions to non-linear stochastic models with noisy nonlinear observations have been known since the early 1960s. Refer to Kushner (1964a, b, 1967) and Chap. 5 of Casti (1985) for a quick summary of results in nonlinear filtering. The handbook on nonlinear filtering by Crisan and Rozovskii (2011) contains a comprehensive treatment of this and related topics.
Appendix B Results from matrix theory
For completeness and ease of reference, we collect a set of results from matrix theory that are basic to the developments in this paper. For detailed proofs, refer to Meyer (2000) and Bapat (2012).
B.1 Solution of linear systems: Let \(A \in R^{m \times n}\) be a linear map from \(R^n\) to \(R^m\). The range of A, denoted by Range(A), is the subspace of \(R^m\) generated by the linear combinations of the columns of A. Thus, \(Range(A) \subseteq R^m\) and
$$\begin{aligned} Range(A) = \{ Ax \,|\, x \in R^n \}. \end{aligned}$$(83)
The null space of A, denoted by Null(A), is the set of vectors in \(R^n\) annihilated by A. That is, \(Null(A) \subseteq R^n\) and
$$\begin{aligned} Null(A) = \{ x \in R^n \,|\, Ax = 0 \}. \end{aligned}$$(84)
The rank of A, denoted by Rank(A), is the number of linearly independent columns or, equivalently, the number of linearly independent rows of A. Clearly,
$$\begin{aligned} Rank(A) \le \min (m, n). \end{aligned}$$(85)
If equality holds in (85), then A is said to be of full rank; otherwise, it is rank deficient. In the following, we catalog the conditions for the existence and uniqueness of the solution of the linear system
$$\begin{aligned} Ax = b, \end{aligned}$$(86)
where \(x \in R^n\) and \(b \in R^m\). The system (86), given A and b, is said to be consistent if there exists a vector \(x \in R^n\) that satisfies (86); otherwise, it is inconsistent. For example, the homogeneous system
$$\begin{aligned} Ax = 0 \end{aligned}$$(87)
is always consistent, since \(x = 0\) satisfies it. But the non-homogeneous system in (86), depending on the relative location of b in \(R^m\), may or may not be consistent.
If the system is consistent (\(b \in Range(A)\)), then we can talk about the solution in the traditional sense, where the residual \(r(x) = b-Ax = 0\). On the other hand, if (86) is inconsistent (\(b \notin Range(A)\)), then we have to contend with the so-called least squares solution, which minimizes the square of the length of the non-zero residual vector r(x).
The functional form and uniqueness of the solution of (86) critically depends on two factors: (a) relative values of m and n and (b) the rank of A. For brevity, we only consider the case when A is of full rank.
Case B.1.1: Let \(m = n\) and \(Rank(A)=n\). Then A is non-singular and the solution of (86) is given by
$$\begin{aligned} x = A^{-1}b. \end{aligned}$$(88)
Case B.1.2: Let \(m>n\) and \(Rank(A)=n\). In this case, the unique (least squares) solution of (86) is given by
$$\begin{aligned} x = A^{+}b, \end{aligned}$$(89)
where
$$\begin{aligned} A^{+} = (A^TA)^{-1}A^T \in R^{n \times m}, \end{aligned}$$(90)
called the generalized or Moore-Penrose inverse of A, which satisfies the following conditions:
$$\begin{aligned} AA^{+}A = A, \quad A^{+}AA^{+} = A^{+}, \quad (AA^{+})^T = AA^{+}, \quad (A^{+}A)^T = A^{+}A. \end{aligned}$$(91)
The matrices \(A^TA \in R^{n \times n}\) and \(AA^T \in R^{m \times m}\) are called Gramians of A. When \(Rank(A)=n\), \((A^TA)\) is a symmetric and positive definite matrix. It can be verified that \(A^{+}A = I_n\) and \(AA^{+}= A(A^TA)^{-1}A^T\) is the orthogonal projection matrix onto the range of A.
Case B.1.3: Let \(m < n\) and \(Rank(A) = m\). In this case, there are infinitely many solutions of (86), and the one with minimum norm is given by
$$\begin{aligned} x = A^{+}b, \end{aligned}$$(92)
where
$$\begin{aligned} A^{+} = A^T(AA^T)^{-1} \in R^{n \times m} \end{aligned}$$(93)
is the generalized inverse of A that satisfies (91). It can be verified that \(AA^{+} = I_m\) and that \(A^{+}A = A^T(AA^T)^{-1}A\) is the orthogonal projection onto the range of \(A^T\).
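Both full-rank cases can be checked numerically; the matrix A below is an arbitrary full-rank example of our own choosing:

```python
import numpy as np

# Case B.1.2: m > n, full column rank; A+ = (A^T A)^{-1} A^T.
A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])                       # 3 x 2, Rank(A) = 2
A_plus = np.linalg.inv(A.T @ A) @ A.T
print(np.allclose(A_plus, np.linalg.pinv(A)))    # True: matches Moore-Penrose
print(np.allclose(A_plus @ A, np.eye(2)))        # True: A+ A = I_n
P = A @ A_plus                                   # projector onto Range(A)
print(np.allclose(P @ P, P) and np.allclose(P, P.T))  # True

# Case B.1.3: m < n, full row rank; A+ = A^T (A A^T)^{-1} and A+ b is
# the minimum-norm solution of A x = b.
B = A.T                                          # 2 x 3, Rank(B) = 2
B_plus = B.T @ np.linalg.inv(B @ B.T)
print(np.allclose(B @ B_plus, np.eye(2)))        # True: B B+ = I_m
b = np.array([1.0, 2.0])
x_min = B_plus @ b
print(np.allclose(B @ x_min, b))                 # True: a genuine solution
```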
B.2: Rank of the partitioned matrix G: We start by stating a general result relating to partitioned symmetric matrices. Let
$$\begin{aligned} S = \begin{bmatrix} A & B \\ B^T & C \end{bmatrix} \end{aligned}$$(94)
be a symmetric matrix with both \(A \in R^{q \times q}\) and \(C \in R^{r \times r}\) symmetric and \(B \in R^{q \times r}\). Let A be non-singular. If \(I_K\) denotes an identity matrix of order K, then
$$\begin{aligned} P = \begin{bmatrix} I_q & -A^{-1}B \\ 0 & I_r \end{bmatrix} \end{aligned}$$(95)
is non-singular, since Det(P) = 1. By direct multiplication, it can be verified that
$$\begin{aligned} P^T S P = \begin{bmatrix} A & 0 \\ 0 & C - B^TA^{-1}B \end{bmatrix}, \end{aligned}$$(96)
where \(C-B^TA^{-1}B\) is called the Schur complement of A in S. The following claim is easily proved (Chap. 3, Bapat (2012)).

Claim:

1. If S is SPD, then so is \(C-B^TA^{-1}B\).

2. Let S be symmetric. If S is positive definite, then so are A and \(C-B^TA^{-1}B\).

3. \(Det(S) = Det(A)\, Det(C-B^TA^{-1}B)\).

Now consider the symmetric matrix \(G \in R^{(n+p) \times (n+p)}\) in its partitioned form given by
$$\begin{aligned} G = \begin{bmatrix} U^T\bar{H}U & U^T\bar{H}V \\ V^T\bar{H}U & V^T\bar{H}V \end{bmatrix}. \end{aligned}$$(97)
Recall that \(\bar{H} = D_h^T R^{-1} D_h\) is symmetric, where \(D_h \in R^{m \times n}\) and \(R^{-1} \in R^{m \times m}\) is assumed to be non-singular. Under the assumptions that \(m \ge n\) and \(D_h\) is of full rank, it follows that \(\bar{H}\) is non-singular. If, in addition, U is non-singular, then \(A= U^T\bar{H}U\) is symmetric positive definite and hence non-singular. Then, identifying \(B = U^T\bar{H}V\) and \(C = V^T \bar{H}V\), it can be verified that the Schur complement of \(A = U^T\bar{H}U\) in G reduces to a zero matrix of size \(p \times p\). That is, the matrix on the right hand side of (B.14) becomes
$$\begin{aligned} \begin{bmatrix} U^T\bar{H}U & 0 \\ 0 & 0 \end{bmatrix}, \end{aligned}$$
which is a matrix of rank n. That is, \(G(k) \in R^{(n+p) \times (n+p)}\), built from the forward sensitivity matrices U(k), V(k) and the Jacobian \(D_h(k)\) at time k, is a rank deficient matrix of rank n.
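The claim on Schur complements is easy to verify numerically on a randomly generated SPD matrix; the block sizes q and r below are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(1)

# Build a random SPD matrix S of size (q + r) and partition it as in (94).
q, r = 3, 2
X = rng.standard_normal((q + r, q + r))
S = X @ X.T + (q + r) * np.eye(q + r)           # SPD by construction
A, B, C = S[:q, :q], S[:q, q:], S[q:, q:]

# Schur complement of A in S: C - B^T A^{-1} B.
schur = C - B.T @ np.linalg.inv(A) @ B
print(np.all(np.linalg.eigvalsh(schur) > 0))    # True: Schur complement SPD
print(np.isclose(np.linalg.det(S),
                 np.linalg.det(A) * np.linalg.det(schur)))  # True
```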
B.3. Rank of the sum \(\sum _{k=1}^{t}G(k)\), for some integer t>0:
Let \(G \in R^{(n+p) \times (n+p)}\) be a symmetric matrix of rank n. Then there exists an orthogonal matrix \(Q \in R^{(n+p) \times (n+p)}\) such that
$$\begin{aligned} G = Q \begin{bmatrix} D & 0 \\ 0 & 0 \end{bmatrix} Q^T, \end{aligned}$$(98)
where \(QQ^T = Q^TQ = I_{n+p}\) and \(D = diag(\alpha _1, \alpha _2, \ldots , \alpha _n)\) with
$$\begin{aligned} \alpha _1 \ge \alpha _2 \ge \cdots \ge \alpha _n > 0. \end{aligned}$$(99)
Now we can build a matrix \(\bar{G}\) as
$$\begin{aligned} \bar{G} = Q \begin{bmatrix} 0 & 0 \\ 0 & \bar{D} \end{bmatrix} Q^T, \end{aligned}$$(100)
where \(\bar{D} \in R^{p \times p}\) is any diagonal matrix with positive diagonal entries. Then, it can be verified that
$$\begin{aligned} G + \bar{G} = Q \begin{bmatrix} D & 0 \\ 0 & \bar{D} \end{bmatrix} Q^T. \end{aligned}$$(101)
Hence, \(G + \bar{G}\) is a full rank matrix of rank \(n+p\). Stated in other words, by adding a suitably designed matrix \(\bar{G}\) of rank p to the matrix G, we can create a matrix of full rank.
Recall that the Gramian G is, by definition, SPSD; in general \(Null(G) \ne \{0\}\) and \(DIM(Range(G)) + DIM(Null(G)) = n\).
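The rank-completion construction of B.3 can be sketched as follows; the sizes n and p and the positive weights are arbitrary choices of our own:

```python
import numpy as np

rng = np.random.default_rng(2)

# A symmetric (n+p) x (n+p) matrix of rank n (generically, for Gaussian Y).
n, p = 3, 2
Y = rng.standard_normal((n + p, n))
G = Y @ Y.T
print(np.linalg.matrix_rank(G))              # 3

# Complete G to full rank: put positive weights on the p eigenvectors that
# span Null(G).  np.linalg.eigh returns eigenvalues in ascending order, so
# the (near-zero) null directions come first here.
vals, Q = np.linalg.eigh(G)
d_bar = np.zeros(n + p)
d_bar[:p] = 1.0                              # any positive weights work
G_bar = Q @ np.diag(d_bar) @ Q.T
print(np.linalg.matrix_rank(G + G_bar))      # 5: full rank
```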
B.4. Verification of (4.17):
If G is SPD, then there exists an eigen decomposition of G given by
$$\begin{aligned} G = QDQ^T, \end{aligned}$$(102)
where Q is an orthogonal matrix of eigenvectors, \(Q^TQ = QQ^T = I\), and \(D = diag(d_1, d_2, d_3, \ldots , d_n)\) is the diagonal matrix of the corresponding eigenvalues of G with
$$\begin{aligned} d_1 \ge d_2 \ge \cdots \ge d_n > 0. \end{aligned}$$(103)
It then follows from (102) that for \(k \ge 1\)
$$\begin{aligned} G^k = QD^kQ^T. \end{aligned}$$(104)
Now define \(\eta = Q^T \hat{f}\). Then, since Q is orthogonal and \(\hat{f}\) is a unit vector, we get
$$\begin{aligned} \Vert \eta \Vert ^2 = \Vert \hat{f} \Vert ^2 = 1. \end{aligned}$$(105)
Thus, we can interpret \(\{ \eta ^2_i \}\) as the probability distribution of a random variable d, where
$$\begin{aligned} Prob(d = d_i) = \eta _i^2, \quad 1 \le i \le n. \end{aligned}$$(106)
Consequently,
$$\begin{aligned} \hat{f}^T G^k \hat{f} = \eta ^T D^k \eta = \sum _{i=1}^{n} d_i^k\, \eta _i^2 = \mu _k, \end{aligned}$$(107)
the \(k^{th}\) (non-central) moment of the random variable d. Since
$$\begin{aligned} \mu _2 - \mu _1^2 = Var(d) \ge 0, \end{aligned}$$(108)
we get \(\mu _2^{1/2} \ge \mu _1\), from which claim (4.17) follows.
B.5. Spectral Radius of \((I-\beta G)\) in (19)
From (20), using (104) and (107), it can be verified that the eigenvalues of \([I - \beta G]\) are \(1 - \beta d_i\), \(1 \le i \le n\). The spectral radius of \([I - \beta G]\), in view of the ordering in (103), is therefore \(\rho (I - \beta G) = \max \{ |1 - \beta d_1|, |1 - \beta d_n| \}\), since \(d_1 \ge d_i \ge d_n > 0\) for all i.
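A quick numerical check of B.5: for a random SPD matrix G, the eigenvalues of \(I - \beta G\) are exactly \(1 - \beta d_i\), and any \(\beta \) in \((0, 2/d_1)\) keeps the spectral radius below one; the choice \(\beta = 1/d_1\) below is one admissible value:

```python
import numpy as np

rng = np.random.default_rng(3)

# Random SPD G; d[0] is the largest and d[-1] the smallest eigenvalue.
X = rng.standard_normal((4, 4))
G = X @ X.T + np.eye(4)
d = np.sort(np.linalg.eigvalsh(G))[::-1]

beta = 1.0 / d[0]
ev = np.linalg.eigvalsh(np.eye(4) - beta * G)
print(np.allclose(np.sort(ev), np.sort(1.0 - beta * d)))  # True
rho = np.max(np.abs(ev))                                  # spectral radius
print(rho < 1.0)                                          # True
```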
Appendix C Conditions for the Matrix L in (9) to be of full rank
Let \(A \in R^{n \times n}\) be a non-singular, diagonalizable matrix. Let \(Q = [q_1, q_2, \ldots , q_n] \in R^{n \times n}\) and \(D = diag(d_1, d_2, \ldots , d_n) \in R^{n \times n}\) be the matrices of eigenvectors and corresponding eigenvalues of A. Then, by definition, \(AQ = QD\), and the columns of Q are linearly independent and constitute a basis for \(R^n\).
Let \(b \in R^n\) and define the Krylov sequence \(b, Ab, A^2b, \ldots , A^{p-1}b\), for \(1 \le p \le n\). The space generated by these vectors is called the Krylov subspace and is denoted by \(Span \{ K_p(A, b) \}\), where \(K_p(A, b) = [b, Ab, \ldots , A^{p-1}b] \in R^{n \times p}\) is the corresponding Krylov matrix. Let \(S_k\) be a k-subset of the eigenvectors of A, for \(1 \le k \le n\). Then \(DIM(Span(S_k))=k\), and \(Span \{S_k \}\) is an invariant subspace of A; that is, if \(b \in Span \{ S_k \}\) then so is Ab. It can be verified that if \(y \in S_k\), that is, if y is an eigenvector of A, then from \(Ay = dy\) it follows that the dimension of the Krylov subspace is one. Stated in words, if b is an eigenvector of A, then, since the vector Ab is a constant multiple of b, the dimension of the Krylov subspace \(K_p(A, b)\) is one.
Let the energy in a vector \(b \in R^n\) be measured by the square of its norm, \(\Vert b \Vert ^2 = b^Tb\). Let \(\bar{b} \in R^n\) be the coordinate representation of b in the new basis defined by the eigenvectors of A, that is, \(b = Q\bar{b}\). If, for some \(1 \le j \le n\), \(\bar{b}_j = 0\), then we say that b has no energy along the \(j^{th}\) eigen direction \(q_j\) of A; that is, b belongs to the invariant subspace of dimension \((n-1)\) defined by the remaining eigenvectors \(q_i\), \(i \ne j\). Stated in other words, if \(\bar{b}\) has no zero (row) element, then the energy in b is distributed across all the eigen directions of A. This discussion leads to the following:
Property C.1: Expanding Krylov Subspace:
If \(b = Q \bar{b}\) is such that \(\bar{b}\) has no zero (row) element, then for \(1 \le p \le n\), \(DIM(Span \{ K_p(A, b) \}) = p\).
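Property C.1 and the eigenvector degeneracy discussed above can be illustrated with a small diagonal matrix, so that the eigen directions are simply the coordinate axes:

```python
import numpy as np

# Diagonal A keeps the eigen directions explicit (Q = I, distinct eigenvalues).
A = np.diag([1.0, 2.0, 3.0])

def krylov(A, b, p):
    """Krylov matrix [b, Ab, ..., A^(p-1) b]."""
    return np.column_stack([np.linalg.matrix_power(A, j) @ b for j in range(p)])

b_full = np.array([1.0, 1.0, 1.0])   # energy along every eigen direction
b_eig = np.array([0.0, 1.0, 0.0])    # an eigenvector of A

print(np.linalg.matrix_rank(krylov(A, b_full, 3)))  # 3: Property C.1
print(np.linalg.matrix_rank(krylov(A, b_eig, 3)))   # 1: b is an eigenvector
```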
By way of generalizing the above property, now consider a full rank matrix \(B \in R^{n \times m}\), for some \(1 \le m \le n\). Then we can extend the Krylov subspace, using B in place of b, as \(K_p(A, B) = [B, AB, A^2B, \ldots , A^{p-1}B] \in R^{n \times pm}\). Clearly, \(DIM(Span \{ K_p(A, B) \}) \le \min (pm, n)\).
Let \(\bar{B} \in R^{n \times m}\) be such that \(B = Q\bar{B}\), and let \(\bar{B}\) have no rows of zeros. Then it can be easily verified that the total energy in B, as measured by the Frobenius norm \(\Vert B \Vert _F^2 = \sum _{i=1}^{n} \sum _{j=1}^{m} B_{ij}^2\), is distributed across all the eigen directions of A. This leads to the following:
Property C.2: Expanding Krylov subspace: If \(B \in R^{n \times m}\) is such that \(B = Q\bar{B}\) and \(\bar{B}\) has no zero rows, then \(DIM(Span \{ K_p(A, B) \}) = n\) for \(p \ge \lceil n/m \rceil \). That is, for such p, the dimension of the Krylov subspace \(K_p(A, B)\) is n.
Observability of the (M, H) pair: Now consider the observability matrix \(L \in R^{Nm \times n}\) given in (4.9). Then \(L^T\) is related to a Krylov matrix given by \(L^T = [M^TE^T, (M^T)^2E^T, \ldots , (M^T)^NE^T] = M^T K_N(M^T, E^T)\), where \(K_N(M^T, E^T) = [E^T, M^TE^T, \ldots , (M^T)^{N-1}E^T]\).
Then Property C.2 immediately suggests an answer to the question: when is L in (4.9) a full rank matrix?
Corollary 1
Let \(E^T\) be such that its total energy is distributed across all of the eigen directions of \(M^T\). Then, setting \(p = N\), \(A = M^T\), and \(B = E^T\) in Property C.2, it follows that \(Rank(K_N(M^T, E^T)) = n\) for some N in the range \(\lceil n/m \rceil \le N \le n\).
Corollary 2
Rank of product matrices (Meyer (2000), Chap. 4): If \(B \in R^{m \times n}\) and \(C \in R^{n \times p}\), then \(Rank(B) + Rank(C) - n \le Rank(BC) \le \min (Rank(B), Rank(C))\). Using the fact that \(Rank(M^T) = Rank(M) = n\), it immediately follows that the rank of \(K_N(M^T, E^T)\), for N in the range given in Corollary 1, is n. Hence, by Property C.2, the observability matrix \(L^T\) and its transpose L are of full rank. Consequently, the observability Gramian \(G = L^TL\) in (10) is symmetric and positive definite (SPD). Stated in other words, the condition for G to be SPD rests entirely on the choice of E (with respect to M), in the sense that the total energy in the columns of \(E^T\) must be spread across all the eigen directions of \(M^T\). Clearly, the choice of E depends on the forward operator H and the noise covariance R, as defined in (8)-(9). From (9) and (10), recall that \(G = \sum _{k=1}^{N} G(k)\), where \(G(k) = (M^T)^k E^T E M^k\).
Now, column partition \(EM^k\) as \(EM^k = [\eta _1(k), \eta _2(k), \ldots , \eta _n(k)]\), where \(\eta _i(k) \in R^m\). Then, it can be verified that the \((i, j)^{th}\) element of the outer product matrix G(k) is given by \(G_{ij}(k) = \eta _i^T(k)\, \eta _j(k)\). Consequently, the diagonal elements of G are given by \(G_{ii} = \sum _{k=1}^{N} \Vert \eta _i(k) \Vert ^2\).
Special case: Setting \(H = I_n\) and \(R = I_n\), we get \(E = I\) and \(G(k) = (M^T)^kM^k = U^T(k)U(k)\). In this case, \(\eta _i(k) = U_i(k)\), the \(i^{th}\) column of the forward sensitivity matrix U(k). Consequently,
$$\begin{aligned} G_{ii}(k) = \Vert U_i(k) \Vert ^2 = \sum _{j=1}^{n} \left( \frac{\partial x_j(k)}{\partial x_i(0)} \right) ^2. \end{aligned}$$(133)
Now recall that \(U_i(k) = \frac{\partial x(k)}{\partial x_i(0)} \in R^n\) is the vector of sensitivity of x(k) with respect to the ith component of the initial condition. From (133) it immediately follows that, by placing the observations where the sum in (133) is a maximum with respect to k, we can indeed control the condition number of G.
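As a closing illustration of the special case \(H = R = I\), the sketch below scans a window of times, computes the trace of \(G(k) = (M^T)^kM^k\), and selects the time of maximum sensitivity; the two-variable model M is an arbitrary choice of ours with one growing and one decaying mode:

```python
import numpy as np

# Toy model: one growing mode (1.1) and one decaying mode (0.7).  With
# H = R = I, G(k) = (M^T)^k M^k, and its diagonal entries are the squared
# norms of the columns of the sensitivity matrix U(k) = M^k.
M = np.array([[1.1, 0.3],
              [0.0, 0.7]])

traces = []
for k in range(1, 11):
    Mk = np.linalg.matrix_power(M, k)
    Gk = Mk.T @ Mk
    traces.append(np.trace(Gk))

# Place the observation at the time k maximizing the trace of G(k).
k_best = 1 + int(np.argmax(traces))
print(k_best)   # 10: sensitivity grows with the unstable mode
```

For this M the trace grows monotonically with k, so the heuristic favors late observations; a decaying M would favor early ones instead.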
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this chapter
Lakshmivarahan, S., Lewis, J.M., Reddy Maryada, S.K. (2022). Observability Gramian and Its Role in the Placement of Observations in Dynamic Data Assimilation. In: Park, S.K., Xu, L. (eds) Data Assimilation for Atmospheric, Oceanic and Hydrologic Applications (Vol. IV). Springer, Cham. https://doi.org/10.1007/978-3-030-77722-7_9
Print ISBN: 978-3-030-77721-0
Online ISBN: 978-3-030-77722-7