A statistical analysis of the Jacobian in retrievals of satellite data

Remote sensing has become an essential component of the geosciences (the study of Earth and its system components). Remote sensing measurements are almost always energies measured in selected parts of the electro-magnetic spectrum. That is, the geophysical variable of interest is only observed indirectly; a forward model relates the energies to the variable(s) of interest and other elements of the state. The first derivative of that forward model with respect to the state is known as the Jacobian. In this chapter, we review the importance of the Jacobian to inferring the state, and we use it to diagnose which state elements may be difficult to estimate. We develop the Statistical Significance Filter and flag those state elements that consistently fail to get through the filter.


Introduction
Remote sensing of the environment is a fundamentally important part of humans' quest to understand the Earth system and how the different components interact (e.g., climate, water, carbon). In the future, this knowledge may be critical to our survival. Satellite and aircraft campaigns allow a "bird's-eye view" of large parts of Earth, but not all campaigns are alike. For example, polar-orbiting satellites allow global coverage, passive instruments rely on the sun's reflected light and do not take measurements when there are clouds or when it is night, and programs such as NASA's ASCENDS will measure day or night, anywhere on the orbit track.
In this chapter, a passive instrument on a polar-orbiting satellite, namely Japan's Greenhouse Gases Observing Satellite (GOSAT), will be used as a leading example. However, the idea behind what I shall present is general and could apply to many remote sensing inversion problems involving a non-linear forward model. In such problems, the goal is to infer a hidden state from energies detected by an instrument sensitive to certain known bands of the electro-magnetic spectrum.
Section 6.2 of this chapter gives a statistical framework behind the problem of uncertainty quantification of retrieved states. Section 6.3 calls out the Jacobian matrix as an important component of the retrieval algorithm and defines a unit-free Jacobian for subsequent statistical analysis. That analysis is described in Sect. 6.4, where a Statistical Significance Filter is defined. In Sect. 6.5, this methodology is applied to a number of retrievals taken over Australia, where certain state elements are flagged as being potentially difficult to estimate. The last section, Sect. 6.6, finishes with a discussion of the results obtained.

A Statistical Framework for Satellite Retrievals
The biases, variances, and mean squared prediction errors of retrievals need to be calculated in the general setting of a nonlinear forward model. The book by Rodgers (2000) has a section on error analysis, but it approaches the problem mostly from a numerical-sensitivity viewpoint. The strongly statistical viewpoint given here calculates the first two moments of a retrieval and the distribution of elements of the associated Jacobian matrix (defined below as K). In the case where relationships are non-linear, the well known "delta method" (based on Taylor-series expansions; e.g., Meyer 1975, Chap. 10) gives approximate (to leading orders) biases and mean squared prediction errors of the estimators (Cressie and Wang 2013).
The n -dimensional radiances Y are related to the n -dimensional state X through a non-linear forward model, Y = F(X) + , (6.1) where the state vector X includes volume mixing ratios of CO 2 at prespecified geopotential heights, the error vector ∼ Gau(0, S ), and X and are statistically independent. Further, there is an a priori assumption that where ∼ Gau(0, S ). Notice that if there is consistent bias present in the retrieval, this can be accounted for by adding it to X , leaving the assumption, ∼ Gau(0, S ), intact. Define the matrices, where x is any atmospheric state. (Recall that the true state is denoted as X.) The n × n matrix K(⋅) is called the Jacobian. Partial derivatives of K(⋅) represent the degree of non-linearity in the forward model. In the case of a linear forward model, K is constant, and any partial derivatives of it are zero.
An estimate of X, sometimes called a retrieval, is often obtained by choosing an X that allows F(X) to be "close to" Y, subject to smoothness conditions onX. This regularisation is usually defined as follows: Minimise with respect to X, which results in the retrievalX. The n × n matrix G(⋅) represents a type of "gain" matrix in the relationship between retrievalX and data Y; that is, In the linear case, G is constant and the "remainder" term is zero.
The n × n matrix A(⋅) yields the averaging kernel matrix in the relation between retrieval and true state; that is, In the linear case, A is constant, the "remainder" term is G , and recall that is independent of X.
In this section, I discuss the bias vector and the mean-squared-prediction-error (MSPE) matrix of the retrieval,X. The bias vector is defined as: where recall that X is the prior mean of the state vector X.
The MSPE matrix is defined as: where var(X − X) is the covariance matrix of the retrieval error,X − X. The MSPE matrix can be a more appropriate statistical measure of uncertainty than the covariance matrix of retrieval error when there is bias present. When the bias is zero, the two measures of uncertainty are the same. When the forward model is linear, it is easily seen (e.g., Rodgers 2000) that the bias vector, That is, in the linear case,X is unbiased. Further, in the linear case, the MSPE matrix can be derived exactly and written in a number of equivalent ways. From Connor et al. (2008), Cressie and Wang (2013), where the MSPE matrix is given bŷ When the forward model is nonlinear, the bias ofX is nonzero, and the equalities in (6.9) are no longer true. However, from the "delta method," Cressie et al. (2016) show that (6.7) and (6.9) hold, to leading order. In what follows, a leading-order analysis is carried out. This amounts to assuming the forward model to be locally linear, which is a weaker assumption than assuming global linearity, namely Y = c + KX + , across the whole state space defined by all possible values of X.
The locally linear forward model is derived using a Taylor-series expansion: where models the lack of fit of the local linear model (about the linearisation point x = X 0 ) to F(X). The linearisation point X 0 is often chosen to be the prior mean X , but I want to emphasise here that it need not be.

The Jacobian Matrix and its Unit-Free Version
The Jacobian matrix is the first derivative of the n -dimensional forward function vector, F(x), with respect to the n -dimensional state x. From the definition given in (6.3), it is an n × n matrix. Write the matrix as (K ij ), and note that the units of K ij are radiance (energy) per unit of state-space element j. Define the vectors, where diag(⋅) is a matrix operator that extracts a vector made up of the matrix's diagonal elements. Then the unit-free Jacobian is defined as follows: (6.10) During the retrieval, the most difficult and time-consuming part is to minimise (6.6); for example, using a Levenberg-Marquardt algorithm requires evaluation of the Jacobian matrix at each iteration of the minimisation. LetK ij be a generic Jacobian element used during the retrieval. Then define the corresponding unit-free version as, and denotê≡ (̂i j ) as the n × n unit-free Jacobian matrix. For satellite retrievals, the data vector Y can often be partitioned as and band 1 , … , band K are mutually exclusive index sets that represent a grouping of radiances according to which bands of the electro-magnetic spectrum they belong. For example, Japan's GOSAT and NASA's Orbiting Carbon Observatory-2 (OCO-2) instruments have K = 3 bands, corresponding to the oxygen A band (OA), the weak carbon dioxide band (WC), and the strong carbon dioxide band (SC); our analysis in Sect. 6.5 uses data from GOSAT's three bands. Another example is from NASA's Atmospheric Infrared Sounder (AIRS) instrument flying on the Aqua satellite, which has K = 4 bands, corresponding to four geophysical variables, namely temperature, water vapour, ozone, and carbon dioxide.
In what follows, we abbreviate "band k " to "b k ." Because the unit-free Jacobian has elements that are potentially comparable, we can partition it and analyse it in comparable ways. Recall that the index j corresponds to a given element of the state vector, for example, a water-vapour scale factor or a near-surface carbon-dioxide volume mixing ratio. Then fix the state element j, and consider the behaviour of the jth column as row i varies within individual bands. That is, for a fixed j, consider to be a random sample from a distribution indexed by k, for bands k = 1, … , K. Consequently, instead of thinking about n ⋅ n entries in the Jacobian, attention turns to n ⋅ K distributions. For example, for the retrievals from GOSAT data that are being considered here, n = 2240, n = 112, and K = 3. Hence, the pair (j, k) indexes one of 336 possible distributions, whose mean, jk , is of primary interest. For j a fixed element of the state vector, if j1 = j2 = ⋯ = jK = 0, then that element is poorly determined by the data alone; see Sect. 6.4. This is a flag that says the (prior) mean and precision of the jth state element need to be specified very carefully in the second term of (6.6) in order to obtain an acceptably precise retrievalX j .

Statistical Significance Filter
To leading order, the forward model (6.1) can be written as, which is a multiple-regression model with known, typically different, intercepts given by the elements of c; known covariates K 1 , … , K n (the n columns of K); and unknown regression coefficients X 1 , … , X n . Clearly, if K j is zero, then X j will not be estimable. Further, if for a given j, {|K ij | ∶ i = 1, … , n } are uniformly "small," then the uncertainty associated with the estimate of X j will be large.
In the previous section, we noted that for remote sensing retrievals, the n elements in Y can be partitioned into K bands, Y 1 , … , Y K . Then write (6.14) equivalently as K equations. In obvious notation that respects the partitioning, where {K jk ∶ j = 1, … , n } are the n vectors corresponding to the kth band. Clearly, if K jk = 0, then its unit-free version, jk , is also 0. Hence, the problem of whether X j is poorly determined in the forward model (6.1) can be addressed in a statistical manner by considering the retrieval's unit-free Jacobian entries {̂i j ∶ i = 1, … , n } as K arrays of random variables, {̂i j ∶ i ∈ b k }, for k = 1, … , K. If, for a fixed j, the means j1 , … , jk of these K arrays are all zero, then X j will be difficult to estimate.

Hypothesis Tests
Consider (6.13) and make the following assumption: For a given retrieval, a given state element j, and a given band k, where "iid" denotes "independent and identically distributed," and "Dist( )" denotes a probability distribution with mean . For this retrieval, the idea is to flag those state elements and bands for which the null hypothesis, H 0,jk ∶ jk = 0, is not rejected. In particular, failure to reject the composite hypothesis, (6.16) implies that the jth state element will be difficult to estimate in the given retrieval.
Since the elements of {̂i j ∶ i ∈ b k } are considered to be a sample from a distribution with mean jk , I shall construct a test statistic from these unit-free Jacobian values. A considerable amount of exploratory data analysis showed the common distributional assumption within the partitioned arrays to be largely correct, with occasional gross outliers that would challenge many statistical testing procedures. Those were controlled by transforming eacĥi j to |̂i j | 1∕2 , and the robust test statistic,

Distribution Theory for the Robust Test Statistic
Consider generic iid random variables W 1 , … , W m distributed according to a Gaussian distribution with mean W and variance 2 W , which is written as Gau( W , 2 W ). I now obtain distribution theory forX under the null hypothesis in order to carry out a significance test. If Y ∼ Gau(0, 1), then E(|Y| 1∕2 ) = 0.82216 and var(|Y| 1∕2 ) = 0.12192, which was derived by Cressie and Hawkins (1980). Then under

To test
denotes "is approximately distributed as," and the approximation is established by Cressie and Hawkins (1980). Now the distribution of the medianX from a random sample X 1 , … , X m of Gaussian random variables can be approximated as Gaussian with mean E(X) = E(X 1 ), and variance var(X) = var(X 1 )∕2m. If all these results are combined, then under the null hypothesis H 0 in (6.18),X Clearly, the alternative hypothesis H 1 in (6.18) is accepted if the test statisticX is large. At significance level , H 1 is accepted if where Φ −1 (⋅) is the inverse cumulative distribution function of a Gau(0, 1) random variable. In practice, an estimate of W will be needed. Continuing with the same approach as above, an asymptotically unbiased, robust estimator of W is used. Now, W = var(|W i | 1∕2 )∕0.12192, and hence var(|W i | 1∕2 ) can be estimated using the median absolute deviation (MAD): is obtained and substituted into (6.20). My approach to constructing this robust statistic to test whether a mean is zero, using data that may contain large, unpredictable outliers, is somewhat unusual, but it is statistically advantageous. First, the data {W 1 , … , W m } are made resistant by transforming to the square-root scale where variability is dampened. Then the transformed data {|W 1 | 1∕2 , … , |W m | 1∕2 } are used to define a robust test statistic, given here by the median; see (6.19). Finally, the null distribution is derived, resulting in a critical region given by (6.20) with the robust estimator (6.21) substituted in. In the next subsection, the distribution theory derived in this subsection is used in the context of multiple hypothesis testing, resulting in the Statistical Significance Filter.

Multiple Hypothesis Tests Define the Statistical Significance Filter
The elements of the unit-free Jacobian are considered as replicates within bands, which results in n (number of state elements) times K (number of bands) hypothesis tests of {H 0jk ∶ jk = 0, for j = 1, … , n and k = 1, … , K}. To test H 0j given by (6.16), jointly for j = 1, … , n , I use a family-wise error rate of 1% and conservative Bonferroni adjustments to obtain a level of significance, = .01∕(n ⋅ K), that is used in each individual hypothesis test of the null hypotheses, {H 0jk }.
The Statistical Significance Filter only allows estimates {̃j k } to get through the filter if {H 0jk } are rejected, respectively. A given state element, j say, is flagged as problematic in a given retrieval if, simultaneously, the hypotheses H 0j1 , … , H 0jK are not rejected. If it consistently happens that under similar (or different) geophysical conditions, the jth element's bands fail to get through the Statistical Significance Filter, that element X j is flagged as being weakly sensitive to the radiance measurements Y. Hence, estimation of X j would be difficult if a very disperse prior distribution in (6.2) were chosen for it.
In the next section, I apply the Statistical Significance Filter to 30 retrievals from Japan's GOSAT instrument that measures atmospheric carbon dioxide, here over central Australia.

ACOS Retrievals of the Atmospheric State from Japan's GOSAT Satellite
Shown in Fig. 6.1 are 30 locations of retrievals from Japan's GOSAT satellite, where the ACOS (Atmospheric CO2 Observations from Space) retrieval algorithm was used. Specifically, ACOS Version B2.8 was used here, for which n = 112 state elements were retrieved from n = 2240 radiances spread roughly equally between the K = 3 bands, namely the OA band, the WC band, and the SC band; see Sect. 6.3. The soundings are over an arid part of Australia with uniformly high albedo, during the period from 5 June 2009-26 July 2009 (Source: CIRA, Colorado State University). The methodology and inference is illustrated on the retrieval at one of those locations, hereafter referred to as Location 1. Results from the other 29 retrievals are summarised at the end of this section. A number of the state elements in B2.8 are functions of geopotential height, here labelled as 1 (top of atmosphere) down to 20 (surface of Earth). Figure 6.2 shows unit-free ice-cloud Jacobian values in a column of the atmosphere for Location 1; only those values that got through the Statistical Significance Filter are shown. It can be seen that for the ice-cloud variable, Jacobian values in the OA band are not statistically significant at higher altitudes in the atmospheric column, and hence they are potentially difficult to estimate. Figure 6.3 shows that the Statistical Significance Filter applied to water vapour (H 2 O) in the column results in a similar set of plots. Contrast these to Fig. 6.4, which is for the all-important carbon-dioxide (CO 2 ) variable; only values in the SC band get through the Statistical Significance Filter.
The analysis of the retrieval for Location 1 yields non-significant Jacobian entries (i.e., forward-model derivatives near zero) in all three bands for the following state elements:  Fig. 6.5; there, a light (green) stripe in a given band for a given state element indicates that the corresponding mean is not significantly different from zero. A light stripe in every band for the given state element indicates that extra care will be needed when specifying a prior for that element. Each of the 11 elements listed above have a light stripe in every band.
The analysis was carried out on all 30 retrievals, and eight elements of the 112dimensional state vector emerged as always having non-significant Jacobian values in all three bands for all 30 retrievals. They were:  The results indicate a lack of sensitivity of these eight elements in the forward equation F given in (6.1), for the dry, bright, flat-terrain conditions found over central Australia. Different land surfaces and atmospheric states would almost certainly result in different elements being identified.

Discussion
The Jacobian matrix K is the first derivative of a vector-valued function F(x) of a state vector x. Consistently small elements in the jth column of K indicate that the jth element will be difficult to estimate (predict) based on data, Y, alone.
If prior information, as well as the data, is used to predict the state vector, this research indicates that acceptable precision for estimating this jth element may require the prior variance to be tightly constrained. For example, the element that is the H 2 O scale factor is tightly constrained physically in the prior. Thus, a retrieval of that element may cause no problem, even though its column in K fails to get through the Statistical Significance Filter. Regarding the 20 CO 2 elements that make up the CO 2 profile in the atmospheric column, the retrievals analysed here show the importance of the strong CO 2 band (SC) to its estimation. The best result would be if all 20 ⋅ 3 = 60 hypothesis tests were rejected; at Location 1, only 17, all in the SC band, were rejected ( Fig. 6.4).
Current versions of ACOS-like retrievals have between 40-50 state elements. The research presented here, on the statistical properties of the Jacobian, would allow a comparison of different versions through the behaviour of their unit-free Jacobian values. Common to all of these versions is 20 CO 2 elements, and the respective estimates of the means in each of the three bands (OA, WC, SC) can be compared across versions. Cressie N, Wang R (2013)  Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made. The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.