1 Introduction

Geophysics is the investigation of the Earth based on the principles of physics, whereby the physical properties of the Earth medium (i.e., the unknown model) are inverted from either surface- or space-based observations. Geophysical inverse problems are often underdetermined (e.g., Menke 2015; Tarantola and Valette 1982; Wiggins 1972) owing to the various limitations of these observations. However, advances in generalized and regularized inverse theory over the last century now make it possible to solve underdetermined inverse problems (e.g., Hansen 1992; Lawson and Hanson 1995; Levenberg 1944; Moore 1920; Morozov 1984; Penrose 1955; Tikhonov 1963). Although the quality and reliability of such an under-constrained solution are essential in a data-poor environment, the verification of a geophysical solution is difficult or even impossible, as the investigated medium is generally inaccessible. Geophysicists therefore have to devote considerable effort to the methodology of solution appraisal. However, understanding what is reliable in some geophysical solutions remains perhaps the most exciting challenge to date (Foulger et al. 2015).

An inverted solution that represents the investigated medium (x) can be described as:

$$\underline {\varvec{x}} = r({\varvec{x}}),$$
(1)

where the operator r, which can also be denoted by r:x → x, represents the relationship between the true medium and the solution. This problem can be linearized as:

$$\underline {\varvec{x}} = {{\varvec{Rx}}},$$
(2)

where R, which can also be denoted by R:x → x, is the resolution matrix (Backus and Gilbert 1968, 1970), a linear projection approximation of r:x → x (Eq. (1)). R has been widely used in solution appraisals in geophysics (e.g., Aki et al. 1977; An 2012; Aster et al. 2005; Menke 2015; Tarantola and Valette 1982; Wiggins 1972; Yao et al. 1999) and in other research areas (e.g., Lütkenhöner and Grave de Peralta Menendez 1997; Katamreddy and Yalavarthy 2012).

However, the properties of R remain poorly understood in practical problems, and considerable uncertainties and/or inconsistencies regarding R still exist. For example, early studies mostly analyzed the diagonal entries to evaluate the resolvability of the target medium (e.g., Aki et al. 1977; Day-Lewis et al. 2005; Wiggins 1972), even though the solution is actually related to all of the entries in R (Eq. (2)). R has mostly been applied to estimate the resolution length (or resolution width). This length should be retrieved from a given row in R (e.g., An 2012; Barmin et al. 2001; Crosson 1976), although the result appears to be similar to that from the corresponding column (e.g., Alumbaugh and Newman 2000; Miller and Routh 2007; Pilkington 2016). Why and when are they similar? An inverse problem is often solved using a reference model, whereby the inverted solution is not a model of the medium but rather a perturbation (Δx) of the reference model (xi). R can be provided in such inversions (e.g., Jackson 1972; Ren and Kalscheuer 2020), but it represents the Δx → Δx projection, not x → x. What is the relationship between this matrix and r:x → x? Can the matrix be used to estimate the resolution length of the solution x (= Δx + xi)? All of the above questions are addressed in this paper.

R is particularly useful in model appraisal, but it has various limitations. For example, some general factors (e.g., observational and data-processing errors) cannot be reflected by the matrix. The resolution estimated from such a limited matrix may be unrealistically high (Pilkington 2016). A matrix R:x → x that accounts for all of the factors in the complete process could overcome these limitations; however, no such matrix has been available to date.

Overall, R is a unique quantitative indicator of the reliability of a given solution, and it is also important in understanding the relationships between the solution and the observations, regularization, and other factors. However, various uncertainties and limitations regarding R remain. This paper reviews previous resolution matrices and clarifies both the significance and the properties of the matrices that often appear in practical inversions, in order to explain how to appropriately employ such a matrix in a given study. Furthermore, this paper clarifies the resolution matrices in nonlinear inversions and suggests a new resolution matrix that can include all of the factors in a linear or nonlinear problem. This study can therefore assist in the appropriate selection and implementation of a resolution matrix and provide the reader with a better understanding of both the quality and reliability of the solution and the relationships between the solution and all of the factors in the study system, which are important for further improving the system.

2 Resolution Matrices from Observations and Regularization Matrices

Resolution matrices that are derived from observations and regularization matrices have been widely applied in various research studies. This section reviews these resolution matrices and their significance. Furthermore, the properties of these matrices, which are important for their application, are clarified for the first time.

2.1 Variables Used

x, x̲: Real medium (or true model) vector; solution (or inverted model) vector (the underline denotes an inverted quantity, as in Eqs. (1) and (2)).

xi, x̲i: ith real-medium parameter; ith solution parameter.

ri,j: Entry at the ith row and jth column of matrix R.

ri,* (or r*,j): ith row (or jth column) vector of matrix R.

Σri,* or ΣiR: Sum of all of the entries in the ith row of matrix R.

2.2 Solutions of the Inverse Problem

The goal of a geophysical investigation is to directly retrieve the solution of the medium (xD) from observations (d̲ = d + δd) that are contaminated with errors (δd) via the inverse equation:

$$\underline {{\varvec{x}}}_{\text{D}} = g^{ - {\text{g}}} \underline {{\varvec{d}}} ,$$
(3)

which is based on the physical relationship between the medium parameters (m × 1 vector x; [x1, x2,…, xm]T) and observational data (n × 1 vector d; [d1, d2,…, dn]T), with the latter defined as:

$${{\varvec{d}}} = g({{\varvec{x}}}),$$
(4)

the operator g in Eq. (4) is often not invertible, such that a generalized inverse of g, g–g, is used, as in Eq. (3).

Equation (3) can be expressed in linear form as:

$$\underline {{\varvec{x}}}_{\text{D}} = {{\varvec{G}}}^{ - {\text{g}}} \underline {{\varvec{d}}} ,$$
(5)

and rewritten as:

$${{\varvec{d}}} = {{\varvec{Gx}}},$$
(6)

the n × m matrix G, which is normally termed the observation matrix, is composed of the sensitivities of d with respect to x. The generalized inverse of G, G–g (e.g., Lawson and Hanson 1995; Moore 1920; Penrose 1955; Tan 2017; Tarantola and Valette 1982), can be either a right inverse:

$${{\varvec{G}}^{ - {\text{g}}} {\text{ = }}{\varvec{G}}^{\text{T}} \left( {{\varvec{GG}}^{\text{T}} } \right)^{ - 1}}, \quad {n < m},$$
(7)

or a left inverse:

$$\begin{array}{*{20}c} {{{\varvec{G}}}^{ - {\text{g}}} { = }\left( {{{\varvec{G}}}^{\text{T}} {{\varvec{G}}}} \right)^{ - 1} {{\varvec{G}}}^{\text{T}} ,} & {n \ge m} \\ \end{array} .$$
(8)

G–g can also be obtained via singular value decomposition (SVD) (Golub and Kahan 1965; Golub and Reinsch 1970; Varah 1973) or some form of matrix factorization (e.g., Gentle 2007; Golub 1965). The G–g matrices calculated via these methods are closely related or exactly equivalent.
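For readers who want to experiment numerically, the Python sketch below (a minimal illustration, not code from this study) computes the generalized inverse of a toy full-row-rank G via the right inverse of Eq. (7) and via the SVD-based pseudoinverse, and the left inverse of Eq. (8) for an overdetermined case; the toy matrix values are arbitrary.

import numpy as np

# Toy underdetermined observation matrix (n = 3, m = 6); values are arbitrary
# and unrelated to Example 1 in the text.
rng = np.random.default_rng(0)
G = rng.normal(size=(3, 6))

# Right inverse, Eq. (7): requires G to have full row rank (n < m).
G_right = G.T @ np.linalg.inv(G @ G.T)

# Pseudoinverse via SVD; for a full-row-rank G it coincides with Eq. (7).
G_pinv = np.linalg.pinv(G)
print(np.allclose(G_right, G_pinv))          # True for this toy G

# Left inverse, Eq. (8), for the overdetermined case (n >= m): here applied
# to G.T, which has full column rank.
H = G.T
H_left = np.linalg.inv(H.T @ H) @ H.T
print(np.allclose(H_left, np.linalg.pinv(H)))  # also True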

Figure 1a shows an example (Example 1) of a one-dimensional (1-D) underdetermined inverse problem that mimics the relationship between distance, slowness (x), and travel time (d). The problem is described by 10 linear equations (Eq. (S1) in the supporting information), with 100 (m = 100) unknown parameters in x and 10 (n = 10) observations (d). The coefficient of the ith parameter (xi) in the jth equation (for dj) is the product of the length of the segment of xi that is traveled by the jth ray (Fig. 1a) and the sensitivity of dj with respect to xi. All of the coefficients are stored in G (Fig. 1b). Several coefficients in Eq. (S1) (or the entries in G) for the second and ninth observations are greater than one, which means that higher sensitivities are assigned to these parameters, given that the segment lengths of all of the parameters are equal to one (Fig. 1a). Figure 1c shows a synthetic x and its pseudo-inverse solution xD. The synthetic error-free observations d (δd = {0}) and the predictions of xD (dD) are shown in Fig. 1d.

Fig. 1

Inverse problem Example 1. This is a simple 1-D ray-propagation (linear) problem that is defined by either d = Gx or Eq. (S1). Arrow lines in (a) illustrate the ray paths. The observation vector d contains the travel times for all ten rays (dj; j = 1, …, 10). The vector x includes the slowness (reciprocal of speed) in each unit segment to be resolved (xi; i = 1, …, 100). The matrix G in (b) contains the combinations of the distance segments travelled by the ten rays and their synthetic sensitivities. Several parameters are set to a high sensitivity in the second and ninth observations, which simulates a study with parameters that possess different sensitivities. c Synthetic model x (circles) and solutions xD, x(I), and x(L1) (lines) obtained via generalized inversions, with zeroth- (I) and first-order (L1) Tikhonov regularizations included in the latter two inversions. d The synthetic observation data d (circles) from x, and the predictions dD, d(I), and d(L1) (lines) from the solutions xD, x(I) and x(L1), respectively

Both the data coverage (observation distribution) and sensitivities, which are stored in G, influence the solution in the synthetic example (Fig. 1). For example, the solution parameters from x1 to x10 are constrained by only one observation (the first travel path; the top left arrow in Fig. 1a), such that the nonzero entries for all ten parameters in the first row of G (Fig. 1b) and the coefficients in the first row of Eq. (S1) are the same and equal to one. Similarly, the parameters from x11 to x20 are also constrained by one observation (the second path), but the nonzero entries for the ten parameters in the second row of G (Fig. 1b) and the coefficients in the second row of Eq. (S1) are different. The resultant x1–x10 values in xD (Fig. 1c) are the same and equal to the average of the synthetic model parameters x1–x10 (circles in Fig. 1c), whereas the resultant x11–x20 values are quite different, both from each other and their respective synthetic values (x11–x20). The differences in x11–x20 are caused by different sensitivities because they all have the same path segments. Parameters x21–x40 and x41–x60 are constrained by two paths, but their path-overlapping patterns (Fig. 1a and b) differ. The differences in x21–x60 in xD (Fig. 1c) are related to the path coverage. Parameters x61–x90 are constrained by several observations. x91–x100 are not constrained by any observations, such that they are all equal to zero in xD.

However, most geophysical inverse problems are ill-posed (i.e., there are not sufficient observations to obtain a unique and stable solution), and the generalized solution xD, such as that in Example 1 (Fig. 1c), is not physically rational. Furthermore, large discrepancies may exist between the real model x and solution xD, especially for parameters with poor or no observation coverage (Fig. 1a–c), even though there is a good fit between the observation data and model predictions (Fig. 1d). Additional artificial constraints (often called regularization) (e.g., Aster et al. 2005; Benning and Burger 2018; Engl et al. 2000; Levenberg 1944; Menke 1989; Tikhonov 1963) on the model parameters must therefore be included during the inversion to obtain a physically rational solution. The regularization-based forward equation then becomes:

$${{\varvec{b}}} = {{\varvec{Ax}}},$$
(9)

where the nb × m matrix A and nb × 1 vector b are:

$$\begin{array}{*{20}c} {{{\varvec{A}}} = \left[ {\begin{array}{*{20}r} \hfill {{\varvec{G}}} \\ \hfill {{\varvec{C}}} \\ \end{array} } \right]} & {{\text{and}}} & {{{\varvec{b}}} = \left[ {\begin{array}{*{20}r} \hfill {{\varvec{d}}} \\ \hfill {{\varvec{c}}} \\ \end{array} } \right]} \\ \end{array} ,$$
(10)

respectively, which contain an nc × m (nc = nb − n) matrix C and an nc × 1 vector c, both of which are related to the regularization.

Tikhonov regularization (Levenberg 1944; Tikhonov 1963) is widely used in geophysical inversions (e.g., Aster et al. 2005; Constable et al. 1987; Menke 1989). C is denoted by λLn in nth-order Tikhonov regularization, and c (Eq. (10)) is a zero vector ({0}). The factor λ is the regularization parameter, which balances the contributions of the observations and the regularization in the inversion, i.e., the fit to the observational data d versus the fit to the regularization vector c. Tests that employ ad hoc methods (e.g., Craven and Wahba 1978; Hansen 1992; Morozov 1984) are often used to determine λ. The matrix L0 for zeroth-order Tikhonov regularization (damping regularization), which minimizes the model (Levenberg 1944), is the identity matrix I. Ln (n > 0; e.g., L1 in Eq. (S2) in the supporting information) is a zero-row-sum band matrix. L1 regularization (flatness regularization) flattens the model by minimizing the first-order gradient of the model. Ln Tikhonov regularization applies uniform a priori constraints to all of the solution parameters. Therefore, regularization (C = λLn) with an optimal weighting factor λ produces the optimal average regularizing effect for all of the parameters. However, if C contains a diagonal matrix W (= diag(w1, w2,…)) (i.e., C = λWLn), then the regularization is heterogeneous and spatially variant across the model (e.g., An 2020; Katamreddy and Yalavarthy 2012; Pogue et al. 1999; Sanny et al. 2018).
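As a concrete illustration of Eqs. (9) and (10), the Python sketch below assembles the regularized system for zeroth- or first-order Tikhonov regularization; the first-difference form of L1 used here is one common convention and is only assumed to correspond to the L1 of Eq. (S2).

import numpy as np

def tikhonov_system(G, d, lam, order=1):
    """Stack observations and Tikhonov regularization into b = A x (Eqs. 9-10).

    order=0 uses L0 = I (damping); order=1 uses a first-difference matrix
    (a common form of the flatness operator).  The regularization vector c
    is a zero vector.
    """
    n, m = G.shape
    L = np.eye(m) if order == 0 else np.diff(np.eye(m), axis=0)
    C = lam * L                                    # C = lambda * Ln
    A = np.vstack([G, C])                          # Eq. (10)
    b = np.concatenate([d, np.zeros(C.shape[0])])  # c = {0}
    return A, b, L

Each row of this first-difference L1 sums to zero, which is the zero-row-sum property exploited in Sect. 2.4.3.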

Regularization at least makes A a full-column-rank matrix, such that its generalized (left) inverse A−g, which is an m × nb matrix, can be uniquely obtained. The solution (x) of an inversion using regularization is:

$$\underline {{\varvec{x}}} = {{\varvec{A}}}^{ - {\text{g}}} \underline {{\varvec{b}}} .$$
(11)

where the vector b in Eq. (10) is now formed from the measured (error-contaminated) data rather than from the error-free data d. x can also be obtained via a least-squares or minimum-norm inversion using the objective function:

$$\min \left\| {\left[ {\begin{array}{*{20}c} {{\varvec{G}}} \\ {{\varvec{C}}} \\ \end{array} } \right]{{\varvec{x}}} - \left[ {\begin{array}{*{20}c} {{\varvec{d}}} \\ {{\varvec{c}}} \\ \end{array} } \right]} \right\|^2 .$$
(12)

When Tikhonov regularization (C = λL, c = {0}) is used, a truncated form of A−g (m × n matrix A−t) can be obtained via (e.g., Aster et al. 2005; Barmin et al. 2001; Crosson 1976):

$${{\varvec{A}}}^{ - {\text{t}}} = ({{\varvec{G}}}^{\text{T}} {{\varvec{G}}} + \lambda^2 {{\varvec{L}}}^{\text{T}} {{\varvec{L}}})^{ - 1} {{\varvec{G}}}^{\text{T}} .$$
(13)

x is then given by:

$$\underline {{\varvec{x}}} = {{\varvec{A}}}^{ - {\text{t}}} \underline {{\varvec{d}}} ,$$
(14)

which is the same as that in Eq. (11). x in Example 1 (Fig. 1a), which uses first-order Tikhonov regularization, is shown in Fig. 1c. An optimal factor λ of one is selected by Morozov’s discrepancy principle (Morozov 1984) for the inversion.
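For readers who wish to reproduce this type of inversion, the Python sketch below implements Eqs. (13) and (14), together with a crude grid search in the spirit of Morozov's discrepancy principle (retain the strongest regularization whose data misfit does not exceed the estimated noise level); the grid search is an illustrative assumption, not the selection procedure used in this study.

import numpy as np

def truncated_inverse(G, L, lam):
    """A^-t = (G^T G + lam^2 L^T L)^-1 G^T  (Eq. 13)."""
    return np.linalg.solve(G.T @ G + lam**2 * (L.T @ L), G.T)

def regularized_solution(G, d_obs, L, lam):
    """x = A^-t d  (Eq. 14)."""
    return truncated_inverse(G, L, lam) @ d_obs

def discrepancy_lambda(G, d_obs, L, noise_norm, lambdas):
    """Return the largest candidate lambda whose data misfit stays within the
    estimated noise norm (a simple reading of the discrepancy principle);
    lambdas is an iterable of positive trial values."""
    best = None
    for lam in sorted(lambdas):
        x = regularized_solution(G, d_obs, L, lam)
        # The data misfit grows as lam increases, so the last lam that still
        # fits the data within the noise level is kept.
        if np.linalg.norm(G @ x - d_obs) <= noise_norm:
            best = lam
    return best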

The resultant x via Eq. (11), which is obtained using regularization, is generally more rational than xD. For example, x (denoted as x(L1)) in Example 1 (Fig. 1c) is closer to the synthetic model than xD, such that x is preferred over xD (Eq. (5)) in practical inversions (i.e., the solution provided in a practical ill-posed inversion is x rather than xD).

If the errors (δd) in the measured observation data d̲ (= d + δd) are known, then the solution x in Eq. (11) becomes:

$$\underline {{\varvec{x}}} = {{\varvec{A}}}^{ - g} {{\varvec{b}}} + \delta {{\varvec{x}}}_{\text{d}} ,$$
(15)

where:

$$\delta {{\varvec{x}}}_{\text{d}} = {{\varvec{A}}}^{ - g} \left[ {\begin{array}{*{20}r} \hfill {\delta {{\varvec{d}}}} \\ \hfill 0 \\ \end{array} } \right] = {{\varvec{A}}}^{ - {\text{t}}} \delta {{\varvec{d}}}.$$
(16)

It is noted that δxd is only the contribution of the observational errors to the solution and is therefore not the solution error. The solution error or residual, δx (= x̲ − x), is related to both δd and the other factors in the x → x process.

2.3 Resolution Matrices from the Observations and Regularization Matrices

The process of obtaining the solution of a true medium (x) is a projection (r:x → x) from the medium to the solution, which can be described using Eq. (1) (Fig. 2a). If the projection is linear, then the solution can be written as a linear regression equation (Fig. 2a):

$$\underline {{\varvec{x}}} = {{\varvec{Rx}}} + \delta {{\varvec{x}}}_{{\text{of}}} ,$$
(17)

where R is the slope of the regression and δxof is a constant offset. If δxof is ignored, then the linear regression becomes the linear projection of Eq. (2) (Fig. 2a), where R is the so-called resolution (Backus and Gilbert 1968, 1970) or projection matrix. For a nonlinear problem, either R or R:x → x can be considered a linear approximation of r:x → x (Fig. 2a).

Fig. 2

Illustration of resolution matrices a in a general study and b in a study with offset error. The model x (= [x1]) in the illustration contains one parameter (x1). A nonlinear relation r:x → x between the solution and the true model (x) is approximated by a linear projection (x = Rx, Eq. (2)) and a linear regression (x = Rx + δxof, Eq. (17)). R is the slope of the linear relations

In practice, however, this matrix is obtained via matrix operations, and several different resolution matrices can result (An 2012). If the observational errors (δd) in the data d in Eq. (10) are ignored, then the measured data equal the error-free data d. Replacing the measured data in Eq. (5) with d from Eq. (6) causes the projection from x to xD (x → xD) to become:

$$\underline {{\varvec{x}}}_{\text{D}} = {{\varvec{R}}}_{\text{D}} {{\varvec{x}}},$$
(18)

where the resolution matrix RD (Table 1) is of the following form (e.g., Jackson 1972; Menke 1989; Wiggins 1972):

$${{\varvec{R}}}_{\text{D}} = {{\varvec{G}}}^{ - {\text{g}}} {{\varvec{G}}}.$$
(19)
Table 1 Resolution matrices

The transformation from x to x (x → x) is obtained by inserting Eq. (9) into Eq. (11):

$$\underline {{\varvec{x}}} = {{\varvec{R}}}_{\text{I}} {{\varvec{x}}},$$
(20)

where resolution matrix RI (Table 1) is of the form (An 2012):

$${{\varvec{R}}}_{\text{I}} = {{\varvec{A}}}^{ - g} {{\varvec{A}}}.$$
(21)

When the vector c is a zero vector (e.g., in Tikhonov regularization) and d in Eq. (11) is replaced by that in Eq. (6), the transformation from x to x (x → x) is:

$$\underline {{\varvec{x}}} = {{\varvec{R}}}_{\text{H}} {{\varvec{x}}},$$
(22)

where the resolution matrix (RH) (Table 1) takes the form:

$$\begin{array}{*{20}c} {{{\varvec{R}}}_{\text{H}} = {{\varvec{A}}}^{ - g} {{\varvec{B}}},} & {{{\varvec{B}}} = \left[ {\begin{array}{*{20}c} {{\varvec{G}}} \\ 0 \\ \end{array} } \right]} \\ \end{array},$$
(23)

if the truncated form A–t is used, then Eq. (23) becomes (An 2012):

$${{\varvec{R}}}_{\text{H}} = {{\varvec{A}}}^{ - {\text{t}}} {{\varvec{G}}},$$
(24)

if A−t is not truncated from A−g and instead calculated from Eq. (13), which employs Tikhonov regularization of the form λL, then RH becomes (e.g., Aster et al. 2005; Barmin et al. 2001; Boschi 2003; Crosson 1976):

$${{\varvec{R}}}_{\text{H}} = ({{\varvec{G}}}^{\text{T}} {{\varvec{G}}} + \lambda^2 {{\varvec{L}}}^{\text{T}} {{\varvec{L}}})^{ - 1} {{\varvec{G}}}^{\text{T}} {{\varvec{G}}}.$$
(25)

The three resolution matrices for Example 1 are shown in Fig. 3a–c: RD, RH for the inversion using the regularization matrix I (RH(I)), and RH for the inversion using the regularization matrix L1 (RH(L1)).

Fig. 3

The direct resolution matrix a RD, and hybrid resolution matrices b RH(I) and c RH(L1) for the inverse problem in Fig. 1a. A regularization parameter (λ) of one is used in the inversions. RH(I) and RH(L1) are for the inversions that employ I and L1 regularization matrices, respectively. d, e The 48th row and 48th column vectors of the matrices. The x48 parameter is constrained by the fifth observation, which overlaps with the sixth observation (Fig. 1a); the parameters from x40 to x60 are influenced by these two observations. The entries in row r48,* of RD (d) are zeros, with the exception of r48,41–r48,55 (positive) and r48,56–r48,60 (negative)
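The matrices compared in Fig. 3 can be reproduced for any toy problem with a few lines of Python; the sketch below (an illustration with arbitrary test matrices, not the code used for Example 1) evaluates RD from Eq. (19) via the pseudoinverse and RH from Eqs. (13) and (24).

import numpy as np

def resolution_matrices(G, L, lam):
    """Direct and hybrid resolution matrices (Eqs. 19 and 24/25)."""
    R_D = np.linalg.pinv(G) @ G                               # Eq. (19)
    A_t = np.linalg.solve(G.T @ G + lam**2 * (L.T @ L), G.T)  # Eq. (13)
    R_H = A_t @ G                                             # Eq. (24)
    return R_D, R_H

# Toy usage with a random G and a first-difference L1 (values illustrative only)
rng = np.random.default_rng(1)
G = rng.normal(size=(4, 10))
L1 = np.diff(np.eye(10), axis=0)
R_D, R_H = resolution_matrices(G, L1, lam=1.0)
print(R_D.shape, R_H.shape)   # both (10, 10)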

If G is a full-column-rank matrix and no regularization is used, then xD equals x, and RD, RI, and RH are all identity matrices. Otherwise, xD and x are different, and the three resolution matrices differ. Regularization needs to ensure that A is a full-column-rank matrix, such that RI is still an identity matrix. Owing to regularization, the practical solution is x rather than xD, and the resolution matrix for the x → x projection is RH rather than RD. RD has therefore received little attention in previous regularized inversions. However, RD is still very important for understanding the reliability of the solution in such inversions, as explained in the "Resolvability and constrainability from the resolution matrix" section.

Even though RD and RH are often different, they are both commonly called the model resolution matrix, which may confuse readers. The notations suggested by An (2012) for the matrices are adopted here for clarity, where RD is the direct resolution matrix, RI is the regularized resolution matrix, and RH is the hybrid resolution matrix.

2.4 Properties of the Resolution Matrices

2.4.1 RD from only the Observation Matrix

Truncated SVD of G allows RD to be written in the form (e.g., Aster et al. 2005; Jackson 1972; Wiggins 1972):

$${{\varvec{R}}}_{\text{D}} = {{\varvec{V}}}_\rho {{\varvec{V}}}_\rho^{\text{T}} ,$$
(26)

where Vρ (= {vi,j}m×ρ, ρ = rank(G)) is a column-orthonormal matrix composed of right singular vectors of G. Equation (26) indicates that RD is a Gram matrix (or Gramian), i.e., a matrix created by multiplying a matrix by its own transpose, as in Eq. (26).

The Gram matrix (e.g., Gentle 2007) RD (e.g., Fig. 3a) is symmetric (Table 2), with its rank (rank(RD)) equal to both its trace (trace(RD)) and the rank of Vρ (rank(Vρ) = ρ) (Eq. (26)). Here, rank(RD) equals ρ because rank(Vρ) is equal to rank(G). The diagonal elements of RD are nonnegative, but the off-diagonal elements of RD can be negative, unless VρT is a full-column-rank matrix (equivalently, Vρ is a full-row-rank matrix). However, Vρ for an underdetermined problem is not a full-row-rank matrix because ρ is smaller than m, such that the off-diagonal entries of RD for an underdetermined problem can be negative.

Table 2 Properties of typical resolution matrices
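These Gram-matrix properties are easy to verify numerically; the fragment below (an illustrative check with a random underdetermined G) builds RD from the truncated SVD as in Eq. (26) and tests its symmetry, the trace–rank equality, and the sign behavior of its entries.

import numpy as np

rng = np.random.default_rng(2)
G = rng.normal(size=(5, 12))              # underdetermined toy G (rank 5)

U, s, Vt = np.linalg.svd(G, full_matrices=False)
rho = np.linalg.matrix_rank(G)
V_rho = Vt[:rho].T                        # m x rho right singular vectors

R_D = V_rho @ V_rho.T                     # Eq. (26)
print(np.allclose(R_D, R_D.T))            # symmetric (Gram matrix)
print(np.isclose(np.trace(R_D), rho))     # trace(R_D) = rank(R_D) = rho
print((np.diag(R_D) >= 0).all())          # diagonal entries are nonnegative
print(R_D.min())                          # off-diagonal entries may be negative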

A negative ri,j entry appears in RD (Table 2) when two rows of G share a pattern like that of the band-matrix rows below (with 0 and nonzero numbers a to d):

$$\left[ {\begin{array}{*{20}c} {...}&a&{...}&b&{...}&0&{...} \\ {...}&0&{...}&c&{...}&d&{...} \\ \end{array} } \right].$$
(27)

The entries in G–g that are related to the nonzero entries a and d in G have signs opposite to those in G, which causes the negative ri,j. This happens in G when xi and xj are constrained by different observations that also constrain another parameter in common. For example, x48 and x58 are constrained by the fifth and sixth observations, respectively (Fig. 1a), but the two observations also constrain the solution parameters x52–x55 in common. Consequently, r48,58 in RD is negative (Fig. 3a).

2.4.2 RH with Uniform Regularization Using λI

When uniform zeroth-order Tikhonov regularization (λL0 or λI) is used, the SVD of G allows the generalized inverse A–t (Eq. (13)) to be written as (e.g., Menke 2012):

$${{\varvec{A}}}^{ - {\text{t}}} = {{\varvec{V}}}{(}{{\varvec{SS}}}{\bf{ + }}\lambda^2 {{\varvec{I}}}{)}^{ - {1}} {{\varvec{SU}}}^{\text{T}} ,$$
(28)

where U is a unitary matrix containing the left singular vectors of G, and S is a nonnegative diagonal matrix containing the singular values of G (si). The resolution matrix RH(λI) (Eq. (25)) can then be written as:

$$\begin{aligned} {{\varvec{R}}}_{\text{H}} (\lambda {{\varvec{I}}}) &= {{\varvec{V}}}{(}{{\varvec{SS}}}{\bf{ + }}\lambda^2 {{\varvec{I}}}{)}^{ - {1}} {{\varvec{SU}}}^{\text{T}} {{\varvec{USV}}}^T , \\ &= {{\varvec{V}}}{(}{{\varvec{SS}}}{\bf{ + }}\lambda^2 {{\varvec{I}}}{)}^{ - {1}} {{\varvec{SSV}}}^T \\ &= {{\varvec{VFV}}}^{\text{T}} = {{\varvec{V}}}_\rho {{\varvec{F}}}_\rho {{\varvec{V}}}_\rho^{\text{T}} \\ \end{aligned}$$
(29)

where Fρ (= diag(f1, f2, …, fρ)) (Aster et al. 2005) is the truncated form of F, with the positive constants (fi):

$$f_i = \frac{s_i^2 }{{s_i^2 + \lambda^2 }},$$
(30)

The positive diagonal matrix Fρ can be written as the product of Eρ (= diag(f1^1/2, f2^1/2, …, fρ^1/2)) and EρT. Equation (29) can then be written as:

$${{\varvec{R}}}_{\text{H}} (\lambda {{\varvec{I}}}) = {{\varvec{V}}}_\rho {{\varvec{E}}}_\rho {{\varvec{E}}}_\rho^{\text{T}} {{\varvec{V}}}_\rho^{\text{T}} = ({{\varvec{V}}}_\rho {{\varvec{E}}}_\rho )({{\varvec{V}}}_\rho {{\varvec{E}}}_\rho )^{\text{T}} ,$$
(31)

where:

$${{\varvec{V}}}_\rho {{\varvec{E}}} = \{ v_{i,j} f_j^{1/2} \} .$$
(32)

Equation (31) indicates that RH(λI) is a Gram matrix (Table 2), like RD. However, when a spatially variant regularization of the form λWI is used, the matrix RH(λWI) cannot be written as the product of a matrix and its own transpose, and it is therefore not Gramian.

RH(λI) (e.g., Fig. 3b) is symmetric, like RD. RH(λI) has a rank that is equal to rank(VρEρ) (Eq. (31)). The diagonal entries of RH(λI) are nonnegative, but the other entries can be negative. Equation (30) indicates that fj^1/2 lies in the range (0,1). Equation (32) indicates that a given column (e.g., the jth column) of VρEρ equals the same column vector of Vρ multiplied by fj^1/2. All of the entries in VρEρ are therefore closer to zero than the corresponding entries in Vρ, as fj^1/2 is a positive value that is less than one. All of the entries in RH(λI) consequently have weaker intensities than those in RD, but they possess similar intensity patterns, as observed in comparisons of Fig. 3a and b and of Fig. 3d and e. Furthermore, trace(RH(λI)) is smaller than trace(RD), such that the resolvability of x decreases after regularization using λI, as explained in the "Resolvability and constrainability from the resolution matrix" section.
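Equations (29)–(31) can also be verified directly: the toy fragment below compares RH(λI) computed from Eq. (25) with the filter-factor form VρFρVρT, where the fi are given by Eq. (30); the test matrix is arbitrary.

import numpy as np

rng = np.random.default_rng(3)
G = rng.normal(size=(5, 12))
lam, m = 1.0, G.shape[1]

U, s, Vt = np.linalg.svd(G, full_matrices=False)
f = s**2 / (s**2 + lam**2)                            # filter factors, Eq. (30)
R_H_svd = (Vt.T * f) @ Vt                             # V_rho F_rho V_rho^T, Eq. (29)

A_t = np.linalg.solve(G.T @ G + lam**2 * np.eye(m), G.T)   # Eq. (13) with L = I
R_H = A_t @ G                                              # Eq. (25)

print(np.allclose(R_H, R_H_svd))                      # the two forms agree
print(np.allclose(R_H, R_H.T))                        # symmetric (Gram matrix)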

2.4.3 RH Using a Zero-Row-Sum Regularization Matrix

The matrices of derivative regularizations, e.g., the high-order Tikhonov regularization matrices λLn and λWLn (n > 0), are zero-row-sum ns × m matrices (S0) (ns > 0), such that

$${{\varvec{S}}}^0 {\bf{1}} = {\bf{0,}}$$
(33)

where 1 (= {1}m×1) is a vector with all elements equal to one and 0 is a zero vector. Regardless of the number (ns) of rows in S0, the relation below:

$$\begin{aligned} ({{\varvec{G}}}^{\text{T}} {{\varvec{G}}} + \lambda^2 ({{\varvec{S}}}^0 )^{\text{T}} {{\varvec{S}}}^0 {\bf{)1}} &= {{\varvec{G}}}^{\text{T}} {{\varvec{G}}}{\bf{1}} + \lambda^2 ({{\varvec{S}}}^0 )^{\text{T}} ({{\varvec{S}}}^0 {\bf{1}}{)} \\ &= {{\varvec{G}}}^{\text{T}} {{\varvec{G}}}{\bf{1}} + \lambda^2 ({{\varvec{S}}}^0 )^{\text{T}} {\bf{0}} \\ &= {{\varvec{G}}}^{\text{T}} {{\varvec{G}}}{\bf{1}} \\ \end{aligned},$$
(34)

exists. If the regularization matrix (C, Eq. (10)) is a zero-row-sum matrix (S0), then Eq. (34) leads to the following relation for the resolution matrix RH (Eqs. (24) and (25)):

$$\begin{aligned} {{\varvec{R}}}_{\text{H}} {\bf{1}} &= {{\varvec{A}}}^{ - {\text{t}}} {{\varvec{G}}}{\bf{1}}{\bf{}} \\ &= ({{\varvec{G}}}^{\text{T}} {{\varvec{G}}} + \lambda^2 ({{\varvec{S}}}^0 )^{\text{T}} {{\varvec{S}}}^0 )^{ - 1} {{\varvec{G}}}^{\text{T}} {{\varvec{G}}}{\bf{1}} \\ &= ({{\varvec{G}}}^{\text{T}} {{\varvec{G}}} + \lambda^2 ({{\varvec{S}}}^0 )^{\text{T}} {{\varvec{S}}}^0 )^{ - 1} ({{\varvec{G}}}^{\text{T}} {{\varvec{G}}} + \lambda^2 ({{\varvec{S}}}^0 )^{\text{T}} {{\varvec{S}}}^0 ){\bf{1}} \\ &= {\bf{1}} \\ \end{aligned},$$
(35)

therefore, when the regularization matrix is a zero-row-sum matrix (S0), the resolution matrix RH is a one-row-sum matrix S1 (i.e., S1·1 = 1). This can be summarized as:

$$\left[ {\begin{array}{*{20}c} {{\varvec{G}}} \\ {{{\varvec{S}}}^0 } \\ \end{array} } \right]^{ - {\text{g}}} \left[ {\begin{array}{*{20}c} {{\varvec{G}}} \\ {\bf{0}} \\ \end{array} } \right] = \left[ {\begin{array}{*{20}c} {{\varvec{G}}} \\ {{{\varvec{S}}}^0 } \\ \end{array} } \right]^{ - {\text{t}}} {{\varvec{G}}} = {{\varvec{S}}}^1 ,$$
(36)

where 0 is a zero matrix.

The high-order Tikhonov regularization matrices λLn and λWLn (n > 0) (e.g., flatness and smoothness) are both zero-row-sum S0 matrices. Therefore, RH(λLn) and RH(λWLn) are S1 matrices (i.e., all of the rows in RH(λLn) and RH(λWLn) sum to one; Table 2). All of the rows in RH(L1) in Fig. 3c sum to one, as shown in Fig. 4a.

Fig. 4

a Row-vector sums and b main-diagonal elements of the resolution matrices RD and RH in Fig. 3

The resolution matrix RH(S0) is similar to a stochastic matrix (a square matrix with nonnegative elements and each row summing to one), but the entries in RH (or S1) can be negative for the same reason given above for RD. The S0 matrix (often a band matrix), or the matrix combining S0 with G (Eq. (36), or A in Eq. (10)), often includes two rows like those in Eq. (27); i.e., the regularization always yields two parameters (e.g., x1 and x11) that are constrained by different rows, one (x1) by an observation (the first row of G) and the other (x11) by another observation or a regularization row (the tenth row of L1 in Eq. (S2)), with the two rows also constraining a third parameter (x10) in common. The entries in A−t and A−g that are related to these parameters often have signs opposite to those in G, and the r1,11 entry in RH(L1) (Fig. 3c) is consequently negative. Therefore, the application of Ln regularization can cause more negative entries in RH(Ln) (e.g., Fig. 3c) than in RD (Fig. 3a), and RH is thus not a stochastic matrix.

A one-row-sum matrix RH(S1) implies that one is an eigenvalue of the projection RH, such that there exists an equation:

$${{\varvec{R}}}_{\text{H}} {{\varvec{x}}}_{\text{p}} = 1{{\varvec{x}}}_{\text{p}},$$
(37)

Eqs. (22) and (37) indicate that such a medium (xp) can be fully resolved, with the solution equal to the medium (xp = RHxp), even though xp is not the medium that is currently being measured. In contrast, it is impossible for all of the row sums in RH(λI) to equal one, which demonstrates that higher-order Tikhonov regularization is superior to damping (or λI) regularization from the viewpoint of row sums.
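Both results of this subsection (a zero-row-sum regularization matrix yields a one-row-sum RH, and a uniform model is then fully recovered) can be confirmed with a short toy computation; the first-difference L1 below is an assumed form of the flatness operator, and the test values are arbitrary.

import numpy as np

rng = np.random.default_rng(4)
G = rng.normal(size=(5, 12))
L1 = np.diff(np.eye(12), axis=0)           # zero-row-sum flatness operator
lam = 1.0

A_t = np.linalg.solve(G.T @ G + lam**2 * (L1.T @ L1), G.T)
R_H = A_t @ G

print(np.allclose(L1.sum(axis=1), 0.0))    # S0 1 = 0, Eq. (33)
print(np.allclose(R_H.sum(axis=1), 1.0))   # R_H 1 = 1, Eq. (35)

x_p = np.full(12, 3.0)                     # a uniform (constant) model
print(np.allclose(R_H @ x_p, x_p))         # R_H x_p = x_p, Eq. (37)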

2.4.4 RH Using Mixed Regularization

If the regularization mixes higher-order regularizations (e.g., L2 and L3), then the combined regularization matrix is still an S0 matrix, and RH is an S1 matrix. However, if the regularization mixes damping and a higher-order Ln (n > 0) (e.g., Sigloch 2011; Tewarson 1977), then the mixed regularization matrix C is neither a diagonal matrix nor a zero-row-sum matrix. Therefore, the new RH is neither a symmetric (Gram) matrix nor a one-row-sum matrix.

2.5 Significance of the Resolution Matrices

2.5.1 Row Vector = Content Function of the Medium

Equation (2) highlights that a solution parameter xi can be expressed as a weighted sum of all of the model parameters in x (or [x1, x2, …xm]T):

$$\underline x_i = \sum_{j = 1}^m {r_{i,j} x_j } ,$$
(38)

where the ri,j entry of matrix R plays a role in weighting the jth model parameter xj in the summation. The ri,* row vector of R therefore acts like an averaging vector (Backus and Gilbert 1968), with the entries in ri,* representing the accurate contents (or contributions) of all of the medium parameters to (or in) the ith solution parameter xi. Therefore, ri,* can also be termed the content (or contribution) function of the medium in xi (Table 3).

Table 3 Vectors of resolution matrix

2.5.2 Column Vector = Spreading Function of the Medium

The ri,j entry signifies the contribution of the medium parameter xj to the solution parameter xi, such that all of the entries in the jth column vector (r*,j) (Table 3) correspond to the contributions of the jth model parameter (xj) to all of the solution parameters, i.e., the spread of xj into those parameters. This vector has been considered the Green's function (also termed the point spread function (Smith 1997) or impulse response function) of xj in the solution.

2.5.3 Significance of the Matrices

The matrices RD, RI, and RH are all slopes of the linear projection from x to x, but they carry different significance (Table 1) for the transformation. Regularization at least makes matrix A full column rank (left-invertible); i.e., RI should be an identity matrix. If the x → x projection matrix RI is an identity matrix, then a unique solution (x) can be obtained from a given G and C. Otherwise, some of the model parameters remain poorly constrained, and further regularization should be employed.

RD is produced from G alone (Eq. (19)). Therefore, RD represents the effects of, and contributions from, the given observations on the solution, regardless of whether regularization is used.

RH only reflects the x → x projection, with no consideration of other factors (e.g., errors) in the matrix. RH can therefore evaluate the reliability of x. Furthermore, the construction of RH as a mixture of the observations (G) and regularization (λC in A) (Eq. (23)) means that it reflects some combination of the observational and regularization effects in the solution. The variations or differences between RH and RD therefore represent the effects of regularization on the inversion, as RD only reflects the observational effects on the solution.

All three matrices, especially RH and RD, are therefore essential for understanding the inversion and its result, such as the resolvability of x, the uniqueness and reliability of x, and the effects of regularization.

2.5.4 Column Vector Variations Due to Regularization Changes

The role of regularization on the projection from x to x can be revealed via a comparison of the resolution matrices RD and RH. One entry (ri,j) of either RD in Eq. (19) or RH in Eq. (24) can be written as:

$$r_{i,j} = \sum_k {u_{i,k} g_{k,j} } ,$$
(39)

where ui,k represents an entry of either G−g or A−t, and gk,j is an entry of G. RH (Eq. (24)) can be considered RD (Eq. (19)) with the ui,k values changed by replacing G−g with A−t. Equation (39) indicates that if the g*,j column vector of G is given, then a variation in ui,k only influences the jth column vector (r*,j) of R. Therefore, the addition of regularization (C) in A (Eq. (10)) allows the jth column vector in RH to be considered a function of only the jth column in RD (with no relationship arising among the other columns). Regularization essentially changes the spread functions from RD to RH, such that the spread function is sensitive to regularization (Table 3).

These column-vector variations due to regularization changes are well illustrated by comparing RD and RH(L1) (Fig. 5c and d, respectively) for a linear inverse problem example (Example 2) with three-point observations (G in Fig. 5a) and 50 unknowns in x (Fig. 5a and b). The L1 regularization matrix and λ = 1 are used in the inversion. If a column in RD (e.g., r*,5) is all zeros (Fig. 5c), then the corresponding column vector in RH (r*,5) (Fig. 5d) is also all zeros. In contrast, the tenth column in RD has one nonzero entry (r10,10), and the tenth column vector in RH(L1) (r*,10) (Fig. 5c and d) has nonzero entries around the r10,10 entry. The variations between a column in RD and the corresponding column in RH are due to regularization. Similar results can be found via a comparison of RH(L1) and RD (Fig. 3) for Example 1 (Fig. 1).

Fig. 5

Resolution matrices for linear inverse problem Example 2. a The observation matrix G. The observations consist of only three points at parameters x10, x30, and x32. Only one entry in each row in G is nonzero. b A synthetic model and its corresponding solution, which was constrained using first-order Tikhonov regularization and λ = 1. c, d Resolution matrices RD and RH. RD has only three nonzero entries. RH has more nonzero entries, but they are in the same columns (10, 30, and 32) as those in RD

Comparisons of the column vectors of RD and RH for the same parameter xj can reveal the regularization effect, as the variations in the spread functions from RD to RH are due to regularization. Here, the jth column vector in RH is only related to the jth column in RD, such that the relationship between the relative magnitudes of neighboring entries in a row vector of RD may be preserved in RH. A given row vector in RH may exhibit a similar pattern or curve to that in RD. For example, the curves of the r48,* row vectors in RD and RH (Fig. 3d) are similar, but those of the r*,48 column vectors (Fig. 3e) are very different.

2.6 Do the Projection Matrices Represent Practical Projections?

The projection r:x → x (Eq. (1)) includes the effects of all of the factors (e.g., uncertainty in observation d and prior information c (Menke 2015), Eq. (12)) in the process from x to x, but RD, RI, and RH do not. The resolution estimated from RH can be unrealistically higher than that obtained via synthetic tests (Pilkington 2016), which may be due to the limitation of the projection matrix.

2.6.1 RH Versus Observational Errors

The observational errors δd influence the solution x. When δd is considered, Eq. (2) becomes:

$$\underline {{\varvec{x}}} = {{\varvec{R}}}(\delta {{\varvec{d}}}){{\varvec{x}}},$$
(40)

where R(δd) represents R as a function of δd, with R(δd = 0) equaling R for the error-free case.

Inserting Eq. (6) into Eq. (15) then yields a projection that is similar to the form in Eq. (17):

$$\underline {{\varvec{x}}} = {{\varvec{R}}}_{\text{H}} {{\varvec{x}}} + \delta {{\varvec{x}}}_{\text{d}},$$
(41)

Eq. (41) shows that RH is independent of δxd, which represents the effect of δd on the solution. The error ranges can, however, be used to weight the data in an inversion that employs a weighted least-squares (WLS) approach; the RH determined via a WLS inversion is slightly different from the above RH, but it is also not R(δd). Therefore, RH does not include the effect of observational errors.

2.6.2 Resolution Matrix of the Full Process from x to x?

Is there a matrix that reflects all of the factors in the x → x process? If x is the result of a process that incorporates all of the factors, then the resolution matrix R that is directly inverted from the model and the solution via Eq. (2) reflects all of these factors. However, such an inversion is generally impossible in geophysics: the true model for a region (x) may never be known, and conversely, the matrix R for a region with a known x would be redundant and unnecessary.

Equation (2) represents a transformation from the real model x to the corresponding solution, i.e., a process via the projection r:x → x (or R). This projection is independent of any particular model and can therefore be isolated from the practical true model and the practical solution. If x is not the true medium but rather a synthetic model, then the inversion of R is possible.

Recovery tests, e.g., checkerboard tests (Lévêque et al. 1993), that employ a synthetic model with a specific structure are frequently used to retrieve a qualitative resolution. The output solution of a synthetic test that employs an input model with a random structure also contains resolution information (An 2012). An (2012) suggested that the statistical resolution matrix RS (Table 1) can be determined by statistically comparing a limited number of input synthetic random models and their corresponding output solutions via a Gaussian function approximation of each row vector.

The output solution x includes all of the known factors in the x → x process, with RS including all of these factors. However, the matrix is only approximate and not necessarily accurate. If a large number of x and x are given, then an accurate and complete resolution matrix can be obtained on the basis of Eq. (2), as discussed in the following section.

3 Resolution Matrices of the Complete Process

An accurate resolution matrix that includes all of the factors in the complete x → x projection process is suggested in this section.

3.1 Method

Equation (2) has to be reorganized for the inversion of R because R is an m × m matrix, and both x and x are m × 1 vectors. The extended form of Eq. (2) is:

$$\left\{ {\begin{array}{*{20}r} \hfill {\underline x_1 = } & \hfill {r_{{1},{1}} x_1 + } & \hfill {r_{{1},2} x_2 + } & \hfill { \cdot \cdot \cdot + } & \hfill {r_{{1},m} x_m } \\ \hfill {\underline x_2 = } & \hfill {r_{{2},{1}} x_1 + } & \hfill {r_{{2},2} x_2 + } & \hfill { \cdot \cdot \cdot + } & \hfill {r_{{2},m} x_m } \\ \hfill \vdots & \hfill {} & \hfill {} & \hfill {} & \hfill {} \\ \hfill {\underline x_m = } & \hfill {r_{m,{1}} x_1 + } & \hfill {r_{m,2} x_2 + } & \hfill { \cdot \cdot \cdot + } & \hfill {r_{m,m} x_m } \\ \end{array} } \right.,$$
(42)

where ri,j is the element at the ith row and jth column of R. The equation is reorganized as:

$$\left\{ {\begin{array}{*{20}r} \hfill {\underline x_1 = } & \hfill {x_1 r_{{1},{1}} + } & \hfill {x_2 r_{{1},2} + } & \hfill { \cdot \cdot \cdot + } & \hfill {x_m r_{{1},m} } & \hfill {} & \hfill {} & \hfill {} & \hfill {} & \hfill {} & \hfill {} & \hfill {} \\ \hfill {\underline x_2 = } & \hfill {} & \hfill {} & \hfill {} & \hfill {} & \hfill {x_1 r_{{2},{1}} + } & \hfill { \cdot \cdot \cdot + } & \hfill {x_m r_{{2},m} } & \hfill {} & \hfill {} & \hfill {} & \hfill {} \\ \hfill \vdots & \hfill {} & \hfill {} & \hfill {} & \hfill {} & \hfill {} & \hfill {} & \hfill {} & \hfill \ddots & \hfill {} & \hfill {} & \hfill {} \\ \hfill {\underline x_m = } & \hfill {} & \hfill {} & \hfill {} & \hfill {} & \hfill {} & \hfill {} & \hfill {} & \hfill {} & \hfill {x_1 r_{m,{1}} + } & \hfill { \cdot \cdot \cdot + } & \hfill {x_m r_{m,m} } \\ \end{array} } \right..$$
(43)

The compact form of Eq. (43) is:

$$\underline {{\varvec{x}}} = {{\varvec{Xr}}},$$
(44)

where X is a band matrix that is composed of xT vectors and r is a vectorization of RT:

$${{\varvec{X}}}{ = }\left[ {\begin{array}{*{20}c} {{{\varvec{x}}}^{\text{T}} } & {} & {} & 0 \\ {} & {{{\varvec{x}}}^{\text{T}} } & {} & {} \\ {} & {} & \ddots & {} \\ 0 & {} & {} & {{{\varvec{x}}}^{\text{T}} } \\ \end{array} } \right]_{m \times m^2 }$$
(45)

and:

$${{\varvec{r}}} = {\text{vec}}({{\varvec{R}}}^{\text{T}} ) = {[}\begin{array}{*{20}c} {r_{1,1} } & {r_{1,2} } & { \cdot \cdot \cdot } & {r_{1,m} } & {r_{2,1} } & { \cdot \cdot \cdot } & {r_{2,m} } & { \cdot \cdot \cdot } & {r_{m,1} } & { \cdot \cdot \cdot } & {r_{m,m} } \\ \end{array} {]}^{\text{T}} .$$
(46)

Unlike in Eq. (2), the solution in Eq. (44) is a dependent variable of r rather than of x. If we have a real model (X) and its corresponding solution, then Eq. (44) provides equations for the resolution vector r; however, r contains m² unknowns, whereas a single model pair provides only m equations (because the solution has only m elements).
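The reorganization in Eqs. (44)–(46) simply writes X as a block-diagonal matrix of xT blocks; the small check below, with an arbitrary R and x chosen purely for illustration, confirms that Xr reproduces Rx.

import numpy as np

rng = np.random.default_rng(5)
m = 4
R = rng.normal(size=(m, m))        # an arbitrary "resolution matrix"
x = rng.normal(size=m)             # an arbitrary model vector

X = np.kron(np.eye(m), x)          # m x m^2 band matrix of x^T blocks, Eq. (45)
r = R.reshape(-1)                  # vec(R^T): row-wise stacking of R, Eq. (46)

print(np.allclose(X @ r, R @ x))   # Eq. (44) reproduces Eq. (2)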

Application of the projection to a random synthetic model xk outputs the corresponding solution xk. The model Xk (a band matrix for xk, Eq. (45)) and xk still satisfy Eq. (44). One can therefore obtain N solutions (x1, x2, …, xN) by performing the same projection for N different random synthetic models (X1, X2,…, XN), and then all of the solutions can be used to construct a new equation from N equations, following Eq. (44):

$$\{ \underline {{\varvec{x}}} \} = \{ {{\varvec{X}}}\} {{\varvec{r}}},$$
(47)

where:

$$\begin{array}{*{20}c} {\{ \underline {{\varvec{x}}} \} = \left[ {\begin{array}{*{20}c} {\underline {{\varvec{x}}}^1 } \\ {\underline {{\varvec{x}}}^2 } \\ \vdots \\ {\underline {{\varvec{x}}}^N } \\ \end{array} } \right]} & {{\text{and}}} & {\{ {{\varvec{X}}}\} = \left[ {\begin{array}{*{20}c} {{{\varvec{X}}}^1 } \\ {{{\varvec{X}}}^2 } \\ \vdots \\ {{{\varvec{X}}}^N } \\ \end{array} } \right]} \\ \end{array},$$
(48)

the extended form of Eq. (47) is shown in Eq. (S4) in the supporting information. One synthetic model (Xk) produces m equations, as outlined in Eq. (43), with Eq. (47) including N × m equations for N synthetic models. If Eq. (47) is constructed of m² independent equations, then r will be uniquely resolvable via:

$${{\varvec{r}}} = \{ {{\varvec{X}}}\}^{ - 1} \{ \underline {{\varvec{x}}} \},$$
(49)

a resolution matrix R is then obtained by converting the vector r back into a matrix.

The new matrix R is obtained via either Eq. (49) or (2) without any approximation. The synthetic solutions (xk) are the result of the complete x → x process with all of the factors. Therefore, the resultant R reflects all of the factors (various errors, simplification, etc.) in the complete process, and is termed the complete resolution matrix, which is denoted RC (Table 1).

This extraction of RC from random synthetic input models and output solutions is similar to that proposed by An (2012). An (2012) focused on the extraction of a reliable resolution length from a small number of input models and output solutions to construct the approximate resolution matrix RS (Table 1). Here, an accurate resolution matrix (RC) is directly inverted without approximation. Various procedures, including linear and nonlinear inverse problems and even non-inverse problems (An 2012), can be implemented to obtain RC (Fig. 6) and RS, as the extraction is isolated from the internal details of the x → x process. For example, RS can be obtained for kriging and minimum-curvature gridding (Chiao et al. 2014). The extraction of RC via Eq. (47) is a linear regression of the relationship between the input models and output solutions (Fig. 2a), such that RC can be considered a linear approximation of r:x → x for either a nonlinear inverse problem or a non-inverse problem.

Fig. 6

Flowchart for obtaining the complete resolution matrix (RC) from synthetic random models and solutions. r is the vector form of RC

3.2 Equation Simplification

Resolving RC via Eq. (49) requires the inversion of a large matrix with ≥ m² rows and m² columns. However, the calculation can be simplified.

Equations (42) and (43) state that the ith solution parameter xik of the solution xk can be written as:

$$\underline{x}_i^k = {{\varvec{R}}}_i {{\varvec{x}}}^k = ({{\varvec{x}}}^k )^{\text{T}} {{\varvec{R}}}_i^{\text{T}} ,$$
(50)

where Ri is the ith row of R (ri,*). All of the equations for the parameters from xi1 to xiN form:

$$\{ \underline{x}_i \} = \{ {{\varvec{x}}}^{\text{T}} \} {{\varvec{R}}}_i^{\text{T}} ,$$
(51)

where:

$$\begin{array}{*{20}c} {\{ \underline{x}_i \} = \left[ {\begin{array}{*{20}l} {\underline{x}_i^1 } \hfill \\ {\underline{x}_i^2 } \hfill \\ \vdots \hfill \\ {\underline{x}_i^N } \hfill \\ \end{array} } \right]} & {{\text{and}}} & {\{ {{\varvec{x}}}^{\text{T}} \} = \left[ {\begin{array}{*{20}l} {({{\varvec{x}}}^1 )^{\text{T}} } \hfill \\ {({{\varvec{x}}}^2 )^{\text{T}} } \hfill \\ \vdots \hfill \\ {({{\varvec{x}}}^N )^{\text{T}} } \hfill \\ \end{array} } \right]} \\ \end{array} .$$
(52)

Equation (51) is actually the same as Eq. (S4), but it only contains the rows for the solution parameter xi. Equation (51) can then be used to invert the ith row of RC (Ri) from:

$${{\varvec{R}}}_i^{\text{T}} = \{ {{\varvec{x}}}^{\text{T}} \}^{ - 1} \{ \underline{x}_i \},$$
(53)

the other rows of RC can also be inverted using Eq. (53).

The inversion of RC via Eq. (53) is easier than that via Eq. (49). Equation (49) requires the (left) inverse of the matrix {X}, which has N × m rows and m² columns, whereas Eq. (53) requires the (left) inverse of the much smaller (N × m) matrix {xT}. Furthermore, the inverse of {xT} used for RiT in Eq. (53) is the same for all of the other rows (e.g., RjT), and it can therefore be computed once and used directly in the calculation of all of the rows of RC.
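In practice, Eq. (53) can be solved for all rows of RC at once as a single least-squares problem. The Python sketch below assumes that the user supplies a list of (input model, output solution) pairs produced by whatever complete processing chain is being appraised; the function name and interface are hypothetical.

import numpy as np

def complete_resolution_matrix(model_pairs):
    """Estimate R_C from N (x_in, x_out) model pairs via Eq. (53).

    model_pairs : list of (x_in, x_out) arrays of length m, where x_out is the
    solution returned by the complete processing chain for the random
    synthetic model x_in.  N >= m (preferably more) independent pairs are
    required.
    """
    X_in = np.vstack([x_in for x_in, _ in model_pairs])      # {x^T}, N x m
    X_out = np.vstack([x_out for _, x_out in model_pairs])   # solutions, N x m
    # Solve {x^T} R^T = {x_out} in the least-squares sense for all rows at once.
    R_T, *_ = np.linalg.lstsq(X_in, X_out, rcond=None)
    return R_T.T                                             # R_C, m x m

For a linear, error-free chain, this estimate reproduces RH, consistent with the comparison of Figs. 3c and 7b.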

For comparison, we derived RC for Example 1 (Fig. 1) using the same regularization as was used to obtain RH(L1) (Fig. 3c) and without considering observational errors (δd). The solutions for 100 (N = 100) different synthetic random models were obtained in step 1 (Fig. 6) of the calculation. The resultant RC (Fig. 7b) is the same as RH (Fig. 3c) for this linear inverse problem. However, RC can reflect all of the factors in a practical study, whereas RH only reflects the effects of G and the regularization.

Fig. 7

Resolution matrix RC for the 1-D inverse problem in Fig. 1. a Example synthetic (true) random model (input model) and its corresponding solution (output model) for the inverse problem. b Matrix RC or R-of. The resolution matrix RC is calculated from 100 pairs of random input and output models, as in (a). Flatness regularization, with λ = 1.0 used. R-of is the resolution matrix after the offset errors are isolated (Eq. (55)). c, d Two row vectors and e, f two column vectors of RC in (a). Red lines in (c–f) are Gaussian approximations of the vectors around the diagonal entry. The row and column vectors for the 48th parameter can be represented by Gaussian function curves (c and e, respectively), whereas those for the 95th parameter are far from Gaussian curves (d and f, respectively). The column vector (r*,95) (f) is all zeros. The widths of the Gaussian approximation curves (c–e) can represent the widths of the vector curves

3.3 Resolution Matrices with Error Effects

All measurements and processing include errors, which influence the solution and thus the x → x projection. RC can include the effects of various (quantifiable and unquantifiable) errors and of additional prior information (such as C and c in Eq. (10)). For example, system simplification may cause errors in the solution that are often difficult to quantify. If the synthetic solutions are obtained through such a simplified system, the resulting RC will reflect the error related to the simplification. However, for the sake of comparison, examples of the effects of quantified data errors are given below.

When the observational errors δd are considered, the equation with R(δd) (Eq. (40)) is of the same form as Eq. (2). Therefore, the above method (Fig. 6) for obtaining the complete resolution matrix RC via either Eq. (49) or (53) on the basis of Eq. (2) can be used to obtain R(δd) (Eq. (40)). One pair of models serves as an independent measurement of R in this method. However, the addition of more factors in the processing than those contained in G and C makes the relationship between the model and the solution more complex, such that a larger number (> m) of model pairs is often necessary to obtain a reliable RC.

The traditional resolution matrix RH (Eq. (41)) cannot include the effect of observational errors and thus equals R(0), or R(δd = 0). The resolution matrix (RC or R(δd)) that includes the effect of observational errors is different from RH. An equation for their difference can be obtained from Eqs. (40) and (41):

$$\delta \underline {{\varvec{x}}} = ({{\varvec{R}}}(\delta {{\varvec{d}}}) - {{\varvec{R}}}_{\text{H}} ){{\varvec{x}}},$$
(54)

Eq. (54) indicates that the difference (R(δd) − RH), or (R(δd) − R(0)), is the projection matrix that maps the model onto the solution error (δx).

Random errors exist in practical studies. The influence of random errors on the solution diminishes as the number of observations increases. However, if observations are limited, then the average effect of the random errors appears as a regular systematic error; we therefore only test systematic errors here. A stronger regularization (larger λ) is normally required for inversions with observational errors. However, we used the same λ that was applied in the above error-free inversion for comparison.

Two main types of systematic errors, offset and scale factor errors, are tested here. Offset observational errors in a linear inversion will introduce offset errors in x (Fig. 2b). The x → x process with offset errors in x can be better represented by Eq. (17) than Eq. (40). Equation (17) has more variables (R and δxof) than Eq. (40), such that the process with offset errors is more complex and requires a larger number of model pairs to obtain R(δd). If we still invert for R(δd) via Eq. (40) using the same number (N = 100) of model pairs (xk, xk) in Fig. 7, then the resultant matrix is somewhat unstable (Fig. 8a). The column vector is somewhat stable (Fig. 8b) due to regularization (L1) because flatness regularization directly influences the column vectors, but not the row vectors. R(δd) generally becomes stable when a larger number (e.g., N = 200) of model pairs (Fig. 8a) is used.

Fig. 8

a Row and b column vectors of the resolution matrices for a process that incorporates the offset observational errors calculated via Eq. (40) using 100 (N = 100) and 200 (N = 200) model pairs. The row vector of the matrix obtained from 100 model pairs is unstable in the row vector (a). The column vector (b) is less influenced by the errors because the L1 regularization used here mainly affects the column vectors via a smoothing process, whereas the row vectors are minimally affected

Scale factor observational errors in a linear inversion will introduce scale factor errors in x. The x → x process with scale factor errors in x can be well represented by Eq. (40), with R(δd) equaling RH multiplied by the factors related to the scale factor errors.

An offset error of 2.0 s (δd1 = 2.0 s) and a scale factor error of 20% (δd1 = 0.2d1 ≈ 2.5 s) in the first observation (d1) are considered; the synthetic observations have a mean of ~ 12.5 s. Two hundred (N = 200) model pairs are used, and the resultant matrices R(δd) are shown in Figure S1. The matrix differences, R(δd) − RH(L1), which reflect the observational error effects in the projection from x to x, are shown in Fig. 9c and d.

Fig. 9

Complete resolution matrices for processes considering (left) a 2.0 offset error and (right) 20% scale factor error in the first observation data (d1) for the inverse problem in Fig. 1a. a, b Example synthetic model and corresponding solution. c, d Resolution matrix difference E (= R(δd) – R(0)). e, f Row sums of R(δd). R(δd) is the projection matrix estimated by Eq. (40), which incorporates the observational errors in d1. R(0) is the same as in Fig. 7b. PE = parameters constrained by erroneous observation d1

These two types of observational errors yielded similar errors (δxd) in the solution parameters (x1–x25) in x(L1) (Fig. 9a, b), resulting in similar summation curves for the 1st–25th content vectors in R(δd) (Fig. 9e, f). However, the errors produced different effects in the projection matrices R(δd). The row vector of R(δd) for a solution parameter with errors (e.g., x5) in Fig. 9c is very different from that in Fig. 9d. The addition of the scale factor error δd1 (= 0.2d1) only caused resolution matrix changes in the 1st–10th columns (Fig. 9d) (i.e., only the contributions of x1–x10, which are constrained by d1, are influenced, as expected). However, the offset error δd1 (= 2.0) caused changes in all of the columns (Fig. 9c). Therefore, the spread functions are sensitive to different types of errors.

3.4 Offset Error Isolation

The x → x process with offset errors in x (δx) (Fig. 2b) can be better represented via linear regression (Eq. (17)), whereby the offset errors are isolated from either x or R(δd)x. When the offset errors δxof (= [o1, o2, …, om]T) are isolated, the obtained resolution matrix is denoted as either R-of or R-of(δd). Equation (17) can be written as:

$$\underline {{\varvec{x}}} = {{\varvec{R}}}_{\text{ - of}} (\delta {{\varvec{d}}}){{\varvec{x}}} + \delta {{\varvec{x}}}_{{\text{of}}} = {{\varvec{R}}}_{ + 1} {{\varvec{x}}}_{ + 1} ,$$
(55)

where R+1 = [R-of δxof], x+1 = [xT 1]T. The extended form of Eq. (55) is:

$$\left[ {\begin{array}{*{20}c} {\underline x_1 } \\ {\underline x_2 } \\ \vdots \\ {\underline x_m } \\ \end{array} } \right] = \left[ {\begin{array}{*{20}c} {r_{1,1} }&{r_{1,2} }&{ \cdot \cdot \cdot }&{r_{1,m} }&{o_1 } \\ {r_{2,1} }&{r_{2,2} }&{}&{r_{2,m} }&{o_2 } \\ \vdots & {} &\ddots & {} &\vdots \\ {r_{m,1} }&{r_{m,2} }&{ \cdot \cdot \cdot }&{r_{m,m} }&{o_m } \\ \end{array} } \right]\left[ {\begin{array}{*{20}c} {x_1 } \\ {x_2 } \\ \vdots \\ {x_m } \\ 1 \\ \end{array} } \right].$$
(56)

Equation (55) is of the same form as Eq. (17), such that the m × (m + 1) matrix R+1 can be inverted using model pairs (e.g., xk and xk) via the above procedure for RC (Fig. 6). The obtained R+1 is denoted as RC+1 (Table 1). As expected, the resultant matrix R-of(δd) for the above case with offset errors is the same as R(δd = 0) (Fig. 7b), and the resultant δxof (not shown here) equals the offset errors in x.
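The offset-isolating form of Eqs. (55) and (56) only requires appending a constant 1 to each input model before the same least-squares step; a hypothetical sketch following the notation above:

import numpy as np

def resolution_with_offset(model_pairs):
    """Estimate [R_-of | dx_of] from (x_in, x_out) pairs via Eqs. (55)-(56)."""
    X_in = np.vstack([np.append(x_in, 1.0) for x_in, _ in model_pairs])  # x_+1^T rows
    X_out = np.vstack([x_out for _, x_out in model_pairs])
    R_plus1_T, *_ = np.linalg.lstsq(X_in, X_out, rcond=None)
    R_plus1 = R_plus1_T.T                    # m x (m + 1) matrix R_C+1
    return R_plus1[:, :-1], R_plus1[:, -1]   # R_-of and the offset vector dx_of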

3.5 Properties of the Complete Resolution Matrix

Unlike RD (from G only) and RH (from G and C), a complete resolution matrix can be obtained from any combination of all of the factors in a study (Table 1). If the synthetic solutions are obtained from a generalized inversion using error-free observations, then RC is equal to RD, and the two matrices share the same properties. If regularization is used during the processing, then RC is equal to RH, and again the two matrices share the same properties. Therefore, the properties of RC vary with the factors considered in the x → x process.

3.6 Utilities of Resolution Matrices

Resolution matrices (e.g., RC) are a quantitative indicator not only of the solution but also of the other factors in a study system. All of these factors (e.g., the solution, the observations, and the regularization) can be appraised using the matrices.

The resolution matrix has been widely used to appraise solution reliability in an inverse problem (e.g., Aki et al. 1977; Aster et al. 2005; Backus and Gilbert 1970; Menke 2012; Tarantola and Valette 1982; Wiggins 1972). The diagonal entry ri,i reflects whether xi can be resolved, or how much of xi is contained in xi, and has therefore received considerable attention for several decades (e.g., Aki et al. 1977; Day-Lewis et al. 2005; Wiggins 1972). The resolution spread estimated from the resolution matrix quantifies the departure of R from an identity matrix, and thus the goodness of the model (Backus and Gilbert 1970; Menke 2012, 2015). However, the resolution length, discussed in the “Resolution length” section, is the measure most widely used in practical model appraisals at present. Furthermore, the matrices RS and RC can also be applied to evaluate solution stability. In an unstable inversion, a small variation in the observations causes a large change in the solution; RS and RC, which are calculated from the solutions of random models, are therefore sensitive to this instability.

The resolvability and constrainability of x under given observations, which constitute the most important information in a study, control the solution reliability and must be taken into account when selecting the parameterization and regularization. Both can be quantitatively retrieved from resolution matrices, as explained in the “Resolvability and constrainability from the resolution matrix” section.

The influences of regularization and errors on the solution can also be evaluated with resolution matrices. The matrix RD reflects the model projection under the given observations, whereas RH reflects the projection under the combination of observations and regularization. The difference RH – RD therefore reflects the influence of regularization on the solution, as discussed above in the “Significances of the resolution matrices” section. If an error exists in the process used to obtain x, RC will include the effect of this error, and the difference RC – RD reflects it, as discussed in Sects. 3.3 and 3.4.
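
As a numerical illustration of the first of these differences, the following sketch computes RD and a regularized counterpart and subtracts them. The observation matrix G, the first-order roughening operator C, the weight λ, and the stacked-system form of the regularized inverse are all hypothetical choices consistent with standard Tikhonov least squares, not the specific operators of the examples above:

```python
import numpy as np

rng = np.random.default_rng(0)
m = 20
G = rng.normal(size=(10, m))                  # hypothetical under-determined observation matrix
C = np.diff(np.eye(m), axis=0)                # first-order (L1-type) roughening operator
lam = 0.1                                     # illustrative regularization parameter

R_D = np.linalg.pinv(G) @ G                   # direct resolution matrix (cf. Eq. (19))
A = np.vstack([G, lam * C])                   # stacked system combining G and lam*C
R_H = np.linalg.pinv(A)[:, :G.shape[0]] @ G   # regularized resolution matrix (cf. Eq. (24))

reg_influence = R_H - R_D                     # contribution of the regularization alone
```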

4 Resolution Length

The size of the smallest feature that can be detected is an important constraint on a model. This feature size is generally called the (spatial) resolution or resolution length (or width). Following the suggestion of Lebedev and Nolet (2003), the resolution length is defined here as half the size of the feature. This section focuses on how to obtain the resolution length from a resolution matrix.

4.1 Content Extent Versus Resolution Length

Equation (38) indicates that one solution parameter is generally a weighted average of all of the medium parameters, with the entries of a row vector of R acting as the weights or contents. The smallest resolvable feature represented by xi, or the resolution length at xi, should therefore be estimated from the row vector.

The medium parameters with high contents/weights in xi mainly determine xi, such that the feature represented by xi is related to the high-content segment (e.g., r48,41 to r48,55 in Fig. 7c) of the row-vector curve. However, as the feature is represented by xi, the resolution length is not defined by the extent of the high-value segment, but rather by the distances from xi to the parameters at the segment borders. In Fig. 7c, the resolution length for x48 is not half the distance from x41 (r48,41) to x55 (r48,55), but is instead related to the distances from x48 (r48,48) to x41 (r48,41) and x55 (r48,55); similarly, the resolution length for x95 is not half the distance from x86 to x90, but is instead related to the distances from x95 to x86 and x90 (Fig. 7d).

In general, a parameter (e.g., xi) and its neighbors should provide a large contribution to the average (the solution parameter xi) (Jackson 1972), and the contributions (ri,j; j = 1, …, m) should decrease quickly with increasing distance between xi and the other parameters (xj). Therefore, each row of the resolution matrix (ri,j; j = 1, …, m) can be approximated as either a Gaussian-shaped function (e.g., An 2012; Fichtner and Trampert 2011; Nolet 2008) or a cone (Barmin et al. 2001). The row-vector curve may be similar in shape to a Gaussian function (e.g., r48,*; Fig. 7c), but it can also be quite different (e.g., r95,*; Fig. 7d). In general, the width of the Gaussian approximation to a row-vector curve can represent the resolution length (the distance from xi to the borders of the high-content segment) (Fig. 7c and d). For example, the resolution lengths in Fig. 7c and d are 4 km for x48 and 9 km for x95, respectively.
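
One simple way to automate such an estimate is to fit a Gaussian to a row-vector curve and take its fitted width as the resolution length. The sketch below assumes that the model nodes have known positions (e.g., in km); the function name and the exact width convention are illustrative choices, not prescribed by the discussion above:

```python
import numpy as np
from scipy.optimize import curve_fit

def resolution_length_from_row(R, i, positions):
    """Fit a Gaussian centered on node i to the i-th row of R and return its width."""
    row = R[i, :]
    gauss = lambda z, a, sigma: a * np.exp(-0.5 * ((z - positions[i]) / sigma) ** 2)
    (a, sigma), _ = curve_fit(gauss, positions, row, p0=[max(row[i], 1e-6), 1.0])
    return abs(sigma)                          # one possible width convention
```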

4.2 Estimation from a Row or Column Vector?

The resolution length may also be estimated from a column vector (e.g., Alumbaugh and Newman 2000; Smith 1997). RD is symmetric, such that the resolution lengths estimated from its ith row and column vectors are the same. However, regularization directly influences the spread functions (or column vectors), such that most of the resolution matrices with regularization (e.g., RH) are asymmetric. Therefore, the resolution lengths estimated from the ith column and row vectors can be different.

The resolution lengths estimated from the column and row vectors are largely the same or similar when Tikhonov regularization is applied. The resolution matrix RH(λI) is symmetric, such that the lengths from its row and column vectors are the same. Higher-order Tikhonov regularizations (λLn) yield a smoother column vector (r*,48 in Fig. 7e) than the corresponding row vector (r48,* in Fig. 7e), yet the lengths from the two vectors remain largely similar, as previously confirmed (Miller and Routh 2007; Pilkington 2016), albeit with exceptions.

The column vector cannot provide a valid resolution length for a parameter that is constrained by no observations (e.g., x95 in Fig. 1a, b). The column vector for such a parameter in RH (e.g., r*,95 in Fig. 7f) is all zeros, but the corresponding row vector is not (r95,* in Fig. 7d), with a row-vector sum for R(S0) (e.g., RH(Ln)) equal to one. The row vector can provide a reliable resolution length in this case, whereas the column vector cannot (Table 3).

The row vector may be unstable and exhibit strong oscillations when large errors exist (e.g., r5,* in Fig. 8a, c). The high-amplitude oscillations around the diagonal entry of the row vector may give an illusion of high resolution. However, the corresponding column vector in RH(Ln) is smoother (Fig. 8b) than the row vector (Fig. 8a), such that the length estimated from the column vector is more reliable and less influenced by these large errors (Table 3).

In summary, the resolution length should generally be taken from the row vectors, based on the definition of the resolution length (Table 3). However, special cases (e.g., observational errors) may yield an unstable row-vector curve that in turn degrades the resolution estimated from this vector. It is therefore advisable to extract the resolution length from both the row and column vectors of the resolution matrix to ensure that the estimate is reliable.
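
Following this advice, the sketch below computes a width from both the ith row and the ith column, here simply as a weighted second moment about the diagonal entry. The second-moment measure, the helper name, and the node positions are assumptions; any consistent width measure could be substituted:

```python
import numpy as np

def row_and_column_widths(R, i, positions):
    """Return width estimates from the i-th row and the i-th column of R."""
    def width(vec):
        total = np.abs(vec).sum()
        if total == 0.0:                       # e.g., the all-zero column of an unconstrained parameter
            return np.nan
        w = np.abs(vec) / total
        return np.sqrt(np.sum(w * (positions - positions[i]) ** 2))
    return width(R[i, :]), width(R[:, i])
```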

4.3 Resolution Estimations that are not from R

The spatial resolution in seismic tomography studies is widely estimated via visual inspection of the restoration of a synthetic structure (e.g., checkerboard tests) (Feng and An 2010; Lévěque et al. 1993; Thurber and Ritsema 2009), as illustrated in Fig. 10a–c. If the checker size can be recovered at a given location, then the resolution at that location in the final result is at least as fine as the checker size. This method is powerful and easily implemented. However, the resulting resolution length is qualitative, not quantitative. Furthermore, the recovered model (x) and the synthetic checkerboard model (x) in one test are equivalent to only one pair of x and x, and a few tests cannot produce enough model pairs to provide full resolution information, as explained in the “Resolution matrices of the complete process” section.
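
For reference, a one-dimensional checkerboard-style input model of the kind used in such tests can be built in a few lines; the block size (in nodes), amplitude, and background value below are illustrative parameters only:

```python
import numpy as np

def checkerboard_1d(m, block, amplitude=0.05, background=1.0):
    """Alternating +/- amplitude blocks of 'block' nodes on a constant background model."""
    sign = np.where((np.arange(m) // block) % 2 == 0, 1.0, -1.0)
    return background * (1.0 + amplitude * sign)
```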

Fig. 10
figure 10

Example resolution lengths for a given tomography study of Rayleigh-wave dispersion at a period of 50 s. The figures are edited from Figs. 13 and 14 in Ma et al. (2014). a–c The resolution lengths are estimated using checkerboard tests, with the recoveries of three specific checker sizes inspected. The statistical resolution lengths in (d) are estimated from synthetic random models

A quantitative resolution length can still be retrieved when no resolution matrix is given or needed. The output solution x of a synthetic test using an input model with a random structure x contains the resolution information (An 2012). Quantitative resolution information can be retrieved via a number of approaches, including a comparison of many x and x (An 2012), cross correlation of x and x (Trampert et al. 2013), and autocorrelation of x (Fichtner and Leeuwen 2015). The resolution lengths in Fig. 10d were obtained using the An (2012) method on the basis of a limited number of pairs of random synthetic models and solutions (Ma et al. 2014). The An (2012) method has been readily applied in various studies (e.g., Chevrot et al. 2014; Chiao et al. 2014; Lin et al. 2014; Ma et al. 2014). The resulting resolution length distribution (Fig. 10d) is often easier to interpret and more informative for the general reader than synthetic checkerboard recovery tests (Fig. 10a–c) (Ma et al. 2014).
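
The sketch below is an illustrative correlation-based variant of this idea, not an implementation of any of the cited methods: over many random-model pairs, it measures how far around node i the input models remain correlated with the solution at i. The threshold level, node spacing dx, and function name are assumptions:

```python
import numpy as np

def statistical_resolution_length(X_true, X_sol, i, dx=1.0, level=0.5):
    """Half-width (in units of dx) of the zone around node i where the solution at i
    stays correlated with the input models above 'level' times its peak correlation.
    X_true and X_sol are (m, K) arrays whose columns are random models and solutions."""
    m = X_true.shape[0]
    corr = np.array([np.corrcoef(X_sol[i, :], X_true[j, :])[0, 1] for j in range(m)])
    idx = np.flatnonzero(np.abs(corr) >= level * np.abs(corr[i]))
    return 0.5 * (idx.max() - idx.min()) * dx
```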

5 Resolvability and Constrainability from the Resolution Matrix

Several essential questions arise when the real model x cannot be fully resolved in x for an ill-posed problem. For example, how much information from x can be reflected in the solution x, or what is the resolvability of xi (or the content of xi in xi)? How much information from x is controlled by the observations? What is the constrainability (constraining status) of an individual solution parameter xi under given observations? These questions are not only essential for understanding the reliability of the solution, but also instrumental in providing basic information to guide the improvement of the study system. These questions are essentially centered on the relationship between x and x, such that their answers can be derived from the x → x projection/resolution matrix.

The ri,j entry in R represents the exact content or contribution of the jth model parameter xj to the ith solution parameter xi. Therefore, the entries of the ith content vector (ri,*) and their sum (Σri,*) are indicators of the resolvability of xi and the constrainability of xi. As previously mentioned, RH reflects the combined effects of the observation matrix and regularization during the x → x process, whereas RD only reflects the observational effects. The reliability of a practical solution is therefore evaluated via RH rather than RD. Conversely, for the same reason, the constrainability of the solution parameter xi under the given observations is better obtained from RD than from RH. Factors other than the observation and regularization matrices can also influence the reliability of xi; these cannot be reflected by RD or RH, but can be reflected by RC.

5.1 Resolvability Defined by the Main-Diagonal Element

Equation (38) indicates that the main-diagonal element ri,i reflects the content (or contribution) of the real model parameter xi in (to) its counterpart xi. This entry reflects whether xi can be resolved or how much of xi is contained in xi. ri,i can therefore be considered the resolvability of xi (Table 3) and has received considerable attention for several decades (e.g., Aki et al. 1977; Day-Lewis et al. 2005; Wiggins 1972).

The main-diagonal element ri,i (e.g., Fig. 4b for Example 1) may take one of the following values:

  • ri,i = 1 (as illustrated in Fig. 11a). The curve shape of the elements in the ith row vector is a delta function, where all of the elements are zero, except ri,i. In this case, xi equals xi, which means that xi is well constrained and xi can be fully resolved. If ri,i is in RD, then the parameter xi is fully resolvable under the given observations.

  • ri,i = 0 (Fig. 11b). This case indicates that xi makes no contribution to its counterpart xi and is unresolvable. For example, parameter x95 in Example 1 is not constrained by an observation (Fig. 1a, b), such that r95,95 in both RD and RH is zero (Figs. 3 and 4b), and x95 in xD equals zero (Fig. 1c).

  • ri,i ∈ (0,1). xi partially contributes to its counterpart xi and is partially resolvable. If ri,i in RD belongs to (0,1), then xi shares the observation with other parameters (e.g., xj); xi therefore contributes to both its counterpart xi and xj. For example, r1,1 = 0.1 in RD in Example 1 (Fig. 4b), which indicates that x1 shares observation 1 with x2 (Fig. 1a); x1 therefore contributes to both x1 and x2.

Fig. 11
figure 11

Illustration of the four constraining statuses for the ith solution parameter xi based on the elements of the ith row vector ri,*

The main-diagonal element ri,i in RD reflects the resolvability of xi under a given observational condition, such that the sum of all of the main-diagonal elements, trace(RD), can be considered the resolvability of the model vector x. Because RD is a projection matrix, trace(RD) equals the number of independent observations, rank(G). RD is therefore an identity matrix, and x can be fully resolved, when either trace(RD) or rank(G) equals the number of model parameters (m).
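
A quick numerical check of this statement, with an arbitrary hypothetical G, is:

```python
import numpy as np

rng = np.random.default_rng(1)
G = rng.normal(size=(6, 10))                       # 6 independent observations of 10 parameters
R_D = np.linalg.pinv(G) @ G                        # direct resolution matrix
print(np.trace(R_D), np.linalg.matrix_rank(G))     # both are (numerically) 6
```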

5.2 Deviation from the Expectation Given by the Row-Vector Sum

Equation (38) indicates that xi equals the weighted sum of all of the parameters in x, and Σri,* (or ΣiR) is the sum of all of the weights ri,*. If R is a stochastic matrix, then a sum of one means that xi reflects the true average of x (Nolet 2008) and lies at least within the extremes of x. For example, if Σ1RD = 1 (Fig. 4a) and the entries r1,* in RD are positive (Fig. 3a), then x1 in xD is a good representative of x1 (Fig. 1c). However, this assumption is often false, as the entries in R can be negative (Menke 2015) (i.e., R is often not a stochastic matrix). For example, Σr1,* in RH(L1) in Example 1 equals one (Fig. 4a), but the parameter x1 in x(L1) (Fig. 1c) deviates from the expected average. The polarity of ri,* must therefore be considered when ΣiR is used to judge the reliability of xi.

While ΣiR = 1 does not necessarily correspond to the perfectness of xi, the deviation of ΣiR from one is a good indicator of the deviation of xi from the true model average (Table 3). A comparison of Figs. 4a and 1c indicates that an overestimated (larger than the model average) parameter xi (x54 in xD and x(I) in Fig. 1c) corresponds to ΣiR > 1 (e.g., Σ54RD and Σ54RH(I) in Fig. 4a), and an underestimated (smaller than the model average) parameter (x48 in xD and x(I) in Fig. 1c) corresponds to ΣiR < 1 (e.g., Σ48RD and Σ48RH(I) in Fig. 4a).

5.3 Difference Between Neighboring Parameters

If the parameter xi is partially resolvable (ri,i ∈ (0,1)), then the solution parameter xi will include content (ri,ii > 0) from neighboring parameters (e.g., xii), and the ri,i and ri,ii entries in the ith row vector of R can reflect the similarity between xi and xii.

When two neighboring parameters xi and xii (e.g., x60 and x61; Fig. 1a, b) are constrained by unrelated observations, ri,i < 1 and ri,ii = 0 (Fig. 11c), as observed for r60,61 and r60,60 in either RD or RH(I) (Fig. 3a and b). In this case, xii does not contribute to xi (Eq. (38)), and xi does not contribute to xii. It is possible to discriminate the difference between xi and xii in xi and xii from the unrelated observations, as x61 in either xD or x(I) is obviously different from x60 (Fig. 1c).

When two neighboring parameters xii and xi (e.g., x13 and x14, which are constrained by the second observation in Fig. 1a, b) are constrained by the same or related observations, both ri,i and ri,ii (ii = i − 1 or i + 1; Fig. 11d) (e.g., r13,13 and r13,14 in either RD or RH(I); Fig. 3a and b) are in the range (0,1). In this case, xi has contents from both xii and xi. xii also has contents from both xii and xi because of the symmetry of RD. Consequently, xi and xii cannot be discriminated, and their difference is often related to the difference Δri,ii:

$$\Delta r_{i,ii} = \left| {r_{i,i} - r_{i,ii} } \right|.$$
(57)

The difference between xi and xii in xD is often very large when Δri,ii is large, even if the real model parameters xi and xii are the same. When Δri,ii is small, the difference is also small, even if xi and xii are quite different. When Δri,ii is zero, xi is often equal to xii. The solution parameters x13 and x14 in either xD or x(I) are quite different (Fig. 1c), even though the real model parameters x13 and x14 are almost the same. Δri,ii is therefore a good indicator of the difference between xi and xii (Table 3) if ri,i is less than one, even though this difference has no relation to the difference between xi and xii.

5.4 Short Summary on Constrainability

In summary, the constrainability of a solution parameter can be quantitatively evaluated from its content vector in a resolution matrix (Table 3). The main-diagonal element (ri,i) can be considered the resolvability of xi. If RD is used, then ri,i values of 0, 1, and within (0,1) mean that xi is unresolvable, fully resolvable, and partially resolvable, respectively, under the given observations. The deviation of the content-vector sum (Σri,*) from one can be considered an indicator of the deviation from the model expectation: Σri,* > 1 and < 1 mean that xi is overestimated and underestimated, respectively, whereas Σri,* = 1 with all elements ri,* nonnegative means that xi is a true average of the model. The difference between ri,i and ri,ii (Δri,ii) reflects the difference between xi and xii for a partially resolvable parameter xi (ri,i ∈ (0,1)). Large and small Δri,ii values correspond to large and small differences between the two neighboring solution parameters, respectively, although this difference has no relation to the difference between the true medium parameters.
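
These three diagnostics can be read directly off a resolution matrix. The sketch below gathers them for one parameter; the function name and the returned dictionary structure are illustrative conveniences, not part of the formal definitions above:

```python
import numpy as np

def constrainability(R, i):
    """Diagnostics for solution parameter i from its content (row) vector in R."""
    row = R[i, :]
    diag = row[i]                                  # resolvability r_ii
    row_sum = row.sum()                            # deviation of the sum from 1 flags over/underestimation
    neighbors = [abs(diag - row[j])                # Delta r_{i,ii} for the existing neighbors
                 for j in (i - 1, i + 1) if 0 <= j < R.shape[0]]
    return {"r_ii": diag, "row_sum": row_sum, "delta_r_neighbors": neighbors}
```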

5.5 Constrainability from R H

Practical studies that employ regularization do not provide RD, but rather RH. In this case, the parameter constrainability under the given observations must be estimated from RH to determine how much information in the solution comes from the observations rather than from the regularization. As mentioned above, RH(λI) is the matrix most similar to RD, making it a good alternative to RD for evaluating the constrainability. However, most of the main-diagonal elements in RH(λI) are smaller than those in RD. The row-vector sums of RH(λI) may therefore be smaller than those of RD, such that their deviation from one cannot be used to identify underestimated parameters. RH(λI) can nevertheless be used to identify well-constrained parameters, as the curve shape of their row vectors is still a delta function, even though ri,i may not equal one.

RH(Ln) (n > 0; e.g., RH(L1) in Fig. 3c) is significantly different from RD (Fig. 3a). With the exception of the unconstrained parameters, the main-diagonal elements of RH(L1) (ri,i and rii,ii) for two neighboring parameters (xi and xii) that are constrained by the same observation can be different (Fig. 4b). The row-vector sum of RH(L1) for any single parameter equals one (Fig. 4a). Therefore, the main-diagonal elements ri,i and row-vector sum ΣiRH(L1) cannot be used to evaluate the constrainability of the parameter xi. However, the all-zero column vectors of RH(Ln) (n = 0, 1, 2) for the unconstrained parameters are the same as those in RD, such that the unconstrained parameters and unconstrained neighbors can be evaluated from RH(Ln). Furthermore, the curve shape of the row vectors of RH(Ln) (n = 0, 1) is similar to that of RD (Figs. 1f and 3). The mutual relationship between two neighboring parameters can therefore be roughly evaluated using Δri,ii of RH(Ln).

5.6 Resolution Upper Bound

RD is only related to G (Eq. (19)), with no relationship to the observation data d, the observational errors (δd), or other factors, whereas RH is composed of both G and C. Regularization adds artificial constraints to make the solution appear more rational. However, various factors, including observational errors, regularization, and the instability of the solution, can decrease (but not increase) the resolution reflected by RD. Therefore, the resolution derived from RD marks the upper bound of resolvability (Table 1).

While repeated observations can improve the precision of the solution by improving the precision of the observation data (d), they cannot increase either rank(G) or the number of independent observations. Repeated observations therefore have no effect on RD and cannot raise the upper bound of resolution. The only way to raise this bound is to increase the number of independent observations (i.e., rank(G)).
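
A small numerical illustration of this point, with a hypothetical G in which a duplicated row stands for a repeated observation:

```python
import numpy as np

rng = np.random.default_rng(2)
G = rng.normal(size=(5, 8))
G_rep = np.vstack([G, G[0]])                       # the first observation repeated
R_D = np.linalg.pinv(G) @ G
R_D_rep = np.linalg.pinv(G_rep) @ G_rep
print(np.allclose(R_D, R_D_rep))                   # True: R_D, and hence the resolution bound, is unchanged
```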

5.7 What is a Perfect Inversion?

A perfect inversion, or perfect constrainability, corresponds to a resolution matrix that equals the identity matrix I (e.g., Jackson 1972; Menke 1989). However, this rule is only applicable to the direct resolution matrix RD, not to the other resolution matrices (e.g., RH or RC). A main-diagonal element ri,i of one in RD means that xi is fully resolved in xi, implying that a perfect inversion has an RD equal to the identity matrix. RD always equals I except when the problem is under-determined. RH, however, includes regularization, and regularization is mostly employed when the constrainability of the solution parameters under the given observational conditions is imperfect (i.e., trace(RD) is often smaller than m, or the inverse of G is unstable). Therefore, trace(RH) is always smaller than trace(RD), regardless of the quality of the regularization, such that RH is never an identity matrix. The perfectness of the inversion therefore cannot be judged from the degree of similarity between RH and I. Instead, the perfectness of a study using regularization should be judged from the amount of valid information in G that is reflected in the solution, as different regularizations yield different solutions.

6 Resolution Matrices in Nonlinear Inversions

The relationship between x and x differs from that between x (or x) and d (or d). For a nonlinear problem, the relationship between x and x (r:x → x) can be nonlinear, but it can also be linear, for example when the true model x is perfectly resolved (x = x = Ix = r(x)). This indicates that the x → x projection is a relation distinct from the forward problem itself, although it reflects the ability to solve the problem. A nonlinear inverse problem cannot be described by either Eq. (6) or Eq. (9), such that neither RD (Eq. (19)) nor RH (Eq. (24)) can be obtained for the problem. Nevertheless, Eq. (2) remains a valid linear projection approximation of the nonlinear equation (Eq. (1)), such that RC, which is obtained directly from Eq. (2), can still be provided, regardless of the method used to solve the problem.

A nonlinear inverse problem can be solved via either a global optimization method or a linearized method. If the problem is solved using linearized iterative methods (e.g., Aster et al. 2005; Bourgeois et al. 1989), such as Newton’s method, then a resolution matrix can be obtained after each iteration (Jackson 1972) from Eq. (19), Eq. (24), or a similar equation. However, this resolution matrix is not any of the above-mentioned resolution matrices (RD, RH, RC, RI, and RS), but rather one of three new resolution matrices that are specifically constructed for nonlinear inverse problems.

An iterative inversion is designed to invert for the model perturbation Δxi (= xxi–1) of the reference model xi–1 at the ith iteration on the basis of the first-order approximation of the inverse problem (Eq. (4)):

$$\Delta {{\varvec{d}}}^i = {{\varvec{G}}}_{\text{J}}^i \Delta {{\varvec{x}}}^i .$$
(58)

where Δdi = d − g(xi−1) and GJi is the Jacobian matrix of the partial derivatives of d with respect to x, evaluated at xi−1. Equation (58) is a linear equation, such that the solution (Δxi) of the perturbation Δxi can be obtained from GJi and/or C using the same methods that were employed to obtain the model solution x from G and/or C in the above linear inversions (e.g., Eq. (11)). However, the inverted solution (Δxi) is a model perturbation, not the model itself. The inverted model (solution) after the ith iteration, xi = xi−1 + Δxi, becomes the reference model for the next iteration. This is the general procedure of Newton’s method.
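
A minimal sketch of this procedure is given below. It assumes hypothetical callables g(x) for the forward prediction and jacobian(x) for GJ, and uses an unregularized generalized inverse for clarity; a regularized solve could be substituted at the same step:

```python
import numpy as np

def newton_iterations(d, g, jacobian, x0, n_iter=5):
    """Linearized iterative inversion; returns the final model and the last R_JD^i."""
    x = x0.copy()
    R_JD = None
    for _ in range(n_iter):
        GJ = jacobian(x)                           # G_J^i evaluated at the current reference model
        dd = d - g(x)                              # data residual Delta d^i
        dx = np.linalg.pinv(GJ) @ dd               # perturbation solution Delta x^i (cf. Eq. (58))
        R_JD = np.linalg.pinv(GJ) @ GJ             # per-iteration resolution matrix (cf. Eq. (61))
        x = x + dx                                 # becomes the reference model of the next iteration
    return x, R_JD
```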

A surface-wave inversion to constrain the S-wave velocity structure (Example 3) (Fig. 12a) is synthesized here to illustrate the three resolution matrices that can be implemented in a linearized nonlinear inversion. The employed inversion is a typical nonlinear inversion approach in geophysics that is widely applied to elucidate 1-D and 3-D sedimentary, crustal, and lithospheric Earth structures (e.g., Feng and An 2010; Knopoff 1972; Snoke and James 1997; Wiggins 1972; Xia et al. 1999); therefore, the details of the nonlinear relationship between x and d are not explained here. First-order Tikhonov regularization has been widely used in this inversion approach, although it prevents the correction of bad discontinuities in the reference model (An 2020). A regularization approach that is adapted to the reference models, as suggested by An (2020), can overcome this problem and lead to a rapidly convergent iterative inversion; this regularization approach is used here. The regularization parameter was set to λ = 0.01 after a series of tests that explored the trade-off between the misfit and model flatness. The synthetic observations (d) and predictions (Fig. 12a), as well as the partial-derivative matrix GJi (Eq. (58)), were calculated using the surf96 program (Herrmann 2013). Given a reference model at the first iteration (the starting model), the model solutions after the first to fifth iterations (xi) and their fits are shown in Fig. 12a. The solution x5 after the fifth iteration is nearly the same as x4 after the fourth iteration, which implies that the inversion converged around x4; x4 is therefore considered the final solution.

Fig. 12
figure 12

Example resolution matrices for the solutions after iteration 4 of Newton’s iterations of the surface-wave dispersion inversion to determine the 1-D S-wave velocity structure. a Synthetic true model (x) and solutions (xi) for the 1-D S-wave velocity structure after the ith iteration. Synthetic observations (d) and predictions d(xi) of the surface-wave dispersion curves for solution xi. b Complete resolution matrix RC for solution x4 (i = 4). c RJH4, which is the resolution matrix for solution improvement just after iteration 4. d Difference between RJH4 and RJH1→4, which is the resolution matrix of the solution improvement on the starting model. Gaussian widths (red dashes) in (b–d) correspond to the resolution length

6.1 Linear Approximation of r:xx

The RC calculation for a nonlinear inversion is the same as that for the above linear inversions, whereby only the synthetic input models (x) and their corresponding solutions (x) are used. This calculation, which is based on either Eq. (43) or (47), is in fact a linear regression of x and x (i.e., RC is a linear approximation of r:x → x) (Table 1) (Fig. 2a). However, the relationship between x and x is often nonlinear because of the complexity of the x → x process for a nonlinear problem. The calculation of a reliable RC therefore requires more pairs of input/output models. Furthermore, the resultant RC will depend somewhat on the synthetic random input models: the closer they are to the practical medium, the more realistic the resultant RC.

6.2 Projection of Solution Improvement After an Iteration

As Eq. (58) is of the same form as Eq. (6), the process from Δxi (= xxi–1) to Δxi (= xixi–1) after the ith iteration can be expressed in a form similar to Eq. (22) to represent the x → x process (e.g., Jackson 1972; Wiggins 1972):

$$\Delta \underline {{\varvec{x}}}^i = {{\varvec{R}}}_{\text{J}}^i \Delta {{\varvec{x}}}^i$$
(59)

or:

$$\underline {{\varvec{x}}}^i - \underline {{\varvec{x}}}^{i - 1} = {{\varvec{R}}}_{\text{J}}^i ({{\varvec{x}}} - \underline {{\varvec{x}}}^{i - 1} ),$$
(60)

where RJi denotes the resolution matrix R:Δxi → Δxi. Equation (59) is of the same form as Eq. (2), such that RJi (denoted RJDi) can be obtained via Eq. (19):

$${{\varvec{R}}}_{{\text{JD}}}^i = ({{\varvec{G}}}_{\text{J}}^i )^{ - {\text{g}}} {{\varvec{G}}}_{\text{J}}^i .$$
(61)

When the regularization matrix C is used, RJi (denoted RJHi) can be obtained via Eq. (24):

$${{\varvec{R}}}_{{\text{JH}}}^i = ({{\varvec{A}}}_{\text{J}}^i )^{ - {\text{t}}} {{\varvec{G}}}_{\text{J}}^i ,$$
(62)

where AJi is the combination of GJi and C, in the same way that A is the combination of G and C in Eq. (10). The matrix RJDi (or RJHi) has often been denoted simply R in previous studies, but it has a different significance than the resolution matrix R:x → x.

The first-order Taylor expansion of Eq. (1) at xi−1 is:

$$\underline {{\varvec{x}}}^i = \underline {{\varvec{x}}}^{i - 1} + {{\varvec{J}}}_r (\underline {{\varvec{x}}}^{i - 1} )({{\varvec{x}}} - \underline {{\varvec{x}}}^{i - 1} ),$$
(63)

where Jr(xi−1) denotes the Jacobian matrix (or the gradient) of the projection r:x → x at the reference model xi–1. Equation (63) has the same form as Eq. (59); therefore, RJi is exactly the Jacobian matrix Jr(xi−1) (Table 1). In practice, RJi represents the projection from x − xi−1 to xi − xi−1 (Eq. (59)) (i.e., the projection of the solution improvement on the reference model (xi–1) just after the ith iteration) (e.g., RJ1 illustrated in Fig. 13b). The matrix RJi in Example 3 (e.g., RJH4 just after the fourth iteration (Fig. 12c)) is slightly different from RC (Fig. 12b).

Fig. 13
figure 13

Illustration of resolution matrices in a linearized inversion. The model x contains one parameter (x1). a True model (x) and solution at each iteration. b Projections with resolution matrices RC or RCi (for r:x → xi), RC1 (for r:x → x1), RJ1 at first iteration (slope of r:x → x at x0), and RJ1→i after i iterations (slope of r:x → x from x0 to xi)

6.3 Projection of the Solution Improvement up to an Iteration

The inversion after each iteration is represented by Eq. (59), but an application generally requires more than one iteration. The solution improvement from the kth to the ith iteration can be expressed as:

$$\underline {{\varvec{x}}}^i - \underline {{\varvec{x}}}^{k - 1} = {{\varvec{R}}}_{\text{J}}^{k \to i} ({{\varvec{x}}} - \underline {{\varvec{x}}}^{k - 1} ),$$
(64)

where:

$${{\varvec{R}}}_{\text{J}}^{k \to i} = {{\varvec{R}}}_{\text{J}}^i + {{\varvec{R}}}_{\text{J}}^{k \to (i - 1)} - {{\varvec{R}}}_{\text{J}}^i {{\varvec{R}}}_{\text{J}}^{k \to (i - 1)} .$$
(65)

If k = i just after the ith iteration of a given inversion, then Eq. (64) should be the same as Eq. (59). The matrix RJi→(i−1) = {0}, which means that RJi→i = RJi. Therefore, if k = 1, then Eq. (64) becomes:

$$\underline {{\varvec{x}}}^i - \underline {{\varvec{x}}}^0 = {{\varvec{R}}}_{\text{J}}^{1 \to i} ({{\varvec{x}}} - \underline {{\varvec{x}}}^0 ),$$
(66)

where:

$${{\varvec{R}}}_{\text{J}}^{1 \to i} = {{\varvec{R}}}_{\text{J}}^i + {{\varvec{R}}}_{\text{J}}^{1 \to (i - 1)} - {{\varvec{R}}}_{\text{J}}^i {{\varvec{R}}}_{\text{J}}^{1 \to (i - 1)} .$$
(67)

The matrix RJ1→i represents the projection from Δx (= x − x0) to Δx (= xi − x0), which is the projection of the solution improvement on the starting model x0 after i iterations (xi − x0) (Fig. 13b). This is different from the gradient of r:x → x at xi−1 (RJi), as RJ1→i is the slope of r:x → x from x0 to xi (Table 1). The magnitude difference between RJH1→4, which is obtained using RJH1, …, RJH4, and RJH4 (Fig. 12d) is remarkable. The matrix RJ1→i therefore represents the solution improvement in the inversion better than RJi.
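
The recursion in Eq. (67) is straightforward to apply to the per-iteration matrices. The minimal sketch below assumes that a list of RJ1, …, RJi is available from the iterations; the function name is illustrative:

```python
import numpy as np

def accumulate_RJ(RJ_list):
    """Build R_J^{1->i} from R_J^1, ..., R_J^i via the recursion of Eq. (67)."""
    m = RJ_list[0].shape[0]
    R_acc = np.zeros((m, m))                       # R_J^{1->0} = 0, so the first step gives R_J^1
    for R_i in RJ_list:
        R_acc = R_i + R_acc - R_i @ R_acc          # Eq. (67)
    return R_acc
```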

6.4 Four Types of Resolution Matrices in a Linearized Inversion

The matrices RJi:Δxi → Δxi and RJ1→i:Δx → Δx in a linearized inversion can also be obtained via the RC calculation. If the synthetic perturbations Δxi (= x − xi−1) and solution Δxi after the ith iteration are used as the true model x and corresponding solution x, respectively, then the resultant RC (denoted RJCi) should be the same as RJHi:Δxi → Δxi. If the synthetic perturbations Δx (= x − x0) and corresponding solution Δx1→i (= xi − x0) are used, then the resultant RC (denoted RJC1→i) is the same as RJH1→i.

Furthermore, if the synthetic x and the corresponding solution xi (= xi–1 + Δxi) after the ith iteration are used in the RC calculation, then the resultant RC (denoted RCi) (e.g., RC and RC1 in Fig. 13b) represents the projection from x to xi just after that iteration (Ri:x → xi). This is yet another type of resolution matrix that may appear in a linearized iterative application. If xi is the final solution x, then RCi is written as RC.

In summary, a complete resolution matrix for the x → x process can be obtained from the linear approximation of r:x → x, regardless of the method used to solve a given nonlinear inverse problem. However, a linearized iterative application can have four classes of resolution matrices (Tables 1 and 2, Fig. 13b), RC, RCi, RJ1→i, and RJi, which represent the x → x, x → xi, (x − x0) → (xi − x0), and (x − xi−1) → (xi − xi−1) projections, respectively. RC is a linear approximation of the operator r, whereas RJi is the Jacobian matrix of r. The resolution matrix RJHi, which is often provided in the literature, reflects the solution improvement just at the ith iteration, whereas RJH1→i reflects the cumulative solution improvement from the starting model to the solution xi after all i iterations.

The surface-wave dispersion inversion in Example 3 highlights that even though the magnitudes of RC, RJH4, and RJH1→4 (Fig. 12b–d) are obviously different, their magnitude patterns (that of RJH1→4 is not shown here) are similar. The resolution lengths estimated from the three matrices (Fig. 12d) are also similar. Therefore, the resolution lengths retrieved from either RJH1→4 or RJH4 in a surface-wave dispersion inversion are also acceptable if RC cannot be given.

7 Conclusion

Here, we reviewed previous resolution matrices and their applications to clarify the properties of resolution matrices in linear and nonlinear inversions that implement zeroth- and higher-order Tikhonov regularizations. We explained how to use the resolution matrix to understand both the resolvability of the medium parameters and the constrainability of the solution parameters. Furthermore, we suggested a new resolution matrix, the complete resolution matrix, which reflects all of the factors in a study system. This new matrix, which overcomes many of the limitations of previous matrices, can be broadly applied in linear and nonlinear (inverse and non-inverse) problems. This study is designed to help the reader fully understand both the concept and the application of a resolution matrix, appraise a solution appropriately, and recognize the relationship between the solution and all of the factors in the study. These suggestions on resolution matrices can also guide the reader in improving the study system.