Abstract
We propose a novel dictionary learning add-on for the Inverse Problem Matching Pursuit (IPMP) algorithms for approximating spherical inverse problems such as the downward continuation of the gravitational potential. With the add-on, we aim to automate the choice of the dictionary and simultaneously reduce the computational costs. The IPMP algorithms iteratively minimize the Tikhonov–Phillips functional in order to construct a weighted linear combination of so-called dictionary elements as a regularized approximation. A dictionary is an intentionally redundant set of trial functions such as spherical harmonics (SHs), Slepian functions (SLs) as well as radial basis functions (RBFs) and wavelets (RBWs). In previous works, this dictionary was chosen manually, which resulted in high runtimes and storage demand. Moreover, a possible bias could not be ruled out. The additional learning technique we present here allows us to work with infinitely many trial functions while reducing the computational costs. This approach may enable a quantification of a possible bias in future research. We explain the general mechanism and provide numerical results that demonstrate its applicability and efficiency.
1 Introduction
The climate is changing. In particular, the repercussions caused by the loss of water in the soil or the ice sheets are dangerous to humankind as they fuel droughts as well as the sea level rise. Thus, the geosciences are taking care to monitor the mass transport of the Earth, see e. g. Fischer and Michel (2013b), Flechtner et al. (2020), IPCC (2014), Lin et al. (2018), NASA (2020), Sneeuw and Saemian (2019), Wiese et al. (2020) as well as the results of the DFG SPP 1257 (2006–2014) coordinated by Ilk and Kusche, see e. g. Kusche et al. (2012). The mass transport is obtained by modelling the gravitational potential from time-dependent satellite data, e. g. from the GRACE and GRACE-FO satellite missions, see e. g. Devaraju and Sneeuw (2017), Flechtner et al. (2014a, 2014b), NASA Jet Propulsion Laboratory (2020), Schmidt et al. (2008), Tapley et al. (2004), The University of Texas at Austin, Centre for Space Research (2020).
Mathematically, we can approximate the Earth’s surface with the unit sphere \(\Omega \). The gravitational potential f on the Earth’s surface is usually expanded in SHs \(Y_{n,j},\ n\in {\mathbb {N}}_0,\ j=-n,\ldots ,n,\) such that we have
which holds in the \(\mathrm {L}^2(\Omega )\)-sense. Correspondingly, at a satellite orbit radius \(\sigma >1\), we upward continue this representation at every point \(\eta \in \Omega \) via the operator \({\mathcal {T}}\) and obtain the gravitational potential V outside the Earth, see e. g. Baur (2014), Freeden and Michel (2004), Moritz (2010),
which actually holds pointwise. Obviously, \({\mathcal {T}}\) has exponentially decreasing singular values. The well-known EGM2008 model represents the gravitational potential as given by (1) truncated at degree \(N=2190\). For a satellite at 500 km height, this leads to a singular value of
This behaviour is mathematically deduced and, thus, does not depend on a specific experiment setting. Naturally, we are much more interested in the inverse problem of the downward continuation, i. e. if we are given values of the potential V at \(\sigma \eta \), we are interested in the values of f at \(\eta \). However, the inverse of \({\mathcal {T}}\) has exponentially increasing singular values. Thus, it is not continuously dependent on the data, see e. g. Engl et al. (1996), Louis (1989), Michel (2005, 2022), Rieder (2003). This means, mathematically, the downward continuation is an exponentially ill-posed inverse problem and needs sophisticated methods to be tackled. We are interested in further developing such methods for ill-posed inverse problems. Thus, we consider the continuation of the gravitational potential and not of the potential gradients here because the former has a higher instability. Moreover, here the classical downward continuation problem using satellite data serves our need of a well-understood ill-posed inverse problem, though we are aware that, in recent years, research has also concentrated on the likewise important airborne downward continuation, see, e.g. Novák et al. (2001), Eicker (2008), Naeimi (2013), Lieb (2017).
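The decay of the singular values can be made tangible with a few lines of Python. The sketch below assumes the standard form of the upward continuation on the unit sphere, in which the SH \(Y_{n,j}\) is damped by the factor \(\sigma ^{-(n+1)}\), and a mean Earth radius of 6371 km. Both the radius and all function names are illustrative assumptions and are not taken from the text above.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius (not given in the text)


def relative_orbit_radius(height_km: float) -> float:
    """Orbit radius sigma > 1, measured in units of the Earth radius."""
    return (EARTH_RADIUS_KM + height_km) / EARTH_RADIUS_KM


def log10_singular_value(degree: int, sigma: float) -> float:
    """log10 of sigma**(-(degree + 1)), the assumed singular value of T."""
    return -(degree + 1) * math.log10(sigma)


sigma = relative_orbit_radius(500.0)
for n in (10, 100, 1000, 2190):
    exponent = log10_singular_value(n, sigma)
    print(f"degree {n:5d}: singular value ~ 10^({exponent:.1f})")
```

Under these assumptions, the inverse operator amplifies a data error in a component of degree n by the reciprocal of the printed value, which illustrates why the downward continuation cannot be carried out without regularization.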
One possible approach for ill-posed inverse problems is given by the IPMP algorithms, i. e. here the Regularized (Orthogonal) Functional Matching Pursuit (R(O)FMP) algorithm. To the knowledge of the authors, only our own group developed matching pursuits for ill-posed inverse problems. Thus, for more details, we refer to e. g. Fischer and Michel (2013a), Michel (2015), Michel and Orzlowski (2017), Michel and Telschow (2016). Using discrete values of V, these methods iteratively build an approximation of f as a best basis expansion drawn from dictionary elements. A dictionary \({\mathcal {D}}\) is an intentionally redundant set of trial functions. For spherical tasks like the downward continuation, it may contain SHs, SLs, RBFs and/or RBWs. Then, \({\mathcal {D}}\) contains global functions as well as localized ones and bandlimited as well as non-bandlimited ones, respectively. In each iteration, the next dictionary element is chosen such that it minimizes the Tikhonov–Phillips functional regarding the current residual and approximation. In the ROFMP algorithm, it additionally fulfils, to a certain extent, orthogonality relations with the previously chosen basis elements. Thus, in this case, the coefficients are updated regularly to maintain their optimality. Note that the methods can in principle also be used for spatio-temporal data. For developing them further, however, we do without a temporal aspect.
The IPMP algorithms usually work with a large finite, a priori chosen subset of the infinitely many possible trial functions as their dictionary. This is obviously an obstacle for users who are new to the incorporated trial functions. Moreover, it also leads to a long runtime and a high storage demand, both of which prevent us from considering realistic experiment settings (e.g. using \(5 \times 10^6\) data points). Further, we also recognize that a manually chosen dictionary possibly biases the obtained approximation, though without alternatives this can hardly be quantified.
Thus, we further developed the IPMP algorithms by adding a novel dictionary learning technique. Note, however, that our task at hand differs from established dictionary learning challenges: instead of solving pure approximation problems, we have to deal with the operator of the inverse problem; moreover, we intend to focus on learning established trial functions in contrast to defining new trial functions by learning values at grid points. Note that the former provides a higher comparability with well-known models. Thus, we cannot straightforwardly use previous dictionary learning strategies, see e. g. Aharon et al. (2006), Bruckstein et al. (2009), Engan et al. (1999a, 1999b), Prünte (2008), Rubinstein et al. (2010), but need to develop our own approach.
Here, the dictionary learning add-on for the SLs, RBFs and RBWs consists of a 2-step process of (first global, then local) nonlinear constrained optimization problems in order to compute optimized candidates for these types of trial functions. We utilize the NLopt library, see Johnson (2019), for this as, to the knowledge of the authors, it is reliable and offers a large range of local and global optimization algorithms to choose from. A candidate for the SHs, however, can be obtained by comparing the values of the respective objective function for different SHs. Then, we learn particular functions as well as a maximal SH degree. All of these candidates together constitute again a finite dictionary and we can proceed with the (remaining) routine of the IPMP algorithms in this iteration. After termination, we obtain an approximation of f in an optimized best basis whose elements can be reused as a learnt dictionary in future runs of the IPMP algorithms. The IPMP algorithms which include the novel learning add-on are called the Learning Inverse Problem Matching Pursuit (LIPMP) algorithms, i. e. the Learning Regularized (Orthogonal) Functional Matching Pursuit (LR(O)FMP) algorithm. Note that the LRFMP algorithm here is an advanced version of the one presented in Michel and Schneider (2020).
In the sequel, we formally but briefly introduce the SHs, SLs, RBFs and RBWs as well as a dictionary in Sect. 2. Then, we explain the idea of the LIPMP algorithms, which includes an overview of the IPMP algorithms and the novel learning add-on, in Sect. 3. Particularly, we focus on how the 2-step optimization process fits into the routine of the IPMP algorithms. Then, we show the applicability and efficiency of the add-on as well as the learnt dictionary in a series of experiments in Sect. 4.
This paper is based on Schneider (2020) and was partly presented at the EGU2020: Sharing Geoscience Online (Schneider and Michel 2020).
2 Basics
Let \({\mathbb {R}}\) be the set of all real numbers, \({\mathbb {R}}^d\) be the real, d-dimensional vector space, \({\mathbb {N}}\) be the set of all positive integers and \({\mathbb {N}}_0\) be that of all non-negative integers. We denote \(\Omega :=\{x\ \in {\mathbb {R}}^3 :\ |x|=1\}\) as the two-dimensional unit sphere and \({\mathbb {B}}:={\mathbb {B}}^3 :=\{x\ \in {\mathbb {R}}^3 :\ |x|< 1\}\) the open unit ball. Furthermore, we can represent \(\eta (\varphi ,t) \in \Omega \) as usual in spherical coordinates via the longitude \(\varphi \in [0,2\pi [\) and the polar distance \(t = \cos (\theta ) \in [-1,1]\) for the colatitude \(\theta \in [0,\pi ]\).
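As a small illustration of this parametrization, the following Python sketch maps \((\varphi ,t)\) to a point on \(\Omega \). It assumes the usual convention \(\eta (\varphi ,t) = (\sqrt{1-t^2}\cos \varphi ,\ \sqrt{1-t^2}\sin \varphi ,\ t)^\mathrm {T}\), which is not spelled out above; the function name is hypothetical.

```python
import math


def eta(phi: float, t: float) -> tuple[float, float, float]:
    """Point on the unit sphere for longitude phi and polar distance t = cos(theta).

    Assumed convention: (sqrt(1 - t^2) cos(phi), sqrt(1 - t^2) sin(phi), t).
    """
    if not -1.0 <= t <= 1.0:
        raise ValueError("polar distance t must lie in [-1, 1]")
    s = math.sqrt(1.0 - t * t)
    return (s * math.cos(phi), s * math.sin(phi), t)


north_pole = eta(0.0, 1.0)           # t = 1 corresponds to the North Pole
equator_point = eta(math.pi / 2, 0.0)  # t = 0 lies on the equator
```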
2.1 Spherical harmonics
Spherical harmonics (SHs) \(Y_{n,j}\) are (global) polynomials on \(\Omega \). They have a distinct degree \(n \in {\mathbb {N}}_0\) and order \(j=-n,\ldots ,n\). In practice, we usually choose the fully normalized SHs
for \(\eta (\varphi ,t) \in \Omega \), an \(\mathrm {L}^2(\Omega )\)-normalization \(p_{n,j}\) and the associated Legendre functions \(P_{n,j}\). An example is given in Fig. 4a. For further details, the reader is referred to, e. g., Freeden et al. (1998), Freeden and Schreiner (2009), Müller (1966).
2.2 Slepian functions
Slepian functions (SLs) are bandlimited, spatially optimally localized trial functions. Here, the spatial localization region shall be a spherical cap parametrized with \(c = \cos (\theta ) \in [-1,1]\) and its centre \(A(\alpha ,\beta ,\gamma )\varepsilon ^3\in \Omega \). Note that \(\theta \) is the polar angle between vectors pointing at the apex and at an arbitrary point of the base, that \(\alpha ,\ \gamma \in [0,2\pi [\) and \(\beta \in [0,\pi ]\) denote the Euler angles, that \(A\in \mathrm {SO}(3)\) is a rotation matrix and that \(\varepsilon ^3=(0,0,1)^\mathrm {T}\) is the North Pole. Then, we obtain for each localization region \(R :=R(c,A(\alpha ,\beta ,\gamma )\varepsilon ^3) \in {\overline{{\mathbb {B}}}}^4 :=\left\{ x \in {\mathbb {R}}^4\ :\ |x|\le 1\right\} \) a set of \(k=1,\ldots ,(L+1)^2\) Slepian functions defined by
where \(L \in {\mathbb {N}}_0\) is the bandlimit. The Fourier coefficients \(g^{(k,L)}_{l,m}(R),\ l=0,\ldots ,L,\ m=-l,\ldots ,l,\) are obtained from the eigenvectors of the related algebraic eigenvalue problem of optimizing a bandlimited function, expanded in SHs, in the region R. An example is given in Fig. 4a. Note that a commuting operator provides a stable computation of these values if the localization region is a spherical cap. For further details, the reader is referred to, e. g., Albertella et al. (1999), Grünbaum et al. (1982), Michel (2013), Seibert (2018), Simons and Dahlen (2006).
2.3 Radial basis functions and wavelets
As examples for non-bandlimited localized trial functions, we consider Abel–Poisson kernels \(K(x,\cdot )\) (APKs) and wavelets \(W(x,\cdot )\) (APWs) due to their closed form. That means, we have
and
for \(\eta \in \Omega \) and with the characteristic parameter \(x\in {\mathbb {B}}\). The kernels act as low-pass filters, whereas the wavelets are band-pass filters. Examples are given in Fig. 4b. For further details, the reader is referred to, e. g., Freeden et al. (1998), Freeden and Michel (2004), Freeden and Schreiner (1998, 2009), Freeden and Windheuser (1996), Michel (2013), Windheuser (1995).
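The closed form mentioned above can be checked numerically. The following Python sketch assumes the textbook form of the Abel–Poisson kernel, \(K(x,\eta ) = \frac{1-|x|^2}{4\pi (1+|x|^2-2x\cdot \eta )^{3/2}} = \sum _{n=0}^\infty \frac{2n+1}{4\pi }|x|^n P_n\left( \frac{x}{|x|}\cdot \eta \right) \), and compares it with a truncated Legendre series. The normalization used in the cited references may differ, the wavelets are not covered, and all function names are hypothetical.

```python
import math


def legendre(n_max: int, t: float) -> list[float]:
    """Legendre polynomials P_0(t), ..., P_{n_max}(t) via the three-term recurrence."""
    values = [1.0, t]
    for n in range(1, n_max):
        values.append(((2 * n + 1) * t * values[n] - n * values[n - 1]) / (n + 1))
    return values[: n_max + 1]


def abel_poisson_closed(x, eta):
    """Assumed closed form of the Abel-Poisson kernel for |x| < 1 and |eta| = 1."""
    r2 = sum(c * c for c in x)
    dot = sum(a * b for a, b in zip(x, eta))
    return (1.0 - r2) / (4.0 * math.pi * (1.0 + r2 - 2.0 * dot) ** 1.5)


def abel_poisson_series(x, eta, n_max: int = 200):
    """Truncated Legendre series of the same kernel (converges since |x| < 1)."""
    r = math.sqrt(sum(c * c for c in x))
    if r == 0.0:
        return 1.0 / (4.0 * math.pi)  # only the degree-0 term survives
    t = sum(a * b for a, b in zip(x, eta)) / r
    p = legendre(n_max, t)
    return sum((2 * n + 1) / (4.0 * math.pi) * r**n * p[n] for n in range(n_max + 1))


x = (0.3, 0.2, 0.4)    # characteristic parameter inside the unit ball
eta = (0.0, 0.0, 1.0)  # evaluation point on the unit sphere
difference = abs(abel_poisson_closed(x, eta) - abel_poisson_series(x, eta))
```

The closer \(|x|\) is to 1, the more series terms are needed and the more localized the kernel becomes, which is why the closed form is attractive in practice.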
2.4 A dictionary
A dictionary is a set of trial functions. Usually, we intentionally choose a redundant system such that it contains global as well as local functions and bandlimited as well as non-bandlimited functions, respectively. In this way, we are able to tailor an approximation in a heterogeneous basis which, thus, combines the best of several worlds. Then, the specific needs of the solution can in all probability be met more flexibly. In other words, the different trial functions are used to recover different aspects of the signal: global ones model major trends within the signal and local ones recover detail structures.
A dictionary can be specified as follows. We first define subsets for each type of trial function under investigation. Here, we set
for SHs,
for SLs,
for APKs and
for APWs. Such subsets are called trial function classes. The union of the defined trial function classes gives us the dictionary \({\mathcal {D}}\):
confer Michel (2022), Michel and Schneider (2020), Schneider (2020). In general, it is not necessary that \({\mathcal {D}}\) is finite. However, we denote an infinite dictionary explicitly by \({\mathcal {D}}^{\mathrm {Inf}}\). Note that, depending on the actual choice of \([N]_\mathrm {SH},\ [B_K]_\mathrm {APK}\) and \([B_W]_\mathrm {APW}\), \({\mathcal {D}}\) may be complete in certain function spaces like \(\mathrm {L}^2(\Omega )\), see e. g. Freeden et al. (1998), Michel (2013), Schneider (2020) and the previously mentioned references regarding the SHs, APKs and APWs. This naturally holds for \({\mathcal {D}}^{\mathrm {Inf}}\). In particular, each subsystem \([N]_{\mathrm {SH}},\ [B_K]_{\mathrm {APK}}\) and \([B_W]_{\mathrm {APW}}\) can be complete in \(\mathrm {L}^2(\Omega )\): take \(N={\mathcal {N}}\) and dense subsets \(B_K,\ B_W \subset {\mathbb {B}}\). This is the mentioned and desired redundancy of the dictionary. The representation of an unknown function in an overcomplete dictionary \({\mathcal {D}}\) is, therefore, not unique. The algorithmic choice and the associated objective function determine which specific representation is obtained eventually. The details of this are explained in the following.
3 The LIPMP algorithms
We consider the downward continuation of the gravitational potential from satellite data to the Earth’s surface. That is, mathematically speaking, we consider the ill-posed inverse problem \(y={\mathcal {T}}_\daleth f\) with the data \(y \in {\mathbb {R}}^\ell ,\ y_i = V(\sigma \eta ^i)\), the satellite height \(\sigma >1\), grid points \(\eta ^i \in \Omega \) for \(i=1,\ldots , \ell \) at the Earth’s surface and the operator \({\mathcal {T}}_\daleth f :=(({\mathcal {T}}f)(\sigma \eta ^i))_{i=1,\ldots ,\ell }\) with the upward continuation operator \({\mathcal {T}}\) as in (1). Thus, \({\mathcal {T}}_\daleth \) is the corresponding evaluation operator of the upward continuation operator for an \(\ell \)-dimensional discretized grid. Our task is to approximate the gravitational potential f at the Earth’s surface \(\Omega \).
The LIPMP algorithms introduce an add-on to the established IPMP algorithms. The remaining routines coincide. Thus, we give a short overview of the latter’s strategy.
3.1 The underlying IPMP algorithms
Note that the IPMP algorithms discussed here have been developed by the Geomathematics Group Siegen in the past decade. Thus, this subsection is based on the previously mentioned literature from our group. We summarize here the main aspects to enable a general understanding of the methods necessary to describe the learning add-on. To avoid repetition of previously published content, we refer to the respective literature on the RFMP and ROFMP, respectively, for implementation details.
Due to the (severe) ill-posedness of the inverse problem at hand (see Sect. 1), it is indispensable to consider a regularization for the downward continuation. The IPMP algorithms utilize a Tikhonov–Phillips regularization which is an established and well-performing choice, e. g. for the downward continuation. The Tikhonov–Phillips regularization aims to solve the (approximative) regularized normal equation by minimizing the so-called Tikhonov–Phillips functional which consists of a data error and a penalty term.
In general, the minimization can be done via various approaches. We consider the IPMP algorithms for this here. Using an initial (guessed) approximation \(f_0\), e. g. \(f_0 \equiv 0\), the methods iteratively build a minimizer as a linear combination of weighted dictionary elements \(d_n \in {\mathcal {D}}\):
in the case of the RFMP algorithm and
in the case of the ROFMP algorithm. Recall that due to the aforementioned redundancy in the dictionary, the approximation is usually built in a heterogeneous basis due to a mixture of global as well as local and bandlimited as well as non-bandlimited, respectively, trial functions. Note that the superscript (N) in (4) refers to an update of the coefficients in each iteration step due to the orthogonality process: to improve the efficiency of the RFMP, a prefitting technique is included. This is a usual approach with matching pursuits, see, e.g. Vincent and Bengio (2002). Prefitting means that, in each iteration of the ROFMP, the previously chosen weights are updated to prevent the algorithm from picking the same trial function more than once. Note that, in practice, we usually consider the iterated (L)ROFMP algorithm which restarts the prefitting technique after a prescribed number of iterations for practical as well as theoretical reasons. For readability, in the sequel, we do without an additional subscript that would be necessary for this restart process.
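For orientation, the two expansions discussed in this paragraph can be sketched as follows. This is the form commonly found in the RFMP and ROFMP literature, written down here as an assumption and without equation numbers, since the original displays are not reproduced above.

```latex
\[
  f_N = f_0 + \sum_{n=1}^{N} \alpha_n d_n
  \quad \text{(RFMP)},
  \qquad
  f_N^{(N)} = f_0 + \sum_{n=1}^{N} \alpha_n^{(N)} d_n
  \quad \text{(ROFMP)}.
\]
```

In this reading, the RFMP weights \(\alpha _n\) stay fixed once chosen, whereas every ROFMP weight \(\alpha _n^{(N)}\) may change from one iteration N to the next.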
The respective residuals are
for the RFMP algorithm and, in the case of the ROFMP algorithm,
where \({\mathcal {P}}_{{\mathcal {V}}_N^\perp }\) is the orthogonal projection onto the orthogonal complement
In both cases, we have \(R^0 = y-{\mathcal {T}}_\daleth f_0\) which yields \(R^0 = y\) if \(f_0 \equiv 0\). In each iteration N, we choose the weights \(\alpha _{N+1} \in {\mathbb {R}}\) and \(\alpha _{N+1}^{(N+1)} \in {\mathbb {R}}\), respectively, as well as the basis function \(d_{N+1} \in {\mathcal {D}}\) such that we minimize, for \(\lambda >0\), the Tikhonov–Phillips functional
for the RFMP algorithm and, for the ROFMP algorithm,
where the projection is given by
with the projection coefficients \(\beta _n^{(N)}(d)\). These projection coefficients are given by
for \(n=1,\ldots ,N-1.\) With these, we define
Note that, in contrast to (5), the expansion (6) need not be a projection. For the penalty term, we use the norm of the Sobolev space \({\mathcal {H}}_2(\Omega ) \subset \mathrm {L}^2(\Omega )\) which is the completion of the set of all square-integrable functions for which
is finite, see e. g. Freeden et al. (1998), Michel (2013).
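For orientation, the residual updates and Tikhonov–Phillips functionals referred to in this passage can be sketched in the form commonly used in the RFMP and ROFMP literature. This is a reconstruction under assumptions and not a verbatim quote: the definition of \({\mathcal {V}}_N\) as the span of the already chosen \({\mathcal {T}}_\daleth d_n\) and the shorthand \(b_N(d)\) are notation introduced here for illustration only.

```latex
% RFMP: residual update and functional minimized over (alpha, d)
\begin{align*}
  R^{N+1} &= R^N - \alpha_{N+1} \mathcal{T}_\daleth d_{N+1}, \\
  (\alpha, d) &\mapsto
    \left\| R^N - \alpha \mathcal{T}_\daleth d \right\|_{\mathbb{R}^\ell}^2
    + \lambda \left\| f_N + \alpha d \right\|_{\mathcal{H}_2(\Omega)}^2 .
\end{align*}

% ROFMP: the same with the part of T d orthogonal to V_N = span{T d_1, ..., T d_N}
\begin{align*}
  R^{N+1} &= R^N - \alpha_{N+1}^{(N+1)}
    \mathcal{P}_{\mathcal{V}_N^\perp} \mathcal{T}_\daleth d_{N+1}, \\
  (\alpha, d) &\mapsto
    \left\| R^N - \alpha \mathcal{P}_{\mathcal{V}_N^\perp} \mathcal{T}_\daleth d
    \right\|_{\mathbb{R}^\ell}^2
    + \lambda \left\| f_N^{(N)} + \alpha \left( d - b_N(d) \right)
    \right\|_{\mathcal{H}_2(\Omega)}^2, \\
  \mathcal{P}_{\mathcal{V}_N} \mathcal{T}_\daleth d
    &= \sum_{n=1}^{N} \beta_n^{(N)}(d) \, \mathcal{T}_\daleth d_n,
  \qquad
  b_N(d) := \sum_{n=1}^{N} \beta_n^{(N)}(d) \, d_n .
\end{align*}
```

In this form it is visible why the expansion \(b_N(d)\) need not be a projection: the coefficients \(\beta _n^{(N)}(d)\) are determined in the data space \({\mathbb {R}}^\ell \) and are merely reused in \({\mathcal {H}}_2(\Omega )\).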
In practice, we utilize that the minimization above is equivalent to the maximization of
and
respectively, where we obtain the weights via
respectively. The IPMP algorithms most commonly terminate if the relative data error falls below a certain threshold like the noise level or if a certain number of iterations is reached.
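The selection rule can be illustrated with a finite-dimensional toy problem. The Python sketch below assumes the RFMP objective in its commonly cited form, \((\langle R^N,{\mathcal {T}}_\daleth d\rangle - \lambda \langle f_N,d\rangle )^2 / (\Vert {\mathcal {T}}_\daleth d\Vert ^2 + \lambda \Vert d\Vert ^2)\), with the weight given by the same quotient without the square in the numerator. As a simplification, functions are coefficient vectors, the operator is a small matrix and the Euclidean norm stands in for the \({\mathcal {H}}_2(\Omega )\)-norm; all names and numbers are illustrative.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def apply(T, d):
    """Matrix-vector product; T plays the role of the discretized operator."""
    return [dot(row, d) for row in T]


def tikhonov(y, f, T, lam):
    """Toy Tikhonov-Phillips functional with a Euclidean penalty (stand-in for H_2)."""
    residual = [yi - ti for yi, ti in zip(y, apply(T, f))]
    return dot(residual, residual) + lam * dot(f, f)


def rfmp_step(y, f, dictionary, T, lam):
    """One greedy step: pick the element maximizing the objective, then update f."""
    residual = [yi - ti for yi, ti in zip(y, apply(T, f))]
    best_value, best_alpha, best_d = -1.0, 0.0, None
    for d in dictionary:
        Td = apply(T, d)
        numerator = dot(residual, Td) - lam * dot(f, d)
        denominator = dot(Td, Td) + lam * dot(d, d)
        value = numerator**2 / denominator  # decrease of the functional for this d
        if value > best_value:
            best_value, best_alpha, best_d = value, numerator / denominator, d
    f_new = [fi + best_alpha * di for fi, di in zip(f, best_d)]
    return f_new, best_d, best_alpha, best_value


# Toy setting: a smoothing operator with decaying diagonal and a redundant dictionary.
T = [[1.0, 0.0, 0.0], [0.0, 0.5, 0.0], [0.0, 0.0, 0.1]]
y = [1.0, 0.5, 0.05]
dictionary = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0],
              [1.0, 1.0, 0.0], [1.0, 1.0, 1.0]]
lam = 1e-2

f = [0.0, 0.0, 0.0]
history, gains = [tikhonov(y, f, T, lam)], []
for _ in range(5):
    f, chosen, alpha, gain = rfmp_step(y, f, dictionary, T, lam)
    gains.append(gain)
    history.append(tikhonov(y, f, T, lam))
```

In this toy setting, the value of the objective for the chosen element equals the decrease of the functional in that step, which is why maximizing the quotient is equivalent to minimizing the functional over both the weight and the dictionary element.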
The dictionary \({\mathcal {D}}\) is finite and manually (though influenced by experience) chosen in most of the previous publications on an IPMP algorithm as the use of an infinite dictionary \({\mathcal {D}}^{\mathrm {Inf}}\) was an open question at that time. Then, the maximizer of (7) and (8), respectively, is found by comparing the objective function for all dictionary elements.
Obviously, the dictionary in particular causes some difficulties for a wider use of these methods: first of all, a user who is inexperienced with the presented trial functions might have a hard time choosing a well-performing dictionary. Moreover, though the choice can be improved by experience, usually a very large finite dictionary is still used as it is most likely to work better. However, its preprocessing causes high computational costs, particularly with respect to runtime and storage demand. Lastly, such a dictionary is naturally prone to produce a biased approximation \(f_N\) and \(f_N^{(N)}\), respectively, though this is hard to quantify as we lack practical alternatives. In contrast, if we automated the choice of (i.e. learnt) a finite dictionary, we would remedy the repercussions of a lack of experience. For this automation, it is vital to implement the use of all possible functions from (some of) the trial function classes, i.e. the use of an infinite dictionary. At least from a theoretical point of view, an infinite dictionary should also be less prone to bias than a finite one. However, it probably depends on the specific realization of using an infinite dictionary whether the computational costs can also be reduced. Note that the reduction in computational costs is a first step to applying the methods in competitive experiments. This again might, in future research, allow for a quantification of any existing bias.
To enable the use of an infinite dictionary, the LIPMP algorithms were developed. In particular, the LIPMP algorithms expand the IPMP methods to an infinite dictionary via a learning add-on.
3.2 The learning add-on
We start by considering how an infinite dictionary \({\mathcal {D}}^{\mathrm {Inf}}\) can be introduced into the routine of the IPMP algorithms. From this, the learnt dictionary follows naturally. If we run an IPMP algorithm with the learning add-on, we say that we run an LIPMP (Learning IPMP) algorithm. Thus, note that we obtain the LRFMP and the LROFMP with this approach.
The infinite dictionary \({\mathcal {D}}^{\mathrm {Inf}}\) for the downward continuation of satellite data is defined by
with
for fixed \({\overline{N}},\ L \in {\mathbb {N}}_0\). The trial function class of the SHs is still finite. This is due to the discrete nature of their characteristic parameters: the degree n and the order j. Nonetheless, the other trial function classes are truly infinite. This means, in the case of an SL, that the choice of the parameters of the centre \(\alpha ,\ \beta ,\ \gamma \) and the size c of the spherical cap are arbitrary, while the bandlimit is fixed and finite in analogy to the SH choice. In the case of APKs and APWs, the centres can be chosen from all points in the ball \({\mathbb {B}}\).
The main obstacle for using \({\mathcal {D}}^{\mathrm {Inf}}\) is the determination of the maximizer of (7) and (8), respectively, in the truly infinite trial function classes. For this, we introduce an additional optimization step into the routine. In particular, in this step, we determine a finite dictionary of (optimized) candidates \({\mathcal {C}}\) from the infinite \({\mathcal {D}}^{\mathrm {Inf}}\). Then, \({\mathcal {C}}\) acts as a finite dictionary and we can proceed in the current iteration just like in an iteration of the respective IPMP algorithm. Therefore, after termination (which obeys the same rules as in the IPMP algorithms), we obtain an approximation \(f_N\) and \(f_N^{(N)}\), respectively, in a best basis of optimized dictionary elements. The latter constitute the learnt (finite) dictionary which can be used in future runs of the IPMP algorithms.
Due to the different nature of the classes, we distinguish a strategy for the SHs and one for the remaining three trial function classes. The approach to learn SHs has already been explained to a certain extent in Michel and Schneider (2020). For completeness and some additional insights, we summarize it here again. We propose to learn a maximal SH degree as well as particular SHs simultaneously. The trial function class \([{\widetilde{N}}]_\mathrm {SH}\) includes all SHs up to a degree \({\overline{N}}\) (see (9)). The value of \({\overline{N}}\) should be chosen much larger than is sensible for the data y. Then, we again follow the previous approach and determine and compare the values of (7) and (8), respectively, for all SHs up to degree \({\overline{N}}\). Hence, after termination, we have a certain set of SHs that are used in the representation (3) and (4), respectively. Most likely, the algorithms will have determined a smaller, properly learnt maximal SH degree \(\nu \in {\mathbb {N}},\ \nu < {\overline{N}},\) on their own. Moreover, we obtain a set of distinct SHs used in the approximation and, thus, contained in the learnt dictionary. Note that this approach demands a finite starting dictionary \({\mathcal {D}}^\mathrm {s}\) in the LIPMP algorithms which contains (at least) \([{\widetilde{N}}]_\mathrm {SH}\).
For the remaining trial function classes of the SLs, APKs and APWs, we determine each candidate by solving nonlinear constrained optimization problems. Note that we use N for the iteration number next. The objective functions in the N-th iteration are \(\mathrm {IPMP}(d(z);N)\) where d(z) denotes an SL, APK or APW, respectively, and we have
The placeholder \(d(z)(\cdot )\) stands for either \(g^{(k,L)}(R,\cdot )\) with \(z=R(c,\alpha ,\beta ,\gamma ) \in {\mathbb {B}}^4\) or for \(K(x,\cdot )\) and \(W(x,\cdot )\) with \(z=x\in {\mathbb {B}}\). Hence, we maximize \(\mathrm {IPMP}(d(z);N)\) with respect to the characteristic parameter vector z of the trial functions SL, APK and APW, respectively.
As \(\mathrm {ROFMP}(\cdot ; \cdot )\) is not well-defined for previously chosen dictionary elements, we use \(\mathrm {ROFMP}_S(d(z);N)\) which is the product of \(\mathrm {ROFMP}(\cdot ;\cdot )\) from (8) and a spline to avoid neighbourhoods of critical basis functions from their respective trial function class. Let \(\varepsilon \) be the size of such a neighbourhood. To avoid a neighbourhood of a current z, such a spline is given by
where \(\tau = \Vert z - z^{(n)} \Vert ^2\) denotes the (squared) distance between the current value z and previously chosen values \(z^{(n)}\). In the N-th iteration, this yields
Note that the product on the right-hand side only considers those \(d(z^{(n)})\) which are from the same trial function class as d(z). Then, for each truly infinite trial function class, we solve the maximization problem \(\mathrm {IPMP}(d(z);N) \rightarrow \max !\) in each iteration N.
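The damping mechanism can be illustrated as follows. The Python sketch below does not reproduce the spline of the paper, whose formula is not given above; instead, it uses a cubic smoothstep as a purely hypothetical stand-in that vanishes at \(\tau =0\) and equals 1 for \(\tau \ge \varepsilon \). All function names are likewise hypothetical.

```python
def cutoff(tau: float, eps: float) -> float:
    """Hypothetical smooth cutoff: 0 at tau = 0, 1 for tau >= eps (cubic smoothstep).

    This is NOT the spline from the paper, only a stand-in with the same purpose.
    """
    if tau >= eps:
        return 1.0
    s = tau / eps
    return s * s * (3.0 - 2.0 * s)


def damped_objective(value: float, z, previous, eps: float) -> float:
    """Multiply an objective value by cutoffs around previously chosen parameters.

    `previous` should only contain parameters from the same trial function class as z.
    """
    factor = 1.0
    for z_n in previous:
        tau = sum((a - b) ** 2 for a, b in zip(z, z_n))  # squared distance
        factor *= cutoff(tau, eps)
    return value * factor


chosen = [(0.2, 0.1, 0.3)]
at_previous = damped_objective(2.0, (0.2, 0.1, 0.3), chosen, eps=0.01)
far_away = damped_objective(2.0, (-0.5, 0.4, 0.0), chosen, eps=0.01)
```

With such a factor, a maximizer cannot coincide with a previously chosen parameter vector, while the objective is left unchanged outside the \(\varepsilon \)-neighbourhoods.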
Note that we have to model the corresponding constraints on the parameter vectors z as well. For the SLs, we have
For the APKs and APWs, we obtain
With these constraints and the objective function \(\mathrm {IPMP}(\cdot ;\cdot )\), we can solve each optimization problem with any established and suitable approach for nonlinear optimization. Note, however, that if a gradient-based method is chosen, we have to determine the gradients of \(\mathrm {IPMP}(\cdot ;\cdot )\) with respect to the characteristic parameters of the trial function under investigation. As we showed for the kernels and \(\mathrm {RFMP}(\cdot ;\cdot )\) in the first publication on the LIPMP algorithms, Michel and Schneider (2020), this can be done with the standard rules of differentiation. In particular, as we have seen in (7) and (8), the objective function of such a nonlinear approximation is given as a quotient of certain terms. Thus, the first step is to use the quotient rule. Then, we have to consider the derivatives of the single terms of the numerator and denominator. These terms fall into one of two categories: either we consider an \({\mathbb {R}}^\ell \)- or an \({\mathcal {H}}_2(\Omega )\)-inner product. In the first case, the interesting aspect is the derivative of \({\mathcal {T}}_\daleth d(z)\) with respect to z. For the APK and APW, this is straightforwardly obtained. For the SLs, we need a bit more effort as we need to differentiate the Wigner rotation matrices and the Gauß algorithm to obtain the derivative of the coefficients. When differentiating the \({\mathcal {H}}_2(\Omega )\)-inner products, we have to take a closer look at the derivatives of the projection coefficients and of terms of the form \(\langle d_n, d(z) \rangle _{{\mathcal {H}}_2(\Omega )}\) and \(\langle d(z), d(z) \rangle _{{\mathcal {H}}_2(\Omega )}\), where \(d_n\) is any previously chosen trial function and d(z) is the current trial function to be optimized. The former problem reduces again to the derivative of \({\mathcal {T}}_\daleth d(z)\) which is discussed in the first case. The derivative of \(\langle d_n, d(z) \rangle _{{\mathcal {H}}_2(\Omega )}\) has already been discussed for the SLs there as well, as it only depends on the derivative of their coefficients. For the APKs and APWs, the derivative of \(\langle d_n, d(z) \rangle _{{\mathcal {H}}_2(\Omega )}\) effectively reduces to the terms
for \(x\in {\mathbb {B}},\ m\in \{1,2\},\ n\in {\mathbb {N}}_0\) and spherical harmonics \(Y_{n,j}\) as well as Legendre polynomials \(P_n\). Note, however, that this reduction includes a discussion of exchanging limits. With the well-known spherical formulation of \(\nabla \), the terms are easily differentiated. Note that any possible singularity turned out to be well-defined under closer inspection. Last but not least, the derivative of \(\langle d(z), d(z) \rangle _{{\mathcal {H}}_2(\Omega )}\) is obvious as the exchange of limits has been discussed in the former case. For those interested in the details of this derivation, we fully published it in Schneider (2020). As it is quite lengthy, we abstain from repeating it here.
In practice, we solve the optimization problems using the NLopt library, see Johnson (2019). In particular, as it is advised there, we solve them in a 2-step optimization procedure. That means, we first determine a global solution (with a derivative-free method) and, then, refine this using a gradient-based local method. We include both solutions in the learnt set of candidates \({\mathcal {C}}\) just in case there are problems with a solver. If the global solver needs a sensible starting solution, we can include a selection of SLs, APKs or APWs in the starting dictionary as well. This starting solution can then also be included in \({\mathcal {C}}\). However, these should not have a major impact on the learnt dictionary.
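The structure of such a 2-step procedure can be sketched without any external library. The following Python example is a plain stand-in and does not use or imitate the NLopt API: the global step is a seeded random search in the unit ball, the local step is a gradient ascent with finite-difference gradients and backtracking, and the toy objective merely replaces \(\mathrm {IPMP}(d(z);N)\) for an APK or APW with \(z=x\in {\mathbb {B}}\). All names, the radius 0.99 and the remaining parameters are assumptions chosen for illustration.

```python
import math
import random


def norm(x):
    return math.sqrt(sum(c * c for c in x))


def global_search(objective, samples=500, radius=0.99, seed=0):
    """Step 1 (derivative-free): best of `samples` random points with |x| < radius."""
    rng = random.Random(seed)
    best, best_val = None, -math.inf
    for _ in range(samples):
        while True:  # rejection sampling inside the ball
            x = tuple(rng.uniform(-radius, radius) for _ in range(3))
            if norm(x) < radius:
                break
        val = objective(x)
        if val > best_val:
            best, best_val = x, val
    return best


def gradient(objective, x, h=1e-6):
    """Central finite differences; an analytic gradient would replace this."""
    grad = []
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        grad.append((objective(xp) - objective(xm)) / (2.0 * h))
    return grad


def local_refine(objective, x0, radius=0.99, steps=200, step_size=0.1):
    """Step 2 (gradient-based): ascent with backtracking that respects |x| < radius."""
    x, val = list(x0), objective(x0)
    for _ in range(steps):
        g = gradient(objective, x)
        t = step_size
        while t > 1e-12:
            cand = [xi + t * gi for xi, gi in zip(x, g)]
            cand_val = objective(cand)
            if norm(cand) < radius and cand_val > val:
                x, val = cand, cand_val
                break
            t *= 0.5
        else:
            break  # no admissible improving step was found
    return tuple(x)


TARGET = (0.3, -0.2, 0.5)  # maximizer of the toy objective, inside the unit ball


def toy_objective(x):
    return -sum((xi - ti) ** 2 for xi, ti in zip(x, TARGET))


x_global = global_search(toy_objective)
x_local = local_refine(toy_objective, x_global)
candidates = [x_global, x_local]  # both solutions enter the set of candidates
```

Keeping both points in the list mirrors the precaution described above: if the local solver fails or stalls, the globally found point is still available as a candidate.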
In Michel and Schneider (2020), we proposed the use of certain additional features to guide the learning process. Though the features proved to be helpful in certain learning settings, from our experience, using a 2-step optimization, i. e. solving the described optimization problems first globally and then locally, as well as using more diverse trial function classes remedies the urgent need for some rather manual features. Nonetheless, some of them, like an iterative application of the learnt dictionary (i. e. allowing only the first N dictionary elements in the N-th iteration of the IPMP when the learnt dictionary is used), are particularly helpful when we have to balance the trade-off between numerical accuracy and runtime. Thus, one should bear in mind that, in some cases, it can be helpful to guide the learning process. We explain in the description of the experiments which few additional features we use here.
In Fig. 1, a schematic representation of the learning method is given. The starting point is the red circle which represents the initialization of the experiment parameters. Then, the iteration process starts ('next iteration'). In each iteration, the method steps into the trial function class under consideration ('find optimal APK/APW/SH/SL') and solves the respective optimization problem, e.g. via a 2-step procedure. The solutions are passed to 'learnt set of candidates' which builds the finite dictionary of candidates. From there, the common routine of an iteration step of the respective IPMP algorithm is executed: 'choose a best candidate as \(d_{n+1}\)', 'check termination criteria' and, if the method does not terminate yet, 'updates of IPMP'. After the updates (e.g. of the residual and possibly the weights), the method steps into the next iteration. When it terminates, we obtain a 'learnt dictionary and approximation' (green circle).
By construction, the LIPMP algorithms yield an approximate solution of the inverse problem as well as a learnt dictionary for this problem. Hence, they can be used as standalone approximation algorithms or as auxiliary algorithms to determine a finite dictionary automatically.
Moreover, they inherit the convergence results of the IPMP algorithms (see the literature mentioned above). In particular, for infinitely many iterations in the LRFMP algorithm, we have convergence of the approximation to the solution of the Tikhonov–Phillips regularized normal equation. This means: the determined approximation is stable with respect to noise in the data and it tends (theoretically) to the exact (unstable) solution, if the regularization parameter tends (together with the noise) to zero.
3.3 Discussion of approaches for GRACE data
In this paper, we propose the (L)IPMPs for modelling the mass transport on the Earth using, e.g. GRACE data. Naturally, we should discuss this approach with respect to the common modelling of such data via spherical harmonics or mascons.
Approximating GRACE data via spherical harmonics is the traditional approach but has been shown to produce a North–South striping in the gravitational field caused, e.g. by the mission design as well as processing strategies, see, e.g. Chen et al. (2021). In our research, we have also seen this with Level 2 data. That is why we include a very basic destriping ansatz (see Sect. 4.1) which is sufficient here. Nevertheless, there exist various methods (see, for instance, the references given below or in Chen et al. (2021)) which take care of the North–South stripes in a much more sophisticated manner. However, these methods resulted in signal loss, see Chen et al. (2021), Watkins et al. (2015).
Watkins et al. (2015) show that constrained mascons yield a higher resolution (thus, less signal loss) without the need for postprocessing destriping methods. Other mascon approaches, see, e.g. Luthcke et al. (2013), Save et al. (2016), also supersede the spherical harmonics approach in that respect. Though the details may vary with respect to a specific mascon approach, the general principle is as follows: the Earth’s surface is paved with specific patches, for instance spherical caps or hexagons. For each of these patches, a mass value is determined. Hence, a mascon is some kind of finite element on the Earth’s surface. Along with noise constraints and regularization, they are used as basis functions to approximate the gravitational potential via GRACE Level 1b measurements (e.g. the K-band range rate). Note that, see Chen et al. (2021), the mascon approach can be transformed into a spherical harmonic ansatz which enables a spectral representation of the results.
The question is now how the (L)IPMP approach fits into these concepts. We include spherical harmonics in the dictionary of an (L)IPMP. We do not use finite elements here (although it is in principle possible). However, we utilize RBFs and RBWs which are local functions as well. Thus, one could say that, with the (L)IPMPs, we combine the general ideas of both approaches into one algorithm.
There are two things to be noted: first, because the RBFs and RBWs can only be represented in a spherical harmonic series, we obviously cannot—without loss of accuracy—transform our approximation into a pure and finite spherical harmonic one. However, from the mathematical point of view there is also no necessity to do this. Moreover, previous studies (Michel and Telschow 2014, 2016; Telschow 2014) showed that, dependent on the experiment setting, distinguishing the IPMP approximation into the different trial function classes yields a multiscale representation. Second, Watkins et al. (2015) discussed the size of the mascon patches in use and concluded that the chosen size was a compromise between regions of low and high latitude (i.e. the equatorial and the polar area) and that future research should investigate mascons with flexible sizes of—in their case—spherical caps. This is interesting because, when comparing the mascon approach and the LIPMPs, the size of the spherical cap resembles the scale of the local dictionary elements used. However, the LIPMPs include by construction all scales of RBFs and RBWs. That means, the LIPMPs do not necessitate similar compromises in the basis functions but already implement what appeared sensible in the NASA research.
Thus, we assume that our method could in all probability be competitive with the established spherical harmonic as well as mascon approach if we use Level 1b data as well.
4 Numerical results
We first summarize the general setting of the experiments. Then, we consider the performance of the LIPMP methods when using different noise levels. Next, we show the results of the LIPMP algorithms as standalone approximation methods. Finally, we show results for comparing a manually chosen and a learnt dictionary in the IPMP algorithms. Note that our test scenarios here shall serve as proofs of concept in the sense that the main features of the add-on are demonstrated. In a continuing project, we investigate the behaviour for more realistic data and for other applications.
4.1 Experiment setting in general
We use the unit sphere as an approximate relative surface of the Earth. Then, we can use (1) for the evaluation of the operator. Of course, this is a simplification of reality. However, it suffices for our purposes. Moreover, it allows us to use the mentioned singular value decomposition. Using other geometries (e.g. an ellipsoid or the geoid) would lead to an enormous numerical burden with respect to the evaluation of the operator as well as the regularization terms. As data, we use the EGM2008, see e. g. Pavlis et al. (2012), as well as GRACE data from May 2008 as expansion coefficients in (1). In both cases, we use the degrees equal to or greater than 3 up to the highest given one (i.e. degree 2190 and order 2159 for the EGM2008 and degree and order 60 for GRACE). We omit degree 2 because then the signal contains visible local aspects instead of mainly representing only the Earth’s ellipticity. Further, we evaluate the respective expansion on a regularly distributed Reuter grid of 12684 points. For an example of a Reuter grid, see Fig. 4d.
The question arises how the resolution of the EGM2008 and the GRACE data relates to the number of data points. With respect to (2), we see that a spherical harmonic \(Y_{n,j}\) has 2j extrema at the equator. This means we have maximally 120 extrema for the GRACE data and 4318, respectively, for the EGM2008 data. This corresponds to a resolution of \(\approx 360\) km and \(\approx 10\) km, respectively, at a satellite height of 500 km. From the definition of the Reuter grid in Michel (2013), we see that, for an even number \(N \in {\mathbb {N}}\), we obtain 2N grid points at the equator (i.e. \(\theta = \pi /2\)). In our case, we have \(N=100\) which leads to 12684 grid points distributed over the sphere. Thus, at the equator, we have a resolution of 200 grid points, or \(\approx 215\) km. Obviously, we undersample the EGM2008 and oversample the GRACE data (in particular, as we also destripe the GRACE data). We aim to implement the (L)IPMPs in future research more efficiently such that we can increase our resolution of the data in use. For the following tests, it was not yet possible to increase the number of data points adequately. However, because we present an over- and an undersampled problem, the results are sufficient for the proof of concept we intend to show.
The Driscoll–Healy grid was used for obtaining the approximation error. At the Earth’s surface, the EGM2008 has a resolution of \(\approx 9\) km while the GRACE data attain \(\approx 334\) km. The Driscoll–Healy grid we use has a resolution of \(\approx 111\) km. Thus, we gain more information about the solution when using the Driscoll–Healy grid. However, note that we need to sample more points when looking at the approximation error in order to evaluate whether our approximation suits the solution well in between data points. Moreover, note that a regular grid does not pose an additional challenge to the methods as it avoids critical data gaps. However, in Sect. 4.3, we also discuss an experiment with an irregular grid. For the IPMPs, irregular grids were already discussed in Michel and Telschow (2014, 2016), Telschow (2014). Thus, the methods themselves can be compared more easily.
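A quick way to check spacing figures of this kind is to divide the circumference of the equator by the number of extrema or grid points on it. The sketch below does this under two assumptions that are not stated in the text: a mean Earth radius of 6371 km, and 360 longitudinal intervals for the Driscoll–Healy grid (read off from 65341 = 181 · 361 points).

```python
import math

EARTH_RADIUS_KM = 6371.0       # assumed mean Earth radius
SATELLITE_HEIGHT_KM = 500.0    # satellite height used in the experiments

def equatorial_spacing_km(count_on_equator, radius_km):
    # Circumference of the equatorial circle divided by the number of
    # extrema or grid points placed on it.
    return 2.0 * math.pi * radius_km / count_on_equator

orbit_radius = EARTH_RADIUS_KM + SATELLITE_HEIGHT_KM

grace_at_orbit = equatorial_spacing_km(2 * 60, orbit_radius)       # 120 extrema
egm_at_orbit = equatorial_spacing_km(2 * 2159, orbit_radius)       # 4318 extrema
reuter_at_orbit = equatorial_spacing_km(2 * 100, orbit_radius)     # 2N points, N = 100

grace_at_surface = equatorial_spacing_km(2 * 60, EARTH_RADIUS_KM)
egm_at_surface = equatorial_spacing_km(2 * 2159, EARTH_RADIUS_KM)
driscoll_healy_at_surface = equatorial_spacing_km(360, EARTH_RADIUS_KM)

for name, value in [("GRACE, orbit", grace_at_orbit),
                    ("EGM2008, orbit", egm_at_orbit),
                    ("Reuter grid, orbit", reuter_at_orbit),
                    ("GRACE, surface", grace_at_surface),
                    ("EGM2008, surface", egm_at_surface),
                    ("Driscoll-Healy, surface", driscoll_healy_at_surface)]:
    print(f"{name}: {value:.1f} km")
```

The comparison of the data spacing with the grid spacing is what identifies the GRACE experiment as oversampled and the EGM2008 experiment as undersampled.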
For the relative root mean square error (RMSE)
we utilize grid points \({\tilde{\eta }}^i\) of an equiangular Driscoll–Healy grid, see e. g. Driscoll and Healy (1994), Michel (2013), of 65341 points where f is the exact solution (presumed according to the EGM2008 or GRACE data) and \(f_N\) is our approximation after N iterations. For an example of a Driscoll–Healy grid, see Fig. 4d. Note that, for a meaningful analysis of the absolute error, we have to use a different grid with much more grid points than the data are given at. We choose the Driscoll–Healy grid because it is obviously very different in its distribution. Besides the relative RMSE, we also consider the relative data error \(\Vert R^N\Vert _{{\mathbb {R}}^\ell }/\Vert R^0\Vert _{{\mathbb {R}}^\ell }\) and the absolute approximation error.
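As a sketch of how the two relative error measures can be evaluated, the following assumes one common normalization of the relative RMSE, namely the root of the summed squared deviations divided by the summed squared exact values on the evaluation grid. This normalization is an assumption made for illustration; the function names are hypothetical.

```python
import math

def relative_rmse(exact_values, approx_values):
    # Assumed definition: sqrt( sum (f - f_N)^2 / sum f^2 ) over the grid points.
    numerator = sum((f - g) ** 2 for f, g in zip(exact_values, approx_values))
    denominator = sum(f ** 2 for f in exact_values)
    return math.sqrt(numerator / denominator)

def euclidean_norm(vector):
    return math.sqrt(sum(v * v for v in vector))

def relative_data_error(residual, initial_residual):
    # ||R^N|| / ||R^0|| in the Euclidean norm of the data space.
    return euclidean_norm(residual) / euclidean_norm(initial_residual)

exact = [3.0, 4.0]
print(relative_rmse(exact, [3.0, 4.0]))             # perfect approximation
print(relative_rmse(exact, [0.0, 0.0]))             # zero approximation
print(relative_data_error([0.3, 0.4], [3.0, 4.0]))
```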
Moreover, for the GRACE data, we utilize the arithmetic mean of the Level 2 Release 05 provided by the GFZ, JPL and UTCSR as it was advised in Sakumura et al. (2014). Further, we smooth the data with a Cubic Polynomial Spline of order 5, see, e. g. Schreiner (1996), Freeden et al. (1998), Fengler et al. (2007), to remove the North–South striping. We are aware that there exist many and more sophisticated methods to remove satellite striping, see, e.g. Davis et al. (2008), Klees et al. (2008), Kusche (2007). However, here, we aim to show the competitiveness of the methods and do not strive for discovering new geoscientific phenomena. Thus, this very basic destriping approach suffices for our needs.
If not stated otherwise, the data are modelled on a 500 km satellite height and are perturbed with \(5\%\) Gaussian noise, such that we have perturbed data \(y^\delta \) given by
for the unperturbed data \(y_i\) and a Gaussian distributed random number \(\varepsilon _i\). Certainly, for a more realistic scenario, one could also use specific GRACErelated noise instead.
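A perturbation of this kind can be sketched as follows. The multiplicative form \(y_i (1 + 0.05\, \varepsilon _i)\) with standard normally distributed \(\varepsilon _i\) is an assumption made for this illustration, as are the function name and the use of a seeded generator for reproducibility.

```python
import random

def perturb(data, noise_level, rng):
    # Assumed noise model: y_i * (1 + noise_level * eps_i), eps_i ~ N(0, 1).
    return [y * (1.0 + noise_level * rng.gauss(0.0, 1.0)) for y in data]

unperturbed = [1.0, -2.0, 0.5]
noisy = perturb(unperturbed, 0.05, random.Random(42))
print(noisy)
```

Passing the generator explicitly makes it easy to draw new random numbers for each noise level while keeping individual runs repeatable.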
The algorithms terminate if the relative data error falls below the noise level, increases above 2 or if 1000 iterations are reached. The latter two criteria are necessary, because we tested different regularization parameters and some of them turned out to be inappropriate and yielded a numerically diverging sequence. We implemented the iterated (L)ROFMP algorithm and restarted the prefitting procedure after 100 iterations. Amongst the tested regularization parameters, we chose that which minimized the relative RMSE if the relative data error reached the noise level at termination.
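The three termination criteria and the subsequent choice of the regularization parameter can be written down compactly. The sketch below is only an illustration: the function names, the reason strings and the dictionary keys are hypothetical, and the numbers in the usage example are made up.

```python
def termination_reason(relative_data_error, iteration,
                       noise_level=0.05, divergence_bound=2.0,
                       max_iterations=1000):
    # Returns a reason string if the algorithm should stop, otherwise None.
    if relative_data_error < noise_level:
        return "noise level reached"
    if relative_data_error > divergence_bound:
        return "diverged"
    if iteration >= max_iterations:
        return "iteration limit"
    return None

def select_regularization_parameter(runs):
    # Among the runs that stopped at the noise level, take the parameter
    # with the lowest relative RMSE; return None if no run qualifies.
    admissible = [run for run in runs if run["reason"] == "noise level reached"]
    if not admissible:
        return None
    return min(admissible, key=lambda run: run["relative_rmse"])["parameter"]

# Made-up example values, for illustration only.
runs = [
    {"parameter": 1e-1, "reason": "noise level reached", "relative_rmse": 3.0e-4},
    {"parameter": 1e-2, "reason": "noise level reached", "relative_rmse": 2.0e-4},
    {"parameter": 1e-6, "reason": "diverged", "relative_rmse": 1.0e-4},
]
chosen_parameter = select_regularization_parameter(runs)
```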
The optimization problems are solved by the ORIG_DIRECT_L (globally) and the SLSQP (locally) algorithms from the NLOpt library, see Johnson (2019). As it is advised, we narrow the constraints by \(10^{-8}\). Due to the regularization, the narrowing can be relatively small. Further, we set some termination criteria for the optimization procedures. We found the following values to be useful in practice: we limit the absolute tolerance for the change of the objective function between two successive iterates as well as the tolerance between the iterates themselves to \(10^{-8}\). Moreover, we allow 5000 function evaluations and 200 seconds for each optimization.
With respect to the SLs, APKs and APWs, we forbid choosing two trial functions of the same type which are as close as \(\varepsilon = 5 \times 10^{-4}\) or closer in one (L)ROFMP step. The distance between two trial functions is obtained as the distance between their characteristic parameters. In the case of the APKs and APWs, we compute the Euclidean norm of \(x-x^{(n)}\). In the case of SLs, we use \(\Vert z-z^{(n)}\Vert ^2 = (c-c^{(n)}) + \arccos ({\overline{z}}\cdot {\overline{z}}^{(n)})\), where \({\overline{z}} = (\alpha ,\beta ,\gamma )\) and \({\overline{z}}^{(n)} = (\alpha ^{(n)},\beta ^{(n)},\gamma ^{(n)})\). From our experience, a value smaller than \(5 \times 10^{-4}\) appears to be too small to prevent ill-definedness in the objective function. Further, we use the same regularization parameter for learning and applying the learnt dictionary. The regularization parameter is constant unless anything different is stated. We apply the dictionary learning iteratively (confer Michel and Schneider 2020). That means, in the Nth step of the IPMP, only the first N learnt dictionary elements can be chosen.
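Two of these rules lend themselves to a short illustration: the minimal distance between the characteristic parameters of APKs or APWs, and the iterative application of the learnt dictionary. The sketch reads the threshold as \(5 \times 10^{-4}\) and treats the characteristic parameter as a point in \({\mathbb {R}}^3\); the function names are hypothetical and the SL distance is not covered.

```python
import math

EPSILON = 5e-4  # minimal admissible distance, read as 5 * 10^(-4)

def is_admissible(candidate, already_chosen, epsilon=EPSILON):
    # A candidate is rejected if it is as close as epsilon, or closer,
    # to any previously chosen trial function of the same type.
    return all(math.dist(candidate, x) > epsilon for x in already_chosen)

def available_elements(learnt_dictionary, iteration):
    # Iterative application: in the N-th step, only the first N learnt
    # dictionary elements may be chosen.
    return learnt_dictionary[:iteration]

chosen = [(0.0, 0.0, 0.5)]
print(is_admissible((0.0, 0.0, 0.5004), chosen))   # too close, rejected
print(is_admissible((0.0, 0.0, 0.51), chosen))     # far enough, accepted
print(available_elements(["d1", "d2", "d3"], 2))
```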
As the starting dictionary, we use
with a regularly distributed Reuter grid \(X^\mathrm {s}\) of 123 grid points. Thus, the starting dictionary contains 13903 trial functions. This allows the experiments of the LIPMP algorithms to run on an HPC node of 48 GB RAM with 12 CPUs.
4.2 Results for different noise levels
In the described setting, we run the LIPMP algorithms with the EGM2008 data and different noise levels to investigate the influence of noise on the results before we analyse the learning methods as standalone approximation methods as well as compare the results from a learnt and a manually chosen dictionary in the next sections. In particular, we tested no noise, \(1\%\), \(2\%\), \(3\%\), \(5\%\) and \(10\%\) of Gaussian noise for the downward continuation from 500 km using EGM2008 data. The results are given in Table 1. There we give the determined regularization parameter, the completed iterations as well as the relative RMSE and data error for the mentioned noise levels for both the LRFMP and the LROFMP. We computed new random numbers (confer (10)) but ran each algorithm only once for each noise level. Of course, we realize that a more profound approach of evaluating the methods’ behaviour for different noise levels would be to create, for each noise level, a sufficiently large number of perturbations and run each algorithm for each perturbation. Note that each run of an LIPMP would include a search for a regularization parameter as well. However, we abstain from this due to the associated high demand on calculation time. In view of our aim to further increase the efficiency of the (L)IPMPs, such computationally complex validations can hopefully become feasible in the near future. Note that, nonetheless, by generating different random numbers for each noise level, we also include a small amount of variation in the noise used here.
In Table 1, we see that for decreasing noise

the regularization parameter decreases,

the number of completed iterations increases and

—most importantly—the relative RMSE decreases,

though it cannot reach values similar to those obtained when no satellite height is used (see Sect. 4.3).
Note that, for noise equal to or higher than \(2\%\), we present here the results where the methods terminate when the noise level is reached and, amongst those, the relative RMSE is lowest. In the experiments with less noise, the methods never reached the noise level for the tested regularization parameters before 1000 iterations. Hence, we present the results with the lowest RMSE even though the noise level is not reached (yet). Furthermore, note that the number of completed iterations is generally lower for the LROFMP. This is in accordance with previous publications on the non-learning ROFMP and, thus, can also be expected for the learning variant. However, this suggests that, in the case of \(2\%\) noise, the result of the LRFMP could be improved if we allowed more than 1000 iterations. Similarly, this could also be assumed for \(1\%\) noise as the number of completed iterations increases with decreasing noise. However, again for efficiency reasons as well as for a better comparability with Sects. 4.3 and 4.4, we stick with the chosen maximal number of 1000 iterations here.
All in all, the most important result from these experiments is that the relative RMSE decreases with decreasing noise level. This shows that the performance of the LIPMP algorithms is influenced by noise only in the expected way. Further, we also see that the influence of the satellite height appears to be more significant to remaining errors than the noise level (compare with the results of the pure approximation in Sect. 4.3). Both of these influences are well understood and minimized with a regularization but cannot sensibly be abandoned. With this in mind, we next analyse the approximations of the learning methods as well as the learnt dictionary.
4.3 The LIPMP algorithms as standalone approximation methods
By construction, the learning algorithms themselves incorporate the maximization of the same objective function which also occurs in the IPMP. Thus, they should be usable as standalone approximation algorithms. We investigate this next: we consider the approximation of EGM2008-based surface data as well as the downward continuation of regularly and irregularly distributed EGM2008-based satellite data by the LRFMP as well as the LROFMP algorithm. Moreover, we verify the downward continuation of contrived data by the LROFMP algorithm. Due to the orthogonality procedure, we assume that the LROFMP algorithm is more suited to distinguish the contrived data.
The irregularly distributed grid has already been used in Michel and Telschow (2014) and simulates a denser data distribution on the continents. It is given in Fig. 4d and includes 6968 grid points. The contrived data consist of 3 SHs and APKs, respectively:
where the notation \(x(r,\varphi ,\theta )\) with the radius r, the longitude \(\varphi \) and the latitude \(\theta \) is used. \({\widetilde{K}}\) stands for the \(\mathrm {L}^2(\Omega )\)-normalized APKs. The data are again perturbed by \(5\%\) Gaussian noise. Correspondingly, the starting dictionary (for the test with contrived data only) is given as
where \(X^\mathrm {s}\) is a regularly distributed Reuter grid of 6 points. Then, the APKs included in the contrived data are not contained in the starting dictionary. We allow a maximum of 100 iterations here because the data consist of only six trial functions.
In Table 2, we give an overview of the results. The type of experiment is abbreviated: “approximation” stands for the experiment where no satellite height is included, “downward continuation, (ir)regular grid” stands for the use of 500 km satellite height and an (ir)regularly distributed grid and “contrived data” is self-explanatory. Further, we state the regularization parameter, the completed iterations, the respective maximally learnt SH degrees and the relative RMSEs as well as data errors. In Figs. 2a to 3a, we see the absolute approximation errors obtained in the different experiments. Figure 3b shows the given and chosen APKs of the experiment with contrived data.
Generally, the remaining errors are situated in regions where we expect them to be. In the case of the approximation of surface data, Fig. 2a, and the downward continuation of regularly distributed satellite data, Fig. 2b, c, respectively, we find deviations from the solution particularly in the Andean region, the Himalayas and the Pacific Ring of Fire. This is reasonable as the gravitational potential contains much more local structure there which is highly influenced by the noise and the satellite height. Note that we included the same results in Figure 2b, c twice: the left-hand side of Fig. 2b can be compared to Fig. 5a and the right-hand side to Fig. 5b, i.e. the approximation of the LIPMP can be compared with the respective approximations of the IPMP with a manually chosen or a learnt dictionary. Figure 2c compares the solution of the LRFMP and the LROFMP as standalone approximation algorithms with each other. Further, we find that, in the case of approximating surface data, i. e. using potential data which is not damped due to satellite height, the methods obtain much better relative RMSEs while still showing larger data errors than in the case of the downward continuation, see Table 2. The latter is clear, since more local structures are visible on the surface and appear relatively larger with respect to the noisy data. Obviously, they also need more iterations in this case. Again, as the data contain more information in this case, this behaviour can be expected. Nonetheless, these experiments show that the LIPMP algorithms can, indeed, be used as standalone approximation algorithms.
Similar results are obtained for the downward continuation of irregularly distributed satellite data, Fig. 3a. However, we notice that some additional errors occur here in comparison with the results of the regularly distributed data, Fig. 2b. In particular, these are located mostly in areas with larger data gaps, see, e. g., the North Atlantic and the Indian Ocean. This indicates that the LIPMP algorithms are able to distinguish smoother and rougher regions on their own and, thus, prevent local gaps from having a global influence.
At last, we consider the test with the contrived data. First of all, we note that the LROFMP algorithm is able to approximate this data as well, see Table 2. Moreover, we saw that the SHs are obtained exactly. Further, the APKs are either clustered around the solutions or have a very small coefficient, see Fig. 3b. Note that those few wrongly chosen APKs may be caused by the present noise. The SHs are easier to distinguish, most likely because of their orthogonality. Hence, we see that the LROFMP algorithm is also able to distinguish global trends and local anomalies.
4.4 Learning a dictionary
At last, we consider the competitiveness of the learnt dictionary. For this, we extend the general setting from Sect. 4.1 in the following way. We compare the learnt dictionary with a manually chosen dictionary which is similar to those in previous publications, see e. g. Telschow (2014):
such that
with a regularly distributed Reuter grid \(X^\mathrm {m}\) of 4551 grid points and
All in all, the manually chosen dictionary contains 95152 trial functions. We undertake this comparison because it is most sensible as we have explained in Michel and Schneider (2020): a comparison with the best dictionary of a sensibly large set of random dictionaries cannot seriously be put into practice due to high memory demand and a long runtime. Note that, in some of the literature on the IPMP algorithms mentioned before, the approaches have been compared to traditional methods like splines which is why we abstain from this here. Further note that, due to the size of the manually chosen dictionary, the respective tests ran on a node with 512 GB RAM and 32 CPUs which is a much higher memory demand than the LIPMP algorithms had.
In Table 3, we see a summary of the results of the experiments. We compare the IPMP algorithms with the manually chosen dictionary (*), the learnt dictionary (**), a learnt dictionary when using the non-stationary regularization parameter \(\lambda _N = \lambda _0 \cdot \Vert y\Vert _{{\mathbb {R}}^\ell }/N\) for the iteration \(N\in {\mathbb {N}}\) (“non-stationary learnt”; ***), and a learnt dictionary where only the SHs, APKs and APWs were considered (“learnt-without-Slepian-functions”; ****). We give the regularization parameter, the size of the dictionary, the number of completed iterations, the maximal SH degree included in the dictionary, the relative data error and RMSE at termination and the needed CPU runtime in hours. Note that the size of the learnt dictionaries is given as a “less than or equal to” value since elements may be contained multiple times. Moreover, we identify the following aspects:

Due to our termination criteria, the relative data error was at the noise level in all cases.

In all experiments, the relative RMSE is about the same size. In comparison with Michel and Schneider (2020), we conclude that the IPMP algorithms produce better results if more trial function classes are available. Further, the learnt dictionary yields similar results. In Figs. 5 and 6, we also see that, in all cases, the remaining errors lie within regions of higher local structures, i. e. the Andean region, the Himalayas and the Pacific Ring of Fire in the case of EGM2008 data as well as the Amazon basin in the case of the GRACE data. These detail structures cannot be represented because of the noise and the satellite height (confer Sect. 4.2).

The non-stationary learnt as well as the learnt-without-Slepian-functions dictionary produce results which are similar to the others regarding the relative RMSE such that these settings could be explored in future research. However, the number of tests here is too low to quantify the influence of the non-stationary regularization parameter on the approximation.

The learnt dictionary is less than \(1\%\) of the size of the manually chosen dictionary.

The maximal SH degree of the learnt dictionaries is a truly learnt degree.

For the LRFMP algorithm, the CPU time needed for learning and applying the learnt dictionary is lower than or similar to that for applying the manually chosen dictionary. In particular, without the Slepian functions, the needed CPU time is much smaller.

For the LROFMP algorithm, there exist settings which have a smaller runtime as well. In particular, for the GRACE data, this is always the case. Similarly, the learnt-without-Slepian-functions dictionary is also learnt in a much shorter time for the EGM2008 data. However, there are two cases for the EGM2008 data where the runtime is higher than for the manually chosen dictionary. This could be caused by the non-stationary regularization parameter, the orthogonality procedure itself and / or the use of the Slepian functions.
It appears that, by using a learnt dictionary, the ROFMP algorithm is not as superior to the RFMP algorithm as it seemed in previous publications. Thus, if we need to learn a dictionary (despite the LIPMP algorithms being sufficient algorithms themselves), we would advise learning a dictionary for the RFMP in the same setting as the one it should be applied to (experiment (**) in Table 3) since it yields solid and comparable results. Note that the RFMP is overall easier to implement and needs less runtime. Further, the experiments (***) and (****) in Table 3 yield starting points to improve the result or the runtime and, thus, should be borne in mind as well.
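For reference, the iteration-dependent regularization parameter used in experiment (***) can be evaluated as in the following sketch. It assumes that the norm of the data is the Euclidean norm and that the iteration counter starts at 1; the function name and the example values are hypothetical.

```python
import math

def nonstationary_parameter(lambda_0, data, iteration):
    # lambda_N = lambda_0 * ||y|| / N with the Euclidean norm of the data y.
    data_norm = math.sqrt(sum(y * y for y in data))
    return lambda_0 * data_norm / iteration

example_data = [3.0, 4.0]  # made-up data vector with norm 5
for n in (1, 5, 50):
    print(n, nonstationary_parameter(0.1, example_data, n))
```

The parameter decays like 1/N, so later iterations are regularized less strongly than early ones.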
5 Conclusion and outlook
The downward continuation of the gravitational potential from satellite data is important for many reasons such as monitoring climate change. One approach for this is presented by the IPMP algorithms. They iteratively seek the minimizer of the Tikhonov–Phillips functional and, in this way, obtain a weighted linear combination of dictionary elements as an approximation. For practical use, the IPMP algorithms had to be improved regarding the automation of the dictionary choice, the runtime and the storage demand. For this reason, the novel LIPMP algorithms include an add-on such that an infinite dictionary can be used. Further, a finite dictionary can be learnt as well.
Our numerical results in this paper are meant to be a proof of concept. They show that both the non-learning IPMP with a learnt or a manually chosen dictionary as well as the LIPMP algorithms yield good results. However, the LIPMP algorithms have additional advantages in terms of CPU runtime, storage demand, sparsity and the consequences of the number of different types of trial functions in use. Hence, we suggest that the IPMP algorithms with a manually chosen dictionary may be used if those aspects are not critical because these methods are easier to implement. Otherwise, we advise including the add-on, i. e. using the LIPMP algorithms either for obtaining a learnt dictionary or as a standalone approach. However, the former is probably redundant in the light of the latter. In particular, we prefer the LRFMP algorithm as it was presented here because it has a lower runtime than the LROFMP algorithm and is easier to implement.
Here, we showed only results of a learnt dictionary that is applied to the same data again. After all, an even more interesting use of a learnt dictionary is probably given if we have many similar data as for instance from long-running satellite missions like GRACE. Applying a learnt dictionary to unseen data is basically possible with our approach as well. However, this most likely demands very lengthy tests for suitable regularization parameters. In the light of the LIPMP algorithms as standalone approximation algorithms, it does not seem sensible to put that much effort into learning a finite dictionary for future use on unseen data. An exception might be the case where computation time is restricted when unseen data arrive but is much less restricted beforehand.
In an ongoing project, we are working on the use of \(5\times 10^6\) data points, true satellite tracks, observable-related noise and the downward continuation of the gravitational force (i.e. the gradient of the potential) in the LIPMP algorithms which could show the competitiveness of the algorithms in a more realistic setting. If this actually includes the use of data that is not obtained from SHs, we then also might be able to quantify whether the LIPMP approximations contain any bias. Moreover, we are interested in applying the algorithms to other geoscientific tasks, e. g., travel-time tomography from seismology. As both of these current research aspects inevitably work with big data, we are able to tackle them today only due to the significant improvements regarding storage demand and runtime made by the LIPMP algorithms.
Data availability
Publicly available data of GRACE and the EGM2008 were used, see Devaraju and Sneeuw (2017), Flechtner et al. (2014a, 2014b), NASA Jet Propulsion Laboratory (2020), Pavlis et al. (2012), Schmidt et al. (2008), Tapley et al. (2004), The University of Texas at Austin, Centre for Space Research (2020). The particular datasets generated and analysed during the current study are available from the authors on reasonable request. The code generated and used during the current study is also available from the authors on reasonable request.
References
Aharon M, Elad M, Bruckstein A (2006) K-SVD: an algorithm for designing overcomplete dictionaries for sparse representation. IEEE Trans Signal Process 54(11):4311–4322
Albertella A, Sansó F, Sneeuw N (1999) Band-limited functions on a bounded spherical domain: the Slepian problem on the sphere. J Geodesy 73(9):436–447
Baur O (2014) Gravity field of planetary bodies. In: Grafarend E (ed) Encyclopedia of geodesy. Springer, Cham, pp 1–6
Bruckstein AM, Donoho DL, Elad M (2009) From sparse solutions of systems of equations to sparse modeling of signals and images. SIAM Rev 51(1):34–81
Chen Q, Shen Y, Kusche J, Chen W, Chen T, Zhang X (2021) High-resolution GRACE monthly spherical harmonic solutions. J Geophys Res Solid Earth 126(1):e2019JB018892
Davis JL, Tamisiea ME, Elósegui P, Mitrovica JX, Hill EM (2008) A statistical filtering approach for Gravity Recovery and Climate Experiment (GRACE) gravity data. J Geophys Res Solid Earth 113(B4)
Devaraju B, Sneeuw N (2017) The polar form of the spherical harmonic spectrum: implications for filtering GRACE data. J Geodesy 91(12):1475–1489
Driscoll JR, Healy DM (1994) Computing Fourier transforms and convolutions on the 2-sphere. Adv Appl Math 15(2):202–250
Eicker A (2008) Gravity field refinement by radial basis functions from in-situ satellite data. Ph.D. thesis, University of Bonn. https://bonndoc.ulb.uni-bonn.de/xmlui/handle/20.500.11811/3245. Accessed 10 June 2021
Engan K, Aase SO, Husøy JH (1999a) Method of optimal directions for frame design. In: Proceedings of the IEEE international conference on acoustics, speech and signal processing (ICASSP), number 5, pp 2443–2446
Engan K, Rao BD, Kreutz-Delgado K (1999b) Frame design using FOCUSS with method of optimal directions (MOD). In: Proceedings of the Norwegian signal processing symposium, pp 65–69
Engl HW, Hanke M, Neubauer A (1996) Regularization of inverse problems. Mathematics and its applications. Kluwer Academic Publishers, Dordrecht
Fengler MJ, Freeden W, Kohlhaas A, Michel V, Peters T (2007) Wavelet modeling of regional and temporal variations of the Earth’s gravitational potential observed by GRACE. J Geodesy 81(1):5–15
Fischer D, Michel V (2013a) Automatic best-basis selection for geophysical tomographic inverse problems. Geophys J Int 193(3):1291–1299
Fischer D, Michel V (2013b) Inverting GRACE gravity data for local climate effects. J Geodet Sci 3(3):151–162
Flechtner F, Morton P, Watkins M, Webb F (2014a) Status of the GRACE follow-on mission. In: Marti U (ed) Gravity, Geoid and Height Systems. International Association of Geodesy Symposia, vol 141. Springer, Cham, pp 117–121
Flechtner F, Sneeuw N, Schuh WD (eds) (2014b) Observation of the system earth from space - CHAMP, GRACE, GOCE and future missions. Springer, Berlin
Flechtner F, Landerer F, Save H, Dahle C, Bettadpur S, Watkins M, Webb F (2020) NASA and GFZ GRACE Follow-On mission: status, science, advances. https://doi.org/10.5194/egusphereegu20203077. Accessed 29 May 2020
Freeden W, Michel V (2004) Multiscale potential theory with applications to geoscience. Birkhäuser, Boston
Freeden W, Schreiner M (1998) Orthogonal and non-orthogonal multiresolution analysis, scale discrete and exact fully discrete wavelet transform on the sphere. Constr Approx 14(4):493–515
Freeden W, Schreiner M (2009) Spherical functions of mathematical geosciences—a scalar, vectorial, and tensorial setup. Springer, Berlin
Freeden W, Windheuser U (1996) Spherical wavelet transform and its discretization. Adv Comput Math 5(1):51–94
Freeden W, Gervens T, Schreiner M (1998) Constructive approximation on the sphere—with applications to geomathematics. Oxford University Press, Oxford
Grünbaum FA, Longhi L, Perlstadt M (1982) Differential operators commuting with finite convolution integral operators: some non-Abelian examples. SIAM J Numer Anal 42(5):941–955
IPCC (2014) Climate Change 2014: synthesis report. Contribution of Working Groups I, II and III to the Fifth Assessment Report of the Intergovernmental Panel on Climate Change. IPCC, Geneva, Switzerland
Johnson SG (2019) The NLopt nonlinear-optimization package. http://github.com/stevengj/nlopt and https://nlopt.readthedocs.io/en/latest/. Accessed 2 April 2020
Klees R, Revtova EA, Gunter BC, Ditmar P, Oudman E, Winsemius HC, Savenije HHG (2008) The design of an optimal filter for monthly GRACE gravity models. Geophys J Int 175(2):417–432
Kusche J (2007) Approximate decorrelation and non-isotropic smoothing of time-variable GRACE-type gravity field models. J Geodesy 81(11):733–749
Kusche J, Klemann V, Bosch W (2012) Mass distribution and mass transport in the Earth system. J Geodyn 59–60:1–8
Lieb V (2017) Enhanced regional gravity field modeling from the combination of real data via MRR. Ph.D. thesis, Technical University of Munich. https://dgk.badw.de/fileadmin/user_upload/Files/DGK/docs/c795.pdf. Accessed 10 June 2021
Lin Y, Yu J, Cai J, Sneeuw N, Li F (2018) Spatiotemporal analysis of wetland changes using a kernel extreme learning machine approach. Remote Sens 10(7):1129
Louis AK (1989) Inverse und schlecht gestellte Probleme. Teubner, Stuttgart
Luthcke SB, Sabaka TJ, Loomis BD, Arendt AA, McCarthy JJ, Camp J (2013) Antarctica, Greenland and Gulf of Alaska land-ice evolution from an iterated GRACE global mascon solution. J Glaciol 59(216):613–631
Michel V (2005) Regularized wavelet-based multiresolution recovery of the harmonic mass density distribution from data of the Earth’s gravitational field at satellite height. Inverse Prob 21(3):997–1025
Michel V (2013) Lectures on constructive approximation—Fourier, spline, and wavelet methods on the real line, the sphere, and the ball. Birkhäuser, New York
Michel V (2015) RFMP—an iterative best basis algorithm for inverse problems in the geosciences. In: Freeden W, Nashed MZ, Sonar T (eds) Handbook of geomathematics, 2nd edn. Springer, Berlin, pp 2121–2147
Michel V (2022) Geomathematics—modelling and solving mathematical problems in geodesy and geophysics. Cambridge University Press (in production)
Michel V, Orzlowski S (2017) On the convergence theorem for the Regularized Functional Matching Pursuit (RFMP) algorithm. GEM Int J Geomath 8(2):183–190
Michel V, Schneider N (2020) A first approach to learning a best basis for gravitational field modelling. GEM Int J Geomath 11: Article 9. https://doi.org/10.1007/s1313702001435
Michel V, Telschow R (2014) A nonlinear approximation method on the sphere. GEM Int J Geomath 5(2):195–224
Michel V, Telschow R (2016) The regularized orthogonal functional matching pursuit for ill-posed inverse problems. SIAM J Numer Anal 54(1):262–287
Moritz H (2010) Classical physical geodesy. In: Freeden W, Nashed MZ, Sonar T (eds) Handbook of geomathematics, 2nd edn. Springer, Berlin, pp 253–289
Müller C (1966) Spherical harmonics. Springer, Berlin
Naeimi M (2013) Inversion of satellite gravity data using spherical radial base functions. Ph.D. thesis, University of Hannover. https://dgk.badw.de/fileadmin/user_upload/Files/DGK/docs/c711.pdf. Accessed 10 June 2021
NASA (2020) Global climate change: scientific consensus. https://climate.nasa.gov/scientificconsensus/. Accessed 3 March 2020
NASA Jet Propulsion Laboratory (2020) GRACE Tellus. https://grace.jpl.nasa.gov/. Accessed 2 April 2020
Novák P, Kern M, Schwarz KP (2001) Numerical studies on the harmonic downward continuation of band-limited airborne gravity. Stud Geophys Geod 45:327–345
Pavlis NK, Holmes SA, Kenyon SC, Factor JK (2012) The development and evaluation of the Earth Gravitational Model 2008 (EGM2008). J Geophys Res Solid Earth 117(B4). Correction in Volume 118, Issue 5
Prünte L (2008) Learning: wavelet-dictionaries and continuous dictionaries. Ph.D. thesis, University of Bremen. https://elib.suub.unibremen.de/diss/docs/00011034.pdf. Accessed 3 March 2020
Rieder A (2003) Keine Probleme mit inversen Problemen. Eine Einführung in ihre stabile Lösung. Vieweg, Wiesbaden
Rubinstein R, Bruckstein AM, Elad M (2010) Dictionaries for sparse representation modeling. Proc IEEE 98(6):1045–1057
Sakumura C, Bettadpur S, Bruinsma S (2014) Ensemble prediction and intercomparison analysis of GRACE time-variable gravity field models. Geophys Res Lett 41(5):1389–1397
Save H, Bettadpur S, Tapley BD (2016) High-resolution CSR GRACE RL05 mascons. J Geophys Res Solid Earth 121(10):7547–7569
Schmidt R, Flechtner F, Meyer U, Neumayer KH, Dahle C, König R, Kusche J (2008) Hydrological signals observed by the GRACE satellites. Surv Geophys 29(4–5):319–334
Schneider N (2020) Learning dictionaries for inverse problems on the sphere. Ph.D. thesis, University of Siegen, Geomathematics Group. https://doi.org/10.25819/ubsi/5431. Accessed 11 November 2020
Schneider N, Michel V (2020) Dictionary learning algorithms for the downward continuation of the gravitational potential. Presentation at the EGU2020: sharing geoscience online. https://doi.org/10.5194/egusphereegu20202367. Accessed 29 May 2020
Schreiner M (1996) A pyramid scheme for spherical wavelets. AGTM Report (170). Geomathematics Group, University of Kaiserslautern
Seibert K (2018) Spin-weighted spherical harmonics and their application for the construction of tensor Slepian functions on the spherical cap. Ph.D. thesis, University of Siegen, Geomathematics Group, universi – Universitätsverlag Siegen, Siegen. https://dspace.ub.unisiegen.de/handle/ubsi/1421. Accessed 9 August 2021
Simons FJ, Dahlen FA (2006) Spherical Slepian functions and the polar gap in geodesy. Geophys J Int 166(3):1039–1061
Sneeuw N, Saemian P (2019) Next-generation gravity missions for drought monitoring. ESA Living Planet Symposium, Milan, Italy
Tapley BD, Bettadpur S, Watkins M, Reigber C (2004) The gravity recovery and climate experiment: mission overview and early results. Geophys Res Lett 31(9):L09607. https://doi.org/10.1029/2004GL019920
Telschow R (2014) An orthogonal matching pursuit for the regularization of spherical inverse problems. Ph.D. thesis, University of Siegen, Geomathematics Group, Verlag Dr. Hut, Munich
The University of Texas at Austin, Centre for Space Research (2020) GRACE: gravity recovery and climate experiment. http://www2.csr.utexas.edu/grace/. Accessed 3 March 2020
Vincent P, Bengio Y (2002) Kernel matching pursuit. Mach Learn 48(1–3):165–187
Watkins MM, Wiese DN, Yuan DN, Boening C, Landerer FW (2015) Improved methods for observing Earth’s time variable mass distribution with GRACE using spherical cap mascons. J Geophys Res Solid Earth 120(4):2648–2671
Wiese D, Boening C, Zlotnicki V, Luthcke S, Loomis B, Rodell M, Sauber J, Bearden D, Chrone J, Horner S, Webb F, Bienstock B, Tsaoussi L (2020) The NASA mass change designated observable study: progress and future plans. https://doi.org/10.5194/egusphereegu202012077
Windheuser U (1995) Sphärische Wavelets: Theorie und Anwendung in der Physikalischen Geodäsie. PhD thesis, University of Kaiserslautern, Geomathematics Group
Acknowledgements
The authors gratefully acknowledge the financial support by the German Research Foundation (DFG; Deutsche Forschungsgemeinschaft), projects MI 655/7-2 and MI 655/14-1. Further, we thank Dr. Roger Telschow for providing the irregularly distributed grid. Last but not least, we are grateful for the use of the HPC clusters Horus and Omni, maintained by the ZIMT of the University of Siegen, for our numerical results.
Open Access
This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
Funding
Open Access funding enabled and organized by Projekt DEAL. Volker Michel gratefully acknowledges the support by the German Research Foundation (DFG; Deutsche Forschungsgemeinschaft), projects MI 655/7-2 and MI 655/14-1.
Author information
Contributions
The research was carried out during the Ph.D. and postdoc projects of Naomi Schneider. In both DFG-funded projects, Volker Michel is the principal investigator.
Appendices
Table of abbreviations (listed in alphabetical order)
\([\cdot ]_\bullet \)  Trial function class 
\(A(\alpha ,\beta ,\gamma )\)  Rotation matrix depending on the Euler angles \(\alpha ,\ \beta ,\ \gamma \) 
\(\alpha \)  Euler angle \(\alpha \in [0,2\pi [\) 
\(\alpha _N\)  Coefficient related to the Nth chosen dictionary element in the (L)RFMP 
\(\alpha _N^{(N)}\)  Coefficient related to the Nth chosen dictionary element in the (L)ROFMP 
\(A_N(d)\)  Square root of numerator of \(\mathrm {RFMP}(\cdot ;\cdot )\) 
\(A_N^{(N)}(d)\)  Square root of numerator of \(\mathrm {ROFMP}(\cdot ;\cdot )\) 
APKs  Abel-Poisson kernels 
APWs  Abel-Poisson wavelets 
\(B_K\)  Subset of the unit ball in 3 dimensions which defines the APKs included in a dictionary 
\(b_n^{(N)}(d)\)  Representation of a dictionary element d in previously chosen dictionary elements 
\(B_N(d)\)  Denominator of \(\mathrm {RFMP}(\cdot ;\cdot )\) 
\(B_N^{(N)}(d)\)  Denominator of \(\mathrm {ROFMP}(\cdot ;\cdot )\) 
\(B_W\)  Subset of the unit ball in 3 dimensions which defines the APWs included in a dictionary 
\({\mathbb {B}}\)  Open unit ball in three dimensions: \({\mathbb {B}}= \{ x\in {\mathbb {R}}^3 :|x|< 1\}\) 
\({\mathbb {B}}^d\)  Open unit ball in d dimensions: \({\mathbb {B}}^d = \{ x\in {\mathbb {R}}^d :|x|< 1\}\) 
\({\overline{{\mathbb {B}}}}^d\)  Unit ball in d dimensions including the boundary: \({\overline{{\mathbb {B}}}}^d = \{ x\in {\mathbb {R}}^d :|x|\le 1\}\) 
\(\beta \)  Euler angle \(\beta \in [0,\pi ]\) 
\(\beta _n^{(N)}(d)\)  Projection coefficients needed for \(b_n^{(N)}(d)\) 
c  Size of spherical cap 
\({\mathcal {C}}\)  Learnt set of candidates 
\(d_N\)  Nth chosen dictionary element 
\({\mathcal {D}}\)  Dictionary 
\({\mathcal {D}}^{\mathrm {Inf}}\)  Infinite dictionary 
\({\mathcal {D}}^{\mathrm {m}}\)  Manually chosen dictionary 
\({\mathcal {D}}^{\mathrm {s}}\)  Starting dictionary 
\(\varepsilon ^3\)  \((0,0,1)^\mathrm {T}\), the North Pole 
\(f_0\)  Initial (guessed) approximation 
\(f_N\)  Approximation of the (L)RFMP 
\(f_N^{(N)}\)  Approximation of the (L)ROFMP 
\(g^{(k,L)}(R,\cdot )\)  Slepian function 
\(g^{(k,L)}_{l,m}(R)\)  (m, l)th Fourier coefficient of a Slepian function (expanded in spherical harmonics) 
\(\gamma \)  Euler angle \(\gamma \in [0,2\pi [\) 
GRACE  Gravity Recovery and Climate Experiment 
GRACE-FO  GRACE Follow-On 
\({\mathcal {H}}_2(\Omega )\)  Sobolev space on the sphere 
\(\mathrm {IPMP}(\cdot ; \cdot )\)  Objective function of an (L)IPMP algorithm 
\(K(x,\cdot )\)  Abel-Poisson kernel 
\({\mathcal {L}}\)  Set of all tuples (k, L) of bandlimits L and related kth Slepian function 
\(\lambda \)  Regularization parameter 
(L)IPMP  (Learning) inverse problem matching pursuit 
(L)R(O)FMP  (Learning) regularized (orthogonal) functional matching pursuit 
\({\mathcal {N}}\)  Set of all tuples (n, j) of degree n and order j of all spherical harmonics 
\({\overline{N}}\)  Maximal degree of SHs in infinite dictionary 
\(\nu \)  Learnt maximal degree of SHs 
\(\Omega \)  Unit sphere in three dimensions: \(\Omega = \{ x\in {\mathbb {R}}^3 :|x|=1\}\) 
\(P_{n,j}\)  Associated Legendre function of degree n and order j 
\({\mathcal {P}}_{{\mathcal {V}}_N^\perp }\)  Orthogonal projection onto \({\mathcal {V}}_N^\perp \) 
\(R=R(c,A(\alpha ,\beta ,\gamma )\varepsilon ^3)\)  Spherical cap; the localization region of a Slepian function 
RBFs  Radial basis functions 
RBWs  Radial basis wavelets 
\(\mathrm {RFMP}(\cdot ; \cdot )\)  Objective function of the (L)RFMP algorithm 
\(R^N\)  Residual of the Nth iteration in an (L)IPMP 
\(\mathrm {ROFMP}(\cdot ; \cdot )\)  Objective function of the (L)ROFMP algorithm 
SHs  Spherical harmonics 
\(\sigma \)  Satellite height 
SLs  Slepian functions 
\(S_z\)  Spline needed for \(\mathrm {IPMP}(\cdot ; \cdot )\) in the case that the (L)ROFMP is run 
\({\mathcal {T}}\)  Operator of the inverse problem 
\({\mathcal {T}}_\daleth \)  Evaluation operator with respect to grid points of the related inverse problem operator \({\mathcal {T}}\) 
\({\mathcal {V}}_N^\perp \)  Orthogonal complement of the span of \({\mathcal {T}}_\daleth d_n\) for previously chosen dictionary elements \(d_n\) 
\(W(x,\cdot )\)  Abel-Poisson wavelet 
\(X^\mathrm {s},\ X^\mathrm {m}\)  Discrete set of centres of APKs and APWs in starting and manually chosen dictionary 
y  Data 
\(y^\delta \)  Perturbed data 
\(Y_{n,j}\)  Spherical harmonic of degree n and order j 
Supplementary plots from the numerical experiments
We give plots of examples of trial functions as well as of the EGM2008 and the deviation from the mean field of the GRACE data in May 2008. Further, we show the irregular data grid and plots of the absolute approximation error of the comparison tests (see Figs. 4, 5 and 6).
Cite this article
Schneider, N., Michel, V. A dictionary learning addon for spherical downward continuation. J Geod 96, 21 (2022). https://doi.org/10.1007/s0019002201598w
DOI: https://doi.org/10.1007/s0019002201598w
Keywords
 Dictionary learning
 Inverse problems
 Gravitational potential
 Matching pursuits
 Numerical modelling
 Satellite geodesy
Mathematics Subject Classification
 31B20
 41A45
 65D15
 65J20
 65K10
 65R32
 68T05
 86A22