1 Introduction

Geophysics, both a systematic framework and a method for Earth exploration (Brown and Slawinski 2017) is traditionally divided into subdisciplines around distinct research methods, each focusing on a different observable, and making inferential statements about Earth properties through separate inversions for subsurface structure. In this paper we discuss the possibilities and advantages of conducting inversions that combine multiple physical observables in one single mathematical framework.

1.1 Forward and inverse problems in global geophysics

All geophysics, like ancient Gaul, is divided into three parts, one of which is the domain of potential-field methods (gravity, geomagnetism, geo-electricity), the other that of seismic wave phenomena, geodynamics occupying the third. All these differ from each other in formalism, conventions, and best practices. Seismology is the most direct of all of these approaches, because their sources—earthquakes—reach furthest down inside the Earth. The seismic wavefield thus rather directly samples its interior structure, carrying information back up to the receiver. In contrast, gravitational or geodynamic observations are perennially confined to the two-dimensional bounding surface of the volume whose structure and properties we seek to illuminate.

Inverse modelling in geophysics is classically divided into two distinct but overlapping realms. One of them aims to image the subsurface, focusing on material contrasts and high-frequency— “sharp”—structure. The second purports to constrain the parameters of the imaged subsurface structure, resolving material properties and low-frequency—“smooth”—structure. In seismology, where the distinction is most relevant (Mora 1988, 1989; Woodward 1992), one speaks of “migration” versus “tomography”—“inversion”, by any other name (Nolet 2008; Schuster 2017).

Geophysical observables that are measured at Earth’s surface ultimately derive from partial differential equations (PDEs), which lend themselves to theoretical analysis (Mead and Ford 2020; Michel and Simons 2017) and numerical simulation. One can synthetically create measurements based on an initial model—a starting guess for the subsurface structure—typically, but not exclusively, as we shall see, a “low-wavenumber” smooth model upon which “high-wavenumber” sharp contrasts are sought to be superposed (Bunks et al. 1995; Yuan and Simons 2014).

1.2 Geophysical observables and governing equations

In this work we consider three geophysical observables and their governing equations: (i) the gravitational acceleration and potential derived from Poisson’s equation, (ii) the acoustic wave field controlled by the linear wave equation, and (iii) the mantle flow field governed by the steady-state non-inertial Stokes equations for mass and momentum conservation in incompressible media together with Poisson’s equation.

The gravitational acceleration, the gradient of Newton’s potential, is sensitive to mass density via Poisson’s equation (Kellogg 1967). Inverting for Earth’s density structure to match observed gravity anomalies, disturbances, potential anomalies, or equipotential undulations (Blakely 1995; Hofmann-Wellenhof and Moritz 2006) is an ill-posed problem (Xu 1992a, b). Without prior information it is generally impossible to uniquely resolve the location and strength of the causative anomalies (Chao 2005; Dorman and Lewis 1970; Oldenburg 1974; Parker 1973), and the null-space with “anharmonic” density distributions remains inaccessible without non-gravimetric constraints (Fischer and Michel 2012; Michel 2005; Michel and Fokas 2008; Michel and Orzlowski 2015).

The second measurable is the surface displacement caused by the propagation of elastic or acoustic waves, e.g., after an earthquake (Aki and Richards 1980; Dahlen and Tromp 1998) or active probing (Sheriff and Geldart 1995; Yilmaz 2001). For simplicity we only look at the pressure caused by acoustic waves, inverting for the wave speed, which combines information on Hookean elastic moduli and mass density. The inversion of the acoustic wave equation cannot precisely discriminate the cause for the observed differences in the wave speed, but it is able to relatively accurately image the subsurface location of the anomalies (Gauthier et al. 1986; Tarantola 1984a, b).

The third measurable is the surface velocity caused by the motion of highly viscous solid rock in the subsurface (Glatzmaier 2014; Schubert et al. 2001), measured on geologic time scales or contemporaneously, e.g., by global navigation satellite systems (GNSS) such as the Global Positioning System (GPS). The material flow in the Earth’s interior can be modeled by the Stokes equations (Gerya 2019; Kennett and Bunge 2008), which are non-linear. The inversion of surface velocities is characterized by a rather low sensitivity to the location of the driving forces, but it can with some accuracy infer the density and viscosity of subsurface anomalies, especially combined with inference from seismology (Conrad et al. 2013; Forte and Mitrovica 2001; Harig et al. 2010; Lithgow-Bertelloni and Richards 1998).

Newton, Hooke, and Stokes: by these mnemonics we designate the three different integro-differential forward operators that map different portions of parameter space onto distinct types of observations. The individual inverse problems have been addressed by many authors. Their combination, and, specifically, the relative value of combining observations, is the subject of this paper. Within the broader context of the numerical solution of inverse problems in geophysics, we unify approaches for these three very different problems, and discuss how to combine and contrast them, highlighting the trade-offs between their various sensitivities.

1.3 Geological test scenarios

We consider three simplified geological scenarios, (i) a sinking or “falling” block (Morgan 1965), which can be interpreted as a Rayleigh-Taylor instability (Conrad and Molnar 1997), e.g., representing a lithospheric drip or a rising magma chamber, (ii) a “subduction” setting (Stern 2002), described by a denser, higher-viscosity and colder, and, by inference, higher wave-speed “slab” embedded in a background mantle, and (iii) a buoyant, bell-shaped anomaly at the bottom of the domain embedded in a background mantle, such as can be interpreted as a simplified version of a mantle “plume” (Condie 2001; Morgan 1971).

1.4 Guide to the paper

We study the inverse problem by (i) mapping out the least-squares cost function for each of the attendant observables separately, and combined, to analyse how a combination of observations results in a more unique or steeper cost function shape. This is done for two material parameters that are defined on geometrically fixed subsets. We furthermore calculate (ii) the pointwise gradient fields of the forward solutions with respect to the solution variables, and (iii) the gradients of a cost function with respect to the material parameters for the inversion.

We limit our scope to gradient-based inversions, including possible regularization, discussing their use in resolving the geometry and amplitude of subsurface anomalies. We do not consider operator inversions (Michel and Simons 2017; Mead and Ford 2020), subspace methods (Geng et al. 2020) or statistical frameworks (Fichtner and Simutė 2018) that invert for anomalous subsurface structure by exploring the cost-function space by a combination of statistical and deterministic methods. Deterministic gradient-based methods such as those presented here can only find the minimum of a function that is closest to the initial guess, which is not necessarily the global minimum. However, they enjoy the advantage of a relatively low computational numerical cost.

We present the forward and adjoint equations of each of the PDEs used to simulate the three distinct geophysical observables. For the analysis we require the gradients of the material parameters with respect to the solutions. To this end we derive the adjoint equations for all three PDEs. The adjoint gradient calculation is independent of the number of parameters, only requiring one additional linear solution, thus making it possible to perform field-based inversions. We develop the adjoint system of equations using the formal Lagrangian approach following Tröltzsch (2010). This approach results in the weak forms of the forward, adjoint, and gradient equations, which are discretized using FEniCS (Alnæs et al. 2015; Logg et al. 2012), a high-level Python or C++ framework to efficiently develop finite-element (FE) codes using PETSc (Balay et al. 2019), a management suite of highly optimized parallel data structures which gives access to direct, as well as iterative solvers.

Lastly, we suggest a simple geometrical shape optimization algorithm that does not require shape discretization as boundary term. Such an algorithm can be useful if sufficient laboratory constraints on, e.g., rock type exist, such that the material parameters are well known and the shape of the anomaly is of dominant interest.

2 Theoretical framework

All of our considerations will be explicitized for a two-dimensional bounded domain of Cartesian space, \(\varOmega \in {\mathbb {R}}^2\), with boundary \(\delta \varOmega \), and space coordinates \({\mathbf {r}}\).

2.1 Gravity

Newton’s gravitational potential, corrected for first-order effects such as the Earth’s equilibrium shape and its rotation, varies continuously across the Earth’s surface, reflecting inhomogeneities in the mass density of the interior of the Earth (Blakely 1995). Additionally accounting for an independently known or presumed regional background mass distribution, we consider the relation (Hofmann-Wellenhof and Moritz 2006) between a localized density ‘perturbation’, \(\rho ({\mathbf {r}})\), i.e. a mass anomaly at depth within the domain \(\varOmega \), and the ‘anomalous’ potential, \(u({\mathbf {r}})\), that it generates, and to which we have access at the measurement surface, here presumed to be the flat upper boundary \(\varOmega _{\mathrm {obs}}=\delta \varOmega _{\mathrm {top}}\):

$$\begin{aligned} \nabla ^2 u&= 4\pi G\rho \qquad \text {within}\,\,\,\varOmega , \end{aligned}$$
(1)
$$\begin{aligned} u&= u_{\mathrm {obs}}\,\quad \qquad \text {on}\,\,\,\delta \varOmega _{\mathrm {top}}. \end{aligned}$$
(2)

The unknown mass density \(\rho \) is in kg m\(^{-3}\), the measurable gravitational potential u is in m\(^2\) s\(^{-2}\), the universal gravitational constant \(G=6.674\times 10^{-11}\) m\(^3\) kg\(^{-1}\) s\(^{-2}\), and \(\nabla ^2\) is Laplace’s operator. Obtaining the density distribution \(\rho ({\mathbf {r}})\) from boundary observations of the harmonic potential \(u(\rho )\) through Poisson’s relation, Eq. (1), is the well-studied non-unique, exponentially ill-posed problem of ‘inverse gravimetry’ (Freeden and Nashed 2018; Michel and Fokas 2008).

We seek to minimize the quadratic misfit functional, subscripted N for “Newton”,

$$\begin{aligned} F_N(u)=\frac{1}{2} \int _{\varOmega _{\mathrm {obs}}}^{} \big [u_{\mathrm {obs}}-u(\rho )\big ]^2\,dx, \end{aligned}$$
(3)

an integral over the linear dimension x that defines the top boundary, \(\delta \varOmega _{\mathrm {top}}\). In practice we will restrict our observations to discrete points on the upper surface \(\delta \varOmega _{\mathrm {top}}\), representing surficial sensor locations.

We now formulate the adjoint version of this problem. We minimize the functional in Eq. (3) subject to the elliptic PDE constraint in Eq. (1). Following, e.g., Martin et al. (2012), Plessix (2006), Tröltzsch (2010), or Virieux and Operto (2009), we calculate the first, second and third variation of the Lagrangian,

$$\begin{aligned} {\mathscr {L}}_N=\frac{1}{2}\int _{\varOmega _{\mathrm {obs}}}\big [u_{\mathrm {obs}}-u\big ]^2\,dx-\int _{\varOmega }\varvec{\nabla }u \cdot \varvec{\nabla }v\,dx-\int _{\varOmega }4\pi G\rho \, v\,dx, \end{aligned}$$
(4)

which we recognize as the sum of \(F_N(u)\), the function that is to be minimized, and the weak form of the forward problem that acts as a constraint to the minimization, which is obtained (Zienkiewicz 1977) by integrating Eq. (1) over \(\varOmega \) after multiplication by an appropriate test function v and its gradient \(\varvec{\nabla }v\).

The first variation of Eq. (4), with respect to the test function, returns the weak form of the forward problem in Eq. (1).

The second variation, with respect to the solution u, yields the ‘adjoint equation’, to be solved for the adjoint field v using a new set of test functions \({\hat{u}}\),

$$\begin{aligned} \frac{\partial {\mathscr {L}}_N}{\partial u} ({\hat{u}}) =\int _{\varOmega _{\mathrm {obs}}}\big [u_{\mathrm {obs}}-u\big ]\,{\hat{u}}\,\,dx-\int _{\varOmega }\varvec{\nabla }v\cdot \varvec{\nabla }{\hat{u}}\,\,dx=0 . \end{aligned}$$
(5)

Since u is the solution of the forward problem, it is clear that the forward equation must be solved before the adjoint equation, which, however, remains strictly linear. A useful interpretation of the adjoint equation is that it solves the original PDE but with the mismatch between observations and forward predictions as a source term.

The third variation is with respect to the design variable, here: the density \(\rho \). This results in the weak form of the gradient of the misfit function,

$$\begin{aligned} \frac{\partial {\mathscr {L}}_N}{\partial \rho } (\hat{\rho })=-\int _{\varOmega }4\pi G \,v\,\hat{\rho } \,dx=\frac{dF_N}{d\rho } . \end{aligned}$$
(6)

Here, v is the solution of the adjoint equation, Eq. (5).

After discretization within \(\varOmega \), the gradient can be used by the inversion procedure, e.g., via gradient descent, whereby the density is updated iteratively until convergence, according to the line search

$$\begin{aligned} \rho _{i+1}=\rho _i-\alpha _i \frac{dF_N}{d\rho } , \end{aligned}$$
(7)

with i the iteration counter. We choose \(\alpha _i\) to evolve according to:

$$\begin{aligned} \alpha _i=\zeta \frac{\displaystyle {a_0}}{\displaystyle {\max \left( |dF_N/d\rho |_{i=0}\right) }} , \end{aligned}$$
(8)

where \(a_0\) is the initial perturbation of the parameter, and \(\zeta \) is an increment that is increased in case the gradient update was successful in terms of decreasing the misfit function, or decreased, if not.

More advanced techniques such as quasi-Newton methods improve convergence by supplying the Hessian of the problem (Fichtner and Trampert 2011; Ma and Hale 2012), the curvature of \(F_N(u)\). For the simple test cases presented here, we save ourselves the computational burden.

The non-uniqueness of the gravity inversion problem is well-known (Oldenburg 1974). The spatial extent, depth of burial, and density contrast of the perturbing body trade off in producing identical disturbing potentials at the surface. While we treated the gravity inversion problem for subsurface structure imaging first, we envisage it as the second step to constrain the density of an anomaly whose approximate location can be independently imaged, e.g., by the seismic method discussed in the next section.

2.2 Seismology

Seismology is undoubtedly the major imaging tool in geophysics, both for exploration (Claerbout 1992) and whole-Earth structure (Nolet 2008). Whether excited by artificial or natural sources, diffusive forcings or earthquakes, the linearized balance between infinitesimal perturbations of stress and strain, within the constitutive framework of Hookean elasticity, results in the propagation of waves whose speeds are controlled by the distribution of mass density and stiffness parameters inside the Earth. The hyperbolic PDE of interest relates the seismograms, the time-resolved records of ground motion sampled across the measurement surface \(\delta \varOmega _{\mathrm {top}}\), to the inferred sources \(f({\mathbf {r}},t)\) producing the wave motion and the internal wave-speed distribution, \(c({\mathbf {r}})\) that we seek to recover. Limiting ourselves to the acoustic case (Tarantola 1984a), the primary observable is pressure, \(u({\mathbf {r}},t)\), which satisfies the suitably initialized acoustic wave equation

$$\begin{aligned} \frac{1}{c^2} \frac{\partial ^2 u}{\partial t^2}-\nabla ^2 u&=f \quad \qquad \text {within}\,\,\,\varOmega , \end{aligned}$$
(9)
$$\begin{aligned} u&=u_{\mathrm {obs}}\qquad \text {on}\,\,\,\delta \varOmega _{\mathrm {top}} . \end{aligned}$$
(10)

The unknown wave speed c is in m s\(^{-1}\), the acoustic pressure u is in kg m\(^{-1}\) s\(^{-2}\). Obtaining the three-dimensional wave-speed \(c({\mathbf {r}})\) from boundary observations of the wavefield u(c) through the Navier–Cauchy–Hooke mapping, Eq. (9), exemplifies the mathematical problem of seismological inversion that remains of interest to (geo)mathematicians (Freeden 2015; Stefanov et al. 2019). The complex geometry of the Earth’s interior, the rich nature of the seismic wavefield, and the diverse regimes under which it can—or cannot—be observed have led to as many solution strategies (Akçelik et al. 2002; de Hoop et al. 2009; Nolet 2015; Symes 2009; Tromp 2020). Lastly, we note the coupled nature of the seismic and the gravitational inverse problems (Berkel and Michel 2010; Liu and Tromp 2009; Michel 2015). For the purpose of this paper, it suffices to note that the acoustic wave speed c relates to the mass density \(\rho \) via \(c=\sqrt{\kappa /\rho }\) through \(\kappa \), the bulk modulus.

As in the gravimetric inverse problem, we restrict the observations to discrete receiver locations on the top surface \(\delta \varOmega _{\mathrm {top}}\). We consider the source term f known (in practice, we use a second-derivative-of-the-Gaussian, or “Ricker” source wavelet, \(f(t)=0.2 e^{-1000 (t-0.07)^2}\)), and we neglect to write the absorbing boundary conditions to simplify the exposition.

We minimize the quadratic misfit function, subscripted H for “Hooke”,

$$\begin{aligned} F_H(u)=\frac{1}{2}\int _{0}^{T}\int _{\varOmega _{\mathrm {obs}}}\big [u_{\mathrm {obs}}-u(c)\big ]^2 \,dx\,dt, \end{aligned}$$
(11)

over the spatial boundary dimension and integrated over the time window of interest, \(0\le t\le T\), i.e. for a finite, bandlimited “arrival” pulse. When the time-window is long and the bandwidth is large, misfit functions of the type in Eq. (11) are known as “full-waveform”. In order to use the wave equation in Eq. (9) as a constraint in the minimization, we write the full Lagrangian of the system,

$$\begin{aligned} \begin{aligned} {\mathscr {L}}_H&=\frac{1}{2}\int _{0}^{T} \int _{\varOmega _{\mathrm {obs}}}\big [u_{\mathrm {obs}}-u\big ]^2\,dx\,dt\\&\quad -\int _{0}^{T}\int _{\varOmega }\left[ \frac{1}{c^2} \frac{\partial u}{\partial t} \frac{\partial v}{\partial t}-\varvec{\nabla }u\cdot \varvec{\nabla }v\right] \,dx\,dt-\int _{0}^{T} \int _{\varOmega }f \,dx\,dt. \end{aligned} \end{aligned}$$
(12)

The first variation of Eq. (12) with respect to the appropriate test function v and its gradient \(\varvec{\nabla }v\) results in the weak form of Eq. (9).

The second variation, with respect to the solution u, returns the adjoint equation:

$$\begin{aligned} \begin{aligned} \frac{\partial {\mathscr {L}}_H}{\partial u}({\hat{u}})&= \int _{0}^{T} \int _{\varOmega _{\mathrm {obs}}}\big [u_{\mathrm {obs}}-u(c)\big ]\,{\hat{u}}\,\,dx\,dt\\&\quad -\int _{0}^{T}\int _{\varOmega }\left[ \frac{1}{c^2}\frac{\partial v}{\partial t}\frac{\partial {\hat{u}}}{\partial t} -\varvec{\nabla }v\cdot \varvec{\nabla }{\hat{u}}\right] \,dx\,dt-\int _{\varOmega }\frac{\partial v}{\partial t}(T) \frac{\partial {\hat{u}}}{\partial t}(T) \,dx=0 , \end{aligned} \end{aligned}$$
(13)

to be solved for v. Note that the right-hand-side of Eq. (13) is only known at \(t=T\), in contrast to the forward problem where the source evolves from \(t=0\) to \(t=T\). This final-value problem thus has to be solved backwards in time from \(t=T\) to \(t=0\). Again, the interpretation of the adjoint equation is that it solves the original wave equation, backwards in time and with updated boundary conditions, but now sourced by the waveform differences between the observations and the forward fields.

The third variation, with respect to the design variable c, results in the weak form of the gradient of the misfit function, namely

$$\begin{aligned} \frac{\partial {\mathscr {L}}_H}{\partial c} ({\hat{c}}) = -\int _{\varOmega }\frac{2}{c^3}\left[ \int _{0}^{T} \frac{\partial u}{\partial t} \frac{\partial v}{\partial t} \,\,dt\right] \,{\hat{c}}\,\,dx=\frac{dF_H}{dc} . \end{aligned}$$
(14)

Here, v is the solution of the adjoint equation Eq. (13), propagating in time reversal, and u is the solution of the forward problem, propagated forward in time. Thus, Eq. (14) is the convolution of the forward and the adjoint solutions, a weighted interaction integrated against the partial derivative of Eq. (9) with respect to c.

The above derivations are merely rudimentary sketches of material extensively treated by Tarantola (1984a) and Tromp et al. (2005) using Green function approaches under the Born approximation, by Liu and Tromp (2006) and Liu and Tromp (2009) via the method of Lagrange multipliers, and by Fichtner et al. (2006b), Fichtner et al. (2006a), Plessix (2006), and Virieux and Operto (2009) in an abstract operator formalism.

It has been clearly shown over the last few decades (e.g. Tromp 2020; Virieux and Operto 2009) that the PDE-constrained minimization approach known as full-waveform inversion (FWI) is capable of rather precisely imaging Earth’s interior and subsurface structures. While the potential for geologic interpretation increases when gravitational and seismic observables are being treated in tandem, the joint resolution of density and wave speed anomalies is generally difficult with the types of measurements typically at hand. Fortunately, we have recourse to a last type of observables and a final, nonlinear PDE with which to constrain data misfit functions, as we shall see in the next section.

2.3 Geodynamics

The Earth’s silicate mantle behaves as a highly viscous fluid over geologic timescales (Ranalli 1995; Schubert et al. 2001), and the description of its structure as an incompressible “Stokes” flow at steady state (Kennett and Bunge 2008; Malvern 1969) is a useful abstraction. The velocity \({\mathbf {u}}({\mathbf {r}})\) and stress \(\varvec{\sigma }({\mathbf {r}})\) fields are governed by mass and momentum conservation equations with free-slip boundary conditions on both sides and the bottom, and a zero-traction boundary condition at the free-surface top:

$$\begin{aligned} \varvec{\nabla }\cdot {\mathbf {u}}&= 0 \quad \text {within}\,\,\,\varOmega , \,\,\,\,\,\quad \qquad {\mathbf {u}}\cdot \varvec{{\hat{\gamma }}}= 0 \quad \text {on}\,\,\,\delta \varOmega \backslash \delta \varOmega _{\mathrm {top}}, \end{aligned}$$
(15)
$$\begin{aligned} \varvec{\nabla }\cdot \varvec{\sigma }&= -\rho {\mathbf {g}}\quad \text {within}\,\,\,\varOmega , \,\,\qquad \varvec{\sigma }\cdot \varvec{{\hat{\gamma }}}= {\mathbf {0}}\quad \text {on}\,\,\,\delta \varOmega _{\mathrm {top}}. \end{aligned}$$
(16)

Here, \(\rho \) is the mass density, \(\varvec{{\hat{\gamma }}}\) is the outward normal vector, and the constant gravitational acceleration \({\mathbf {g}}\). Stokes’ constitutive law relates the total stress \(\varvec{\sigma }\) to the flow-induced strain rate \(\varvec{{\dot{\varepsilon }}}\) as follows:

$$\begin{aligned} \varvec{\sigma }=-p{\mathbf {I}}+2\eta \varvec{{\dot{\varepsilon }}}, \qquad \varvec{{\dot{\varepsilon }}}=\frac{1}{2}\left( \varvec{\nabla }{\mathbf {u}}+ [\varvec{\nabla }{\mathbf {u}}]^\mathrm {T}\right) , \end{aligned}$$
(17)

where \(p({\mathbf {r}})\) is the static equilibrium pressure of the fluid at rest, \({\mathbf {I}}\) the identity tensor, and \(\eta \) the (effective) dynamic viscosity, constant for Newtonian flows, or, allowing for the explanation of a fuller range of observable Earth phenomena both in the lab and at the plate-tectonic scale (Malevsky and Yuen 1992; Parmentier et al. 1976), for the case of power-law creep (Schubert et al. 2001), the strain-rate dependent

$$\begin{aligned} \eta (\varvec{{\dot{\varepsilon }}})=\eta _0 \left( \frac{\dot{\varepsilon }_{II}}{\dot{\varepsilon }_0} \right) ^{\frac{1}{n}-1} , \end{aligned}$$
(18)

where \(\eta _0\) and \(\dot{\varepsilon }_0\) are reference values, \(\dot{\varepsilon }_{II}=(1/2\varvec{{\dot{\varepsilon }}}:\varvec{{\dot{\varepsilon }}})^{1/2}\) is the second invariant of the strain rate, and \(1\lesssim n\lesssim 5\) the applicable power-law coefficient (Petra et al. 2012; van den Berg et al. 1993).

The velocity \({\mathbf {u}}({\mathbf {r}})\), in m s\(^{-1}\), and the pressure \(p({\mathbf {r}})\), in Pa, are the solutions to the PDE system (15)–(17), or indeed, the full equation of motion

$$\begin{aligned} -\varvec{\nabla }p+\rho {\mathbf {g}}+\varvec{\nabla }\cdot \left[ 2\eta (\varvec{{\dot{\varepsilon }}}) \varvec{{\dot{\varepsilon }}}\right] ={\mathbf {0}}. \end{aligned}$$
(19)

The unknown parameters of interest are \(\rho \), the mass density in kg m\(^{-3}\), the reference viscosity \(\eta _0\), in Pa s, and n, the dimensionless stress exponent.

In order to define the Lagrangian for the misfit function, subscripted S for “Stokes”,

$$\begin{aligned} F_S({\mathbf {u}})=\frac{1}{2} \int _{\varOmega _{\mathrm {obs}}}^{}[{\mathbf {u}}_{\mathrm {obs}}-{\mathbf {u}}(\rho ,\eta _0,n)\big ]^2 \,dx, \end{aligned}$$
(20)

which we seek to minimize with respect to the PDE constraint Eq. (19), we combine the solutions of Eq. (15) into a single solution \(({\mathbf {u}},p)\). The full Lagrangian reads:

$$\begin{aligned} \begin{aligned} {\mathscr {L}}_S&=\frac{1}{2} \int _{\varOmega _{\mathrm {obs}}}\big [{\mathbf {u}}_{\mathrm {obs}}-{\mathbf {u}}\big ]^2 dx +\int _{\varOmega }2\eta (\varvec{{\dot{\varepsilon }}}) \varvec{{\dot{\varepsilon }}}({\mathbf {u}}):\varvec{{\dot{\varepsilon }}}({\mathbf {v}})\,dx\\&\quad -\int _{\varOmega }p\,\varvec{\nabla }\cdot {\mathbf {v}}\,dx-\int _{\varOmega }(\varvec{\nabla }\cdot {\mathbf {u}})\,q\,dx+\int _{\varOmega }\rho {\mathbf {g}}\cdot {\mathbf {v}}\,dx. \end{aligned} \end{aligned}$$
(21)

The first variation of the Lagrangian with respect to an appropriate set of test functions \(({\mathbf {v}},q)\) results in the weak form of Eq. (19).

The second variation, with respect to the solutions \(({\mathbf {u}},p)\) in some new direction \((\hat{\mathbf {u}},{\hat{p}})\) results in the adjoint equation:

$$\begin{aligned} \begin{aligned} \frac{\partial {\mathscr {L}}_S}{\partial ({\mathbf {u}},p)}(\hat{\mathbf {u}},{\hat{p}})&= \int _{\varOmega _{\mathrm {obs}}}\big [{\mathbf {u}}_{\mathrm {obs}}-{\mathbf {u}}\big ]\,\hat{\mathbf {u}}\,dx+\int _{\varOmega }{\mathbb {D}}({\mathbf {u}}) \varvec{{\dot{\varepsilon }}}({\mathbf {v}}):\varvec{{\dot{\varepsilon }}}(\hat{\mathbf {u}}) \,dx\\&\quad -\int _{\varOmega }(\varvec{\nabla }\cdot {\mathbf {v}}){\hat{p}}\, \,dx-\int _{\varOmega }q\,\varvec{\nabla }\cdot \hat{\mathbf {u}}\,dx=0. \end{aligned} \end{aligned}$$
(22)

In this equation \(({\mathbf {u}},p)\) are the forward velocity and pressure, respectively. The equation is to be solved for the adjoint solution \(({\mathbf {v}},q)\) using appropriate test functions \((\hat{\mathbf {u}},{\hat{p}})\). \({\mathbb {D}}({\mathbf {u}})\) is the derivative of the nonlinear viscosity with respect to the forward velocity (e.g Reuber et al. 2020):

$$\begin{aligned} {\mathbb {D}}({\mathbf {u}}) = 2\eta (\varvec{{\dot{\varepsilon }}}) \left( {\mathbb {I}} + \left( \frac{1}{n}-1\right) \frac{\varvec{{\dot{\varepsilon }}}({\mathbf {u}}) \otimes \varvec{{\dot{\varepsilon }}}({\mathbf {u}})}{2\dot{\varepsilon }_{II}^2}\right) , \end{aligned}$$
(23)

with \({\mathbb {I}}\) being the fourth-order identity tensor. Note that compared to the PDEs in the previous sections, Eq. (22) is not self-adjoint due to the nonlinear viscosity (18).

The third variation of the Lagrangian with respect to the design parameter set \({{\mathbf {m}}} = \left( \, \rho \quad \eta _0 \quad n \,\right) ^\mathrm {T}\) in the new direction \({\hat{\mathbf {m}}}=\left( \, {\hat{\rho }} \quad \hat{\eta }_0 \quad {\hat{n}} \,\right) ^\mathrm {T}\) reads:

$$\begin{aligned} \frac{\partial {\mathscr {L}}_S}{\partial {{\mathbf {m}}}} ({\hat{\mathbf {m}}})= \left( \begin{array}{c} \int _{\varOmega }{\mathbf {g}}\cdot {\mathbf {v}}\,\hat{\rho } \,dx\\ \int _{\varOmega }2\left( \dot{\varepsilon }_{II}/\dot{\varepsilon }_0\right) ^{\frac{1}{n}-1}\varvec{{\dot{\varepsilon }}}({\mathbf {u}}):\varvec{{\dot{\varepsilon }}}({\mathbf {v}})\,\hat{\eta }_0\,\,dx\\ -\int _{\varOmega }\frac{\eta (\varvec{{\dot{\varepsilon }}}) \log \left( \dot{\varepsilon }_{II}/\dot{\varepsilon }_0\right) }{n^2}\,{\hat{n}}\,dx\end{array} \right) =\frac{dF_S}{d{{\mathbf {m}}}} , \end{aligned}$$
(24)

where \({\mathbf {u}}\) and \({\mathbf {v}}\) are the forward and adjoint velocity fields, respectively, and p and q are the forward and adjoint pressure terms.

The adjoint Stokes method has been firmly established for geodynamic inverse problems (Bunge et al. 2003; Ghelichkhan and Bunge 2016; Horbach et al. 2014), revealing, e.g., mantle structure from plate-motion histories (Bunge et al. 2002), plate-boundary coupling (Ratnaswamy et al. 2015), the thermal conditions of the mantle (Colli et al. 2018), and rheological parameters in steady-state models (Reuber et al. 2018).

2.4 Combining cost functions

In this work we aim to minimize the combined cost function

$$\begin{aligned} F=F_N(u) + F_H(u) + F_S({\mathbf {u}}) . \end{aligned}$$
(25)

We note that this function and its gradients are additive, leading to an optimization procedure that is uncoupled in theory, and the different parameter updates and the forward problem solutions do not mutually influence each other. Recent studies (Crestel et al. 2018) have suggested introducing an explicit coupling via, e.g. cross-gradient or density-norm regularization. Here we focus solely on the algorithmic structure of the uncoupled inverse problem. However, cross-product regularization can be approximated by updating the parameters only at the points for which all gradients agree in the update direction, as we shall show.

3 Geological settings

As we wrote in Sect. 1.3 we consider three geological settings in this work: highly idealized end-members that nevertheless comprise recognizable scenarios with considerable complexity. Relevant material parameters will be summarized in Table 1.

3.1 Falling block (FB)

The first scenario is of a “falling” block (FB) of a certain mass and viscosity, sinking in a less dense and less viscous matrix. Such a setting may occur on many different scales in nature, for instance when crystals settle in a magma chamber (Martin and Nokes 1988, 1989), or when a portion of Earth’s lithosphere “delaminates” (Elkins-Tanton 2005, 2007). The geometry is shown in Fig. 1. In our scenario the block has a higher seismic wave speed. The synthetic data are produced with the viscosity, wave speed and density distributions shown.

Table 1 Material parameters used in the different geological scenarios
Fig. 1
figure 1

Geometry of the falling block (FB) scenario (left column) and computed geophysical observables (right column). Upper-left panel shows the density distribution. Lower-left panel shows the seismic wave speed distribution, and the locations of sources at depth (red stars) and surface receivers (blue triangles). Upper-right panel shows the observable gravity anomaly, middle right the vertical surface velocity, and lower right the displacement seismograms from one of the sources to each of the receivers (color figure online)

3.2 Subduction zone (SZ)

The second setting is that of a subduction zone (SZ), a bent descending “slab” (Melosh and Raefsky 1980; Sleep 1975): colder, denser, and more viscous than the ambient mantle, and with a higher seismic wave speed. The geometry is shown in Fig. 2, as are the data synthetics for the gravity anomaly, the surface velocity, and the seismograms generated by sources at depth, recorded by receivers at the surface.

3.3 Mantle plume (MP)

The third setting is intended to capture the essentials of a mantle plume: a rising upwelling, hotter than, and as such with lower density, wave speed and viscosity than, the ambient mantle (Morgan 1971; Sleep 1990; Wilson 1973). As presented in Fig. 3, we abstract this setting by adding a Gaussian shaped perturbation at the bottom of the modelling domain. A rigid crust is located at the top.

Fig. 2
figure 2

Geometry of the subduction zone (SZ) scenario (left column) and computed geophysical observables (right column). Upper-left panel shows the density, lower-left panel the wave speed, and the source-station geometry. Right columns show the geophysical observables, in the same layout as Fig. 1

Fig. 3
figure 3

Geometry of the mantle plume (MP) scenario (left column) and synthetic geophysical observables (right column). As in Figs. 1 and 2 the upper left shows the density distribution, lower left panel the wave speed distribution. Panels in the right columns show, from top to bottom, the gravity anomaly, vertical surface velocity, and displacement seismograms at different receivers

4 Numerical experiments

In this section we present the collection of numerical results. We first formulate a motivation based on grid-search sampling, highlighting that a combined cost function indeed contains more information than any of them separately. This is followed by a visualization of the “kernels” of the three sets of governing PDEs. These kernels are the derivatives of the forward solution with respect to the material parameters. We present examples for continuous-field recovery using pointwise adjoint gradients subject to the three types of observable constraints, both sequentially and iteratively. Finally, we propose a geometric update algorithm based on the pointwise gradients.

4.1 Shape of the cost function

At first, one can show that the coupled cost function of Eq. (25) is indeed “better” than the respective single cost functions. With “better” we aim for (i) a more uniquely defined, and (ii) a steeper cost function, which may increase the convergence rate of the optimization procedure.

A straightforward approach involves performing a grid search. A grid search algorithm only requires discretization of the forward model over the parameter space and an evaluation of the forward model at each discrete point. We choose the viscosity, the density, the wave speed, and the depth to the tip or center of the anomaly as parameters. Wave speed and density are coupled such that an increase in density implies an increase in wave speed. We discretize the parameter space as a \(10\times 10\times \)3 viscosity-density-depth matrix. The geometry of the structure (apart from the tip) is considered known. The synthetic data are always computed with the parameter combination that is exactly in the center of the discretized parameter space.

In order to compare the cost functions we normalize them as

$$\begin{aligned} F_\beta ^*=\frac{F_\beta -\bar{F_\beta }}{\mathrm {std}(F_\beta )}-\min \left[ \frac{F_\beta -\bar{F_\beta }}{\mathrm {std}(F_\beta )}\right] , \end{aligned}$$
(26)

where \(F_\beta \) is an individual cost function, i.e. \(F_N\) for the gravitational (Newton) problem (Eq. 3), \(F_H\) for the seismoacoustic (Hooke) problem (Eq. 11) and \(F_S\) for the geodynamics (Stokes) problem (Eq. 20), \(\bar{F_\beta }\) denotes the mean of all such cost functions calculated during the grid search and \(\mathrm {std}\) denotes the standard deviation. We recall that F denotes the combined cost function of Eq. (25).

Fig. 4
figure 4

Sampled cost functions for the falling block (FB) scenario. The darker and the larger the circles the higher the value of the cost function at the sample point. Upper row: cost function for the surface velocity, for depth versus viscosity, and density versus viscosity, respectively. Here, and in the following, Stokes(i) refers to the value of the respective index i of the third and fixed variable for the Stokes equation, i.e. Stokes(5) refers to a viscosity value of 1000 or a density value of 1.5 while Stokes(1) refers to a depth value of 0.5. As supported by the Stokes sinker theory, the viscosity of the falling block exerts no influence on the surface velocity. All axes below the line are in terms of the depth of the anomaly against the primary design variables density or wave speed. We label wave speed, ranging from 0.9 to 1.9, after scaling to density. The second row shows the cost function for the surface velocity (Stokes), the seismic wave displacement (Hooke) and the gravitational potential (Newton). The third row shows possible combinations of the pure cost functions. The bottom panel shows the combined cost function for all observables. The bottom-row panel shows that the combined cost function is indeed substantially steeper than each separate cost function and will lead to more unique solutions than is afforded by, for instance, minimization for the gravitational cost-function by itself

Fig. 5
figure 5

Sampled cost functions for the subduction zone (SZ) scenario. Axes, labels and plot arrangement are as in Fig. 4. Density-equivalent seismic wave speeds again range from 0.9 to 1.9. Upper row panels: cost function for the surface velocity, for depth versus viscosity, and density versus viscosity, respectively. Second row: cost function for the surface velocity (Stokes), the seismic displacement field (Hooke) and the gravitational potential (Newton). Third row shows combinations of the pure cost functions. The bottom-row panel shows the cost function combined for all observables, highlighting its benefits

Fig. 6
figure 6

Sampled cost functions for the mantle plume (MP) scenario. Axes, labels and plot arrangement are again as in Figs. 4 and 5, but density-corresponding values of the seismic wave speed range from 0.3 to 1.3. Upper row: cost functions for the surface velocity for depth versus viscosity, and density versus viscosity. Second row: the cost function for the surface velocity (Stokes), the seismic displacement (Hooke) and the gravitational potential (Newton). The third row shows combinations of the pure cost functions. The bottom-row plot shows the combined cost function for all observables. The combined cost function is substantially steeper than each of the separate cost functions, and more unique than, for instance, the gravitational-potential cost function by itself. However, even in this case the combined cost function remains non-unique

All three geological settings are investigated, starting with the falling block (FB) example scenario introduced in Fig. 1. In Fig. 4 the size and color of the circles indicate how good (small, light gray) or bad (large, dark gray) the fit is at the considered point. The top two panels show the influence of the viscosity of the falling block. In this particular setup it appears to be almost irrelevant. The three separate cost functions shown in the second row all show a “valley” of good solutions, which illustrates the difficulty for simple gradient-descent algorithms to converge.

While curvature information from the Hessian matrix could improve convergence of such problems, the different shapes of the valleys suggest that a combination of cost functions might be better behaved. The final combined cost function F is shown in the bottom panel: it indeed satisfies our notion of being ‘better’, i.e., steeper and more unique than each cost function taken separately.

The second test case is the subduction zone (SZ) example of Fig. 2. The top two panels in Fig. 5 suggest that the viscosity of the subducting part does have an impact on the cost function shape, which is, however, much smaller than the equivalent one for the density. Again, the combined cost function, shown in the bottom panel, is essentially unique, and much steeper than any separate ones. One important note is that a change in the subduction depth also changes the overall volume of material. This relation has implications for the gravitational anomaly which is mainly driven by the volume and amplitude of the anomalous material present at depth.

The third test case is the mantle plume (MP) scenario of Fig. 3, with the results presented in Fig. 6. As in the case before, a change in the tip of the Gaussian anomaly also results in a change of the overall volume of material. The impact of viscosity is limited and only pronounced in the limit that the plume is significantly more viscous than the surrounding mantle, which is geodynamically implausible. In this scenario even the combined cost function shows an undesirable valley of acceptable solutions, but it is steeper than the separate cost functions.

4.2 Continuous field recovery by discretized inversion

Section 4.1 showed how using a combined cost function helps in defining a better-posed minimization problem than using cost functions separately. The question remains how a non-grid-search deterministic inversion approach might make use of this beneficial cost surface shape. For the purpose of continuous-field recovery or kernel calculation, the parameter space spans the discretized range of the forward problems, substantially increasing the dimensionality. We seek the pointwise parameter field, based on the pointwise adjoint gradients, after assuming a homogeneous initial guess, and ignoring any prior information on space, time, or the parameter of interest.

4.2.1 Sensitivity kernels

To start out, it is useful to inspect the sensitivity of the forward solutions to the pointwise material parameters that control them. This sensitivity kernel for material parameters will have a value proportional to their influence on the forward solution, subject to the current initial configuration. As such, it affords insight into the information content of the purely forward solutions in terms of being able to solve for the material parameters as part of the inverse problem.

The sensitivity kernel of each of the PDEs contained in Eqs. (1)–(2), (9)–(10), and (15)–(17), can be derived by assuming a cost function equal to the forward solution, i.e. by changing the expressions \(F_\beta \) in Eqs. (3), (11) and (20) to

$$\begin{aligned} {\hat{F}}_\beta =\int _{\varOmega _{\mathrm {obs}}}^{} u({{\mathbf {m}}}) \,dx, \end{aligned}$$
(27)

whereby \({{\mathbf {m}}}\) is the parameter (set) of interest, i.e. mass density \(\rho \) for the Newton problem, acoustic wave speed c for the Hooke problem, and \((\rho ,\eta _0,n)\) for the Stokes problem, where \(\eta _0\) and n are the reference viscosity and the power-law exponent, respectively. Following the same framework as described in Sect. 1 we obtain, instead of the gradient of the cost function, the gradient of the forward operator on the boundary \(\varOmega _{obs}\), with respect to the parameters \({{\mathbf {m}}}\). Only the adjoint source, that is, the right-hand-side of the adjoint equations Eqs. (5), (13), and (22), changes.

For the example of the gravitational potential the adjoint equation now reads:

$$\begin{aligned} \frac{\partial {\mathscr {L}}_N}{\partial u} ({\hat{u}}) =\int _{\varOmega _{\mathrm {obs}}}\frac{\partial {\hat{F}}_N}{\partial u} \,{\hat{u}}\,\,dx-\int _{\varOmega }\varvec{\nabla }v\cdot \varvec{\nabla }{\hat{u}}\,\,dx=0 , \end{aligned}$$
(28)

noting that \(\partial {\hat{F}}_N\Big /\partial u=1\), and mutatis mutandis for the other PDEs.

The gradient equations remain unchanged—a change in the cost function, without regularization, only affects the adjoint source—and one retrieves (i) \(d{\hat{F}}_N/d\rho \) the sensitivity of the gravitational potential to the density, (ii) \(d{\hat{F}}_H/dc\), the sensitivity of the displacement to the wave speed, and (iii) \(d{\hat{F}}_S/d{{\mathbf {m}}}\), the sensitivity of the vertical surface velocity to the density, viscosity or power-law exponent, respectively.

Since the kernels image the sensitivity of the gravitational potential, the displacement due to an acoustic wave, and the surface velocity, with respect to density, wave speed, and viscosity, respectively, they are to be interpreted as resolution indicators. Where the kernels vanish the material parameter has no discernible effect on the observations, and any updates of or interpretations based on recovered anomalies will be spurious. It should nevertheless be noted that all kernels change shape in function of their initial conditions, and thus are only valid for the current state. The kernels for the falling block scenario are summarized in Fig. 7, the ones for the subduction zone setting in Fig. 8 and those for the mantle plume in Fig. 9.

The kernel of the gravitational potential with respect to the density subject to the Poisson equation is very local to the measurement points followed by a diffuse profile towards the bottom of the domain. As such this measurable is also not very useful for imaging a subsurface structure, as is well known in the literature.

The wave speed kernel shows a complicated pattern due to reflections at the interfaces and its time history. Systematic study of the kernels for wave speed anomalies is the subject of extensive literature in the seismological community (Tromp et al. 2005).

The kernel for the surface velocity with respect to the density subject to the Stokes equations has a very diffuse pattern. Almost all density nodes have an effect on the surface velocity. This quantity may thus not be very useful to reconstruct the geometry of an anomaly. In comparison, the viscosity kernel is very localized around the anomaly, suggesting that even a small change in the viscosity around the anomaly will influence the surface velocity.

Fig. 7
figure 7

Sensitivity kernels for the Stokes flow, Hookean wavefield, and Poisson equations in the falling block scenario. The upper left panel shows the kernel for the surface velocity with respect to the density. The kernel is very diffuse and suggests that the block has the highest impact on the surface velocities. The upper right panel shows the kernel of the surface velocity with respect to the viscosity. This kernel is very localized to the edges of the block, suggesting that changing the viscosity at these points will most affect the surface velocity. The lower left panel shows the kernel of the wave speed with respect to the seismic deformation. This shows a predictably high influence along the ray paths. The lower right plot shows the kernel of the density with respect to the gravitational potential. Due to the nature of the Poisson equation this kernel simply shows a diffuse pattern decreasing with depth

Fig. 8
figure 8

Sensitivity kernels for the wavefield, flow and gravity equations for the subduction zone scenario, laid out identically to Fig. 7. The kernel for the surface velocity with respect to the density (upper left) is again very diffuse and suggests that the block has the highest impact on the surface velocities. The kernel of the surface velocity with respect to the viscosity (upper right) is very localized to the edges of the block, suggesting that changing the viscosity at these points will affect the surface velocity the most. The wave speed kernel with respect to the seismic deformation (lower left) shows a high influence along the ray paths. The lower right plot shows the kernel of the density with respect to the gravitational potential. Due to the nature of the Poisson equation this kernel again simply diffuses away with increasing depth

Fig. 9
figure 9

Sensitivity kernels for the wavefield, Stokes and Poisson equations for the mantle plume scenario, laid out as in Figs. 7 and 8. The surface velocity with respect to density kernel (upper left) is diffuse and indicates that the block has the highest impact on the surface velocities. The kernel of the surface velocity with respect to the viscosity (upper right) does not show any localized structure: there is no difference in viscosity for this scenario. The lower left panel with the kernel of the wave speed with respect to the seismic deformation shows a strong influence concentrated along the seismic ray paths. The lower right plot shows the kernel of the density with respect to the gravitational potential. Due to the nature of the Poisson equation this kernel mainly shows a diffuse pattern decreasing with depth

4.2.2 Collinear gradient updates

As we have seen, different governing equations reveal different features. The uppermost row in Fig. 10 shows the pointwise gradient for the falling block scenario, for the density subject to the Stokes equations (\(dF_S/d\rho \)), the viscosity in the Stokes equations (\(dF_S/d\eta \)), the wave speed from the wave equation (\(dF_H/dc\)), and the density from the gravitational potential (\(dF_N/d\rho \)). Only the wave-speed gradient is localized enough to reveal the geometry of the anomaly. The two gradients from the Stokes equation are rather diffuse, as is that of the Poisson equation. The same behavior can be seen for the two other cases—the subduction zone scenario shown in the top rows of Fig. 11, and for the mantle plume scenario shown in the top row of Fig. 12.

The second, third and fourth rows in Figs. 10, 11 and 12 show the areas where the respective sum of gradients (normalized by their standard deviation, as in Eq. 26) point in the same direction. Nodes in which the gradients would steer the solution toward different directions should probably not be considered in an inversion framework: updates there will necessarily result in a worse update of the cost function for any one of the observables. By design we consider that an increase in the viscosity has to mimic an increase in the density and the wave speed, and vice versa. The same behavior can also be achieved by adding a regularization that opts to optimize the cross product of the gradients, by imposing a penalty on points where the gradients point in different directions—an approach that is described in more detail elsewhere (e.g., Crestel et al. 2018).

As expected, only considering Stokes flow and the gravitational potential will not improve the solutions much. However, the combination of all available gradients is substantially closer to the synthetic shape than the one derived from the gradients of the Stokes and gravitational potential, respectively, and even better than the wave speed gradient by itself.

Fig. 10
figure 10

Cost-function gradients for the falling block scenario. The upper row shows the raw gradients for the surface velocity with respect to the density (\(dF_S/d\rho \)) and the viscosity (\(dF_S/d\eta \)), the deformation caused by the seismic wave with respect to the wave speed (\(dF_H/dc\)) and the gravitational potential with respect to the density (\(dF_N/d\rho \)). In the panels below the gradients are quantized, such that only nonzero values and collinear gradients are emphasized. The second row shows first-order combinations of the quantized raw gradients, as indicated. The third row shows second-order combinations of those same gradients. The bottom row shows the result of all quantized gradients combined and the target shape of the anomaly. The fully combined gradient—only considering the nodes where all gradients point in the same direction—is substantially closer to the synthetic result than any of the individual raw gradients. The improvement impacts even the wave speed gradient, traditionally a stand-alone tool to invert for anomaly geometry

Fig. 11
figure 11

Cost-function gradients for the subduction zone scenario, in a layout identical to that of Fig. 10 and 11. The top row shows raw gradients. The second row shows first-order combinations of the quantized raw gradients, the third row shows second-order combinations. The last row shows the result of all quantized gradients combined, next to the target shape of the anomaly. The fully combined gradient, when considering the nodes where all gradients point in the same direction, is substantially closer to the synthetic result than any of the raw gradients. The resulting fully combined gradient is broader than the negative part of the raw wave speed gradient. This is due to the fact that the gradient does not discriminate for the sign. The strictly negative part of the gradient is still very close to the synthetic shape, but contains the suggestion for how to change the surrounding background

Fig. 12
figure 12

Cost-function gradients for the mantle plume scenario, laid out as in Figs. 10 and 11. The top row shows the raw gradients, the second row first-order, the third row second-order combinations of the quantized gradients. The last row shows all the quantized gradients combined, and the synthetic shape of the anomaly. The fully combined gradient at the nodes where all gradients point in the same direction is substantially closer to the synthetic result than any of the raw gradients. As in Fig. 11, the fully combined gradient is broader than the negative part of the raw wave speed gradient. This is due to the fact that the gradient does not discriminate for the sign of the anomaly. The purely negative part of the gradient is still very close to the synthetic shape but hints at the need to change the background to reduce the cost function

4.2.3 Sequential gradient updates

After investigating the sensitivity kernels and initial cost-function gradients, the latter can be used for gradient-descent inversions. The intuitive approach is to update the parameters sequentially. This approach is in common use, since community codes usually only optimize one of the cost functions for either wave speed, density based on gravitational potential or viscosity and density based on flow. This workflow is vulnerable to errors introduced due to post-processing in each separate inversion, or from the unavailability of actual source data. Our work provides the individual pieces to develop one coherent code that inverts for the described observables.

In a sequential framework, one would (i) invert for the pointwise wave speed and thus the geometry of the anomaly, then (ii) choose a contour line in the inverted wave speed, (iii) reduce the number of design parameters by discretizing this contour line and treating other geometries, for instance the crust, as constant phases, (iv) first invert for a density subject to the gravitational potential, which—knowing depth and size of the anomaly—is now unique in a density, and then (v) follow up by inverting for the surface deformation by optimizing the viscosity of the anomaly subject to the Stokes equations. This ultimately leads to a good fit to all observables, and a model that is close to the true solution.

Deriving formal uncertainties on the final solution, which are in part driven by the user’s choice of the initial contour line, remains to be explored, e.g. via statistical sampling approaches (Gouveia and Scales 1998) or by investigating the shape of the Hessian (Petra et al. 2014).

Fig. 13
figure 13

Summary of the sequential inversion framework for the falling block scenario. The top row shows the inversion of the seismic displacement for the wave speed and the evolution of the cost function with gradient-descent iteration. The middle row shows the converged result and the cost function evolution for the second step of the sequential approach, which inverts the gravitational potential for the density subject to using a geometry for the anomaly chosen on the basis of a contour line of the wave speed inversion result preceding this step. The bottom row shows the final result of the last step of the sequential approach, the inversion for viscosity from the surface deformation. The known true values for neither density nor viscosity were reached. The wave speed contour chosen or the wave speed inversion result itself may not have been accurate enough for a fully successful recovery

Figure 13 presents the sequential workflow for the falling block scenario. From top to bottom, we show the final inversion results for acoustic wave speed (c), mass density (\(\rho \)) and viscosity (\(\eta \)) in the panels of the left column, and the behavior of the cost functions for the Hooke (\(F_H\)), Newton (\(F_N\)) and Stokes (\(F_S\)) problems, in descending order, throughout the gradient-descent iterations, normalized by their initial values (\(F_H^0\), \(F_N^0\) and \(F_S^0\)). The wave speed optimization, despite being based on only two sources, accurately positions the anomaly in the center of the domain. The inversion for the gravitational potential finds a density contrast of about 1.2, lower than the synthetic value of 1.5, owing to the volume of the recovered anomaly being substantially higher than the target volume. The surface deformation is used to optimize for the viscosity. Interestingly, this inversion converges on a negative viscosity anomaly, despite the target, shown in Fig. 1, being positive (see Table 1). Figure 14 shows the workflow for the subduction zone scenario. In this case, the seismic inversion almost perfectly captures the shape and size of the wave speed anomaly, which in turn results in a good reproduction of the density and viscosity, as compared with Fig. 2. Figure 15 shows the sequential workflow for the mantle plume scenario. Here, as in the falling block scenario, the outline of the input anomaly could only approximately be imaged from the inversion of the seismic records. As a result, the recovered density is also further away from the synthetic input, see Fig. 3. The flow inversion is omitted here, since the viscosity of the anomaly was unchanged from the background.

Fig. 14
figure 14

Sequential inversion framework for the subduction zone scenario, with layout as for Fig. 13. Top: gradient-descent wave speed inversion result, and cost function reduction. Middle: converged result for the second step: density from gravity, within a wave speed contour. Bottom: converged result of the final step: viscosity inverted from surface deformation. Neither density nor viscosity completely match the known input, hinting at the sensitivity of the algorithm to the selection of an appropriate wave speed contour

Fig. 15
figure 15

Summary of the sequential inversion framework for the mantle plume scenario, laid out exactly as Figs. 13 and 14. Top row shows inversion result for the wave speed and the reduction of the cost function. Middle row shows the result of the second step, the converged final result of the inversion for density from the gravitational potential inside of a contour line of the prior wave speed result. Bottom row shows the final result of the third step of the sequential approach, the determination of viscosity from the surface deformation. The density and viscosity are not perfect matches for the known input anomalies

4.2.4 Semi-iterative procedure

In contrast to the sequential approach, all parameter fields may receive gradient updates at the same time, either iteratively or by combining gradients of the same parameter. Such an approach may constitute an improvement over the sequential approach from Sect. 4.2.3, as in this approach gradients can immediately react to differences in the flow field caused by parameter updates.

For the mantle plume scenario, adding density gradients of the surface-deformation and the gravitational-potential cost functions, and stopping the algorithm once they diverge, results in a reduction of the combined cost function. However, the main role of such a procedure may well lie in identifying when the initial shape of the target anomaly is wrong. The density inferred from the flow field and from the gravitational potential should agree with each other after the global cost function has been sufficiently reduced to the point where it can be surmised that the anomaly geometry is captured correctly to first order.

One improvement would be to tweak the contour line of the wave speed result that is taken as a basis for continued inversion, depending on the difference between the Stokes and the Newton gradients. The result of such an algorithm is shown in Fig. 16 for the falling block scenario, in Fig. 17 for the subduction zone scenario, and in Fig. 18 for the mantle plume scenario. The density inversion results for the Stokes flow problem and the Newton gravitational potential, respectively, are shown, as is \(\varDelta \), the difference in density. Once the density difference is minimized, one can finally invert for the corresponding pointwise viscosity subject to the flow solution.

To summarize, this algorithm would start with a pointwise wave speed inversion, followed by a binarization—based on a contour line of the inverted wave speed—of the inverted anomaly in order to avoid non-local effects due to the flow and gravity gradients. Consequently, the algorithm iterates the binarization of the wave speed with a separate update of the density subject to the flow and gravity fields, respectively. Once the difference between the two inverted density fields is minimized, the final viscosity can be inferred from one additional inversion of the flow field.

Fig. 16
figure 16

Summary of the iterative approach for the density inversion under the falling block scenario. Top row shows the results of the inversion for the density with respect to the Stokes field and the Newton potential, respectively, based on an initial binarized subdivision chosen after a contour line of the initial wave speed inversion. The second row shows the pairs of results derived from choosing a slightly larger, or smaller, binarization contour line. The third row shows the results for an even larger and smaller contour line. The inverted density anomalies \(\rho \), constant inside the individual subdomains, are quoted in white. The anomaly difference, between the Stokes and the Newton solutions for each scenario, is labeled \(\varDelta \)

Fig. 17
figure 17

Iterative approach results for the density in the subduction zone scenario, with a layout and labeling exactly as in Fig. 16. Top row shows density of the Stokes and Newton problems for constant-density subdomains based on a wave speed anomaly contour. The second and third rows show pairs of results based on larger, and smaller, values of the initial wave speed subdivision contour value. Annotations detail recovered density anomalies, \(\rho \), and their differences, \(\varDelta \), under the different inversion methods

Fig. 18
figure 18

Iterative density inversion in the mantle plume scenario, laid out and annotated as in Figs. 16 and 17

A competing algorithm will again start with the inversion for the wave speed and its binarization, but then normalize by their standard deviation and add the gradients of the density subject to gravitational potential and flow. This effective density gradient is then used for an inversion for the density structure minimizing the flow and the gravitational potential misfits. Simultaneously, the reference viscosity, subject to the flow misfit, is optimized. This offers added potential for imaging: one can choose a subdiscretization for the parameter space such that it is coarser than the solution space. Different levels of coarsening can reveal the robustness of structures.

The results of such an approach are presented in Fig. 19 for the pointwise gradient under the same discretization as the solution, whereas in Fig. 20 we used a fourfold coarsening—rendering the gradient space four times smaller than the solution space, and in Fig. 21 we used an eightfold coarsening factor.

Fig. 19
figure 19

Summary of the simultaneous inversion for the pointwise density and reference viscosity. The rows correspond to the scenarios falling block (FB), subduction zone (SZ) and mantle plume (MP). The columns show one of the resulting inverted material parameters, (i) the wave speed c subject to the acoustic wave equation, (ii) the density \(\rho \) masked by a contour line from the inverted wave speed, (iii) the reference viscosity \(eta_0\) masked by a contour line from the inverted wave speed. The last column shows the convergence history for surface deformation from the Stokes equation as function of gradient-descent iterations

Fig. 20
figure 20

Summary of the simultaneous inversion for a homogeneously 4 \(\times \) coarsened density and reference viscosity. The rows correspond to the scenarios falling block (FB), subduction zone (SZ) and mantle plume (MP). The columns show one of the resulting inverted material parameters, (i) the wave speed c subject to the acoustic wave equation, (ii) the density \(\rho \) masked by a contour line from the inverted wave speed, (iii) the reference viscosity \(\eta _0\) masked by a contour line from the inverted wave speed. The last column shows the convergence history for surface deformation from the Stokes equation as function of gradient-descent iterations

Fig. 21
figure 21

Summary of the simultaneous inversion for a homogeneously 8\(\times \) coarsened density and reference viscosity. The rows correspond to one of the following scenarios, falling block (FB), subduction zone (SZ) and mantle plume (MP). The columns show one of the resulting inverted material parameter, (i) the wave speed c subject to the acoustic wave equation, (ii) the density \(\rho \) masked by a contour line from the inverted wave speed, (iii) the reference viscosity \(\eta _0\) masked by a contour line from the inverted wave speed. The last column shows the convergence history for surface deformation from the Stokes equation as function of gradient-descent iterations

The inversion with factor 4 reveals the shapes of both block and mantle plume better than using an uncoarsened gradient. This algorithm, of course, has to be tested in terms of whether the observed improvements are consistently maintained in more complex settings.

The collinear gradient inversion described in Sect. 4.2.2, whereby parameter updates are applied where the gradients point in the same direction, can be applied in the iterative context of this section also, as shown in Fig. 22. In contrast to the previous iterative approaches the wave speed inversion is neglected here. As expected, the sole combination of the gravity and flow gradients for the densities and the viscosity does not result in a good image of the anomaly, but only recovers a very broad area to the left of the synthetic subduction zone. This observation is however consistent with the kernels shown in Fig. 8, where the viscosity and the density show the highest sensitivity towards the top-left area of the synthetic anomaly.

Fig. 22
figure 22

Summary of the simultaneous inversion for the density and viscosity subject to the Stokes and Poisson equation, respectively, for the subduction zone scenario. Material parameters are only updated at points where all three gradients point in the same direction. The left panel shows the inverted density, the middle panel the inverted reference viscosity, and the right panel shows the convergence history for the surface deformation subject to the Stokes equations as function of gradient-descent iterations. For this inversion no acoustic wavefield information was considered

4.3 Regularization

Until now in this paper, wave-equation information has been used to “precondition” subsequent inversions via contour-informed binarized masking of the gradients of the flow and gravitational constraints. Another approach is to use the initial wave speed inversion results as explicit regularization. Here, we employ a Tikhonov-type (Aster et al. 2005; Tikhonov and Arsenin 1977) regularization whereby the PDE-specific data-space cost functions \(F_\beta \) are penalized by the addition of a model-space quadratic norm, as

$$\begin{aligned} {\tilde{F}}_\beta =F_\beta +\frac{\vartheta }{2}\Vert {\mathscr {R}}({{\mathbf {m}}})\Vert ^2_2+\frac{\varphi }{2}\Vert {{\mathbf {m}}}\Vert ^2_2\,\,, \end{aligned}$$
(29)

with \(\vartheta \) and \(\varphi \) trade-off parameters, and \({\mathscr {R}}\) a diffusion-style (Scherzer and Weickert 2000; Villa et al. 2019) operator:

$$\begin{aligned} {\mathscr {R}}=\varvec{\nabla }\cdot \left[ \!b\left( \left| \frac{\partial F_H}{\partial c}\right| \right) \varvec{\nabla }\right] , \end{aligned}$$
(30)

where \(b\left( \left| \partial F_H/\partial c \right| \right) \) is the binarization of the pointwise derivative of the Hookean cost function in function of the wavespeed, as shown in the first panel of Fig. 23. Our aim is to bias the inversion into correlating updates with the wavespeed results. Results of such an inversion for the falling block case are shown in Fig. 23. In this initial experiment the updates for the density and the viscosity are not shedding much additional light on the shape or parameters of the anomaly, but we do keep this procedure in mind as a future research direction.

Fig. 23
figure 23

Summary of the regularized simultaneous inversion for the pointwise density and viscosity subject to the Stokes and Poisson equations in the falling block scenario. Material parameters are only updated at points where all three gradients point in the same direction. The first panel shows the binarized gradient of the wave speed used in the regularization term. The second panel shows the density inversion result. The third panel shows the inverted reference viscosity. The fourth panel shows the convergence history for the surface deformation subject to the Stokes equations over the gradient-descent iterations

4.4 Shape recovery by parameterized inversion

In contrast to inverting for both geometry and material parameters of subsurface anomalies, geological situations arise where the parameter contrasts are known or can be assumed from prior knowledge, but the shape of the anomalies is unknown and remains of interest. For instance, rock type and seismic wavespeed perturbations in a subduction zone might be relatively well known, but the angle and shape of the zone might be unclear and the seismic station network very sparse.

An algorithm to tackle such an inverse problem, which has gained great popularity in engineering, is the deterministic level set method (Aghasi et al. 2011; Burger 2001; Dorn and Lesselier 2015). The “level set” is the common boundary \(\delta \varOmega _{\phi }\) of a smooth function within the domain \(\varOmega \). The level set function \(\phi \) vanishes on the boundary and decreases or increases away from it. In our application we take the signed distance of the normalized wavespeed result, \(c({\mathbf {r}})\), of the waveform inversion (as shown in, e.g., Fig. 14) as the initial level set function:

$$\begin{aligned} \phi ({\mathbf {r}}) = \text {sign}\left( \frac{c({\mathbf {r}})-\text {min}[c({\mathbf {r}})]}{\text {max}[c({\mathbf {r}})]-\text {min}[c({\mathbf {r}})]}\right) . \end{aligned}$$
(31)

Points where \(\phi >0\) are considered “subduction zone”, whereas “background” mantle has \(\phi \le 0\), resulting in a indicator function \(\chi \) for the material parameters, for instance

$$\begin{aligned} \rho ({\mathbf {r}}) = \chi _{\phi >0} \; \rho _{SZ} + \chi _{\phi <0} \; \rho _{BG}. \end{aligned}$$
(32)

The objective is to change the level set (the shape of the anomaly), in order to improve the fit to the observations. The evolution is by a Hamilton–Jacobi equation with time \(\tau \) an optimization time that proceeds according to the inversion iterations, as follows:

$$\begin{aligned} \frac{\partial \phi }{\partial \tau } + {\mathbf {v}}\cdot \varvec{\nabla }\phi = 0. \end{aligned}$$
(33)

The aim is to obtain a boundary advection velocity \({\mathbf {v}}\) so as to result in a reduction of the target cost function. This is achieved by evaluating the gradient of the cost function with respect to the indicator function. Using the vector \(\varvec{{\hat{\gamma }}}\), the normal to the boundary \(\delta \varOmega _{\phi }\), and \(v_\gamma \), the velocity component projected onto it,

$$\begin{aligned} \varvec{{\hat{\gamma }}}(\delta \varOmega _{\phi },\tau ) = \frac{\varvec{\nabla }\phi }{| \varvec{\nabla }\phi |} \quad \text{ and }\quad v_\gamma ={\mathbf {v}}\cdot \varvec{{\hat{\gamma }}}, \end{aligned}$$
(34)

we simplify Eq. (33) to

$$\begin{aligned} \frac{\partial \phi }{\partial \tau } + v_\gamma \,|\varvec{\nabla }\phi | = 0 , \end{aligned}$$
(35)

to describe the evolution of the level set. The direction perpendicular to the boundary is taken to be the gradient of the cost function with respect to the material parameter. Where this gradient is negative, the size of the anomaly will be increased by advection of the level set away from the boundary, by an amount proportional to the magnitude of the gradient of the material parameter. In this work, the advection equation (35) is solved with a forward Euler scheme (Laurain 2018; Peng et al. 1999), care being taking to reinitialize the level set function every few iterations.

Fig. 24
figure 24

Summary of the level set inversion with the vertical surface velocity as data under the subduction zone scenario. The panels in the upper row are, from left to right: the synthetic shape of the subduction zone to be reconstructed, represented as a binarized version of the level set function \(\phi \); the converged result using only the gradient of the density; the converged result using only the viscosity gradient; converged result using both the density and viscosity gradients. The panels in the row below show the initial guess for the geometry of the subduction zone based on a prior wave speed inversion, and the normalized cost functions over the gradient-descent iterations corresponding to the three cases in the panels above them

Figure 24 presents the result of the level-set inversion for the subduction zone. The initial guess for the shape of the anomaly, obtained from the wavespeed contour of an already quite successful seismic inversion, is already quite close to the truth, except it reaches too deep and extends too wide. Our aim is to improve the image by the addition of information from the vertical Stokes velocity at the surface. Three cases are considered: (i) where only the gradient of the velocity with respect to the density is used to advect the level set; (ii) where only the gradient of the velocity with respect to the viscosity is used as advection velocity, and (iii) where both gradients are used as advection velocity. Interestingly, the inversion using only the density gradient results in an anomaly that is thinner than the target. In contrast, the inversion using just the viscosity results in an anomaly that is too large. An inversion that uses both gradients rather nicely reproduces the upper parts of the anomaly, bending at the correct depth, but failing to reduce the vertical extent of the imaged structure.

Using a seismic wave speed inversion result as the initial guess for a subsequent geometric level-set inversion driven by additional geophysical observables is a promising approach. While these few experiments have barely done it justice, we hope that our suggestions will find their way into future research programs.

5 Discussion

Sensitivity kernels and cost-function gradients highlight both strengths and weaknesses of the various geophysical observables that can be taken into consideration for the inversion for material parameters and the imaging of Earth structure.

The acoustic wave equation provides good initial sensitivity to the outline and geometry of wave speed anomalies that are a priori unknown. Hence, wave speed inversion of seismogram data, records of transient surface deformation, is recommended as the first step in a geophysical imaging study. Wave speed inversion results can be non-unique and marred by artifacts and spurious additional anomalies. Wave speeds themselves cannot be directly translated into rheological parameter values.

The gravitational potential observed at the Earth’s surface, described by Poisson’s equation driven by mass density perturbations, lacks direct resolving power or information on the geometry of the anomalies. Hence it is recommended to precondition or regularize inversions of potential-field data using predefined geometrical shapes inside of which a presumably slowly varying or even constant density becomes the unknown inversion parameter.

Tectonic surface deformation operates on very long timescales and hence is another essentially time-independent geophysical observable. The Stokes flow equations link surface velocity to density and viscosity anomalies inside the Earth. However, the gradients with respect to the density are only very diffusely illuminating the subsurface anomaly positions, and sometimes biasing their location towards the measurement surface. Viscosity gradients tend to be localized around the positions of viscosity jumps, thus also lacking direct and precise information on the location of unknown anomalies. Care must be taken to not simply update the viscosity at such positions in order to fit the observed data, which could lock the result into a local minimum of the cost function. Nevertheless, the flow equations are the sole set of the three PDEs considered in this paper that are also sensitive to the effective viscosity. Hence, minimally, they can act as a cross check on density inversions resulting from the gravitational potential.

Under optimal circumstances wave speed inversions provide multiple plausible initial guesses for the location and geometry of subsurface anomalies. The inversion for the density of this anomaly should then fit the gravity signal and at the same time the surface deformation. If the latter remains challenging the initial guess may not have been good enough. Otherwise, the flow equations can be used to, finally, deliver the effective viscosity of the anomaly.

6 Conclusion and outlook

Combining multiple geophysical observables and inverse methods into a coherent framework for the joint multiparameter imaging of subsurface structures has moved from theoretical considerations into the practical realm through the availability of flexible computational routines that allow for data assimilation and inversion under PDE constraints. The promise of improving the topology of individual cost functions by summing them into a combination, and of reducing the non-uniqueness plaguing single methods used individually, has motivated our work. We focused on three classes of geophysical fields: seismic wave propagation influenced by wave speed anomalies, the gravitational disturbing potential caused by density perturbation, and steady-state surface deformation controlled by density and viscosity structure, in three geological settings: a falling block, a subduction zone, and a mantle plume.

Our positive outlook is informed by (i) the comparison of low-dimensional grid-searches within the landscape of summed cost functions, showing those to be preferred over sampling individual cost functions separately; (ii) the calculation of the cost-function gradients and sensitivity kernels of each observable with respect to each design variable, unveiling that gradient combinations outperform sequential optimization; (iii) ideas of preconditioning and regularizing combined inversions, and for geometric level-set, shape-optimizing rather than pointwise approaches.

To reap the benefits of combined inversion approaches, researchers within geophysical subdomains should continue to make their data and models widely available, so that they can be used as initial models, additional constraints, preconditioners, or regularizers, across the disciplines. Our work has furthered the cause by laying out the technical pieces to efficiently calculate the gradients of the three geophysical observables described above, with respect to the design variables of density, wavespeed and viscosity, using an adjoint-state framework.

The simple examples and straightforward algorithms presented here do not point to a single “best-performing” coupled inversion procedure, but high-level languages such as FEniCS will continue to make it affordable for others to develop sandbox testing frameworks for combined inversions following the techniques and procedures described in this paper. All of our examples have been deterministic and gradient-based, stopping short of evaluating Hessian matrices to quantify uncertainties, but it is clear that the embedding of our procedures within more sophisticated statistical sampling approaches will open up fruitful new avenues of research.

We imagine a future in which additional observations such as magnetic anomaly measurements (Lewis and Simons 2012), electric resistivity probings (Robbins and Plattner 2018) and ground-penetrating radar (Domenzain et al. 2018; Plattner 2020), would form part and parcel of a holistic geophysical inversion scheme. Even within our own limited framework we imagine including additional complexity in the physical description of the Earth system, such as incorporating full elasticity and even anelastic viscous dampening of the seismic wavefield (Askan et al. 2007, 2010; Pan and Wang 2020; Tarantola 1986, 1988), including visco-elasto-plastic effects in the Stokes equation (Peltier 1985), and allowing for additional kernel evaluations, e.g., for internal friction, with additional coupling between Hooke’s and Stoke’s problems through the elastic shear modulus. However, our examples also illustrate that for a combined inversion approach to be truly proclaimed superior to a sequential one, it needs to be thoroughly tried and tested on real-world examples, which might impose on future studies additional technical developments.