1 Introduction

Seismic site characterization encompasses a variety of approaches to assess the hazards associated with earthquake ground shaking at a specific location, and is of key importance in mitigating damage caused by earthquakes. Site characterization generally requires estimation of the structure and geophysical properties of the shallow subsurface for the purpose of predicting seismic wave behavior. As seismic waves propagate into materials with lower seismic impedance (e.g., near-surface sediments and soils), the wave amplitude increases due to conservation of energy (Anderson et al. 1986; Anderson et al. 1996). Amplification of seismic waves also occurs at specific frequencies due to resonances within near-surface layers. Furthermore, larger-scale two- and three-dimensional (2D and 3D) structures such as sedimentary basins can trap and focus seismic energy, amplifying waves (e.g., Bard and Bouchon 1985; Campillo et al. 1989; Graves et al. 1998). Consequently, the geophysical properties at a site significantly affect the amplitude, frequency content (spectrum), and duration of ground motions during an earthquake. Greater knowledge of these properties (e.g., subsurface seismic velocities, density, and attenuation), including their spatial distribution, is therefore critical for understanding and predicting site-specific seismic hazards. Geophysical inversion provides a variety of approaches to estimate in situ sub-surface properties from surface observations, and often represents a convenient and economical alternative to invasive approaches (e.g., drilling and down-hole methods).

The general topic of geophysical inversion is considered in several texts (e.g., Parker 1994; Tarantola 2005; Menke 2018; Aster et al. 2018) and reviews in the literature (e.g., Jackson 1972; Parker 1977; Treitel and Lines 2001; Sambridge and Mosegaard 2002). This paper presents a review of common inversion approaches for estimating geophysical models from seismic data for the purpose of site characterization. The paper is structured roughly in order of increasing complexity of the approaches reviewed. Many of the same inversion approaches considered here have been applied to estimate earth structure from local to global scales. Furthermore, these approaches have been applied to many other types of geophysical data (e.g., gravity, magnetic, electromagnetic). It is not intended, nor is it possible, for this paper to provide an exhaustive account of the seismic site characterization literature. It is also not the intention to promote one inversion approach over others, but rather to summarize for the reader the variety of techniques that are available (with a focus on recent advancements), and to consider some of the associated advantages/disadvantages.

The goals of this paper are to review the theory, assumptions, limitations, and practical application of inversion methods commonly used for seismic site assessment and characterization. Many factors in the inverse problem can influence the recovered model, and impact site characterization and predicted earthquake response. The range of issues considered here is broad and varies with respect to seismic data type, algorithmic complexity, computational expense, physical dimension, and the ability to quantitatively estimate the uncertainty in the inverse solution. Given the range of methods discussed here, this paper first provides a brief overview of some general aspects of inversion (applicable to all problems) in Section 2 before discussing specific inversion approaches in more detail. Most approaches considered here are based on recovering an optimal (best-fit) model of the shallow subsurface. Linearized and fully non-linear approaches to finding optimal one-dimensional (1D) models of subsurface structure are reviewed in Sections 3 and 4, respectively. As an alternative to methods that find a single optimal solution, Bayesian approaches which provide a probabilistic result are also reviewed in this paper in Section 5. Many studies in seismic site assessment are interested in more-complex 2D and 3D structures, such as sedimentary basins. Surface-wave tomography and full-waveform inversion (FWI) are two approaches for estimating 2D and 3D models that have recently become more applicable in site assessment studies, and are reviewed in Sections 6 and 7, respectively.

2 Theoretical overview

2.1 Models and data

Inversion can be defined as the estimation of the parameters of a postulated model that represents a physical system of interest, using observations of some process that interacts with the system (i.e., data). In the case of seismic site characterization, the physical system is the geophysical structure of the shallow subsurface, and the data are observations of seismic waves that interact with this structure. It is important to recognize that the model always represents an idealization of the actual physical system. For example, a model consisting of a 1D profile of seismic velocity assumes the subsurface is laterally homogeneous (and typically isotropic). An important issue in all inverse problems is whether the postulated model adequately represents the physical system. For example, assuming lateral homogeneity when the subsurface is actually laterally heterogeneous introduces modelling errors that can bias or preclude meaningful results. Such cases may necessitate more-complex 2D or 3D models of the system. Furthermore, within the context of site-specific seismic hazard assessment, such 2D or 3D models may ultimately be required to accurately predict the response of a site during an earthquake (e.g., Campillo et al. 1990; Trifunac 2016).

In the case of a 1D profile, model parameters commonly represent the shear-wave velocity (VS) and compressional-wave velocity (VP) within discrete layers. Other parameters such as density (ρ) and attenuation are also considered. However, it is commonly accepted that the response of a site to an earthquake is predominantly influenced by the VS structure. In 2D or 3D models, parameters typically represent seismic velocities of discrete cells in a spatial grid. The number and sizes of layers or cells in the model are also issues in model selection. Depending on the particular approach, these spatial properties may also be considered as model parameters estimated in the inversion (e.g., the number and thicknesses of layers in a 1D scenario).

As mentioned above, the data considered in this review to constrain subsurface structure represent observations of seismic wave phenomenology (other types of geophysical data are sometimes used for site characterization, but seismic data are the most common and informative). Common seismic data include measurements of the dispersion (variation of phase or group velocity with frequency) and the horizontal-to-vertical spectral ratio (HVSR) of surface waves. Both of these data types can be extracted from ambient seismic noise, although controlled-source and earthquake recordings are also used. In some cases, surface-wave attenuation curves can be measured from multi-channel active-source recordings. Detailed discussions on the measurement and processing of the various data types considered here are included in other review papers in this issue. Furthermore, many site characterization studies estimate near-surface structure using body-wave (active-source) imaging methods (e.g., Williams et al. 2000). These problems are formulated differently than those discussed in this review, and are considered in other review papers in this issue. In the case of tomographic inversion, the data typically represent the travel times of particular seismic phases. In FWI, rather than considering the data to be specific features of a recorded seismogram (such as wave amplitudes or travel times), the data are taken to be the seismogram itself, which contains more information (and associated complexities).

Many site characterization applications are based on estimating specific regulatory-based representations of near-surface structure (e.g., the travel-time averaged VS of the upper 30 m, known as VS30). Such site-characterization parameters can be extracted from seismic inversion results. However, it is worth noting that other studies estimate these site parameters directly, based on empirical relationships for surface-wave data at particular wavelengths (e.g., Martin and Diehl 2004; Albarello and Gargani 2010), or other proxies such as surficial geology (e.g., Wills and Clahan 2006) and topography (e.g., Yong et al. 2012). As these approaches do not formally represent inversion, they are not considered further in this review, but see Yong (2016) and Savvaidis et al. (2018) for further discussion on this topic.
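For reference, the VS30 parameter mentioned above is simply the travel-time average of VS over the upper 30 m. For a stack of layers with thicknesses \(h_{i}\) and shear-wave velocities \(V_{S,i}\) spanning the upper 30 m (truncating the deepest layer if necessary),

$$ V_{S30} = \frac{30~\mathrm{m}}{\sum_{i} h_{i}/V_{S,i}}. $$

As a hypothetical example, layers of 5, 10, and 15 m thickness with VS of 180, 320, and 550 m/s give VS30 = 30/(5/180 + 10/320 + 15/550) ≈ 348 m/s.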

The near-surface attenuation properties of a site are of significant importance in seismic site assessment. A common technique for studying near-surface attenuation is via the amplitude spectra of earthquake recordings, which typically display a decrease in amplitude at high frequencies that is often modelled by a spectral decay factor κ. The path-corrected component of κ, called κ0, is believed to be a frequency-independent site-attenuation (and scattering) parameter (e.g., Pilz and Fäh 2017; Palmer and Atkinson 2020). Although useful, κ is typically estimated empirically, and does not represent the result of an inversion. Hence, this topic is not considered further here.

2.2 The forward problem

In order to consider an inverse problem, a solution must be available for the corresponding forward problem (also called the forward model or direct problem). For a given set of model parameters, the forward problem computes (predicts) the data that would be observed for this representation of the physical system (i.e., the forward problem simulates the physical processes that lead to the data). Mathematically, the forward problem represents a mapping from the model-parameter space to the data space, while the inverse problem represents the reverse mapping. Geophysical forward problems (predicting data for models) produce a unique solution that is generally stable in the sense that small changes to the model cause only small changes to the resulting data. In contrast, the inverse problem (estimating model parameters from data) is non-unique (more than one model can fit the data) and can be unstable (i.e., small changes to the data, such as errors, can cause large changes in the recovered model). Hence, solutions to inverse problems should always be appraised critically, and quantitative uncertainty estimation, when possible, is valuable.

To briefly consider the forward problem for the data types mentioned above, predicting surface-wave dispersion for laterally homogeneous models usually employs efficient numerical methods such as the Thomson-Haskell propagator-matrix method (Thomson 1950; Haskell 1953; Knopoff 1964; Dunkin 1965; Gilbert and Backus 1966). For HVSR, a consensus on the cause of the shape of the spectral amplitude ratio has not been established (Lunedei and Malischewsky 2015; Molnar et al. 2018). As such, a variety of techniques have been used to solve the HVSR forward problem, with the various differences (or simplifications) in the underlying theories representing possible sources of error. Most 1D forward models are based on discrete layers with uniform geophysical properties within each layer. Model parameterizations that involve gradients (e.g., a power-law profile) can be parameterized in terms of multiple uniform sub-layers. Recently, spectral-element techniques have been developed to solve the 1D forward problem for more general profile forms including uniform layers, smoothly varying structure, or combinations thereof (e.g., Hawkins 2018). For tomographic inversion, the forward problem must determine the wavefront evolution or ray path for a particular seismic phase to accurately predict travel times (e.g., Rawlinson and Sambridge 2004). FWI requires the computation of synthetic seismograms that account for complex heterogeneity and scattering effects. For this purpose, numerical approaches for solving the partial differential equations governing seismic wave propagation are typically employed (e.g., Virieux 1986; Bouchon et al. 1989; Komatitsch and Vilotte 1998; Akcelik et al. 2003).

All inversion methods discussed in this paper incorporate an underlying approach to solving the corresponding forward problem, with associated assumptions that can impact the model solution. However, further discussion of the forward problem (unique to each data type and problem) is beyond the scope of this review.

2.3 Errors and misfit

The goal of most inversion approaches is to determine the set of model parameters for which the predicted data best fit the observed data. A perfect fit is generally not possible (or desirable) because of data errors, which include measurement errors on the observed data (e.g., due to instrument effects and competing seismic vibrations in the environment) and theory errors on the predicted data (due to simplified model parameterizations and approximate physics of the forward problem). A common measurement error in site characterization studies considering surface-wave dispersion is the misidentification, or interaction, of dominant modes in the data (O’Neill and Matsuoka 2005). Data errors (sum of measurement and theory errors) can be considered to represent all factors that cannot be modelled or accounted for in the inverse problem. Measurement errors are often considered to be statistical (i.e., aleatoric), whereas theory errors typically introduce systematic (i.e., epistemic) errors. In some cases, measurement errors can be characterized statistically from repeated sets of observed data. However, this approach does not apply to theory errors, the statistics of which are usually poorly known.

In many inversion approaches, the data misfit function is formulated based on knowledge or assumptions about the error processes. Considering the difference between observed and predicted data (called data residuals) to represent the errors, the misfit function can be derived by assuming a particular statistical distribution of the residuals given the model, which is then interpreted as the likelihood of the model. For example, the common assumption of Gaussian-distributed errors leads to an L2 misfit function (negative log-likelihood function) consisting of the sum of squared residuals, weighted by the inverse error covariance matrix. The Gaussian assumption is supported by the Central Limit Theorem, which states that the sum of a large number of independent error processes tends toward a Gaussian distribution regardless of the distributions of the individual processes. Further, L2 misfit minimization corresponds to least-squares methods, for which analytic solutions exist for linear and linearized problems. However, the assumption of Gaussian errors and a least-squares misfit can be inappropriate for data sets containing outliers (data with improbably large errors). In such cases, the assumption of Laplacian errors (i.e., a two-sided exponential distribution) is more robust. This assumption leads to an L1 misfit function based on the sum of absolute values of residuals, weighted by the square root of the inverse covariance matrix. No analytic solution to L1 misfit minimization exists.
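To make these definitions concrete, the following sketch (illustrative only; the function names and the use of a Cholesky factor as the matrix square root are choices made here, not prescribed by any particular study) evaluates the weighted L2 and L1 misfits for a given inverse error covariance matrix.

```python
import numpy as np

def l2_misfit(d_obs, d_pred, C_inv):
    """Weighted sum of squared residuals (negative log-likelihood, up to a
    constant, under the Gaussian error assumption)."""
    r = d_obs - d_pred
    return float(r @ C_inv @ r)

def l1_misfit(d_obs, d_pred, C_inv):
    """Weighted sum of absolute residuals (appropriate for Laplacian errors).
    Residuals are whitened with a Cholesky factor of the inverse covariance;
    for a diagonal covariance this reduces to |r_i| / sigma_i."""
    L = np.linalg.cholesky(C_inv)          # C_inv = L @ L.T
    r = d_obs - d_pred
    return float(np.sum(np.abs(L.T @ r)))

# e.g., for independent errors with standard deviations sigma:
# C_inv = np.diag(1.0 / sigma**2)
```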

The L1 and L2 misfit functions mentioned above formally require knowledge of the error covariance matrix, which is often not available. However, under the assumption of independent, identically distributed (IID) errors (i.e., a diagonal covariance matrix with constant variance), the problem of misfit minimization is independent of the covariance matrix and unweighted misfit functions apply. This approach is commonly used for misfit minimization; however, it does not apply to parameter uncertainty estimation, where error quantification is required. Furthermore, it should be noted that if the errors are not IID (common for geophysical data), the approach can lead to suboptimal (biased) solutions. In such cases, it may be preferable to estimate the error statistics from the data as part of the inverse problem. This requires variance estimation if the errors can be considered independent, or full covariance estimation for inter-dependent (e.g., serially correlated) errors. Variance/covariance can be estimated using non-parametric approaches where the residuals from an initial unweighted inversion are used to compute a diagonal or Toeplitz (band-diagonal) covariance matrix for a subsequent inversion. Alternatively, parametric approaches can be applied in non-linear (numerical) inversions by including the parameters for a model of the covariance matrix in the inversion.
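A minimal sketch of the non-parametric approach is given below, assuming a single data curve with stationary, serially correlated errors; the function name and the simple lag-truncation choice are illustrative, and in practice the estimated matrix should be checked (and possibly tapered) to ensure it is positive definite.

```python
import numpy as np
from scipy.linalg import toeplitz

def toeplitz_covariance_from_residuals(residuals, max_lag=None):
    """Build a Toeplitz (band-diagonal) error covariance estimate from the
    residuals of an initial unweighted inversion, assuming stationary errors
    so that the covariance depends only on the lag between data points."""
    r = residuals - residuals.mean()
    n = len(r)
    max_lag = n - 1 if max_lag is None else max_lag
    # biased sample autocovariance at each lag; lags beyond max_lag are zeroed
    acov = [np.sum(r[:n - k] * r[k:]) / n if k <= max_lag else 0.0
            for k in range(n)]
    return toeplitz(acov)
```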

Misfit functions can also be defined without a statistical foundation to serve the same purpose of quantifying the difference between observed and predicted data. For example, the optimal-transport metric has gained popularity as a misfit function in several seismological inverse problems (e.g., Métivier et al. 2016; Hedjazian et al. 2019). This misfit metric can improve desirable properties of the problem, such as reducing the effective non-linearity of the relationship between data and model parameters and mitigating non-uniqueness. It is worth noting that several of the inversion strategies reviewed in this paper (particularly non-linear inversions) need not be based on specific assumptions on data errors.

Regardless of how the misfit function is defined, it can be interpreted as a hyper-surface in the model space (i.e., a multi-dimensional function over the parameters). The goal for many inverse methods is to estimate the optimal set of model parameters that represent the global minimum of the misfit surface. An important distinction is whether the inverse problem is linear, weakly non-linear, or strongly non-linear. Linear inverse problems have a single minimum of the misfit function over the parameter space. An idealization of such a misfit function is illustrated in Fig. 1a. In particular, for a linear inverse problem with Gaussian-distributed errors, the L2 misfit function represents the (negative) logarithm of a Gaussian likelihood function over the parameter space, and an analytic expression for this solution exists.

Fig. 1

Hypothetical misfit surfaces in a 2D model space. A simple misfit surface for an idealized linear problem is shown in (a). A complex misfit surface, with multiple local minima, for a non-linear problem is shown in (b). The locations of the global minima of the surfaces are shown by red stars

Most inverse problems in geophysics (including seismic site assessment and characterization) are non-linear. A rare example of a linear problem is the inversion of measured surface-wave attenuation curves to estimate near-surface P-wave and/or S-wave attenuation (described by the dimensionless quality factor Q) where, for known velocity structure, the relationship between surface-wave attenuation and body-wave attenuation is linear (e.g., Xia et al. 2002b). In some cases, the non-linearity of an inverse problem may be weak, such that the problem can be solved via linearization, iteratively stepping down the local misfit gradient to the minimum (Section 3). For strongly non-linear problems, the misfit surface can be complex, potentially including multiple disconnected regions with low misfit (i.e., local minima) in the parameter space, as illustrated in Fig. 1b. Linearized inversions can fail (converge to local minima or diverge) for strongly non-linear problems, and non-linear approaches are required (Section 4).

Non-linear inversion in seismic site assessment and characterization comprises a diverse collection of methods. These include the downhill simplex method (DHS, Section 4.1), which moves roughly down the misfit gradient through geometric moves without computing derivatives of the hyper-surface; global search methods such as simulated annealing (SA) and genetic algorithms (GA, Section 4.2), which apply a directed random search based on natural optimization processes; and the neighborhood algorithm (NA, Section 4.3), which sequentially subdivides the parameter search space to converge to the solution. Large-scale problems such as tomography and FWI are typically based on linearization for practical (computational) reasons. As an alternative to inversion methods that seek the best-fit model solution (a point estimate in the parameter space), Bayesian inversion is based on probabilistic (numerical) sampling over the parameter space to estimate properties of the posterior probability density (PPD), providing parameter estimates together with quantitative (non-linear) uncertainty analysis (Section 5).

2.4 Parameterization

The approach to parameterizing the model is another important issue in inversion. Adopting too few parameters (e.g., layers in a 1D problem or grid cells in 2D or 3D) can underfit the observed data, leaving model structure unresolved and biasing parameter estimates. Conversely, adopting too many parameters can overfit the data (i.e., fit the errors on the data), resulting in models with spurious (unconstrained) structural features. Furthermore, more model parameters typically mean greater computational cost.

The inversion also depends on the form of the model represented by the parameterization. For example, as mentioned above, most 1D site-characterization studies consider the subsurface to be represented by a discrete stack of uniform layers, with discontinuities in geophysical properties at layer boundaries. However, some recent inversions consider gradient (smooth) structures represented by linear or power-law functions (e.g., Molnar et al. 2010) or, more generally, by polynomial basis functions (Gosselin et al. 2017). In some cases, these gradient-based parameterizations have been shown to characterize unconsolidated soils and sediments better than discrete layered models, although this is not universal for all sites.

Some approaches to parameterization, particularly for linearized inversions, include only the geophysical properties for a fixed discretization of the subsurface model, since solving for spatial parameters (e.g., layer thicknesses or cell sizes) can significantly increase the non-linearity of the problem. In this approach, the model is typically over-parameterized with many layers/grid cells that are below the spatial resolution of the data to provide flexibility for the solution. The resulting under-determined inversion can be constrained by regularization (described in Section 3) to explicitly control the data misfit and structure of the solution.

An alternative parameterization approach is to solve for the spatial properties of the model (individual layer thicknesses or cell sizes) as part of the problem. This is more often applied in non-linear approaches, where the increased non-linearity is of less concern. This approach typically assumes a small number of layers/cells to constrain model structure, but the issue of over- or under-parameterization is sometimes given little attention beyond qualitative practices such as progressively increasing the number of layers until the data misfit stops decreasing significantly. However, quantitative approaches to parameterization based on formal information criteria or by sampling probabilistically over the number of parameters have been applied in site-assessment inversion, and will be reviewed herein.

2.5 Practical considerations

The inversion approaches discussed in the following sections of this paper have various advantages and disadvantages. For instance, linearized inversions often require the fewest data predictions (forward solutions) and are the least computationally demanding. Bayesian inversion numerically samples the model probability over the parameter space, and can require tens to hundreds of thousands of data predictions. Consequently, Bayesian methods are typically the most computationally demanding, but also provide the most informative solutions. Non-linear optimizations generally fall somewhere between these extremes in terms of computational demand.

Forward solutions for 1D models in seismic site characterization are quite efficient such that inversions can generally be solved in a matter of seconds (for linearized methods) to minutes (for Bayesian methods) with serial algorithms on a desktop computer. In microtremor array methods for local site-assessment applications (e.g., estimating geophysical properties at a site over 10s to 100s metres depth), the 1D assumption is required only over the lateral extent of the seismometer array (also \(\sim \)10–100 m). For larger-scale studies (e.g., tomographic inversion and FWI for sedimentary basin structure), 2D or 3D parameterizations are generally required. In these cases, the computational demand of the forward problem is much greater and most approaches rely on linearized inversion. Bayesian methods have been applied to large-scale seismic inversions at significant computational expense (e.g., Bodin and Sambridge 2009; Bodin et al. 2012; Gosselin et al. 2021); however, to date, such methods have not been applied to site assessment.

Inverse methods are sometimes treated as a “black box,” with data as input and a model of earth structure as output, but with little consideration of the underlying processes. However, the output model can be non-physical or meaningless if the data are of poor quality or if the error assumptions or model parameterization do not apply. The observed data should be inspected (visually or otherwise) for quality control and, whenever possible, “sanity checks” applied to assess the reliability of model solutions. For example, in HVSR inversion, the peak frequency for the spectral-ratio curve is strongly linked to the depth of the largest seismic impedance contrast in the subsurface (e.g., the soil-bedrock interface), with a lower frequency indicating a deeper interface. If the observed HVSR data possess a low-frequency peak, then it is reasonable to expect the inversion solution to include a large change in seismic velocity at a deep interface. For all inverse problems, the agreement between the observed and predicted data should be examined to ensure a meaningful fit (but note that good data fit is a necessary but not sufficient condition for a meaningful model solution). Examining the data residuals after an inversion can also provide useful insight into the data-error statistics. Further discussion on “guidelines” and details pertinent to the processing and inversion of surface-wave data in particular is provided by Foti et al. (2011), Foti et al. (2018), and Vantassel and Cox (2021).

Due to the non-uniqueness and potential instability of inverse problems, it is important to assess the uncertainty of the model solution, when possible. Understanding and quantifying the uncertainties in estimated near-surface structure (and associated effects on site characterization) is of significant interest for engineering and planning purposes, and has been identified as a critical issue in seismic site characterization (Cornou et al. 2006). However, uncertainty estimation can be challenging for inverse problems, and many methods are approximate and/or qualitative. For instance, linearization errors as well as regularization schemes generally preclude quantitative uncertainty estimation for linearized inversions. Global-search methods are designed to locate the optimal solution, but not estimate uncertainties. However, Bayesian inversion methods can provide rigorous uncertainty quantification for non-linear problems, including seismic site assessment (e.g., Molnar et al. 2010; Dettmer et al. 2012; Gosselin et al. 2017). Furthermore, the estimated inversion uncertainties can be propagated into site-assessment analyses to characterize the site and predict the expected range of seismic amplification and resonance, representing a valuable result for engineers and planners (Molnar et al. 2013; Gosselin et al. 2018).

3 Linearized inversion

As mentioned in the previous section, although inverse problems associated with seismic site characterization are inherently non-linear, in some cases the analytic theory for linear inversion can be applied via local linearization and iteration. This section reviews common linearized inversion techniques and their application to site assessment.

Consider a non-linear inverse problem with vectors of model parameters m, observed data d, and predicted data d(m). Equating the observed data to those predicted for the model sought, and expanding the data functional (forward problem) about a starting model m0 to first order leads to

$$ \begin{aligned} \mathbf{d} & = \mathbf{d}(\mathbf{m})\\ & \approx \mathbf{d}({\mathbf{m}}_{\boldsymbol{0}}) + \mathbf{A}(\mathbf{m} - {\mathbf{m}}_{\boldsymbol{0}}). \end{aligned} $$
(1)

Here, A is the sensitivity or Jacobian matrix of partial derivatives, \(A_{ij} = \partial d_{i}(\mathbf{m}_{0})/\partial m_{0j}\). Neglecting higher-order terms in the expansion linearizes the inverse problem about the initial model. Rearranging Eq. 1 and defining data and model perturbations \(\boldsymbol{\delta}\mathbf{d} = \mathbf{d} - \mathbf{d}(\mathbf{m}_{0})\) and \(\boldsymbol{\delta}\mathbf{m} = \mathbf{m} - \mathbf{m}_{0}\), respectively, leads to a linear relation:

$$ \boldsymbol{\delta} \mathbf{d} = \mathbf{A} \boldsymbol{\delta} \mathbf{m}. $$
(2)

This is the fundamental relationship between changes in a proposed model and resulting changes in the forward-modelled data, which can be used to refine the initial model.

For some inverse problems, analytic expressions for the required partial derivatives are available, but in other cases they must be estimated numerically (e.g., via finite differences). The sensitivity matrix encapsulates the physics/geometry of the (linearized) problem, and can provide useful insight. For example, Xia et al. (1999) examined the sensitivity matrix for 1D linearized inversion of surface-wave dispersion data, and concluded quantitatively that dispersion data are significantly more sensitive to VS than to VP or ρ. Furthermore, Xia et al. (2003) determined that, for a given frequency, higher-order modes are more sensitive than the fundamental mode to deeper structure, and thereby provide greater information for, and constraint on, such structure. Note that we discuss the sensitivity matrix within the context of a discrete, parameterized model. The continuous (analytic) equivalents of the sensitivity matrix are often called the Fréchet derivatives (McGillivray and Oldenburg 1990).
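When analytic derivatives are unavailable, a simple (if computationally expensive) option is a forward finite-difference estimate of the sensitivity matrix, as sketched below; the forward function, its parameter vector, and the step-size rule are placeholders for whatever forward solver is in use.

```python
import numpy as np

def finite_difference_jacobian(forward, m0, rel_step=1e-3):
    """Estimate A_ij = d d_i / d m_j about m0 by forward finite differences.
    `forward` is any user-supplied function mapping a model-parameter vector
    to a predicted data vector (e.g., a 1D dispersion-curve solver)."""
    d0 = np.asarray(forward(m0))
    A = np.zeros((d0.size, len(m0)))
    for j in range(len(m0)):
        dm = np.zeros_like(m0, dtype=float)
        dm[j] = rel_step * max(abs(m0[j]), 1.0)   # step scaled to the parameter
        A[:, j] = (np.asarray(forward(m0 + dm)) - d0) / dm[j]
    return A
```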

As mentioned previously, the assumption of Gaussian-distributed errors for a linear inverse problem leads to an analytic solution. For Eq. 2, this assumption leads to a likelihood function

$$ L(\mathbf{\delta m}) = \frac{1}{(2\pi)^{N/2}|\mathbf{C}|^{1/2}} \exp\left[ -\frac{{\varPhi}(\boldsymbol{\delta} \mathbf{m})}{2}\right], $$
(3)

where

$$ {\varPhi}(\boldsymbol{\delta} \mathbf{m}) = (\boldsymbol{\delta} \mathbf{d} - \mathbf{A}\boldsymbol{\delta} \mathbf{m})^{T}\mathbf{C}^{-1}(\boldsymbol{\delta} \mathbf{d} - \mathbf{A}\boldsymbol{\delta} \mathbf{m}) $$
(4)

is the misfit (negative log-likelihood) function and C is the data error covariance matrix. The best-fit model perturbation δm (for the linearized problem) can be found by maximizing the likelihood or, equivalently, minimizing the misfit: setting \(\partial {\varPhi}(\boldsymbol{\delta}\mathbf{m})/\partial \boldsymbol{\delta}\mathbf{m} = \mathbf{0}\) leads to

$$ \boldsymbol{\delta} \mathbf{m} = \left[ \mathbf{A}^{T}\mathbf{C}^{-1}\mathbf{A} \right]^{-1} \mathbf{A}^{T}\mathbf{C}^{-1}\boldsymbol{\delta} \mathbf{d}. $$
(5)

The model solution is given by m1 = m0 + δm. Since higher-order terms were neglected, this may not represent a satisfactory solution, but the procedure can be repeated iteratively until the data are fit appropriately and/or the parameters no longer change between iterations.
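The iterative procedure can be sketched as follows (a minimal Gauss-Newton-style loop; the forward and jacobian functions, the convergence tolerance, and the iteration cap are illustrative assumptions rather than prescriptions from any specific study).

```python
import numpy as np

def linearized_inversion(forward, jacobian, d_obs, m0, C_inv,
                         max_iter=20, tol=1e-6):
    """Iterate the unregularized linearized update of Eq. 5:
    dm = [A^T C^-1 A]^-1 A^T C^-1 (d_obs - d(m)), m <- m + dm,
    stopping when the parameter update becomes negligible."""
    m = np.array(m0, dtype=float)
    for _ in range(max_iter):
        A = jacobian(m)                    # sensitivity matrix about current model
        dd = d_obs - forward(m)            # data residual (perturbation)
        dm = np.linalg.solve(A.T @ C_inv @ A, A.T @ C_inv @ dd)
        m = m + dm
        if np.linalg.norm(dm) < tol * np.linalg.norm(m):
            break
    return m
```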

The linearized solution, Eq. 5, requires the matrix \(\mathbf{A}^{T}\mathbf{C}^{-1}\mathbf{A}\) to be well-conditioned. In practice, depending on the choice of parameterization, inversions for seismic site characterization are often ill-conditioned (nearly singular), which leads to unstable inversions. The inversion can be stabilized using singular-value decomposition (e.g., Parolai et al. 2006) and/or by incorporating additional constraints on the model parameters independent of the data.

A common strategy to incorporate constraints for stability is regularization (Fig. 2). Rather than minimizing the data misfit (Eq. 4), regularization considers a more general objective function that augments the data misfit with a model misfit term

$$ \begin{aligned} {\Psi}(\boldsymbol{\delta} \mathbf{m},\beta) = & (\boldsymbol{\delta} \mathbf{d} - \mathbf{A}\boldsymbol{\delta} \mathbf{m})^{T}\mathbf{C}^{-1}(\boldsymbol{\delta} \mathbf{d} - \mathbf{A}\boldsymbol{\delta} \mathbf{m}) \\ & +\beta (\boldsymbol{\delta} \mathbf{m}-\boldsymbol{\delta} \hat{\mathbf{m}})^{T} \mathbf{R}^{T}\mathbf{R} (\boldsymbol{\delta} \mathbf{m}-\boldsymbol{\delta} \hat{\mathbf{m}}), \end{aligned} $$
(6)

where \(\delta \hat {\mathbf {m}}\) represents a preferred value for δm, R represents a weighting matrix (the regularization matrix), and β is a trade-off parameter determining the relative importance of the two terms. Minimizing Eq. 6 with respect to δm leads to the regularized solution which may be expressed by

$$ \begin{aligned} \boldsymbol{\delta} \mathbf{m} & = \boldsymbol{\delta} \hat{\mathbf{m}} + \\ & \left[ \mathbf{A}^{T}\mathbf{C}^{-1}\mathbf{A} + \beta \mathbf{R}^{T}\mathbf{R} \right]^{-1} \mathbf{A}^{T}\mathbf{C}^{-1} \left[ \boldsymbol{\delta} \mathbf{d} - \mathbf{A}\boldsymbol{\delta} \hat{\mathbf{m}} \right]. \end{aligned} $$
(7)
Fig. 2

Tradeoff between data misfit and regularization. Solid contours show the misfit surface with its minimum at the red star. Dashed contours show a hypothetical regularization surface with its minimum at the blue star (which does not fit the data). The regularized solution balances minimizing both functions. Adapted from Sambridge and Mosegaard (2002)

In Eq. 7, the term \(\beta \mathbf{R}^{T}\mathbf{R}\) stabilizes the matrix inversion (cf. Eq. 5), given appropriate choices of β and R. The role of β as a trade-off parameter is clear: in the limit \(\beta \rightarrow 0\), Eq. 7 simplifies to Eq. 5, which minimizes the data misfit alone, while in the limit \(\beta \rightarrow \infty\), the solution reduces to \(\boldsymbol{\delta}\mathbf{m} = \boldsymbol{\delta}\hat{\mathbf{m}}\), which minimizes the model misfit. The goal is to determine an appropriate value for β which provides an acceptable data misfit while stabilizing the inversion. If suitable knowledge of the data error covariance is available, β can be chosen to fit the data according to a statistical criterion (e.g., a χ2 test). If not, a more subjective approach is to plot the data misfit versus the model misfit and choose a balance near the inflection point of this curve (the L-curve method, e.g., Hansen 1992).

The most common form of regularization used in (1D) near-surface seismic inversion sets R = I (the identity matrix) and \(\boldsymbol {\delta } \hat {\mathbf {m}} = \boldsymbol {0}\) (i.e., a preference for small linearized step size), such that Eq. 7 becomes

$$ \boldsymbol{\delta} \mathbf{m} = \left[ \mathbf{A}^{T}\mathbf{C}^{-1}\mathbf{A} + \beta \mathbf{I} \right]^{-1} \mathbf{A}^{T}\mathbf{C}^{-1}\boldsymbol{\delta} \mathbf{d}. $$
(8)

This solution adds β to the main diagonal of \(\mathbf{A}^{T}\mathbf{C}^{-1}\mathbf{A}\) to overcome ill-conditioning. For IID errors (i.e., \(\mathbf{C} = \sigma^{2}\mathbf{I}\)), this method is often referred to as damped least squares. This regularization, favoring small δm, is consistent with local linearization, which may only apply in a small region around the initial model.
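The following sketch implements the damped least-squares step of Eq. 8 and traces the data-misfit versus model-norm trade-off for a range of β values (from which a value can be chosen using a χ2 target or the L-curve). The matrix A, inverse covariance C_inv, and residual vector dd are assumed to be supplied by the surrounding linearized inversion; the function names are illustrative.

```python
import numpy as np

def damped_step(A, C_inv, dd, beta):
    """Regularized model update of Eq. 8 (R = I, preferred perturbation = 0)."""
    AtCA = A.T @ C_inv @ A
    return np.linalg.solve(AtCA + beta * np.eye(AtCA.shape[0]),
                           A.T @ C_inv @ dd)

def tradeoff_curve(A, C_inv, dd, betas):
    """Data misfit and model norm for each beta (points along the L-curve)."""
    points = []
    for beta in betas:
        dm = damped_step(A, C_inv, dd, beta)
        r = dd - A @ dm
        points.append((beta, float(r @ C_inv @ r), float(dm @ dm)))
    return points

# e.g., scan betas = np.logspace(-4, 4, 25) and select beta near the corner
# (inflection) of data misfit versus model norm, or at a chi-squared target.
```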

Levenberg (1944) proposed a strategy for assigning β based on the number of linearized iterations such that β is larger for initial iterations (consistent with the local linear assumption) but gradually decreases for later iterations in order to better fit the observed data near convergence. Marquardt (1963) improved the inversion by specifying \(\mathbf{R}^{T}\mathbf{R}\) to be the diagonal components of \(\mathbf{A}^{T}\mathbf{C}^{-1}\mathbf{A}\), such that the relative weighting of model parameters in the regularization is defined by information in the sensitivity matrix. The Levenberg-Marquardt and damped least-squares methods have been used extensively in linearized inversion for 1D seismic site characterization (e.g., Xia et al. 1999; Xia et al. 2002a; Xia et al. 2003; Forbriger 2003).

Alternatively, regularizations can be defined to represent first- or second-order spatial derivative operators applied to the parameters to minimize model gradients or roughness, respectively (Constable et al. 1987; Aster et al. 2018). These regularizations minimize complex structure, producing simple (flat or smooth) models. For these regularizations, the linearized problem, Eq. 2, is typically recast as

$$ \boldsymbol{\delta} \mathbf{d} + \mathbf{A}\mathbf{m}_{\boldsymbol{0}} = \mathbf{A}\mathbf{m}, $$
(9)

such that the inversion and regularization are formulated for the updated model rather than the model perturbation. Minimum-structure regularized inversions are often used in 2D and 3D engineering-scale problems in site assessment (considered in Sections 6 and 7).

Linearized inversion may fail by diverging, or converging to a local (rather than global) minimum, if the non-linearity of the problem is strong and/or the initial model is poor. Non-linear methods, discussed in the following section, are designed to overcome these problems. Compared to non-linear methods, linearized inversion typically requires fewer forward operations (data predictions) since it exploits misfit gradient information rather than employing directed random searches. Parolai et al. (2006) considered the inversion of surface-wave dispersion to estimate near-surface 1D VS structure using linearization as well as two non-linear methods (DHS and GA, discussed in Section 4). Similarly, Lu et al. (2016) compared linearization to SA. Both studies showed that, given a relatively accurate starting model, linearized inversion performed as well as non-linear methods. Some studies have considered multi-step hybrid inversions that initially apply a non-linear approach (e.g., GA) to estimate a good initial model for a subsequent linearized inversion (e.g., Picozzi and Albarello 2007; Lei et al. 2018). Further comparisons between linear and non-linear inversions of dispersion data for site characterization applications are given in Garofalo et al. (2016).

In linear inverse theory, the quality of the model solution can be assessed through the calculation of resolution and model covariance matrices. However, for linearized inversions, these measures suffer from linearization error, and regularization can preclude meaningful results; they do not appear to be commonly used in seismic site characterization.

4 Non-linear optimization

Linearized inversion methods considered in the previous section are sometimes referred to as local searches since, although they move efficiently downhill based on misfit gradient information, they typically remain close to the starting model and are prone to becoming trapped in local minima. As an alternative, non-linear search (optimization) methods are designed to search the parameter space widely and (ideally) avoid sub-optimal solutions. A variety of non-linear optimization methods have been applied to geophysical inversion for site assessment. These include DHS, the global-search methods SA and GA, and the NA, all of which are described in this section. The goal of these methods is to determine the set of model parameter values that minimizes the data misfit via numerical optimization; i.e., the various methods all solve the same problem, but apply different optimization schemes. The NA has been the most widely used in inversion for 1D seismic site characterization studies due to the availability of a user-friendly software implementation (geopsypack, Wathelet et al. 2020). For this reason, and because the original algorithm has been modified and improved for this application, the NA is reviewed in greater detail in Section 4.3. For a more technical discussion of some of the techniques reviewed in this section (for general applications in geophysics), as well as for further discussion on exploitation vs. exploration of the misfit hyper-surface in geophysical inverse methods, see Sambridge and Mosegaard (2002).

4.1 Downhill simplex

The DHS method (Nelder and Mead 1965) is an optimization method based on a geometric scheme for moving “downhill” in parameter space without calculating partial derivatives. DHS operates on a simplex (convex hull) of M + 1 models in an M-dimensional model parameter space, as illustrated in Fig. 3 for M = 3. Computing the misfit for each model of the simplex provides (limited) information on the local misfit gradient without derivative computations.

Fig. 3

Geometrical steps of DHS illustrated for a 3D parameter space. Initial simplex of 4 models (a). Operations are applied to the model with the highest misfit including reflection (b), reflection plus expansion (c), and contraction (d). If all of these fail, a multiple contraction of M models towards the lowest-misfit model is applied (e)

The simplex undergoes a series of transformations in order to work its way downhill. Each model is ranked according to its misfit. The algorithm initially attempts to improve the model with the highest misfit by reflecting it through the opposite face of the simplex (as shown in Fig. 3b). If this new model has the lowest misfit in the simplex, an expansion by a factor of 2 in the same direction is attempted (Fig. 3c). If the expansion further reduces the misfit, the result is retained; otherwise it is not. If the model obtained by the reflection still has the highest misfit in the simplex, the reflection is rejected and a contraction by a factor of 2 towards the lowest-misfit model is attempted (Fig. 3d). If none of these steps decrease the misfit below the second highest in the simplex, then a multiple contraction by a factor of 2 in all dimensions toward the lowest-misfit model is performed (Fig. 3e). The above series of steps is repeated until convergence is achieved (generally based on the simplex shrinking to a point in model space) or a maximum number of iterations is reached.

The DHS method moves progressively downhill in misfit without relying on linearization, but as it has limited ability to move uphill (except potentially via multiple contractions), it is prone to becoming trapped in local minima in the parameter space. To improve confidence in finding the global minimum, it is recommended to run the procedure several times, initiated from different starting models.
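The DHS (Nelder-Mead) scheme is widely available in numerical libraries; the sketch below shows one way to apply it with multiple random restarts, as recommended above. The misfit function, parameter bounds, and optimizer options are placeholders to be supplied by the user.

```python
import numpy as np
from scipy.optimize import minimize

def multistart_simplex(misfit, lower, upper, n_starts=10, seed=0):
    """Run the downhill simplex (Nelder-Mead) from several random starting
    models within the given bounds and keep the lowest-misfit result."""
    rng = np.random.default_rng(seed)
    best = None
    for _ in range(n_starts):
        m0 = rng.uniform(lower, upper)               # random starting model
        res = minimize(misfit, m0, method="Nelder-Mead",
                       options={"xatol": 1e-4, "fatol": 1e-6, "maxiter": 5000})
        if best is None or res.fun < best.fun:
            best = res
    return best
```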

The DHS algorithm has been widely used in 1D site characterization studies, particularly considering surface-wave dispersion data (e.g., Ohori et al. 2002; Parolai et al. 2006; Zomorodian and Hunaidi 2006). García-Jerez et al. (2016) developed a widely used software package (HV-Inv) that applies the DHS method (among other optional optimization methods) to invert HVSR data for shallow 1D velocity structure. The software also supports the joint inversion of HVSR and dispersion data. Baziw (2002) applied DHS to seismic cone penetration data, which measure seismic body waves rather than surface waves and are typically considered using simplified direct conversions to seismic velocities in soils (as opposed to formal inversions).

As mentioned above, and similar to linearized inversion (Section 3), the DHS method is prone to converging to local minima in the parameter space. DHS has been combined with global search methods to exploit the advantages of each. In 1D site characterization studies considering HVSR and/or surface-wave dispersion data, the DHS method has often been used in combination with an initial global search method (most often SA, discussed in Section 4.2) to reduce the possibility of converging to a local minimum (e.g., Alfaro Castillo 2006; Poovarodom and Plalinyot 2013; García-Jerez et al. 2019; Maklad et al. 2020).

4.2 Global search: simulated annealing and genetic algorithms

Global-search methods are designed to explore the parameter space widely, and explicitly include the ability to move uphill in misfit in order to escape from local minima in search of a better solution. Two widely used global-search methods in geophysics are SA and GA, which are both based on analogies of non-linear optimization processes that exist in nature.

SA is based on an analogy with the natural optimization process of thermodynamic annealing, by which crystals are grown and metals hardened (Van Laarhoven and Aarts 1987). The optimization algorithm consists of a series of iterations involving random perturbations of the unknown model parameters (representing the thermodynamic system) of \(\mathbf {m}\rightarrow \mathbf {m}^{\prime }\), with a resulting change to the data misfit function (analogous to free energy of the system) of \({\varPhi }(\mathbf {m})\rightarrow {\varPhi }(\mathbf {m}^{\prime })\). Perturbations that decrease the misfit are always accepted, while perturbations that increase misfit are sometimes accepted with an acceptance probability given by the Gibbs distribution of statistical mechanics

$$ A(\mathbf{m}^{\prime}|\mathbf{m})= \exp(-\Delta {\varPhi}/T), $$

where \(\Delta {\varPhi }={\varPhi }(\mathbf {m}^{\prime })-{\varPhi }(\mathbf {m})\) represents the increase in misfit due to the perturbation and T is a control parameter (analogous to absolute temperature). According to this rule, perturbations that increase the misfit are accepted with a conditional probability that decreases with increasing ΔΦ and decreasing T. Over the course of many such iterations, the temperature T is gradually reduced from an initial high value (cooling/annealing the system in thermodynamic terms). Accepting some perturbations that increase Φ allows the algorithm to escape from local minima in search of a better solution. At early iterations (high T), the algorithm searches the parameter space in an essentially random manner. As T decreases, accepting increases in Φ becomes increasingly improbable, and the algorithm spends more time searching regions of low Φ, eventually converging to a solution which should approximate the global minimum.

The starting temperature, rate of reducing T, and the number and type of perturbations define the annealing schedule, which controls the efficiency and effectiveness of the algorithm. Adopting an annealing schedule that is too fast, i.e., decreases T too quickly or allows too few perturbations, can lead to sub-optimal solutions. Alternatively, adopting an annealing schedule that is overly cautious wastes computation time. Determining an appropriate annealing schedule is problem specific and generally requires some experimentation and familiarity with the inverse problem. SA can be related to Markov chain Monte Carlo (MCMC) methods (described for Bayesian inversion in Section 5), with the goal of optimization based on non-convergent sampling with decreasing T rather than probability estimation based on sampling to convergence at unit temperature.
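A minimal SA loop implementing the Gibbs acceptance rule with a geometric cooling schedule is sketched below; the perturbation scheme, cooling rate, and other tuning values are illustrative choices only and would need problem-specific adjustment.

```python
import numpy as np

def simulated_annealing(misfit, lower, upper, t0=10.0, cooling=0.999,
                        n_iter=20000, step=0.05, seed=0):
    """Minimal SA: random single-parameter perturbations, Gibbs acceptance
    rule exp(-dPhi/T) for uphill moves, geometric cooling T <- cooling * T."""
    rng = np.random.default_rng(seed)
    lower, upper = np.asarray(lower, float), np.asarray(upper, float)
    m = rng.uniform(lower, upper)
    phi = misfit(m)
    m_best, phi_best, T = m.copy(), phi, t0
    for _ in range(n_iter):
        # perturb one randomly chosen parameter, scaled to its allowed range
        m_new = m.copy()
        j = rng.integers(m.size)
        m_new[j] = np.clip(m[j] + step * (upper[j] - lower[j]) *
                           rng.standard_normal(), lower[j], upper[j])
        phi_new = misfit(m_new)
        d_phi = phi_new - phi
        if d_phi <= 0 or rng.random() < np.exp(-d_phi / T):
            m, phi = m_new, phi_new           # accept the perturbation
            if phi < phi_best:
                m_best, phi_best = m.copy(), phi
        T *= cooling                          # annealing schedule
    return m_best, phi_best
```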

SA has been used extensively in near-surface seismic studies, considering a wide range of seismic data types including surface-wave dispersion (e.g., Beaty et al. 2002; Yamanaka 2005; Pei et al. 2007; Lu et al. 2016), surface wave velocity spectra (e.g., Ryden and Park 2006), HVSR (e.g., García-Jerez et al. 2016), spectral ratios of direct S-waves from earthquakes (e.g., Dutta et al. 2009), and full seismic wavefields (e.g., Tran and Hiltunen 2012).

GA is based on an analogy to biological evolution according to the concept of “survival of the fittest.” GA simulates the genetic evolution of a collection (population) of models through many iterations (generations) to minimize their misfit. This is analogous to maximizing the population’s fitness to a specific ecological niche (Fogel et al. 1966; Holland et al. 1992; Sambridge and Mosegaard 2002). Each generation of models acts as parents for the next generation of (offspring) models through processes designed to mimic selection (pairing of parent models), crossover (recombination of parent genetic information in offspring), and mutation (random variations) in a manner that probabilistically favors models with lower misfits. To facilitate these processes, each model is normally encoded as a string of parameters represented in binary (base 2), with a pre-defined number of bits representing each gene (thereby discretizing the parameter space). A large variety of approaches to the selection, crossover, and mutation steps exist, which will not be discussed here. Like the annealing schedule for SA, these GA steps (and the tradeoff between their efficiency and robustness) are specific to the problem. GA has been widely used in near-surface seismic studies that invert surface-wave dispersion data (e.g., Yamanaka and Ishida 1996; Yamanaka 2005; Parolai et al. 2006), HVSR data (e.g., Fäh et al. 2001; Fäh et al. 2003), or both (e.g., Parolai et al. 2005).
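A minimal binary-coded GA is sketched below to make the encoding and evolutionary steps concrete; tournament selection, single-point crossover, and the specific rates used are illustrative choices among the many possible variants mentioned above.

```python
import numpy as np

def binary_ga(misfit, lower, upper, n_bits=10, pop_size=30, n_gen=100,
              p_cross=0.8, p_mut=0.01, seed=0):
    """Minimal binary-coded GA: each parameter is discretized onto 2**n_bits
    values; tournament selection, single-point crossover, bit-flip mutation."""
    rng = np.random.default_rng(seed)
    lower, upper = np.asarray(lower, float), np.asarray(upper, float)
    n_par, n_total = len(lower), len(lower) * n_bits
    weights = 2 ** np.arange(n_bits)[::-1]

    def decode(bits):
        """Map a bit string back to physical parameter values."""
        vals = np.empty(n_par)
        for j in range(n_par):
            word = bits[j * n_bits:(j + 1) * n_bits]
            x = word.dot(weights) / (2 ** n_bits - 1)      # normalized to [0, 1]
            vals[j] = lower[j] + x * (upper[j] - lower[j])
        return vals

    pop = rng.integers(0, 2, size=(pop_size, n_total))
    fit = np.array([misfit(decode(ind)) for ind in pop])
    for _ in range(n_gen):
        children = []
        while len(children) < pop_size:
            # tournament selection: the lower-misfit model wins each pairing
            i1, i2, i3, i4 = rng.integers(pop_size, size=4)
            p1 = pop[i1] if fit[i1] < fit[i2] else pop[i2]
            p2 = pop[i3] if fit[i3] < fit[i4] else pop[i4]
            c1, c2 = p1.copy(), p2.copy()
            if rng.random() < p_cross:                     # single-point crossover
                cut = rng.integers(1, n_total)
                c1[cut:], c2[cut:] = p2[cut:].copy(), p1[cut:].copy()
            for c in (c1, c2):                             # bit-flip mutation
                flips = rng.random(n_total) < p_mut
                c[flips] = 1 - c[flips]
                children.append(c)
        pop = np.array(children[:pop_size])
        fit = np.array([misfit(decode(ind)) for ind in pop])
    best = int(np.argmin(fit))
    return decode(pop[best]), fit[best]
```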

Global-search algorithms generally require subjective choices of tuning parameters that control algorithm performance (e.g., the annealing schedule in SA and the evolutionary steps in GA). These parameters are typically problem and data specific, making it challenging to know if the algorithm is properly tuned. Consequently, it is challenging to make rigorous and objective comparisons between techniques in terms of algorithm efficiency. In any case, Yamanaka (2005) compared several non-linear global-search techniques, including SA and GA, for inverting surface-wave dispersion data to estimate shallow VS structure. They found that both techniques produced comparable results. In a similarly motivated study, Garofalo et al. (2016) compared results from several global-search and local-search techniques (including NA, SA, GA, and linearized methods) for inverting surface-wave dispersion to estimate shallow VS structure. They found these methods all recovered similar velocity profiles over the depth range of data sensitivity. As mentioned in the previous section, Parolai et al. (2006) considered the inversion of surface-wave dispersion to estimate near-surface 1D VS structure at a site near Cologne, Germany, using linearized inversion as well as DHS and GA. They processed ambient seismic noise recorded on a 2D (100 m by 150 m L-shaped) array of 11 instruments to calculate Rayleigh-wave dispersion at 51 frequencies between 1 and 5 Hz (Fig. 4a). In their implementation of GA, genetic operations were applied to a population of 30 individual models and the inversion was repeated numerous times with different random initial populations. The final optimal VS profiles obtained from the linearized, DHS, and GA inversions are shown in Fig. 4b, with the associated predicted dispersion curves shown in Fig. 4a. The recovered VS profiles from the three inversion methods are very similar, and produce nearly identical predicted dispersion data. Furthermore, Parolai et al. (2006) showed that the estimated site responses (calculated as the amplification spectra for vertically propagating SH waves) of the VS profiles obtained from the various inversion strategies were consistent. These studies suggest that the inversion strategies are comparable in terms of recovering an optimal model (assuming a reasonable starting model for linearized inversion or DHS) when inverting high-quality surface-wave data for 1D structure. However, this assumes all other aspects of the inverse problem (i.e., forward physics, data error assumptions, model parameterization, etc.) are correct.

Fig. 4

Examples of linearized, DHS, and GA inversions of surface-wave dispersion data. Measured phase-velocity dispersion data from ambient seismic noise recordings over a site near Cologne, Germany (a). Also shown in (a) are the predicted dispersion data for the final inversion solutions, which are nearly identical. The grey bounds around the measured data are estimated relative data errors. The final VS profiles obtained from the linearized, DHS, and GA inversions are shown in (b). Modified from Parolai et al. (2006)

The greatest advantage of global-search algorithms is that they are relatively insensitive to the starting/initial model, unlike linearized inversions. No global-search algorithm is guaranteed to converge to the global minimum in misfit within a finite number of steps, although they can be much more effective than linearization (at increased computational cost). Furthermore, there is no general approach to determine whether the solution obtained actually represents the global minimum, although confidence is increased if repeated runs of the algorithm (with different random initializations) produce similar results. As mentioned previously, several studies in site characterization have adopted global-search techniques (such as SA and GA) as a heuristic approach to determining a suitable initial model for a subsequent local search (e.g., linearized or DHS) inversion (e.g., Alfaro Castillo 2006; Picozzi and Albarello 2007; Poovarodom and Plalinyot 2013; Lei et al. 2018; García-Jerez et al. 2019; Maklad et al. 2020). This provides greater assurance of a suitable initial model for local-search success, but, again, does not guarantee the global minimum-misfit solution.

4.3 Neighborhood algorithm

The NA is a popular optimization technique proposed by Sambridge (1999) with numerous applications in a wide range of fields, and in particular to the inversion of surface-wave properties for seismic site assessment (Wathelet et al. 2004).

Like SA and GA, the NA is based on a random search of the multi-dimensional parameter space for the minimum-misfit model. This is illustrated in Fig. 5 for a 2D optimization of the Rastrigin function (Fig. 5a), a non-convex function with multiple local minima often used as a performance test for optimization algorithms. The NA is initiated with a population of random models distributed over the parameter space (coloured circles in Fig. 5b), and tries to orient the random generation of subsequent new models towards regions of the space likely to provide the lowest misfit. This is achieved by forming a neighborhood approximation to the misfit surface. The parameter space is divided into Voronoi cells built from the model locations in the current population (generator points). Any location inside a Voronoi cell is closer to its generator point than to any other model in the population. A constant misfit is assigned to each cell, equal to the misfit of its generating point, leading to a nearest-neighbor interpolation of the misfit function. For example, the Voronoi cell geometry associated with the initial population (20 models represented by coloured circles) is shown in Fig. 5b. The algorithm progresses by randomly generating Ns new models inside only the Nr Voronoi cells with the lowest misfits. These are represented by open circles in Fig. 5b for Ns = 10 and Nr = 10. The data misfits and Voronoi geometry are updated to include these new models, as shown in Fig. 5c, where the coloured circles indicate all models (initial and new). Another set of Ns new models is generated within the Nr lowest-misfit cells (open circles in Fig. 5c), and new Voronoi cells are computed (Fig. 5d). The same process is repeated until convergence to an optimal solution. As described, the NA has only two tuning parameters, Ns and Nr, that control the algorithm behavior between exploration and optimization. Even if it is temporarily trapped in a local minimum, the NA can quickly evolve to other areas of the parameter space, as demonstrated by Sambridge (1999).
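To illustrate the neighborhood idea, the sketch below reproduces the exploration loop of Fig. 5 in a highly simplified form: new models are drawn uniformly and kept only if their nearest existing model (i.e., their Voronoi generator) is among the Nr lowest-misfit models. This crude rejection scheme is only a teaching device; the actual NA generates samples within the selected cells via an efficient Gibbs-sampler walk (Sambridge 1999), and the settings used here are illustrative.

```python
import numpy as np

def rastrigin(m):
    # 2D Rastrigin test function, standing in for a data-misfit function
    return 10.0 * m.size + np.sum(m**2 - 10.0 * np.cos(2.0 * np.pi * m))

rng = np.random.default_rng(0)
lo, hi = -5.12, 5.12
n_init, n_s, n_r, n_iter = 20, 10, 10, 30

pop = rng.uniform(lo, hi, size=(n_init, 2))          # initial random population
mis = np.array([rastrigin(m) for m in pop])

for _ in range(n_iter):
    best = {int(i) for i in np.argsort(mis)[:n_r]}   # n_r lowest-misfit cells
    new = []
    while len(new) < n_s:
        cand = rng.uniform(lo, hi, size=2)
        nearest = int(np.argmin(np.sum((pop - cand) ** 2, axis=1)))
        if nearest in best:                          # candidate lies in a "good" Voronoi cell
            new.append(cand)
    new = np.array(new)
    pop = np.vstack([pop, new])
    mis = np.concatenate([mis, [rastrigin(m) for m in new]])

print("best model:", pop[np.argmin(mis)], "misfit:", mis.min())
```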

Fig. 5

Illustration of the NA for a 2D parameter space corresponding to the Rastrigin function. (a) A dense sampling of the Rastrigin function showing four minima. (b) 20 random models of an initial population (coloured circles) and the corresponding Voronoi geometry (solid lines). Ns = 10 new models (open circles in b) are generated randomly inside the Nr = 10 cells with the lowest misfit. (c) The new models are integrated into the Voronoi geometry and their misfits are computed, leading to a total population of 30 models (coloured circles). (d) This process is repeated to obtain 40 models

The NA implementation for surface-wave inversion proposed by Wathelet et al. (2004) was built around the original Fortran code provided by Sambridge (1999). Di Giulio et al. (2006) published one of the first applications of this inversion method to multiple ambient vibration (microtremor) arrays. The NA core was subsequently re-written in C++ by Wathelet (2008) to provide several improvements detailed below. This latter implementation has been widely used since 2008, with only minor modifications. Renalier et al. (2010) analyzed data from passive and active arrays at 10 documented sites in Europe to propose a parameterization strategy based on several repeated inversion trials with an increasing number of layers. With a database of 14 strong-motion sites in Europe, Di Giulio et al. (2012) addressed the parameterization issue following Akaike’s information criterion. Such quantitative approaches to parameterization based on formal information criteria are discussed further in the next section. Cox and Teague (2016) promoted parameterizations controlled by only a few tuning parameters: the number of layers and a layering ratio for VP and VS. This approach was further refined by Vantassel and Cox (2021). The original forward code (and inversion) focused on the phase-velocity dispersion curves of Rayleigh waves. However, Love waves, group velocities (Roux et al. 2011), and Rayleigh-wave ellipticity (Hobiger et al. 2013) have also been considered.

Rickwood and Sambridge (2006) proposed an improved algorithm dedicated to parallel computing architectures that is particularly efficient when the computational cost of the forward problem is small. An open-source application dinver (distributed in geopsypack, Wathelet et al. 2020) implements a modified NA with a similar parallel structure. Unlike the algorithm proposed by Rickwood and Sambridge (2006), dinver was developed to minimize computer memory usage, which prevents distribution over multiple computing nodes (limiting the total number of CPUs that can be used simultaneously). Developed primarily for the inversion of dispersion curves and associated observables (autocorrelation and Rayleigh-wave ellipticity curves), dinver runs on a desktop computer with 4 or 8 cores in a reasonable computation time (a few minutes). Several other improvements of the original NA are implemented in dinver that are detailed below. Note that many of these improvements may also be applicable to other global-search techniques.

Wathelet (2008) noticed that, since the Voronoi geometry is not invariant to axis scales, the NA primarily explores the parameters to which the data are most sensitive (i.e., VS of near-surface layers) and tends to neglect the variability of other parameters (VP, or VS of deeper layers). With a lack of exploration of the deepest layers, the obtained population of models may suggest better resolution than is warranted beyond the usual wavelength (λ) rule of thumb: a maximum resolution depth between λ/3 and λ/2 (Cox and Teague 2016). To overcome this, Wathelet (2008) implemented dynamic parameter axis scaling that keeps the region of interest the same size in all dimensions. Wathelet (2008) also implemented parameter conditions such as constraining Poisson’s ratio (e.g., from 0.25 to 0.5), avoiding multiple low-velocity zones, parameterizing interface depths or layer thicknesses, and limiting layer thicknesses to a minimum percentage of the total depth (e.g., 5%). Furthermore, Wathelet (2008) showed the advantage of parameterizing interface depths instead of layer thicknesses in Monte Carlo inversions of surface-wave data, as this avoids uncontrolled prior information. Specifically, for a model defined by a set of layers with randomly generated thicknesses, the depth distribution of the deepest interfaces tends to a normal distribution (as expected from the Central Limit Theorem). Consequently, this parameterization can introduce structure that is not supported by the data. In the NA, the generation of new models that fulfill all parameter conditions requires that the current population of models also fulfill these conditions. For very small Voronoi cells, numerical precision errors can violate this assumption, leading to unpredictable results. This issue is solved by using discrete parameters (i.e., parameters can only take a predetermined discrete set of values), even though these geophysical parameters are physically continuous.
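The implicit prior introduced by a thickness parameterization is easy to check numerically. The short sketch below (an illustration with arbitrary values, not taken from any cited study) draws uniform layer thicknesses and shows that the depth of the deepest interface, being a sum of random thicknesses, concentrates around its mean as the Central Limit Theorem predicts.

```python
import numpy as np

# Illustrative check: with 6 layers of independent uniform thicknesses, the depth
# of the deepest interface (a sum of thicknesses) clusters around its mean and
# approaches a normal distribution -- prior structure that a direct interface-depth
# parameterization avoids. Values are arbitrary.
rng = np.random.default_rng(0)
thick = rng.uniform(1.0, 20.0, size=(100000, 6))            # layer thicknesses (m)
deepest = thick.sum(axis=1)                                  # depth of deepest interface (m)
print("mean %.1f m, std %.1f m, 5-95%% range %.1f-%.1f m"
      % (deepest.mean(), deepest.std(),
         np.percentile(deepest, 5), np.percentile(deepest, 95)))
```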

In the original NA, the sampling density around the best models is directly influenced by the total number of generated models, which is a subjective tuning parameter. With parameter discretization, there is a minimum distance between models that limits the sampling density. Once this density limit is reached in the regions currently being sampled, the algorithm is able to explore different regions of the parameter space. Thus, the total number of models in the population controls only the exploration level, not the sampling density. This has the effect of improving overall exploration of the parameter space. For instance, in Fig. 5d, none of the Nr cells with the lowest misfit are located near the local minimum of the Rastrigin function in the upper right corner. This region of the parameter space is not explored unless there is a mechanism to control the sampling density. Figure 6 compares parameter exploration for continuous and discrete parameterizations distributed on a logarithmic scale (other scale distributions can also be considered). The comparison is based on the inversion of a synthetic dispersion curve between 2 and 20 Hz computed for the earth model given in Table 1. The inversion is run five times with the same 11-parameter model (4 layers). VS is allowed to vary from 50 to 3500 m/s, and interface depths from 1 to 100 m for all layers. In Fig. 6a, the parameters are distributed continuously, while in Fig. 6b parameters are discrete. The figure shows improved relative exploration of the parameter space for discrete parameters.
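A discrete, logarithmically scaled parameter grid of the kind described above can be built in a few lines; the bounds and the 1% relative spacing below are illustrative, loosely mirroring the VS range used in this example.

```python
import numpy as np

# Discrete VS values between 50 and 3500 m/s with ~1% relative spacing (illustrative).
vs_min, vs_max, rel_step = 50.0, 3500.0, 0.01
n_values = int(np.ceil(np.log(vs_max / vs_min) / np.log(1.0 + rel_step))) + 1
vs_grid = vs_min * (1.0 + rel_step) ** np.arange(n_values)   # admissible VS values (m/s)
print(len(vs_grid), vs_grid[:3], vs_grid[-1])                # ~428 values from 50 to ~3500 m/s
```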

Fig. 6
figure 6

The effect of parameterization on NA exploration. (a) Pseudo-continuous parameters distributed on a logarithmic scale (with a relative parameter precision of \(10^{-4}\)%). (b) Discrete parameters distributed on a logarithmic scale (relative precision 1%). In both cases, an 11-dimensional model parameter space (4 layers with velocities VSi, VPi for i = 0–3 and interface depths Di for i = 0–2) is explored with \(10^{5}\) models. Plots show the relative range of sampled values in the final 50,000 generated models, normalized by the range sampled over the entire NA run, for five distinct inversion runs (gray lines connect points from the same run)

Table 1 Synthetic earth model used in Figs. 6 and 7

Figure 7 compares linear and logarithmic scalings for discrete parameters based on five repeated inversions of each for the same 11-parameter example. Figure 7a shows that the five NA inversion runs with linear parameter scaling are trapped in a local minimum, which is not the case for the inversions with logarithmic parameter scaling. For interface-depth parameters, the linear scale provides a higher probability of generating deep layers; conversely, the logarithmic scale provides a higher probability for shallow layers. Figure 7b and d show that VS structure below \(\sim \)30 m depth is not resolvable by the dispersion data. In Fig. 7b, all layer interfaces are found below \(\sim \)20 m depth, where the sensitivity of the dispersion data to VS starts to diminish. Therefore, the variability (non-uniqueness) associated with the very shallow part is better explored in the logarithmic case. The conditions designed to avoid very thin layers, the data constraints, and a reasonable minimum depth imposed by the parameterization (1 m in this case, for a minimum λ of 10 m) prevent all layers from aggregating in the shallow part.

Fig. 7
figure 7

Comparison of linear and logarithmic discrete parameter scalings. (a) Misfit reduction versus the number of generated models for five runs (with distinct random initializations) for linear scaling (solid lines) and logarithmic scaling (dashed lines). (b) and (d) VS profiles obtained with the five linear- and logarithmic-scaling inversion runs, respectively. The inset figures in (b) and (d) focus on the shallow part with velocity less than 300 m/s. (c) and (e) Corresponding predicted dispersion curves. The number of discrete parameter values is the same for both scales: 426, 323 and 452 for VSi, VPi and Di, respectively. The same misfit colour scale is used throughout. Models with the lowest misfits are plotted on top of other models, hiding most of the models with high misfits. The black curve in (c) and (e) is the observed dispersion curve

The selection of the Nr best models, for which the corresponding neighborhoods are sampled, is based on the calculated misfit value. For most problems (as discussed in Section 2), this is defined by the L2 misfit normalized by the observed data uncertainties and by the number of data (Wathelet et al. 2004). Hence, a misfit value of unity indicates that the predicted data fit the observed data to (on average) one standard deviation. The NA can still be strongly influenced by small details in the observed data, and attempts to fit small-scale scatter (even if it is assigned large standard deviations). Lomax and Snieder (1994) introduced the concept of an acceptable misfit level for GA inversions that implicitly considers only first-order details of the misfit function. Their inversion process was aimed at building as large an ensemble as possible of models that fit the observed data to a reasonable level. This represents a simplified view of data errors that transforms a Gaussian error distribution into a uniform distribution. This idea was also tested with the NA by Sambridge (2001).
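The normalized misfit described above can be written compactly as below; this is a sketch of the general form implied by the text, and details (weighting, frequency sampling) differ between specific codes.

```python
import numpy as np

def dispersion_misfit(d_obs, d_pred, sigma):
    """Normalized L2 misfit: a value of ~1 means the predictions fit the observations
    to about one standard deviation on average (generic form, not a specific code)."""
    r = (d_obs - d_pred) / sigma
    return np.sqrt(np.mean(r**2))

# toy example: predictions within roughly one sigma of the observations
d_obs  = np.array([350.0, 320.0, 280.0, 240.0])   # observed phase velocities (m/s)
sigma  = np.array([ 15.0,  12.0,  10.0,  10.0])   # observed standard deviations (m/s)
d_pred = np.array([360.0, 310.0, 285.0, 245.0])   # predicted phase velocities (m/s)
print(dispersion_misfit(d_obs, d_pred, sigma))     # < 1, i.e., an acceptable fit
```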

Hollender et al. (2018) and Chmiel et al. (2021) used a flattened misfit at a smaller scale (for each frequency sample), as implemented in dinver. Compared to a usual misfit threshold value, this is a more demanding definition of an acceptable model, as it requires all predicted data to be within one standard deviation of the observed data. However, as noted by Sambridge (2001), the algorithm loses part of the distance information contained in the misfit, which can negatively affect convergence in high-dimensional parameter spaces. Generating models with identical misfit values has practical consequences for the way the NA behaves. Once all Nr Voronoi cells have an equal misfit value, all new models with the same misfit must also be included in the area of interest. Hence, dinver implements a dynamic Nr that increases each time a new good model (with identical misfit value) is found. This results in relatively uniform sampling (rather than oversampling) of the acceptable region of the parameter space. However, the population of models obtained does not necessarily reproduce the data uncertainty distribution. Interestingly, joint inversions of distinct observables such as dispersion data and HVSR result in an ensemble of models that are acceptable (i.e., within one standard deviation) with respect to both data types. This avoids the use of subjective weighting for each misfit component (i.e., data type), which is usually problem specific.

5 Bayesian inversion for probabilistic site characterization

Bayesian inference approaches to geophysical inversion are based on quantifying the PPD of earth models given observed data and prior information. Bayesian methods were first applied to active-source dispersion data by Schevenels et al. (2008) and Socco and Boiero (2008), and to ambient-noise (microtremor) dispersion by Foti et al. (2009). Furthermore, Cipta et al. (2018) applied a Bayesian inversion method to HVSR data. However, this section concentrates on a series of studies (Molnar et al. 2010; 2013; Dettmer et al. 2012; Gosselin et al. 2017; Gosselin et al. 2018) that developed a rigorous and quantitative overall approach, and which considered common microtremor dispersion data sets (with colocated invasive measurements) for direct comparison and evaluation.

To quantify model probability over the multi-dimensional parameter space, Bayesian inversions generally seek to compute statistical properties of the PPD. PPD properties of interest include parameter estimates, such as most-probable and mean values. More significantly, Bayesian inversion can quantify parameter uncertainties, which can be expressed as variances/covariances, credibility intervals, and (most notably) marginal probability densities. Since rigorous uncertainty analysis, rather than point estimation, is central to Bayesian inference, non-linear inversion and rigorous model selection assume greater significance. In the common case where appropriate earth and error models are not known a priori, they can be estimated from the data, as part of the inverse problem.

To avoid linearization errors in Bayesian inversion, non-linear parameter and uncertainty estimation is carried out numerically, typically employing Markov-chain Monte Carlo (MCMC) methods, as discussed in Section 5.1. MCMC draws random samples of the parameters from the PPD, such that statistical parameter/uncertainty estimates can be computed from the ensemble of samples. Efficient sampling strategies are key for practical inversion algorithms.

In terms of model selection, determining an appropriate earth-model parameterization is an important aspect of quantitative inversion. Over-parameterizing the model under-constrains parameters and can lead to spurious structure and to over-estimating parameter uncertainties. Conversely, under-parameterization can leave structure unresolved, biasing parameter estimates and under-estimating uncertainties. An objective approach to model selection is to choose the simplest parameterization that explains the data as quantified by the Bayesian information criterion (BIC), which is discussed in Section 5.2. Another approach is to marginalize over a set of possible parameterizations. Since this involves probabilistic sampling over models with different numbers of parameters (model dimensions), this approach is referred to as trans-dimensional (trans-D) inversion. Trans-D inversion, described in Section 5.3, has the advantage that the uncertainty in the parameterization is included in the parameter uncertainty estimates. Both of these model-selection approaches avoid subjective regularizations in the inversion, which can preclude meaningful uncertainty estimation.

Defining the data error model is another important component of rigorous uncertainty estimation. In many practical cases, the error distribution, including both measurement and theory errors, is not well known. The lack of specific information suggests that a simple distribution be assumed, with statistical parameters estimated from the data. Error models considered here are based on the assumption of Gaussian-distributed errors of unknown covariance, with the covariance matrix either estimated from data residuals or represented by an autoregressive (AR) process, as considered in Section 5.4.

In seismic site assessment, a number of studies have applied Bayesian methods to the inversion of Rayleigh-wave dispersion data derived from ambient seismic noise recorded at a small geophone array. The primary goal is to estimate the VS profile (VP and ρ profiles are also estimated in the inversion, but are of less significance and not considered here).

An important advantage of the Bayesian approach is that it is straightforward to propagate the uncertainty analysis for VS directly into uncertainties in site assessment factors, which represent the ultimate goal of the work. This cannot be accomplished with other approaches reviewed in this paper. Assessment factors of interest include VS30, which is used to categorize sites into classes according to the National Earthquake Hazards Reduction Program (NEHRP). VS30 is also used to predict peak ground velocity (PGV) and peak ground acceleration (PGA) amplification factors (relative to hard ground). However, amplification spectra calculated using the full VS profile generally provide better indicators of site amplification than VS30-based measures (Boore and Atkinson 2008). In Section 5.5, the inversion methods described in Sections 5.1–5.4 are illustrated for dispersion data from two distinct geological settings (Molnar et al. 2010), with comparisons to invasive measurements and calculation of probabilistic site assessments (Molnar et al. 2013).

5.1 Non-linear inversion: MCMC sampling

To briefly describe non-linear Bayesian inversion, let \({\mathscr{M}}\) represent the choice of model, with m the corresponding set of M unknown model parameters, and let d be N observed data. Assuming data and parameters to be random variables, they are related by Bayes’ theorem

$$ P(\mathbf{m}|\mathbf{d},\mathcal{M})= \frac{P(\mathbf{m}|\mathcal{M}) P(\mathbf{d}|\mathbf{m},\mathcal{M})}{P(\mathbf{d}|\mathcal{M})}. $$
(10)

In Eq. 10, \(P(\mathbf {m}|{\mathscr{M}})\) is the prior probability, representing available information for the model parameters (given a choice of model), independent of the data. \(P(\mathbf {d}|\mathbf {m},{\mathscr{M}})\), the conditional probability of d given m and \({\mathscr{M}}\), defines the data information. Interpreted as a function of d, this represents the residual error distribution. However, when d is considered fixed (once data are observed), the term is interpreted as the likelihood of the parameters, \({ L}(\mathbf {m}|{\mathscr{M}})\). The normalization term, \(P(\mathbf {d}|{\mathscr{M}})\), is referred to as Bayesian evidence. \(P(\mathbf {m}|\mathbf {d},{\mathscr{M}})\) defines the PPD, representing the state of parameter information given the data and prior. Henceforth, the dependence on model \({\mathscr{M}}\) is suppressed for simplicity when not required.

Non-linear Bayesian inversion is typically based on using MCMC methods to draw (dependent) random samples from the PPD while satisfying the requirement for reversibility of the Markov chain (Gilks et al. 1996; Sambridge and Mosegaard 2002). In particular, Metropolis-Hastings sampling constructs a Markov chain by applying a proposal density function \(Q(\mathbf {m}^{\prime }|\mathbf {m})\) to generate new parameters \(\mathbf {m}^{\prime }\) based only on the current values m, and accepting the proposed parameters as the next step in the chain with probability

$$ A(\mathbf{m}^{\prime}|\mathbf{m}) = \min \left[1, \frac{Q(\mathbf{m}|\mathbf{m}^{\prime})} {Q(\mathbf{m}^{\prime}|\mathbf{m})} \frac{P(\mathbf{m}^{\prime})} {P(\mathbf{m})} \frac{{ L}(\mathbf{m}^{\prime})} {{ L}(\mathbf{m})} \right]. $$
(11)

The acceptance criterion is applied by drawing a uniform random number ξ on [0,1] and accepting the new parameters if \(\xi \le A(\mathbf {m}^{\prime }|\mathbf {m})\). If the proposed step is rejected, another copy of the current model is included in the Markov chain. Once convergent sampling is achieved, parameter estimates and uncertainties can be computed statistically from the ensemble of samples (omitting an initial burn-in stage while stationary sampling is established).
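The Metropolis-Hastings recipe of Eq. 11 reduces to a few lines when the proposal is symmetric (so the Q ratio cancels). The sketch below samples a toy two-parameter posterior; it is a generic illustration, not the principal-component proposal used in the cited studies, and all names and values are assumptions.

```python
import numpy as np

def metropolis_hastings(log_prior, log_like, m0, step, n_steps, seed=0):
    """Minimal Metropolis-Hastings sampler with a symmetric Gaussian proposal.
    Returns the full chain, including repeated models after rejected steps."""
    rng = np.random.default_rng(seed)
    m = np.asarray(m0, dtype=float)
    lp, ll = log_prior(m), log_like(m)
    chain = [m.copy()]
    for _ in range(n_steps):
        m_new = m + step * rng.standard_normal(m.size)         # proposal m' ~ Q(m'|m)
        lp_new, ll_new = log_prior(m_new), log_like(m_new)
        # acceptance probability A(m'|m) = min(1, prior ratio x likelihood ratio)
        if np.log(rng.random()) <= (lp_new - lp) + (ll_new - ll):
            m, lp, ll = m_new, lp_new, ll_new
        chain.append(m.copy())                                  # repeat current model if rejected
    return np.array(chain)

# toy example: a 2D standard-normal "posterior" with a uniform prior on [-10, 10]^2
log_prior = lambda m: 0.0 if np.all(np.abs(m) < 10) else -np.inf
log_like  = lambda m: -0.5 * np.sum(m**2)
samples = metropolis_hastings(log_prior, log_like, [5.0, -5.0], 0.8, 20000)
print(samples[2000:].mean(axis=0), samples[2000:].std(axis=0))  # ~[0, 0] and ~[1, 1] after burn-in
```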

The choice of proposal density \(Q(\mathbf {m}^{\prime }|\mathbf {m})\) controls the efficiency of MCMC sampling. The goal is to achieve a well-mixed Markov chain that efficiently samples the parameter space, avoiding both small, ineffectual perturbations and high rejection rates. In Bayesian microtremor array inversion, Molnar et al. (2010) and Molnar et al. (2013) applied an efficient proposal density based on principal-component (PC) decomposition of the parameter covariance matrix (Dosso and Wilmut 2008). The PC decomposition provides both directions and length scales for effective parameter updates. Perturbations are applied in a rotated parameter space where the axes align with the dominant correlation directions (i.e., PC parameters are uncorrelated). The PC proposal is initiated from an analytic linearized estimate that is subsequently updated with a non-linear estimate from the on-going sampling (a diminishing adaptation). To achieve wide sampling over a potentially multi-modal parameter space, Gosselin et al. (2017) and Gosselin et al. (2018) applied the method of parallel tempering (Earl and Deem 2005; Dosso et al. 2012), which is based on a series of interacting Markov chains with successively relaxed likelihoods raised to powers 1/T ≤ 1, where T is referred to as the sampling temperature (similar to the temperature parameter in SA, discussed in Section 4.2). While only samples collected at T = 1 are unbiased and retained for analysis, probabilistic interchange between chains provides a robust and efficient sampling of the parameter space.
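For reference, the probabilistic interchange in parallel tempering can be expressed through a simple swap-acceptance rule between two tempered chains. The function below is a minimal sketch of that rule only (chain bookkeeping, proposals, and the temperature ladder are omitted); names and values are illustrative.

```python
import numpy as np

def swap_accept_prob(loglike_i, loglike_j, T_i, T_j):
    """Probability of exchanging the current models of two chains whose likelihoods
    are tempered by 1/T_i and 1/T_j; only the T = 1 chain is kept for analysis."""
    return min(1.0, np.exp((1.0 / T_i - 1.0 / T_j) * (loglike_j - loglike_i)))

# a hotter chain (T = 4) holding a better-fitting model is readily swapped down to T = 1
print(swap_accept_prob(-120.0, -100.0, 1.0, 4.0))
```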

5.2 Model selection: Bayesian information criterion

As mentioned earlier, determining an appropriate model parameterization is an important aspect of Bayesian inference. In Eq. 10, the Bayesian evidence \(P(\mathbf {d}|{\mathscr{M}})\), which formally represents the conditional probability of the data for a particular model, can be considered the likelihood of the model given the data. Hence, a natural approach to model selection is to seek the model that maximizes the evidence. Unfortunately, the integral defining evidence is particularly challenging to evaluate numerically to sufficient precision (Chib 1995). Commonly, an asymptotic point estimate of evidence, the BIC, is applied, defined by (Schwarz 1978)

$$ \begin{array}{ll} \text{BIC}(\mathcal{M}) & \approx -2\log_{e} P(\mathbf{d}|\mathcal{M})\\ & = - 2\log_{e} { L}(\hat{\mathbf{m}}|\mathcal{M}) + M\log_{e} N, \end{array} $$
(12)

where \(\hat {\mathbf {m}}\) is the maximum-likelihood parameter estimate. Since the BIC approximates the negative logarithm of the evidence (scaled by a factor of two), the model with the smallest BIC over a set of possible models is selected as the most appropriate choice. The first term on the right of Eq. 12 favors models with high likelihood (low data misfit); however, this is balanced by the second term which applies a penalty for additional parameters. Hence, minimizing the BIC provides the simplest model parameterization consistent with the resolving power of the data.
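Equation 12 is simple to evaluate once the maximum-likelihood fit of each candidate parameterization is available. The snippet below compares two hypothetical parameterizations with invented likelihood values, purely to illustrate how the penalty term can reject additional layers.

```python
import numpy as np

def bic(max_log_like, n_params, n_data):
    """Bayesian information criterion of Eq. 12 (natural logarithms)."""
    return -2.0 * max_log_like + n_params * np.log(n_data)

# hypothetical comparison: 3-layer and 4-layer models fitting N = 40 dispersion data
print(bic(max_log_like=-21.0, n_params=7, n_data=40))   # 3 layers: lower BIC, selected
print(bic(max_log_like=-19.5, n_params=9, n_data=40))   # 4 layers: higher BIC despite better fit
```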

In Bayesian inversion for site assessment, Molnar et al. (2010) and Molnar et al. (2013) used the BIC to choose between models based on several fixed functional forms, including layered profiles and linear or power-law gradients (in all cases, layer thicknesses were unknowns included in the parameterization, as were the properties of an underlying semi-infinite basement). Gosselin et al. (2017) and Gosselin et al. (2018) considered depth-dependent model parameterizations in terms of Bernstein polynomials (BP) (Farouki and Rajan 1987; Quijano et al. 2016), which provide general (smooth) gradient forms. In this approach, a geophysical profile is represented as a Bth-order BP in terms of B + 1 basis functions, with the corresponding coefficients (weights) comprising the unknown parameters, as well as gradient-layer thickness and basement properties. The BIC was applied to determine the polynomial order. While other sets of basis functions could be considered, BPs have the desirable property of optimal stability with regard to coefficient perturbations (as applied in MCMC sampling).
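A BP profile of this kind can be evaluated directly from its coefficients. The sketch below is a generic illustration of the basis-function sum; the coefficient values and the linear depth normalization are assumptions, not taken from the cited studies.

```python
import numpy as np
from math import comb

def bernstein_profile(coeffs, depth, max_depth):
    """VS profile represented by a Bernstein polynomial of order B = len(coeffs) - 1.
    The coefficients (weights) are the unknowns in the inversion (illustrative sketch)."""
    B = len(coeffs) - 1
    x = np.asarray(depth) / max_depth                                   # normalized depth on [0, 1]
    basis = np.array([comb(B, b) * x**b * (1 - x)**(B - b) for b in range(B + 1)])
    return np.dot(coeffs, basis)                                        # weighted sum of basis functions

depths = np.linspace(0.0, 150.0, 6)
print(bernstein_profile([150.0, 300.0, 450.0, 600.0], depths, 150.0))   # third-order smooth gradient
```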

5.3 Model selection: Trans-D inversion

Trans-D inversion addresses model selection by sampling probabilistically over models with differing numbers of parameters (Green 1995; Malinverno 2002; Sambridge et al. 2006; Dettmer et al. 2010; Dosso et al. 2014). Let k index the choice from \(\mathcal {K}\) possible models; Bayes’ theorem for hierarchical models can be written (Green 1995)

$$ P(k,\mathbf{m}_{k}|\mathbf{d}) = \frac{P(k) P(\mathbf{m}_{k}|k) P(\mathbf{d}|k,\mathbf{m}_{k})}{\displaystyle{\sum}_{k^{\prime} \in \mathcal{K}}\! \int\!\! P(k^{\prime}) P(\mathbf{m}^{\prime}_{k^{\prime}}|k^{\prime}) P(\mathbf{d}|k^{\prime},\mathbf{m}^{\prime}_{k^{\prime}}) \mathrm{d}\mathbf{m}^{\prime}_{k^{\prime}}}. $$
(13)

In Eq. 13, P(k)P(mk|k) is the prior probability of the state (k,mk), P(d|k,mk) is interpreted as the likelihood L(k,mk), and P(k,mk|d) is the trans-D PPD. The PPD can be sampled numerically using the reversible-jump Markov chain Monte Carlo (rjMCMC) algorithm, which accepts a proposed transition between the current state (k,mk) and a proposed state \((k^{\prime },\mathbf {m}^{\prime }_{k^{\prime }})\) with a probability given by the Metropolis-Hastings-Green criterion (Green 1995)

$$ \begin{array}{@{}rcl@{}} && A(k^{\prime},\mathbf{m}^{\prime}_{k^{\prime}}|k,\mathbf{m}_{k}) = \\ && \min \left[ 1,\frac{P(k^{\prime}, \mathbf{m}^{\prime}_{k^{\prime}})}{P(k, \mathbf{m}_{k})} \frac{{L}(k^{\prime}, \mathbf{m}^{\prime}_{k^{\prime}})}{{L}(k, \mathbf{m}_{k})} \frac{Q(k,\mathbf{m}_{k}|k^{\prime},\mathbf{m}^{\prime}_{k^{\prime}})}{Q(k^{\prime},\mathbf{m}^{\prime}_{k^{\prime}}|k,\mathbf{m}_{k})} |\mathbf{J}| \right], \end{array} $$
(14)

where \(Q(k^{\prime },\mathbf {m}^{\prime }_{k^{\prime }}|k,\mathbf {m}_{k})\) is the proposal probability density and |J| is the Jacobian determinant for the transformation between parameter spaces (|J| = 1 for the rjMCMC algorithm described here).

Dettmer et al. (2012) applied trans-D inversion to microtremor array data to consider earth models with unknown numbers of uniform layers. The model parameters consisted of k interface depths zk above a maximum depth \(z_{\max \limits }\), and geophysical parameters for each of the k + 1 layers (including the basement). rjMCMC sampling involved three types of steps, chosen randomly at each iteration: perturbation, birth, or death. In a perturbation step, the parameterization is unchanged but changes to existing parameter values are proposed. A birth step proposes adding a layer by uniformly sampling a new interface depth on \([0, z_{\max \limits }]\) and choosing geophysical parameters from a Gaussian proposal density centred on the current values at the depth of the new interface. A death step proposes removing a random interface and setting the parameters of the resulting (thicker) layer to those either above or below the old interface. After collecting a large (convergent) trans-D ensemble of model samples, the number of interfaces is marginalized over when considering the results.
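The birth and death moves can be sketched as follows for a simple layered VS model. This is only an illustration of the proposal logic described above (proposal scales, priors, and the acceptance test of Eq. 14 are omitted), not the published implementation, and all values are invented.

```python
import numpy as np

rng = np.random.default_rng(0)
z_max = 100.0   # maximum interface depth (m), illustrative

def birth(depths, vs, sigma_v=100.0):
    """Propose adding an interface at a uniform random depth; the new layer takes
    VS drawn from a Gaussian centred on the current value at that depth."""
    z_new = rng.uniform(0.0, z_max)
    i = np.searchsorted(depths, z_new)                 # layer that the new interface splits
    v_new = vs[i] + sigma_v * rng.standard_normal()
    return np.insert(depths, i, z_new), np.insert(vs, i, v_new)

def death(depths, vs):
    """Propose removing a random interface; the merged (thicker) layer keeps the VS
    of either the layer above or below the removed interface (the other is dropped)."""
    i = rng.integers(len(depths))                      # interface to remove
    drop = i + rng.integers(2)                         # discard VS from above or below it
    return np.delete(depths, i), np.delete(vs, drop)

depths = np.array([5.0, 20.0])         # k = 2 interface depths (m)
vs = np.array([200.0, 450.0, 900.0])   # k + 1 layer velocities including basement (m/s)
print(birth(depths, vs))
print(death(depths, vs))
```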

5.4 Error model and likelihood function

Defining the error model requires specifying the statistical distribution of residual errors, which is often not well known. The lack of specific information suggests that a simple distribution be assumed, with statistical parameters estimated from the data. Assuming Gaussian-distributed errors with an unknown error covariance matrix C, the likelihood is

$$ L(\mathbf{m}, \mathbf{C}) = \frac{1}{(2\pi)^{N/2}|\mathbf{C}|^{1/2}} \exp\left[ -\frac{1}{2} \mathbf{r(m)}^{T}\mathbf{C}^{-1}\mathbf{r(m)}\right], $$
(15)

where r(m) = d − d(m) are the data residuals. Errors are often assumed to be IID random variables. However, significant error correlations can occur, and neglecting these can bias parameter estimates and under-estimate uncertainties.

A variety of approaches have been applied to address error covariance in Bayesian microtremor array inversion. Molnar et al. (2010) and Molnar et al. (2013) applied a non-parametric approach to estimate C based on residuals from the optimal model of an initial inversion assuming IID errors. In this approach, the residuals are considered a realization of the error process from which statistical quantities can be estimated. Assuming the residuals represent an ergodic random process, a Toeplitz (diagonally-banded) covariance matrix can be estimated from the residual autocovariance (Dosso et al. 2006); this covariance-matrix estimate is then used in a subsequent Bayesian inversion. Dettmer et al. (2012), Gosselin et al. (2017) and Gosselin et al. (2018) applied a parametric approach to error covariance by considering data residuals to be a first-order AR process, which is equivalent to a Toeplitz covariance matrix with exponentially-diminishing off-diagonal (covariance) terms. The standard deviation and autoregressive parameter are sampled in the inversion to account for error variance and covariance. Both non-parametric and parametric methods can be extended to consider error statistics that vary over the data set by dividing the data into segments over which error statistics are assumed constant.
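Under a first-order AR assumption, the covariance matrix and the likelihood of Eq. 15 take a simple form. The sketch below assumes the two AR parameters denote the stationary standard deviation and the lag-one correlation of the errors; the residual values are invented for illustration.

```python
import numpy as np

def ar1_covariance(sigma, a, n):
    """Toeplitz covariance of a first-order AR error process: C_ij = sigma^2 * a^|i-j|,
    with sigma the stationary standard deviation and a the lag-one correlation."""
    idx = np.arange(n)
    return sigma**2 * a**np.abs(idx[:, None] - idx[None, :])

def log_likelihood(r, C):
    """Gaussian log-likelihood of Eq. 15 for residuals r and covariance C."""
    n = len(r)
    _, logdet = np.linalg.slogdet(C)
    return -0.5 * (n * np.log(2 * np.pi) + logdet + r @ np.linalg.solve(C, r))

r = np.array([0.5, -0.3, 0.2, 0.4, -0.1])                    # toy residuals (e.g., m/s)
print(log_likelihood(r, ar1_covariance(sigma=0.4, a=0.3, n=len(r))))
```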

Finally, the error assumptions should be examined a posteriori for validity. For instance, under the above assumptions, standardized residuals (accounting for variance/covariance) from inversion should be consistent with an uncorrelated Gaussian random process. Inspection of residual histograms and autocorrelation functions can be used to assess the assumption of Gaussianity and the applicability of the covariance model, respectively; statistical tests can also be applied. Bayesian microtremor array inversions to date have found the assumed error models to be generally satisfied.

5.5 Examples

Bayesian inversion for VS profile estimation and site assessment is illustrated here for two data sets collected by Molnar et al. (2010) at contrasting geological settings in southwestern British Columbia, Canada, which have since been considered by several authors. This region is located in the northern portion of the Cascadia subduction zone, one of the most seismically active areas in Canada. The highest seismic risks are associated with the two largest urban centres, Vancouver and Victoria. The Fraser River delta in southern greater Vancouver is composed of deep (up to 500 m) sands and silts overlying over-consolidated glacial deposits and bedrock. In contrast, the local geology of Victoria involves a shallow (0–30 m) layer of soft marine silts over stiff glacial deposits and/or bedrock.

Study sites in each setting were chosen at locations where invasive VS measurements were available for comparison to inversion results. The Fraser River delta site was colocated with a 300-m borehole and within 60 m of three seismic cone penetration test (SCPT) sites with maximum penetration depths of 31–62 m. The Victoria site was colocated with an SCPT site where the cone penetrated 17 m of soft sediments before meeting refusal. The data collection and processing followed the guidelines of the European SESAME workgroup (Jongmans et al. 2005). Arrays of up to six broadband seismographs were deployed in cross-shaped and semicircular configurations at the Fraser River delta and Victoria sites, respectively. To obtain dispersion over appropriate frequency bands, the array was expanded several times: for the deep delta site the array aperture was varied from 15 to 180 m; for the shallow Victoria site the array radius was varied from 5 to 35 m. Computation of phase-velocity dispersion curves from the recordings was carried out using geopsy software (Wathelet et al. 2020).

The two observed dispersion data sets are shown in Fig. 8. As mentioned, various Bayesian inversion approaches have been applied to these data. Molnar et al. (2010) determined the model parameterization from a number of choices using the BIC, and computed the error covariance matrix from the residual autocovariance of an initial inversion. This analysis indicated that a power-law gradient was the preferred parameterization for the delta data, while a linear gradient was preferred for the Victoria data. Gosselin et al. (2017) applied a BP parameterization with the polynomial order determined by the BIC, and used AR error modelling (only the delta data were considered by Gosselin et al. 2017, but, for completeness, this approach was also applied to the Victoria data for this paper). Third- and second-order BPs were indicated for the delta and Victoria data, respectively. Dettmer et al. (2012) applied trans-D inversion with AR error modelling. All inversion approaches provided excellent fits to both observed data sets, as illustrated in Fig. 8 for the BP inversions (fits for other approaches are similar).

Fig. 8
figure 8

Microtremor dispersion data for the (a) Fraser River delta and (b) Victoria sites. Open circles indicate observed data (Molnar et al., 2010). Coloured distributions represent marginal probability densities of predicted data (BP inversion results shown; results for other inversions are similar)

Bayesian inversion results for the Fraser River delta site are shown in Fig. 9 in terms of marginal posterior probability profiles for VS and for the basement interface depth zb for each of the three approaches. The VS marginal profiles are normalized independently at each depth for display purposes, with warm colours (e.g., red) indicating high probability and cool colours (blue) low probability (white is zero). Also included for comparison are VS estimates from the invasive measurements in terms of averages over the borehole and SCPT measurements with one standard-deviation error bars.

Fig. 9
figure 9

Bayesian inversion results for the Fraser River delta site. (a), (c) and (e) show VS marginal probability profiles for power-law, BP, and trans-D parameterizations, respectively, with zoom-ins of the grey rectangles shown in (g), (h) and (i). Circles indicate averages of invasive measurements with one standard-deviation error bars. (b), (d) and (f) show corresponding marginal profiles for the basement interface depth. Modified from Gosselin et al. (2017)

Figure 9 shows similar overall VS structure and generally good agreement with the invasive measurements for all three inversion approaches. The power-law model (Molnar et al. 2010) is the least-general parameterization but appears to be in best agreement with the invasive measurements which indicate that the VS profile at this site approximates a power law over > 100 m depth. The BP model (Gosselin et al. 2017) can represent a wide range of smooth profiles, but the result approximates a power-law shape (with slightly lower near-surface curvature). The trans-D model (Dettmer et al. 2012) is based on uniform layers, but nonetheless approximates a power-law gradient and agrees well with the invasive measurements. Trans-D inversion represents the most general approach and includes parameterization uncertainties, leading to slightly wider probability densities in Fig. 9. As discussed in Molnar et al. (2010), the transition to an underlying halfspace with large uncertainties near 150-m depth in the inversion results may not represent actual basement material, but indicates that the dispersion data have little structural sensitivity below this depth.

Figure 10 shows Bayesian inversion results for the Victoria site. Since there was only a single SCPT here, no error bars can be associated with these measurements. The linear VS profile indicated by the BIC (Molnar et al. 2010) is in good agreement with the invasive measurements, and the BP inversion results are similar (with slightly smaller uncertainties). The trans-D inversion results (Dettmer et al. 2012) represent the upper structure as a uniform layer above a region of increasing VS, which also agrees well with the SCPT measurements. As discussed in Molnar et al. (2010), at this site the halfspace interface in the inversion results is indicative of an actual transition to consolidated material (i.e., the dispersion data are sensitive to this depth); however, the high-velocity basement is poorly constrained by the data. The interface marginal probability profiles in Fig. 10 indicate this interface occurs at \(\sim \)17 m, the depth the SCPT met refusal.

Fig. 10
figure 10

Same as Fig. 9 for the Victoria site

As mentioned previously, an advantage of Bayesian inversion is that it is straightforward to propagate uncertainty analysis from VS profiles directly into site assessments. Probabilistic site assessments by Molnar et al. (2013) are summarized in Fig. 11 for the Fraser River delta and Victoria sites. Marginals for VS30 show that the delta site is classified as NEHRP class E (soft clay soil) with 95% probability, while the Victoria site is uncertain between class E and class D (stiff soil) with 42% and 58% probabilities, respectively. PGA marginals for the delta and Victoria sites show these amplification factors are about 1.5–1.8 and 1.8–2.6, respectively. In fact, VS30-based assessments may not be appropriate for the Victoria site, given the strong impedance contrast within the upper 30 m. For such reasons, Molnar et al. (2013) recommended considering the travel-time averaged VS as a function of depth, VSZ(z) (not shown here), to determine appropriate amplification indicators for a specific site. Figure 11 also shows probabilistic amplification spectra for vertically propagating SH waves based on full-wave calculations including resonance effects (Boore 2005). These spectra indicate uncertainties in the fundamental frequency of approximately 0.1 and 0.3 Hz, and in its amplification of up to factors of 2 and 5, for the delta and Victoria sites, respectively.
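As a concrete illustration of this propagation, VS30 (and hence class probabilities) can be computed for every model in the posterior ensemble. The sketch below uses an invented two-layer ensemble loosely inspired by the Victoria setting, not the actual inversion output; the 180 m/s threshold is the NEHRP boundary between classes D and E.

```python
import numpy as np

def vs30(tops, vs):
    """Travel-time-averaged VS over the upper 30 m for a layered model defined by
    layer top depths (tops[0] = 0) and layer velocities vs."""
    bounds = np.minimum(np.append(tops, 30.0), 30.0)   # layer boundaries truncated at 30 m
    h = np.maximum(bounds[1:] - bounds[:-1], 0.0)      # thickness of each layer above 30 m
    return 30.0 / np.sum(h / np.asarray(vs))

# hypothetical posterior ensemble of two-layer models: depth to the soft/stiff
# contrast and the VS of each unit (values are illustrative only)
rng = np.random.default_rng(0)
z = rng.normal(17.0, 2.0, 5000)
v_soft = rng.normal(125.0, 15.0, 5000)
v_stiff = rng.normal(500.0, 100.0, 5000)
vs30_samples = np.array([vs30([0.0, zi], [v1, v2]) for zi, v1, v2 in zip(z, v_soft, v_stiff)])
print("median VS30: %.0f m/s" % np.median(vs30_samples))
print("P(class E, VS30 < 180 m/s): %.2f" % np.mean(vs30_samples < 180.0))
```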

Fig. 11
figure 11

Probabilistic site assessment results. (a)–(c) show marginal probability densities for VS30, PGA amplification factor, and SH amplification spectra for the Fraser River delta site, respectively; (d)–(f) show the same for the Victoria site. Dotted lines indicate results for the most-probable VS model. Solid lines in (a) and (d) delineate the boundary between NEHRP classes D and E, with probabilities indicated. Note that horizontal scales vary between the two sites. Modified from Molnar et al. (2013)

6 Tomography

The intensity and duration of ground shaking during an earthquake, at a specific site, are influenced not only by 1D heterogeneity (depth dependence) of geophysical properties (primarily the VS profile), but also by 2D and 3D subsurface structure. This can be particularly important in sedimentary basins, which can trap and amplify seismic waves. In such cases, 1D models can be inadequate for predicting seismic site effects and hazards. This section discusses the method of seismic tomography to estimate 2D and 3D structure for seismic site characterization.

Seismic tomography has been the predominant tool for imaging heterogeneous structure in the earth over the last \(\sim \)50 years, applied over a wide range of spatial scales and considering a variety of seismic phases (wave types). The topic is vast and well developed, with several texts and reviews available (e.g., Nolet 2012; Rawlinson et al. 2014). This section does not attempt to provide a general review of the subject. Rather, in keeping with the theme of this paper (to review inversion for seismic site assessment), this section discusses the extension of surface-wave dispersion measurements to constrain laterally heterogeneous structure via surface-wave travel-time tomography. Seismic tomography has also been applied to site characterization with other data types, including body-wave travel times using active sources at the surface or installed down boreholes (e.g., Angioni et al. 2003; Azwin et al. 2013). The underlying principles are similar in these cases, and this section will concentrate on the recent and increasingly common application to surface waves.

Seismic tomography based on surface-wave dispersion from earthquake sources is not a recent imaging technique. However, earthquakes predominantly generate surface waves at low frequencies that are less sensitive to shallow (less than \(\sim \)1 km) structure, and are consequently not suitable for seismic site characterization. More recently, it has been shown that cross-correlation (interferometric) techniques can be applied to array recordings of ambient seismic noise to derive part of the Green’s function (impulse response) between seismic stations, including surface-wave travel times (Campillo 2006). Such techniques have been shown to recover surface-wave dispersion at frequencies of \(\sim \)1 Hz and above (e.g., Chávez-García and Luzón 2005), making surface-wave tomography from ambient-noise interferometry an important emerging technique in engineering-scale studies, including seismic site characterization (e.g., Picozzi et al. 2009; Huang et al. 2010; Lin et al. 2013; Hannemann et al. 2014; Inzunza et al. 2019; Salomón et al. 2020). The majority of studies consider dispersion of Rayleigh waves (as opposed to Love waves), since they are easily isolated on vertical-component seismic recordings (which are also typically less noisy than horizontal-component recordings).

Because seismic surface waves propagate along the surface of the earth (rather than propagating in depth), the associated tomographic problem can be formulated in two steps, involving two different inverse problems. The first step is the actual “tomography,” whereby 2D maps of phase or group velocity (depending on the measurement methods) at various frequencies are constructed on a cellular grid, based on the spatial distribution of surface-wave travel paths. The second step is to form dispersion curves at specific locations throughout the study area (by combining phase/group-velocity maps at various frequencies) and perform a series of 1D inversions for structure directly beneath these locations. These 1D inversion results are then interpolated to form a pseudo-3D model. This second inversion step (estimating 1D structure) is typically solved using any of the methods described in previous sections of this paper. Hence, this section focuses on the tomographic aspect of the problem (i.e., the first inversion step).

Consider an array of seismometers that provides a set of surface-wave travel-time measurements between N station pairs. Using the high-frequency (seismic-ray) assumption, the surface-wave travel time ti(f), at frequency f, between the i th station pair is given by the integral of phase or group slowness s(l,f) over the ray path li along which the wave propagates,

$$ t_{i}(f) = {\int}_{l_{i}} s(l^{\prime},f) dl^{\prime}, \quad i=1,\ldots,N. $$
(16)

Assuming a 2D model of slowness discretized into M cells of uniform slowness, Eq. 16 can be expressed as

$$ t_{i}(f) = {\sum}_{j=1}^{M} s_{j}(f) {\varDelta} l_{ij}, \quad i=1,\ldots,N, $$
(17)

where sj(f) is the slowness of the j th cell and Δlij is the path length of the i th ray through this cell (Δlij = 0 for cells that are not along the i th ray path). This is the forward problem in classical tomography. For a homogeneous medium, a seismic ray follows the great-circle path (GCP) connecting the two stations. However, for a heterogeneous medium, the ray path is itself a function of the spatial distribution of slowness, leading to a non-linear problem.

Tomography is often linearized by assuming a GCP geometry or by calculating ray paths for a starting (reference) model and assuming stationarity. Within a linearized formulation, it is clear that the partial derivatives of the data (travel times) with respect to the model parameters (cell slownesses) that form the sensitivity matrix A (cf. Eqs. 1 and 17) are simply the ray-path length segments (i.e., Aij = Δlij). Linear inverse methods (as discussed in Section 3) can then be applied to solve for the slowness values in the discretized 2D map. Regularization schemes, in which first- or second-order spatial derivative operators applied to the parameters penalize model gradients or roughness, are often employed in tomography to overcome the ill-posedness of the matrix inversion and to produce simple, minimum-structure models (Constable et al. 1987; Aster et al. 2018).
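A minimal numerical sketch of this linearized formulation is given below: straight rays on a small 2D grid define the path-length matrix of Eq. 17, and a first-difference roughness operator supplies the regularization. The grid geometry, ray coverage, noise level, and regularization weight are arbitrary choices for illustration only.

```python
import numpy as np

# Straight-ray tomography sketch: 10 x 10 slowness cells crossed by one horizontal
# ray per row and one vertical ray per column, inverted with roughness regularization.
n, dx = 10, 10.0                                   # 10 x 10 cells of 10 m
s_true = np.full((n, n), 1.0 / 300.0)              # background slowness (s/m)
s_true[3:7, 4:8] = 1.0 / 150.0                     # low-velocity anomaly

rays = []
for i in range(n):                                 # horizontal rays
    a = np.zeros((n, n)); a[i, :] = dx; rays.append(a.ravel())
for j in range(n):                                 # vertical rays
    a = np.zeros((n, n)); a[:, j] = dx; rays.append(a.ravel())
A = np.array(rays)                                 # A_ij = path length of ray i in cell j
t_obs = A @ s_true.ravel() + np.random.default_rng(0).normal(0.0, 1e-4, len(A))

# regularized least squares: minimize ||A s - t||^2 + beta^2 ||D s||^2
D = np.diff(np.eye(n * n), axis=0)                 # simple roughness operator on the flattened grid
beta = 1.0
G = np.vstack([A, beta * D])
d = np.concatenate([t_obs, np.zeros(D.shape[0])])
s_est, *_ = np.linalg.lstsq(G, d, rcond=None)
print(np.round(1.0 / s_est.reshape(n, n)).astype(int))   # recovered (smeared) velocity map (m/s)
```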

As discussed in Section 3, after an initial linearized inversion the problem can be re-linearized about the resulting model (updating the ray-path geometry accordingly), and the inversion procedure repeated iteratively until convergence. However, to date, many surface-wave tomographic inversions for site characterization perform only a single iteration assuming straight (or GCP) rays (e.g., Picozzi et al. 2009; Inzunza et al. 2019; Salomón et al. 2020). In reality, seismic waves exhibit off-path sensitivity: they are sensitive to geophysical properties in a volume around the ray path. For surface waves, such volumes can be approximated in 2D by a Fresnel zone (ellipse) around the ray path (Yoshizawa and Kennett 2002). Hannemann et al. (2014) used 2D Fresnel zones along straight paths in high-frequency surface-wave tomography for site characterization.

The extent/significance of ray-path deflections (from straight or GCP paths) depends on propagation distance and the magnitude of lateral variability in velocity structure. In the shallow subsurface, where lateral variability in VS at a given depth can be significant (e.g., due to variations in the depth to bedrock), actual ray paths can deviate significantly from straight-line or GCP assumptions. These assumptions can lead to theory errors that bias inversion results. Some tomographic site characterization studies (e.g., Picozzi et al. 2009; Hannemann et al. 2014) suggest that measurement errors for high-frequency dispersion data are significantly greater than the theory errors introduced by straight-ray assumptions, and consequently dominate the problem. Other studies (e.g., Shirzad and Hossein Shomali 2014; Fang et al. 2015) use sophisticated wavefront tracking methods to accurately update ray path geometry over multiple linearized inversion steps.

Figure 12a–d shows examples of 2D surface-wave group-velocity maps from Hannemann et al. (2014). The data in their work were collected using an array of 27 seismometers deployed in two concentric circles (with respective diameters of \(\sim \)1800 m and \(\sim \)700 m) over a region of the Mygdonia basin in northern Greece that exhibits significant lateral variability in near-surface structure, including depth to bedrock. Surface wave dispersion data were extracted from cross-correlations of two weeks of ambient seismic noise recordings (see Hannemann et al. 2014, for details on data processing). Dispersion curves were formed at each location in the study area by combining the group-velocity maps. A series of 1D inversions for structure directly beneath these locations was then performed. Figure 12e shows an approximately north-south cross-section through the resulting pseudo-3D model, with depth contours to specific VS values. Wavelength-based approximations for depth resolution suggest that the data can resolve VS structure to greater depth in the northern part of the model (grey shading in Fig. 12e). Bedrock is shallower in the northern part of the model, as evidenced by high group velocity estimates at all frequencies and shallow depths for high VS values, which is consistent with known geology and other geophysical studies. This work highlights the applicability of surface-wave tomography with high-frequency data for site characterization in geologically complex settings with significant lateral heterogeneity in structure.

Fig. 12
figure 12

Examples of surface-wave group-velocity maps at four frequencies estimated from tomographic inversion of high-frequency travel-time data extracted from ambient seismic noise (a–d). An approximately north-south cross-section through the final pseudo-3D model is shown with the depth contours to specific VS values (e). The light, medium, and dark grey areas represent the depths to one third, one half, and one maximum wavelength of the surface-wave data, respectively. The dashed lines in (e) delineate known geologic units. Figure modified from Hannemann et al. (2014)

There are associated advantages and disadvantages to performing surface-wave tomography in two independent steps. An advantage of the two-step inversion approach to estimating 3D structure is that it decouples the 1D depth sensitivity of the dispersion data from the 2D ray path sensitivity (i.e., the tomographic problem). Furthermore, tomography is typically performed individually for each frequency, reducing computation cost and complexity. However, 2D phase or group velocity maps at closely-spaced frequencies are generally expected to show similar structure. Hannemann et al. (2014) inverted all frequencies simultaneously and applied an additional regularization term for inter-frequency smoothness to impose consistent structure between 2D maps at adjacent frequencies. However, this requires the numerical inversion of a significantly larger matrix. Once the 2D maps are estimated, a series of computationally inexpensive inversions can be performed (as described in previous sections) to estimate 1D VS structure beneath each point in the study area.

Direct inversion for 3D models from surface-wave dispersion for specific paths has also been considered, including studies that invert high-frequency data to recover shallow structure for seismic hazard assessment (e.g., Pilz et al. 2012; Fang et al. 2015; Li et al. 2016; Pilz et al. 2017; Inzunza et al. 2019). This has the advantage of skipping the intermediate step of estimating 2D phase or group velocity maps. Furthermore, data errors are propagated directly into the final inversion result, and are not distorted by the intermediate step. However, it is more challenging to define the sensitivity of the data to model parameters in the discretized 3D volume. Many approaches assume straight-ray propagation (e.g., Pilz et al. 2012). However, Fang et al. (2015) performed surface-wave ray tracing at each frequency to update the sensitivity matrix over multiple linearized inversion iterations. In that work, the sensitivity of the data to the model parameters was still only 2D (along the ray path). Furthermore, in considering high-frequency dispersion data in the Taipei Basin of Taiwan, Fang et al. (2015) showed that the ray paths determined for the final model deviate significantly from straight-line rays (Fig. 13). It is worth noting that studies of seismic site effects in sedimentary basins can vary significantly in scale. Fang et al. (2015) consider surface-wave propagation paths on the order of \(\sim \)10 km, where ray path deflections are significant. In contrast, the model by Hannemann et al. (2014) is on the order of \(\sim \)1 km, where ray path deflections are likely less significant.

Fig. 13
figure 13

Example of surface-wave ray paths (at a frequency of 0.71 Hz) estimated for a model of the Taipei Basin of Taiwan. Lateral heterogeneity in the seismic velocity structure of the basin causes significant deflections from great-circle ray paths (modified from Fang et al., 2015)

Accounting for the effects of lateral heterogeneity on wave propagation is particularly useful for high-frequency data, which are sensitive to shallow (and complex) structure, and can lead to higher-resolution models. Such inversion approaches are similar to FWI (discussed in the following section), which solves for 2D or 3D models with accurate forward solvers for full-wave propagation. In these problems, the data are full seismograms (seismic waveforms), rather than travel times of specific arrivals. While this provides greater data information, the associated computational costs and complexity are increased significantly, and the problem typically requires an accurate starting model.

Finally, the model parameterization can also have a significant effect on tomographic inverse problems. Equation 17 formulates the tomographic problem with model parameters representing the slowness of discrete grid cells, but other approaches are possible. Fang et al. (2015) transform the model from a regular grid into a sparse wavelet-basis domain, where the parameters are wavelet coefficients. The advantage of this approach is that an L1 damping regularization applied to the wavelet coefficients implicitly creates a minimum-structure model that (in theory) only allows detailed structure in the model where required by the data, as opposed to having two explicit regularization terms that apply smoothing and damping over the entire 2D model. This is particularly attractive in tomographic inversion, where uneven path coverage inherently leads to a multi-scale problem (i.e., some regions of the model are better resolved than others). Bayesian trans-D inversion has also been applied to tomography for large-scale problems (e.g., Bodin and Sambridge 2009; Bodin et al. 2012; Gosselin et al. 2021), providing an adaptive multi-scale parameterization that is estimated as part of the inversion. However, to date, this method has not been applied to site assessment problems.

7 Full waveform inversion

FWI aims to recover high-resolution subsurface models using all of the information in seismic waveforms. Observed data are the complete recorded seismograms, including all types of waves and phases, while the modelled (predicted) data are synthetic seismograms computed for the presumed source and earth models to simulate the full wavefield. Inversions for an optimal earth model are based on minimizing the difference between observed and synthetic seismograms, with appropriate regularization to control structure and stability.

A fundamental aspect of FWI is the estimation of sensitivity kernels (matrices) expressing the changes in the wavefield with respect to perturbations in model parameters representing material properties (Chen et al. 2007). An alternative is the adjoint approach where the gradient of the misfit between the observed and modelled data is computed without explicitly constructing the sensitivity matrix (Tarantola 1984). The dependence of the seismic wavefield on model parameters is strongly non-linear, adding to the inherent numerical complexity of FWI for both approaches. Global-search techniques for these problems are mostly limited to low-dimensional cases due to the high computational cost of the forward problem. To address realistic, multi-dimensional problems, non-linear constrained local optimization techniques along with robust and reliable forward models have been developed in the frequency domain (Song et al. 1995; Hicks and Pratt 2001; Brossier et al. 2009) as well as in the time domain (Akcelik et al. 2003; Tromp et al. 2005; Askan et al. 2007; Bozdağ et al. 2011). Regardless of the particular inversion algorithm, the forward model must be realistic and efficient in representing the wavefield of interest. It is particularly important to account for the heterogeneity and scattering effects of soil media. For this purpose, numerical approaches for solving the partial differential equations governing wave propagation have been used extensively, such as the discrete wavenumber (Bouchon et al. 1989), finite difference (Virieux 1986), finite element (Akcelik et al. 2003) and spectral element (Komatitsch and Vilotte 1998) methods.

An extensive review of FWI, discussing alternative forward models and optimization approaches effective at different scales, can be found in Virieux and Operto (2009) and Fichtner (2011). In this section, the FWI concept is reviewed through description of a least-squares adjoint approach which is capable of estimating discontinuous distributions of VS and intrinsic attenuation in large basins (Askan et al. 2007). Anelastic attenuation is critical for realistic FWI at all scales, including global and sedimentary-basin scales, as well as near-surface velocity models (Komatitsch et al. 2016). In contrast, the dispersion and travel-time data discussed in previous sections depend only on the phase information contained in seismic recordings, and are therefore insensitive to attenuation. The anelastic FWI problem is briefly presented here within the context of 2D sedimentary valleys subjected to SH-wave excitation, followed by a simulated example for the Los Angeles basin. Well-known challenges of FWI and corresponding remedies are also discussed.

7.1 FWI for shear-wave velocity and anelastic properties in heterogeneous basins

The total effect of intrinsic attenuation is usually expressed in terms of the dimensionless quality factor Q, which is observed to be almost constant in the seismic frequency range. Viscoelastic stress-strain relationships can be modelled with relaxation mechanisms while the solution to the corresponding ordinary differential equation is expressed as a memory variable. To represent the anelasticity, Askan et al. (2007) used a single generalized standard linear solid (SLS) per grid point as the relaxation mechanism for simulating an almost constant Q. The mechanical properties of the SLS are related to Q through simple frequency-independent relationships. Q is also related to VS through a series of forward wave-propagation calculations with shear-modulus reduction cycles.

Within the context of SH-wave propagation, FWI is formulated as a non-linear least-squares parameter estimation problem with the viscoelastic forward wave-propagation problem as the constraint. The objective is to obtain elastic and anelastic material models that minimize, over the time interval t ∈ [0,T] and spatial domain x ∈Ω, the L2-norm difference between the observed state u∗(x,t) and the predicted state u(x,t) of the antiplane displacement field at receiver locations xj, j = 1,…,NR, with the predicted field modelled by the coupled partial and ordinary differential equations for viscoelastic antiplane wave propagation. The objective function, with regularization on the material field μ(x) (elastic shear modulus) and regularization parameter β, is defined as

$$ \begin{aligned} \min\limits_{u,v,\mu }\ \frac{1}{2}\ \sum\limits_{j=1}^{N_{R}}\ & \int\limits_{0}^{T}\ \int\limits_{\Omega }\left[ u^{\ast }-u\right]^{2}\delta (x-x_{j}) d{\Omega} dt + \\ & \beta \int\limits_{\Omega }(\nabla \mu \cdot \nabla \mu +\varepsilon )^{1/2} d{\Omega} \ \end{aligned} $$
(18)

subject to

$$ \rho \frac{\partial^{2}u}{\partial t^{2}}-\nabla \cdot \lbrack \mu \nabla (u+\eta v)]=f(x,t)\ \ \text{in\ }{\Omega} \times [0,T], $$
(19)
$$ \frac{\partial v}{\partial t}+\alpha v=\frac{\partial u}{\partial t}\ \ \text{in\ }{\Omega} \times [0,T], $$
(20)
$$ \mu \nabla (u+\eta v)\cdot n=0\ \ \text{on\ }{\varGamma}_{FS}\times [0,T], $$
(21)
$$ \mu \nabla (u+\eta v)\cdot n=\sqrt{\rho \mu } \frac{\partial u}{\partial t}\ \ \text{on\ }{\varGamma}_{AB}\times [0,T], $$
(22)
$$ u=\frac{\partial u}{\partial t}=0,\ \ v=0\ \ \text{in\ }{\Omega} \text{ at}\ t=0 . $$
(23)

In these equations, v(x,t) is the memory variable corresponding to u(x,t), ρ(x) is the density, α(x) is the relaxation frequency, η(x) is the spring constant of the SLS, and f(x,t) is the body force representing the seismic source. Constraint Eqs. 19 and 20 are the governing equations of the viscoelastic model, while Eqs. 21–23 state the traction-free boundary condition on the free (earth) surface (ΓFS), the absorbing boundary condition (on ΓAB), and the initial conditions, respectively.

The regularization function included in the least-squares full-waveform formulation (second term in Eq. 18) treats the rank deficiency and ill-conditioning arising from the insensitivity of the objective functional to high-frequency material-property perturbations. Among the regularization functionals commonly used in FWI, this formulation employs total variation (TV) regularization, the L1 norm of the gradient of the material model. TV regularization recovers material interfaces effectively because it smooths only along the interfaces. The parameter ε in the objective function ensures the TV functional remains differentiable where ∇μ = 0. An alternative to TV regularization is Tikhonov regularization, which employs the L2 norm of the gradient of the material model. Tikhonov regularization smooths material discontinuities and is therefore not appropriate for FWI of earth models where sharp interfaces and other discontinuities are expected (such as the contact between sedimentary basin fill and basement).
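The behaviour of the two functionals is easy to compare numerically. The sketch below evaluates both for a gridded shear-modulus field containing a sharp contrast; the grid spacing, the field values, and the use of np.gradient as the discrete gradient are assumptions made only for illustration.

```python
import numpy as np

def regularization_terms(mu, dx, eps=1.0e-8):
    """Evaluate TV and Tikhonov functionals for a gridded shear-modulus field mu.
    Illustrative sketch only: dx and eps are assumed values, and np.gradient
    stands in for the discretization actually used in the inversion."""
    dmu_dx, dmu_dz = np.gradient(mu, dx)
    grad_sq = dmu_dx**2 + dmu_dz**2
    tv = np.sum(np.sqrt(grad_sq + eps)) * dx**2    # L1 norm of the gradient (2nd term of Eq. 18, per unit beta)
    tik = 0.5 * np.sum(grad_sq) * dx**2            # L2 norm of the gradient (Tikhonov)
    return tv, tik

# The quadratic (Tikhonov) penalty grows with the square of local gradients and so
# drives the optimizer to smear sharp contrasts, whereas the TV penalty grows only
# linearly with the jump across an interface.
mu = np.ones((64, 64))
mu[32:, :] = 4.0                                   # sharp basin/basement-style contrast
print(regularization_terms(mu, dx=10.0))
```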

The solution of the FWI problem as stated typically involves determining the values of the state variables u (predicted data) and v, and of the inversion variables (model) μ, ρ, and \(Q^{-1}\), that satisfy the first- and second-order optimality conditions (Akcelik et al. 2003; Askan et al. 2007). The formulation of the inversion with respect to μ is presented here for simplicity; the extension to include \(Q^{-1}\) is straightforward and can be found in Askan et al. (2007). To obtain the expressions for the optimality conditions, the Lagrangian functional is first defined by incorporating the constraint equations into the regularized least-squares objective function as

$$ \begin{aligned} \mathcal{L}(u,v,\mu ,\lambda ,\phi ) =\ & \frac{1}{2}\sum\limits_{j=1}^{N_{R}}\int\limits_{0}^{T}\int\limits_{\Omega}\left[ u^{\ast }-u\right]^{2}\delta (x-x_{j})\, d{\Omega}\, dt \\ & + \beta \int\limits_{\Omega }(\nabla \mu \cdot \nabla \mu +\varepsilon )^{1/2}\, d{\Omega} \\ & + \int\limits_{0}^{T}\int\limits_{\Omega }\lambda \left\{ \rho \frac{\partial^{2}u}{\partial t^{2}}-\nabla \cdot \left[ \mu \nabla (u+\eta v)\right] -f\right\}\, d{\Omega}\, dt \\ & + \int\limits_{0}^{T}\int\limits_{{\varGamma}_{FS}}\lambda \left\{ \mu \nabla (u+\eta v)\cdot n\right\}\, d{\varGamma}_{FS}\, dt \\ & + \int\limits_{0}^{T}\int\limits_{{\varGamma}_{AB}}\lambda \left\{ \mu \nabla (u+\eta v)\cdot n-\sqrt{\rho \mu }\,\frac{\partial u}{\partial t}\right\}\, d{\varGamma}_{AB}\, dt \\ & + \int\limits_{0}^{T}\int\limits_{\Omega }\phi \left\{ \frac{\partial v}{\partial t}+\alpha v-\frac{\partial u}{\partial t}\right\}\, d{\Omega}\, dt, \end{aligned} $$
(24)

where λ and ϕ are the Lagrange multipliers (also known as adjoint or dual variables) for the partial and ordinary differential equation constraints, respectively.

According to the first-order optimality condition, the first variation of the Lagrangian with respect to the state, adjoint and material variables is zero at the optimum solution:

$$ \left\{ \begin{array}{c} \delta_{u}\mathcal{L} \\ \delta_{v}\mathcal{L} \\ \delta_{\mu }\mathcal{L} \\ \delta_{\lambda }\mathcal{L} \\ \delta_{\phi }\mathcal{L} \end{array}\right\} (u,v,\mu ,\lambda ,\phi )=0. $$
(25)

These equations are known as the Karush-Kuhn-Tucker (KKT) conditions. The two equations resulting from the variation of the Lagrangian with respect to λ and ϕ are the state problems for u and v, respectively. The state equations are the original constraints and boundary conditions. The variations of the Lagrangian with respect to the state variables u and v are the adjoint problems for λ and ϕ, respectively. The adjoint problem for λ is similar to the state equation for u. However, it is a terminal value problem where the source function is the misfit between the observed and modelled displacements. A similar formulation exists between the state equation for the displacement memory v and the adjoint equation for ϕ. The variation of the Lagrangian with respect to the shear modulus μ yields the material field equation. The resulting KKT system is a coupled, non-linear system of equations requiring an iterative solution approach.
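The Lagrangian bookkeeping above can be demonstrated on a much smaller problem. The following self-contained sketch applies the same recipe to a toy discrete system A(m)u = f rather than the viscoelastic system of Eqs. 19–23: the state solve enforces δλL = 0, the adjoint solve enforces δuL = 0, and the material derivative follows from δmL; a finite-difference check verifies the adjoint gradient. All matrices and data are assumed test values.

```python
import numpy as np

# Hedged, self-contained illustration of the adjoint (Lagrangian) recipe on a toy
# discrete problem A(m) u = f, not the viscoelastic system of Eqs. 19-23.  The
# matrices K0, K1, K2, the source f, and the "observed" data d are assumed test values.
rng = np.random.default_rng(0)
n = 6
K0 = 4.0 * np.eye(n)
K1, K2 = np.diag(rng.random(n)), np.diag(rng.random(n))
P = np.eye(n)[:3]                                  # observation operator (three "receivers")
f = rng.random(n)
d = rng.random(3)

def A(m):
    return K0 + m[0] * K1 + m[1] * K2

def misfit(m):
    u = np.linalg.solve(A(m), f)                   # state problem
    r = P @ u - d
    return 0.5 * r @ r

def gradient(m):
    u = np.linalg.solve(A(m), f)                          # state problem (variation w.r.t. the adjoint)
    lam = np.linalg.solve(A(m).T, -P.T @ (P @ u - d))     # adjoint problem (variation w.r.t. the state)
    return np.array([lam @ (K1 @ u), lam @ (K2 @ u)])     # material equation (variation w.r.t. m)

m0 = np.array([1.0, 2.0])
eps = 1e-6
fd = [(misfit(m0 + eps * e) - misfit(m0 - eps * e)) / (2 * eps) for e in np.eye(2)]
print(gradient(m0), fd)                            # the two gradients should agree closely
```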

7.2 Design of the solution approach for large-scale FWI: Numerical challenges and remedies

Two decisions required in the solution approach for large-scale FWI problems are the algorithm for computing the search direction and the algorithm for computing the step length. The trade-off between accuracy and computational cost guides these choices. The search direction can be computed with gradient-based methods such as steepest descent or Newton's method. Steepest descent uses a linear model of the objective function but generally suffers from slow convergence. Newton's method uses a quadratic model of the objective function, exhibits locally quadratic convergence, and can provide robust and efficient solutions for FWI when used with some form of globalization such as a trust-region or line-search method. The Newton step for the solution of the KKT system is

$$ \left[\begin{array}{lllll} \delta_{uu}^{2}\mathcal{L} & \delta_{uv}^{2}\mathcal{L}& \delta_{u\mu }^{2}\mathcal{L}& \delta_{u\lambda }^{2}\mathcal{L}& \delta_{u\phi }^{2}\mathcal{L}\\ \delta_{vu}^{2}\mathcal{L}& \delta_{vv}^{2}\mathcal{L}& \delta_{v\mu }^{2}\mathcal{L}& \delta_{v\lambda }^{2}\mathcal{L}& \delta_{v\phi }^{2}\mathcal{L}\\ \delta_{\mu u}^{2}\mathcal{L}& \delta_{\mu v}^{2}\mathcal{L}& \delta_{\mu \mu }^{2}\mathcal{L}& \delta_{\mu \lambda }^{2}\mathcal{L}& \delta_{\mu \phi }^{2}\mathcal{L}\\ \delta_{\lambda u}^{2}\mathcal{L}& \delta_{\lambda v}^{2}\mathcal{L}& \delta_{\lambda \mu }^{2}\mathcal{L}& 0 & 0 \\ \delta_{\phi u}^{2}\mathcal{L}& \delta_{\phi v}^{2}\mathcal{L}& \delta_{\phi \mu }^{2}\mathcal{L}& 0 & 0 \end{array}\right] \left\{\begin{array}{l} \bar{u} \\ \bar{v} \\ \bar{\mu} \\ \bar{\lambda} \\ \bar{\phi} \end{array}\right\} =- \left\{\begin{array}{l} \delta_{u}\mathcal{L}\\ \delta_{v}\mathcal{L}\\ \delta_{\mu }\mathcal{L}\\ \delta_{\lambda }\mathcal{L}\\ \delta_{\phi }\mathcal{L} \end{array}\right\}. $$
(26)

Herein, the \(\delta^{2}\mathcal{L}\) operator denotes the second variation of the Lagrangian with respect to the state, adjoint, and material-field variables, and the overbar on each variable indicates the Newton direction for that variable. The coefficient matrix is called the KKT matrix or the Hessian matrix. While the KKT system can be solved as a full-space problem, such an approach is not feasible for large-scale FWI problems. A reduced-space approach, however, eliminates the state and adjoint variables, yielding a reduced system that contains unknowns related only to the material field:

$$ W_{\mu }\bar{\mu}=-\delta_{\mu}\mathcal{L}, $$
(27)

where Wμ is the Schur complement of \(\delta_{\mu\mu}^{2}\mathcal{L}\) in the KKT matrix, and is known as the reduced Hessian.

Askan et al. (2007) used a Gauss-Newton approximation, which yields a positive-definite reduced Hessian by ignoring the terms in the KKT matrix that depend on the adjoint variables. The conjugate gradient method, combined with a backtracking line search for the step length, is used to solve the reduced system for \(\bar{\mu}\) without constructing the reduced Hessian explicitly. Instead, at each conjugate gradient iteration, only the required matrix-vector product is computed by solving the state and adjoint problems. A limited-memory preconditioner is also used to treat the potential ill-conditioning of the reduced Hessian, which speeds up the algorithm significantly (Askan et al. 2007).
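A minimal sketch of the matrix-free idea behind Eq. 27 is given below: the conjugate gradient iteration requires only the action of the reduced Hessian on a vector. Here that action is an assumed symmetric positive-definite stencil standing in for the state and adjoint solves performed in the actual algorithm.

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, cg

# Minimal sketch of solving the reduced system (Eq. 27) matrix-free: the conjugate
# gradient solver only needs the action of the reduced Hessian on a vector.  Here
# that action is an assumed SPD stencil (a 1D Laplacian plus damping); in the actual
# algorithm each product would require one state and one adjoint solve.
n = 50

def reduced_hessian_times(v):
    w = 2.0 * v
    w[1:] -= v[:-1]
    w[:-1] -= v[1:]
    return w + 0.1 * v

W = LinearOperator((n, n), matvec=reduced_hessian_times, dtype=float)
neg_grad = np.ones(n)                              # stands in for the negative material gradient
mu_bar, info = cg(W, neg_grad, maxiter=200)
print(info, np.linalg.norm(reduced_hessian_times(mu_bar) - neg_grad))
```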

Another well-known challenge in FWI is that the objective-function hyper-surface can possess many local minima, such that when the starting model is not in the basin of attraction of the global minimum, quasi-Newton methods converge to sub-optimal solutions. The size of the attraction basin of the global minimum decreases with increasing wavenumber (i.e., shorter wavelengths). Hence, multi-level strategies are used to guide the optimizer toward the global minimum by solving the inverse problem on increasingly finer meshes (Bunks et al. 1995).
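The multilevel idea can be sketched with a simple fitting problem in which the model is estimated on a coarse grid first and each solution is interpolated to seed the next finer level; the target profile, the interpolation-based forward model, and the grid sequence below are toy assumptions, not the SH-wave problem of Section 7.1.

```python
import numpy as np
from scipy.optimize import minimize

# Schematic multilevel continuation: invert on a coarse model grid first, then
# interpolate each solution as the starting model on the next finer grid.  The
# target profile, the interpolation "forward model", and the grid sequence are
# assumed toy choices.
target = lambda x: 1.0 + 0.5 * np.sin(2.0 * np.pi * x) + (x > 0.6)
x_data = np.linspace(0.0, 1.0, 129)
d_obs = target(x_data)

def misfit(m, x_model):
    pred = np.interp(x_data, x_model, m)           # toy forward model: model grid -> data grid
    return np.sum((pred - d_obs) ** 2)

m = np.full(5, d_obs.mean())                       # coarsest level: 5 model nodes
for n_nodes in (5, 9, 17, 33):
    x_model = np.linspace(0.0, 1.0, n_nodes)
    if len(m) != n_nodes:                          # prolong the previous solution to the finer grid
        m = np.interp(x_model, np.linspace(0.0, 1.0, len(m)), m)
    m = minimize(misfit, m, args=(x_model,), method="L-BFGS-B").x
    print(n_nodes, misfit(m, x_model))             # misfit should drop as the grid is refined
```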

Askan et al. (2007) applied the numerical algorithm described above to reconstruct a synthetic 2D VS target distribution representing a vertical cross-section of the Los Angeles basin in the San Fernando Valley (Magistrale 2000). The KKT system was discretized with Galerkin-type finite elements in space and finite differences in time. Waveforms simulated from the target profile at receivers on the free (earth) surface served as pseudo-observed data. The causative fault was assumed to run perpendicular to the valley, and the source was modelled as a uniform SH kinematic dislocation. Gaussian random noise (10%) was added to the simulated data to represent observation errors. The forward wave propagations were performed on the finest grid of 64 × 64 finite elements, while the multilevel inversion algorithm worked through increasingly finer optimization grids until the forward and inverse meshes were the same size. Figure 14 shows the sequence of VS models from FWI on increasingly finer grids.

Fig. 14 Estimated VS cross-sections for consecutive levels of a multilevel FWI algorithm for the Los Angeles basin example (grid sizes given on panels). The target (true model) is shown in the final panel. Modified from Askan et al. (2007)

Algorithmic choices such as the preconditioning approach, the type of regularization function, and the value of the regularization parameter have a significant impact on FWI performance. The inverse problem is also affected by receiver density and by the noise level of the data. Askan et al. (2010) performed numerical experiments to assess the sensitivity of FWI to selected algorithmic parameters using the same 2D Los Angeles basin example. Among several findings, noise levels up to 10% did not have a significant effect on the quality of the inversion solution, and even with sparse receiver arrays, FWI yielded acceptable profiles. However, use of a multi-level algorithm was found to be necessary to reach the global minimum, and the selection of an appropriate regularization parameter was demonstrated to be the most significant factor for a successful inversion. Every application of FWI requires a problem-specific regularization parameter. As discussed in Section 3, if the regularization parameter is too small, the solution contains artifacts and spurious structures; conversely, if the parameter is too large, the model is overly simplified.
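The trade-off controlled by the regularization parameter can be illustrated on a small linear test problem by sweeping β and recording data misfit against model roughness (an L-curve style diagnostic); the forward operator, noise level, and roughness operator in the sketch below are assumed toy quantities, and L2 (Tikhonov-type) regularization is used for simplicity rather than TV.

```python
import numpy as np

# Hedged sketch of scanning the regularization parameter on a small linear test
# problem: for each beta, solve the Tikhonov-regularized normal equations and
# record data misfit versus model roughness.  G, the noise level, and the
# roughness operator R are assumed toy quantities.
rng = np.random.default_rng(1)
n = 40
G = np.tril(np.ones((n, n))) / n                   # smoothing (cumulative-average) forward operator
m_true = np.sin(np.linspace(0.0, 3.0 * np.pi, n))
d = G @ m_true + 0.01 * rng.standard_normal(n)     # noisy synthetic data
R = np.diff(np.eye(n), axis=0)                     # first-difference (roughness) operator
for beta in 10.0 ** np.arange(-6.0, 1.0):
    m_beta = np.linalg.solve(G.T @ G + beta * R.T @ R, G.T @ d)
    print(f"beta={beta:8.1e}  misfit={np.linalg.norm(G @ m_beta - d):.4f}  "
          f"roughness={np.linalg.norm(R @ m_beta):.4f}")
```

Small β favours low misfit but rough, artifact-prone models; large β yields smooth but overly simplified models, mirroring the behaviour described above.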

7.3 Outlook on FWI for site characterization

FWI of measured data remains a challenging problem, particularly for the shallow seismic wavefield relevant to seismic site assessment. The physical challenges of FWI for near-surface structural studies include strong attenuation, strong variability in near-surface lithology, poor a priori information, and complex surface topography (Nguyen and Tran 2018; Pan et al. 2018). The high computational cost of numerical wave propagation, governed by the shortest resolvable wavelengths, is another well-known issue. Nonetheless, within the context of inversion for near-surface velocity structure and site characterization, several FWI methods have proven effective with measured data sets (Bretaudeau et al. 2013; Kallivokas et al. 2013; Fathi et al. 2016; Groos et al. 2017; Nguyen and Tran 2018). Thus, despite the physical and numerical challenges discussed here, FWI has the potential to recover the complex velocity structures of heterogeneous basins as well as near-surface soil/sediment layers, and to become a powerful tool for seismic site assessment.

8 Summary and conclusion

Geophysical inverse theory is a vast and diverse field, spanning a wide range of physical problems and spatial scales from planetary to shallow engineering applications. In seismic site characterization studies, inversion is often used to estimate a model of the geophysical properties (predominantly VS) of the shallow sub-surface from observations of seismic waves recorded at the surface. With this information, seismic-wave behavior can be predicted, and the site-specific hazards associated with earthquake ground shaking can be quantified and mitigated.

This paper provides a review of common inversion approaches for estimating geophysical models from seismic data for the purpose of seismic site characterization. In engineering-scale (\(\sim \)10–100 m) applications, 1D (depth-dependent) subsurface models are commonly considered, and are (in many cases) adequate for quantifying seismic-wave behavior. On these scales, surface-wave dispersion and HSVR data are commonly employed. However, the site-specific response to earthquake ground shaking is also influenced by larger scale structures (e.g., sedimentary basins). These cases necessitate more complex 2D or 3D models, discussed here in the context of surface-wave tomography and FWI. The complexity of the model generally affects the complexity of the inverse problem.

In seismic site characterization studies, inverse problems are non-linear, non-unique, and potentially unstable. Hence, suitable mathematical formulations are required. This paper considers a wide range of algorithms for estimating models of shallow subsurface structure based on seismic data. Most approaches discussed here are based on recovering an optimal (best-fit) model, representing a point estimate in a multi-dimensional model-parameter space. These include linearized approaches, which are efficient but prone to becoming trapped in sub-optimal solutions, as well as non-linear (numerical) optimization algorithms (e.g., DHS, SA, GA, and NA). An alternative is Bayesian inversion, based on sampling the PPD over the parameter space, which provides parameter estimates as well as quantitative uncertainty analysis.

Computational effort/time can be a limiting factor in geophysical inversion. This is generally inconsequential for 1D problems in seismic site characterization, but is a significant constraint in 2D and 3D problems. Future advancements of inversion for site characterization are tied to improved computational capabilities and, more importantly, to emerging technologies for collecting large volumes of seismic data. These include dense nodal geophone arrays (involving thousands of instruments) and distributed acoustic sensing with fibre optics (e.g., Olivier et al. 2018; Ajo-Franklin et al. 2019; Parker et al. 2018; Spica et al. 2020). New data processing and inversion advancements will be required to fully exploit the benefits and information available from these next-generation data sets.