The Interface Between Empirical and Simulation-Based Ground-Motion Models

Ground-motion models (GMMs) are a key driver of the results of probabilistic seismic hazard analyses and their uncertainty. GMMs that bridge seismological and empirical approaches are an effective tool to represent the distribution of ground motion and its uncertainty in seismic hazard assessment. A methodology is presented that uses ground-motion data recorded at seismograph sites in eastern North America to calibrate simple, scalable seismological models of ground-motion generation and propagation. Such GMMs can directly account for the gross features of source scaling (magnitude and stress parameter), attenuation, site response, and kappa effects. It is shown that, by application of appropriate GMM strategies, sigma (aleatory uncertainty) could be greatly reduced, resulting in lower calculated hazard for nuclear plants founded on rock. This reduction in sigma requires that high-quality seismic monitoring (e.g., broadband seismograph stations) be installed and operated over a period of years (in addition to strong-motion stations), and that an ongoing investment be made in data analysis and targeted GMM development using the data.


Introduction
Ground-motion models (GMMs), also referred to as ground-motion prediction equations, are a key component in probabilistic seismic hazard analysis (PSHA), and often the most important uncertainty affecting PSHA results. GMMs provide median estimates of ground-motion amplitudes as a function of explanatory variables such as magnitude, distance, and site conditions, along with estimates of variability. Empirical GMMs are commonly used in data-rich regions such as California and Japan; for instance, the second-generation Pacific Earthquake Engineering Research-Next Generation Attenuation-West (NGA-W2) project includes empirical GMMs for crustal earthquakes in active tectonic regions (Boore et al. 2014; Campbell and Bozorgnia 2014; Chiou and Youngs 2014; Idriss 2014), and is widely used in practice (e.g., Petersen et al. 2015).
An alternative method, commonly used in data-poor regions, is to derive a GMM using a simulation-based approach, in which a seismological model is calibrated with a set of empirical data. The advantage of such an approach is that robust magnitude and distance scaling behaviors can be imposed, whilst accommodating regional features that can be determined from limited available data. There are numerous examples of such simulation-based models in practice, including stochastic point-source models (Boore 1983; Atkinson and Boore 1995; Toro et al. 1997; Boore 2003) and finite-source stochastic and broadband simulations (Beresnev and Atkinson 1997; Motazedian and Atkinson 2005; Assatourians and Atkinson 2007; Frankel 2009, 2015). Note that, for stochastic models, simulations are not always required, as a random process statistical model can also be applied. Yenier and Atkinson (2015a, b) developed a regionally adjustable generic GMM based on the concept of equivalent point-source simulations. They derived a robust simulation-based GMM that can be adjusted to different regions by modifying the seismological input parameters (e.g., geometrical spreading, stress parameter, and calibration factor models) and examined the applicability of the model for earthquakes in California and central and eastern North America (Yenier and Atkinson 2015b). The parameters for the generic GMM were originally defined by calibrating a seismological model to match the empirical ground-motion amplitudes recorded in California (Yenier and Atkinson 2015a).
Specifically, Yenier and Atkinson (2015a) used the rich California ground-motion database to define elements of the functional form and calibrate the overall model scaling behavior in magnitude and distance. They also determined the geometric spreading (including a term to model the effects of near-distance saturation), anelastic attenuation, and stress parameter models that describe ground-motion amplitudes for California. The model was parameterized in a way that isolates the effects of magnitude scaling, stress parameter scaling, geometrical spreading, and anelastic attenuation on ground-motion amplitudes, so that the approach can be readily transported to other regions by modifying just a few regional source and attenuation parameters. Figure 1 illustrates the YA15 (Yenier and Atkinson 2015a) equivalent point-source model GMM for California and active crustal regions (for B/C site conditions, V_S30 = 760 m/s), in comparison with the underlying NGA-W2 data used in model development. The generic GMM matches the data as well as strictly empirical GMMs developed from the NGA-W2 database and has the added benefit of being parameterized by simple seismological parameters.

The Generic Ground-Motion Model: A Calibrated Equivalent Point-Source Approach
Atkinson et al. (2015) used the generic equivalent point-source approach of Yenier and Atkinson (YA15; Fig. 1) to develop a GMM for ENA of the form

ln(Y) = F_E + F_Z + c·R + F_S + C,

where ln(Y) is the (natural) logarithm of a ground-motion intensity measure, such as peak ground acceleration (PGA), peak ground velocity (PGV), or 5%-damped pseudospectral acceleration (PSA) at a selected oscillator frequency. F_E, F_Z, and F_S are the model components for earthquake source, geometrical spreading, and site amplification, respectively, and R is the effective point-source distance defined below. The anelastic attenuation (c) and empirical calibration (C) coefficients are frequency dependent; the C term is an empirical constant that scales the simulation amplitudes to match the amplitude of the observations. The source (F_E) and geometrical spreading (F_Z, including near-distance saturation) terms are constrained in their scaling behavior by the equivalent point-source simulations that were validated using the rich empirical database from California (e.g., Fig. 1). Regional ground-motion data for ENA were inverted to determine the anelastic attenuation coefficients (c), the site amplification model (F_S), and the calibration constant (C). The source term (F_E) isolates the effects of magnitude and stress parameter on the ground-motion amplitudes:

F_E = F_M + F_Δσ,

where F_M represents the magnitude scaling term, ignoring near-distance-saturation effects, and F_Δσ represents the stress parameter scaling term. The F_M term is a function of moment magnitude (M) and is defined using a hinged-quadratic functional form that follows an empirical form from data-rich regions (e.g., Boore et al. 2014):

F_M = e0 + e1(M - Mh) + e2(M - Mh)^2   for M ≤ Mh
F_M = e0 + e3(M - Mh)                  for M > Mh,

where the hinge magnitude, Mh, and the model coefficients, e0 to e3, are specified for each oscillator frequency (see "Appendix").
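The hinged-quadratic magnitude scaling can be sketched as follows; note that the hinge magnitude and e-coefficients below are hypothetical placeholders, not the frequency-dependent values tabulated in the Appendix.

```python
# Sketch of the hinged-quadratic magnitude scaling term F_M.
# M_H and the e-coefficients are hypothetical placeholders; the actual
# values are frequency dependent (see the paper's Appendix).
M_H = 5.85
E0, E1, E2, E3 = 2.0, 0.7, -0.1, 0.8

def f_magnitude(M):
    """Hinged-quadratic F_M (near-distance saturation ignored)."""
    dM = M - M_H
    if M <= M_H:
        return E0 + E1 * dM + E2 * dM ** 2
    return E0 + E3 * dM
```

The two branches join continuously at the hinge (both equal e0 at M = Mh), with quadratic scaling below the hinge and linear scaling above it.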
High-frequency ground-motion amplitudes relative to low-frequency amplitudes are controlled by the stress parameter (Boore 2003). The stress adjustment term is defined as

F_Δσ = e_Δσ · ln(Δσ/100),

where e_Δσ describes the rate of the ground-motion scaling with the stress parameter (Δσ, in bars). The values of e_Δσ as determined from the simulations have a variability in magnitude and frequency that is rather complicated, and the shape of the function differs depending on whether one is upscaling or downscaling the stress parameter. The shape can be described by a polynomial in magnitude, with frequency-dependent coefficients s0 to s9. Geometrical spreading effects are modeled using an equivalent point-source distance metric:

R = sqrt(D_rup^2 + h^2),

where h is a pseudodepth term that accounts for distance saturation effects. The pseudodepth term is adopted from inversion results for active regions (Yenier and Atkinson 2015a), for which there are sufficient data to constrain such effects:

h = 10^(-0.405 + 0.235M).

It is important to note that, because we model geometric spreading using an equivalent point-source distance, the geometric spreading term implicitly includes the near-distance saturation effects attributable to finite-fault effects for large events. For small to moderate events, D_rup is approximately equal to the hypocentral distance (D_hypo), and the geometric spreading is that of a classic point source.
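The effective-distance metric can be sketched directly, using the magnitude-dependent pseudodepth adopted from Yenier and Atkinson (2015a):

```python
import math

def pseudodepth(M):
    """Magnitude-dependent pseudodepth h (km), after Yenier and Atkinson (2015a)."""
    return 10 ** (-0.405 + 0.235 * M)

def effective_distance(d_rup, M):
    """Equivalent point-source distance R (km) from rupture distance D_rup (km)."""
    h = pseudodepth(M)
    return math.sqrt(d_rup ** 2 + h ** 2)
```

For a small event the pseudodepth is negligible and R is essentially the rupture (or hypocentral) distance, while for a large event (M 7) h grows to roughly 17 km, which is what produces the near-distance saturation noted above.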
The geometrical spreading function (F_Z) is

F_Z = ln(Z) + (b3 + b4·M)·ln(R/R_ref),

where Z represents the geometrical attenuation of Fourier amplitudes, whilst the magnitude-dependent component, (b3 + b4·M)·ln(R/R_ref), accounts for the change in the apparent attenuation that occurs when ground motions are modeled in the response spectral domain rather than the Fourier domain. R_ref is the reference effective distance, given as R_ref = sqrt(1 + h^2). Z is a hinged bilinear model that provides for a transition from direct-wave spreading to surface-wave spreading of reflected and refracted waves, beyond the critical distance for reflections from the Moho:

Z = R^b1                     for R ≤ Rt
Z = Rt^b1 · (R/Rt)^b2        for R > Rt,

where Rt represents the transition distance (= 50 km), and b1 (= -1.3) and b2 (= -0.5) are the geometrical attenuation rates of Fourier amplitudes at R ≤ Rt and R > Rt, respectively. Note that the coefficients describing geometric spreading and anelastic attenuation can be determined in ENA from empirical data for small to moderate earthquakes. The site effects (F_S) are given relative to a reference site condition, in this case hard rock (travel-time-weighted average shear-wave velocity over the top 30 m, V_S30 ≈ 2000 m/s); this is the site condition corresponding to most seismograph records in eastern Canada. The approach taken in Atkinson et al. (2015) was to use regression to determine site terms directly from the observations, along with the regional coefficients for attenuation.
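Using the parameter values quoted above (Rt = 50 km, b1 = -1.3, b2 = -0.5), the hinged bilinear attenuation Z can be sketched as:

```python
# Hinged bilinear geometrical attenuation of Fourier amplitudes,
# with the transition distance and rates quoted in the text.
R_T, B1, B2 = 50.0, -1.3, -0.5

def z_attenuation(R):
    """Geometrical attenuation Z at effective distance R (km)."""
    if R <= R_T:
        return R ** B1                        # direct-wave spreading
    return R_T ** B1 * (R / R_T) ** B2        # spreading beyond the Moho-reflection transition
```

The form is continuous at Rt, with steeper decay in the direct-wave range and slower decay where reflected and refracted phases dominate.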
The key attribute of the methodology behind the generic GMM is that most of the magnitude and distance scaling terms are fixed by previously calibrated simulation studies in data-rich regions, whilst a select few parameters-specifically the average stress parameter, anelastic attenuation, and calibration constant-are fine-tuned for the region of interest. In other words, we calibrate a well-behaved and validated generic model for a specific region of interest; the calibration can be accomplished using limited data on amplitude levels, site attributes, and attenuation.
The generic GMM developed for rock sites in eastern Canada is illustrated in Fig. 2 and compared with the corresponding functions for California. The comparison is made at high frequencies, for which rock amplitudes are significantly higher in ENA than in California due to a larger average value of stress parameter. At low frequencies (not shown), ENA and California amplitudes agree more closely. Note that the available data range for observations in ENA is limited, so for larger magnitudes the overall scaling behavior is effectively constrained by the underlying seismological model, which was calibrated for events in California of M 3.0-7.5.
Note that, because the generic GMM is based on an equivalent point-source concept, it will not adequately reproduce site-specific finite-fault attributes that would be important for sites that are very near to large potentially active faults. Such features could include directivity effects and strong coherent pulses, for example. Instead, the equivalent point source represents just the average of all such effects, via calibration to the California database. This representation is appropriate for most sites in ENA, for which the hazard is dominated by moderate events occurring on unknown faults within an areal source zone. More complex simulation models may be required for sites at which the hazard is influenced by specific nearby fault sources. Another model limitation is that finite-fault effects such as the near-distance saturation are assumed to be transferable from one region to another. This may not be completely true, since high-stress regions imply smaller faults than low-stress regions, and thus the near-distance saturation effects may be weaker in ENA than those observed in California. Such limitations of the equivalent point-source model with respect to the treatment of finite-fault effects are implicitly considered second-order effects, which we do not attempt to capture in the generic GMM approach. Hassani and Atkinson (2018) further generalized the generic GMM to enhance its usefulness for a wider range of regions and site conditions, by including a new term to account for the effects of the near-surface attenuation parameter (κ0) on the response spectral ground-motion amplitudes (PSA) as well as on the ground-motion peak amplitudes (PGA and PGV):

ln(Y) = F_E + F_Z + c·R + F_S + F_κ0 + C.

The kappa term (F_κ0) models the effects of near-surface high-frequency attenuation (κ0) (Anderson and Hough 1984; Van Houtte et al. 2011) in the response spectral domain.
The interplay between the stress parameter and kappa is what controls the amplitudes of ground motions at high frequencies. This interplay is illustrated in Fig. 3 (see Boore 2003 for details).
The inclusion of a kappa term makes the GMM somewhat more complicated but facilitates adjustment of the κ0 value within the modified generic GMM to model a broader range of regions and reference site conditions. For further details of the use of this form, the reader is referred to Hassani and Atkinson (2018). Another advantage of having a κ0 term within the GMM is the ability to invert for the κ0 value using the response spectral amplitudes. Figure 4 shows the response spectral amplitudes at near-source distance (D_rup = 1 km) for Δσ = 100 bar and different κ0 values, for the adjustment model of Hassani and Atkinson (2018), over a wide range of magnitudes. This illustrates how maximum ground-motion amplitudes (before attenuation by path effects) are influenced by kappa in the response spectral domain. Effects can be pronounced at f > 10 Hz; for example, based on calculations with the model (not shown), for an event of M = 6 having a stress parameter of 300 bar, median 20-Hz PSA at 10 km would be about 490 cm/s² for a very hard rock site with κ0 = 0.002. This scenario is the type of event that contributes significantly to hazard for nuclear sites in ENA situated on hard rock. However, it has been suggested that kappa values on some rock sites may be significantly higher than on others. For the same scenario event at a site for which κ0 = 0.01, the median PSA would be only about 330 cm/s². Thus, kappa is an important high-frequency site parameter for rock sites.
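The strength of the kappa effect can be appreciated from the classic Fourier-domain diminution factor exp(-π·κ0·f) of Anderson and Hough (1984); the sketch below compares the two rock-site κ0 values discussed above (the response-spectral kappa adjustment of Hassani and Atkinson (2018) differs somewhat from this Fourier-domain factor, so this is illustrative only):

```python
import math

def kappa_filter(f_hz, kappa0):
    """Fourier-amplitude diminution factor exp(-pi * kappa0 * f) (Anderson and Hough 1984)."""
    return math.exp(-math.pi * kappa0 * f_hz)

# Ratio of 20-Hz Fourier amplitudes for kappa0 = 0.01 s versus 0.002 s:
ratio = kappa_filter(20.0, 0.01) / kappa_filter(20.0, 0.002)  # ~0.60
```

The roughly 40% reduction at 20 Hz is comparable in magnitude to the PSA difference quoted above (about 330 versus 490 cm/s²), although response-spectral ratios are somewhat damped relative to Fourier ratios.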
The generic GMM as presented in the foregoing is a useful way to encapsulate simple seismological models into a convenient functional form that facilitates the modeling of key source effects (magnitude and stress parameter), path effects (geometric spreading and anelastic attenuation), and site effects (near-surface amplification and kappa), without the need to repeat simulations. It can be calibrated to limited regional ground-motion observations to provide a complete and robust GMM that follows scaling constraints in magnitude-distance space as established by empirical GMMs in data-rich regions. As such, it forms a practical and effective bridge between empirical and simulation-based modeling approaches.

Aleatory Uncertainty
Seismic hazard is driven not only by median ground motions but also by their uncertainty. Uncertainty is, partly by convention, partitioned into components expressing random variability about the median (aleatory uncertainty) and uncertainty regarding the true median values (epistemic uncertainty) (Bommer and Scherbaum 2008;Strasser et al. 2009). These uncertainties imply that there is a significant probability of receiving ground motions much larger than those expressed by the median GMM. The aleatory uncertainty can be appreciated by inspection of Figs. 1 and 2, which show that amplitudes a factor of two or more above the median are not unusual.
The aleatory uncertainty is expressed by sigma, the standard deviation of residuals [defined as the ln(observed) - ln(predicted) amplitudes]. The total sigma can be partitioned into components that express between-event variability (τ) and within-event variability (φ). τ reflects the fact that some events are stronger than others due to their source attributes, such as a higher or lower stress parameter, whilst φ reflects deviations from the median attenuation curve within a single event. φ is sometimes further subdivided into components expressing the within-event variability for a single station (φ_SS) and the site-to-site variability (φ_S2S); φ_S2S represents the systematic deviation of the ground motion at a specific site from the median event-corrected ground motion predicted by the GMM (in which only a general site-class model is included).
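Because the components are independent by construction, they combine in quadrature; a minimal sketch (the component values here are hypothetical, chosen only to illustrate the effect of removing the site-to-site term):

```python
import math

def total_sigma(tau, phi_ss, phi_s2s):
    """Total aleatory sigma from between-event (tau), single-station
    within-event (phi_SS), and site-to-site (phi_S2S) components."""
    return math.sqrt(tau ** 2 + phi_ss ** 2 + phi_s2s ** 2)

# Hypothetical illustration: modeling each site explicitly removes phi_S2S
# from the aleatory budget and lowers total sigma noticeably.
sigma_generic = total_sigma(0.35, 0.45, 0.45)        # ~0.73 ln units
sigma_site_specific = total_sigma(0.35, 0.45, 0.0)   # ~0.57 ln units
```

This arithmetic is what underlies the sigma reductions discussed in the remainder of this section.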
The reader is referred to Al Atik et al. (2010) for details. Note that the values of the aleatory uncertainty components are themselves subject to epistemic uncertainty; that issue is not addressed here. An interesting point is that φ_S2S and τ could both be considered largely epistemic in nature, as they represent systematic departures that may be predictable with improved knowledge. Specifically, φ_S2S is attributable to site-specific amplification, which can be measured relative to the predictions of a GMM for a generic site condition. In the case of τ, the hazard at a specific site may be dominated by a specific source having repeatable source attributes that could, at least in theory, be defined by site- and source-specific studies, leading to low between-event variability for the considered source. Atkinson (2006) showed that, by restricting a site-specific GMM to consider only earthquakes from a single source, the variability was significantly reduced (beyond that obtained by considering a single site). There is strong motivation to reduce sigma to its lowest possible values at nuclear sites, because the value of sigma significantly impacts the PSHA results. For plants that are founded on hard rock, such as those in eastern Canada and some parts of the eastern USA, this can be accomplished by investing in seismic monitoring in the plant region, including in the plant vicinity, and focusing the GMM development on rock sites. In this approach, recorded seismographic data from regional earthquakes are used to calibrate the GMM, with the regression analysis including the derivation of the site-specific amplification term (as a function of frequency) for each site, relative to the GMM. To ensure stable site-specific amplifications, a minimum of five regional events (i.e., M > 3 within a few hundred km) should be recorded at each station.
This will suffice to obtain the linear amplification for each site; the nonlinear component is typically determined separately based on either an empirical or analytical model (e.g., Harmon et al. 2018). An illustration of the approach is provided in Atkinson et al. (2015). They show that the total variability of the resulting GMM, which includes the site-specific amplification model at each station, is 0.50-0.58 ln units for events of small to moderate magnitude (M 3 to 6), recorded at distances to 500 km; this is significantly lower than the corresponding values for GMMs that do not model amplification on a site-specific basis (e.g., Goulet et al. 2017). In contrast to an approach that treats site response attributes directly within the GMM, inappropriate modeling of site effects will result in an inflated value of sigma. This is illustrated in Fig. 5, which was compiled using data on sigma from Hassani and Atkinson (2018) with respect to the rock GMM model for ENA as presented in this paper; some data on sigma for rock sites in the Charlevoix region relative to optimized GMMs from Atkinson (2013) are also shown. When all ENA data, including both rock and soil sites, are used to compute sigma, and we model site response in the GMM using only V_S30 as a predictive variable, we attain high sigma values, about 0.8 ln units at high frequencies, which represents a factor of 2.2 in variability about the median for one standard deviation. If we improve the site model by including the peak frequency of response for the site (f_peak), which in ENA is a more important site variable than V_S30, then we reduce the sigma by about 0.1 ln units at high frequencies. Most of this reduction comes from the φ_S2S component for soil sites.
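The derivation of a site-specific amplification term from recorded residuals can be sketched as a simple average of event-corrected residuals per station; this is a minimal stand-in for the regression used by Atkinson et al. (2015), and the numbers are invented for illustration:

```python
# Event-corrected residuals ln(obs) - ln(pred) at one hypothetical station,
# one per regional event, at a given frequency (invented numbers).
residuals = [0.32, 0.41, 0.28, 0.37, 0.35]  # >= 5 events for a stable estimate

n = len(residuals)
site_term = sum(residuals) / n  # empirical site amplification term (ln units)

# The residual spread about the site term approximates phi_SS at this station.
phi_ss = (sum((r - site_term) ** 2 for r in residuals) / (n - 1)) ** 0.5
```

Once the systematic site term is removed from the residuals, only the much smaller single-station scatter remains in the aleatory budget.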
If we consider only rock-like sites (those with V_S30 > 1000 m/s), and model the site response at each specific station based on the seismographic data recorded at the site, the total sigma drops to values in the range of 0.5-0.6 ln units, or a factor of 1.7 about the median, again due largely to reduction of φ_S2S. Finally, if the GMM uses a magnitude scale based on high-frequency amplitudes (e.g., Nuttli magnitude), instead of moment magnitude, we can reduce the τ component at higher frequencies (Atkinson 1995). As an example, Atkinson (2013) showed that total sigma for rock sites in the Charlevoix region, when modeled using Nuttli magnitude (MN), is about 0.5 ln units. Moreover, these sigma values are attained for small to moderate events, which typically have higher sigma than larger events, due to greater event-to-event variability in source parameters (e.g., Goulet et al. 2017). The conclusion from the foregoing discussion is that sigma could be greatly reduced for nuclear sites on rock, resulting in lower calculated hazard. However, this reduction can only be achieved if a high-quality seismic monitoring network (e.g., broadband seismograph stations) is installed and operated over a period of years, and investment is also made in data analysis and targeted GMM development using the data. This has not been the approach taken in most parts of ENA to date. It should also be noted that, whilst reduction of sigma will lead to reductions in computed hazard for rock sites and other sites with relatively low site amplification, the computed hazard may increase at sites where it is currently being underestimated by non-site-specific approaches.

Epistemic Uncertainty
Epistemic uncertainty in median GMMs has often been modeled using alternative equations (typically those derived by various authors and approaches), with model weights in a PSHA logic tree being used to represent the relative confidence in each alternative. However, this is not necessarily the best way to model epistemic uncertainty in GMMs (Bommer and Scherbaum 2008;Atkinson 2011;Atkinson and Adams 2013;Atkinson et al. 2014). An alternative often used in site-specific studies is to define a representative or central-branch GMM, along with upper and lower variants that express uncertainty about the central model. This approach offers more flexibility in expressing uncertainty in knowledge of the correct median GMM. The representative equation approach also has significant practical utility, enabling a complex problem to be represented by a minimum number of branches for hazard calculations, which is efficient and transparent. A drawback is that a significant degree of judgment need be exercised regarding the selection of the central model and its upper and lower branches. However, such subjective judgments are equally important when using the alternative-GMM approach, as the selection and weighting of alternative models is also a process based on subjective judgment. To get around the drawbacks of both the representative suite and alternative GMM approaches, a more sophisticated and objective approach to representing model alternatives, based on Sammon's mapping of predicted amplitudes in higher-order dimensions, has been used in some projects, such as the NGA-East project (Goulet et al. 2017). This is a powerful approach but not easy to implement; it is also cumbersome to adjust the model on a site-specific basis as more information is obtained.
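A representative-suite logic tree can be sketched as a central median model with weighted upper and lower variants; the epistemic shifts and weights below are hypothetical placeholders, not recommendations:

```python
import math

# Hypothetical three-branch representative suite: epistemic shifts (ln units)
# applied to the central-branch median, with logic-tree weights summing to 1.
branches = [(-0.4, 0.2), (0.0, 0.6), (+0.4, 0.2)]  # (ln-shift, weight)

def branch_medians(central_ln_psa):
    """Median PSA (same units as exp of input) for each epistemic branch."""
    return [(math.exp(central_ln_psa + shift), weight) for shift, weight in branches]
```

In a PSHA, each branch is run as an alternative median GMM and the hazard curves are combined using the weights; the range spanned by the shifts, not the branch count, is what controls the result.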
To some extent, the details of the method used to represent epistemic uncertainty may not be of critical importance. Sensitivity tests indicate that it is the range covered by the GMM models and their relative weights that are important to the PSHA results, not the mechanics of how they are treated (Atkinson and Adams 2013). An additional consideration is that the division of ground-motion uncertainty into its epistemic and aleatory components is ambiguous and nonunique, because some factors of the total uncertainty could be cast into either component (Strasser et al. 2009). In contemplating the epistemic versus aleatory subdivision, a factor to consider is that, whilst GMMs cover a range of possible magnitude-distance scenarios, the actual "design event" that an individual structure may be required to withstand is really a single unknown future event that will have specific source, path, and site attributes, at some sigma level. This is because large potentially damaging earthquakes are rare events, and most structures will be expected to withstand only one such event in their design life. Moreover, for a nuclear plant, there would be inspection and repair immediately following any strong event and a reset of the facility's capacity. We do not know in advance the specifics of the event that the plant may experience, but we can model this uncertainty within the context of PSHA in a few ways. The simplest, assuming we also wish to use site-targeted aleatory uncertainty as described in the previous section, is to use the representative suite approach to define the epistemic uncertainty about the central-branch GMM.
In concept, the inter-event components of uncertainty (most of the τ component of aleatory variability) could be largely modeled as epistemic, reflecting uncertainty in the source characteristics that are expected to be realized in future events through the alternative GMM branches. In this case, the aleatory uncertainty would represent just the variability of observations about a median event-specific prediction equation for a single station (i.e., φ_SS). Considering the variability components of Fig. 5, careful source-driven modeling of epistemic uncertainty might reduce the aleatory component to the range of 0.4-0.5 ln units. Such an approach presupposes that we can model epistemic uncertainty in ground motions from future events through definition of the distribution of earthquake source, path, and site parameters of the GMM. The generic GMM approach outlined here is a practical way to implement such distributions in Monte Carlo PSHA software, in which simulated earthquake catalogs and their generated ground motions are used to calculate the ground-motion distribution at a site (Musson 1999; Assatourians and Atkinson 2013). For each simulated earthquake magnitude and location, we would draw (by Monte Carlo, from a defined distribution) a value of stress parameter and a value of total attenuation from source to site (frequency dependent, including geometric spreading and anelastic effects). The aleatory uncertainty for calculations at a specified site (with known response characteristics from network observations) would be that attributable to φ_SS. This approach would allow PSHA to be more site specific in its application of epistemic and aleatory uncertainties.
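The per-event sampling described above can be sketched as follows; the distribution parameters (a lognormal stress parameter with a median of 300 bars, and a normal perturbation on total ln attenuation) are hypothetical placeholders:

```python
import math
import random

random.seed(1)

def draw_event_parameters():
    """Draw a stress parameter (bars) and a total-attenuation perturbation
    (ln units) for one simulated event; distribution parameters are hypothetical."""
    # Lognormal stress parameter: median 300 bars, sigma_ln = 0.5.
    stress = math.exp(math.log(300.0) + random.gauss(0.0, 0.5))
    # Normal perturbation on total ln attenuation from source to site.
    atten_eps = random.gauss(0.0, 0.2)
    return stress, atten_eps

# One draw per event in a simulated earthquake catalog.
draws = [draw_event_parameters() for _ in range(10000)]
```

Each draw would feed the generic GMM's F_Δσ and attenuation terms for that simulated event, so that source and path variability are carried as sampled epistemic distributions; the residual aleatory scatter applied about each realization is then just φ_SS.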

Concluding Remarks
This paper has dealt with the use of conventional GMMs to define ground motion and its epistemic and aleatory uncertainty, within the context of contemporary PSHA methodology. In the longer term, a truly site-specific PSHA would be based on simulations that fully consider the source, path, and site attributes that govern ground motions for all potential future earthquake scenarios, eliminating the need for GMMs altogether (e.g., Atkinson 2012). However, such an approach will require substantial further improvement of our knowledge of earthquake source, path, and site processes. Until such advances can be achieved, GMMs that bridge seismological and empirical approaches are an effective tool to represent the distribution of ground motion and its uncertainty in seismic hazard assessment.