1 Article Highlights

  • The article introduces a framework for the development of GMMs for induced seismicity, with a case study on a UK shale gas site.

  • A stochastic model calibration algorithm is used to obtain optimised input parameters consistent with available data.

  • The article provides a suite of simulation models to account for the epistemic uncertainty in the process.

  • A physically adjusted GMPE was constructed using a mixed-effects regression technique applied to both empirical and simulated data.

2 Introduction

The growth of unconventional oil and gas resources and, more recently, green subsurface injection technologies, such as geothermal and Carbon Capture and Storage, has led to a rapid increase in the occurrence of induced seismicity. In rare cases, this has been shown to result in significant economic and social impacts (e.g. McGarr et al. 2015; Langenbruch et al. 2020). The potential for induced seismicity to cause damage to infrastructure and impact public safety alone highlights the need for accurate risk assessments (Hong et al. 2022). However, the potential nuisance aspect of induced seismicity in urban environments presents an additional emerging requirement to understand weak-motion ground shaking (Schultz et al. 2021). Ground motion models (GMMs) for induced seismicity therefore require reliable estimates of ground shaking from magnitudes where shaking near the epicentre may be felt and considered a nuisance (roughly \(M_W \ge 2\)) through to potentially damaging events (with the largest injection-induced event being the 2016 \(M_W\) 5.8 Pawnee, Oklahoma earthquake). Selecting or developing a suitable GMM is therefore crucial for modeling the ground motion in the region of interest. Strategies and criteria for selecting and adjusting existing GMMs for specific target regions aim to obtain a set of equations that can accurately represent the anticipated range of potential ground motions in the target area. Additionally, the adjusted GMMs should eliminate the impact of the systematic geological differences between host and target regions (e.g. Cotton et al. 2006; Bommer and Stafford 2020; Edwards and Douglas 2013). However, blind parametric adjustment of existing GMMs does not provide a satisfactory solution, as this typically results in over-fitting of limited local datasets. Adjustment should therefore be undertaken with caution and alongside careful analysis of the available data.

GMMs for injection-induced earthquakes should provide robust estimates of ground motion intensity from small- to moderate-magnitude, shallow-focus events (\(M_{W}\le 6\); focal depth \(Z_{top} \le 5\) km; and distance \(\le 30\) km). This distance-magnitude-depth range is poorly covered by existing ground motion models. Another challenge associated with the development of GMMs for induced earthquakes is that, with a focus on smaller magnitude shallow events, regional disparities (such as wave propagation and attenuation) become more apparent (Chiou et al. 2010). Thus, direct interpolation from existing ground motion prediction equations (GMPEs) generally leads to unsatisfactory predictions. For instance, as shown by Bommer et al. (2010), there is a systematic tendency to over-predict motions at the lower limits of GMPEs’ valid magnitude range (and extrapolation below) (e.g. Bommer et al. 2007; Atkinson and Morrison 2009). As a result, indirect approaches (e.g. Bommer et al. 2022b), often involving numerical simulations (Edwards et al. 2019b), are preferred. A further advantage is that such approaches allow flexibility in incorporating various levels of complexity, such as site-specific non-linear amplification. Fundamentally, this approach therefore offers the ability to incorporate the physical mechanisms that generate ground motion.

Fig. 1 Framework for developing a new ground motion model in this study

This paper demonstrates the development of a physically-based GMM for induced seismicity to improve seismic hazard assessment associated with shale gas sites at Preston New Road (PNR), Blackpool, United Kingdom. A framework to develop the GMM is introduced in Fig. 1. The main steps in this framework can be divided into three stages: 1) stochastic simulation of ground motions using an a priori seismological model; 2) calibration of optimal model parameters; and 3) development of a hybrid-GMM. In the initial stage, we collect the data and define the prior seismological model. The prior considers seismic attenuation and site diminution (\(\kappa _{0}\)) obtained from spectral analysis of PNR-induced seismicity records from 2018-2019 (Suroyo and Edwards 2023). Their model was developed based on a total of 192 weak-motion earthquakes associated with hydraulic fracturing (HF) activity at PNR. Of these, 57 events with magnitudes in the range \(-0.8 \le M_{L} \le 1.5\) were linked to HF at the first well, PNR-1z, while 135 events (\(-1.7 \le M_{L} \le 2.9\)) were induced by HF in the second well (PNR-2) in 2019. These were shallow earthquakes located at depths of up to 4 km and were well recorded over epicentral distances up to \(\sim \) 25 km (Fig. 2). By considering these physical models as a prior, we simulate preliminary ground motions corresponding to the PNR dataset (Fig. 2) using the stochastic simulation tool SMSIM (Boore 2003). To account for local site effects, we perform site correction following Boore et al. (2014), which is also described in detail in Seyhan and Stewart (2014). The prediction from the stochastic model was evaluated by determining the bias between simulated and observed ground motions. We additionally compare the observed bias with that resulting from predictions using other GMPEs (e.g. Atkinson 2015; Edwards et al. 2021).
In Stage 2, the simulation model was refined, which also aimed to explore its epistemic uncertainty by producing a suite of acceptable model parameters. Finally, pseudo-finite-fault stochastic simulations for events with \(1 \le M \le 6\) recorded at distances up to 30 km (synthetic dataset) were computed using the optimally calibrated model. Combining the simulated ground motions in the synthetic dataset with the empirical data, we subsequently determine a hybrid-GMM (Stage 3) using a mixed-effects regression. This is presented in the form of a GMPE to facilitate rapid calculations in hazard analyses. The performance of the ground motion predictions at the various stages of model development is also presented. The results of the study will be useful in improving hazard assessment models for the shale gas site, which will enable more accurate risk management and mitigation strategies.

Fig. 2 Magnitude-distance distribution of the Preston New Road empirical event catalogue, classified by depth range (upper panel) and station network (lower panel). LV: University of Liverpool; UR: British Geological Survey; SD and PNR: Cuadrilla PNR-1z and PNR-2 TLS networks, respectively

3 Stochastic earthquake ground motion prediction

Conventional GMPEs are developed through non-linear regression of strong motion data recorded in an area of interest. This approach is problematic when we have limited data. Although empirical GMPEs are often relatively simple and easy to use, these models also have limitations in extrapolating to areas or conditions outside the range of observed data. Keeping this in mind, predicting ground motion for induced seismicity, with specific focus on highly variable motions from small and moderate events at short distances (Edwards and Douglas 2013), is very challenging. Moreover, induced earthquakes may have different physical characteristics and wave propagation effects (due to properties of the shallow crust), to those applicable to natural earthquakes. However, due to the simplification of empirical-based models, any correction based on physical principles is inherently difficult to perform. Thus, physically-based adjustments to GMPE functional forms are often needed, as opposed to only changing the coefficients of an existing GMPE. This was highlighted by Edwards et al. (2021), during development of a calibrated empirical GMPE for PNR, who noted that changes to the functional form would be required to further improve their model. However, they deemed this to be impossible to constrain using the limited distance and magnitude range of data available.

Stochastic ground motion simulation has the potential to overcome such limitations. Apart from offering flexibility where we can simulate ground motions for various scenarios, the stochastic approach allows us to overcome the problem of data gaps or limited data. A commonly used stochastic method is provided by Boore (2003) (Stochastic-Method SIMulation or SMSIM), which is a simple yet powerful tool to simulate ground motion by combining the parametric or functional form of the ground motion’s spectral amplitude with a random phase. This approach relies on a seismological model, which gives a prediction of the frequency content of earthquake ground motion in the form of Fourier amplitudes (Boore 2003; Lam 2023). This is where the physics of the earthquake process and wave propagation are contained. The Fourier spectra are combined with a set of random phase angles to generate a synthetic accelerogram using an inverse Fourier Transform. From the generated synthetic accelerogram, ground motion intensity parameters (e.g. peak velocity, acceleration) can be determined. Rather than computing full time histories, peak ground motion parameters can, in practice, be determined using random vibration theory (RVT) (e.g. Edwards and Fäh 2013; Drouet and Cotton 2015). The benefit of GMMs underpinned by FAS seismological models is that they can be simply linked to physical processes (Baltay et al. 2017) and are therefore easier to modify and more defensible than empirical GMPEs (Bora et al. 2014). Moreover, it allows us to provide distributions in the underlying physical processes themselves, or parametrisation thereof, rather than rely on the characterisation of GMPE coefficients.
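
The random-phase idea described above can be illustrated with a minimal sketch. It is not the SMSIM procedure itself (which shapes windowed noise and normalises its spectrum); it simply attaches uniformly random phases to a prescribed one-sided amplitude spectrum and inverts it. The function name `simulate_accelerogram` and the toy target spectrum are assumptions made for illustration only.

```python
import numpy as np


def simulate_accelerogram(target_amplitude, n_samples, seed=None):
    """Random-phase time series whose one-sided amplitude spectrum is the target."""
    rng = np.random.default_rng(seed)
    amplitude = np.asarray(target_amplitude, dtype=float)
    n_freq = n_samples // 2 + 1
    if amplitude.shape != (n_freq,):
        raise ValueError("target_amplitude must have n_samples // 2 + 1 values")
    phase = rng.uniform(0.0, 2.0 * np.pi, n_freq)
    spectrum = amplitude * np.exp(1j * phase)
    spectrum[0] = 0.0  # zero-frequency term: no static offset
    if n_samples % 2 == 0:
        spectrum[-1] = amplitude[-1]  # the Nyquist term must be real
    return np.fft.irfft(spectrum, n=n_samples)


# Toy target spectrum (hypothetical shape and values, for illustration only).
n_samples, dt = 1024, 0.01
freqs = np.fft.rfftfreq(n_samples, d=dt)
target = freqs**2 / (1.0 + (freqs / 5.0) ** 2) * np.exp(-np.pi * freqs * 0.03)
acc = simulate_accelerogram(target, n_samples, seed=1)
peak = np.max(np.abs(acc))  # peak amplitude of this single realisation
```

Each seed gives a different realisation with the same amplitude spectrum, so peak values would normally be averaged over many realisations or estimated with RVT.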

3.1 Description of the model

SMSIM is used to simulate ground motion records at the PNR site in terms of peak ground velocity (PGV), acceleration (PGA), and \(5\%\) damped response pseudospectral acceleration (PSA). The flexibility of the stochastic model is based on the ability to break down the total spectrum of the motion into a seismological model containing source, path, and site terms. They are mathematically represented by the following functional form of the acceleration spectrum:

$$\begin{aligned} \begin{aligned} A(f) = C \cdot M_{0} \cdot E(f)\cdot P(R,f)\cdot S(f) \end{aligned} \end{aligned}$$
(1)

where \(M_{0}\), E(f), P(R, f), and S(f) are the seismic moment, earthquake source function, medium effect (decay of the acceleration spectra due to geometrical spreading, anelastic attenuation, and elastic attenuation), and site effect, respectively. The constant C in Eq. 1 is given by:

$$\begin{aligned} \begin{aligned} C = \frac{R_{\theta \phi } \cdot FS \cdot PRTITN}{4 \pi \rho \beta ^3} \end{aligned} \end{aligned}$$
(2)

where \(R_{\theta \phi }\) is the radiation pattern, for which we use an average value of 0.55 (Boore and Boatwright 1984); FS is the amplification due to the free surface (\(FS=2\)); PRTITN is the reduction factor that accounts for the partitioning of energy into two horizontal components (taken as \(1/\sqrt{2}\) here); and \(\rho \) and \(\beta \) are the density and shear wave velocity, respectively.
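
As a quick numerical illustration of Eq. 2, the sketch below evaluates C exactly as written, assuming SI units throughout. The density and shear wave velocity used in the example call are hypothetical placeholders, not the values adopted for PNR, and the function name is ours.

```python
import math


def spectral_constant(rho, beta, radiation=0.55, free_surface=2.0,
                      partition=1.0 / math.sqrt(2.0)):
    """Eq. 2 as written: C = R * FS * PRTITN / (4 * pi * rho * beta**3)."""
    return radiation * free_surface * partition / (4.0 * math.pi * rho * beta**3)


# Hypothetical crustal properties (kg/m^3 and m/s), for illustration only.
c_example = spectral_constant(rho=2800.0, beta=3500.0)
```

Because C scales with \(\beta ^{-3}\), the choice of source-region shear wave velocity has a strong influence on the absolute spectral level.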

3.1.1 Source parameters

Source spectra of earthquakes are primarily controlled by seismic moment (\(M_{0}\)) and corner frequency (\(f_{c}\)), itself a product of the stress drop (Boore 1983). The source spectral shape is given by:

$$\begin{aligned} \begin{aligned} E(f) = \frac{1}{\left( 1+\left( \frac{f}{f_{c}}\right) ^{pf}\right) ^{pd}} \end{aligned} \end{aligned}$$
(3)

where \(f_{c}\) is the source corner frequency, pf is the frequency parameter that influences how rapidly the source spectrum changes with frequency, and pd is the decay parameter that controls how quickly the spectrum decays as the frequency increases above \(f_{c}\) (Atkinson and Boore 1998). We follow the single-corner spectrum model by Brune (1970), where \(pf=2\) and \(pd=1\).

\(f_{c}\) and \(M_{0}\) can be related through a stress drop (in Pascals) equation using Brune (1970, 1971) and Eshelby (1957):

$$\begin{aligned} \begin{aligned} \Delta \sigma = f_{c}^{3} M_{0} / (0.4906 \beta )^{3} \end{aligned} \end{aligned}$$
(4)

where \(M_{0}\) is expressed in N m as (Hanks and Kanamori 1979):

$$\begin{aligned} \begin{aligned} M_{0}= 10^{1.5 M_W +9.05} \end{aligned} \end{aligned}$$
(5)
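
Equations 3 to 5 can be chained in a few lines. The sketch below assumes SI units (stress drop in Pa, \(\beta \) in m/s, \(M_{0}\) in N m), and the function names are ours rather than SMSIM's.

```python
import numpy as np


def seismic_moment(mw):
    """Eq. 5: seismic moment (N m) from moment magnitude."""
    return 10.0 ** (1.5 * mw + 9.05)


def corner_frequency(mw, stress_drop_pa, beta_ms):
    """Eq. 4 solved for f_c (Hz); stress drop in Pa, beta in m/s."""
    return 0.4906 * beta_ms * (stress_drop_pa / seismic_moment(mw)) ** (1.0 / 3.0)


def source_shape(f, fc, pf=2.0, pd=1.0):
    """Eq. 3; the defaults give the single-corner Brune shape."""
    f = np.asarray(f, dtype=float)
    return 1.0 / (1.0 + (f / fc) ** pf) ** pd


# Example with a hypothetical shear wave velocity of 3500 m/s.
fc_example = corner_frequency(mw=2.0, stress_drop_pa=9.7e6, beta_ms=3500.0)
```

By construction the Brune shape equals 0.5 at \(f = f_{c}\), which gives a simple check on any implementation.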

In practice, moment magnitude (\(M_{W}\)) is preferably used rather than a seismic moment as a more familiar measure of earthquake size. The earthquake sizes in the surface data catalogue used for PNR were specified in terms of local magnitude (\(M_{L}\)), and need to be converted to \(M_{W}\). We follow the \(M_{L}\)-\(M_{W}\) relationship described by Edwards et al. (2021) for smallest events (\(M_{W} \le 1.5\)), and Grünthal et al. (2009) for larger events.

$$\begin{aligned} M_{W} = {\left\{ \begin{array}{ll} \frac{2}{3} M_{L} +0.833, & M_{W} \le 1.5\\ 0.0376 M_{L}^{2} +0.646 M_{L}+ 0.53, & M_{W} > 1.5 \end{array}\right. } \end{aligned}$$
(6)

Source parameter models ( E(f) ) were estimated in terms of stress parameter following the spectral fitting approach (Edwards et al. 2008) for the PNR dataset (Suroyo and Edwards 2023). The values are determined as the mean value of stress parameter from all recordings, which are 0.15 megapascal (MPa) or 1.5 bar for \(M_{W} \le 1.5\) (notated as \(\Delta \sigma _{lower}\) in Eqs. 7 and 8) and 9.7 MPa or 97 bar for \(M_{W} > 1.5\) (\(\Delta \sigma _{upper}\) in Eqs. 7 and 8). These two values then form the lower and upper values in our stress parameter model. Criteria of the stress parameter model and definition of the values between the lower bound and upper bound are described in Eqs. 7 and 8 (units in bar). These lower and upper bounds are used as the initial input parameter model.

$$\begin{aligned} & \Delta \sigma =\nonumber \\ & {\left\{ \begin{array}{ll} \Delta \sigma _{upper}, & \!\!\!M_{W} \!>\! M_{upper},\\ \Delta \sigma _{lower}, & \!\!\!M_{W} \!<\! M_{lower},\\ \sigma _{scale} \Delta \sigma _{lower} 10^{\Delta \sigma _{(i)inter} ( M_{W}-M_{lower})} , & \!\!\!M_{lower}\!\le \! M_{W} \!\le \! M_{upper}. \end{array}\right. } \end{aligned}$$
(7)
$$\begin{aligned} \begin{aligned} \Delta \sigma _{(i)inter}= \frac{log(\Delta \sigma _{(i)inter}) - log(\Delta \sigma _{lower})}{(M_{upper} - M_{lower})} \end{aligned} \end{aligned}$$
(8)

3.1.2 Path effect parameters

The decay of A(f) due to geometrical spreading, scattering, and absorption is represented by:

$$\begin{aligned} \begin{aligned} P(R,f)= e^{-\pi f ( \,\frac{R}{\beta Q(f)}) \,}{G(R)} \end{aligned} \end{aligned}$$
(9)

where G(R) is the geometrical spreading over the distance R, and Q(f) is the quality factor describing the scattering and absorption effects of the medium. Geometrical spreading (G(R)) leads to a decrease in amplitude over distance (R), represented by \(\gamma \), the rate of geometrical decay (see Eq. 10). The value of \(\gamma \) depends on the wave type and distance. For induced seismicity, it has been noted that near-field motions tend to decay more rapidly than 1/R (e.g. Atkinson 2015; Edwards et al. 2019b). To account for this, geometrical spreading (G(R)) is described by a piecewise series of linear decays in log-log space:

$$\begin{aligned} G(R) = {\left\{ \begin{array}{ll} \left( \frac{R_{0}}{R}\right) ^{-\gamma _{0}}, & R \le R_{1},\\ G(R_{1})\left( \frac{R_{1}}{R}\right) ^{-\gamma _{1}}, & R_{1} < R \le R_{2},\\ \vdots & \\ G(R_{n})\left( \frac{R_{n}}{R}\right) ^{-\gamma _{n}}, & R_{n} \le R. \end{array}\right. } \end{aligned}$$
(10)

The anelastic effect of the medium is represented by the attenuation quality factor (Q(f)), which was modelled by Suroyo and Edwards (2023). Simplifying the path effect term (Eq. 9), the attenuation along the ray path can be written as:

$$\begin{aligned} \begin{aligned} P(f)= e^{-\pi f^{1-\alpha } t^{*}} \end{aligned} \end{aligned}$$
(11)

where \(t^{*}\) is the path-average attenuation at a reference frequency (1 Hz), defined by:

$$\begin{aligned} \begin{aligned} t^{*}\!=\!\! \int _{R} \frac{dR}{Q_{0}( \, R) \, \beta ( \, R) \, }+ \kappa _{0}\simeq \!\frac{R_{hyp}}{Q_{0}\beta } + \kappa _{0} =\! \frac{T}{Q_{0}}+ \kappa _{0} \end{aligned} \end{aligned}$$
(12)

T is the travel time and \(Q_{0}\) is the average quality factor along the ray path (e.g., Rietbrock et al. 2013). \(\kappa _{0}\) is the intercept at zero distance on \(t^{*}\)-distance plot which describes residual site-specific exponential decay (Anderson and Hough 1984).
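
Equations 11 and 12 translate directly into code. The following sketch uses the homogeneous approximation \(t^{*} \simeq R_{hyp}/(Q_{0}\beta ) + \kappa _{0}\) and applies the frequency exponent exactly as written in Eq. 11. The values of \(Q_{0}\), \(\beta \), and \(\kappa _{0}\) in the example are hypothetical and are not the PNR model values.

```python
import numpy as np


def t_star(r_hyp_km, q0, beta_kms, kappa0):
    """Eq. 12 (approximate form): path-average attenuation plus kappa_0, in seconds."""
    return r_hyp_km / (q0 * beta_kms) + kappa0


def path_attenuation(f, r_hyp_km, q0, beta_kms, kappa0, alpha=0.0):
    """Eq. 11: exp(-pi * f**(1 - alpha) * t_star)."""
    f = np.asarray(f, dtype=float)
    return np.exp(-np.pi * f ** (1.0 - alpha) * t_star(r_hyp_km, q0, beta_kms, kappa0))


# Hypothetical inputs: Q0 = 100, beta = 3.5 km/s, kappa_0 = 0.03 s.
freqs = np.array([0.0, 1.0, 10.0, 30.0])
decay = path_attenuation(freqs, r_hyp_km=35.0, q0=100.0, beta_kms=3.5, kappa0=0.03)
```

With \(\alpha = 0\) the expression reduces to the familiar frequency-independent Q form, in which attenuation grows exponentially with frequency.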

The approach followed by Suroyo and Edwards (2023), which provides our initial seismic attenuation model, can be summarized in two stages. The first stage was an initial inversion of the amplitude spectra, which aimed to determine the optimum frequency dependency of Q(f), defined by \(\alpha \). Alongside this, \(\kappa _{0}\) was determined using the inversion for \(\alpha =0.0\) (i.e. using frequency-independent Q, consistent with the definition of Anderson and Hough (1984)). In the second stage, the inversion was performed under the premise \(\alpha = \alpha _{min}\) obtained from the previous step. A grid search of \(f_{c}\) between 0 and 50 Hz was applied to find a common source corner frequency (and therefore stress parameter) for each event, along with other parameters of the seismological model.

3.1.3 Site effect

Local site geology is often considered as a basic predictor of ground motion amplification at a specific site during an earthquake. For the site effect parameters, we consider the site amplification at a reference rock horizon, combined with near-surface site conditions defined by \(V_{S30}\), the travel-time average shear wave velocity over the upper 30 m. Site amplification is specified by a series of straight lines in the frequency versus log amplification space (refer to Boore (2003) for further information about SMSIM), while site diminution is represented by \(\kappa _{0}\) (Anderson and Hough 1984).

The site amplification function at a reference horizon was adapted from an empirical model derived for the Groningen gas field. This model accounts for crustal amplification between 3 km and a reference horizon at 0.8 km depth. The motions are then transformed to the ground surface through convolution with local site amplification factors. To account for near-surface site amplification in GMPEs, \(V_{s30}\) is commonly used as a proxy. Higher \(V_{s30}\) indicates stiffer soils or rocks that tend to attenuate weakly, while lower \(V_{s30}\) corresponds to softer soils, which tend to amplify long-period motions and strongly attenuate short-period motions. We adopt the non-linear amplification model described in Boore et al. (2014). The linear component of the site amplification model (\(F_{lin}\)) can be described as:

$$\begin{aligned} \begin{aligned} ln(F_{lin}) = c . ln\left( \frac{min(V_{s30},V_{c})}{V_{ref}}\right) \end{aligned} \end{aligned}$$
(13)

where \(V_{s30}\) represents the property of the site under investigation, coefficient c describes the \(V_{s30}\) scaling, \(V_{c}\) is the limiting velocity beyond which ground motions no longer scale with \(V_{s30}\), and \(V_{ref} \) is the site condition for which the amplification is unity (Boore et al. 2014). c and \(V_{c}\) are period-dependent coefficients defined through regression of empirical data by Boore et al. (2014). The non-linear component of site response is defined as:

$$\begin{aligned} \begin{aligned} ln(F_{nlin}) = f_1 +f_2 ln\left( \frac{PGA_{rock} +f_3}{f_3}\right) \end{aligned} \end{aligned}$$
(14)

where \(f_1\), \(f_2\), and \(f_3\) are the model coefficients and \(PGA_{rock}\) is the PGA for reference rock. The coefficient \(f_2\) represents the degree of non-linearity and is described as follows:

$$\begin{aligned} f_2= & f_4 [exp \left( f_5[ min(V_{s30},V_{c})-360]\right) \nonumber \\ & - exp\left( f_5 (760-360)\right) ] \end{aligned}$$
(15)

The model coefficients of \(f_1\), \(f_3\), \(f_4\), and \(f_5\), as presented by Boore et al. (2014) are provided in the Supplementary document.
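
To show how Eqs. 13 to 15 fit together, the sketch below codes them as written. The coefficient values in the example are placeholders chosen for illustration only; the period-dependent values actually used are those tabulated in the Supplementary document. The default \(V_{ref}=760\) m/s is an assumption consistent with the 760 appearing in Eq. 15.

```python
import math


def ln_f_lin(vs30, c, vc, vref=760.0):
    """Eq. 13: linear site term (natural log of amplification)."""
    return c * math.log(min(vs30, vc) / vref)


def f2_coefficient(vs30, f4, f5, vc):
    """Eq. 15: degree of non-linearity as a function of Vs30."""
    return f4 * (math.exp(f5 * (min(vs30, vc) - 360.0))
                 - math.exp(f5 * (760.0 - 360.0)))


def ln_f_nlin(pga_rock, vs30, f1, f3, f4, f5, vc):
    """Eq. 14: non-linear site term (natural log of amplification)."""
    f2 = f2_coefficient(vs30, f4, f5, vc)
    return f1 + f2 * math.log((pga_rock + f3) / f3)


# Placeholder coefficients (illustrative only, not the tabulated values).
coef = dict(c=-0.6, vc=1500.0, f1=0.0, f3=0.1, f4=-0.15, f5=-0.00701)
ln_amp_soft = (ln_f_lin(200.0, coef["c"], coef["vc"])
               + ln_f_nlin(0.1, 200.0, coef["f1"], coef["f3"],
                           coef["f4"], coef["f5"], coef["vc"]))
```

At \(V_{s30}=760\) m/s both terms vanish (when \(f_1=0\)), so the reference condition returns unit amplification regardless of the rock PGA.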

3.1.4 Duration

The duration of ground motion in stochastic simulations influences the amplitude of the simulated ground motion. It is assumed that the energy in the target spectrum is distributed randomly over a specified duration. In practice, this is assumed to be related to the ‘significant duration’, which is the interval over which a defined portion of the total energy in the record is accumulated. The lower and upper bounds of this interval can vary depending on the application. For instance, the interval between \(5\%\) and \(75\%\) of the total Arias intensity of the record is more likely to isolate the strongest portion of the energy corresponding to shear waves (Bommer et al. 2016). We follow the duration model proposed for Groningen by Bommer et al. (2016), which has been tuned to predict duration over the short distances that are relevant for this study. This duration model is an adaptation of the Afshari and Stewart (2016) model, which was derived from recordings of tectonic earthquakes (Bommer et al. 2016). It is noted that for small magnitudes, the Groningen model captures a shorter duration close to the epicenter and a more rapid increase in duration with distance than modelled by Afshari and Stewart (2016). The adapted Afshari and Stewart (2016) model can be written as Eq. 19, considering the source (\(F_{E}\)), path (\(F_{P}\)), and site (\(F_{S}\)) contributions (Eqs. 16, 17 and 18, respectively) (Bommer et al. 2016).

$$\begin{aligned} F_{E}(M,\Delta \sigma )= & max [0.014374 (\Delta \sigma / \Delta \sigma _{scaling})^{-1/3} \nonumber \\ & \times exp(0.85093 \times M_{W}), 0.66093] \end{aligned}$$
(16)

where \(\Delta \sigma \) reflects the local stress parameter (in units of bars), and \(\Delta \sigma _{scaling} = 150\) bar represents the stress parameter for tectonic motions in Afshari and Stewart (2016). In the Groningen application, the source duration is scaled to \(\Delta \sigma =50\) bar (Bommer et al. 2016). The path and site components of duration are given by:

$$\begin{aligned} F_{P}(R_{epi})= 0.73756 \times ln(\sqrt{R_{epi}^{2} + 1.2631^{2}} - 0.17225) \end{aligned}$$
(17)
$$\begin{aligned} F_{S}(V_{s30})= -0.2246 \times ln[\frac{min(V_{s30}, V_{1})}{V_{ref}}] \end{aligned}$$
(18)

where \(V_{1}=600 m/s\) and \(V_{ref}=368.2m/s\) (Afshari and Stewart 2016; Bommer et al. 2016). The final duration model of the Groningen field can be written as:

$$\begin{aligned} D_{total}=exp[ln(F_{E}(M,\Delta \sigma )) + F_{P}(R_{epi}) + F_{S}(V_{s30})] \end{aligned}$$
(19)
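
The duration model of Eqs. 16 to 19 can be evaluated with the short sketch below, which codes the equations as written (stress parameter in bar, epicentral distance in km, \(V_{s30}\) in m/s). The function names and the example scenario are ours.

```python
import math


def f_source(mw, stress_bar, stress_scaling_bar=150.0):
    """Eq. 16: source term with its lower floor of 0.66093."""
    scaled = (0.014374 * (stress_bar / stress_scaling_bar) ** (-1.0 / 3.0)
              * math.exp(0.85093 * mw))
    return max(scaled, 0.66093)


def f_path(r_epi_km):
    """Eq. 17: path term."""
    return 0.73756 * math.log(math.sqrt(r_epi_km**2 + 1.2631**2) - 0.17225)


def f_site(vs30, v1=600.0, vref=368.2):
    """Eq. 18: site term, constant for Vs30 above V1."""
    return -0.2246 * math.log(min(vs30, v1) / vref)


def total_duration(mw, stress_bar, r_epi_km, vs30):
    """Eq. 19: total duration in seconds."""
    return math.exp(math.log(f_source(mw, stress_bar))
                    + f_path(r_epi_km) + f_site(vs30))


# Example scenario (illustrative): Mw 2, 50 bar, 5 km, Vs30 = 200 m/s.
d_example = total_duration(mw=2.0, stress_bar=50.0, r_epi_km=5.0, vs30=200.0)
```

For small magnitudes the source term sits on its floor, so the predicted duration is then controlled entirely by distance and site condition.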

The Afshari and Stewart (2016) duration model represents the total duration (including path and source duration), while in the SMSIM implementation, the source and path duration are defined separately. The source duration implemented in the SMSIM for a Brune source model is given by:

$$\begin{aligned} D_{source,SMSIM} = 1 / f_{c} =\frac{1}{0.4906 \beta } \left( \frac{\Delta \sigma }{M_{0}}\right) ^{-1/3} \end{aligned}$$
(20)

Combining these, Equation 19 can be expressed as:

$$\begin{aligned} D_{total,SMSIM}= & D_{source,SMSIM} + exp(0.66093 \nonumber \\ & + F_{P}(R_{epi}) + F_{S}(V_{s30})) \nonumber \\ & - D_{source,SMSIM}(M_{lim}) \end{aligned}$$
(21)

Therefore, the path component of duration as input to SMSIM (\(D_{path,SMSIM}\)) can be calculated by subtracting \(D_{source,SMSIM}\) from \(D_{total,SMSIM}\). \(M_{lim}\)= 3 is selected such that at this magnitude, the total duration is defined by Eq. 19. The reason for referencing the source duration to magnitude 3 is that the Bommer et al. (2016) model was developed with reference to \(M_{W}\) =2 to 3.6. We include the SMSIM input parameter file used as a reference in the Supplementary document.

3.2 Overview of preliminary input model parameters

In the following, the initial (prior) seismological model is outlined. This model is subsequently subject to calibration to provide a minimum bias prediction model, as discussed later. A significant cause of uncertainty in calibrating stochastic simulations for the PNR shale gas site is the limited information regarding the stress parameter, \(\Delta \sigma \). This determines the plateau of high-frequency acceleration, and is a controlling factor on short-period spectral accelerations, such as peak ground acceleration (PGA). The stress parameter is typically observed within the range of 1 to 100 MPa for normal (potentially damaging) earthquakes and \(\Delta \sigma \) \(\sim \) 0.01 to 0.1 MPa for ‘slow’ (with limited high-frequency energy released) earthquakes (Edwards et al. 2019a). Our observations show that stress parameters scale with magnitude and vary roughly between 0.1 and 100 MPa. Similarly, Atkinson (2004) found a dependence of stress parameter on moment magnitude: \(\Delta \sigma \) increases with magnitude until \(\sim \) \(M_W\) 4, above which it appears to have a relatively constant value in the range of 10-20 MPa (100-200 bars). Trial simulations of ground motion with a constant stress parameter for the PNR dataset resulted in a large, magnitude-dependent bias at magnitudes above 1.5. This suggests that a magnitude-dependent \(\Delta \sigma \) model might be preferable for the PNR data. A stochastic simulation-based UK tectonic GMPE, introduced by Rietbrock et al. (2013), uses a magnitude-dependent \(\Delta \sigma \), with the median increasing linearly from 0.7 MPa to 10 MPa between magnitudes \(M_{W}\) 3.0 and 4.5 (with constant values below and above this range). However, since our data are dominated by magnitudes below the minimum used in their study (\(M_W\) 2), and are for shallow seismicity (which may have inherently lower stress parameters; Hough 2014), the adaptation of this tectonic magnitude-dependent scaling might not be suitable.
Our initial model for \(\Delta \sigma \) is based on two average values, calculated from the results presented by Suroyo and Edwards (2023) over different magnitude ranges: \(\Delta \sigma \) is equal to 0.15 MPa for \(M_{W} \le 1\) and 9.7 MPa for \(M_{W} > 2\). These magnitude ranges could correspond to ‘induced’ (i.e. directly related to hydraulic fracturing) and ‘triggered’ events (i.e. slip on pre-existing faults), respectively. Values between these two magnitude hinges are determined following Eqs. 7 and 8.
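
One way to read Eqs. 7 and 8 is as a log-linear interpolation between the two plateau values. The sketch below makes that interpretation explicit: it assumes that the slope in Eq. 8 is the \(\log _{10}\) gradient between \(\Delta \sigma _{lower}\) and \(\Delta \sigma _{upper}\), that \(\sigma _{scale}=1\), and that the hinges are \(M_{lower}=1\) and \(M_{upper}=2\). These are our assumptions for illustration and may differ from the implementation used in the study.

```python
import math


def stress_parameter(mw, lower=1.5, upper=97.0, m_lower=1.0, m_upper=2.0,
                     scale=1.0):
    """Magnitude-dependent stress parameter in bar (assumed reading of Eqs. 7-8)."""
    if mw < m_lower:
        return lower
    if mw > m_upper:
        return upper
    # Assumed slope: log10 gradient between the lower and upper plateaus.
    slope = (math.log10(upper) - math.log10(lower)) / (m_upper - m_lower)
    return scale * lower * 10.0 ** (slope * (mw - m_lower))


mid_value = stress_parameter(1.5)  # geometric mean of the two plateaus
```

Under this reading the model is continuous at both hinges, and the mid-point value is the geometric mean of 1.5 and 97 bar.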

Anelastic attenuation of seismic energy at PNR is described by the local Q(f) model, which has been directly derived from the induced seismic sequences by Suroyo and Edwards (2023). Low Q(f), attributed to shallow layers in the crust, leads to a rapid rate of near-field decay, with significantly stronger attenuation observed than for UK regional events (Suroyo and Edwards 2023). The use of this local (PNR) Q(f) model is based on the assumption that the attenuation characteristics of induced earthquakes differ from those of deeper earthquakes, and that the use of a regional Q without modification would give inaccurate ground motion predictions.

In addition to the anelastic attenuation model, a geometrical spreading model was introduced in the simulations. The parameters that control the geometrical spreading in SMSIM were represented by a piecewise model, wherein the \(n^{th}\) segment is defined by a reference distance in kilometres and a slope (\(\gamma _n\)). The first reference distance (\(R_{ref}\)) is 1.0 km. This is followed by three distance segments. The specific hinge distances separating each regime of decay (7 km and 12 km) are based on the elastic point-source simulations in the application for a shallow induced seismicity site in Groningen, the Netherlands (e.g., Edwards et al. 2019b; Bommer et al. 2022a). The changes at 7 and 12 km are assumed to be related to more coherent reflected phases (Edwards et al. 2019b). Kraaijpoel and Dost (2013) argue that they are related to the strong contrast between the reservoir rock and the salt. Owing to the complex geology, it is not fully understood whether the reflection is related to the Carboniferous layer or to salt-related propagation (Zechstein evaporites). However, both conditions represent a significant contrast between two layers. Given that both Groningen and PNR have a Carboniferous layer, we adopt the Groningen hinge points as an initial baseline and validate them against PNR-specific data to adjust attenuation rates based on local observations. Based on initial residual analyses, the decay is assumed to follow a halfspace model (\(R^{-1}\); \(\gamma _{0}= -1.0\)), significantly weakened (to the extent that amplification with increasing distance is observed) at distances between 7 and 12 km (\(\gamma _{1}= 0.11\)), with more rapid decay at distances beyond 12 km (\(\gamma _{2}= -2.01\)) (see Fig. 3).
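
The piecewise form of Eq. 10 with the initial hinge distances and slopes quoted above can be sketched as follows. The sign convention follows Eq. 10 as written, so a negative \(\gamma \) produces decay with distance. The function name is ours, and the first segment is simply extrapolated for distances below the 1 km reference.

```python
def geometrical_spreading(r_km, hinges=(1.0, 7.0, 12.0),
                          gammas=(-1.0, 0.11, -2.01)):
    """Eq. 10: piecewise power-law G(R), continuous at each hinge distance."""
    if r_km <= 0.0:
        raise ValueError("distance must be positive")
    g = 1.0
    for i, gamma in enumerate(gammas):
        r_start = hinges[i]
        r_end = hinges[i + 1] if i + 1 < len(hinges) else float("inf")
        r_eval = min(r_km, r_end)  # stop at the end of this segment
        g *= (r_start / r_eval) ** (-gamma)
        if r_km <= r_end:
            break
    return g


g_at_hinges = [geometrical_spreading(r) for r in (1.0, 7.0, 12.0)]
```

Between 7 and 12 km the positive slope means that G(R) increases slightly with distance, which is the amplification noted in the text.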

Fig. 3 The initial geometrical spreading function model

The selection of the interval of significant duration is based on observations made in the Groningen study area (e.g. Bommer et al. 2016). According to Boore and Thompson (2014), a double-duration of 20-80\(\%\) (cumulative Arias intensity) is the most suitable duration, aiming to increase the ability to identify windows that have ‘strong shaking’ when used with smaller earthquakes. According to Bommer et al. (2016), the 5-75\(\%\) duration (\(T_{a5,75}\)) performed well in terms of the ability to identify the strongest portion of the accelerogram in Groningen. However, the performance of this prediction is highly dependent on each recording. Based on a comparison of the PNR data with the 5-75\(\%\) duration model for Groningen, the 5-75\(\%\) duration is selected as the base model for the PNR dataset. This is adapted prior to input in SMSIM through a scaling factor (\(T_{path,SMSIM}=T_{a5,75}/0.55\)). This was found to ensure that the measured \(T_{a5,75}\) of simulated time series was consistent with the base model \(T_{a5,75}\). Figure 4 shows the 5-75\(\%\) duration values of the PNR recordings with magnitude \(\ge 1\) and the duration model used in the stochastic simulations.

Fig. 4 Preston New Road data and the duration model of Bommer et al. (2016) with respect to epicentral distance for different magnitude values (\(V_{s30}=200\) m/s)

Finally, a summary of the initial model parameters for the stochastic simulations is presented in Table 1.

4 Calibration of input model parameters

Simulated ground motions are initially generated based predominantly on the seismological model developed using seismic records from the study area (Suroyo and Edwards 2023). However, some biases may be introduced in the simulations through oversimplification (such as using a single source stress drop). To refine the accuracy of the prediction, and to explore the sensitivity and trade-off of the seismological model parameters, we performed an optimization-based calibration technique using the Area Metric (AM) (Sunny et al. 2022). The idea of this iterative process is to find the best set of parameters (the seismological model) that produce a minimum difference or residual between the predicted ground motion and the observed ground motion from past events. The algorithm of the calibration technique used in this study can be summarised following Fig. 5 (please refer to Sunny et al. (2022) for more details).

Table 1 Overview of input parameters used to perform stochastic simulations for Preston New Road induced events

Calibrations were performed to refine the attenuation model (the frequency-dependent quality factor, defined by \(Q_{0}\) and \(\alpha \)), the high-frequency attenuation parameter (\(\kappa _{0}\)), the stress parameter (lower and upper \(\Delta \sigma \)), and the geometrical spreading model (\(\gamma _{0}, \gamma _{1}, \gamma _{2}\)). We first calculate the AM of the ground motions simulated using the initial seismological model (hereafter \(AM_{inital}\)). A calibration result is accepted only if it returns an AM smaller than \(AM_{inital}\). The calibration process starts by generating n simulation parameter combinations, independently sampling each parameter from the probability distributions presented in Table 2. A uniform distribution was used to sample the stress parameters (\(\Delta \sigma _{lower}\), \(\Delta \sigma _{upper}\)) and \(Q_{0}\). The uniform distribution was selected because limited information is available about the uncertainty of these parameters: each value is considered equally probable, with minimum and maximum values chosen based on observations of the data. We ensure that the bounds of the Q distribution (Table 2) are within the \(Q_s\) confidence interval for induced seismic records from the previous study by Suroyo and Edwards (2023). The lower limit of the \(\Delta \sigma _{lower}\) distribution is based on the distribution of \(\Delta \sigma \) for \(M \le 1\), where only a few observed values are lower than 1.5 bar. The distribution for \(\Delta \sigma _{lower}\) is therefore taken to range from 1.5 bar to the maximum observed value (10 bar). As with \(\Delta \sigma _{lower}\), the lowest and highest values of the \(\Delta \sigma _{upper}\) distribution are based on the \(\Delta \sigma \) estimated by Suroyo and Edwards (2023).

For the other parameters (\(\alpha \), \(\kappa _{0}\), \(\gamma _{0}\), \(\gamma _{1}\), and \(\gamma _{2}\)), a normal distribution is used, such that the initial values (Table 1) lie near the centre of the distribution and are the most likely to be sampled. The normal distribution is characterised by the estimated standard deviations, and we assume a maximum spread of 20% of the average value. Both distribution types are used to account for the epistemic uncertainty arising from limitations in our knowledge and understanding of seismic processes, which encompasses uncertainties related to model simplifications, parameter choices, and data constraints. Iterative ground motion simulations were then carried out for each combination of input parameters. For computational efficiency, instead of simulating all spectral ordinates directly, we simulate PGA alone in the first trial.
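The sampling step described above can be sketched as follows. This is a minimal illustration, not the calibration code used in the study: only the \(\Delta \sigma _{lower}\) bounds (1.5 to 10 bar) and \(n=500\) come from the text, while every other bound and central value is a hypothetical placeholder standing in for Tables 1 and 2. Reading the 20% limit as a clipping bound at roughly three standard deviations is also an assumption.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Uniform priors as (min, max). Only the stress_lower bounds are quoted in
# the text; the remaining bounds are hypothetical placeholders.
UNIFORM_PRIORS = {
    "stress_lower": (1.5, 10.0),    # bar, from the text
    "stress_upper": (10.0, 100.0),  # hypothetical
    "Q0": (100.0, 400.0),           # hypothetical
}

# Central (initial) values for the normally distributed parameters.
# All values here are hypothetical placeholders for Table 1.
NORMAL_CENTRES = {
    "alpha": 0.5,
    "kappa0": 0.03,
    "gamma0": -1.5,
    "gamma1": -1.0,
    "gamma2": -0.5,
}

MAX_SPREAD = 0.20  # maximum deviation as a fraction of the central value


def sample_parameters(n, rng):
    """Independently sample n values of every calibration parameter."""
    samples = {}
    for name, (low, high) in UNIFORM_PRIORS.items():
        samples[name] = rng.uniform(low, high, size=n)
    for name, centre in NORMAL_CENTRES.items():
        half_width = MAX_SPREAD * abs(centre)
        # Assumption: the 20% limit is treated as ~3 standard deviations,
        # and draws beyond it are clipped back to the limit.
        draws = rng.normal(centre, half_width / 3.0, size=n)
        samples[name] = np.clip(draws, centre - half_width, centre + half_width)
    return samples


parameter_sets = sample_parameters(500, rng)
```

Each index across the returned arrays defines one parameter combination to pass to the stochastic simulation.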

Fig. 5 Flowchart describing the calibration process of SMSIM parameters

Table 2 Brief summary of initial and calibrated model parameters
Fig. 6 Pair-wise comparison showing dependencies between calibration parameters (\(Q_{0}\), \(\gamma _{0}\), \(\gamma _{1}\), \(\gamma _{2}\), \(\alpha \), and \(\kappa _{0}\)). Pink circles indicate the correlated parameters that fall within the 95% confidence band of the empirical data CDF across all periods, and grey circles represent the 500 initial uncorrelated parameters

The validation and calibration of the simulations were assessed using AM values, which describe the difference (and therefore misfit) between the CDF of the empirical ground motions (ECDF) and that of the corresponding simulations. Further analysis was undertaken by computing the confidence band of the ECDF using the Dvoretzky-Kiefer-Wolfowitz-Massart (DKW) inequality (see Sunny et al. (2022) for further details). If the CDF of the newly simulated ground motions falls within the 95% confidence band of the ECDF and the AM is lower than \(AM_{inital}\), the first calibration trial is accepted and the process continues to the next trial with the remaining spectral ordinates (we simulate PSA at 0.3 s for the \(2^{nd}\) trial). If these conditions are not met, we update the probability distribution range of each input parameter listed in Table 2 to widen the investigated model space.
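The acceptance test described above can be illustrated with a short sketch. It assumes the AM is the area enclosed between two empirical CDFs and uses the standard DKW half-width \(\varepsilon = \sqrt{\ln (2/\alpha )/(2n)}\); the function names are hypothetical, and the sketch is not the implementation of Sunny et al. (2022).

```python
import numpy as np


def ecdf(sample, grid):
    """Empirical CDF of `sample` evaluated at each point of `grid`."""
    ordered = np.sort(np.asarray(sample, dtype=float))
    return np.searchsorted(ordered, grid, side="right") / ordered.size


def area_metric(observed, simulated):
    """Area between the two empirical CDFs (exact for step functions)."""
    grid = np.sort(np.concatenate([observed, simulated]))
    gap = np.abs(ecdf(observed, grid) - ecdf(simulated, grid))
    # Both ECDFs are constant between consecutive grid points.
    return float(np.sum(gap[:-1] * np.diff(grid)))


def dkw_epsilon(n, alpha=0.05):
    """Half-width of the DKW confidence band for a sample of size n."""
    return float(np.sqrt(np.log(2.0 / alpha) / (2.0 * n)))


def is_accepted(observed, simulated, am_initial, alpha=0.05):
    """True if the simulated CDF lies inside the DKW band of the observed
    ECDF and the area metric improves on the initial model."""
    observed = np.asarray(observed, dtype=float)
    simulated = np.asarray(simulated, dtype=float)
    grid = np.sort(np.concatenate([observed, simulated]))
    f_obs = ecdf(observed, grid)
    eps = dkw_epsilon(observed.size, alpha)
    lower = np.clip(f_obs - eps, 0.0, 1.0)
    upper = np.clip(f_obs + eps, 0.0, 1.0)
    f_sim = ecdf(simulated, grid)
    inside = bool(np.all((f_sim >= lower) & (f_sim <= upper)))
    return inside and area_metric(observed, simulated) < am_initial
```

With this definition, two samples that differ by a constant shift have an AM equal to that shift, which gives a quick sanity check on the units of the misfit.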

In addition to examining the CDFs and associated AM, the interaction between input parameters was analysed using the Pearson correlation coefficient, which quantifies the strength and direction of a linear relationship between two continuous variables. Following Sunny et al. (2024), this approach aims to understand the interaction between parameter combinations, allowing us to explore the epistemic uncertainty of our model. Figure 6 shows the dependency between each pair of calibration parameters. It highlights the parameter sets whose simulated ground motion CDFs fall within the 95% confidence band of all ECDFs (i.e. considering the full range of ordinates: PGA, PGV, and PSA with \(T = \)0.03, 0.05, 0.075, 0.1, 0.2, 0.3, 0.5, 0.75, 1, 1.5, and 2 s). These highlighted parameter subsets therefore satisfy our misfit criteria across all available empirical observations.
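For reference, the standard definition of the Pearson correlation coefficient between two parameters is given below. The notation is introduced here for illustration only: \(x_{i}\) and \(y_{i}\) denote the values of two calibration parameters (for example \(Q_{0}\) and \(\kappa _{0}\)) in the i-th of n parameter sets, and \(\bar{x}\), \(\bar{y}\) are their means.

$$\begin{aligned} r_{xy} = \frac{\sum _{i=1}^{n}(x_{i}-\bar{x})(y_{i}-\bar{y})}{\sqrt{\sum _{i=1}^{n}(x_{i}-\bar{x})^{2}}\,\sqrt{\sum _{i=1}^{n}(y_{i}-\bar{y})^{2}}} \end{aligned}$$

Values of \(r_{xy}\) near \(\pm 1\) indicate a strong linear trade-off between the two parameters, while values near 0 indicate no linear dependence.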

In application to the PNR data, measuring the AM and quantitatively assessing how the resulting simulations vary within the confidence band of the empirical data narrowed the \(n=500\) initial parameter subsets down to \(m=10\) subsets in Trial 1. These correspond to results that fit within the 95% confidence band of the PGA ECDF. To increase the number of models with a CDF inside the ECDF confidence band, we made use of the covariance structure of the selected simulation parameters. Specifically, we used a resampling technique based on the Cholesky decomposition, which considers the statistical correlation of the parameters within the \(m=10\) subsets. The Cholesky decomposition of a symmetric positive-definite matrix A (here describing the covariance of all simulation parameters) decomposes it into a lower triangular matrix (L) and its transpose (\(L^{T}\)), such that \(A=L L^{T}\). A total of p random (uncorrelated) parameter sets are drawn from an initial distribution and then multiplied by the lower triangular matrix (L) to generate correlated variables whose covariance is consistent with the previous optimisation (here we use \(p=500\) models). These 500 correlated-parameter sets were then used as input for the second calibration trial, which was performed by analysing the AM and results from simulated PSA at 0.3 s. We obtained \(q=50\) correlated-parameter sets that fit within the 95% confidence band. As validation, we tested these \(q=50\) correlated-parameter sets against PGV and PSA at different periods (0.03, 0.05, 0.1, 0.2, 0.5, 1.0, and 2.0 s). Finally, we obtained 6 common parameter combinations that fit within the ECDF’s confidence band over all periods (pink circles in Fig. 6).
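The Cholesky-based resampling can be sketched as below. This is an illustrative reading of the procedure rather than the study's code: it assumes that A is the sample covariance of the accepted parameter sets, that the uncorrelated draws are standard normal, and that the resampled sets are re-centred on the mean of the accepted sets. The function name is hypothetical.

```python
import numpy as np


def resample_correlated(selected, p, rng):
    """Draw p parameter sets with the mean and covariance of `selected`.

    selected : array of shape (m, k), i.e. m accepted sets of k parameters.
    """
    selected = np.asarray(selected, dtype=float)
    mean = selected.mean(axis=0)
    cov = np.cov(selected, rowvar=False)   # symmetric (k, k) matrix A
    lower = np.linalg.cholesky(cov)        # A = lower @ lower.T
    # Assumption: the uncorrelated draws are standard normal.
    z = rng.standard_normal((p, selected.shape[1]))
    return mean + z @ lower.T              # correlated parameter sets


# Hypothetical example: 10 accepted sets of 3 correlated parameters.
rng = np.random.default_rng(seed=1)
mixing = np.array([[1.0, 0.5, 0.0], [0.0, 1.0, 0.3], [0.0, 0.0, 1.0]])
accepted = rng.normal(size=(10, 3)) @ mixing
correlated_sets = resample_correlated(accepted, 500, rng)
```

Note that `np.linalg.cholesky` requires a positive-definite matrix, so the number of accepted sets must exceed the number of parameters for the sample covariance to be usable.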

Table 3 Summary of proposed model based on calibration approach

The final 6 parameter subsets, each of which produced ground motions within the 95% confidence band of the empirical data across all periods considered in this study (Table 3), can be considered to represent the epistemic uncertainty of our model. For each of these 6 models, we calculate the average mean misfit by averaging the mean residuals of the ground motions over all periods. The lowest average mean misfit is 0.053 (subset ID 8). The minimum summation of AM (\(\sum \) AM) over all periods is found for ID 11, with \(\sum \) AM equal to 0.72. Based on the minimum \(\sum \) AM, the best-performing single model of the 6 proposed is model ID 11, which is hereafter taken as the final ‘best-estimate’ model.

5 Development of hybrid-ground motion model

Aiming to provide an easy-to-implement and computationally efficient solution, we finally incorporate both observed and simulated ground motion data by developing a hybrid ground motion model. The observed and simulated data are combined by weighting the contributions from each dataset. The observed dataset contains metadata of ground motion records at the PNR site (Edwards et al. 2021) with magnitude \(1\le M_{W}\le 2.7\) at distances up to 25 km. Synthetic data were randomly generated for magnitudes \(1\le M_{W} \le 6\) at epicentral distances up to 30 km (Fig. 7). The synthetic data were simulated using the ‘best-estimate’ calibrated parameter model (ID 11 in Table 3). Since the proposed GMM covers distances of up to 30 km, a fourth segment of geometrical spreading was added in the simulation of the synthetic data. This additional segment covers distances above 25 km with a decay rate of \(\gamma _{3}=-1.0\) (returning to a halfspace model), which is based on observations from numerical simulations at moderate distances in Groningen (Edwards et al. 2019b). In this trial, we use equal-weight contributions from both datasets.

Following Edwards et al. (2021), the Atkinson (2015) (A15) induced seismicity model was selected as the base GMPE. Calibration of this model was performed using a mixed-effects regression. This approach includes fixed effects (the deterministic part of the GMPE) and random effects (which account for the variability not explained by the fixed effects). The model was calibrated by determining the residual misfit (in \(log_{10}\) units) between our data (empirical and synthetic) and the A15 model, after accounting for site amplification using \(V_{S30}\) and the linear amplification model of Boore et al. (2014). These residuals are then fitted with a parametric form that is a modified form of the A15 model:

$$\begin{aligned} log_{10}(Y) - F_{lin} - log_{10}(Y_{A15}) = \Delta c_{0} + \Delta c_{1}\textbf{M} + \Delta c_{2}\textbf{M}^{2} + G(R) + \tau + \phi \end{aligned}$$
(22)

with \(log_{10}(Y)-F_{lin}-log_{10}(Y_{A15})\) as the residual for a given spectral ordinate, \({\textbf {M}}\) the moment magnitude, and G(R) the geometrical spreading term, defined over 3 distance segments:

$$\begin{aligned} G(R)= {\left\{ \begin{array}{ll} \Delta c_{3a} log_{10}(R), & R \le r_{1},\\ \Delta c_{3a} log_{10}(r_{1}) + \Delta c_{3b} log_{10}(R/r_{1}), & r_{1} < R \le r_{2}, \\ \Delta c_{3a} log_{10}(r_{1}) + \Delta c_{3b} log_{10}(r_{2}/r_{1}) + \Delta c_{3c} log_{10}(R/r_{2}), & \text{otherwise}. \end{array}\right. } \end{aligned}$$
(23)

where \(r_{1}\) is 7 km, \(r_{2}\) is 12 km, and R is the effective distance, defined by Atkinson (2015) as:

$$\begin{aligned} \begin{aligned} R= \sqrt{R_{hyp}^{2} + max(1, 10^{-0.28+0.19 M})^{2}} \end{aligned} \end{aligned}$$
(24)
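As an illustration of how Eqs. 22 to 24 fit together, the sketch below evaluates the effective distance, the three-segment term G(R), and the fixed-effects part of the adjustment. The hinge distances (7 and 12 km) and the effective-depth expression come from the text; the function names and the coefficient values in the example call are hypothetical and do not correspond to the fitted coefficients.

```python
import numpy as np

R1_KM, R2_KM = 7.0, 12.0  # hinge distances r1 and r2 from the text


def effective_distance(r_hyp, mag):
    """Eq. 24: R = sqrt(R_hyp^2 + max(1, 10^(-0.28 + 0.19 M))^2)."""
    mag = np.asarray(mag, dtype=float)
    h_eff = np.maximum(1.0, 10.0 ** (-0.28 + 0.19 * mag))
    return np.sqrt(np.asarray(r_hyp, dtype=float) ** 2 + h_eff ** 2)


def geometric_term(R, dc3a, dc3b, dc3c, r1=R1_KM, r2=R2_KM):
    """Eq. 23: three-segment spreading term, continuous at r1 and r2."""
    R = np.asarray(R, dtype=float)
    seg1 = dc3a * np.log10(np.minimum(R, r1))        # active for all R
    seg2 = dc3b * np.log10(np.clip(R, r1, r2) / r1)  # zero below r1
    seg3 = dc3c * np.log10(np.maximum(R, r2) / r2)   # zero below r2
    return seg1 + seg2 + seg3


def median_adjustment(mag, r_hyp, dc):
    """Fixed-effects part of Eq. 22 (random terms tau and phi omitted)."""
    mag = np.asarray(mag, dtype=float)
    R = effective_distance(r_hyp, mag)
    return (dc["dc0"] + dc["dc1"] * mag + dc["dc2"] * mag ** 2
            + geometric_term(R, dc["dc3a"], dc["dc3b"], dc["dc3c"]))


# Hypothetical coefficients, for illustration only.
example_dc = {"dc0": 0.1, "dc1": 0.0, "dc2": 0.0,
              "dc3a": -1.5, "dc3b": -0.5, "dc3c": -1.0}
adjustment = median_adjustment(mag=2.0, r_hyp=5.0, dc=example_dc)
```

The adjustment returned here is in \(log_{10}\) units and would be added to \(log_{10}(Y_{A15}) + F_{lin}\) to obtain the median prediction of the adjusted model.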

The random effects in the model are represented by the total standard deviation (\(\sigma _{total}\)) which is defined as:

$$\begin{aligned} \begin{aligned} \sigma _{total}= \sqrt{\phi ^{2} + \tau ^{2}} \end{aligned} \end{aligned}$$
(25)

where \(\tau \) represents the standard deviation of between-event terms and \(\phi \) the standard deviation of the within-event terms (for a complete glossary about components of ground motion variability in the framework of probabilistic seismic hazard assessment, see Atik et al. (2010)). The within-event term (\(\phi \)) can be separated into site-to-site (\(\phi _{s2s}\)) and single-site (\(\phi _{ss}\)) terms (Rodriguez-Marek et al. 2013).
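To make the variance components concrete, the sketch below splits a set of residuals into between-event and within-event parts and combines their standard deviations as in Eq. 25. It is a deliberately simplified stand-in for a mixed-effects regression: each between-event term is taken as the plain mean of that event's residuals, which is an assumption made here for illustration and not the estimator used in the study.

```python
import numpy as np


def split_residuals(residuals, event_ids):
    """Split residuals into between-event and within-event parts.

    Returns (within, tau, phi, sigma_total). Needs at least two events.
    """
    residuals = np.asarray(residuals, dtype=float)
    event_ids = np.asarray(event_ids)
    events = np.unique(event_ids)
    # Simplification: event term = mean residual of that event.
    event_terms = np.array([residuals[event_ids == ev].mean() for ev in events])
    between = np.zeros_like(residuals)
    for ev, term in zip(events, event_terms):
        between[event_ids == ev] = term
    within = residuals - between
    tau = event_terms.std(ddof=1)      # between-event standard deviation
    phi = within.std(ddof=1)           # within-event standard deviation
    sigma_total = np.hypot(tau, phi)   # Eq. 25
    return within, tau, phi, sigma_total


# Hypothetical residuals (log10 units) from two events, two records each.
within, tau, phi, sigma_total = split_residuals(
    [0.1, 0.3, -0.2, -0.4], [1, 1, 2, 2])
```

A full mixed-effects fit estimates the fixed-effect coefficients and the random-effect variances jointly, so its \(\tau \) and \(\phi \) will generally differ from this simple decomposition, particularly when events have few records.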

Finally, the adjustment coefficients of the newly developed GMPE are presented in Table 4. In implementation, use of the adjustment coefficients in Table 4 produces ground motions referenced to \(V_{S30}=\)760 m/s, as per the original A15 model. An example is shown in the Supplementary material. As this GMPE has been developed using median simulation parameters (i.e. not including their aleatoric variability), the inverted variabilities will significantly underestimate the true values. Given that Atkinson (2015) is the reference model, we propose that the ground motion variability be taken directly from the A15 model, although any appropriate model (such as single-site variability) can be adopted (Rodriguez-Marek et al. 2013).

Fig. 7 Joint dataset combining observed and synthetic data. Observed ground motions were recorded at distances \(\le 25\) km with magnitude \(1\le M_{W}\le 2.7\), while the synthetic dataset was created for magnitude \(1\le M_{W} \le 6\) at distances up to 30 km

Table 4 Adjusted coefficients of the new GMPE
Fig. 8 Comparison between modelled (using the uncalibrated prior) and observed ground motions for PGV (above) and PGA (below) versus magnitude (\(M_{W}\)) and epicentral distance. Modelled ground motions refer to the ground motion computed using the empirical model by Atkinson (2015) (yellow triangles), Edwards et al. (2021) (green squares), and simulated ground motions using the SMSIM approach (black circles). The red dotted line shows the median value, while the black lines show the upper and lower bounds of the \(68\%\) (dashed lines) and \(95\%\) (dotted lines) percentiles of the bias from the simulated ground motions

6 Discussion

A series of well-documented sequences of seismicity induced by shale-gas extraction in the United Kingdom were connected to the PNR hydraulic fracturing wells in Blackpool, Lancashire. Seismic data were recorded by several densely spaced sensors placed within a radius of 25 km of the epicentre of fracking activity. We focus on the data collected in 2018 and 2019, which are associated with the hydraulic fracturing operations at the PNR-1z and PNR-2 wells. The seismicity recorded at the site was typically from small-magnitude earthquakes at shallow depth and is unlikely to be felt by the general population. Nonetheless, monitoring and regulations are in place to ensure the safety of both the operations and the surrounding communities. The effectiveness of existing GMPEs, such as those proposed by Atkinson (2015) and Douglas et al. (2013), has previously been tested for ground motion prediction at the PNR site (Edwards et al. 2019a). Developed from near-source (\(R < 40\) km) but not necessarily shallow records from the NGA-West2 dataset, the GMPE by Atkinson (2015) was shown to perform satisfactorily at distances \(> 5\) km but led to significant underestimation at shorter distances. This is assumed to be because the model did not account for the sufficiently rapid attenuation of near-field motions (Suroyo and Edwards 2023). Edwards et al. (2021) subsequently presented a coefficient-calibrated form of the Atkinson (2015) model that removed the overall underestimation of motions in the near-field (\(R < 5\) km). However, they concluded that modification of the functional form itself would be required to completely remove all fluctuations in misfit bias. To effectively modify the functional forms of well-constrained GMPEs, we require a physical understanding of the mechanisms driving differences in ground motions. Therefore, in this study, we have incorporated a physical model (e.g., attenuation, diminution) from the previous study by Suroyo and Edwards (2023) as the input parameters for simulating the ground motion for PNR induced seismicity.

Fig. 9 Bias between simulated ground motion before (grey pentagons) and after calibration (red triangles) in terms of PGA, PGV and PSA at \(T=0.3\) s with respect to magnitude (\(M_W\)) and epicentral distance. The black line represents unbiased residuals, while the dashed line shows the median percentile of misfit from the initial simulation (black) and calibrated model (red). Mean and standard deviation of simulated ground motions before (\(\mu \), \(\sigma \)) and after (\(\mu \prime \), \(\sigma \prime \)) calibration are presented

A framework for developing a physics-based ground motion model has been proposed in this study (see Fig. 1). The framework begins with data collection and the definition of the model parameters (Table 1). To reduce complexity, several parameter models, such as path duration, were adapted from case studies at the Groningen gas field, where there has been extensive and well-documented research on induced seismicity. The study regions of Groningen in the Netherlands and Preston New Road in the United Kingdom, while clearly different and separated by over 500 km, share some general similarities. Both sites are on low-velocity sediments, with \(V_{S30}\) of 200-300 m/s, and seismicity in both cases occurs at shallow depths of \(\sim \)3 km. We show that the duration of earthquake shaking is similar in the Groningen and PNR regions.

Fig. 10 Residuals (in log10 scale; data minus prediction using the new GMPE) for simulated synthetic (grey) and observed (blue) PGA versus magnitude and distance

Simulated ground motions using the prior model parameters (Table 1), based on the spectral analysis performed by Suroyo and Edwards (2023), were initially produced. Predictions using the models proposed by Atkinson (2015) (hereafter A15) and Edwards et al. (2021) (E21), along with the simulated ground motions (before calibration), were compared to the empirical data. Figure 8 illustrates the resulting model bias with respect to epicentral distance and moment magnitude. As expected, the E21 model (specifically calibrated to PNR data) produces satisfactory predictions with a smaller bias than the A15 model. The mean PGV residual (in log scale) is 0.318 for the A15 model, -0.101 for the E21 model, and 0.194 for the simulation (before calibration). The corresponding standard deviations are 0.333, 0.312, and 0.308 for the A15, E21, and initial simulation residuals, respectively. Note that in this analysis we use the GMPEs at magnitudes below those for which they were calibrated in A15 (\(M \ge 3\)) and E21 (\(M \ge 1\)).

To reduce bias in the prediction, we modified the input parameters using an iterative calibration technique (Fig. 4) to find the parameter combination with minimum bias. This calibration step (stage 2, Fig. 1) was carried out following the method proposed by Sunny et al. (2024). Six final models (Table 3) are then proposed as the calibrated physical input models for simulating ground motion at the PNR site. Among those six models, which describe the epistemic uncertainty of our model, we select the one with the minimum \(\sum AM(T)\) and generate new simulated ground motions using the selected parameter combination. The residuals between observed and simulated ground motions before and after calibration for PGA, PGV, and pseudo-spectral acceleration (PSA) at a period of 0.3 s are compared in Fig. 9. A comparison of the residuals between the initial and final simulations for other PSA periods (0.2 s, 0.3 s, 0.5 s, and 1.0 s) is available in the Supplementary document. Overall, the calibrated model resulted in more centred residuals, with lower bias and standard deviation. The mean simulated PGV residual reduced from 0.194 to 0.148, while the mean PGA residual dropped from 0.161 to 0.041.

Table 5 Variability of simulated ground motions (after calibration)

In the final stage of the study, a hybrid ground motion model was developed by combining the observed and simulated ground motions (see Fig. 7). The A15 model was then calibrated to the combined dataset to form a physically adjusted GMPE (see Table 4). Figure 10 shows the discrepancy between the ground motion data (observed and synthetic) and the predictions of the physically adjusted GMPE. The residual misfit is the difference between the log10 of the ground motion data and the log10 of the ground motion predicted using the new physically adjusted GMPE (Table 5). Figure 11 shows the PGV residuals before and after random-effect correction, together with the between-event, within-event, and between-station terms. We obtained a relatively small between-event variability, which is consistent with ‘single-source’ seismicity. The number of events and observations for long-period motions (\(T\ge 1.0\) s) is too small to robustly estimate the associated random variability. Correction of random effects was applied to improve the predictions.

Fig. 11 Residuals of PGV (larger horizontal component): a) residual with (black) and without (grey) random effect correction against magnitude, and b) distance. Variability of ground motion shown by c) between-event residuals against magnitude; d) within-event residuals against distance and its corresponding uncertainty shown by the light blue error bar; and e) station-to-station variability. The red error bar in subplots a and b shows the mean and standard deviation for the residual without random effect corrections, while the yellow error bar corresponds to the residual with random effect corrections

A comparison of the newly developed hybrid GMPE with other existing models (A15, E21) is shown in Fig. 12, which illustrates PGV from the different models for magnitudes of 1, 1.5, 2.7, 4.5, and 5.5. Overlain are the observed and simulated (synthetic) PGV values for events within \(\pm 0.1\) of the listed magnitudes. Another comparison between the hybrid GMPE and the existing GMPEs (A15 and E21) is presented in Fig. 13, showing the estimated pseudo-spectral acceleration (PSA) at distances of 5, 15, and 30 km using the three models. The new GMPE and the E21 model predict higher near-field short-period motions. In contrast, at longer distances (\(\ge \) 15 km), short-period motions in the new GMPE are lower than in both A15 and E21. It is important to note that at greater distances the hybrid GMPE is controlled by simulations, which assume the same anelastic attenuation model as at near distances. Applying the rapid decay of the near-field attenuation model could therefore incorrectly drive the predictions at greater distances to be lower than they should be. However, the proposed approach grants us the flexibility to modify the input model to suit the conditions.

Both Figs. 12 and 13 show very close agreement between the new GMPE and the E21 model at low magnitudes. At magnitude \(\ge \) 4, a constraint was applied to the E21 model which, by design, pushes its predictions to be almost identical to the base model (A15). Until now, predictions beyond the observed magnitude and distance ranges have therefore remained only loosely justified for PNR (i.e. based on selected empirical data from the NGA-West2 dataset in A15).

Fig. 12 Predicted PGV using different GMPEs: Edwards et al. (2021) (dotted lines), Atkinson (2015) (dot-dashed lines), and our new hybrid-GMPE model (bold lines) for magnitude 1 (green), 1.5 (blue), 2.7 (cyan), 4.5 (orange), and 5.5 (purple). Superimposed are observed (circles) and synthetic (triangles) data with magnitude \(M \pm 0.1\). PGV is plotted with respect to hypocentral distance in km

Fig. 13 Predicted PSA for M 1 (bottom) to 6 (top) versus period for the Edwards et al. (2021) (dotted lines), Atkinson (2015) (dot-dashed lines), and our new hybrid-GMPE model (bold lines)

Fig. 14 Comparison of the PGA models derived in the present study with previous existing models with respect to hypocentral distance (\(R_{hyp}\)) in km. The ground motions are plotted for an event of magnitude \(M_{W}\) 2.7. The Rietbrock et al. (2013) and Douglas et al. (2024) models use the Joyner-Boore distance metric (\(R_{JB}\)), with the \(R_{hyp}\)-\(R_{JB}\) conversion assuming a depth of 3 km

To highlight the differences between models, predicted PGA for magnitudes 2.7 and 1.5 from each model is compared in Fig. 14. Among the existing models (i.e. Douglas et al. 2013; Rietbrock et al. 2013; Rietbrock and Edwards 2019; Boore et al. 2014; Atkinson 2015; Cremen et al. 2020; Edwards et al. 2021; Douglas et al. 2024), the regional UK model of Rietbrock et al. (2013) (R13) produces a very low level of ground motion in the near-source region. This model was developed from UK tectonic records with focal depths varying between 5 and 20 km. Both R13 and the latest UK tectonic GMPE of Douglas et al. (2024) (D24) are parameterised in terms of the Joyner-Boore distance (\(R_{JB}\)) rather than the effective distance. This combination of deep sources and distance metric leads to significant near-field saturation that does not account for shallow sources. Nevertheless, the D24 model provides a backbone model with three and five branches corresponding to different percentiles, where each branch has an associated weight, capturing epistemic uncertainties in the depth to the top of rupture, geometric spreading, anelastic attenuation, site attenuation, and stress parameter (Douglas et al. 2024). In Fig. 14, we plot a 3-branch model with 2 different depth-to-top-of-rupture models. In our new model, we assumed a near-source distance saturation model proposed by Atkinson (2015). Of the two models proposed by Atkinson (2015), we use \(h_{eff}=max(1, 10^{-0.28+0.19M})\) rather than \(h_{eff}=max(1, 10^{-1.72+0.43M})\), following the suggestion of Atkinson (2020) and Edwards et al. (2021).

Fig. 15 Comparison of AM plots between the observed (black) and predicted ground motions. The predictions refer to simulated data before (green dashed line) and after (green line) calibration, and to the Atkinson (2015) model (yellow dotted line), Edwards et al. (2021) model (red dotted line), Rietbrock et al. (2013) model (purple dotted line), Cremen et al. (2020) model (grey dotted line), Douglas et al. (2013) model (blue dotted line), the latest UK tectonic GMPE by Douglas et al. (2024) (magenta dot-dashed line), and the physically adjusted GMPE (solid black dot-dashed line)

The UK stochastic ground motion model (R13) was later updated by Rietbrock and Edwards (2019) (RE19). We modify this simulation model to use a 3 km source depth, as shown in Fig. 14. Another GMPE based on tectonic earthquakes is the Boore et al. (2014) model (B14), which was developed from crustal earthquakes in active tectonic regions (NGA-West2). The GMPEs derived from naturally occurring events (e.g., R13, RE19, B14, and the latest UK GMPE by Douglas et al. (2024)) predict lower motions at near-source distances than GMPEs built on UK induced seismicity data (e.g. Edwards et al. 2021; Cremen et al. 2020). We also compare the new GMPE with an existing model developed from a compilation of natural and induced seismicity data (Douglas et al. 2013) (D13). The D13 model was developed for magnitudes 1-4 at distances up to 20 km and focal depths of \(\sim \) 5 km. The site-referenced D13 model is applicable to rock sites, specifically with \(V_{s30}\) = 1110 m/s (Edwards et al. 2019a). Despite being developed closer to the magnitude and distance range of interest, D13 is based on a mixture of data sources, which may cause a high model variability. Cremen et al. (2020) adapted this model to predict ground motions at the PNR site (henceforth referred to as the C20 model). The modification was accomplished by recomputing the D13 coefficients in line with observed data. The limitation of the C20 model is that it was devised using data within a very short distance (\(\le \) 6 km). The distance attenuation in this model is very large, causing the predicted values to be much smaller than the observed ground motions at distances greater than 6 km. Moreover, like the D13 model, the C20 model assumes direct proportionality between earthquake magnitude and ground shaking and neglects the non-linear effect associated with magnitude (the \(\textbf{M}^{2}\) term). The lack of an \(\textbf{M}^{2}\) term in both the D13 (Edwards et al. 2019a) and C20 models may cause problems when predictions are required for larger hypothetical scenarios.

In contrast to the two previous models, the Atkinson (2015) model (A15) was developed from tectonic events (the NGA-West2 dataset) and focused on short distance ranges such that it was applicable to typical distances observed for induced seismicity. The model comprises earthquake records with M 3-6 and is limited to distances up to 40 km to focus on near-source motions. Although the lower magnitude limit of its data is above our range of interest, the A15 model is commonly used for predicting ground motions due to induced seismicity and is a promising candidate to consider. The A15 model underpredicts PGA at short distances. The Edwards et al. (2021) (E21) model, which was obtained by adjusting the A15 model coefficients, provides improved predictions. The new GMPE is broadly comparable to the E21 model and provides the most suitable prediction for the PNR site. In general, the predictions from the hybrid GMPE show higher values at close distances, with more rapid decay than other models, as suggested by Suroyo and Edwards (2023). More generally, all GMPEs developed specifically for the PNR site (C20, E21, and the hybrid GMPE developed here) lead to significantly larger intensities at near distances compared to other GMPEs.

The proposed hybrid GMPE is applicable to the prediction of ground motions from shallow events with magnitude \(<6\) M and is valid at distances up to 30 km. The distance constraint is influenced by the assumption of a single anelastic attenuation model (Q(f)), and the model is therefore limited to waves propagating predominantly through the upper crust. In further work, a depth-dependent Q model (Abercrombie 1998; Edwards et al. 2011) could widen the range of applicability of the predictions.

Finally, an evaluation of the performance of each model for PGA, in terms of the CDFs, is presented in Fig. 15. The performance was quantified using the AM, by evaluating the area between the CDF of each prediction and the ECDF of the observed data as reference. Simulated motions prior to the calibration process result in AM = 0.166, which was reduced by 46.39% to 0.089 after calibration of the simulation parameters. The largest AM value was found for the R13 model (1.936), followed by the A15 model (0.579) and the D13 and C20 models (0.233 and 0.219, respectively). The E21 and D24 models produced slightly lower AM than the simulated ground motions before calibration (AM = 0.16 and 0.15, respectively). The best performance, with the lowest AM value, was produced by the physically adjusted hybrid GMPE with AM = 0.03. Even though the existing GMPE (E21 model) predicts ground motions at the PNR site with minimal bias, we improve the prediction in terms of AM by 81.25%. Our results show that the new GMM (physically adjusted hybrid GMPE) is reliable in predicting ground motions for induced seismicity applications at the PNR site. This model therefore provides a candidate that could be included in the logic tree hazard calculation for the PNR case study.
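The two percentage reductions quoted above follow directly from the reported AM values, taking the relative reduction as the change in AM divided by the starting AM:

$$\begin{aligned} \frac{0.166-0.089}{0.166} \approx 0.4639, \qquad \frac{0.16-0.03}{0.16} = 0.8125 \end{aligned}$$

The first ratio compares the simulations before and after calibration, and the second compares the E21 model with the physically adjusted hybrid GMPE.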

7 Conclusions

As input for seismic design and loss assessment, ground-motion modelling has, until very recently, been almost entirely focused on calculating ground motion from large-magnitude (tectonic) earthquakes. Ground motions from small-magnitude, shallow-focus earthquakes are difficult to extrapolate using such typical GMMs, calling for the development of custom models. The attenuation and diminution model created specifically for induced seismicity at the PNR site served as the foundation for the development of the new physically adjusted GMPE presented in this paper. Taking advantage of the available information and developed model, along with existing duration model parameters adapted from the Groningen case study, we carried out an initial ground motion simulation for the PNR site that is consistent with the E21 model. We successfully reduced the misfit of the simulated ground motions by performing a calibration technique utilising the AM introduced by Sunny et al. (2022, 2024), and introduced six final simulation models relevant to application at the PNR shale-gas site.

As a final product, the reference GMPE (Atkinson 2015) was calibrated to the combined dataset of new simulated ground motions (after stochastic model calibration) and observed ground motions. By calibrating a reference GMPE, we create a new functional form that allows us to predict a wider range of magnitudes and distances. The new functional form reflects the physical model developed previously, because simulated ground motion data are included in the hybrid GMPE development. This model predicts significantly larger motions at near distances and decays faster than other existing GMPEs. The conformity between predictions and observations is evaluated using the Area Metric. The physically adjusted GMPE produced the lowest AM, which implies better performance than other GMPEs for the PNR site.

8 Data and resources

The earthquake catalogue for the Preston New Road dataset can be accessed through the BGS earthquake database (http://www.quakes.bgs.ac.uk/earthquakes/dataSearch.html, last accessed December 2020). Earthquake waveform recordings are provided by the British Geological Survey and the University of Liverpool. A list of simulation model parameters (500 initial and 50 sets of parameters) is provided in the supplementary document and materials. Stochastic simulation codes are freely available for download: doi 10.1007/PL00012553 (https://www.daveboore.com/software_online.html, last accessed February 2023). Details about the stochastic calibration algorithm and code are available in Sunny et al. (2024), "Regionally adjusted stochastic earthquake ground motion models, associated variabilities and epistemic uncertainties" (doi:10.21203/rs.3.rs-3018282/v1). The Matlab code used for estimating ground motion variability was provided by Prof. Benjamin Edwards through personal communication. This code was part of his initial work related to the study published as Edwards et al. (2021) (doi.org/10.1785/0120200234).