Abstract
To complement the new European Strong-Motion dataset and the ongoing efforts to update the seismic hazard and risk assessment of Europe and Mediterranean regions, we propose a new regionally adaptable ground-motion model (GMM). We present here the GMM capable of predicting the 5% damped RotD50 of PGA, PGV, and \( SA\left( {T = 0.01 - 8\,{\text{s}}} \right) \) from shallow crustal earthquakes of \( 3 \le M_{W} \le 7.4 \) occurring \( 0 < R_{JB} \le 545\,{\text{km}} \) away from sites with \( 90 \le V_{S30} \le 3000\,{\text{m}}\,{\text{s}}^{-1} \) or \( 0.001 \le slope \le 1\,{\text{m}}\,{\text{m}}^{-1} \). The extended applicability derived from thousands of new recordings, however, comes with an apparent increase in the aleatory variability (σ). Firstly, anticipating contaminations and peculiarities in the dataset, we employed robust mixed-effect regressions to down-weight, and not eliminate entirely, the influence of outliers on the GMM median and σ. Secondly, we regionalised the attenuating path and localised the earthquake sources using the most recent models, to quantify region-specific anelastic attenuation and locality-specific earthquake characteristics as random-effects, respectively. Thirdly, using the mixed-effect variance–covariance structure, the GMM can be adapted to new regions, localities, and sites with specific datasets. Consequently, the σ is curtailed to a 7% increase at T < 0.3 s, and a substantial 15% decrease at T ≥ 0.3 s, compared to the RESORCE-based partially non-ergodic GMM. We provide the 46 attenuating-region, 56 earthquake-locality, and 1829 site-specific adjustments, demonstrate their usage, and present their robustness through a 10-fold cross-validation exercise.
1 Introduction
Ground-Motion Models (GMMs) characterise the random distribution of ground-motions at a site for a combination of earthquake source, wave travel-path, and the affected site's geological properties. Typically, GMMs are regressed over a compendium of strong ground-motion recordings collected from several earthquakes recorded at multiple sites scattered across a variety of geographical regions. Such large datasets are compiled to expand the range of magnitude and distance, and the diversity of site-types, in order to derive a GMM capable of predicting realistic ground-motions for prospective seismic risk scenarios, e.g. large magnitudes at short distances from a reference rock site. NGA-West2 (Ancheta et al. 2014) is one such dataset, compiled of ground-motion observations recorded around the globe: primarily from Western US, and in smaller fractions from Alaska, China, Italy, Japan, Taiwan, Turkey, etc. Several GMMs have been derived from this dataset for application in probabilistic seismic hazard (PSHA) and risk assessments. Given the clear tectonic and geological diversity of the data, possible regional differences in observed ground-motions needed to be quantified (Douglas 2004). Four of the NGA-West2 GMMs (Abrahamson et al. 2014; Boore et al. 2014; Campbell and Bozorgnia 2014; Chiou and Youngs 2014) accounted for regional differences in ground-motions through region-specific regression coefficients. Through region-specific adjustments, these GMMs were able to capture and predict, for example, the faster attenuation of short-period ground-motions with distance in Japan compared to Western US, and the relatively slower attenuation in China. Earlier GMMs were not capable of such predictions simply owing to the lack of sufficient data from individual regions to quantify the differences.
In the case of Europe and the Middle-East, RESORCE (Akkar et al. 2014b) is one such dataset, compiled of data from Italy, Turkey, Greece, and other active seismic regions in pan-Europe. Using mixed-effects regression algorithms (Abrahamson and Youngs 1992), a few GMMs (Akkar et al. 2014a; Bindi et al. 2014) were derived and used in regional (Grünthal et al. 2018) and continental PSHA (Giardini et al. 2018; Woessner et al. 2015). However, these GMMs were not regionalised despite known geological differences between Italy and Turkey. In fact, even the regionalised GMMs (Bora et al. 2017; Kale et al. 2015; Kotha et al. 2016; Kuehn and Scherbaum 2016) limited the distinction to geopolitical boundaries, yet the geological diversity within these regions is far more complex. In essence, quantification of regional differences requires, first, a regionalisation scheme, and then, sufficient data from each region. With the arrival of the new European Strong-Motion (ESM) dataset (Bindi et al. 2018b; Lanzano et al. 2019a) and regionalisation models, and ongoing efforts to update the pan-European PSHA, a revision of the regionalised pan-European GMMs is proposed.
In this study, we present an upgrade of the RESORCE-based region- and site-specific GMM (Kotha et al. 2016, 2017) for shallow crustal earthquakes using the new ESM dataset. We often see GMMs evolve with progressively larger datasets, and supersede their older versions in terms of applicability (Bommer et al. 2010). However, with increasing data and complex parametrization, a reduction in the apparent aleatory variability (σ) of GMMs could not be achieved (Douglas and Edwards 2016; Strasser et al. 2009). Of course, an increasing amount of data comes with increasing spatiotemporal diversity of ground-motion observations, and thus an increasing σ. One approach would be to introduce new predictor variables into the GMM, but uncertainty or unavailability of predictor values then becomes an issue during application (Bindi et al. 2017; Kuehn and Abrahamson 2017). With this in mind, this revision of the Kotha et al. (2016) GMM will attempt to regionalise and refine the aleatory variability, while maintaining its original parametrization. In addition, assuming a possible contamination of data and deviation from the assumption of lognormality, we compute robust counterparts of the usual ordinary least-square estimates of GMM median and variances, while flagging outlier events, stations, and records in the dataset. Table 1 summarizes the additional features we introduced in this new GMM (abbreviated K20) with respect to the recent pan-European GMMs: AK14 (Akkar et al. 2014a), B14 (Bindi et al. 2014), K16 (Kotha et al. 2016), L19 (Lanzano et al. 2019b).
2 Ground-motion data and selection criteria
Figure 1 compares the data distribution between the RESORCE and ESM datasets. The increase in amount of data between 2014 and 2018 for GMM development is dramatic. While the K16 GMM was regressed over 1251 records, the proposed revision (K20) derives from 18,222 records. One striking feature of the ESM dataset is the number of stations with ≥ 3 ground-motion recordings. The RESORCE dataset had about 150 stations with ≥ 3 records, while ESM has 1077. This increase is highly sought in empirical site-specific GMM, PSHA, and seismic risk applications (Faccioli et al. 2015; Kohrangi et al. 2020; Kotha et al. 2017; Rodriguez-Marek et al. 2013).
Another feature is the remarkable increase in the number of small earthquakes with \( M_{W} \le 4.5 \). This could imply an increase in σ of the revised GMM over K16, similar to the reported increase for NGA-West2 GMMs with respect to their superseded NGA-West counterparts. Nevertheless, ground-motions from frequent small-moderate sized earthquakes drive the hazard in low-moderate seismicity regions of Europe, such as France (Drouet et al. 2020) and Germany (Grünthal et al. 2018). It is necessary that the GMM is well behaved in these small-moderate magnitude ranges. Moreover, if the site-specific terms \( \delta S2S_{s} \) are to be estimated with low uncertainty, data from several small-moderate sized earthquakes are indispensable.
The distance (Joyner-Boore metric, R_{JB}) range and density of data are also superior to RESORCE. The region-specific anelastic attenuation terms of K16, NGA-West2 and other GMMs (Sedaghati and Pezeshk 2017) were estimated from records at R_{JB} ≥ 80 km. This revised GMM aims to refine the regionalisation of K16, and therefore, such an increase in far-source data is quite useful. In addition, advanced studies on spatial and temporal variability of attenuation are now possible (Bindi et al. 2018a; Dawood and Rodriguez-Marek 2013; Kotha et al. 2019; Landwehr et al. 2016; Piña-Valdés et al. 2018; Sahakian et al. 2019).
The data visualized in Fig. 1 are from shallow crustal earthquakes in the ESM dataset. The full dataset contains ground-motions from other tectonic regimes as well, such as subduction interface, subduction in-slab, Vrancea, etc. To filter out these and other records not suitable for a shallow crustal GMM development, we adopt the following selection criteria:

1.
To keep only the shallow crustal earthquakes, we select only those events classified as non-subduction events by Weatherill et al. (2020d). The selection removes in-slab, interface, outer-rise, and upper-mantle events from the regression dataset. The resulting 927 events with event depth \( 0 < D \le 39\,{\text{km}} \) are located in regions with \( 14 \le Moho \,depth \le 49\,{\text{km}} \), as per the Moho map of Grad et al. (2009).

2.
Only those events with ≥ 3 records in the dataset are used in regression. Also, wherever available, the harmonised M_{W} estimates from the European-Mediterranean Earthquake Catalogue [EMEC] (Grünthal and Wahlström 2012) are preferred over the ESM default values.

3.
We keep all sites in the dataset irrespective of whether their \( V_{S30} \), measured from geotechnical investigations, is provided or not in ESM. This is to estimate the site-specific terms (\( \delta S2S_{s} \)) at as many sites as possible, and then explore various site-response proxies to characterize them (Kotha et al. 2018; Weatherill et al. 2020b).

4.
Choice of distance metric is R_{JB} where available, otherwise the epicentral distance R_{epi}—but only for events with \( M_{W} \le 6.2 \). The distance range is not truncated and extends up to \( R_{JB} = 545\,{\text{km}} \)

5.
Only those records with high-pass filter frequency \( f_{hp} \le 0.8/T \), where T is the period, for both horizontal components are kept in the regression of spectral accelerations SA(T). This is to ensure that the filter does not significantly affect the response spectral values. As a result, the dataset varies with T (Abrahamson and Silva 1997). Further detail on waveform processing can be found in Lanzano et al. (2019a).
Following the above criteria, the number of records available for GMM regression is 18,222, from 927 events (\( 3.1 \le M_{W} \le 7.4 \)) recorded at 1829 stations (\( 0 \le R_{JB} < 545\,{\text{km}} \)). These numbers decrease to 9698 records, 491 events, and 1341 stations, respectively, at T = 8 s because of the high-pass filter criterion (#5).
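Criteria 2 and 5 can be illustrated with a minimal Python sketch. The record layout (dictionary keys `event_id`, `f_hp_h1`, `f_hp_h2`) is hypothetical, and re-checking the minimum of three records per event after the period-dependent filter is an assumption of this sketch rather than a documented step of the processing.

```python
from collections import Counter


def passes_highpass(f_hp_h1, f_hp_h2, period_s):
    """Criterion 5: both horizontal components need f_hp <= 0.8 / T (Hz)."""
    limit_hz = 0.8 / period_s
    return f_hp_h1 <= limit_hz and f_hp_h2 <= limit_hz


def select_records(records, period_s, min_per_event=3):
    """Apply the high-pass criterion, then keep events with enough records.

    Assumption: the >= 3 records per event rule (criterion 2) is re-applied
    to the records that survive the filter at this period.
    """
    kept = [r for r in records
            if passes_highpass(r["f_hp_h1"], r["f_hp_h2"], period_s)]
    counts = Counter(r["event_id"] for r in kept)
    return [r for r in kept if counts[r["event_id"]] >= min_per_event]


# Hypothetical records, for illustration only.
records = [
    {"event_id": "A", "f_hp_h1": 0.05, "f_hp_h2": 0.08},
    {"event_id": "A", "f_hp_h1": 0.05, "f_hp_h2": 0.05},
    {"event_id": "A", "f_hp_h1": 0.09, "f_hp_h2": 0.09},
    {"event_id": "A", "f_hp_h1": 0.30, "f_hp_h2": 0.05},
    {"event_id": "B", "f_hp_h1": 0.05, "f_hp_h2": 0.05},
]
usable_at_8s = select_records(records, period_s=8.0)
```

Because the limit 0.8/T tightens as T grows, the usable subset shrinks towards long periods, which is why the record, event, and station counts are lower at T = 8 s.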
3 Regionalisation datasets
In order to attempt GMM regionalisation, we first need to define regions, classify the data into regions, and quantify the regional variabilities through regression. The mixed-effects regression estimates first the group random-variances, and then the random-effects for individual levels in the group (Bates et al. 2014). For instance, in estimating the well-known between-event term (\( \delta B_{e} \)), the preliminary step is to quantify observed event-to-event ground-motion variability as the between-event random-variance (τ^{2}). Following this, the between-event random-effects for each event are estimated from the total-residuals. The procedure is similar for the between-site terms (\( \delta S2S_{s} \)), and other random-effects in a mixed-effects regression. Essentially, within the event group, individual events are the levels. It is important to note that random-effect values for the levels (in a group) can be used in three ways: (1) in level-specific predictions accounting for the epistemic uncertainty of the level-specific adjustment, (2) investigated for any physical phenomena and proxy parameters, or (3) ignored, while treating the group random-variance as an aleatory variability or an epistemic uncertainty in a GMM logic-tree (Douglas 2018a, b; Weatherill et al. 2020c).
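The second step, estimating a level's random-effect once the variances are known, can be sketched for the simplest case of a single random-effect group. The expression below is the standard shrinkage estimate for a random intercept (as in Abrahamson and Youngs 1992), not the crossed and nested estimator used for this GMM; the function name and the numerical values are hypothetical.

```python
def between_event_term(total_residuals, tau, phi):
    """Shrinkage estimate of one event's random-effect.

    total_residuals: ln(observed) - ln(predicted median) for the event's records
    tau: between-event standard deviation (estimated in the first step)
    phi: standard deviation of the leftover residuals
    Assumes a single random-effect group; illustrative only.
    """
    n = len(total_residuals)
    return tau**2 * sum(total_residuals) / (n * tau**2 + phi**2)


# With few records the estimate is pulled towards zero; with many records
# it approaches the mean of the event's total residuals.
few = between_event_term([0.5], tau=0.5, phi=0.5)
many = between_event_term([0.5] * 4, tau=0.5, phi=0.5)
```

The same logic explains why levels with little data (e.g. a region with only a handful of records) receive adjustments close to zero with large uncertainty.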
We emphasize that an exploratory analysis on random-effects (of levels in a group) is to evaluate if they concur with known geophysical properties. If yes, the random-effect group remains in the regression; otherwise the grouping needs revision. In this study, we aim to capture variabilities among events, event sources (e.g. fault systems), attenuating regions, and recording sites. Anticipating significant random-variances, we first describe the various regionalisation datasets used to formulate the random-effect groups.
3.1 Attenuation regionalisation
Recent GMMs have demonstrated a strong between-region variability of anelastic attenuation, which is partially attributed to spatial variability in crustal characteristics, e.g. the 1 Hz Lg-coda Q values (Cong and Mitchell 1998), shear-wave velocity (Lu et al. 2018), etc. The first generation of regionalised GMMs, however, relied on national administrative boundaries for regionalisation of anelastic attenuation (e.g. Italy, Turkey, Japan, California, etc.), which could yield incongruous estimates of random-variances. For example, spatial variability of attenuation imaged by 1 Hz coda Q maps of regions within Italy, Turkey (Cong and Mitchell 1998), France (Mayor et al. 2018), and Europe in general (Pilz et al. 2019) is rather high. Such spatial variability cannot be contained within political boundaries. Therefore, in this study, we adopt a more geological-geophysical regionalisation developed by Basili et al. (2019) under the purview of the TSUMAPS-NEAM project.
Figure 2 shows the regionalisation and the number of records within each region polygon, as decided by the recording site location. Assuming that the anelastic attenuation is a phenomenon dominant at far-source distances and in near-surface crustal layers, we let the recording site location decide to which region a particular record is to be assigned. Alternatively, regionalising records based on event location would cause some inconsistencies concerning event depth. Seismic waves from shallow and deep events sample different depths of the crust and upper mantle. TSUMAPS-NEAM regions used in this study are surficial only, and cannot accommodate depth dependence of attenuation. Therefore, records are allotted to the regions based on the recording surface site location.
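Assigning a record to the region polygon that contains its recording site can be sketched with a ray-casting point-in-polygon test. This is a minimal illustration: the polygons and region names are hypothetical, coordinates are treated as planar (no geodetic handling), and the actual processing may well rely on a GIS library instead.

```python
def point_in_polygon(lon, lat, polygon):
    """Ray-casting test; polygon is a list of (lon, lat) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def assign_region(site_lon, site_lat, regions):
    """Return the name of the first region containing the site, else None."""
    for name, polygon in regions.items():
        if point_in_polygon(site_lon, site_lat, polygon):
            return name
    return None


# Hypothetical square regions, for illustration only.
regions = {
    "region_west": [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0)],
    "region_east": [(10.0, 0.0), (20.0, 0.0), (20.0, 10.0), (10.0, 10.0)],
}
```

Keying the assignment on the site rather than the event location means that all records of one station share a region, whatever the depth or origin of the events it recorded.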
There are 46 regions in this map (Fig. 2), substantially more than the three attenuating regions of K16 (Italy, Turkey, and the rest of Europe). This regionalisation scheme splits Turkey into at least eight regions, East and West Anatolia being the largest. Italy is highly fragmented as well. Figure 2 indicates that there are regions with a few thousand records (e.g. Central and Northern Apennines), and a few with fewer than a hundred records (e.g. only 13 records from Rhine Graben). Thus, the random-variance of the attenuation regions group will be a quantification of the observed region-to-region variability in anelastic attenuation across these regions (each region is a level). A finer or coarser regionalisation may yield a different estimate of between-region variance. Customarily, the regionalisation is considered appropriate, and the regional variability in anelastic attenuation significant, only if the random-variance for this group is significant. Subsequently, we will inspect whether the region-specific random-effect values resemble any physical phenomenon.
3.2 Event localisation
Traditionally, earthquake-to-earthquake variability is captured by the between-event random-variance (τ^{2}), and the event-specific random-effect values (\( \delta B_{e} \)) are estimated for each event in this group. Based on earlier works on fault-maturity (Bohnhoff et al. 2016; Manighetti et al. 2007; Radiguet et al. 2009), we hypothesized that events associated with a particular earthquake source (or a fault system) show systematic differences in their ground-motions, and that event-to-event ground-motion variability is similar across various fault-systems.
In this study, we introduced an additional random-effect to quantify earthquake locality-to-locality variability, similar to the location-to-location variability (\( \delta L2L_{l} \)) defined in Al Atik et al. (2010). We assign the ESM shallow crustal events to seismotectonic zones defined in the European Seismic Hazard Model 2020 (ESHM20) towards development of seismic source models. The seismotectonic source zonation (referred to as 'TECTO') is designed to be large-scale and does not attempt to resolve smaller scale seismogenic features. As such, it is intended to reflect the local tectonics influencing the seismic source generation but not to resolve highly localised features, which makes it a good candidate for our purpose, and more appropriate for source localisation than the TSUMAPS-NEAM model used for attenuation regionalisation.
The random-effects group will be referred to as between-locality (\( \delta L2L_{l} \)) from here on, with each level (locality l) being a tectonic locality with allotted shallow crustal events. Figure 3 shows the distribution of recordings across the tectonic localities with at least one shallow crustal event associated with them. Once again, this group is introduced to quantify the earthquake locality-to-locality variability of ground-motion in the dataset through the between-locality variance (\( \tau_{L2L}^{2} \)), which, if close to zero, indicates no regional variability of earthquake characteristics.
4 Functional form
A mixed-effects GMM is composed of fixed-effects and random-effects. The fixed-effects part of the GMM predicts generic ground-motions as a continuous function of predictor variables, which in this case are event magnitude M_{W} and distance metric R_{JB} (in km). Random-effects can serve as adjustments to the generic fixed-effects regression coefficients in order to produce ground-motion predictions specific to an event, tectonic locality, region, and site. Equation (1) illustrates the decomposition of observed ground-motion values into mixed-effects. We note that Eq. (1) is not an exact mixed-effects formula, but is used here only as an aid to explain the approach.
In the LHS of Eq. (1), \( ln\left( {GM_{e,l,s,r} } \right) \) is the natural-log of ground-motion (\( GM_{e,l,s,r} \), which are RotD50 measures of SA(T) here) produced by an event (e) originating in a tectonic locality (l in Fig. 3), and recorded by a site (s) located in region (r in Fig. 2). These observed ground-motions are decomposed into mixed-effects. Since events are exclusive to tectonic localities, and sites to regions, the formulation contains nested mixed-effect grouping. In addition, since events in a locality can be recorded by sites in multiple surrounding regions, the formulation contains crossed mixed-effect grouping as well. Therefore, this GMM is developed using a crossed and nested mixed-effects approach (Stafford 2014).
In the RHS of Eq. (1), \( ln\left( \mu \right) \) is the generic fixed-effects formula without any specificities related to event, locality, site, and region. The \( \delta B_{e,l}^{0} , \delta L2L_{l} , \delta S2S_{s} \), and \( \delta c_{3,r} \) are random-effect adjustments to the coefficients of the fixed-effect \( ln\left( \mu \right) \), and are aimed to be specific to the event (e), locality (l), site (s), and region (r), respectively. ϕ is the standard-deviation of the leftover residuals \( \varepsilon = {\mathcal{N}}\left( {0,\phi^{2} } \right) \), which contain phenomena that could not be captured by the crossed and nested mixed-effects. Subsequent sections will explain these terms in more detail.
4.1 Fixed-effects
Equations (2) through (5) present the fixed-effects formula of the GMM. In Eq. (2), \( ln\left( \mu \right) \) is the generic median prediction as a function of M_{W} and R_{JB}, where e_{1} is the generic offset, \( f_{R,g} \left( {M_{W} ,R_{JB} } \right) \) represents the geometric spreading, \( f_{R,a} \left( {R_{JB} } \right) \) represents the apparent anelastic attenuation, and \( f_{M} \left( {M_{W} } \right) \) represents the magnitude scaling of ground-motion prediction. The \( ln\left( \mu \right) \) is the predicted natural-log of RotD50 (Boore 2010) measures of PGA (in gal) and PGV (in cm s^{−1}), and the 5% damped elastic response spectral ordinates in acceleration (SA, in gal) at 34 periods ranging from 0.01 to 8 s. Therefore, the fixed-effects coefficients \( e_{1} , b_{1} ,b_{2} ,b_{3} ,c_{1} ,c_{2} , c_{3} \) in Eqs. (2–5) take different values for each period, PGA, and PGV, and are generic to all events, tectonic localities, attenuating regions, and sites.
The fixed-effects component of this GMM remains similar to that of K16, but with a few minor changes based on non-parametric analyses. In Fig. 4, the top row of the plot shows non-parametric scaling of \( SA\left( {T = 0.1\,{\text{s}}} \right) \) with R_{JB}, wherein, for clarity, the data is split into magnitude bins and plotted in separate panels. The bottom row of the plot shows non-parametric scaling of \( SA\left( {T = 0.1\,{\text{s}}} \right) \) with M_{W}, where the data is split into distance bins and plotted in separate panels. The smooth (coloured) curves are the loess fits (Jacoby 2000) illustrating non-parametric scaling of \( SA\left( {T = 0.1\,{\text{s}}} \right) \) with magnitude and distance. Given the variety of hypocentral depths in the dataset, and that we are using the depth-insensitive R_{JB}, we categorized the events into three depth bins (\( D < 10\,{\text{km}}, 10\,{\text{km}} \le D < 20\,{\text{km}}, 20\,{\text{km}} \le D \)), and plotted unique non-parametric attenuation curves for each bin. We note that, although the non-parametric plots presented here are for \( SA\left( {T = 0.1\,{\text{s}}} \right) \) and the shape of loess fits varies gradually with periods, the differences were not large enough to make changes to the GMM functional form. Based on such plots we introduced some changes into the GMM with respect to K16:

1.
In the \( SA\left( {T = 0.1\,{\text{s}}} \right) \) versus R_{JB} panels, it is evident that deeper events (\( 20\,{\text{km}} \le D \)) have a longer near-source saturation plateau, a feature controlled by the so-called h-parameter in GMMs (h_{D} in Eqs. 3 and 4). Shallower events have a shorter near-source saturation plateau, and a steeper decay with distance. However, these differences are only prominent at \( R_{JB} \le 30\,{\text{km}} \).
The marginally larger \( SA\left( {T = 0.1\,{\text{s}}} \right) \) values of \( 20\,{\text{km}} \le D \) events at \( R_{JB} > 30\,{\text{km}} \) are also reasonable (Derras et al. 2012). Deeper events are closer to the Moho, producing weaker surface ground-motions in the epicentral zone due to longer travel paths of direct arrivals, and relatively stronger ground-motions at far-source distances from the more efficient propagation in deeper crustal layers.
Based on these observations, we allowed the h-parameter to vary with the depth bins, as indicated by the subscript D in \( h_{D}^{2} \) of Eqs. (3) and (4). In the K16 GMM, this parameter was depth-independent, and had no subscript. Moreover, instead of making this parameter a regression coefficient, we keep the regression linear by assigning a priori values (based on preliminary non-parametric analyses and non-linear regression trials), which are independent of magnitude and period: \( h_{D} = 12\,{\text{km}} \) for deep events with \( 20\,{\text{km}} \le D \), \( h_{D} = 8\,{\text{km}} \) for events of intermediate depth \( 10\,{\text{km}} \le D < 20\,{\text{km}} \), and \( h_{D} = 4\,{\text{km}} \) for shallow events with \( D < 10\,{\text{km}} \).

2.
In addition to the above, since ground-motions appear reasonably depth independent at \( R_{JB} \ge 30\,{\text{km}} \), we set \( R_{ref} = 30\,{\text{km}} \) instead of the 1 km in K16. M_{ref} in Eq. (3) is the reference magnitude, and remains the same as in K16, i.e. \( M_{ref} = 4.5 \).

3.
In the \( SA\left( {T = 0.1\,{\text{s}}} \right) \) versus M_{W} panels, although the depth dependence of near-source attenuation is not evident, we do observe saturation towards large magnitudes. In the panel showing non-parametric ground-motion scaling with M_{W}, the evidence suggests that saturation begins at M_{W} ≥ 6.2. However, this is a feature most noticeable at \( R_{JB} \le 30\,{\text{km}} \) and at short periods; here \( SA\left( {T = 0.1\,{\text{s}}} \right) \). Towards longer periods and at longer distances, the saturation is less pronounced. Therefore, unlike in K16 where the hinge-magnitude was set as \( M_{h} = 6.75 \), in this GMM we set \( M_{h} = 6.2 \). M_{h} is period independent; saturation or otherwise beyond M_{h} is captured by b_{3} in Eq. (5).
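The three changes above can be put together in a short Python sketch of the fixed-effects median. Equations (2) to (5) are not reproduced in this text, so the functional form below is an assumption chosen to be consistent with the descriptions here (offset e_{1}, geometric spreading with c_{1}, c_{2} and M_{ref}, anelastic attenuation with c_{3}, magnitude scaling with b_{1}, b_{2} below M_{h} and b_{3} above it); the division of c_{3} by 100 is likewise assumed. The coefficient values are hypothetical placeholders, not the regressed coefficients.

```python
import math

R_REF_KM = 30.0   # reference distance set in this GMM
M_REF = 4.5       # reference magnitude, as in K16
M_HINGE = 6.2     # hinge magnitude M_h


def h_for_depth(depth_km):
    """A priori h_D (km) by hypocentral depth bin, as listed in the text."""
    if depth_km < 10.0:
        return 4.0
    if depth_km < 20.0:
        return 8.0
    return 12.0


def ln_median(mw, rjb_km, depth_km, coeffs):
    """Assumed fixed-effects form: ln(mu) = e1 + f_Rg + f_Ra + f_M."""
    h = h_for_depth(depth_km)
    r = math.sqrt(rjb_km**2 + h**2)
    r_ref = math.sqrt(R_REF_KM**2 + h**2)
    # Geometric spreading (magnitude dependent), zero at R_JB = R_ref
    f_rg = (coeffs["c1"] + coeffs["c2"] * (mw - M_REF)) * math.log(r / r_ref)
    # Apparent anelastic attenuation, zero at R_JB = R_ref
    f_ra = coeffs["c3"] / 100.0 * (r - r_ref)
    # Magnitude scaling, hinged at M_h
    dm = mw - M_HINGE
    if mw <= M_HINGE:
        f_m = coeffs["b1"] * dm + coeffs["b2"] * dm**2
    else:
        f_m = coeffs["b3"] * dm
    return coeffs["e1"] + f_rg + f_ra + f_m


# Hypothetical placeholder coefficients, NOT the published values.
EXAMPLE_COEFFS = {"e1": 4.0, "b1": 2.0, "b2": 0.3, "b3": 0.4,
                  "c1": -1.4, "c2": 0.3, "c3": -0.6}
```

In this form both distance terms vanish at R_{JB} = R_{ref}, so predictions for the three depth bins coincide at 30 km and differ mainly at shorter distances, where h_{D} controls the length of the saturation plateau.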
4.2 Random-effects
In the mixed-effects formulation of Eq. (1), \( ln\left( \mu \right) \) is the fixed-effects component. Equations (2) through (5) describe its constituents, wherein \( e_{1} , b_{1} ,b_{2} ,b_{3} ,c_{1} ,c_{2} , c_{3} \) are the fixed-effects regression coefficients. These generic fixed-effect coefficients can be adjusted using random-effect estimates to predict event, tectonic locality, attenuation region, and site specific ground-motions. Regarding the random-effects components:

1.
\( \tau_{c3} \) quantifies the between-region variability of anelastic attenuation across the attenuation region group described earlier (in Fig. 2). This means that, along with a generic c_{3} and random-variance \( \tau_{c3}^{2} \), region-specific adjustments \( \delta c_{3,r} \) are estimated as random-effects. These random-effects follow a Gaussian distribution \( \Delta c_{3} = {\mathcal{N}}\left( {0,\tau_{c3}^{2} } \right) \).
The generic fixed-effect coefficient c_{3} in Eq. (4) can be adjusted with a region's \( \delta c_{3,r} \) to obtain the region-specific coefficient for that region r, as in \( c_{3,r} = c_{3} + \delta c_{3,r} \). Unlike the generic c_{3}, \( c_{3,r} \) is the apparent anelastic attenuation term that varies with region (subscript r identifies regions in Fig. 2). In K16, r identified the regions Italy, Turkey, and the rest of pan-Europe as Other. This grouping ensured that each of the regions had sufficient data to estimate statistically reliable \( \delta c_{3,r} \). Accordingly, \( \tau_{c3} \) quantified the regional variability of anelastic attenuation when the RESORCE dataset is grouped into Italy, Turkey, and Other. ESM contains much more data from Italy and Turkey, and also from several other nations. Since the number of regions (46) is about 15 times that of K16 (three), the quantified regional variability in anelastic attenuation, in terms of \( \tau_{c3} \), is also larger than that of K16. Regions with sufficient data also benefit from a well-constrained region-specific adjustment \( \delta c_{3,r} \).

2.
Between-locality variability of observed ground-motions is captured by the random-effect \( \Delta L2L = {\mathcal{N}}\left( {0,\tau_{L2L}^{2} } \right) \), where the mixed-effects regression quantifies the variability as \( \tau_{L2L} \), at each T, and earthquake locality-specific terms as \( \delta L2L_{l} \) (subscript l identifies localities in Fig. 3). This random-effect can be used to adjust the fixed-effect coefficient e_{1} in Eq. (2), as in \( e_{1,l} = e_{1} + \delta L2L_{l} \), to achieve ground-motion predictions specific to the tectonic locality of the events' origin.

3.
Event-to-event variability in this GMM is the traditional between-event random-effects \( \Delta B_{e} = {\mathcal{N}}\left( {0,\tau^{2} } \right) \) filtered for between-locality variability \( \Delta L2L = {\mathcal{N}}\left( {0,\tau_{L2L}^{2} } \right) \), and is now captured by the \( \Delta B_{e}^{0} = {\mathcal{N}}\left( {0,\tau_{0}^{2} } \right) \), where, for an event e located in tectonic locality l, the event-specific term can be seen as \( \delta B_{e,l}^{0} \approx \delta B_{e} - \delta L2L_{l} \). \( \tau_{0} \) is the generic event-to-event variability corrected for locality-to-locality variability, and does not vary with tectonic locality l.

4.
Site-to-site response variability is captured by the site-specific random-effects \( \Delta S2S = {\mathcal{N}}\left( {0,\phi_{S2S}^{2} } \right) \). The potential of \( \delta S2S_{s} \) in site-specific GMMs is well-known, and these terms are useful in studying regional differences in site-response scaling with \( V_{S30} \) (time-averaged shear-wave velocity in the top 30 m of soil), as in K16, or with other site-response proxies (Kotha et al. 2018; Weatherill et al. 2020b).

5.
The leftover residuals \( \varepsilon = {\mathcal{N}}\left( {0,\phi^{2} } \right) \) contain the unexplained natural variability of ground-motion observations, and thus represent the apparent aleatory variability of the model. These residuals can be investigated for less dominant phenomena, such as the anisotropic shear-wave radiation pattern (Kotha et al. 2019).
In all, this GMM has four random-effect groups, i.e. one degree-of-freedom more than K16, to explain about 15 times the data. Those common with K16 are refined with a more physical regionalisation scheme, and greater geographical coverage of shallow crustal events and recording sites. With this configuration of the mixed-effects GMM, we run a robust linear mixed-effects regression (Koller 2016) independently for the 36 RotD50 IMs: 5% damped SA for \( T = 0.01 - 8\,{\text{s}} \), PGA, and PGV. Along with the regression coefficients, we also estimate and provide the fixed-effects variance–covariance matrices needed to estimate the GMM epistemic uncertainty (Atik and Youngs 2014; Bindi et al. 2017) and to update the GMM coefficients in a Bayesian framework (Kowsari et al. 2019, 2020; Kühn and Scherbaum 2015).
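The coefficient adjustments \( c_{3,r} = c_{3} + \delta c_{3,r} \) and \( e_{1,l} = e_{1} + \delta L2L_{l} \) described above amount to simple look-ups, sketched here in Python. The region and locality names and all numerical values are hypothetical; falling back to a zero adjustment (the generic coefficient) for a region or locality without an estimate is an assumption of this sketch.

```python
def adjusted_coefficients(coeffs, region=None, locality=None,
                          delta_c3=None, delta_l2l=None):
    """Return a copy of the generic coefficients with region- and
    locality-specific adjustments applied.

    delta_c3:  dict mapping region name -> delta c_{3,r}
    delta_l2l: dict mapping locality name -> delta L2L_l
    Unknown names fall back to 0.0, i.e. the generic coefficient.
    """
    delta_c3 = delta_c3 or {}
    delta_l2l = delta_l2l or {}
    out = dict(coeffs)
    out["c3"] = coeffs["c3"] + delta_c3.get(region, 0.0)
    out["e1"] = coeffs["e1"] + delta_l2l.get(locality, 0.0)
    return out


# Hypothetical values, for illustration only.
generic = {"e1": 4.0, "c3": -0.6}
dc3 = {"region_fast": -0.2, "region_slow": 0.15}
dl2l = {"locality_a": 0.1}

specific = adjusted_coefficients(generic, region="region_fast",
                                 locality="locality_a",
                                 delta_c3=dc3, delta_l2l=dl2l)
```

A negative \( \delta c_{3,r} \) makes \( c_{3,r} \) more negative than the generic c_{3}, i.e. faster than average attenuation in that region.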
5 Regression method adapted to large datasets
Unlike K16, where the GMM is regressed using an ordinary least-squares mixed-effects regression algorithm (Bates et al. 2014), in this study we employ a robust mixed-effects regression algorithm (Koller 2016). Like any other real-life data, ground-motion data may contain outliers and other contaminations. Even minor contamination may drive the classical ordinary least-square mixed-effect estimates away from those without contamination. Robust linear mixed-effects (rlmm) regression is then quite useful in limiting the influence of outlier events, sites, and records on the GMM median and variances. It is important to note that these outlier data points are not entirely removed from the GMM regression but are simply down-weighted.
A feature of rlmm relevant to this study is that the random-effects (attenuation regions, events, event localities, and sites) and residuals (records) with values beyond ± 1.345 standard-deviations of their respective normal random-distributions are assigned progressively lower weights (< 1), whereas in ordinary least-squares all data are assigned unit weight. Any event, site, or record with non-unit weight is considered a possible outlier, and needs to be examined for its peculiarity. Although it is possible to tune the regression parameters, e.g. those of the Huber loss function (Huber 1992) in robust regressions, we chose to remain with the suggested default values (Koller 2016) optimized for efficiency and robustness in detecting outliers.
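The down-weighting can be illustrated with the plain Huber weight function, which assigns unit weight within ± k standard-deviations and a weight of k/|z| beyond. This is a sketch of the general idea only: the regression software may use a smoothed variant of this function, and the function name here is hypothetical.

```python
def huber_weight(standardised_value, k=1.345):
    """Plain Huber weight for a standardised residual or random-effect.

    Returns 1.0 within +/- k and k/|z| beyond, so that the weight
    decreases progressively and never reaches zero.
    """
    a = abs(standardised_value)
    return 1.0 if a <= k else k / a


# A record 1 standard-deviation from the median keeps full weight;
# one at 2.69 standard-deviations (2 x 1.345) gets half weight.
weights = [huber_weight(z) for z in (0.0, 1.0, -2.69, 5.0)]
```

Because the weight is always positive, an outlier still contributes to the regression, in line with the statement that outliers are down-weighted rather than removed.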
6 Results and discussion
The regression results comprise fixed-effect coefficients and covariance matrices, random-effect values including weights and standard-errors, residuals, and variances. It is customary to check the behaviour of the GMM fixed-effects component, and its epistemic uncertainty, in various magnitude and distance ranges. Random-effects and residuals are checked for any noticeable biases or trends with predictor variables. We discuss them separately.
6.1 Fixed-effects
Figure 5 shows the GMM's median \( SA\left( {T = 0.1\,{\text{s}}} \right) \) and \( SA\left( {T = 1\,{\text{s}}} \right) \) (in gal) predictions over magnitude and distance ranges. Along with the median prediction (lines), its epistemic uncertainty in terms of asymptotic standard-deviation (\( \pm \sigma_{\mu } \)) is shown as well (coloured ribbons).
6.1.1 Distance scaling
First, we discuss the predicted \( SA\left( {T = 0.1\,{\text{s}}} \right) \) and \( SA\left( {T = 1\,{\text{s}}} \right) \) scaling with R_{JB} in the left panels of Fig. 5. Looking at the curves for M4, we notice the impact of depth-dependent h_{D} in rendering a shorter near-source saturation (plateau ~ 0 to 3 km) for shallower events, compared to the intermediate depth events (plateau ~ 0 to 5 km) and deeper events (plateau ~ 0 to 10 km). The three curves merge at about 30 km, which is our R_{ref}. Beyond \( R_{ref} = 30\,{\text{km}} \), the depth-dependence of distance scaling is non-existent. The predictions show good agreement with the non-parametric trends in Fig. 4.
Beyond the R_{ref}, we see the region-dependent anelastic adjustments coming into play, which show the impact of adjusting c_{3} with \( \delta c_{3,r} = 0, - \tau_{c3} , + \tau_{c3} \). A region with faster than average attenuation will have a \( c_{3,r} \) more negative (with \( \delta c_{3,r} = - \tau_{c3} \)) than the generic average, and vice versa for slower attenuation (with \( \delta c_{3,r} = + \tau_{c3} \)). Which of the 46 regions in pan-Europe apparently attenuate faster or slower than the pan-European average (with \( \delta c_{3,r} = 0 \)) will be discussed in the following sections, and elaborated in a follow-up study. The effect of \( \delta c_{3,r} \) at \( R_{JB} < R_{ref} = 30\,{\text{km}} \) is negligible, as it should be. The 27 curves appear as three on either side of \( R_{ref} = 30\,{\text{km}} \), because near-source (h_{D}) and far-source (\( \delta c_{3,r} \)) adjustments have their exclusive domains of influence.
In Fig. 5, we also show the epistemic uncertainty on median predictions. The orange ribbon is almost too thin to be noticeable for M4 and M5.5 predictions. Only for M7 and larger events, the ribbon is visibly wide because of the limited data from large magnitude events at nearsource distances in the ESM dataset, and the uncertainty on the coefficient b_{3}, which describes the magnitude scaling beyond M_{h} = 6.2.
6.1.2 Magnitude scaling
In the right panels of Fig. 5 we show the \( SA\left( {T = 0.1\,{\text{s}}} \right) \) and \( SA\left( {T = 1\,{\text{s}}} \right) \) scaling with M_{W}. Here as well we show 27 curves, but this time for \( R_{JB} = 10, 50, 150\,{\text{km}} \) instead of the three M_{W} values. The features in scaling with distance discussed in reference to the left panels also prevail here: \( \delta c_{3,r} \) is effective at \( R_{JB} = 150\,{\text{km}} \), h_{D} is effective at \( R_{JB} = 10\,{\text{km}} \), and neither is effective at \( R_{JB} = 50\,{\text{km}} \). More important in this context is the difference in scaling with magnitude at \( R_{JB} = 10\,{\text{km}} \) compared to \( R_{JB} = 50, 150\,{\text{km}} \). Evidently, the scaling of \( SA\left( {T = 0.1\,{\text{s}}} \right) \) is more gradual (less steep) at near-source distance for \( M_{W} < M_{h} \) and oversaturates for \( M_{W} \ge M_{h} \). This is a known physical behaviour, wherein near-source ground-motions, especially the short-period SAs, are less sensitive to M_{W} (Campbell 1981; Schmedes and Archuleta 2008). A few previous GMMs observed the same with various datasets. Boore et al. (2014) and K16 allowed oversaturation at large magnitudes (\( M_{W} \ge 6.75 \)), but whether this is realistic or not needs to be verified with specially compiled near-source ground-motion datasets, e.g. the one recently made available by Pacor et al. (2018). The oversaturation effect, however, disappears at longer periods, as shown here for \( SA\left( {T = 1\,{\text{s}}} \right) \).
Figure 6 shows the predicted response spectra for the same scenarios shown in Fig. 5. The effects of event depth and regionalized anelastic attenuation are the same as in Fig. 5. An interesting feature is that the peak of the response spectra shifts towards longer periods for larger magnitude events, reflecting the decreasing corner-frequency with increasing magnitude of a Brune (1970) source spectrum.
An oddity in the response spectra is the kink appearing between \( T = 3.5\,{\text{s}} \) and \( T = 4.5\,{\text{s}} \), which is most likely from the sudden loss of approximately 3000 usable records (in regression) due to the high-pass filtering frequency criterion #5 in Ground-Motion Data and Selection Criteria. Similar kinks can be seen in a few other subsequent plots as well. Traditionally, GMM fixed-effect regression coefficients are smoothed in a post-processing step, in order to smooth the predicted response spectra and to constrain the GMM behaviour in sparsely sampled magnitude-distance ranges, e.g. Abrahamson et al. (2014). However, in this study we chose not to alter the fixed-effect coefficients, in order to maintain full consistency with their variance–covariance matrices, which will be necessary in testing and updating of the GMM.
6.2 Randomeffects and residuals
Figure 7 shows the random-effect and residual standard-deviations of the GMM. The total ergodic standard-deviation of the GMM is \( \sigma = \sqrt {\tau_{L2L}^{2} + \tau_{0}^{2} + \phi_{S2S}^{2} + \phi^{2} } \), when considering all source and site variabilities as aleatory. The solid lines in this plot correspond to the variance estimates of the GMM from the ESM dataset (this study), while the dashed lines indicate those of the K16 GMM from the RESORCE dataset. Note that the K16 GMM does not regionalise the events, and thus the between-locality variability component \( \tau_{L2L} \) is absent. For the sake of comparison between the two models, we combine the two source-related random-variances of this study into the traditional between-event variability \( \tau = \sqrt {\tau_{L2L}^{2} + \tau_{0}^{2} } \).
We show that the total standard-deviation σ of the new GMM is considerably larger than that of the K16 GMM. The largest increase in σ is at short-moderate periods, and is from the increase in between-event τ and between-site \( \phi_{S2S} \) variabilities. The increase in between-event variability can be attributed to the increased regional diversity of earthquakes, the increase in the number of \( 4 \le M_{W} \le 5.5 \) events from 164 in RESORCE to 778 in ESM, and the additional 76 events with \( 3 \le M_{W} < 4 \) in ESM. The increase in between-site variability clearly stems from the increase in the number of recording sites from 385 in RESORCE to 1829 in ESM. The residual variability \( \phi \) of the new GMM, however, is equal to or smaller than that of K16, despite the increase in regional diversity, the sample size, and the recording distance range from 300 to 545 km. In our view, this plot emphasizes the need to move from ergodic to partially non-ergodic region- and site-specific ground-motion predictions to inhibit the effect of increased σ on PSHA (Anderson and Brune 1999; Bommer and Abrahamson 2006; Kotha et al. 2017; Rodriguez-Marek et al. 2013).
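The way the variance components combine into τ and σ can be sketched in a few lines. The numerical values below are placeholders chosen for illustration only; they are not the period-dependent estimates of this GMM, and the helper name `combine` is hypothetical.

```python
import math


def combine(*components):
    """Combine independent standard-deviation components in quadrature."""
    return math.sqrt(sum(c * c for c in components))


# Placeholder values for illustration only (not estimates from this study).
tau_l2l, tau_0, phi_s2s, phi = 0.25, 0.40, 0.55, 0.45

tau = combine(tau_l2l, tau_0)                  # traditional between-event term
sigma = combine(tau_l2l, tau_0, phi_s2s, phi)  # total ergodic standard-deviation

print(f"tau = {tau:.3f}, sigma = {sigma:.3f}")
```

Because the components add in quadrature, the largest component dominates σ, which is why removing a large term such as \( \phi_{S2S} \) has a stronger effect than removing a small one.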
6.2.1 Anelastic attenuation variability
Figure 8 shows the region-specific \( \delta c_{3,r} \) adjustments (coloured lines and ribbons) of the 45 regions for \( T = 0.01 - 8\,{\text{s}} \). Most of these curves lie within the \( \pm \tau_{c3} \) (red lines) bounds. As indicated by \( \tau_{c3} \), the region-to-region variability of apparent anelastic attenuation is the highest at short periods, and decreases gradually towards longer periods. High-frequency ground-motions are more susceptible to strong anelastic decay in the crust, which could be related to the crustal properties. Long-period SAs are not affected as much by anelastic attenuation; therefore, regional differences are relatively smaller, and \( \tau_{c3} \) is smaller by ~ 50% at \( T \ge 1\,{\text{s}} \).
In RESORCE dataset, K16 observed that short period SAs attenuate faster in Italy than in Turkey, which was observed earlier in NGAWest2 dataset by Boore et al. (2014), and confirmed later in Fourier domain by Bora et al. (2017). However, these observations were based on distinguishing regions by administrative boundaries and not geological or geophysical features. Since \( \tau_{c3} \) estimated from the new regionalisation is not zero, we can assert that regional variability of anelastic attenuation exists. Of course, regions with fewer data have larger epistemic uncertainty (standarderror) on their \( {{\updelta c}}_{{3,{\text{r}}}} \), but the largest epistemic uncertainty is always lower than the aleatory \( {{\uptau }}_{{{\text{c}}3}} \), and decreases with increasing data.
Figure 9 shows the spatial variability of apparent anelastic attenuation as captured by the regionalisation model of Fig. 2. In this figure, red polygons identify regions attenuating slower than the pan-European average c_{3}, i.e. \( {{\updelta c}}_{{3,{\text{r}}}} > 0 \), and blue polygons identify the faster attenuating regions with \( {{\updelta c}}_{{3,{\text{r}}}} < 0 \). Regions with insufficient data, and thereby large epistemic uncertainty on their \( {{\updelta c}}_{{3,{\text{r}}}} \), are white in color, and given \( {{\updelta c}}_{{3,{\text{r}}}} = 0 \). There are a few interesting features in these maps:

1. Regions with similar attenuation characteristics are spatially clustered, although this is period dependent. In general, regions characterized by high seismic activity (e.g. Italy and Greece) show stronger attenuation than those with lower seismic activity (e.g. central Europe).
A Bayesian clustering of regions with similar attenuation characteristics is proposed in the companion paper by Weatherill et al. (2020c), wherein similar regions (r) are clustered (cl) and assigned cluster-specific \( {{\updelta c}}_{{3,{\text{cl}}}} \) with lower standard-error and smaller cluster-specific \( {{\uptau }}_{{{\text{c}}3,{\text{cl}}}} \). The epistemic uncertainty (standard-error) on cluster-specific \( {{\updelta c}}_{{3,{\text{cl}}}} \) is smaller than that of \( {{\updelta c}}_{{3,{\text{r}}}} \) due to the accumulation of data from multiple regions, while the cluster-specific \( {{\uptau }}_{{{\text{c}}3,{\text{cl}}}} \) is smaller than the overall \( {{\uptau }}_{{{\text{c}}3}} \) due to a smaller within-cluster variability of region-specific \( {{\updelta c}}_{{3,{\text{r}}}} \).

2. The best-sampled regions are in central Italy, with 5703 records from Northern and central Apennines W (West) and 3438 records from Northern and central Apennines E (East). Although both attenuate faster than the pan-European average, there appears to be a strong contrast between these adjacent regions.

3. The fastest attenuation in the Aegean Sea is observed in the Gulf of Corinth, where the sites record highly attenuated ground-motions traversing across the Aegean volcanic arc.
A contrast in short-period attenuation is apparent around the Alps regions. In addition, some differences are noticeable between west, north, and central Anatolia. Although not conclusive, rapidly changing crustal thickness (Grad et al. 2009) and associated crustal properties may partially explain the rapid change in attenuation properties in these regions. Regional variability of anelastic attenuation may in fact be a combination of variability of crustal shear-wave velocity, crustal quality factor (e.g. coda Q), mantle temperature influencing the rigidity of the crust, and other parameters. Further elaboration is preferred in the Fourier domain, rather than with response spectra here, in a follow-up study.
6.2.2 Source variability
Earthquake source variability is split into two random-effect groups: between-event \( \Delta B_{e}^{0} = {\mathcal{N}}\left( {0,\tau_{0}^{2} } \right) \) and between-locality \( \Delta L2L = {\mathcal{N}}\left( {0,\tau_{L2L}^{2} } \right) \). Between-event terms can be estimated for the recorded earthquakes, but are difficult to predict for prospective earthquakes because of their spatiotemporal variability. Even when correlated with stress-drop or a source parameter that can explain the relative differences in ground-motions, prediction of stress-drop for the next event is not yet possible. Traditionally therefore, between-event variability is considered purely aleatory. The between-locality random-effect is intended to quantify a portion of the between-event spatial variability into tectonic localities.
Figure 10 shows the \( \delta L2L_{l} \left( {T = 0.01 - 8\,{\text{s}}} \right) \) for 56 tectonic localities. As always, the standard-error (or epistemic uncertainty) of \( \delta L2L_{l} \) values is smaller than the \( \tau_{L2L} \). \( \tau_{L2L} \) values are non-negligible, and the GMM fit improves (based on analysis of variance) with the introduction of the between-locality random-effect group. Therefore, we consider it an effective random-effect grouping. Although it is tempting to relate \( \delta L2L_{l} \) to some source parameter, this is better done in the Fourier domain in a follow-up study.
Figure 11 maps the various tectonic localities (indexed l) color coded to their \( {{\updelta }}L2L_{l} \left( {T = 0.1\,{\text{s}}} \right) \) and \( {{\updelta }}L2L_{l} \left( {T = 1\,{\text{s}}} \right) \) values. In the top panel, corresponding to \( {{\updelta }}L2L_{l} \left( {T = 0.1\,{\text{s}}} \right) \), a clear difference between central Italy and northwestern Anatolia can be seen. Apparently, earthquakes located in central Apennine region produce substantially lower short period groundmotions than those in northwestern Anatolia. Similarly, there is an apparent distinction between the central Apennines and Poplain earthquakes. The central Apennines tectonic locality contains the recent M6.5 Norcia earthquake (2016) and associated shocks, and several historic earthquakes in that region. The Poplain locality contains data from the substantially stronger M6.45 Friuli earthquake (1976) and a few recent earthquakes. At a first glance, it may appear as if the spatial patterns are due to the predominant focal mechanisms in the regions, but the diversity of focal mechanisms within each region (especially among smaller events) dissuades this hypothesis. Spectral decomposition of the ESM dataset by Bindi and Kotha (2020) revealed a much larger stressdrop of the M6.45 Friuli earthquake compared to M6.5 Norcia earthquake—the \( {{\updelta }}L2L_{l} \) of regions containing these earthquakes also contrast similarly (Bindi et al. 2019). Once again, correlating the randomeffects obtained in response spectra domain (as here) to the earthquake parameters (such as stressdrop) may not hold well at short periods. We reserve further elaboration of randomeffects with the Fourier version of the ESM dataset.
Betweenevent variability, now partially corrected for spatial variability through \( \Delta L2L = {\mathcal{N}}\left( {0,\tau_{L2L}^{2} } \right) \), is quantified in the distribution \( \Delta B_{e}^{0} = {\mathcal{N}}\left( {0,\tau_{0}^{2} } \right) \). Customary checks for \( \delta B_{e,l}^{0} \) include dependencies on magnitude, depth, and styleoffaulting. A few points to note here:

1. EMEC estimates of M_{W} are used in the GMM regression. However, uncertainties in M_{W} are ignored despite their known impact on the τ_{0} estimates (Kuehn and Abrahamson 2017).

2. Depth in our case is the hypocentral depth of the event. Buried ruptures are likely to produce stronger ground-motions than exposed ruptures. This phenomenon is accounted for in some of the NGA-West2 GMMs; e.g. Abrahamson et al. (2014) modelled the events with depth-to-top-of-rupture \( z_{tor} \ge 20\,{\text{km}} \) to have 0.75–3 times larger SAs than exposed ruptures, depending on the period.

3. In this study, we use the typical style-of-faulting classification provided in the ESM dataset: Normal (NF), Thrust (TF), Strike-slip (SS), and Unknown (U). However, instead of introducing a style-of-faulting random-effect in the GMM regression, we queried the \( \Delta B_{e}^{0} \) and \( \Delta L2L \) distributions and found no systematic differences. This result is consistent with the fact that style-of-faulting effects cannot be quantified using isotropic factors with no azimuthal variations (Kotha et al. 2019). Centroid Moment Tensor (CMT) solutions are, however, available for too few events to treat style-of-faulting with azimuth-dependent factors as in Kotha et al. (2019). Even if they were available, the diversity of crustal structure across pan-Europe makes such a treatment difficult without substantially more metadata (e.g. take-off angles) in the ESM dataset.
The left column of Fig. 12 shows \( \delta B_{e,l}^{0} \left( {T = 0s,0.1s,1s} \right) \) trends with M_{W}. The absence of significant offsets or trends implies that the magnitude scaling of the GMM sufficiently captures the magnitude dependence of SAs. The error bars show the median absolute deviation (MAD) of \( \delta B_{e,l}^{0} \) within M_{W} bins of width 0.5. MAD is a robust estimate of dispersion when normality of the distribution is not necessarily satisfied within each bin. The MAD estimates appear to be magnitude dependent, indicating heteroscedasticity of \( {{\uptau }}_{0} \). However, we do not provide a heteroscedastic variance model now without first investigating its significance, given the small number of large magnitude events compared to the small-moderate magnitude event sample (see Fig. 1 for data distribution).
The right column of Fig. 12 shows the \( \delta B_{e,l}^{0} \left( {T = 0.01s,0.1s,1s} \right) \) trends with depth. We see no significant trend with depth. Although deeper events are likely to produce stronger ground-motions, there exist several shallower events with similarly large \( \delta B_{e,l}^{0} \), and correcting \( \delta B_{e,l}^{0} \) for depth shows no remarkable reduction in \( {{\uptau }}_{0} \), as indicated by the bin-wise \( {{\uptau }}_{0} \) values (error bars in the right column of Fig. 12).
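The bin-wise MAD used as error bars in Fig. 12 can be computed along the following lines. This is a minimal sketch: the function names are hypothetical, the input values are made up for illustration, and the scaling constant 1.4826 (which makes the MAD comparable to a standard-deviation under normality) is an assumption, since the text does not state whether a scaled or unscaled MAD is plotted.

```python
import math
from statistics import median


def mad_scale(values, c=1.4826):
    """Median absolute deviation, scaled by c (assumed here) so that it is
    comparable to a standard-deviation for normally distributed data."""
    centre = median(values)
    return c * median(abs(v - centre) for v in values)


def binned_mad(x, y, width=0.5):
    """Group y by bins of x (lower bin edge as key) and return the MAD per bin."""
    bins = {}
    for xi, yi in zip(x, y):
        lower_edge = math.floor(xi / width) * width
        bins.setdefault(lower_edge, []).append(yi)
    return {edge: mad_scale(vals) for edge, vals in sorted(bins.items())}


# Made-up magnitudes and between-event terms, for illustration only.
magnitudes = [3.1, 3.4, 4.2, 4.3, 4.4]
delta_be = [0.1, -0.1, 0.5, 0.0, -0.5]
print(binned_mad(magnitudes, delta_be))
```

Unlike a sample standard-deviation, the MAD of a bin is barely affected by a single extreme value, which is consistent with the robust treatment of outliers elsewhere in the regression.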
6.2.3 Siteresponse variability
As in K16 GMM, the KiKnet (Dawood et al. 2016) based Kotha et al. (2018) GMM, and Brooks et al. (2020) study on Northsea groundmotion data, we did not introduce a siteresponse scaling parameter in the fixedeffects for two reasons: (1) Only 419 of the 1829 sites have measured (and not inferred) \( V_{S30} \) available in the ESM, and using data from only the sites with measured \( V_{S30} \) leads to a large reduction in data for GMM regression; (2) investigation of sitetosite variability can be performed in a subsequent step. For instance, depending on the application, one can regress a relation between \( \delta S2S_{s} \sim V_{S30} \) measured or inferred from topographic slope and geology (Thompson et al. 2014; Vilanova et al. 2018; Wald and Allen 2007) or relate \( \delta S2S_{s} \) directly to topographic slope and geology at a regional scale (Crowley et al. 2019; Weatherill et al. 2020a, b).
Given the importance of site-response in hazard and risk assessments, the complexities in finding a compromise between a site-response proxy, its availability at different regional scales, and the propagation of uncertainties from GMMs to risk assessments, we intend to perform a separate investigation. Nevertheless, a database of \( \delta S2S_{s} \left( {T = 0.01 - 8\,{\text{s}}} \right) \) for the 1829 ESM sites along with their robust regression weights and standard-errors is provided.
Note that we always use the measured \( V_{S30} \) in the analysis shown in this study, and never the inferred values. Early analyses of \( \delta S2S_{s} \left( {T = 0.0, 0.1, 1.0\,{\text{s}}} \right) \) trends with measured \( V_{S30} \) and topographic slope are presented in Fig. 13. The figure plots the mean and MAD (robust standard-deviation estimate) of \( \delta S2S_{s} \left( {T = 0.0, 0.1, 1.0\,{\text{s}}} \right) \) within ranges of \( V_{S30} \) coinciding with Eurocode-8 site classes A, B, C, and D (left column), and within 9 slope bins of equal width between 0.001 and 1.000 m m^{−1} (right column). The trend with \( V_{S30} \) inferred (not measured) from the \( V_{S30} \sim slope \) correlation model of Wald and Allen (2007) is comparable to the trend with the slope from which it is inferred, shown in the right column of Fig. 13.
The correlation \( \delta S2S_{s} \sim V_{S30} \) (measured only) at short periods \( T = 0.0, 0.1\,{\text{s}} \) is rather poor, as indicated by similar mean and MAD (the robust standard-deviation) for \( V_{S30} < 800\,{\text{m/s}} \) in the left column of Fig. 13. It appears that short-period site-responses of soft rock, stiff, and soft soils (EC8 class B, C, D) in this dataset are not adequately distinguishable. However, it is interesting to see that the mean of EC8 class A ‘rock’ sites with \( V_{S30} \ge 800\,{\text{m/s}} \) is much lower than the rest, along with a considerably larger variability. Short-period linear soil-response of rock sites is known to be highly variable compared to softer soils, whose nonlinear soil-response may suppress the otherwise high variability from linear-only amplification (Bazzurro and Cornell 2004). On the contrary, the long-period site-response of rock sites is less variable than that of softer soils (e.g. \( 180\,{\text{m/s}} \le V_{S30} < 800\,{\text{m/s}} \)). Similar inferences can be drawn from the \( \delta S2S_{s} \sim slope \) plots in the right column of Fig. 13, where higher slopes are (usually) indicative of rock sites on steep hillsides, and lower slopes of softer sites located on flatter sediments.
For completeness, along with a database of \( \delta S2S_{s} \left( {T = 0.0 - 8.0\,{\text{s}}} \right) \) for the 1829 ESM sites, we derive continuous empirical models for both the \( \delta S2S_{s} \sim V_{S30} \) (measured) and \( \delta S2S_{s} \sim slope \) correlations seen in Fig. 13. We chose a quadratic functional form instead of the traditional piecewise linear function, as shown in Eqs. (6) and (7). Here, the measured \( V_{S30} \) is in m s^{−1} and slope in m m^{−1}; the regression coefficients \( g_{0} , g_{1} , g_{2} \) are different for \( SR^{{V_{s30} }} \) and \( SR^{slope} \), and change with period. The residuals from the \( \delta S2S_{s} \sim V_{S30} \) (measured) and \( \delta S2S_{s} \sim slope \) regressions are \( \delta S2S_{s}^{{V_{S30} }} \) and \( \delta S2S_{s}^{slope} \), with robust standard-deviations \( \phi_{S2S}^{{V_{S30} }} \) and \( \phi_{S2S}^{slope} \), respectively.
Robust linear fits using an M estimator (Venables and Ripley 2002), at each of 34 periods between \( T = 0.01\,{\text{s}} \) and \( 8\,{\text{s}} \), PGA \( \left( {T = 0\,{\text{s}}} \right) \), and PGV, are derived for the \( {{\updelta S}}2{\text{S}}_{\text{s}} \sim V_{S30} \) correlation of the 419 sites with measured \( V_{S30} \) available, and for the \( {{\updelta S}}2{\text{S}}_{\text{s}} \sim slope \) correlation of the 1829 sites with slope derived from digital elevation models provided by the Shuttle Radar Topography Mission.
Although heteroscedastic models of \( \phi_{S2S}^{{V_{S30} }} \) and \( \phi_{S2S}^{slope} \) appear reasonable in Fig. 13, we chose not to propose one without testing its significance, given the uneven distribution of sites in different bins. Figure 14 shows the reduction in between-site variance from using \( V_{S30} \) and slope as site-response proxies (Eqs. 6 and 7) in the GMM. For comparison, the variances of the K16 GMM are also shown in this plot. Note that the K16 GMM comes in two variants: one without a site-response component, and another with measured \( V_{S30} \) as a proxy for linear site-response. Note also that the K16 standard-deviations shown in Fig. 7 are those when not using \( V_{S30} \) as a parameter, while those in Fig. 14 are when using \( V_{S30} \) as site-response proxy, and are hence lower in the latter.
A significant reduction in between-site standard-deviation can be achieved using an efficient site-response proxy or a combination of proxies. Since only a few sites are provided with measured \( V_{S30} \) values, \( \phi_{S2S}^{{V_{S30} }} \) is substantially smaller than \( \phi_{S2S}^{slope} \). For a new site with \( V_{S30} \) or slope available, Eq. (6) or (7) can be appended to the \( ln\left( \mu \right) \) in Eq. (2), while replacing \( \phi_{S2S} \) (when estimating σ) with \( \phi_{S2S}^{{V_{S30} }} \) or \( \phi_{S2S}^{slope} \), respectively. For a site with neither site-response proxy available, but with sufficient strong ground-motion recordings, the site-specific \( \delta S2S_{s} \) term can be estimated and used for site-specific ground-motion predictions.
6.2.4 Aleatory variability
The last component of the GMM is the apparent aleatory variability, quantifying the natural randomness of the groundmotion data—that which is not captured by the mixedeffects. The aleatory residuals, \( \varepsilon = {\mathcal{N}}\left( {0,\phi^{2} } \right), \) are tested for event depth and recording site distance dependencies in Fig. 15. For all periods (\( T = 0.0,0.1,1.0\,{\text{s}} \)), we observe no significant trends in binned means and MAD, which implies the \( f_{R,g} \left( {M_{W} ,R_{JB} } \right) \) and \( f_{R,a} \left( {R_{JB} } \right) \) of the GMM (Eqs. 2 and 3) explain the distance scaling of groundmotion sufficiently well. Heteroscedasticity of \( \phi \) is not evident either in these plots.
The moderate-long period residuals of this dataset show clear evidence of an anisotropic shear-wave radiation pattern in the near and intermediate distance ranges of \( R_{JB} \le 80\,{\text{km}} \), similar to that reported by Kotha et al. (2019) with the KiK-net (Dawood et al. 2016; Kotha et al. 2018) and NGA-W2 (Ancheta et al. 2014; Boore et al. 2014) datasets and their GMMs. In addition, there is strong evidence of SmS phases from Moho reflection in the \( 60\,{\text{km}} < R_{JB} \le 200\,{\text{km}} \) distance ranges, e.g. Bindi et al. (2006), especially from events deeper than 10 km and close to the Moho boundary. However, we chose not to discuss these features in the response spectral domain in which the GMM residuals are estimated, but instead with those in the Fourier domain as a follow-up.
7 Application
The GMM presented in this study has no new explanatory parameters in its functional form compared to previous panEuropean GMMs. The median predictions rely only on the two generic parameters M_{W} and R_{JB}, which constitute the fixedeffects. Of course, the standarddeviation estimates of the new model are significantly larger than those of K16, but this is to be expected given the 15fold increase in data: from a greater variety of sites, tectonic localities, \( M_{W} \le 5.5 \) events, etc. To explain the variability, without introducing new parameters, we instead resolved the apparent aleatory variability into various possible contributions. Therefore, the model can be used ignoring the regiontoregion, localitytolocality, and sitetosite variabilities, but at the cost of increased aleatory variability. We provide three application possibilities:
7.1 Ergodic application
The first approach ignores all repeatable effects, i.e. the region-, locality-, and site-specific adjustments. The between-locality, between-event, between-site, and residual standard-deviations can be combined into an ergodic, total standard-deviation \( \sigma = \sqrt {\tau_{L2L}^{2} + \tau_{0}^{2} + \phi_{S2S}^{2} + \phi^{2} } \), as shown in Fig. 7. In this case, the regional differences in anelastic attenuation, quantified by \( \tau_{c3} \), will be treated as an epistemic uncertainty on far-source distance scaling. The epistemic uncertainty on the regionalised anelastic attenuation coefficient c_{3} in Eq. (4) is \( \tau_{c3} \). This uncertainty can be handled with a GMM logic tree consisting of a slower \( \left( {c_{3,r} = c_{3} + \eta \cdot \tau_{c3} } \right) \), average \( \left( {c_{3,r} = c_{3} } \right) \), and faster \( \left( {c_{3,r} = c_{3} - \eta \cdot \tau_{c3} } \right) \) attenuating branches. Following the Miller III and Rice (1983) three-point approximation of a Gaussian distribution, with \( \eta \approx 1.732 \) one can use logic tree branch weights of 0.167, 0.666, and 0.167, respectively for slower, average, and faster branches. Consequently, the ground-motion prediction is a weighted mixture of three Gaussian distributions \( {\mathcal{N}}\left( {ln\left( \mu \right), \sigma^{2} } \right) \), where \( ln\left( \mu \right) \) is estimated from Eq. (2) for three values of \( c_{3,r} \in \left[ {c_{3} , c_{3} + \eta \cdot \tau_{c3} , c_{3} - \eta \cdot \tau_{c3} } \right] \).
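The three-branch logic tree on c_{3} can be sketched as follows, together with a check that the weighted branches reproduce the mean and standard-deviation of the underlying Gaussian. The values of `c3` and `tau_c3` are placeholders for illustration, not coefficients of this GMM, and the function names are hypothetical.

```python
import math

ETA = 1.732                       # branch offset in units of tau_c3
WEIGHTS = (0.167, 0.666, 0.167)   # slower, average, faster branches


def c3_branches(c3, tau_c3, eta=ETA):
    """Return c3 for the slower, average, and faster attenuating branches."""
    return (c3 + eta * tau_c3, c3, c3 - eta * tau_c3)


def weighted_moments(values, weights):
    """Weighted mean and standard-deviation of a discrete distribution."""
    mean = sum(w * v for v, w in zip(values, weights))
    variance = sum(w * (v - mean) ** 2 for v, w in zip(values, weights))
    return mean, math.sqrt(variance)


# Placeholder values for illustration only.
c3, tau_c3 = -0.003, 0.001
branches = c3_branches(c3, tau_c3)
mean, std = weighted_moments(branches, WEIGHTS)
print(branches)
print(f"mean = {mean:.6f}, std / tau_c3 = {std / tau_c3:.4f}")
```

With these weights and η, the discrete three-point distribution has a mean equal to c_{3} and a standard-deviation within about 0.1% of \( \tau_{c3} \), which is the property that motivates this choice of branches.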
Within the context of an ergodic application, if a site has either the measured \( V_{S30} \) or slope information available, but no site-specific ground-motion recordings, then the \( \phi_{S2S} \) in the estimation of \( \sigma = \sqrt {\tau_{L2L}^{2} + \tau_{0}^{2} + \phi_{S2S}^{2} + \phi^{2} } \) can be replaced with \( \phi_{S2S}^{{V_{S30} }} \) or \( \phi_{S2S}^{slope} \), whilst appending the fixed-effects of Eq. (2) with Eq. (6) or (7), respectively. Figure 14 shows the consequent reduction of σ to \( \sigma^{{V_{S30} }} \) and \( \sigma^{slope} \) values, when using site-response proxies \( V_{S30} \) and slope, respectively. Consequently, the ground-motion predictions follow the mixed Gaussian distribution \( {\mathcal{N}}\left( {ln\left( \mu \right) + SR^{{V_{s30} }} , {\sigma^{{V_{S30} }}}^{2} } \right) \) or \( {\mathcal{N}}\left( {ln\left( \mu \right) + SR^{slope} , {\sigma^{slope}}^{2} } \right) \), where \( ln\left( \mu \right) \) is estimated from Eq. (2) for three values of \( c_{3,r} \in \left[ {c_{3} , c_{3} + \eta \cdot \tau_{c3} , c_{3} - \eta \cdot \tau_{c3} } \right] \), and \( SR^{{V_{s30} }} \) or \( SR^{slope} \) are estimated from Eq. (6) or (7). The reduced aleatory variability from using a site-response proxy is beneficial until enough ground-motion data can be collected at a site, after which \( \delta S2S_{s} \) for the new site can be estimated using equations provided in Kotha et al. (2017); Rodriguez-Marek et al. (2013); Sahakian et al. (2018); Villani and Abrahamson (2015), etc.
7.2 Regionspecific application
For region-specific applications, the predictions can be upgraded with the region-specific anelastic attenuation and tectonic-locality-specific adjustments. Anelastic attenuation is regionalised by adjusting the generic coefficient c_{3} in Eq. (4) with a region-specific value \( \delta c_{3,r} \), as in \( c_{3,r} = c_{3} + \delta c_{3,r} , \) where region r is decided by the location of the site. Since \( \delta c_{3,r} \) are estimated from a smaller region-specific sample of ground-motion recordings, they should be treated as epistemically uncertain. For this purpose, standard-errors on \( \delta c_{3,r} \) are provided as well, and these are always smaller than \( \tau_{c3} \). Treating the \( SE\left( {\delta c_{3,r} } \right) \) as uncertainty on the mean of a normally distributed sample, the bounds \( \delta c_{3,r} \pm 1.6\,SE\left( {\delta c_{3,r} } \right) \) span approximately the central 90% confidence interval of \( \delta c_{3,r} \) (a 95% interval would correspond to \( \pm 1.96\,SE \)).
Similarly, depending on the tectonic locality of the earthquake, the GMM predictions can be further regionalised by adjusting e_{1} in Eq. (2) to \( e_{1,l} = e_{1} + \delta L2L_{l} \), where l identifies the tectonic locality of the earthquake. Since \( \delta L2L_{l} \) is estimated from a smaller locality-specific ground-motion sample, its confidence interval (approximately 90% for a normal distribution) is bounded by \( \delta L2L_{l} \pm 1.6\,SE\left( {\delta L2L_{l} } \right) \). A reduction of up to 10% in σ is achieved by dropping the \( \tau_{L2L} \) from the aleatory variance, resulting in a smaller \( \sigma_{r} = \sqrt {\tau_{0}^{2} + \phi_{S2S}^{2} + \phi^{2} } \), as shown in Fig. 16. Further reductions to \( \sigma_{r}^{{V_{s30} }} \) or \( \sigma_{r}^{Slope} \) are possible if the site-response scaling with \( V_{s30} \) or slope (Eqs. 6 and 7, respectively) is used in predictions.
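The bounds on a region- or locality-specific adjustment, and the two-sided coverage implied by a given multiplier on the standard-error, can be sketched with the Python standard library. The function names and the input numbers are hypothetical; the only assumption beyond the text is that the estimate is normally distributed, in which case a multiplier of 1.6 covers roughly 89% and 1.96 covers 95%.

```python
from statistics import NormalDist


def adjustment_bounds(estimate, standard_error, z=1.6):
    """Lower and upper bounds of an adjustment at +/- z standard-errors."""
    return estimate - z * standard_error, estimate + z * standard_error


def two_sided_coverage(z):
    """Probability mass of a normal distribution within +/- z standard-errors."""
    return 2.0 * NormalDist().cdf(z) - 1.0


# Placeholder adjustment and standard-error, for illustration only.
low, high = adjustment_bounds(estimate=-0.001, standard_error=0.0002)
print(f"bounds: [{low:.5f}, {high:.5f}]")
print(f"coverage at 1.60 SE: {two_sided_coverage(1.6):.3f}")
print(f"coverage at 1.96 SE: {two_sided_coverage(1.96):.3f}")
```

Whichever multiplier is adopted, the same value should be used consistently for \( \delta c_{3,r} \), \( \delta L2L_{l} \), and \( \delta S2S_{s} \) so that the resulting branches represent comparable confidence levels.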
It is interesting to compare the total standarddeviation σ (dashed line in Fig. 16) of the regionalised, \( V_{s30} \) based K16 GMM with \( \sigma_{r}^{{V_{s30} }} \) of this study. At \( T \le 0.3s \), the \( \sigma_{r}^{{V_{s30} }} \) is about 6% larger than its K16 counterpart; while at T > 0.3 s, \( \sigma_{r}^{{V_{s30} }} \) is as much as 14% smaller. Despite the enormous increase in data, it appears that regionalisation has helped in substantially curtailing the increase in apparent aleatory variability.
7.3 Region and sitespecific application
Partially nonergodic region and sitespecific groundmotion predictions are possible for those sites with \( \delta S2S_{s} \) provided with this GMM or for new sites with sufficient groundmotion data. \( \delta S2S_{s} \) for the 1829 sites in the ESM dataset are provided, along with the \( \delta c_{3,r} \) of their region, and \( \delta L2L_{l} \) of nearby tectonic localities (earthquake sources). Since betweensite and betweenlocality variabilities do not apply for sitespecific predictions, the reduction in apparent aleatory variability is enormous, i.e. \( \sigma_{r,s} = \sqrt {\tau_{0}^{2} + \phi^{2} } \) is about 40% smaller than \( \sigma = \sqrt {\tau_{l2l}^{2} + \tau_{0}^{2} + \phi_{S2S}^{2} + \phi^{2} } \), as shown in Fig. 16. However, the reduction in aleatory variability will be accompanied by additional epistemic uncertainty. In addition to those in regionspecific predictions, uncertainty on (the mean) \( \delta S2S_{s} \) should be accounted with \( \pm 1.6.SE\left( {\delta S2S_{s} } \right). \) Region and sitespecific predictions therefore are a mixture of 27 Gaussian distributions \( {\mathcal{N}}\left( {ln\left( {\mu_{r,s} } \right), \sigma_{r,s}^{2} } \right) \), where \( ln\left( {\mu_{r,s} } \right) \) is estimated from Eq. (1) for three values of \( c_{3,r} \in \left[ {c_{3} + \delta c_{3,r} , c_{3} + \delta c_{3,r} + 1.6SE\left( {\delta c_{3,r} } \right), c_{3} + \delta c_{3,r}  1.6SE\left( {\delta c_{3,r} } \right)} \right] \), three values of \( e_{1,l,s} \in \left[ {e_{1,l} + \delta S2S_{s} , e_{1,l} + \delta S2S_{s} + 1.6SE\left( {\delta S2S_{s} } \right), e_{1,l} + \delta S2S_{s}  1.6SE\left( {\delta S2S_{s} } \right)} \right], \) wherein \( e_{1,l} \in \left[ {e_{1} + \delta L2L_{l} , e_{1} + \delta L2L_{l} + 1.6SE\left( {\delta L2L_{l} } \right), e_{1} + \delta L2L_{l}  1.6SE\left( {\delta L2L_{l} } \right)} \right] \).
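The 27 combinations described above arise from three values each for the regional attenuation, locality, and site adjustments. They can be enumerated as in the following sketch, where all numerical inputs are placeholders and the function names are hypothetical. Branch weights are deliberately left out, since they are not specified in this section.

```python
from itertools import product

Z = 1.6  # multiplier on the standard-error, as used in the text


def three_point(centre, standard_error, z=Z):
    """Central value plus upper and lower bounds at +/- z standard-errors."""
    return (centre, centre + z * standard_error, centre - z * standard_error)


def site_specific_branches(c3, dc3, se_dc3, e1, dl2l, se_dl2l, ds2s, se_ds2s):
    """Enumerate the 3 x 3 x 3 = 27 pairs of (c3_r, e1_ls)."""
    c3_values = three_point(c3 + dc3, se_dc3)
    l2l_values = three_point(dl2l, se_dl2l)
    s2s_values = three_point(ds2s, se_ds2s)
    return [
        (c3_r, e1 + l2l + s2s)
        for c3_r, l2l, s2s in product(c3_values, l2l_values, s2s_values)
    ]


# Placeholder numbers for illustration only; not coefficients of this GMM.
branches = site_specific_branches(
    c3=-0.003, dc3=-0.001, se_dc3=0.0002,
    e1=3.0, dl2l=0.10, se_dl2l=0.05,
    ds2s=-0.20, se_ds2s=0.08,
)
print(len(branches))
```

Each pair would then be substituted into the median equation to obtain one of the 27 Gaussian components, all sharing the same \( \sigma_{r,s} \).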
In complement, a more practical application is presented in the companion study by Weatherill et al. (2020c), where a more thorough comparison of this GMM with contemporary models is provided. Weatherill et al. (2020c) also provide a more complete discussion on how the various epistemic uncertainties in this GMM can be handled in the logic-tree framework of a PSHA.
7.4 Towards non-ergodic ground-motion predictions
For tectonic localities, attenuating regions, and sites with a sufficient number of recordings, the epistemic uncertainties on the random-effect adjustments are negligible with respect to the random-effect values and standard deviations. Collecting more ground-motion recordings is essential in moving towards non-ergodic predictions. The benefits of relaxing the ergodic assumption and progressing towards region- and site-specific ground-motion prediction are demonstrated in Fig. 17. In this plot, predictions for the M6.5 Norcia event of the central Italy sequence, which occurred on 30th October, 2016, are compared to the response spectra recorded at three sites covered by the Italian strong motion network (Gorini et al. 2010). These sites are identified by the network code IT in the ESM dataset: (1) permanent, free-field station LSS (Leonessa) with \( V_{S30} = 1091\,{\text{m/s}} \) located 25 km from the event epicentre, (2) permanent, free-field station MVB (Marsciano Monte Vibiano) with \( V_{S30} = 1046\,{\text{m/s}} \) located 65 km from the event epicentre, and (3) permanent, free-field station PSC (Pescasseroli) with \( V_{S30} = 1000\,{\text{m/s}} \) located 110 km from the event epicentre. The three columns in Fig. 17 correspond to the three stations.
This event and these stations are selected to demonstrate progressively (in Fig. 17) the impact of moving from ergodic predictions relying on \( V_{S30} \) as the site-response proxy (top row), through region-specific predictions (middle row) considering regional (Northern and central Apennines West) anelastic attenuation (\( c_{3,r} = c_{3} + \delta c_{3,r} \)) and an adjustment specific to the tectonic locality (\( e_{1,l} = e_{1} + \delta L2L_{l} \)) containing the event (locality ID: “PTTC007”), to region- and site-specific predictions (bottom row) with an additional site-specific adjustment (\( e_{1,l,s} = e_{1} + \delta L2L_{l} + \delta S2S_{s} \)). Both the median prediction and the standard deviation change in the process, which is reflected by the width of the coloured ribbon in Fig. 17. The \( \delta S2S_{s} \left( {T = 0.01 - 8\,{\text{s}}} \right) \) of these sites are estimated from 29, 15, and 20 records from predominantly small-to-moderate earthquakes (details in the Fig. 17 panels). A few comments on this figure:

1. The ergodic median predictions (central line) and one \( \sigma^{{V_{S30} }} \) interval (ribbon) are systematically above the observed response spectra at the three rock sites, located at near (25 km), intermediate (65 km), and far (110 km) source distances. This is likely because the M6.5 Norcia event produced relatively weaker ground-motions compared to other large-magnitude events recorded in Greece and Turkey, as quantified by their respective \( \delta B_{e,l}^{0} \) values. Since the ergodic predictions consider all event, region, site, and record variabilities as aleatory, the \( \sigma^{{V_{S30} }} \) (Fig. 16) is large, yet not large enough to contain the M6.5 event observations within the \( \pm \sigma^{{V_{S30} }} \) boundaries.

2. Region-specific ground-motion predictions for these sites are achieved by adjusting the GMM with the \( \delta c_{3,r} \) of the Northern and central Apennines (West), in which the sites are located, and the \( \delta L2L_{l} \) of the tectonic locality PTTC007, in which the M6.5 event occurred (along with a few other prominent events and aftershocks). The epistemic uncertainties of these adjustments are relatively small given the large number of recordings. In the middle row of Fig. 17, we notice that the observed response spectra are closer to the region-specific predictions than to the ergodic predictions. The \( \delta c_{3,r} \) and \( \delta L2L_{l} \) of this region and locality are both lower than the pan-European average (which is zero), meaning the region attenuates short-period ground-motions faster and the events on average produce weaker ground-motions than elsewhere in the pan-European region.
It is interesting to note that at short distances (site IT_LSS) the \( \delta c_{3,r} \) has no effect on region-specific predictions and the shift is mostly from \( \delta L2L_{l} \); the same holds at intermediate distance (site IT_MVB). At far-source distances (site IT_PSC), the combined effect of \( \delta c_{3,r} \) and \( \delta L2L_{l} \) worked well to capture the observed response spectra within the narrower \( \pm \sigma_{r}^{{V_{S30} }} \) (Fig. 16) range about the region-specific median.

3. Region- and site-specific predictions (site-specific in short) for the three sites are shown in the bottom row of Fig. 17. Along with the curves, details on the number of recordings and the magnitude and distance ranges (1st and 3rd quartiles) of the recordings are provided. The additional adjustment to the preceding region-specific predictions is through \( e_{1,l,s} = e_{1} + \delta L2L_{l} + \delta S2S_{s} \). While most of the data used in estimating the \( \delta S2S_{s} \) of these sites are from small-to-moderate sized earthquakes, the site-specific predictions fit quite well with the observations for the large M6.5 event. Since \( \delta S2S_{s} \) are used to adjust the region-specific ground-motions, the \( V_{S30} \) becomes irrelevant, and \( \phi_{S2S}^{{V_{S30} }} \) is dropped from \( \sigma_{r}^{{V_{S30} }} \), resulting in a smaller \( \sigma_{r,s} \).
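The three prediction levels compared in Fig. 17 can be summarised side by side. The block below only restates the coefficient adjustments and standard deviations quoted in the text. The expression for \( \sigma_{r} \) is an inference from the statement that dropping \( \phi_{S2S} \) from it yields \( \sigma_{r,s} \), and the \( V_{S30} \) superscripts are omitted for brevity.

```latex
\begin{align*}
\text{Ergodic:} \quad
  & c_{3},\; e_{1}
  & \sigma &= \sqrt{\tau_{l2l}^{2} + \tau_{0}^{2} + \phi_{S2S}^{2} + \phi^{2}} \\
\text{Region-specific:} \quad
  & c_{3,r} = c_{3} + \delta c_{3,r},\; e_{1,l} = e_{1} + \delta L2L_{l}
  & \sigma_{r} &= \sqrt{\tau_{0}^{2} + \phi_{S2S}^{2} + \phi^{2}} \\
\text{Region- and site-specific:} \quad
  & c_{3,r} = c_{3} + \delta c_{3,r},\; e_{1,l,s} = e_{1} + \delta L2L_{l} + \delta S2S_{s}
  & \sigma_{r,s} &= \sqrt{\tau_{0}^{2} + \phi^{2}}
\end{align*}
```

Each step moves one variance component out of the aleatory standard deviation and into a deterministic adjustment of the median, at the cost of the epistemic uncertainty on that adjustment.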
In the above example, we demonstrated that applying region-specific adjustments noticeably improved the match between observations and predictions. The best agreement was clearly obtained using region- and site-specific adjustments. To substantiate this claim, we performed a 10-fold cross-validation exercise to verify whether the introduction of the various random-effects into the GMM functional form indeed improves its predictive capability. In doing so, we reran the regression with three functional forms:

Ergodic model with no regionalisation of anelastic attenuation, no localisation of event terms, and no site-specific adjustments, i.e. no random-effects \( \delta c_{3,r} ,\delta L2L_{l} ,\delta S2S_{s} \)

Regional model with regionalisation of anelastic attenuation and tectonic localisation of events, leaving out the random-effect that captures site-to-site variability, i.e. only the random-effects \( \delta c_{3,r} \) and \( \delta L2L_{l} \), but no \( \delta S2S_{s} \)

Site-specific model identical to the GMM presented here, i.e. with regionalisation of anelastic attenuation, localisation of event terms, and the site-to-site random-effect
To perform the cross-validation, the dataset is split into 10 parts with non-overlapping events. That is, earthquakes (and their records) are exclusive to their subsets and do not feature in any other subset. We perform the regression of the three models on nine subsets combined, and test the predictions on the tenth subset. Root-mean-squared errors (RMSE) are estimated for each trial and then averaged over the ten trials. This exercise is repeated for the IMs: \( PGA, PGV, SA\left( {T = 0.1, 0.2, 0.5, 1, 2, 4\,{\text{s}}} \right) \).
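The event-exclusive splitting described above can be sketched as follows. This is a minimal illustration in Python rather than the R scripts used in this study. The record layout (a dictionary with an `ln_obs` field holding the log of the observed intensity measure) and the `fit` and `predict` callables are hypothetical stand-ins for the actual mixed-effects regression.

```python
import math
import random


def event_folds(event_ids, n_folds=10, seed=0):
    """Assign every unique event to exactly one fold.

    Returns one fold index per record, so that all records of an
    earthquake end up in the same subset (non-overlapping events).
    """
    events = sorted(set(event_ids))
    random.Random(seed).shuffle(events)
    fold_of_event = {ev: i % n_folds for i, ev in enumerate(events)}
    return [fold_of_event[ev] for ev in event_ids]


def cross_validated_rmse(records, event_ids, fit, predict, n_folds=10, seed=0):
    """Average the test RMSE over the folds.

    fit(train_records) -> model and predict(model, record) -> ln(prediction)
    are user-supplied callables (assumptions of this sketch).
    """
    folds = event_folds(event_ids, n_folds, seed)
    rmses = []
    for k in range(n_folds):
        train = [r for r, f in zip(records, folds) if f != k]
        test = [r for r, f in zip(records, folds) if f == k]
        if not train or not test:
            continue  # skip empty folds (fewer events than folds)
        model = fit(train)
        sq_err = [(r["ln_obs"] - predict(model, r)) ** 2 for r in test]
        rmses.append(math.sqrt(sum(sq_err) / len(sq_err)))
    return sum(rmses) / len(rmses)
```

Running `cross_validated_rmse` once per functional form and per IM, with the same `seed`, keeps the folds identical across the three models so that their RMSE values are directly comparable.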
Figure 18 shows a histogram of RMSE for each IM, from the three models. The reduction in RMSE from ergodic to region-specific GMMs is clear and most prominent at short periods. Towards longer periods, the improvement is less pronounced but still substantial. This is because the regionalisation random-effects are focused on capturing variabilities in short-period ground-motions. In Fig. 7, we notice that the overall variability σ peaks at short periods. In Figs. 8 and 10, the largest regional variability of anelastic attenuation \( \tau_{c3} \) and tectonic localities \( \tau_{L2L} \) is also at short periods \( T \le 0.3\,{\text{s}} \). Without regionalisation, the short-period ground-motion prediction variability increases and the precision of the ergodic median decreases, so the predictive capability (measured as RMSE) of the GMM is reduced as well.
Across all periods, i.e. the entire response spectrum, the best predictive capabilities are those of the region- and site-specific GMM. In Fig. 16, the largest reduction in σ is achieved not from using \( V_{S30} \) or slope as the site-response proxy, but from using \( \delta S2S_{s} \) as the site-specific adjustment. However, it is unlikely that every site has sufficient ground-motion data to estimate its site-specific \( \delta S2S_{s} \). In that case, alternative site-response proxies are sought to predict the \( \delta S2S_{s} \), as in Kotha et al. (2018) and Weatherill et al. (2020b). Even in these studies, while the long-period site-response could be partially explained using some geotechnical parameters, the short-period site-response is much more variable, even among the so-called reference rock sites (Bard et al. 2019; Pilz et al. 2020).
8 Summary
In this study, we present the update of the Kotha et al. (2016) region- and site-specific GMM, using the recently published European Strong Motion dataset (ESM) by Lanzano et al. (2019a) and Bindi et al. (2018b). The update is derived from a dataset 15 times larger than the older RESORCE dataset (Akkar et al. 2014b), with only minor changes to the fixed-effects formula and one additional random-effect grouping. Leveraging the recently developed regionalisation datasets, TSUMAPS-NEAM by Basili et al. (2019) and the tectonic zonation defined in the purview of the European Seismic Hazard Model 2020 project, we have regionalised the ground-motion data to capture the spatial variation of anelastic attenuation and earthquake characteristics, respectively.
Due to the increased spatial and temporal diversity of the data, the random-effect and residual variances are larger than those of the older pan-European GMMs. This increased variance motivates the development of region-dependent ground-motion models, since a larger ergodic σ will severely affect probabilistic seismic hazard and risk assessments. We propose making use of the attenuation region, tectonic locality, and site-specific terms (random-effects), along with their epistemic uncertainty, to develop partially non-ergodic ground-motion predictions with a reduced σ. The improved predictive power of the region- and site-specific version of the GMM is substantiated with a 10-fold cross-validation. All the random-effect values will be provided on request, along with the fixed-effects coefficients and the variance–covariance matrices needed to update the model in a Bayesian framework.
Only the customary random-effect and residual analyses are presented here, leaving their physical evaluation for a follow-up study. A similar exercise is currently being carried out on the Fourier amplitude version of the ESM dataset. Random-effects and residuals of the Fourier GMM are easier to associate with geophysical and geological parameters that are not involved in the GMM regression. If the spatial variability of source, path, and site random-effects can indeed be attributed to a globally available parameter, e.g. a crustal velocity map, we can attempt migrating the GMM developed from seismically active regions to the less active regions with sparse ground-motion recordings. The GMM is developed with the intention of perpetual updates, whenever and wherever new datasets become available. Further elaboration on random-effect analyses, GMM update and application procedures is ongoing, and will complement this study. Meanwhile, this new GMM has the widest applicability yet for the pan-European region, as it is derived from a dataset stretching from the Pyrenees in the west to Iran in the east and from the Rhine Graben in the north to the Hellenic arc in the south, and constituted of manually processed ground-motion data from shallow crustal earthquakes of \( 3 \le M_{W} \le 7.4 \) recorded at \( 0 \le R_{JB} \le 545\,{\text{km}} \).
9 Data and resources
The European Strong Motion flatfile is available at https://esm.mi.ingv.it//flatfile2018/ with persistent identifier PID: 11099/ESM_flatfile_2018. The analyses in this study have been performed in R software (Team 2013). In particular, we used the libraries robustlmm (Koller 2016), dplyr (Wickham et al. 2019b), ggplot2 (Wickham et al. 2019a), ggmap (Kahle et al. 2019), viridis (Garnier 2019), etc. The electronic supplement provides all the data used, derived, and presented in this study. We further encourage readers to contact the authors for more details, the GMM implementation in OpenQuake™ (Pagani et al. 2014), regression scripts, etc.
References
Abrahamson N, Silva WJ (1997) Empirical response spectral attenuation relations for shallow crustal earthquakes. Seismol Res Lett 68:94–127
Abrahamson N, Youngs R (1992) A stable algorithm for regression analyses using the random effects model. Bull Seismol Soc Am 82:505–510
Abrahamson NA, Silva WJ, Kamai R (2014) Summary of the ASK14 ground motion relation for active crustal regions. Earthq Spectra 30:1025–1055. https://doi.org/10.1193/070913eqs198m
Akkar S, Sandıkkaya M, Bommer J (2014a) Empirical ground-motion models for point- and extended-source crustal earthquake scenarios in Europe and the Middle East. Bull Earthq Eng 12:359–387
Akkar S et al (2014b) Reference database for seismic groundmotion in Europe (RESORCE). Bull Earthq Eng 12:311–339. https://doi.org/10.1007/s1051801395068
Al Atik L, Abrahamson N, Bommer JJ, Scherbaum F, Cotton F, Kuehn N (2010) The variability of groundmotion prediction models and its components. Seismol Res Lett 81:794–801
Ancheta TD et al (2014) NGAWest2 database. Earthq Spectra 30:989–1005
Anderson JG, Brune JN (1999) Probabilistic seismic hazard analysis without the ergodic assumption. Seismol Res Lett 70:19–28
Atik LA, Youngs RR (2014) Epistemic uncertainty for NGAWest2 models. Earthq Spectra 30:1301–1318
Bard P, Bora SS, Hollender F et al (2019) Are the standard VSKappa hosttotarget adjustments the only way to get consistent hardrock ground motion prediction? Pure Appl Geophys. https://doi.org/10.1007/s00024019021739
Basili R et al (2019) NEAMTHM18 documentation: the making of the TSUMAPSNEAM Tsunami Hazard Model 2018
Bates D, Mächler M, Bolker B, Walker S (2014) Fitting linear mixedeffects models using lme4. arXiv preprint arXiv:14065823
Bazzurro P, Cornell CA (2004) Nonlinear soilsite effects in probabilistic seismichazard analysis. Bull Seismol Soc Am 94:2110–2123. https://doi.org/10.1785/0120030216
Bindi D, Kotha SR (2020) Spectral decomposition of the Engineering Strong Motion (ESM) flat file: regional attenuation, source scaling and Arias stress drop. Bull Earthq Eng 18:2581–2606. https://doi.org/10.1007/s10518020007961
Bindi D, Parolai S, Grosser H, Milkereit C, Karakisa S (2006) Crustal attenuation characteristics in northwestern Turkey in the range from 1 to 10 Hz. Bull Seismol Soc Am 96:200–214
Bindi D, Massa M, Luzi L, Ameri G, Pacor F, Puglia R, Augliera P (2014) PanEuropean groundmotion prediction equations for the average horizontal component of PGA, PGV, and 5%damped PSA at spectral periods up to 3.0 s using the RESORCE dataset. Bull Earthq Eng 12:391–430
Bindi D, Cotton F, Kotha SR, Bosse C, Stromeyer D, Grünthal G (2017) Applicationdriven ground motion prediction equation for seismic hazard assessments in noncratonic moderateseismicity areas. J Seismol 21:1201–1218. https://doi.org/10.1007/s1095001796615
Bindi D, Cotton F, Spallarossa D, Picozzi M, Rivalta E (2018a) Temporal variability of ground shaking and stress drop in Central Italy: a hint for fault healing? Bull Seismol Soc Am 108:1853–1863
Bindi D, Kotha S, Weatherill G et al (2018b) The panEuropean engineering strong motion (ESM) flatfile: consistency check via residual analysis. Bull Earthq Eng 17:583–602. https://doi.org/10.1007/s105180180466x
Bindi D, Picozzi M, Spallarossa D, Cotton F, Kotha SR (2019) Impact of magnitude selection on aleatory variability associated with groundmotion prediction equations: part II—analysis of the betweenevent distribution in Central Italy. Bull Seismol Soc Am 109:251–262
Bohnhoff M, MartínezGarzón P, Bulut F, Stierle E, BenZion Y (2016) Maximum earthquake magnitudes along different sections of the North Anatolian fault zone. Tectonophysics 674:147–165
Bommer JJ, Abrahamson NA (2006) Why do modern probabilistic seismichazard analyses often lead to increased hazard estimates? Bull Seismol Soc Am 96:1967–1977
Bommer JJ, Douglas J, Scherbaum F, Cotton F, Bungum H, Fäh D (2010) On the selection of groundmotion prediction equations for seismic hazard analysis. Seismol Res Lett 81:783–793
Boore DM (2010) Orientationindependent, nongeometricmean measures of seismic intensity from two horizontal components of motion. Bull Seismol Soc Am 100:1830–1835
Boore DM, Stewart JP, Seyhan E, Atkinson GM (2014) NGAWest2 equations for predicting PGA, PGV, and 5% damped PSA for shallow crustal earthquakes. Earthq Spectra 30:1057–1085
Bora SS, Cotton F, Scherbaum F, Edwards B, Traversa P (2017) Stochastic source, path and site attenuation parameters and associated variabilities for shallow crustal European earthquakes. Bull Earthq Eng 15:4531–4561. https://doi.org/10.1007/s105180170167x
Brooks C, Douglas J, Shipton Z (2020) Improving earthquake groundmotion predictions for the North Sea. J Seismol. https://doi.org/10.1007/s1095002009910x
Brune JN (1970) Tectonic stress and the spectra of seismic shear waves from earthquakes. J Geophys Res 75:4997–5009
Campbell KW (1981) Nearsource attenuation of peak horizontal acceleration. Bull Seismol Soc Am 71:2039–2070
Campbell KW, Bozorgnia Y (2014) NGAWest2 ground motion model for the average horizontal components of PGA, PGV, and 5% damped linear acceleration response spectra. Earthq Spectra 30:1087–1115. https://doi.org/10.1193/062913eqs175m
Chiou BSJ, Youngs RR (2014) Update of the Chiou and Youngs NGA model for the average horizontal component of peak ground motion and response spectra. Earthq Spectra 30:1117–1153
Cong L, Mitchell B (1998) Lg coda Q and its relation to the geology and tectonics of the Middle East. In: Mitchell BJ, Romanowicz B (eds) Q of the Earth: global, regional, and laboratory studies. Pageoph topical volumes, Birkhäuser, Basel. https://doi.org/10.1007/9783034887113_15
Crowley H et al (2019) Methods for estimating site effects in risk assessments, vol Deliverable 26.4, Final edn. Seismology and Earthquake Engineering Research Infrastructure Alliance for Europe (SERA)
Dawood HM, RodriguezMarek A (2013) A method for including path effects in groundmotion prediction equations: an example using the Mw 9.0 Tohoku earthquake aftershocks. Bull Seismol Soc Am 103:1360–1372
Dawood HM, RodriguezMarek A, Bayless J, Goulet C, Thompson E (2016) A flatfile for the KiKnet database processed using an automated protocol. Earthq Spectra 32:1281–1302
Derras B, Bard PY, Cotton F, Bekkouche A (2012) Adapting the neural network approach to PGA prediction: an example based on the KiKnet data. Bull Seismol Soc Am 102:1446–1461
Douglas J (2004) An investigation of analysis of variance as a tool for exploring regional differences in strong ground motions. J Seismol 8:485–496. https://doi.org/10.1007/s1095000430947
Douglas J (2018a) Calibrating the backbone approach for the development of earthquake ground motion models. Best practice in physicsbased fault rupture models for seismic hazard assessment of nuclear installations: issues and challenges towards full seismic risk analysis
Douglas J (2018b) Capturing geographicallyvarying uncertainty in earthquake ground motion models or what we think we know may change. In: Pitilakis K (ed) Recent advances in earthquake engineering in Europe: 16th European conference on earthquake engineering—Thessaloniki. Springer, Cham, pp 153–181. https://doi.org/10.1007/9783319757414_6
Douglas J, Edwards B (2016) Recent and future developments in earthquake ground motion estimation. Earth Sci Rev 160:203–219
Drouet S, Ameri G, Le Dortz K et al (2020) A probabilistic seismic hazard map for the metropolitan France. Bull Earthq Eng 18:1865–1898. https://doi.org/10.1007/s10518020007907
Faccioli E, Paolucci R, Vanini M (2015) Evaluation of probabilistic sitespecific seismichazard methods and associated uncertainties, with applications in the Po Plain, northern Italy. Bull Seismol Soc Am 105:2787–2807
Garnier S (2019) Viridis: default color maps from “matplotlib”. 2018. https://github.com/sjmgarnier/viridis. R package version 03 4:27
Giardini D et al (2018) Seismic hazard map of the Middle East. Bull Earthq Eng 16:3567–3570
Gorini A et al (2010) The Italian strong motion network. Bull Earthq Eng 8:1075–1090
Grad M, Tiira T, Group EW (2009) The Moho depth map of the European Plate. Geophys J Int 176:279–292
Grünthal G, Wahlström R (2012) The EuropeanMediterranean earthquake catalogue (EMEC) for the last millennium. J Seismol 16:535–570
Grünthal G, Stromeyer D, Bosse C, Cotton F, Bindi D (2018) The probabilistic seismic hazard assessment of Germany—version 2016, considering the range of epistemic uncertainties and aleatory variability. Bull Earthq Eng 16:4339–4395
Huber PJ (1992) Robust estimation of a location parameter. In: Kotz S, Johnson NL (eds) Breakthroughs in statistics. Springer series in statistics (perspectives in statistics). Springer, New York, NY. https://doi.org/10.1007/9781461243809_35
Jacoby WG (2000) Loess: a nonparametric, graphical tool for depicting relationships between variables. Electoral Stud 19:577–613
Kahle D, Wickham H, Kahle MD (2019) Package ‘ggmap’
Kale Ö, Akkar S, Ansari A, Hamzehloo H (2015) A groundmotion predictive model for Iran and Turkey for horizontal PGA, PGV, and 5% damped response spectrum: investigation of possible regional effects. Bull Seismol Soc Am 105:963–980
Kohrangi M, Kotha SR, Bazzurro P (2020) Impact of partially non-ergodic site-specific probabilistic seismic hazard on risk assessment of single buildings. Earthq Spectra (in review)
Koller M (2016) robustlmm: an R package for robust estimation of linear mixedeffects models. J Stat Softw 75:1–24
Kotha SR, Bindi D, Cotton F (2016) Partially nonergodic region specific GMPE for Europe and MiddleEast. Bull Earthq Eng 14:1245–1263
Kotha SR, Bindi D, Cotton F (2017) From ergodic to region and sitespecific probabilistic seismic hazard assessment: method development and application at European and Middle Eastern sites. Earthq Spectra 33:1433–1453. https://doi.org/10.1193/081016EQS130M
Kotha SR, Cotton F, Bindi D (2018) A new approach to site classification: mixedeffects ground motion prediction equation with spectral clustering of site amplification functions. Soil Dyn Earthq Eng. https://doi.org/10.1016/j.soildyn.2018.01.051
Kotha SR, Cotton F, Bindi D (2019) Empirical models of shearwave radiation pattern derived from large datasets of groundshaking observations. Sci Rep 9:1–11. https://doi.org/10.1038/s41598018375244
Kowsari M, Halldorsson B, Hrafnkelsson B, Snæbjörnsson JÞ, Jónsson S (2019) Calibration of ground motion models to Icelandic peak ground acceleration data using Bayesian Markov chain Monte Carlo simulation. Bull Earthq Eng 17:2841–2870
Kowsari M, Sonnemann T, Halldorsson B, Hrafnkelsson B, Snæbjörnsson JÞ, Jónsson S (2020) Bayesian inference of empirical ground motion models to pseudospectral accelerations of south Iceland seismic zone earthquakes based on informative priors. Soil Dyn Earthq Eng 132:106075
Kuehn NM, Abrahamson NA (2017) The effect of uncertainty in predictor variables on the estimation of groundmotion prediction equations. Bull Seismol Soc Am 108:358–370. https://doi.org/10.1785/0120170166
Kuehn NM, Scherbaum F (2016) A partially nonergodic groundmotion prediction equation for Europe and the Middle East. Bull Earthq Eng 14:2629–2642
Kühn NM, Scherbaum F (2015) Groundmotion prediction model building: a multilevel approach. Bull Earthq Eng 13:2481–2491
Landwehr N, Kuehn NM, Scheffer T, Abrahamson N (2016) A nonergodic groundmotion model for California with spatially varying coefficients. Bull Seismol Soc Am 106:2574–2583
Lanzano G, Sgobba S, Luzi L et al (2019a) The panEuropean engineering strong motion (ESM) flatfile: compilation criteria and data statistics. Bull Earthq Eng 17:561–582. https://doi.org/10.1007/s105180180480z
Lanzano G, Luzi L, Pacor F, Felicetta C, Puglia R, Sgobba S, D’Amico M (2019b) A revised groundmotion prediction model for shallow crustal earthquakes in Italy. Bull Seismol Soc Am 109:525–540
Lu Y, Stehly L, Paul A, Group AW (2018) Highresolution surface wave tomography of the European crust and uppermost mantle from ambient seismic noise. Geophys J Int 214:1136–1150
Manighetti I, Campillo M, Bouley S, Cotton F (2007) Earthquake scaling, fault segmentation, and structural maturity. Earth Planet Sci Lett 253:429–438
Mayor J, Traversa P, Calvet M, Margerin L (2018) Tomography of crustal seismic attenuation in Metropolitan France: implications for seismicity analysis. Bull Earthq Eng 16:2195–2210
Miller AC III, Rice TR (1983) Discrete approximations of probability distributions. Manag Sci 29:352–362
Pacor F et al (2018) NESS1: a worldwide collection of strongmotion data to investigate nearsource effects. Seismol Res Lett 89:2299–2313
Pagani M et al (2014) OpenQuake engine: an open hazard (and risk) software for the global earthquake model. Seismol Res Lett 85:692–702. https://doi.org/10.1785/0220130087
Pilz M, Cotton F, Zaccarelli R, Bindi D (2019) Capturing regional variations of hardrock attenuation in Europe. Bull Seismol Soc Am 109:1401–1418. https://doi.org/10.1785/0120190023
Pilz M, Cotton F, Kotha SR (2020) Datadriven and machine learning identification of seismic reference stations in Europe. Geophys J Int. https://doi.org/10.1093/gji/ggaa199
Piña-Valdés J, Socquet A, Cotton F, Specht S (2018) Spatiotemporal variations of ground motion in Northern Chile before and after the 2014 Mw 8.1 Iquique megathrust event. Bull Seismol Soc Am 108:801–814
Radiguet M, Cotton F, Manighetti I, Campillo M, Douglas J (2009) Dependency of nearfield ground motions on the structural maturity of the ruptured faults. Bull Seismol Soc Am 99:2572–2581
RodriguezMarek A et al (2013) A model for singlestation standard deviation using data from various tectonic regions. Bull Seismol Soc Am 103:3149–3163
Sahakian V, Baltay A, Hanks T, Buehler J, Vernon F, Kilb D, Abrahamson N (2018) Decomposing leftovers: event, path, and site residuals for a smallmagnitude Anza region GMPE. Bull Seismol Soc Am 108:2478–2492
Sahakian V, Baltay AS, Hanks TC, Buehler J, Vernon FL, Kilb D, Abrahamson NA (2019) Groundmotion residuals, path effects, and crustal properties: a pilot study in Southern California. J Geophys Res 124:5738–5753
Schmedes J, Archuleta RJ (2008) Nearsource ground motion along strikeslip faults: insights into magnitude saturation of PGV and PGA. Bull Seismol Soc Am 98:2278–2290
Sedaghati F, Pezeshk S (2017) Partially nonergodic empirical groundmotion models for predicting horizontal and vertical PGV, PGA, and 5% damped linear acceleration response spectra using data from the Iranian Plateau. Bull Seismol Soc Am 107:934–948
Stafford PJ (2014) Crossed and nested mixedeffects approaches for enhanced model development and removal of the ergodic assumption in empirical groundmotion models. Bull Seismol Soc Am 104:702–719
Strasser FO, Abrahamson NA, Bommer JJ (2009) Sigma: issues, insights, and challenges. Seismol Res Lett 80:40–56
Team RC (2013) R foundation for statistical computing. Team RC, Vienna, p 3
Thompson E, Wald DJ, Worden C (2014) A VS30 map for California with geologic and topographic constraints. Bull Seismol Soc Am 104:2313–2321
Venables W, Ripley B (2002) Modern applied statistics with S, 4th edn. Springer, New York
Vilanova SP et al (2018) Developing a geologically based VS30 site-condition model for Portugal: methodology and assessment of the performance of proxies. Bull Seismol Soc Am 108:322–337
Villani M, Abrahamson NA (2015) Repeatable site and path effects on the groundmotion sigma based on empirical data from southern California and simulated waveforms from the CyberShake platform. Bull Seismol Soc Am 105:2681–2695. https://doi.org/10.1785/0120140359
Wald DJ, Allen TI (2007) Topographic slope as a proxy for seismic site conditions and amplification. Bull Seismol Soc Am 97:1379–1395
Weatherill G, Crowley H, Lemoine A, Roullé A, Tourlière B, Kotha SR, Cotton F (2020a) Modelling seismic site response at regional scale for the 2020 European Seismic Risk Model (ESRM20). Bull Eng (in preparation)
Weatherill G, Kotha SR, Cotton F (2020b) Rethinking site amplification in regional seismic risk assessment. Earthq Spectra. https://doi.org/10.1177/8755293019899956
Weatherill G, Kotha SR, Cotton F (2020c) A regionally-adaptable “Scaled-Backbone” ground motion logic tree for shallow seismicity in Europe: application in the 2020 European seismic hazard model. Bull Earthq Eng (in review)
Weatherill G, Kotha SR, Cotton F, Bindi D, Danciu L (2020d) Updated GMPE logic tree and rock/soil parameterisation for ESHM18. vol Deliverable 25.4. Seismology and Earthquake Engineering Research Infrastructure Alliance for Europe (SERA)
Wickham H, Chang W, Henry L, Pedersen T, Takahashi K, Wilke C, Woo K (2019a) R package ‘ggplot2’v. 3.1. 1. Cran R
Wickham H, François R, Henry L, Müller K (2019b) dplyr: a grammar of data manipulation. R package version 0.8. 0.1. ed
Woessner J et al (2015) The 2013 European seismic hazard model: key components and results. Bull Earthq Eng 13:3553–3596. https://doi.org/10.1007/s1051801597951
Acknowledgements
Open Access funding provided by Projekt DEAL. We are grateful for the swift and insightful reviews by Prof. Sinan Akkar and an anonymous reviewer, and to Prof. John Douglas in his capacity as the handling editor of this manuscript at Bulletin of Earthquake Engineering. The contributions of Sreeram Reddy Kotha (corresponding author) to this research are funded by the SIGMA2 consortium (EDF, CEA, PG&E, SwissNuclear, Orano, CEZ, CRIEPI) under Grant—2017–2021. The model development has benefitted immensely from feedback provided by Dr. Paolo Traversa, the SIGMA2 scientific committee, and the collaborators in the Horizon 2020 (Grant No. 730900) “Seismology and Earthquake Engineering Research Infrastructure Alliance for Europe (SERA)” project.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Kotha, S.R., Weatherill, G., Bindi, D. et al. A regionally-adaptable ground-motion model for shallow crustal earthquakes in Europe. Bull Earthquake Eng 18, 4091–4125 (2020). https://doi.org/10.1007/s10518020008691
DOI: https://doi.org/10.1007/s10518020008691
Keywords
Ground-motion model
Response spectra
Robust mixed-effects regression
Regionally adaptable
Seismic hazard and risk
Europe