1 Introduction

The area comprising the provinces of Sichuan and Yunnan, which are located near the south-eastern boundary of the Tibetan plateau, is one of the most tectonically and seismically active regions in China. Since the beginning of the historical record (~800 C.E.), two massive earthquakes (\(M_{S} \ge 8.0\)), the 1833 Songming earthquake and the 2008 Wenchuan (WC) earthquake, and approximately 40 large earthquakes (\(7.0 \le M_{S} \le 7.9\)) have occurred in this area. More than 130 million people in the Sichuan–Yunnan area therefore face high seismic hazard and, given the vulnerability of the structures in the area, high seismic risk.

Ground-motion models (GMMs), including both empirical models (i.e. those based on regression of observed ground motions) and those derived using physics-based approaches, are a crucial component of seismic hazard analysis. The most widely used GMMs are ground-motion prediction equations (GMPEs), which are generally derived through regression analysis of empirical data. Following the guidelines of the Senior Seismic Hazard Analysis Committee (SSHAC), GMMs should capture “the centre, the body, and the range” of future earthquake ground motions, either by using a suite of GMPEs or through a backbone approach (e.g., Douglas 2018). In the following we use “GMM” to refer to a suite of GMPEs that captures the epistemic uncertainty in ground-motion prediction.

Due to the sparsity of digital ground-motion records from China, Hu and Zhang (1984) introduced the transform method to derive GMPEs for peak ground acceleration (PGA) and response (pseudo-)spectral accelerations (PSAs). This approach was subsequently widely used in China (e.g., Huo 1989; Wang et al. 2000; Yu 2002). The GMPEs most widely used for seismic hazard analysis of major engineering projects, such as the models of Xiao (2011) and China Earthquake Administration (2019) (hereafter, CEA 2019), are still based on the transform method. In this method, a reference area with abundant instrumental ground-motion records and macroseismic intensities (e.g., the western United States of America, USA) is chosen. Separate GMPEs for PGA or PSA and for macroseismic intensity are then derived for the reference area, and a similar GMPE for macroseismic intensity is derived for the target area (in our case, China). The method then assumes that, for the same magnitude and distance, the same level of macroseismic intensity in both areas corresponds to the same level of PGA or PSA. This allows a GMPE for PGA or PSA to be estimated for the target area by converting between macroseismic intensity and PGA or PSA. Although the transform method has obvious imperfections, e.g., the large intervals between intensity values, it remains commonly used. Its application has also highlighted that ground-motion characteristics vary amongst Chinese regions (Yu and Wang 2004, 2006; Zhao et al. 2009; Fan et al. 2011).
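The conversion step of the transform method can be made concrete with a small sketch. One way to operationalize the assumption is to hold magnitude fixed, find the "equivalent distance" in the reference area at which the reference intensity model gives the same intensity as the target-area model, and then evaluate the reference-area PGA model at that distance. The functional forms and all coefficients below (`TARGET_INTENSITY`, `REF_INTENSITY`, `REF_LN_PGA`) are hypothetical and chosen only to give plausible numbers; this is not the formulation of Hu and Zhang (1984) or of any published model.

```python
import math

# Hypothetical coefficients for illustration only; they are NOT taken from
# Hu and Zhang (1984) or from any published model.
TARGET_INTENSITY = {"a": 2.0, "b": 1.5, "c": 1.0, "r0": 25.0}  # target area
REF_INTENSITY = {"a": 2.5, "b": 1.4, "c": 1.1, "r0": 20.0}     # reference area
REF_LN_PGA = {"d": -4.0, "e": 1.0, "f": 1.3, "r0": 10.0}       # reference area, ln(PGA in g)


def intensity(coef, mag, dist_km):
    """Intensity model of the assumed form I = a + b*M - c*ln(R + R0)."""
    return coef["a"] + coef["b"] * mag - coef["c"] * math.log(dist_km + coef["r0"])


def equivalent_distance(mag, dist_km):
    """Reference-area distance giving the target-area intensity at the same magnitude."""
    target_i = intensity(TARGET_INTENSITY, mag, dist_km)
    c = REF_INTENSITY
    # Invert I = a + b*M - c*ln(R' + R0) for R'.
    r_eq = math.exp((c["a"] + c["b"] * mag - target_i) / c["c"]) - c["r0"]
    # Clip at zero: close to the source the target intensity may exceed what
    # the reference model reaches at any non-negative distance.
    return max(r_eq, 0.0)


def transformed_pga(mag, dist_km):
    """Target-area PGA (g) from the reference-area PGA model at the equivalent distance."""
    r_eq = equivalent_distance(mag, dist_km)
    p = REF_LN_PGA
    return math.exp(p["d"] + p["e"] * mag - p["f"] * math.log(r_eq + p["r0"]))


for r in (10.0, 50.0, 100.0):
    print(f"M 6.0, R = {r:5.1f} km -> PGA ~ {transformed_pga(6.0, r):.3f} g")
```

The coarse, discrete nature of real intensity data enters through the intensity models themselves; the sketch treats intensity as continuous, which is one of the simplifications the transform method relies on.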

More and more ground-motion records have been collected in China since the early 2000s as strong-motion networks have improved, especially in the Sichuan–Yunnan area. This growing number of records facilitates the development of local GMPEs. Various GMPEs have been derived directly from local ground-motion records from the Sichuan–Yunnan area: Cui et al. (2012), Wang et al. (2013) and Li et al. (2020) for PGA and PSAs, and Fan et al. (2020) and Zhang et al. (2021) for peak ground velocity (PGV) and peak ground displacement. The development of these GMPEs not only represents great progress for regional seismic hazard analysis but also provides more choices of GMPEs for such analyses. Given the complex regional characteristics of ground motions, it is not evident that GMPEs derived using data from other regions (e.g., western USA) are well calibrated to the source and path properties of a target region (here, Sichuan–Yunnan) (e.g., Bommer and Stafford 2020). Region-specific GMPEs could be more consistent with records from the target region than other models, provided that they satisfy the requirements of seismic hazard analysis (e.g., Mousavi et al. 2012; Ghasemi and Allen 2022). Region-specific GMMs are therefore generally recommended for a specific site or region, because GMMs vary from region to region owing to differences in, e.g., source characteristics and path effects related to geometric spreading and anelastic attenuation (e.g., Cotton et al. 2006; Goulet et al. 2018; Weatherill et al. 2020).

To develop appropriate GMMs, GMPEs need to be evaluated in terms of their potential performance in predicting ground motions for the target area and the requirements of probabilistic seismic hazard analyses (e.g., Bommer et al. 2010). For site-specific, national or continental hazard analyses, candidate GMPEs have been evaluated in many studies, e.g., for the USA (e.g. Abrahamson et al. 2008; Gregor et al. 2022; McNamara et al. 2019), for Australia (e.g. Ghasemi and Allen 2022), for Italy (e.g. Lanzano et al. 2020), for Europe and the Middle East (e.g. Stafford et al. 2008; Campbell and Bozorgnia 2006; Campbell 2016) and globally (e.g. Stewart et al. 2015). Very few studies have undertaken such an evaluation for China. Huang and Galasso (2019) compared the NGA-West2 GMPEs and one local GMPE with local ground-motion records in Sichuan province. Despite this study’s interesting finding that ground motions in this province attenuate more slowly than modelled by the NGA-West2 GMPEs, it was based on only a single local candidate GMPE and relatively few ground-motion records. The local GMPE (Huo 1989) used in their comparison was developed more than 30 years ago using the transform method, which does not correctly model recent ground-motion observations (Wang 2010). Moreover, the ground-motion records used by Huang and Galasso (2019) came from only three recent strong earthquakes, which are not necessarily typical of ground motions in Sichuan province, especially since relatively small and moderate earthquakes have also caused significant damage in this area.

In this study, we evaluate GMPEs in order to recommend appropriate models for the Sichuan–Yunnan area. First, we compile all regional GMPEs derived from local instrumental ground-motion records. Second, these candidate GMPEs are evaluated based on their datasets, functional forms and independent variables, the methods used to derive the models, and the range of response periods considered. Next, we compare predictions from the candidate GMPEs with local ground-motion records by computing inter- and intra-event residuals (also known as between- and within-event residuals). Finally, we recommend appropriate GMPEs for use within seismic hazard analyses of the Sichuan–Yunnan area.

2 Determination of candidate GMPEs

We searched for all published GMMs developed for the Sichuan–Yunnan area and identified 14 candidate GMPEs, excluding models derived for specific sites or from single earthquakes only. In addition, we have only considered GMPEs that used data from a wide range of magnitudes, because these have wider applicability. GMPEs that used data from small and moderate earthquakes only, such as that of Jiang (2017), are therefore not included in our comparisons.

Huo (1989), Yu and Wang (2006) and Lei et al. (2007) derived their GMPEs using the transform method. These models assume elliptic attenuation because isoseismal maps usually use elliptic contours to describe the effects of an earthquake. It is difficult, however, to estimate the orientation of the major axis for future earthquakes because fault strikes vary, which adds difficulty and uncertainty when predicting ground motions for future events. Given the problems with the transform method discussed above and these difficulties with assuming elliptic attenuation, these GMPEs have been excluded from the following evaluation.
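The sensitivity to the unknown strike can be illustrated with a generic elliptical-isoseismal model, in which separate intensity relations along the major and minor axes define the semi-axes of each elliptical contour. The sketch below assumes hypothetical axis relations (`MAJOR`, `MINOR`) of the form I = A + B*M - C*ln(R + R0) and finds the intensity at a site by bisection over contour levels; it is an illustration of the geometry, not the formulation of any model cited above.

```python
import math

# Hypothetical axis relations I = A + B*M - C*ln(R + R0), illustration only.
MAJOR = {"A": 4.0, "B": 1.3, "C": 1.2, "R0": 10.0}
MINOR = {"A": 3.5, "B": 1.3, "C": 1.3, "R0": 5.0}


def axis_radius(coef, mag, level):
    """Semi-axis length (km) at which one axis relation reaches intensity `level`."""
    return math.exp((coef["A"] + coef["B"] * mag - level) / coef["C"]) - coef["R0"]


def inside(mag, level, r_km, theta_rad):
    """True if a site at distance r and angle theta from the major axis lies
    inside the elliptical isoseismal of the given level."""
    ra = axis_radius(MAJOR, mag, level)
    rb = axis_radius(MINOR, mag, level)
    if ra <= 0.0 or rb <= 0.0:
        return False  # level not reached at any positive distance
    x, y = r_km * math.cos(theta_rad), r_km * math.sin(theta_rad)
    return (x / ra) ** 2 + (y / rb) ** 2 <= 1.0


def site_intensity(mag, r_km, theta_rad, lo=0.0, hi=15.0, tol=1e-9):
    """Highest isoseismal level that still contains the site (bisection).

    Assumes the site is inside the `lo` contour and outside the `hi` contour.
    """
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if inside(mag, mid, r_km, theta_rad):
            lo = mid
        else:
            hi = mid
    return lo


# The site geometry is fixed; only the assumed fault strike changes.
site_azimuth = math.radians(40.0)  # hypothetical epicentre-to-site azimuth
for strike_deg in (0.0, 30.0, 60.0, 90.0):
    theta = site_azimuth - math.radians(strike_deg)
    print(f"strike {strike_deg:4.0f} deg -> I = {site_intensity(6.0, 30.0, theta):.2f}")
```

For a fixed epicentre and site, the predicted intensity moves between the minor-axis and major-axis values as the assumed strike changes, which is the source of the additional uncertainty described above.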

The GMPEs of Xiao (2011) and CEA (2019) are also derived using the transform method, although the ground-motion records of the WC earthquake were included during the regression analysis. In addition, they both assume elliptic attenuation, which makes them more difficult to use. As mentioned above, however, they are the most widely used GMPEs within seismic hazard analyses for major engineering projects in China. Because of this and despite the potential difficulties in applying them, the GMPEs of Xiao (2011) and CEA (2019) are considered in the later residual analysis.

Eight other Sichuan–Yunnan GMPEs, derived using the standard method of regression on ground-motion data (and not the transform method), are considered more carefully in the following: Cui et al. (2006), Kang and Jin (2009), Cui et al. (2012), Wang et al. (2013), Zhang et al. (2013), Wen et al. (2018), Li et al. (2020), and Zhang et al. (2022). These eight GMPEs were developed for Sichuan and/or Yunnan using local ground-motion data. All of them except Cui et al. (2012), which is a conference paper, were published in peer-reviewed journals, and none of them has yet been superseded by a newer article by the same authors. Thus, none of the models can be excluded on basic quality-assurance criteria without a deeper examination.

Following the exclusion criteria suggested by Bommer et al. (2010) for GMPEs of shallow crustal earthquakes, the main elements used to assess the applicability of the GMPEs are: the dataset used, the range of response periods, the functional form and the independent variables, the methods used to derive the coefficients, and the magnitude and distance ranges of applicability. These are discussed in turn in the following sections.

2.1 Dataset selection

All datasets used to derive these GMPEs consist of instrumental ground motions of earthquakes that occurred in the Sichuan–Yunnan area. Zhang et al. (2013) and Li et al. (2020) also used a small number of ground-motion records from global shallow crustal earthquakes. Each group of authors developed their own selection criteria for records, including considerations of the magnitude and distance ranges, the minimum and maximum number of records for each event, the numbers of components considered and how they were combined, recording quality criteria and whether records from aftershocks were included.

Details are listed in Table 1. Cui et al. (2006) provide only limited details on the dataset used, and Kang and Jin (2009) did not provide sufficient information on their dataset either. In addition, Kang and Jin (2009) only used records from aftershocks of the WC earthquake (from 12th May to 10th June 2008). It is debatable whether a single earthquake sequence is sufficient to derive a widely applicable GMPE. The remaining GMPEs were derived from records of moderate and strong earthquakes (\(M \ge 4.0\)), and the number of records from each earthquake should be sufficient for regression analysis.

Table 1 Characteristics of records and selection criteria used by the GMPE developers

In summary, all eight studies collated their own datasets of instrumental records and used them to derive GMPEs according to their own selection criteria, but the quality of the datasets varies considerably. Based on the characteristics of the datasets discussed above, the GMPEs of Cui et al. (2006) and Kang and Jin (2009) are excluded at this first step, as these models are likely to be unreliable owing to the inadequacies of their datasets. Only the six remaining models, Cui et al. (2012), Wang et al. (2013), Zhang et al. (2013), Wen et al. (2018), Li et al. (2020), and Zhang et al. (2022), are considered in the following.

2.2 Functional form and variables

As shown in Table 2, all the GMPEs use logarithmic functional forms with magnitude, distance and site terms and, for some models, style-of-faulting terms. The logarithmic form of GMPEs is justified because ground-motion parameters have been shown to follow a lognormal distribution, except potentially for very high-amplitude motions. The ground-motion parameters (PGA, PGV and 5%-damped PSA) are non-linear functions of magnitude and source-to-site distance. Neither epicentral distance, \(R_{epi}\), nor local magnitude, \(M_{L}\), is generally an appropriate variable for predicting ground motions from moderate and large earthquakes, particularly within the near-source region, because \(R_{epi}\) ignores the finite extent of the rupture and \(M_{L}\) saturates at large magnitudes. Site-effect terms that are not expressed in terms of the time-averaged shear-wave velocity in the upper 30 m, \(V_{S30}\), or another physical measurement are also unlikely to be optimal. This indicates that the GMPEs of Cui et al. (2012) and Wang et al. (2013) are less suitable than the others, which use appropriate magnitude and distance measures and also consider style-of-faulting effects in their functional forms.
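To make the structure described above concrete, the following sketch evaluates a generic logarithmic GMPE with magnitude, rupture-distance, \(V_{S30}\) and style-of-faulting terms. The class `ToyGMPE` and all of its coefficients are hypothetical; the form loosely resembles common shallow-crustal GMPEs but does not reproduce any of the candidate models. Because the model is linear in ln(Y), the lognormal assumption gives the 16th and 84th percentiles directly as exp(ln-median ∓ sigma).

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyGMPE:
    """Generic ln-based GMPE with hypothetical coefficients (illustration only)."""
    c0: float = 0.0     # constant
    c1: float = 1.2     # linear magnitude scaling
    c2: float = -1.6    # geometric spreading
    c3: float = 0.15    # magnitude-dependent geometric spreading
    h: float = 6.0      # near-source saturation term (km)
    c4: float = -0.5    # site scaling with ln(V_S30 / 760)
    c5: float = 0.2     # reverse-faulting adjustment (0 for strike-slip)
    sigma: float = 0.7  # total standard deviation, natural-log units

    def ln_median(self, mw, r_rup_km, vs30, reverse=False):
        r = math.sqrt(r_rup_km ** 2 + self.h ** 2)
        return (self.c0 + self.c1 * (mw - 6.0)
                + (self.c2 + self.c3 * mw) * math.log(r)
                + self.c4 * math.log(vs30 / 760.0)
                + (self.c5 if reverse else 0.0))

    def percentiles(self, mw, r_rup_km, vs30, reverse=False):
        """Approximate 16th, 50th and 84th percentiles (g) under a lognormal distribution."""
        mu = self.ln_median(mw, r_rup_km, vs30, reverse)
        return math.exp(mu - self.sigma), math.exp(mu), math.exp(mu + self.sigma)


gmpe = ToyGMPE()
print(gmpe.percentiles(6.0, 20.0, 553.0))  # rock-like V_S30
print(gmpe.percentiles(6.0, 20.0, 353.0))  # soil-like V_S30
```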

Table 2 Functional forms, the variables used and their ranges of applicability of the considered GMPEs

2.3 Derivation method

Whatever the functional form, records from the same earthquake share common source effects and are therefore not independent, so it is important to choose a regression method that accounts for the inter- and intra-event components of variability when deriving the coefficients of a GMPE (e.g. Douglas 2003). The intra-event component includes systematic bedrock or site-specific effects and path-specific effects, whereas the inter-event component contains systematic source-specific effects (Al Atik et al. 2010). Table 3 lists the methods used by the developers to derive the candidate GMPEs, to analyse their residuals (where performed) and to test the resulting GMPEs. Some of the developers calculated inter- and intra-event residuals. Most of the developers tested their GMPEs by comparing them with previous models and/or with observed ground-motion data, which adds credibility to the GMPEs.
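The split into inter- and intra-event components can be written as a mixed-effects model. The notation below (earthquake index \(e\), station index \(s\), median model \(f\) with coefficient vector \(\boldsymbol{\theta}\)) is an assumption adopted for illustration, following the convention commonly associated with Al Atik et al. (2010):

\[
\ln Y_{es} = f(M_{e}, R_{es}, V_{S30,s}; \boldsymbol{\theta}) + \delta B_{e} + \delta W_{es},
\qquad
\delta B_{e} \sim N(0, \tau^{2}), \quad \delta W_{es} \sim N(0, \phi^{2}),
\]

\[
\sigma = \sqrt{\tau^{2} + \phi^{2}}.
\]

A regression that ignores the event index treats all records as independent, so earthquakes with many recordings dominate the fit and \(\tau\) and \(\phi\) cannot be separated.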

Table 3 Methods used by the GMPE developers

For each of the candidate GMPEs, we evaluated the predicted PGA curves against distance. These curves exhibit similar and reasonable shapes, except those from the GMPE of Wen et al. (2018). As shown in Fig. 1a, apart from the very small values of PGA (in an unspecified unit), these curves are implausible because the predictions for \(M_{w} = 5.0\) are greater than those for \(M_{w} = 8.0\). On examining the coefficients carefully, we noticed that the values of \(a_{2}\) and \(a_{4}\) (from Table 1 in Wen et al. 2018) were unusual: positive and negative, respectively, when they would be expected to be negative and positive. To check for a simple clerical error, we switched the values of \(a_{2}\) and \(a_{4}\) and redrew the PGA curves for various magnitudes. Compared with the original curves, the PGA values in Fig. 1b are now much too high and the shapes of the curves are unusual. When examining the unit covariance matrix of their coefficients, Wen et al. (2018) themselves highlight that some of the coefficients of their GMPEs (including those related to magnitude scaling) show strong trade-offs, which they attribute to the relatively narrow magnitude range of their data and the small number of near-source records. Therefore, the GMPE of Wen et al. (2018) is rejected from further comparisons.
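The check that exposed the problem (larger magnitudes should not predict weaker shaking at the same distance) is easy to automate for any candidate model. The sketch below scans a grid of magnitudes and distances and reports every adjacent magnitude pair where the prediction decreases. The two toy models `plausible` and `inverted` are hypothetical; in particular, `inverted` only mimics a wrong-signed magnitude term and is not the published Wen et al. (2018) equation.

```python
import math


def magnitude_scaling_violations(ln_median, mags, dists_km):
    """Return (m_low, m_high, r) triples where a larger magnitude predicts less motion.

    `ln_median(m, r)` is any callable returning ln of the median prediction, and
    `mags` must be sorted in ascending order. A plausible GMPE gives an empty list.
    """
    violations = []
    for r in dists_km:
        for m_low, m_high in zip(mags, mags[1:]):
            if ln_median(m_high, r) < ln_median(m_low, r):
                violations.append((m_low, m_high, r))
    return violations


def plausible(m, r):  # hypothetical toy model with positive magnitude scaling
    return -1.0 + 1.1 * m - 1.3 * math.log(r + 10.0)


def inverted(m, r):  # hypothetical toy model with a wrong-signed magnitude term
    return 3.0 - 0.4 * m - 1.3 * math.log(r + 10.0)


mags = [5.0, 6.0, 7.0, 8.0]
dists = [10.0, 50.0, 100.0, 200.0]
print(len(magnitude_scaling_violations(plausible, mags, dists)))
print(len(magnitude_scaling_violations(inverted, mags, dists)))
```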

Fig. 1

a Attenuation of PGA for magnitudes 5.0, 6.0, 7.0 and 8.0 using the GMPE of Wen et al. (2018). b Using the published equation but switching the values of \(a_{2}\) and \(a_{4}\)

Therefore, there are five models left for the following comparison: Cui et al. (2012), Wang et al. (2013), Zhang et al. (2013), Li et al. (2020) and Zhang et al. (2022).

2.4 Range of response periods

GMMs should provide predictions for a wide range of response spectral periods (from 0 s, i.e., PGA, up to at least 2 s) (Bommer et al. 2010). According to the requirements of the Chinese code for seismic design of buildings (MHUDPRC 2010), response spectral periods up to 6 s are required from seismic hazard analyses. As shown in Fig. 2, all five remaining GMPEs provide predictions from 0 to at least 6 s, although the number of sampled periods varies from model to model. For the GMPE of Cui et al. (2012), the sampled periods are too widely spaced to draw smooth response spectra, whereas the other four GMPEs generate smoother spectra. Most spectra show large variations at longer periods (beyond 6 s), probably due to noise in the ground-motion data and the limited number of records usable at such periods. The GMPEs of Zhang et al. (2013) and Li et al. (2020) show unusual behaviour for smaller earthquakes; we confirmed these unusual predictions by comparing them against the spectra shown in the original articles.
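When a GMPE is published at only a few periods, ordinates between the sampled periods are often filled in by interpolating linearly in log(period) and log(amplitude). The sketch below does this for a hypothetical, coarsely sampled spectrum (`periods`, `psa`); it is one common convention, not a procedure used by the model developers, and it cannot recover spectral shape that the sparse sampling never captured.

```python
import math


def interp_loglog(periods, values, t):
    """Interpolate a response spectrum linearly in ln(period)-ln(value) space.

    `periods` must be strictly increasing and positive; PGA (T = 0) needs
    separate handling because ln(0) is undefined. No extrapolation is done.
    """
    if not periods[0] <= t <= periods[-1]:
        raise ValueError("period outside the sampled range")
    for i in range(len(periods) - 1):
        t0, t1 = periods[i], periods[i + 1]
        if t0 <= t <= t1:
            w = math.log(t / t0) / math.log(t1 / t0)
            return math.exp((1.0 - w) * math.log(values[i]) + w * math.log(values[i + 1]))
    raise ValueError("periods must be strictly increasing")


# Hypothetical, coarsely sampled median spectrum (periods in s, PSA in g).
periods = [0.1, 0.5, 2.0, 6.0]
psa = [0.5, 0.4, 0.1, 0.02]
dense = [interp_loglog(periods, psa, 0.1 * 1.25 ** k) for k in range(18)]
```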

Fig. 2

Comparison of predicted response spectra from candidate GMPEs for various magnitudes and distances assuming rock site conditions, and assuming strike-slip faulting and a footwall site (if applicable)

Despite the above-mentioned weaknesses of these GMPEs, all GMPEs predict similar levels of ground motion for common response periods (Fig. 2). These equations appear to provide reasonable and similar predictions of PSA up to around 2 s. Overall, the GMPEs of Wang et al. (2013) and Zhang et al. (2022) provide better predictions for the whole period range, with the PSAs from Zhang et al. (2022) showing less variation with period than those from Wang et al. (2013).

2.5 Conclusions of this GMPE evaluation

By examining each aspect of the candidate GMPEs, we were able to assess the inherent quality of each GMPE in terms of their datasets, functional forms, independent variables and the magnitude and distance ranges of their applicability. A summary of these findings is given below.

  • There is no detailed accessible list of records or the detailed characteristics of the database used by Cui et al. (2006) or Kang and Jin (2009), which makes it harder to assess the reliability and applicability of these GMPEs. Therefore, both these GMPEs were excluded before any detailed comparison was carried out.

  • Cui et al. (2012) mainly used observational data from the mountainous part of the Sichuan–Yunnan area, and hence their GMPE may be applicable for seismic hazard analysis in this area. However, the specific magnitude and distance ranges of applicability of this GMPE are unknown. In addition, the GMPE’s use of epicentral distance means that it is likely to be unreliable in the near-source region of moderate and large earthquakes, and \(V_{S30}\) was not considered when modelling local site effects. Overall, this GMPE is not considered appropriate for most seismic hazard assessments.

  • Wang et al. (2013) did not consider \(V_{S30}\) in the site terms of their GMPE either. This GMPE also has a narrow magnitude range of applicability due to the data used (\(4.5 \le M_{S} \le 6.5\)). Consequently, this GMPE is also not considered appropriate for most seismic hazard assessments.

  • As stated in their article, the GMPE from Zhang et al. (2013) could be used to predict ground motions for earthquakes of magnitude equal to and greater than 5.0 within a rupture distance of 200 km. The applicability of this model for moderate earthquakes is, however, doubtful as we do not obtain reasonable response spectra for earthquakes of magnitude 5 using this GMPE.

  • According to the range of magnitude and distance indicated by Wen et al. (2018), their GMPE is applicable for \(M_{w} > 4.0\) and within a Joyner–Boore distance of around 200 km. The coefficients provided in this article, however, are likely incorrect because of the unusual ground-motion predictions they generate.

  • As discussed above, the GMPE of Li et al. (2020) could be appropriate for the prediction of horizontal PGA and PSAs of strong (M > 6) earthquakes within a rupture distance of 200 km.

  • Zhang et al. (2022) stated that their GMPE showed good performance for estimating PSAs from earthquakes with \(M_{w}\) 4.2–7.9 and rupture distances from 10 to 200 km, at sites with \(V_{S30}\) = 140–1130 m/s. The predicted response spectra shown here are consistent with this good performance.

In conclusion, we assessed the eight candidate GMPEs following the exclusion criteria of Bommer et al. (2010). As listed in Table 4, the GMPE of Zhang et al. (2022) is the only one that passes all of Bommer et al.’s basic exclusion criteria. Wen et al. (2018) might pass if corrected coefficients were available. In addition, Li et al. (2020) could be an appropriate choice if the focus were only on calculating ground motions from earthquakes with M > 6.

Table 4 Overall assessment of the five remaining GMPEs according to Bommer et al. (2010)’s exclusion criteria

3 Comparisons of GMPEs to local ground-motion records

To further evaluate the performance of these models, we compared predictions from the five remaining GMPEs with local ground-motion data. Various studies have used goodness-of-fit measures such as the mean and standard deviation of the residuals, log-likelihood and Euclidean distances to assess the match between observations and predictions for different regions (e.g. Scherbaum et al. 2004, 2009; Kale and Akkar 2013; Mousavi et al. 2014). These measures summarize the overall fit between observations and predictions in a handful of values, and hence they are useful data-reduction methods, especially when ranking many GMPEs. We have decided not to rank the GMPEs based on these measures, because we believe that plots of observations and predictions, together with graphs of inter- and intra-event residuals (and the biases, standard deviations and best-fit trends computed from these residuals), provide a more complete description of where the GMPEs do and do not fit the data. Our approach is similar to that taken by, for example, Scasserra et al. (2009) for Italy.
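For reference, the log-likelihood (LLH) measure of Scherbaum et al. (2009) is the negative average log2 of the model's probability density evaluated at the observations. The sketch below assumes each ln(observation) is normally distributed about the model's ln(median) with a single total standard deviation; the input values are hypothetical, and, as stated above, this measure is not used to rank the GMPEs in this study.

```python
import math


def llh(ln_obs, ln_median, sigma):
    """Average sample log-likelihood (LLH), in bits; lower means a better fit.

    Assumes ln(observation) ~ N(ln(median), sigma^2), sigma in natural-log units.
    The log-density is computed directly to avoid underflow for large misfits.
    """
    total = 0.0
    for x, mu in zip(ln_obs, ln_median):
        z = (x - mu) / sigma
        ln_pdf = -0.5 * z * z - math.log(sigma * math.sqrt(2.0 * math.pi))
        total += ln_pdf / math.log(2.0)  # convert natural log to log2
    return -total / len(ln_obs)


# Hypothetical ln(PGA) observations and one model's ln(median) predictions.
obs = [-2.1, -2.8, -3.5, -1.9]
pred = [-2.3, -2.6, -3.9, -2.2]
print(round(llh(obs, pred, 0.7), 3))
```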

3.1 Database of ground-motion records

All instrumental data used in this study are provided by the Strong Motion Observation Centre within the Institute of Engineering Mechanics of the China Earthquake Administration (hereafter SMOC). The official data that are available from SMOC start in 2007 and currently end in 2017. The database contains source parameters (epicentre locations, hypocentral depths and magnitudes), station parameters (station locations and basic site conditions) and ground-motion values (PGA and PSAs for various spectral periods). The records are from 946 earthquakes that occurred in the Sichuan–Yunnan area between January 2007 and December 2017.

The magnitudes provided were on either the surface-wave or the local scale, so, for consistency with the magnitude scale used in most of the candidate GMPEs, moment magnitudes \(M_{w}\) were collected from the Global Centroid-Moment-Tensor (GCMT) catalogue when available, or otherwise from the \(M_{w}\)-based earthquake catalogue for China compiled by Cheng et al. (2017). Based on three Chinese earthquake catalogues and three global \(M_{w}\) catalogues (GCMT, ISC-GEM and NEIC), Cheng et al. (2017) compared events with \(M_{S}\) in Chinese earthquake catalogues with \(M_{w}\) in global catalogues and then derived \(M_{S}\)–\(M_{w}\) regression relationships using a constrained general orthogonal regression method. These relationships were used to estimate \(M_{w}\) for the earthquakes without an entry in the GCMT catalogue.
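Orthogonal-type regression is used for magnitude conversions because both magnitudes carry measurement error, so ordinary least squares on \(M_{S}\) would bias the slope. The sketch below shows the closed-form slope of unconstrained general orthogonal regression (equivalently, Deming regression), with `eta` the assumed ratio of error variances in \(M_{w}\) and \(M_{S}\). It is not the constrained method of Cheng et al. (2017), and the magnitude pairs are hypothetical; their published coefficients are not reproduced here.

```python
import math


def general_orthogonal_regression(x, y, eta=1.0):
    """Fit y = a + b*x allowing errors in both x and y.

    `eta` is the assumed ratio of the error variance in y to that in x;
    eta = 1 gives ordinary orthogonal regression. Requires non-zero covariance.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x) / n
    syy = sum((yi - my) ** 2 for yi in y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / n
    diff = syy - eta * sxx
    b = (diff + math.sqrt(diff ** 2 + 4.0 * eta * sxy ** 2)) / (2.0 * sxy)
    a = my - b * mx
    return a, b


# Hypothetical (M_S, M_w) pairs, illustration only.
ms = [4.8, 5.2, 5.6, 6.1, 6.5, 7.0]
mw = [5.0, 5.3, 5.6, 5.9, 6.3, 6.7]
a, b = general_orthogonal_regression(ms, mw, eta=1.0)
mw_est = [a + b * m for m in ms]
```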

Considering the magnitude ranges of applicability of these GMPEs listed in Table 1, we selected ground-motion records from earthquakes of \(M_{w} \ge 4.8\). This yields 3,279 records from 71 mainshocks of \(M_{w}\) 4.8–7.9, whose locations are shown in Fig. 3. The 2008 WC earthquake generated a large aftershock sequence, including some strong earthquakes, so we have included these records in the following comparison (1,756 records from 58 aftershocks of the WC earthquake).

Fig. 3

Map showing the epicentres of earthquakes used in the comparison and the locations of recording stations. The aftershocks of the WC earthquake are not plotted due to their large quantity. Light grey and light orange polygons indicate the Sichuan and Yunnan provinces, respectively

According to the focal mechanism solutions provided by published references, 26 earthquakes are strike-slip events, two earthquakes are normal-faulting events and two earthquakes are reverse-faulting events. Details can be found in the Electronic Supplement.

For the other earthquakes, whose magnitudes are less than 5.0, focal mechanism solutions cannot be obtained either from the Global CMT catalogue or from the literature. Nevertheless, the tectonics of the Sichuan–Yunnan area can help constrain the focal mechanisms of these small earthquakes. The Indian plate is colliding with the Eurasian plate and moving towards the northeast, which forces the Tibetan plateau to continue to move in an east–south–southeast direction (Yin and Harrison 2000), whereas the relatively stable Huanan block resists this escape of the plateau as a rigid barrier (Zhang et al. 2010). In response to this strain, modern tectonics in the Sichuan–Yunnan area is dominated by translation, rotation and uplift of secondary active blocks. These blocks are bounded by a series of strike-slip faults with dip-slip components (Xu et al. 2003). Hence, the focal mechanisms of earthquakes in the Sichuan–Yunnan area are dominated by strike-slip faulting, with only a few normal-faulting and reverse-faulting events, which are distributed in the west and northeast of this area, respectively (Hu et al. 2017).

In consequence, we assume strike–slip faulting when evaluating the GMPEs for \(M_{w}\) less than 7.0. All PGAs and PSAs available at very large magnitudes are from the reverse-faulting WC earthquake of \(M_{w}\) 7.9 so the GMPEs are evaluated for reverse faulting for this event.

3.2 Distance parameters

Rupture distance (\(R_{rup}\)) had to be estimated because it is the distance measure used in most of our candidate GMPEs. Although it is difficult to obtain rupture geometries for all events in our database, they can be obtained for the larger earthquakes on well-mapped faults.

We identified 30 earthquakes with sufficient information to estimate their rupture geometries (Electronic Supplement). For these earthquakes, \(R_{rup}\) was estimated for all stations using the estimated rupture geometries and the Fault-to-Station Distance program (Boore 2019). For the other 41 earthquakes, all with \(M_{w}\) less than 5, for which rupture geometries could not be identified, hypocentral distance (\(R_{hypo}\)) was used as a proxy distance measure. Most observations from earthquakes of \(M_{w} \le 5\) were recorded at 30–200 km from the rupture plane (Fig. 4a). Given the likely small size of these ruptures (less than 5 km × 5 km), the effect of this approximation should be negligible at such distances.
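The geometry behind \(R_{rup}\) for a single planar rupture can be sketched as follows: project the station onto the fault plane, clamp the projection to the rectangle, and measure the distance to that closest point. This is not the Fault-to-Station Distance program of Boore (2019); it assumes one rectangular plane, a flat-earth Cartesian frame in km, and a dip to the right of strike. The function names and the station and fault values in the usage lines are hypothetical.

```python
import math


def rupture_distance(station, top_corner, strike_deg, dip_deg, length_km, width_km):
    """Closest distance (km) from a station to a planar rectangular rupture.

    Local flat-earth Cartesian frame: x east, y north, z depth (positive down),
    all in km. `top_corner` is the end of the top edge from which the strike
    direction points, and the plane dips to the right of strike.
    """
    phi, delta = math.radians(strike_deg), math.radians(dip_deg)
    along = (math.sin(phi), math.cos(phi), 0.0)  # unit vector along strike
    down = (math.cos(delta) * math.cos(phi),     # unit vector down dip
            -math.cos(delta) * math.sin(phi),
            math.sin(delta))
    rel = [station[k] - top_corner[k] for k in range(3)]
    # Project onto the fault plane and clamp to the rectangle's extent.
    a = min(max(sum(rel[k] * along[k] for k in range(3)), 0.0), length_km)
    b = min(max(sum(rel[k] * down[k] for k in range(3)), 0.0), width_km)
    closest = tuple(top_corner[k] + a * along[k] + b * down[k] for k in range(3))
    return math.dist(station, closest)


def hypocentral_distance(epicentral_km, depth_km):
    """Proxy used when no rupture geometry is available."""
    return math.hypot(epicentral_km, depth_km)


# Hypothetical 40 km x 20 km rupture striking N30E, dipping 60 degrees, top at 2 km.
print(round(rupture_distance((25.0, 10.0, 0.0), (0.0, 0.0, 2.0), 30.0, 60.0, 40.0, 20.0), 2))
print(round(hypocentral_distance(80.0, 10.0), 2))
```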

Fig. 4

a Magnitude-distance distribution of records, b distribution with magnitude, and c distribution with \(R_{rup}\)

Figure 4c shows the number of records against \(R_{rup}\). For all events, most records are in the distance range 30–300 km. For large earthquakes, and particularly the \(M_{w}\) 7.9 WC earthquake, records are also available from 300 to 1000 km. There are a large number of records from aftershocks of the WC earthquake, most of which were recorded at 30–200 km (the magnitudes of these aftershocks are smaller than 6.2 and mainly range from 4.8 to 5.6).

3.3 Comparisons of GMPEs to local records

Zhang et al. (2013), Li et al. (2020) and Zhang et al. (2022) included terms to model the effect of different styles of faulting; for strike-slip events, the style-of-faulting term \(f_{flt} = 0\). Cui et al. (2012) and Wang et al. (2013) did not include style-of-faulting terms. The three GMPEs of Zhang et al. (2013, 2022) and Li et al. (2020) also included hanging-wall terms. The hanging-wall effect can sometimes be observed in the near-source area (\(R \le 40\;{\text{km}}\)) at short spectral periods (e.g., Abrahamson and Somerville 1996). In the Sichuan–Yunnan area, however, most faults have strike-slip mechanisms with high dip angles, and most records are from more than 30 km from the rupture plane (Fig. 4c), so hanging-wall effects are likely to be minimal in the vast majority of cases. Hence, we did not use the hanging-wall terms. All GMPEs used \(V_{S30}\) in their site-effect term except that of Wang et al. (2013). Average \(V_{S30}\) values of 553 m/s and 353 m/s (Yu et al. 2016) are assumed for rock and soil sites, respectively.

We plotted the predicted values from the candidate GMPEs (Cui et al. 2012; Wang et al. 2013; Zhang et al. 2013, 2022; Li et al. 2020) and observed values for PGA and PSA (1 s) against rupture distance (\(R_{rup}\)) for various magnitudes. We used a magnitude bin of \(\pm 0.2\) around the central value to group the observations for each magnitude interval. For PGAs from all considered magnitudes, there is a good match between predictions and observations from rock stations at all distances, especially from 30 to 200 km (Fig. 5). Overall, the predictions of PGAs fit the observations from soil stations as well, especially for \(M_{w}\) = 5.0, 5.5 and 6.0. Furthermore, the plots for the WC aftershocks show the same pattern. For \(M_{w}\) = 6.5, the predictions for PGA are slightly lower than the observations from soil stations but this is based on limited records.
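The magnitude grouping used for these plots (a bin of ±0.2 around each central magnitude) can be sketched as below. The record format (dicts with an `"mw"` key) and the function name are hypothetical; a small tolerance guards against floating-point round-off at the bin edges.

```python
from collections import defaultdict


def bin_by_magnitude(records, centres, half_width=0.2, eps=1e-9):
    """Group records whose Mw lies within centre +/- half_width.

    `records` are dicts with an "mw" key (a hypothetical record format).
    `eps` guards against floating-point round-off at the bin edges; with
    centres 0.5 apart and half_width = 0.2, the bins do not overlap.
    """
    bins = defaultdict(list)
    for rec in records:
        for centre in centres:
            if abs(rec["mw"] - centre) <= half_width + eps:
                bins[centre].append(rec)
    return dict(bins)


records = [{"mw": m} for m in (4.8, 5.1, 5.3, 5.5, 5.7, 6.3)]
groups = bin_by_magnitude(records, centres=[5.0, 5.5, 6.0])
```

Records falling between bins (here \(M_{w}\) 6.3) are simply not plotted for any central magnitude.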

Fig. 5

Plots of observations and predictions for PGAs and PSAs (1 s) against rupture distance for various magnitudes. For consistency amongst models, we show predictions for rock site conditions. According to the range of applicability of each GMPE listed in Table 2, the predictions are shown up to 200 km with solid lines and beyond 200 km with dash-dotted lines. Both Cui2012_weighted and Cui2012_unweighted are from Cui et al. (2012) but have different coefficients (this is the same for Zhang2013_FP and Zhang2013_FN)

At a period of 1 s, the predictions from the GMPEs fluctuate more, and the dispersion amongst GMPEs is wider for the smaller magnitudes (\(M_{w}\) = 5.0, 5.5 and 6.0). The predictions generally match the observations from both rock and soil stations well. For \(M_{w}\) = 6.5 and 8.0, some predictions are slightly lower than the observations from soil stations, but most under-predictions occur for records from more than 200 km. For the WC aftershocks, more observations are under-predicted, and the observations are marginally lower than those from mainshocks of the same magnitude. This finding is consistent with previous studies (e.g., Douglas and Halldórsson 2010).

The inter- and intra-event residuals are computed using the algorithm of Abrahamson and Youngs (1992). The residual analysis is restricted to data within \(R_{rup}\) = 300 km, which is the farthest distance considered in most seismic hazard analyses, leaving 2,469 records from 70 events.
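A simplified version of this partitioning can be sketched as a one-way random-effects model fitted to the total ln-residuals, r = c + eta_i + eps_ij, where c is the overall bias, eta_i the inter-event residual of event i and eps_ij the intra-event residual. The sketch below fits it by maximum likelihood with the EM algorithm; it is in the spirit of, but not a reproduction of, the Abrahamson and Youngs (1992) algorithm, and the residual values are hypothetical.

```python
import math


def partition_residuals(residuals_by_event, n_iter=1000, tol=1e-12):
    """Split total ln-residuals into a bias, inter-event and intra-event parts.

    Fits r_ij = c + eta_i + eps_ij, eta_i ~ N(0, tau^2), eps_ij ~ N(0, phi^2),
    by maximum likelihood using the EM algorithm.
    """
    events = list(residuals_by_event)
    all_r = [r for e in events for r in residuals_by_event[e]]
    n_tot = len(all_r)
    c = sum(all_r) / n_tot
    start = sum((r - c) ** 2 for r in all_r) / n_tot / 2.0
    tau2 = phi2 = max(start, 1e-12)  # split the total variance evenly to start

    def event_terms(c, tau2, phi2):
        # Posterior mean and variance of each event term given current parameters.
        terms = {}
        for e in events:
            rs = residuals_by_event[e]
            denom = len(rs) * tau2 + phi2
            terms[e] = (tau2 * sum(r - c for r in rs) / denom, tau2 * phi2 / denom)
        return terms

    for _ in range(n_iter):
        terms = event_terms(c, tau2, phi2)  # E-step
        # M-step: bias, then the two variance components.
        c_new = sum(r - terms[e][0] for e in events for r in residuals_by_event[e]) / n_tot
        tau2_new = sum(m ** 2 + v for m, v in terms.values()) / len(events)
        phi2_new = sum((r - c_new - terms[e][0]) ** 2 + terms[e][1]
                       for e in events for r in residuals_by_event[e]) / n_tot
        change = max(abs(c_new - c), abs(tau2_new - tau2), abs(phi2_new - phi2))
        c, tau2, phi2 = c_new, tau2_new, phi2_new
        if change < tol:
            break

    terms = event_terms(c, tau2, phi2)
    inter = {e: terms[e][0] for e in events}
    intra = {e: [r - c - inter[e] for r in residuals_by_event[e]] for e in events}
    return c, math.sqrt(tau2), math.sqrt(phi2), inter, intra


# Hypothetical ln(observed / predicted) residuals for three events.
res = {"ev1": [0.5, 0.7, 0.6], "ev2": [-0.4, -0.2, -0.3], "ev3": [0.1, 0.0, 0.2]}
bias, tau, phi, inter, intra = partition_residuals(res)
```

Unlike a simple per-event mean, the event terms are shrunk towards zero for poorly recorded events, which is why a random-effects fit is preferred when the number of records per event varies widely.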

The magnitude scaling is tested by plotting inter-event residuals against magnitude. The results in Fig. 6 give mixed findings regarding the misfit between the magnitude scaling of the candidate GMPEs and local observations. The inter-event standard deviations (\(\tau\)) range from 0.5 to 0.7, which is larger than the commonly observed values of about 0.3–0.5 (e.g., Douglas and Edwards 2016). These results suggest that ground motions from earthquakes in the Sichuan–Yunnan area may be more variable than those in other regions.

Fig. 6

Variation of inter-event residuals against magnitude for PGA, PSA (0.1 s) and PSA (1 s). Black solid lines are linear fits, and black dashed lines indicate the 95% confidence intervals. Only the weighted model of Cui et al. (2012) and the FN model of Zhang et al. (2013) are presented here as their variants show similar results

To better understand the variation of inter-event residuals with magnitude, we fitted linear best-fit lines to the residuals. The slope of the line, namely the trend, indicates a mismatch between the magnitude scaling of the model and that of the data. The residuals from Cui et al. (2012) show strong trends against magnitude for PSA at 1 s, while for PGA and PSA at 0.1 s the trends are less clear. The relatively low means of the inter-event residuals (biases) and the weak trends with respect to magnitude show that this model predicts PGA and PSA (0.1 s) better than PSA (1 s). Wang et al. (2013) has high biases for PGA and PSA (0.1 s), suggesting a large misfit between predictions and observations, while the model generates good predictions for PSA (1 s). The trends for Wang et al. (2013) are weak for all presented periods, especially at 0.1 s. Zhang et al. (2013) also performs better for PSA (1 s), as indicated by the lower bias and weaker trend at this period. For PGA and PSA (0.1 s), however, the misfit between predictions and observations is obvious, as the biases are high. For Li et al. (2020), there is a weak trend for PGA, but the trends for PSA (0.1 s) and PSA (1 s) are extremely strong. Zhang et al. (2022) behaves better for PGA and PSA (1 s) than for PSA (0.1 s) but, compared with the other models, it provides relatively good predictions for PGA and PSA at both 0.1 s and 1 s. Its trends for PGA, PSA (0.1 s) and PSA (1 s) are all slight and show the same overall pattern.
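A trend of this kind is an ordinary least-squares line through the residuals, and whether it is meaningful depends on the uncertainty of its slope. The sketch below returns the intercept, slope, slope standard error and an approximate 95% interval on the slope; it uses a normal approximation instead of a Student-t quantile (optimistic for small numbers of events) and gives an interval on the slope rather than the confidence band drawn in the figures. The residual values are hypothetical; for distance scaling the same function can be applied to intra-event residuals against \(R_{rup}\) or ln(\(R_{rup}\)).

```python
import math


def residual_trend(x, y):
    """Ordinary least-squares line y = a + b*x fitted to residuals.

    Returns (a, b, se_b, (b_low, b_high)). The interval on the slope uses a
    normal approximation (b +/- 1.96 se_b) instead of a Student-t quantile.
    """
    n = len(x)
    if n < 3:
        raise ValueError("need at least three points to estimate slope uncertainty")
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    rss = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    se_b = math.sqrt(rss / (n - 2) / sxx)
    return a, b, se_b, (b - 1.96 * se_b, b + 1.96 * se_b)


# Hypothetical inter-event residuals (ln units) against Mw.
mags = [4.8, 5.1, 5.5, 5.9, 6.6, 7.9]
eta = [0.30, 0.10, 0.05, -0.10, 0.40, -0.35]
a, b, se_b, (lo, hi) = residual_trend(mags, eta)
slope_differs_from_zero = lo > 0.0 or hi < 0.0
```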

As shown in Fig. 6, most GMPEs perform better for earthquakes with magnitudes between 5.5 and 6.5 than for other magnitudes, but the predictions from Cui et al. (2012) and Zhang et al. (2022) show a clear misfit at \(M_{w}\) = 6.6 (the 2013 Lushan earthquake): the inter-event residual for this event is much greater than zero for both GMPEs. However, because few records are available in this magnitude range, it is hard to conclude whether any of the GMPEs can provide good predictions for all strong earthquakes.

The distance scaling is tested by examining the intra-event residuals versus \(R_{rup}\). As shown in Fig. 7, all models show small biases and trends in their distance scaling for PGA. Most models also show weak trends for PSA at 0.1 s and 1 s. Wang et al. (2013) and Zhang et al. (2013) show trends for PSA (0.1 s), and Cui et al. (2012) shows a relatively strong trend for PSA (1 s). Overall, Zhang et al. (2022) performs well with respect to distance for all considered spectral periods; its weak trends occur mainly at distances beyond 200 km, which is outside this model's reported range of applicability. The results also indicate that all models show weaker trends for PGA and short periods than for longer periods.
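To check whether a distance trend is driven by records outside a model's range of applicability, the trend can be refitted after discarding records beyond a cut-off distance. The following is a minimal sketch, assuming intra-event residuals in natural-log units and a linear fit against \(R_{rup}\) in km; the function name `distance_trend` is hypothetical, and the 200 km cut-off in the usage note simply mirrors the applicability limit mentioned above for Zhang et al. (2022).

```python
import numpy as np


def distance_trend(r_rup, intra, r_max=None):
    """Linear trend of intra-event residuals against rupture distance (km).

    If r_max is given, records with r_rup > r_max are excluded before fitting.
    Returns (slope per km, intercept, number of records used).
    """
    r = np.asarray(r_rup, dtype=float)
    d = np.asarray(intra, dtype=float)

    # Optionally restrict the fit to the model's range of applicability
    if r_max is not None:
        keep = r <= r_max
        r, d = r[keep], d[keep]

    # Least-squares straight line; np.polyfit returns (slope, intercept)
    slope, intercept = np.polyfit(r, d, 1)
    return float(slope), float(intercept), int(r.size)
```

Comparing `distance_trend(r, intra)` with `distance_trend(r, intra, r_max=200.0)` indicates how much of an apparent trend comes from the most distant records.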

Fig. 7

Variation of intra-event residuals against rupture distance for PGA, PSA (0.1 s) and PSA (1 s). Black solid lines are linear fits, and black dashed lines indicate the 95% confidence intervals. Green and red circles indicate records from soil and rock stations, respectively

3.4 Comparisons with other widely used and recommended GMPEs

We have also compared worldwide GMPEs with local ground-motion records. For active shallow crustal regions, three empirical models, AB2010 (Akkar and Bommer 2010), CY2008 (Chiou and Youngs 2008) and Zhao2006 (Zhao et al. 2006), were recommended for the Global Earthquake Model by Stewart et al. (2015). Chiou and Youngs (2008) was updated to Chiou and Youngs (2014) during the NGA-West2 project, and the new version is also recognized as an adaptable GMPE for active crustal regions (Bommer and Stafford 2020). Therefore, we adopted CY2014 (Chiou and Youngs 2014) rather than the 2008 version. In addition, Xiao (2011) and CEA (2019), the most widely used GMPEs in China, were also included in the comparison.

The results in Fig. 8 indicate low bias and no strong trends in the inter-event residuals, which suggests that the three worldwide GMPEs and the two widely used Chinese GMPEs provide good predictions for the local data, particularly for PGA and PSA (0.1 s). The inter-event (tau) standard deviation remains large. The bias and trend for PSA (1.0 s) are slightly larger, but still smaller than those of most of the local GMPEs.

Fig. 8

Variation of inter-event residuals against magnitude for PGA, PSA (0.1 s) and PSA (1 s) for three global GMPEs and one Chinese GMPE. Black solid lines are linear fits, and black dashed lines indicate their 95% confidence intervals. CEA (2019) is an update of Xiao (2011) and shares its functional form and its coefficients for response periods from 0 to 1.2 s. Because all periods considered in our analysis are shorter than 1.2 s, we only present the results for Xiao (2011) for the major axis; predictions for the minor axis are similar

The plot of intra-event residuals against distance shown in Fig. 9 indicates that the three worldwide GMPEs display slight trends with respect to their distance attenuation for PGA and PSA at 0.1 s and 1.0 s. The two Chinese GMPEs show strong trends, especially for PSA (1.0 s), which are worse than those observed for the local GMPEs.

Fig. 9

Variation of intra-event residuals against rupture distance for PGA, PSA (0.1 s) and PSA (1 s) for three global GMPEs and one Chinese GMPE. Black solid lines are linear fits, and black dashed lines indicate their 95% confidence intervals. Green and red circles indicate records from soil and rock stations, respectively

Perhaps surprisingly, the worldwide GMPEs are more compatible with the local data than the local GMPEs are. The NGA-West2 model (CY2014) gives the best predictions in terms of magnitude scaling, distance scaling and overall ground-motion level.

The larger inter-event standard deviations also suggest that the source characteristics of earthquakes in the Sichuan–Yunnan area are more variable than those in other regions. This could be explained by the complicated tectonic setting, which may also have led the Chinese zonation map to underestimate the seismic hazard of this area. The results indicate that using only local GMPEs may not be appropriate when estimating the seismic hazard of this area. A GMM that combines local models with robust global models, potentially composed of equally weighted predictions from the Zhang et al. (2022), Akkar and Bommer (2010), Chiou and Youngs (2014), Zhao et al. (2006) and CEA (2019) GMPEs, would be more appropriate.
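For a single scenario and spectral period, an equally weighted GMM can be summarized by the weighted mean of the branch predictions in natural-log space and by their between-model spread. This is a minimal sketch under stated assumptions: `combine_equal_weights` is a hypothetical name, the dictionary keys are hypothetical labels (e.g. "CY2014"), the ln median predictions would have to come from separate implementations of each GMPE, and in a full probabilistic seismic hazard analysis the branches are normally carried through a logic tree rather than collapsed into a single model.

```python
import numpy as np


def combine_equal_weights(ln_preds):
    """Summarize equally weighted GMPE branches for one scenario and period.

    ln_preds maps a model label to its ln median prediction.
    Returns (weighted mean ln prediction, between-model ln spread, weights).
    """
    names = list(ln_preds)
    values = np.array([ln_preds[k] for k in names], dtype=float)

    # Equal weight for every branch
    weights = np.full(values.size, 1.0 / values.size)

    # Weighted mean in ln space, i.e. a geometric mean of the medians
    ln_mean = float(np.sum(weights * values))

    # Weighted standard deviation across branches: a rough indicator of
    # epistemic (between-model) spread, not of aleatory variability
    ln_spread = float(np.sqrt(np.sum(weights * (values - ln_mean) ** 2)))

    return ln_mean, ln_spread, {k: float(w) for k, w in zip(names, weights)}
```

Averaging in log space is used here because GMPE medians and their scatter are conventionally expressed in natural-log units; the spread gives one simple view of "the range" of the GMM.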

4 Conclusions and recommendations

In this article, we evaluated local GMPEs for the Sichuan–Yunnan area based on their characteristics and by comparing them with local ground-motion data. Of the local GMPEs, that of Zhang et al. (2022) is considered the best for the area, for several reasons: a good and accessible dataset, a standard functional form that captures the main magnitude- and distance-scaling effects, and a good match to the local ground-motion data. Our results also indicate that the distance attenuation of Zhang et al. (2022) matches the observations better than that of the GMPEs widely used in China, Xiao (2011) and CEA (2019). Therefore, we recommend the Zhang et al. (2022) GMPE for use within seismic hazard analyses for engineering projects.

Although the rapidly increasing number of seismic stations in China enables the development of local GMPEs, some new local GMPEs suffer from problems such as poor dataset selection, insufficient consideration of site effects and unusual functional forms. In addition, because of the relatively short recording history, the datasets used to derive the GMPEs do not cover the entire magnitude range of interest for seismic hazard analyses, especially at large magnitudes (M > 6.5). Hence, the local GMPEs for the Sichuan–Yunnan area, and potentially for the whole of China, could be improved in several ways. For example, rather than deriving completely new models, local GMPEs could be developed by adjusting robust GMPEs from elsewhere. Our analysis using local ground-motion data shows that global GMPEs generally perform better than local models. The results also suggest that improved estimates of \(V_{S30}\) for seismic stations could help to decrease the uncertainty associated with ground-motion prediction. Finally, the results suggest that earthquake ground motions in the Sichuan–Yunnan area are more variable than those in other active crustal regions.