1 Introduction

Probabilistic seismic hazard models provide assessments of the annual frequency of exceedance (AFE), or probabilities of exceedance over decades, of different levels of ground-motion intensity. Hundreds, if not thousands, of such models have been derived since probabilistic seismic hazard assessment (PSHA) was pioneered in the late 1960s. National-scale versions of the models are the basis of seismic design codes and risk evaluations for insurance and crisis management purposes, whilst site-specific variants are instead used to assess the seismic safety of existing and planned critical infrastructure. Well-studied parts of the world (e.g., California, Europe, and New Zealand) have been the subject of dozens of PSHAs, all of which show differences in the assessed hazard for AFEs of engineering importance (generally 10⁻² to 10⁻⁴, but down to 10⁻⁸ for some highly sensitive infrastructure).

There are various reasons why new PSHAs are regularly undertaken at a country-wide (national seismic hazard models, NSHMs), regional or local scale. Such reasons include: to take into account the advancement in knowledge, methods and data (for instance, the discovery of new active faults [see examples in Bommer (2022)]); to overcome deficiencies in previous models, such as inappropriate probabilistic approaches or insufficient consideration of epistemic uncertainty; to update seismic building codes; and, particularly for site-specific studies, to support the design or periodic safety assessment of infrastructure (e.g. USNRC 2018). NSHMs are often routinely updated at time intervals of five to ten years (e.g. Petersen et al. 2024; Gerstenberger et al. 2022) to cope with the development of new procedures, new datasets and improved knowledge of earthquake sources and ground motions. Frequent updating makes large differences between old and new models less likely, thereby simplifying the subsequent use of the updated hazard results. In some cases, however, updating is performed less frequently due to administrative and political constraints, as well as budgetary restrictions. In these cases, new datasets, knowledge and procedures may lead to differences between new and old NSHMs that are important in terms of, for example, building code applications (Marzocchi and Meletti 2024). Conversely, a new PSHA may lead to hazard estimates similar to those from a previous PSHA; however, even if the hazard estimates do not change, conducting a new assessment could be valuable because the previous PSHA may have been flawed or poorly documented. In addition, it is important from the viewpoints of regulators and other key stakeholders (e.g. the (re)insurance industry) that new knowledge, data and approaches are used when assessing hazard and risk (e.g. Douglas and Edwards 2016).

Given the large uncertainties inherent in PSHAs, it is unsurprising that differences in the assessed hazard between different generations of the models are often vigorously debated, with some arguing that such variations are to be expected, whilst others will deem it a source of concern. What is perhaps surprising is the fact that, notwithstanding the relevance of this topic, there are only a handful of proposals in the literature discussing how these differences could be considered. The two aims of this article are thus the following: to review and test proposed approaches to quantify what is a significant or an important difference between hazard results; and to discuss what could be considered when examining these differences for various uses of the hazard models. We acknowledge, nonetheless, that regulatory agencies and other bodies have their own specific criteria for judging hazard models and that these criteria do not necessarily include a discussion of differences to previous hazard estimates.

Although the terms significant and important are often considered as synonyms, they have different meanings, at least within the context of this study. The term significant implies the application of statistical tests. Depending on the probabilistic framework adopted, the test seeks to detect if the differences are due to chance, if they describe changes that cannot be explained by epistemic uncertainties, or whether, given a set of data, one model performs remarkably better than the other. The term important has a more pragmatic meaning, suggesting that the differences (statistically significant or not) lead to consequences that may have a large economic or societal impact, or may even be unmanageable. We note that this terminology may create confusion for end users of hazard results. This is because a difference amongst results from distinct hazard models may not be significant from a statistical viewpoint, but may be important for practical applications, such as building codes, which would impose a need to design/assess structures under different seismic loads depending on the model adopted. Finally, given that the term important depends on variations in risk/engineering results, ambiguity may arise regarding whether we are evaluating the importance of the differences in hazard or risk results. While both outcomes are certainly heavily correlated, we argue that since the only source of variation in this study is the different hazard models, it is the importance of the different hazard results that is being evaluated.

It is also noted that in the calculations carried out in this work the exact values from the published hazard curves are used rather than contoured values. Contouring can mask or emphasise differences in computed hazard because sites can be grouped in the same hazard category for different models or in different categories, solely depending on how the categories were chosen. For example, a site with a mean peak ground acceleration (PGA) at the 475-year return period in Hazard Model A of 0.19 g, which subsequently changed to 0.21 g in Hazard Model B, would either be in the same category if the limits were 0.15 g, 0.25 g and 0.35 g, or different categories if the limits were 0.1 g, 0.2 g and 0.3 g. Contouring can be useful in indicating that hazard cannot be determined precisely, but as this simple example shows, it can also unduly emphasise small differences. To our knowledge, in Europe only Italy uses a map providing exact (rather than contoured) ground-motion levels in its seismic design code. In countries such as France, the UK and others only contoured values are used for design. The translation of seismic hazard models including their uncertainties to seismic zonations for design codes is not straightforward as it is no longer solely a scientific or engineering process but also involves discussions with standards organisations and policy makers. In addition, as shown by Table 3 of Douglas et al. (2013), contouring can have an impact on risk estimates (and other derived values) calculated using contoured results, for example from a seismic building code.
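The contouring example above can be checked with a short sketch. The function name and the band-numbering convention are illustrative assumptions; only the PGA values and category limits come from the text.

```python
from bisect import bisect_right

def hazard_category(pga_g, limits_g):
    """Return the index of the contour band containing pga_g.

    Band 0 lies below the first limit, band 1 between the first and
    second limits, and so on (an assumed numbering convention).
    """
    return bisect_right(limits_g, pga_g)

# PGA values (in g) from the example in the text.
pga_model_a = 0.19
pga_model_b = 0.21

# The two alternative sets of category limits from the text.
for limits in ([0.15, 0.25, 0.35], [0.10, 0.20, 0.30]):
    band_a = hazard_category(pga_model_a, limits)
    band_b = hazard_category(pga_model_b, limits)
    verdict = "same category" if band_a == band_b else "different categories"
    print(limits, verdict)
```

With the first set of limits both values share a band, whereas with the second set they straddle the 0.2 g limit, even though the underlying difference of 0.02 g is identical.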

Finally, it is underlined that in this study we discuss differences in the hazard results as expressed by hazard curves for specific structural periods and ranges of AFE, and refrain from discussing differences in uniform hazard spectra or inputs to the hazard model. These types of differences are discussed in country-specific studies, e.g. Belvaux et al. (2014) for France, Kohrangi et al. (2018) for Iran, and Kalakonas et al. (2020) for Guatemala. Similarly, Beauval and Bard (2022) present informative plots (their Figs. 4 and 5) showing how the assessed hazard for various locations in France varies depending on the study.

In the next section methods proposed in the literature to evaluate whether a difference between hazard results is significant or important are summarised. In the following section, these methods are applied to some test cases, and we highlight the different conclusions that would be reached by applying them. Next, the risk and engineering implications of differences between hazard results are discussed using various examples. The penultimate section discusses the results of these examples and describes areas for future research about how differences between hazard results could be managed. The article ends with some brief recommendations. We would like to emphasise that, although the following comparisons may appear simple, we are unaware of a similarly thorough investigation of this topic, which is of considerable interest to various end users of seismic hazard models. This investigation was motivated by ongoing discussions that we are having with end users and others in the earthquake engineering community about differences amongst seismic hazard assessments.

2 Methods proposed in the literature

It is relatively common to quantify the differences, often in percentage terms, in ground-motion levels for a given return period (or AFE) between hazard models (e.g., Belvaux et al. 2014; Tromans et al. 2019). However, the practical importance of these percentage differences in terms of applications of the hazard model, or their statistical significance in terms of how accurately hazard can be assessed given the large epistemic uncertainties, is rarely discussed. To our knowledge, only four methods to examine differences in pairs of seismic hazard estimates have been previously proposed, as discussed in this section. Figure 1 illustrates the proposals for an example pair of seismic hazard curves.

Fig. 1 Illustration of the criteria used to compare hazard estimates, using the results of ESHM13 and ESHM20 for Bucharest, Romania as an example; (a) hazard curves (AFE vs. PGA) showing the 5th to 95th percentile ranges and mean values for each hazard model; (b) differences between the models using Cohen's d-factor (Malhotra 2014, 2015) using the mean AFE values; (c) PGA distributions, with the 5th, 16th, 50th, 84th, and 95th fractiles, as well as the 25th and 75th fractiles used by Abrahamson (2017), indicated as vertical lines

2.1 Proposal of McGuire (2012)

The first proposal is by McGuire (2012), who examined the differences between the hazard assessed by different experts or teams within site-specific seismic hazard assessments away from active plate boundaries (central and eastern North America and Switzerland). He uses this information to propose a lower bound for the precision to which seismic hazard can be assessed for well-studied sites within state-of-the-art endeavours. He concludes that if an alternative assessment changes the calculated mean AFE by less than 25% for ground motions corresponding to 10⁻⁴ annual frequency of exceedance and changes the mean AFE by less than 35% for ground motions corresponding to 10⁻⁶ annual frequency of exceedance, then, in the language used by the author, the change can be deemed not significant (even though the method is not based on a formal statistical test). This proposal could be considered as the simplest of the four proposals, as it uses the mean hazard curve directly, rather than hazard fractiles or derived values. Therefore, it could be applied more easily and widely than other approaches. On the other hand, it is only applicable for low AFEs and site-specific studies, and it is based on how precise hazard assessments can be from the point of view of available knowledge, rather than the importance of differences from the point of view of end users of the hazard results.
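As a minimal sketch, the McGuire (2012) check described above could be coded as follows. The function names and the numerical AFEs are hypothetical; only the 25% and 35% thresholds come from the text, and the use of absolute values assumes that increases and decreases are treated symmetrically.

```python
def afe_change_percent(afe_old, afe_new):
    """Percentage change in mean AFE at a fixed ground-motion level."""
    return 100.0 * (afe_new - afe_old) / afe_old

def change_not_significant(change_at_1e4, change_at_1e6):
    """Apply the thresholds quoted in the text.

    change_at_1e4: % change in mean AFE at the ground motion whose
                   original AFE is 1e-4 (threshold 25%).
    change_at_1e6: % change in mean AFE at the ground motion whose
                   original AFE is 1e-6 (threshold 35%).
    """
    return abs(change_at_1e4) < 25.0 and abs(change_at_1e6) < 35.0

# Hypothetical mean AFEs from an original and an alternative assessment.
change_1e4 = afe_change_percent(1.0e-4, 1.2e-4)  # about +20%
change_1e6 = afe_change_percent(1.0e-6, 1.5e-6)  # about +50%

print(f"Change at 1e-4: {change_1e4:.0f}%")
print(f"Change at 1e-6: {change_1e6:.0f}%")
print("Not significant:", change_not_significant(change_1e4, change_1e6))
```

In this hypothetical case the change would not pass the check, because the second threshold is exceeded even though the first is met.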

2.2 Proposal of Malhotra (2014, 2015)

The second proposal is by Malhotra (2014, 2015), and is based on the effect size of Cohen (1977). This approach uses the probability density functions (PDFs) derived from the mean hazard curves of the two studies being compared. The test statistic is the Cohen effect size, d, defined as:

$$d=\frac{\mu_{1}-\mu_{2}}{\sqrt{0.5\left(\sigma_{1}^{2}+\sigma_{2}^{2}\right)}} \tag{1}$$

where µ1 and µ2 are the means and σ1 and σ2 are the standard deviations of the two PDFs of the ground-motion levels. The criterion for a large difference, adopted by Malhotra (2014, 2015) from Cohen’s original work, is d > 0.8. Again, because this approach is based on mean hazard curves it can be applied when fractiles have not been computed or published. It does, however, need numerical values for full hazard curves, which are not always published (though digitised scans could be used). Because of the shape of the PDFs derived from hazard curves, the means characterise ground-motion levels of low engineering importance. Hence, a large effect size may not mean the hazard curves are greatly different at AFEs commonly used in applications.
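The calculation behind Eq. (1) can be sketched as follows. How the PDF is obtained from a hazard curve is an assumption here: the exceedance probability is treated as a complementary cumulative distribution, the probability mass between consecutive ground-motion levels is placed at the bin midpoint, and the masses are renormalised. This discretisation is not taken from Malhotra (2014, 2015), and the hazard-curve values are hypothetical.

```python
import math

def pdf_moments(levels, poe):
    """Mean and standard deviation of the discrete PDF implied by a hazard curve.

    levels: increasing ground-motion levels (e.g. PGA in g)
    poe:    probability of exceedance at each level (decreasing)
    Assumption: mass between consecutive levels sits at the bin midpoint
    and the masses are renormalised to sum to one.
    """
    mids = [(a + b) / 2.0 for a, b in zip(levels[:-1], levels[1:])]
    mass = [p0 - p1 for p0, p1 in zip(poe[:-1], poe[1:])]
    total = sum(mass)
    weights = [m / total for m in mass]
    mean = sum(w * x for w, x in zip(weights, mids))
    var = sum(w * (x - mean) ** 2 for w, x in zip(weights, mids))
    return mean, math.sqrt(var)

def cohen_d(mu1, sigma1, mu2, sigma2):
    """Cohen's effect size as defined in Eq. (1)."""
    return (mu1 - mu2) / math.sqrt(0.5 * (sigma1 ** 2 + sigma2 ** 2))

# Hypothetical mean hazard curves from two studies (PGA in g).
levels = [0.01, 0.05, 0.1, 0.2, 0.4, 0.8]
poe_study_1 = [0.92, 0.45, 0.20, 0.06, 0.012, 0.002]
poe_study_2 = [0.90, 0.40, 0.15, 0.04, 0.008, 0.001]

mu1, sigma1 = pdf_moments(levels, poe_study_1)
mu2, sigma2 = pdf_moments(levels, poe_study_2)
d = cohen_d(mu1, sigma1, mu2, sigma2)
print(f"d = {d:.3f}, large difference: {d > 0.8}")
```

Because most of the probability mass lies at low ground-motion levels, the means computed this way sit far from the levels associated with AFEs used in design, which is the limitation noted in the text.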

2.3 Two proposals of Abrahamson (2017)

The workshop presentation by Abrahamson (2017) proposes two independent approaches to assess the robustness (although the meaning of that term is not explained by the author) of a change in the hazard between the original and updated hazard model. The change is considered robust by Abrahamson (2017), according to his first criterion, if the mean hazard of the original model for the target AFE is outside the range between the 25th and 75th ground-motion fractiles of the updated hazard model. Although this is not a formal statistical test (a point discussed later), we interpret the term robust to refer to a difference worth mentioning. This approach is simple to apply and does not require any calculations, although the 25th and 75th fractiles are not commonly computed (unlike the 5th, 16th, 50th, 84th and 95th fractiles). We are unsure why the 25th and 75th fractiles are used by Abrahamson (2017) for this criterion but it could be because they define the central portion (based on the interquartile range) of the updated epistemic uncertainty. We note also that the epistemic uncertainty of the original model is not considered, which effectively means that we are checking if the original mean hazard falls within some uncertainty bounds of its updated counterpart, but not vice-versa.
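The first criterion reduces to a simple range check, sketched below. The function name and the PGA values are hypothetical.

```python
def outside_interquartile(old_mean, new_q25, new_q75):
    """First Abrahamson (2017) criterion as described in the text.

    Returns True (change deemed robust) when the mean ground motion of the
    original model lies outside the 25th to 75th fractile range of the
    updated model at the target AFE.
    """
    return old_mean < new_q25 or old_mean > new_q75

# Hypothetical PGA values (in g) at a single target AFE.
print(outside_interquartile(old_mean=0.10, new_q25=0.12, new_q75=0.18))  # True
print(outside_interquartile(old_mean=0.15, new_q25=0.12, new_q75=0.18))  # False
```

Note that the check is asymmetric: swapping the roles of the original and updated models can change the outcome, because only the fractiles of the updated model are used.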

The second criterion can be described by the following inequality (rearranged from the original formulation):

$$\ln\left(\frac{\text{IM}_{\text{new}}}{\text{IM}_{\text{old}}}\right)-0.5\,\sigma_{haz}>0 \tag{2}$$

where IMnew and IMold are the mean ground-motion levels at the target AFE and σhaz is the standard deviation derived from the logarithms of the fractiles at the target AFE. Equation (2) is based on comparing the ratio of the mean hazard estimates to the uncertainty in the hazard estimates as implied by the fractiles from only the new hazard model, and highlighting when the ratio is larger than would be expected given this uncertainty. This means that this second criterion highlights increases, but not decreases, in seismic hazard estimates, presumably because the former potentially have a larger impact on engineering practice than the latter.
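Equation (2) can be evaluated with a few lines of code. The estimate of σhaz used below, half the difference between the logarithms of the 84th and 16th fractiles, is a simplifying assumption that treats those fractiles as roughly one standard deviation either side of the median of a lognormal distribution; the source does not specify how σhaz was fitted. All numerical values are hypothetical.

```python
import math

def sigma_haz_from_fractiles(im_16, im_84):
    """Approximate log standard deviation from the 16th and 84th fractiles.

    Assumption: lognormal epistemic uncertainty, with the two fractiles
    taken as about one standard deviation either side of the median.
    """
    return 0.5 * (math.log(im_84) - math.log(im_16))

def second_criterion(im_new, im_old, sigma_haz):
    """Inequality of Eq. (2): ln(IM_new / IM_old) - 0.5 * sigma_haz > 0."""
    return math.log(im_new / im_old) - 0.5 * sigma_haz > 0.0

# Hypothetical fractiles of the new model at the target AFE (PGA in g).
sigma_haz = sigma_haz_from_fractiles(im_16=0.10, im_84=0.25)

print(second_criterion(im_new=0.20, im_old=0.10, sigma_haz=sigma_haz))  # increase
print(second_criterion(im_new=0.10, im_old=0.20, sigma_haz=sigma_haz))  # decrease
```

For a decrease in hazard the logarithm of the ratio is negative, so the inequality can never be satisfied, which illustrates why this criterion only flags increases.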

Both approaches of Abrahamson (2017) have the advantage of explicitly considering the uncertainty in the assessed hazard through the fractiles. The second approach, however, requires the numerical values of the fractiles to have been published, as well as the undertaking of further calculations. The benefit of this second approach over the method based solely on the relative locations of the original mean and updated 25th and 75th fractiles is not clear, especially since it may give misleading results for low hazard areas. Finally, it is also unclear what conclusion should be drawn if the two approaches give contradictory results (i.e., one method indicates that the change is robust whilst the other does not); this is a scenario that will occur every time the first criterion is met at a site where the hazard estimates have decreased.

2.4 Discussion of the four proposals

The methods of Malhotra (2014, 2015) and Abrahamson (2017) are based on the differences in ground-motion levels (e.g., PGAs) for a given return period or AFE. This is similar to the recent ratio maps of PGA and spectral accelerations (at 0.5 s, 1.5 s and 3.0 s) for specific AFEs between the 2010 and 2022 National Seismic Hazard Models for New Zealand presented by Gerstenberger et al. (2022). Although such an approach does correspond to how hazard results are often used in practice within building design codes or national hazard maps (e.g., what is the PGA for a return period of 475 years?), hazard is assessed the other way around, i.e., a hazard engine gives the AFE for a given ground-motion level. This is recognised by the proposal of McGuire (2012), who uses the AFEs for his criteria, and this is also how hazard models are used within risk evaluations, through the convolution of the hazard and fragility/vulnerability curves. Therefore, it may be more appropriate, although more difficult to visualise, to use the differences in AFE at a given ground-motion level (e.g., PGA of 0.1 g), as has been done in Italy where the coefficient of variation of the hazard has been mapped (Meletti et al. 2021). Because of the steep slopes of hazard curves, differences in AFEs for a given ground-motion level can also be many times higher than differences in ground-motion level for a given AFE. Therefore, the same difference may appear greater when expressed in terms of AFEs than when expressed in terms of ground-motion levels.
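The effect of the hazard-curve slope can be illustrated with a short sketch. It assumes that both hazard curves follow the same local power law, AFE = k0 · IM^(-k), which is a common simplification but is not taken from the studies discussed here; the slope values are illustrative only. Under that assumption, a ratio r in ground-motion level at a fixed AFE corresponds to a ratio of r^k in AFE at a fixed ground-motion level.

```python
def afe_ratio_from_im_ratio(im_ratio, k):
    """AFE ratio at a fixed ground-motion level implied by a given
    ground-motion ratio at a fixed AFE, for a power-law hazard curve
    with log-log slope k (an assumed local approximation)."""
    return im_ratio ** k

# A 20% difference in ground motion, for a few illustrative slopes.
for k in (1.5, 2.0, 3.0):
    ratio = afe_ratio_from_im_ratio(1.2, k)
    print(f"k = {k}: AFE differs by {100.0 * (ratio - 1.0):.0f}%")
```

For a slope of 3, for instance, a 20% difference in ground-motion level translates into a difference of roughly 73% in AFE.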

Although the term significance is used in some of the above-mentioned papers and some of the methods suggest a statistical test, we argue that a formal test on the statistical significance of the difference between two hazard outcomes requires the establishment of a probabilistic framework (Marzocchi and Jordan 2017). For example, if we adopted the subjective interpretation of probability (Apostolakis 1990; Vick 2002), only the mean hazard curves would be considered and the fractiles would not have any meaning. The statistical test to evaluate differences would then require a set of data to calculate the Bayes factor (Kass and Raftery 1995); note that in this framework, the term significance does not refer to the significance level of a test, but to a different category of the Bayes factor [see Sect. 3.2 in Kass and Raftery (1995)]. Conversely, fractiles have a formal meaning in the unified framework proposed by Marzocchi and Jordan (2014), where a meaningful statistical comparison between two hazard outcomes requires the inclusion of the distribution of fractiles from both models. None of the above-mentioned methods complies with these basic requirements to assess a general statistical significance. Hence, the value and interpretation of these techniques is primarily heuristic, and the best method to use depends on the context.

3 Results of applying the methods to some test cases

The methods described in the previous section are herein applied to some European test cases: one site in Switzerland with results from five recent PSHAs; some example locations from two recent national hazard models for Italy; and a comparison of two European hazard models. The objectives of this section are, firstly, to illustrate the typical extent of the differences between published seismic hazard estimates and, secondly, to understand how previously proposed methods would characterise such differences. We note that due to the consideration of only PGA, as well as only a handful of locations for each example, the implications for actual sites or hazard models are limited.

3.1 Beznau (Switzerland)

The first site used to test the methods described in the previous section is Beznau, in northern Switzerland. This site is chosen as it was considered within two state-of-the-art site-specific PSHAs in 2004 (PEGASOS; NAGRA 2004) and 2013 (PEGASOS Refinement Project, PRP; swissnuclear 2013), as well as being covered by two European studies [European Seismic Hazard Model 2013, ESHM13 (Woessner et al. 2015) and European Seismic Hazard Model 2020, ESHM20 (Danciu et al. 2021, 2022)] and a national [Swiss Hazard Model, SUIhaz15 (Wiemer et al. 2015)] study. We note that the inclusion of these studies does not imply approval or confirmation of the models' validity. Regulators and other authorities have their own specific criteria for judging a model and approving or rejecting it for use. The various hazard studies, which are available in the public domain, are used here simply to illustrate the application of the described approaches. Only PGA hazard curves for the 5th, 16th, 50th (median), 84th and 95th fractiles, as well as the mean, were considered. The hazard curves for PGA from the final reports of PEGASOS and PRP were digitised, linearly interpolated in the logarithmic domain and then converted using the Poissonian assumption to probabilities of exceedance in 50 years. The hazard curves from the other three studies were obtained directly from the website of the European Facilities for Earthquake Hazard and Risk (EFEHR, Haslinger et al. 2022). These data were already in terms of probabilities of exceedance for 50 years, but they were linearly interpolated in the logarithmic domain when required. The hazard results for ‘rock’ site conditions were considered; despite the differences in the reference rock conditions for Beznau, i.e. Vs30 ~ 1800 m/s for PRP/PEGASOS, 1100 m/s for SUIhaz15, and 800 m/s for both ESHM13 and ESHM20, no attempt was made to adjust these values to a uniform definition of ‘rock’. Any differences from this aspect are likely to be minimal [i.e. about 5% difference using the conversion factors of Danciu and Fäh (2017) for PGA] and unlikely to change the conclusions. Finally, because of their common use for engineering purposes, two return periods are considered: 475 years (10% probability of exceedance in 50 years) and 2475 years (2% probability of exceedance in 50 years). We note that the two site-specific PSHAs of PEGASOS and PRP were focussed on return periods longer than 475 years, which could explain some of the differences observed between these studies and the national and European studies (ESHM13, ESHM20 and SUIhaz15).
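The two processing steps mentioned above, the Poissonian conversion between AFE and probability of exceedance in 50 years and interpolation in the logarithmic domain, can be sketched as follows. Interpolating linearly in the logarithms of both axes is an assumption about what "the logarithmic domain" means here, and the tabulated hazard curve is hypothetical.

```python
import math

def afe_to_poe(afe, years=50.0):
    """Probability of at least one exceedance in `years` (Poisson assumption)."""
    return 1.0 - math.exp(-afe * years)

def poe_to_afe(poe, years=50.0):
    """Inverse of afe_to_poe."""
    return -math.log(1.0 - poe) / years

def loglog_interp(x, xs, ys):
    """Interpolate y at x linearly in log-log space (xs must be increasing)."""
    for x0, x1, y0, y1 in zip(xs[:-1], xs[1:], ys[:-1], ys[1:]):
        if x0 <= x <= x1:
            t = (math.log(x) - math.log(x0)) / (math.log(x1) - math.log(x0))
            return math.exp(math.log(y0) + t * (math.log(y1) - math.log(y0)))
    raise ValueError("x lies outside the tabulated range")

# Return periods implied by 10% and 2% probabilities of exceedance in 50 years.
for poe in (0.10, 0.02):
    print(f"{poe:.0%} in 50 years: return period of about {1.0 / poe_to_afe(poe):.0f} years")

# Hypothetical digitised hazard curve: PGA (g) against AFE.
pga = [0.02, 0.05, 0.10, 0.20, 0.40]
afe = [2.0e-2, 4.0e-3, 8.0e-4, 1.2e-4, 1.5e-5]
afe_at_006 = loglog_interp(0.06, pga, afe)
print(f"AFE at 0.06 g: {afe_at_006:.2e}, PoE in 50 years: {afe_to_poe(afe_at_006):.3f}")
```

The conversion returns values close to, rather than exactly equal to, the conventional 475-year and 2475-year return periods, which are rounded figures.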

All five studies find that Beznau is a low-to-moderate hazard site, with mean PGAs for a return period of 475 years between about 0.04 and 0.10 g and between about 0.09 g and 0.23 g for a return period of 2475 years. The hazard results for this site from the five models are summarised in Fig. 2. One striking observation in this comparison is that the PGAs from ESHM13 are roughly twice those from the other four studies, including from ESHM20, for both AFEs. Such a large difference in ground-motion levels from one study to the next would lead to intense discussions and could trigger detailed scrutiny of the new model to understand the reasons for the large differences. For ESHM13, the large difference is likely because the hazard at Beznau is being influenced by the ground-motion model for stable continental regions (Delavaud et al. 2012), which predicts higher PGAs than the ground-motion models in the other hazard models. Figure 2 also allows the proposals of Abrahamson (2017) to be assessed; applying these to, for example, the pair of PEGASOS and PRP estimates suggests that this change is robust, as the inequality is satisfied and the old mean PGA is outside the range between the 25th and 75th fractiles. On the other hand, the change from SUIhaz15 to ESHM20 is not robust according to the same criteria. Based on a visual inspection of the spread of the hazard results, the above evaluation does appear sensible.

Fig. 2 Hazard results for Beznau (Switzerland) from five studies for PGA and return periods of 475 (left) and 2475 years (right). Crosses correspond to 5th and 95th fractiles, upward triangles to 16th and 84th fractiles, vertical lines to the 25th and 75th fractiles, squares to the 50th (median) fractile and stars to the mean PGA. “Unc” is the measure of uncertainty of the hazard results computed by Douglas et al. (2014), σhaz is the standard deviation of the lognormal distribution fitted to the fractiles and σμ is the standard deviation of the ground-motion model estimated using the approach of Douglas (2018)

Although it could be argued that a lower threshold may be more applicable for the higher AFEs considered here, we have assumed that the threshold of McGuire (2012) of ± 25% difference between AFEs at the ground-motion level corresponding to 10⁻⁴ (return period of 10,000 years) also applies for these AFEs. Considering the same pairs of hazard models cited above, and using the McGuire terminology, this criterion leads to the conclusion that PEGASOS and PRP are again significantly different, as the change in AFE is about 250% at the 475-year PGA of about 0.06 g, whereas SUIhaz15 and ESHM20 are not significantly different, as the change in AFE is only about 10%.

Estimating the means and standard deviations of the PDFs from the hazard curves that are required by the approach of Malhotra (2014, 2015) leads to the following values of d (Cohen’s effect size): 0.17 for the change between PEGASOS and PRP, and 0.049 for the change between SUIhaz15 and ESHM20. Neither of these changes is, therefore, considered large according to this criterion (it is recalled that 0.8 is the threshold). The conclusion for the pair PEGASOS and PRP differs from that of the other three proposals because the standard deviations of the PDFs are large relative to the means. The only pairs where the change in hazard is considered large are those involving ESHM13, because the differences in the means of the PDFs are large relative to the size of the standard deviations.

3.2 Italy

Quite often, the proposal of a new NSHM triggers an intense debate on its validity and the differences with results from an existing model (e.g. Sabelli 2022, 2023). For this reason, in this section we compare PGAs for a ‘rock’ site from a new NSHM for Italy (MPS19; Meletti et al. 2021) with those from the model that is currently used for the building code (MPS04; Stucchi et al. 2011). MPS04 considers the larger horizontal acceleration component instead of the geometric mean of the two horizontal components used in MPS19. Hence, for a more coherent comparison between the two sets of hazard results, we multiply the PGA values of MPS19 by a constant factor of 1.15, which represents the average ratio between the larger horizontal component and the geometric mean of the two horizontal components (Meletti and Marzocchi 2019). Here, we compare the seismic hazard for four representative cities that span most of the length of the Italian peninsula: Bologna (44.51500 N; 11.31746 E), Florence (43.76395 N; 11.27489 E), L’Aquila (42.33493 N; 13.42252 E) and Syracuse (37.06970 N; 15.30073 E).

In Table 1, we show the results obtained with the four tests described in the previous sections. Note that the McGuire (2012) test aims at comparing hazard results at very low AFEs, which are not calculated for the Italian NSHM. Therefore, here we apply this method considering return periods of 475 and 2475 years (10% and 2% probabilities of exceedance in 50 years); in particular, we compare the AFE of MPS19 corresponding to the PGA of MPS04 calculated at an AFE equal to 0.002 (return period of 475 years) and 0.0004 (return period of 2475 years). Regarding the first method proposed by Abrahamson (2017), we consider the 16th and 84th fractiles, given that they are the most commonly available.

Table 1 Results from the application of the four evaluation criteria to seismic hazard estimates given by MPS04 (Stucchi et al. 2011) and MPS19 (Meletti et al. 2021) across four Italian cities

The results illustrate how each of the four criteria highlights different aspects of the variations in the hazard estimates. The McGuire (2012) method shows differences above the proposed threshold of 25% for almost all cases considered, whilst, on the contrary, the Malhotra (2014, 2015) distance threshold is never reached. However, as mentioned above, neither method accounts for epistemic uncertainty, given that they make use of the mean hazard curves alone. The two methods proposed by Abrahamson (2017) do account for epistemic uncertainty, albeit only of MPS19, neglecting that of MPS04; this means that, as discussed in Sect. 2.3, we are checking if the mean hazard of MPS04 falls within the uncertainty bounds of MPS19, but not vice-versa. Considering the first of these two methods, L’Aquila (475-year return period only) and Syracuse are the cities where the mean hazard of MPS04 falls outside the 1-sigma range of MPS19. The application of the second Abrahamson (2017) criterion, on the other hand, serves merely to highlight, as already noted in Sect. 2.3, the sites where the hazard of MPS19 has increased with respect to MPS04, which in this case occurs for Bologna.

3.3 Europe

We extend our analysis to include five locations in the Euro-Mediterranean region: Bucharest (44.400 N; 26.0821 E, Romania), Izmit (38.400 N; 27.1821 E, Türkiye), Zagreb (45.800 N; 15.9821 E, Croatia), Lisbon (38.700 N; -9.2178 E, Portugal), and Syracuse (37.100 N; 15.2821 E, Italy), and compare ESHM13 (Woessner et al. 2015) and ESHM20 (Danciu et al. 2021). The selection of these five cities was made with the intention of considering instances where ESHM13 values exceed those of ESHM20, as well as, conversely, where ESHM20 values surpass ESHM13, together with scenarios where both models yield comparable results. Figure 3 shows a comparison of hazard curves for ESHM20 and ESHM13, including the mean and five fractiles of the former, alongside the latter’s mean hazard curves. It is evident that the ESHM13 mean values for Bucharest and Izmit are lower than the ESHM20 values, while in Lisbon and Syracuse, the ESHM13 values are higher. As for Zagreb, the mean hazard curves of both models appear similar.

Fig. 3 Hazard curves for ESHM20 and ESHM13 in Bucharest, Izmit, Zagreb, Lisbon and Syracuse. The mean and five fractiles are shown for ESHM20 and the mean values only for ESHM13

Table 2 gives the results obtained from the application of the four evaluation methods described before. The variations in the hazard estimates that can already be perceived in Fig. 3 emerge again; the differences between the two hazard models in Bucharest, Izmit, Lisbon, and Syracuse are clearly highlighted by all four criteria, whereas for Zagreb, the variations are negligible. The AFE percentage variation criterion indicates that for all cities except Zagreb the 25% threshold value proposed by McGuire (2012) is exceeded. Cohen’s effect size, d (Malhotra 2014, 2015), also confirms the non-negligible changes for all sites except Zagreb, with the difference, however, that the proposed 0.8 threshold value is reached only for Izmit. The out-of-range fractile criterion (Abrahamson 2017) of the ESHM13 PGA values with respect to ESHM20 is met for Lisbon and Syracuse, for both return periods. Conversely, the second criterion proposed by Abrahamson (2017) is met for the cases of Bucharest and Izmit, given that these are the two sites for which an increase in hazard values has occurred. It is noted, for the interested reader, that a detailed discussion on the reasons behind these variations can be found in Danciu et al. (2021, 2022).

Table 2 Results from the application of the four evaluation criteria to seismic hazard estimates given by ESHM13 (Woessner et al. 2015) and ESHM20 (Danciu et al. 2021) across five European cities

4 Importance of hazard differences for risk reduction and management

Risk reduction or management solutions can include the implementation of seismic design measures in building codes, development of retrofitting campaigns, definition of post-disaster emergency plans, or the creation of financial instruments to transfer the risk from the public sector to the (re)insurance market. Nevertheless, methods to assess the impact of changes in seismic hazard on such applications have received limited attention in the open literature. Further, and unlike what has been described previously for the seismic hazard component, there are no methods to evaluate how and when a variation in a seismic risk model (or seismic risk parameter) is to be considered significantly different. This is partially due to the paucity of risk models in the public sector (in comparison with their hazard counterparts), as well as the fact that changes in the risk results can be caused not just by the seismic hazard, but also by variations in the exposure and vulnerability components. Herein we thus explore the impact of changes in hazard on various risk metrics, and from various viewpoints. We begin by illustrating ways in which the change in hazard might be important for the public and private sectors that are designing risk management applications. Then we look at the potential impact of changes in hazard on the application of building codes.

As previously stated by Bommer (2022), we emphasise that hazard results by themselves have little practical use as the most important earthquake impacts are generally caused by damage to buildings or infrastructure (risk) and not by the ground motion (hazard) itself. Therefore, when considering differences in hazard models it is necessary to understand the implications of such differences for end users of the hazard models. This is a similar situation to the choice of the minimum magnitude limit to use within PSHA being an engineering decision related to seismic risk rather than a seismological decision related to seismic hazard (Bommer and Crowley 2017).

Finally, we note that, as the focus of this work is on highlighting the differences between hazard results and the challenges these cause, only generic and relatively simple risk and engineering calculations are considered. If the differences between two specific seismic hazard models were of interest for a particular location, then case-specific calculations could be performed. We also clarify that we are not implying that the assumed building types and metrics considered in the following are appropriate for the location (Beznau) or the specific hazard studies considered, and also that these examples are for illustrative purposes only.

4.1 Impact of hazard changes on the design of risk management applications

The incorporation of risk results, such as average annualised losses, losses for specific return periods (Silva 2018) or the impact of specific events, on risk reduction measures within the public sector has been demonstrated in several studies. For example, Dolce (2012) described how funds of 1 billion euros were distributed throughout the Italian provinces in direct proportion to the average annual losses expected in each province. The earthquake risk was calculated considering the seismic hazard model supported by the National Institute of Geophysics and Volcanology (INGV), and naturally a different distribution of risk across the country would have been obtained if another seismic hazard model had been used. Such modelling options would directly affect the available funds for each province to support risk reduction measures. Risk metrics are also used for urban planning or to assess the needs for temporary shelters in case of destructive events (e.g., Erdik and Durukal 2008; Anhorn and Khazai 2014).

To explore these applications, we calculated the expected loss ratios for three return periods (250, 475 and 2475 years) for an unreinforced masonry building. We used the seismic hazard curves for Beznau (see Fig. 2) and the vulnerability model proposed by Martins and Silva (2020). We note that we did not consider uncertainty in the vulnerability model, and thus the hazard return period and loss ratio return period are the same. We purposely used a vulnerable building class as this type of construction is common in Europe (Crowley et al. 2020) and is of particular interest within the scope of risk reduction in the public sector. We note also that we considered the 250-year return period (which is not typically provided as an output of hazard studies), since loss estimates for more frequent return periods are often needed for risk management. The median loss ratios and the 16th and 84th fractiles for the three return periods are presented in Fig. 4, along with a probability density function approximated by a Gaussian (normal) distribution in log space. As expected, the loss ratios follow the same trend observed for the hazard results presented in Fig. 2. However, we note that minor differences in the seismic hazard may lead to important differences in the risk. For example, the hazard proposed within the SUIhaz15 and ESHM20 projects for the 475-year return period differs by less than 5% (and indeed, according to the previously described variation evaluation methods, they can be deemed as not significantly different), but the loss ratios produced with these models for the same return period differ by 18%. If we consider the hazard models that produce the minimum and maximum seismic hazard (PRP and ESHM13, respectively), the maximum PGA for the 475-year return period is 2.8 times greater than the minimum PGA, while the corresponding maximum loss ratio is 19.6 times larger than the minimum loss ratio. To facilitate the comparison of the risk metrics across the five hazard models, we have included the median loss ratios for the three return periods in Table 3.
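The amplification from hazard differences to loss differences can be illustrated with a minimal sketch. It assumes a hypothetical vulnerability curve in which the mean loss ratio is a lognormal cumulative distribution function of PGA, with an arbitrary median of 0.6 g and dispersion of 0.6. This is not the Martins and Silva (2020) model, and the two PGA values are illustrative rather than read from the Beznau hazard curves.

```python
import math


def lognormal_cdf(x, median, beta):
    """CDF of a lognormal distribution with the given median and log-standard deviation."""
    return 0.5 * (1.0 + math.erf(math.log(x / median) / (beta * math.sqrt(2.0))))


def loss_ratio(pga, median=0.6, beta=0.6):
    """Hypothetical vulnerability curve: mean loss ratio as a function of PGA (g)."""
    return lognormal_cdf(pga, median, beta)


def relative_change(new, old):
    """Relative difference of `new` with respect to `old`."""
    return (new - old) / old


# Two illustrative PGA values (g) for the same return period, 5% apart.
pga_a, pga_b = 0.100, 0.105
lr_a, lr_b = loss_ratio(pga_a), loss_ratio(pga_b)

hazard_change = relative_change(pga_b, pga_a)
loss_change = relative_change(lr_b, lr_a)

print(f"hazard difference: {hazard_change:.1%}")
print(f"loss ratio difference: {loss_change:.1%}")
```

Because the lower tail of such a curve is steep, a small relative difference in PGA at low shaking levels maps to a larger relative difference in loss ratio, which is qualitatively the behaviour reported above.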

Fig. 4 Loss ratio results for Beznau (Switzerland) from five studies for PGA and return periods of 250 (left), 475 (centre) and 2475 years (right), and an unreinforced masonry building with 3 storeys. Crosses correspond to the 16th fractile, circles represent the median, and upward triangles correspond to the 84th fractile. The dashed lines represent an approximation of the probability density functions for the loss ratios

Table 3 Median loss ratios for the two building classes (unreinforced masonry and moderate code reinforced concrete buildings) considering 5 hazard models and 3 return periods

In the (re)insurance industry, there is a high level of scrutiny between different versions of seismic risk models. Some changes might imply modifications in the design of risk transfer solutions, or the adjustment of existing products. For example, for the development of the Turkish Catastrophe Insurance Pool (TCIP), a probabilistic seismic risk model was developed for the country to understand the average annualised loss for each region, as well as the expected loss for specific return periods (Bommer et al. 2002). More recently, parametric insurance products (e.g., Goda 2021) for Chile, Peru, Colombia and Mexico have been developed, which also rely on probabilistic seismic risk results to define the triggering thresholds. The criteria used by the insurance industry and regulators to decide whether a change in a seismic hazard model causes important changes in the resulting risk metrics are not publicly available, and usually depend on internal policies and the interests of the clients. From the experience of the authors working with partners from this sector, a change in the risk results of more than 10% requires a justification, while variations of more than 25% have resulted in the rejection of new risk models by national regulators.

To explore these applications, we have repeated the calculations presented previously for the unreinforced masonry building, but now considering a moderate code reinforced concrete building with 4 storeys (Martins and Silva 2020). The consideration of a different building class is prompted by two reasons: 1) vulnerable building classes are unlikely to be insured due to their high risk, and 2) it is important to understand whether the variations in the risk results are dependent on the building portfolio. The loss ratios for the three return periods are illustrated in Fig. 5 while the median loss ratios are presented in Table 3.

Fig. 5 Loss ratio results for Beznau (Switzerland) from five studies for PGA and return periods of 250 (left), 475 (centre) and 2475 years (right), and a reinforced concrete structure with 4 storeys. Crosses correspond to the 16th fractile, circles represent the median, and upward triangles correspond to the 84th fractile. The dashed lines represent an approximation of the probability density functions of the distribution of loss ratios

Although these results seem to follow a similar trend to those shown previously in Fig. 4 for the vulnerable building class, a lower impact was observed for the reinforced concrete structure. For example, the risk results produced with the PRP and ESHM13 seismic hazard previously led to differences of a factor of 19.6, while in this case the factor reduces to 12.4 (for the 475-year return period). Nonetheless, for the two cases of almost identical seismic hazard (i.e., SUIhaz15 and ESHM20), the differences in the risk metrics are above 10% for all return periods. This means that, while the previously discussed seismic hazard variation evaluation criteria would deem both models not significantly different, the differences in the risk results would most likely trigger further investigation by the catastrophe risk modelling and insurance industries.

4.2 Impact of hazard changes on the application of building codes

Previous studies have looked at the impact of changes in seismic hazard on the design of buildings. For example, Gkimprixis et al. (2021) designed a 4-storey 3-bay reinforced concrete building to different levels of design peak ground acceleration (PGA) and showed that an increase of the design PGA values by as much as 60% in areas of moderate-to-high seismic hazard (in Italy) only increased the initial cost of construction by around 2%. The change in life cycle costs (i.e., the combination of the cost of construction and losses due to damage over the life of the structure) between hazard models was seen to be slightly higher, up to 7%. The largest impact was instead seen on the risk (in terms of the average annual frequency of collapse and the average annual loss), with differences of more than two orders of magnitude, which supports the observations of the previous section that risk estimates are particularly sensitive to changes in the hazard model.

Given the limited impact that the change of hazard has been seen to have on the design and life cycle costs, in this section we focus on the impact that the change of hazard can have on existing structures, rather than new (yet to be built) structures. If changes in seismic hazard models are directly implemented as changes to seismic actions in seismic design/strengthening codes, this could lead to many existing structures suddenly becoming no longer ‘code compliant’ from a life-safety viewpoint and would require seismic retrofitting. Some of these buildings might also have only recently been retrofitted (which is likely to be the case in countries such as Italy thanks to SismaBonus, the recently promoted national seismic strengthening strategy).

We propose a four-step method to evaluate whether a change in hazard could lead to a change in life-safety code-compliance, and we argue that if this occurs, such a change is important regardless of the actual amount by which the hazard has changed.

The steps of the method are as follows:

  1. Design a given building class to modern design principles (e.g., Eurocode 8) for a number of different levels of PGA or spectral acceleration (depending on which seismic input parameter is prescribed by the design code being used), and produce collapse fragility functions for each design level.

  2. For a given location, obtain the design seismic actions from a given hazard model (typically the mean PGA with a 10% probability of exceedance in 50 years, though it could be the median, if that is prescribed in the code, or it could refer to other return periods, and it could also even account for importance factors), and select the fragility function for this design level.

  3. Calculate the average annual probability of collapse (AAPC) by convolving the mean hazard curves (from the design hazard model) with the collapse fragility function. Ensure that the target AAPC for code compliance (discussed further below) is met. It is noted that whilst typical seismic design procedures do not explicitly use AAPC as a design parameter, compliance with the latter is implicit in the calibration underlying the development of modern seismic design codes.

  4. Repeat step 3 with the same fragility function, but with hazard curves from an alternative (revised) hazard model. Check if the target AAPC for code compliance is exceeded.
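Steps 2 to 4 can be sketched numerically as follows. This is a minimal illustration under stated assumptions, not the implementation used for the results in this article: the power-law hazard curves, the fragility parameters (theta, beta) and the helper names are all hypothetical, and the sum returns an annual collapse rate, which is close to the annual probability for values this small.

```python
import math

TARGET_AAPC = 2e-4  # target for code compliance quoted in the text


def lognormal_cdf(x, theta, beta):
    """Collapse probability at intensity x for a lognormal fragility
    with median theta and dispersion beta."""
    return 0.5 * (1.0 + math.erf(math.log(x / theta) / (beta * math.sqrt(2.0))))


def aapc(im_levels, afe, theta, beta):
    """Approximate the hazard-fragility convolution: sum, over intensity bins,
    the fragility at the geometric bin midpoint times the rate of occurrence
    in that bin (difference of annual frequencies of exceedance)."""
    total = 0.0
    for i in range(len(im_levels) - 1):
        rate_in_bin = afe[i] - afe[i + 1]
        im_mid = math.sqrt(im_levels[i] * im_levels[i + 1])
        total += lognormal_cdf(im_mid, theta, beta) * rate_in_bin
    return total


def power_law_hazard(im_levels, k0, k):
    """Hypothetical hazard curve: AFE = k0 * im**(-k)."""
    return [k0 * im ** (-k) for im in im_levels]


# Hypothetical inputs (not taken from any hazard model discussed here).
im_levels = [0.01 * 1.1 ** i for i in range(70)]  # PGA in g, about 0.01 g to 7 g
design_hazard = power_law_hazard(im_levels, k0=6.6e-6, k=2.5)
revised_hazard = power_law_hazard(im_levels, k0=3 * 6.6e-6, k=2.5)
theta, beta = 0.9, 0.5  # step 2: fragility selected for the design level

aapc_design = aapc(im_levels, design_hazard, theta, beta)    # step 3
aapc_revised = aapc(im_levels, revised_hazard, theta, beta)  # step 4

for label, value in (("design", aapc_design), ("revised", aapc_revised)):
    print(f"{label} hazard: AAPC = {value:.2e}, within target = {value <= TARGET_AAPC}")
```

The fragility function is deliberately held fixed between the two calls, so any change in the computed AAPC is attributable to the hazard model alone.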

In the procedure outlined above, rather than use the mean hazard curve to compute the AAPC, the curve for a specific fractile might be used so that the AAPC with a desired degree of confidence is computed (see e.g. Abrahamson and Bommer, 2008; Kwong and Jaiswal 2023). This would effectively mean that for two locations with the same mean hazard, but with different levels of epistemic uncertainty, the AAPC with a given degree of confidence would be higher in the location associated with the higher epistemic uncertainty.
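The effect of epistemic uncertainty on a fractile-based AAPC can be illustrated with a small sketch. It assumes that an AAPC has already been computed for each logic-tree branch of the hazard model; the branch values and weights below are hypothetical, chosen so that both sites share the same weighted mean, and the function names are illustrative only.

```python
def weighted_mean(values, weights):
    """Weighted mean of the branch values."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def weighted_fractile(values, weights, fractile):
    """Smallest branch value whose cumulative normalised weight reaches the fractile."""
    total = sum(weights)
    cumulative = 0.0
    for value, weight in sorted(zip(values, weights)):
        cumulative += weight / total
        if cumulative >= fractile:
            return value
    return max(values)


# Hypothetical branch AAPCs for two sites with the same mean but different spread.
weights = [0.25, 0.50, 0.25]
site_a = [0.8e-5, 1.0e-5, 1.2e-5]  # lower epistemic uncertainty
site_b = [0.4e-5, 1.0e-5, 1.6e-5]  # higher epistemic uncertainty

for name, branches in (("A", site_a), ("B", site_b)):
    mean = weighted_mean(branches, weights)
    p84 = weighted_fractile(branches, weights, 0.84)
    print(f"site {name}: mean AAPC = {mean:.2e}, 84th fractile = {p84:.2e}")
```

With only three branches the fractile is necessarily coarse; a real logic tree would have many more branches and a smoother distribution.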

In recent years there has been an increase in studies that assess the fragility of buildings designed to modern seismic design principles for varying levels of seismic hazard, which provide useful input to the first step of the method. Examples include Gkimprixis et al. (2020) and Martins et al. (2018), who have produced fragility functions for mid-rise reinforced concrete structures designed to Eurocode 8 (CEN 2004), and Suzuki and Iervolino (2021), who present fragility functions for residential reinforced concrete and masonry buildings and industrial steel and precast concrete frames designed to the Italian Norme Tecniche delle Costruzioni (NTC) (MIT, 2018). Figure 6 shows the relationship between the parameters of the lognormal fragility functions (namely the median and dispersion, denoted as theta and beta herein) and the design peak ground acceleration obtained with the numerical models from Martins et al. (2018). It is noted that there is significant uncertainty in the relationship between design acceleration and collapse fragility, and this epistemic uncertainty can also be propagated into the computation of confidence intervals of the AAPC. Another aspect worth noting is that for low design accelerations the fragility is constant; this is because designing for gravity loads always confers to buildings a minimum of capacity against horizontal loading, which can exceed the levels implied by the design accelerations (e.g. Baltzopoulos et al. 2023).

Fig. 6 Relationship between design PGA and the theta (PGA) and beta parameters of the lognormal fragility functions from Martins et al. (2018) for several 3 to 5 storey RC frames designed to EC8 site class A

The target AAPC for code compliance can be taken as 2 × 10⁻⁴ (which can also be expressed as a 1% probability of collapse in 50 years), following the recommendations introduced in an Informative Annex of the updated Eurocode 8 (CEN, 2024), as well as in the ASCE Standard 7–10 (ASCE 2010). For Beznau, the design hazard has been obtained using the mean PGA (475-year return period) from the five hazard models presented in Fig. 2. The AAPC was then computed with the mean hazard curves from each of these studies. The results are presented in Table 4, where for a given design hazard model (say PRP, shown in the first column), the AAPC assessed according to each hazard model is shown. These results show that the target AAPC is respected in all cases (i.e., is always less than 2 × 10⁻⁴), regardless of the combination of design hazard and revised hazard. Previously it was shown that the change in hazard from PEGASOS to PRP was robust, but these results show that the increase in mean hazard levels from the PRP model to the PEGASOS model does not affect the code compliance of the building class considered herein. Hence, from an engineering perspective, this change would not be important for the existing building stock.
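The equivalence between the annual target and the 50-year probability quoted above can be checked directly, assuming a constant annual probability of collapse and independence between years:

```latex
P_{50} = 1 - \left(1 - 2 \times 10^{-4}\right)^{50} \approx 1 - e^{-0.01} \approx 0.00995 \approx 1\%
```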

Table 4 Values of AAPC for mid-rise RC buildings in Beznau designed according to five different hazard models and assessed using the same or subsequently developed hazard models

Nevertheless, it is worth noting that even if life-safety code-compliance is maintained, these results can give insight into the increase in the assessed risk due to a change in the hazard model. In the case of a PRP-designed structure which is then assessed with the ESHM20 model, the AAPC increases by a factor of 2. For the case of buildings designed to the PRP hazard and assessed with the ESHM13 model, the AAPC is seen to increase by almost an order of magnitude. This large impact of changing to the ESHM13 hazard is consistent with the findings on the significance of this hazard update, presented previously. On the other hand, for the buildings designed to the PEGASOS or SUIhaz15 hazard models, the AAPC would be almost unchanged if the hazard model were to be updated to the SUIhaz15 or ESHM20 models, respectively; this lack of importance in the change of hazard is aligned with the findings of Sect. 3.1.

The same exercise has been carried out using the European and Italian hazard curves presented in the previous section. In this case only two hazard models have been compared in each case: ESHM13 and ESHM20 for Europe, and MPS04 and MPS19 for Italy. The same example cities as before were considered, with the results presented in Tables 5 and 6.

Table 5 Values of AAPC for mid-rise RC buildings designed according to two different European hazard models and assessed using the same or subsequently developed hazard models, for five example European cities
Table 6 Values of AAPC for mid-rise RC buildings designed according to two different Italian hazard models and assessed using the same or subsequently developed hazard models, for four example Italian cities

For the European hazard, there is only one city where the hazard change from ESHM13 to ESHM20 can be deemed important, shown in bold in Table 5, as the AAPC exceeds the acceptable threshold due to the change in hazard. This would imply that buildings designed in Izmit according to the ESHM13 hazard would no longer respect the code’s underlying acceptable risk according to the latest insights given by ESHM20. This finding is perhaps not surprising, given that the design hazard PGA increased by 84% from ESHM13 to ESHM20. In all other cases the AAPC remains below the threshold, even though the AAPC doubles in Bucharest. Another interesting insight from this table is the comparison of the AAPC for buildings that are both designed and assessed with each hazard model. In Zagreb, the slight decrease in design hazard (together with the changes across the hazard curve) from ESHM13 to ESHM20 leads to an increase in the implicit risk of buildings designed to code. In Syracuse, despite a 30% decrease in the design hazard PGA level from ESHM13 to ESHM20, the implicit risk remains fairly constant (4.41 × 10⁻⁵ vs. 4.18 × 10⁻⁵).

For Italy, the change in hazard from MPS04 to MPS19 would not be deemed important in any of the locations considered, as in all cases the change in hazard does not lead to an exceedance of the threshold AAPC. In Florence the AAPC increases three-fold, but still remains below the threshold. In Syracuse, despite a 35% reduction in the design PGA from MPS04 to MPS19, the implicit risk of buildings designed to each hazard model is seen to be very similar, which implies an increase in the hazard levels at other return periods that are influential on the AAPC. In Bologna the design PGA increases by around 20%, but the implicit risk is actually seen to be lower for buildings designed and assessed with MPS19. This is also the case in L’Aquila, where similar levels of design PGA are found in both hazard models, but the implicit risk of buildings designed and assessed with the later hazard model is lower.

5 Discussion

Changes in seismic hazard estimates for a location over time can have varying implications in different contexts. These implications are discussed in this section.

The development of risk management measures often relies on risk metrics such as return period losses or average annual losses. The evaluation of the impact of variations in the seismic hazard indicated that small changes in the seismic hazard can lead to important differences in risk metrics. These findings indicate that updating seismic hazard models may render insurance products inadequate and can affect the distribution of public funds for risk reduction.

Seismic building codes imply an ‘acceptable’ level of risk for individual buildings; e.g., in the updated Eurocode 8 (CEN, 2024) the annual probability of exceeding the near collapse limit state is 2 × 10⁻⁴. For the design and construction of new buildings, as shown above, changes in the hazard can affect the assessed annual probability of collapse (or other damage states). Therefore, if society does not want the population living in these buildings to be under a higher risk than previously thought (even if the risk still falls within ‘acceptable’ thresholds), the seismic actions in the building code would need to be adapted to reflect this change (higher seismic resistance in the case of higher hazard results and lower in the case of lower hazard). This change would then affect the life cycle costs (cost of construction and losses due to damage over the life of the structure) of the building, although if the changed hazard is considered at the design stage the effect on the life cycle costs would likely be minimal.

The change in the assessed risk to the users of the building or the life cycle costs that can be considered important is a question for society (government, regulators, building owners and the general public). This is a difficult question as there is uncertainty in estimating this risk and, without employing risk-targeted seismic actions in the code (e.g., Luco et al. 2007), the risk will vary from one location to another. The risk may thus already exceed acceptable thresholds in some locations, especially those with higher levels of hazard (e.g., Silva et al. 2015; Iervolino and Pacifico 2021), which should therefore be taken into account even before considering the impact that a change to the seismic hazard could entail. The answer to this question would then help determine the importance of changes in the assessed hazard. As a means of maintaining stability in building codes and maintaining confidence of practicing engineers in the design loads, it may be advisable that reductions in levels of hazard, from one generation of models to another, are not implemented in the code for the design of new buildings but are ‘held in reserve’ for the future.

For an existing building, if the reassessed hazard leads to an indication that the annual probability of collapse (or other risk measure) has decreased, then there is nothing that needs to be done. In contrast, in the case where the assessed seismic hazard has increased from a previous generation then this may indicate that the building would need to be retrofitted to increase its capacity. If a structure has recently been constructed or retrofitted based on the previous hazard estimates, then the debate over whether additional capacity should be added could be intense. Older unmodified buildings, in contrast, may need improvements in any case due to the changes in design and construction standards from older building codes. Even without changes in the assessed ground shaking hazard, seismic building codes have evolved greatly in the past fifty years, and so what to do with buildings constructed to older codes is a common problem. Countries such as Italy have recently promoted efforts to strengthen older structures (e.g., SismaBonus). If a decision has already been made to retrofit a structure through these programmes, then any increase in the hazard would be relatively easy, and cost relatively little more, to address at the same time.

It is not our role to provide thresholds to decide whether a change to the assessed hazard is important or not but there are actions that the construction industry could take to become more resilient to changes in seismic hazard modelling. Some ideas are discussed in the following paragraphs.

Firstly, policies to mitigate the impact of changes to the assessment of hazard and structural capacity (which are also continually being updated in building codes) are needed. Examples of such measures might be an explicit code-prescription and enforcement of time windows that define how long a building can remain non-code-compliant (with the time as a function of the level of increase of risk, often proxied by a seismic capacity/demand ratio) or enforcing a more realistic (less conservative) assessment of the capacity of recently designed/retrofitted buildings. For example, linear methods to assess the capacity of structures are conservative, and a building may be perceived as less likely to be code compliant when its seismic capacity has been evaluated using a linear structural analysis method as opposed to a nonlinear method. In addition, there are also many non-structural elements that contribute to the strength of a building (e.g., infill panels) that, again for the sake of simplifying structural analyses, are often overlooked in seismic assessment, whilst they may lend additional seismic resistance to the building.

Secondly, the design ground motions imposed by the seismic building code could be made more robust to changes in the hazard model. For example, the epistemic uncertainty in the assessed hazard could be better accounted for by using a higher fractile of the ground motion rather than the expected (mean) value. This was suggested by McGuire et al. (2005), who note that designing for a higher level than implied by the mean hazard could be more cost-effective as it would avoid the cost of future retrofits in case of a revised hazard assessment. Using a higher fractile would mean that, as epistemic uncertainties could theoretically decrease with new generations of hazard models, the design ground motions would stay stable (or even decrease). As an example, in the UK, it is common to use the 84th fractile of the response spectral accelerations for an AFE of 10⁻⁴ rather than the mean (or median) as a conservative estimate of the hazard for nuclear structures (ONR, 2022). A switch to using a higher fractile would, however, likely lead to large increases in the assessed ground motions within seismic building codes. Although there would be the promise that these ground motions would decrease with time, the increases would likely lead to difficulties in the short and medium term because they may imply a need for retrofitting, as some buildings would no longer be code-compliant.

Thirdly, if the acceptable levels of risk or life cycle costs were explicitly stated as part of the building code, then conversations over whether a structure needs to be retrofitted (and consequently the importance of changes in the assessed hazard) would be easier. Methodologies that allow structural engineers to explicitly consider risk/losses in the seismic design of new buildings (e.g. Gentile and Calvi 2023) and retrofitting of existing ones (e.g. Aljawhari et al. 2022; Gentile et al. 2021) continue to progress and improve, also in terms of their usability by practitioners. In addition, proposals on risk-based calibration of seismic codes have also continued to evolve (e.g. Bommer et al. 2005; Crowley et al. 2012, 2018), and the recent advances in the application of machine learning algorithms for speedier deployment of customised risk models (e.g. Gentile and Galasso 2022) will further facilitate such a potential shift. Nonetheless, and notwithstanding all these scientific developments, we acknowledge that considerable work and societal decisions are still required before acceptable levels of risk may be explicitly defined and prescribed in building codes.

Fourthly, inspiration could be sought from the Senior Seismic Hazard Analysis Committee (SSHAC) guidelines that are often used within nuclear-related seismic hazard assessment projects, as these guidelines explicitly refer to the need for longevity and stability in hazard results between different generations of hazard models (Budnitz et al. 1997). Recent SSHAC implementation documents have provided recommendations for when hazard studies may need updating (USNRC 2018).

Lastly, we recommend a clear separation between the development of hazard models and their implementation for engineering purposes (Jordan et al. 2014), whilst at the same time ensuring that there is constant assessment of the impact that evolving hazard models may have on different end users. The inputs to a hazard model should capture our scientific understanding and our uncertainty without being constrained by what implications these inputs could have on changes in the ground motions for a given AFE and fractile. The implementation of the results of the hazard model within the seismic building code could account for the epistemic uncertainty in the results (for example, by targeting a level of risk with a given level of confidence). Methods to assess the significance of the raw hazard results from different models will necessarily be different to those that assess the importance of differences in the final hazard results presented to end users (construction industry, infrastructure owners, insurance companies and general public). The large epistemic uncertainties captured in the raw hazard results mean that different models may be compatible, and the significance of differences is small, but when the hazard models are implemented for use by different communities there may be important implications of these differences.

6 Conclusions

This article discussed the vital topic of judging whether differences between results from different probabilistic seismic hazard assessments warrant further consideration and whether they are significant and/or important in different contexts. When it comes to comparing just the hazard values, the methods previously proposed are not adequate to evaluate formally the statistical significance of the difference between two models. Because seismic hazard can only be assessed to a relatively low level of precision due to the large epistemic uncertainties that are present in such assessments, even for well-studied regions, we think it is vital to consider these uncertainties when comparing hazard results. It is noteworthy that the same uncertainties may be of great interest when evaluating the uncertainties in risk assessment and for decision makers. We would like to emphasise that two models that are not statistically significantly different in terms of the hazard values alone may actually lead to great differences when used by engineers for design, analysts when assessing risk or insurance companies when computing premiums. Different terminologies amongst groups and the ways that differences between hazard estimates are assessed can lead to lively discussions and confusion. We, therefore, advocate for the development of structured approaches to the evaluation of evolving hazard models, clearly driven by the intended applications of the models; these approaches would include meticulous analysis of methods, reference to supporting materials, comprehension of the underlying assumptions and data used in constructing the models, awareness of the uncertainties and significance of input parameters, expert judgments, and the structure of computational models. Such a structured approach would require the documentation of seismic hazard models to be highly detailed, such as models developed using the SSHAC guidelines (Budnitz et al. 1997). Regulatory authorities have their own specific criteria to judge hazard models and these should continue to be followed. We also argue that regional, national and site-specific PSHAs should continue to be updated as more data become available and science evolves, regardless of how these updates affect the hazard.

Regarding the importance of the differences in seismic hazard results, we conclude that there is no universal approach to decide whether these differences are important or not. The approach used and the criteria adopted are application-dependent. While the three risk examples provided in this study are certainly not exhaustive, we can provide the following recommendations for the cases that were previously described.

  1. 1.

    Development of risk management measures. We demonstrated that minor (and often not significant) differences in seismic hazard results can lead to large variations in the associated risk results, in particular when fragile building portfolios are considered, due to their sensitivity to changes in ground shaking for frequent return periods. In our opinion, whether the different hazard results are deemed important or not, should not be the remit of hazard modellers or even earthquake engineers. Instead, for this use case, we recommend that disaster risk managers, policy makers, and social scientists evaluate the impact on the assessed seismic losses, and decide whether these variations are likely to cause any change in their policies and measures. For example, if a new hazard model indicates higher expected seismic losses, but such impact is still considerably below the coping capacity of the country or region, then arguably such differences might be deemed not important. But again, such a conclusion cannot possibly be made by a hazard or risk analyst.

  2. Creation of insurance products. We demonstrated that, due to the typically lower vulnerability of assets that are likely to be insured, differences in the seismic hazard results might have a lower impact on the resulting losses. Nonetheless, such differences could still affect the premiums of insurance policies or the triggering criteria for CAT bonds. Because the insurance market needs to be regulated, there are clear regulations that dictate how changes in a catastrophe loss model are evaluated and justified, even if such guidelines might not be publicly available. For this use case, we recommend that differences in seismic hazard results be deemed important if they will cause a change in the insurance products or other risk transfer instruments, as defined by the rules of insurance and national regulators. In this case, insurance underwriters and data analysts must be involved in the evaluation process, and should ultimately decide whether the differences in the new hazard results are important.

  3. Design and retrofitting of structures. As previously discussed, several (modern) design regulations have indicated the level of safety to be respected for both the design of new structures and the retrofitting of existing ones. For this use case, the evaluation of the differences between hazard results is much less ambiguous and can be analytically determined by earthquake or structural engineers. We recommend that differences in seismic hazard results be deemed important if they lead to a decrease in the locally regulated level of seismic safety. For example, in Europe, if a new seismic hazard model is shown, through a non-conservative structural performance assessment method, to lead to an increase of the average annual probability of collapse above the acceptable risk prescribed in Eurocode 8, then such differences would be classified as important, given that, in light of this new information, existing buildings might be deemed to have insufficient seismic resistance.
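The check described in the third recommendation can be illustrated with a minimal sketch. It assumes a lognormal collapse fragility, a synthetic power-law hazard curve, and a placeholder acceptable rate; none of these values are taken from this study or from Eurocode 8, and the function names are hypothetical. For small values, the annual collapse rate computed here is a close approximation of the annual probability of collapse.

```python
import math
import numpy as np


def collapse_fragility(im, median, beta):
    """Lognormal fragility: P(collapse | IM = im)."""
    z = math.log(im / median) / beta
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def annual_collapse_rate(im, afe, median, beta):
    """Numerically integrate the fragility against a hazard curve.

    Sums, over the bins of the IM grid, P(collapse | IM at bin midpoint)
    times the drop in AFE across the bin. Contributions from ground
    motions outside the grid are ignored.
    """
    im = np.asarray(im, dtype=float)
    afe = np.asarray(afe, dtype=float)
    mid = np.sqrt(im[:-1] * im[1:])          # geometric bin midpoints
    frag = np.array([collapse_fragility(x, median, beta) for x in mid])
    d_afe = afe[:-1] - afe[1:]               # positive: AFE decreases with IM
    return float(np.sum(frag * d_afe))


# Synthetic inputs, for illustration only (all values are assumptions).
im_grid = np.logspace(-2, 0.5, 400)          # intensity measure, e.g. in g
afe_old = 1.0e-4 * im_grid ** -2.5           # hypothetical "old" hazard curve
afe_new = 1.2 * afe_old                      # hypothetical "new" hazard curve
ACCEPTABLE_RATE = 6.0e-4                     # placeholder, not a code value

rate_old = annual_collapse_rate(im_grid, afe_old, median=0.8, beta=0.6)
rate_new = annual_collapse_rate(im_grid, afe_new, median=0.8, beta=0.6)

print(f"old model: {rate_old:.2e}, exceeds target: {rate_old > ACCEPTABLE_RATE}")
print(f"new model: {rate_new:.2e}, exceeds target: {rate_new > ACCEPTABLE_RATE}")
```

Under this criterion, a difference between hazard models would be flagged as important only if the new rate exceeds the regulated target; the size of the hazard change by itself does not decide the outcome.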

In conclusion, there is no universal criterion for assessing differences between seismic hazard estimates; each of the available approaches has advantages and disadvantages. Although McGuire's threshold-based approach and Malhotra's effect size provide straightforward assessments, they may not fully capture the practical implications of differences in hazard estimates. Abrahamson's robustness criteria incorporate uncertainty via fractiles, which adds complexity yet may provide more insightful comparisons. The use of percentage differences in ground shaking values or AFE may provide a computationally simpler approach. The method chosen must be appropriate for the specific context of the hazard assessment and the needs of end users. This overview does not advocate a single best practice, but rather presents a variety of methodologies in the hope of inspiring and facilitating more nuanced and informed analyses in the future. We acknowledge that additional analyses are required to understand the implications of differences in hazard results for different applications. Our intention is to start a discussion around this topic and to encourage further work.
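As an illustration of the percentage-difference comparison mentioned above, the following minimal sketch computes the difference between two hazard curves in two ways: in AFE at a fixed ground-motion level, and in ground motion at a fixed AFE. The hazard curves are synthetic power laws and the function names are hypothetical; log-log interpolation is assumed as a reasonable choice, not as a prescribed method.

```python
import numpy as np


def afe_at_im(im, afe, target_im):
    """AFE at a given ground-motion level, by log-log interpolation."""
    log_afe = np.interp(np.log(target_im), np.log(im), np.log(afe))
    return float(np.exp(log_afe))


def im_at_afe(im, afe, target_afe):
    """Ground-motion level at a given AFE, by log-log interpolation."""
    # np.interp needs increasing x; AFE decreases with IM, so reverse both.
    log_afe = np.log(np.asarray(afe, dtype=float))[::-1]
    log_im = np.log(np.asarray(im, dtype=float))[::-1]
    return float(np.exp(np.interp(np.log(target_afe), log_afe, log_im)))


def percent_difference(new, old):
    """Signed percentage difference of new relative to old."""
    return 100.0 * (new - old) / old


# Synthetic hazard curves, for illustration only (all values are assumptions).
im_grid = np.logspace(-2, 0.5, 200)      # intensity measure, e.g. in g
afe_old = 1.0e-4 * im_grid ** -2.5
afe_new = 1.2e-4 * im_grid ** -2.5

diff_afe = percent_difference(afe_at_im(im_grid, afe_new, 0.4),
                              afe_at_im(im_grid, afe_old, 0.4))
diff_im = percent_difference(im_at_afe(im_grid, afe_new, 1.0e-3),
                             im_at_afe(im_grid, afe_old, 1.0e-3))

print(f"difference in AFE at IM = 0.4: {diff_afe:.1f}%")
print(f"difference in IM at AFE = 1e-3: {diff_im:.1f}%")
```

For these synthetic curves, a 20% difference in AFE corresponds to a difference of only about 7.6% in ground motion, which shows why the quantity being compared should be stated explicitly whenever percentage differences are reported.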