1 Introduction

The spatial resolution of exposure models for earthquake risk assessment is often dictated by a number of factors. Traditionally, such models feature a rather coarse resolution, with assets most often aggregated at the centroids of polygons delineating administrative units. This choice typically follows the availability of exposure data, often sourced in aggregated form from national census surveys. While such high-level data can still be spatially disaggregated using suitable proxies (e.g. population distribution or nighttime light intensity), another concern that necessitates maintaining a relatively coarse resolution is the associated computational cost of risk calculations. Such compromises have proven necessary when dealing with large and complex regional risk models (Crowley et al. 2021; Silva et al. 2020), despite improvements in processing power over the years.

The potential bias induced by exposure relocation and aggregation has attracted attention and has been sporadically investigated by different authors. Bazzurro and Park (2007) explored the effects of portfolio relocation and aggregation and found that they introduce artificial correlation, which leads to underestimation of losses at short return periods and overestimation at longer return periods. Average annual loss (AAL) estimates were found to be less sensitive. Bal et al. (2010) studied the impact of the geographical resolution of exposure data on the loss ratio estimate for individual earthquake scenarios affecting the Sea of Marmara region in Turkey. Their analysis indicated that mean loss ratio estimates are not significantly biased even when working with a resolution corresponding to the district level; however, they observed notably increased uncertainty around that estimate at coarser resolutions. Over the years, this topic continued to attract attention as researchers (DeBock and Liel 2015; Gomez-Zapata et al. 2021; Kalakonas et al. 2020; Mistry and Lombardi 2023; Pittore et al. 2020; Scheingraber and Käser 2018, 2020; Fayjaloun et al. 2021) sought to understand the effects of location uncertainty and/or exposure aggregation, and to devise strategies to minimize them. Perhaps the most comprehensive study on the effects of exposure aggregation to date is that of Dabbeek et al. (2021), who performed a large-scale sensitivity analysis in the context of the European Seismic Risk Model 2020 (ESRM20; Crowley et al. 2021), exploring the impact of exposure resolution and aggregation strategy while examining results at different scales.

Herein, we leverage a presumably complete geo-localized database of building objects in Switzerland, compiled by the Swiss Federal Office for the Environment (FOEN). We use this database as the benchmark building-by-building exposure model, along with a high-resolution site amplification model developed as part of the Earthquake Risk Model of Switzerland (ERM-CH23; Wiemer et al. 2023; Bergamo et al. 2023). We then study different aggregation choices and propose a novel procedure for model size reduction via exposure aggregation that aims to minimize the associated errors. In the following sections:

  1. we discuss the drivers of error in such exposure aggregation operations and how different model objectives might be affected;

  2. we investigate the effects of exposure aggregation over different spatial resolutions;

  3. we propose a framework to minimize errors associated with incorrect assignment of site conditions due to aggregation;

  4. we propose a framework to minimize errors associated with artificially induced correlation due to joint loss ratio realizations as a result of aggregation.

2 Sources of error and implications for model design

Exposure aggregation might introduce error in different ways (Bal et al. 2010; Bazzurro and Park 2007). For instance, the relocation of assets might move them further away from or closer to seismogenic sources capable of inducing significant or frequent losses. This type of error can become important when very coarse resolutions are used for exposure aggregation and seismicity rates change rapidly in space (e.g. when fault sources are included in the seismogenic source model). Larger errors can occur when assets are placed in locations with different site amplification conditions. The latter can vary drastically, even within short distances, and significantly affect the levels of ground motion intensity experienced by the relocated assets.

For large exposure datasets or building portfolios, errors of both types often average out. Since assets are arbitrarily relocated further from or closer to seismogenic sources and mapped to better or worse site conditions, the associated loss estimates are sometimes overestimated and other times underestimated. When the asset losses are aggregated to total exposure losses, positive and negative errors will generally cancel each other out. Of course, for this scenario to play out, the number of new locations (after aggregation) must be sufficiently large, and the total exposure value must not be overly concentrated at a handful of locations that would dominate the total loss estimates. It should be noted that even when the overall number of assets being assessed is large enough for aggregation errors to average out, the analyst is often also interested in risk estimates for subsets of the exposure or portfolio. For instance, such subsets could refer to administrative units (e.g. boroughs, postal codes, municipalities, regions, etc.), building types, or lines of business. Obtaining results at all these levels is relatively common practice, and attention should be paid to ensure that the number of assets in each subset is sufficient for convergence.

Besides any bias induced by the relocation of assets, a secondary artefact of typical aggregation practices is the introduction of an artificial correlation in the input ground motion and sampled loss of the assets being aggregated together. In other words, because all assets within the chosen aggregation unit are placed at a single location, they are all assumed to experience the same ground motion level. Moreover, assets of the same structural type at the same location are also often merged into a so-called macro-asset or macro-structure (Bazzurro and Park 2007), by treating them as a single asset with replacement cost equal to the sum of the replacement costs of the individual assets. In that case, the loss ratio given ground motion is implicitly taken as equal for all individual assets comprising the macro-asset. These two approximations generally do not affect average loss estimates (e.g. AAL), but they distort the loss distribution, reducing loss estimates associated with high probabilities (or short return periods) and increasing those associated with low probabilities (or long return periods; Bazzurro and Park 2007; Dabbeek et al. 2021).

3 Case study description

As a case study for the sensitivity checks presented later on, we use a subset of the logic tree that makes up the Earthquake Risk Model of Switzerland (ERM-CH23; Papadopoulos et al. 2023; Wiemer et al. 2023) and focus our analysis on two Swiss cantons, namely Zurich and Valais. ERM-CH23 is described in detail in Wiemer et al. (2023) and Papadopoulos et al. (2023; preprint), hence the reader is referred to these publications for further information. Although not important for the conclusions of this study, for full transparency we note that here: (1) we use only the logic tree branches of ERM-CH23 that use spectral acceleration as the intensity measure; (2) we consider only the exposure logic tree branch that is based on a random-forest-based assignment of structural typologies to each asset; and (3) the ground motion logic tree is also limited to four regionally calibrated (Edwards et al. 2016) ground motion models (Cauzzi and Faccioli 2008; Chiou and Youngs 2008; Edwards and Fäh 2013; Wiemer et al. 2016), two for the alpine tectonic regime (OpenQuake aliases: EdwardsFah2013Alpine75Bars and CauzziFaccioli2008SWISS08) and two for the foreland tectonic regime (OpenQuake aliases: EdwardsFah2013Foreland50Bars and ChiouYoungs2008SWISS01). These reductions are meant to facilitate the investigations in this paper, and while they impact the risk estimates, they are not expected to affect the conclusions vis-à-vis the relative effect of exposure aggregation.

Figure 1 shows the spatial distribution of building objects in the two cantons under consideration, as obtained from the building database used as the basis of ERM-CH23. This database, compiled by the Swiss Federal Office for the Environment (FOEN), is presumably complete and contains over 2.25 million geo-localized building entries. Canton Zurich is more densely built, with 150 buildings per km², as opposed to 28 buildings per km² for canton Valais. If the building stock is aggregated on a regular grid, we can compute the mean and coefficient of variation of replacement cost per grid cell (Fig. 1). Exposure aggregation on grids with a spatial resolution of 5 km × 5 km, 2 km × 2 km, 1 km × 1 km and 0.5 km × 0.5 km results in 189, 743, 2014 and 5131 non-empty grid cells, respectively, for canton Valais. Similarly, for canton Zurich, the number of grid cells containing buildings is 94, 484, 1743 and 5515 for the four alternative grids. We note (Fig. 1) that the coefficient of variation of replacement cost among grid cells containing buildings is generally higher in Valais, reflecting the fact that the exposure is more spatially heterogeneous (with respect to value). To investigate the effects of exposure aggregation, a first analysis is run with a non-aggregated building-by-building exposure that serves as a benchmark. The results of this analysis are contrasted with results obtained using different exposure aggregation approaches, as detailed in the sections to follow. The probabilistic risk assessment is carried out with the OpenQuake engine (Pagani et al. 2014), and errors (with respect to the benchmark case) are computed for quantities such as the average annual loss (AAL) and losses at specific return periods.
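For illustration, the per-cell statistics reported in Fig. 1 can be reproduced with a few lines of code. The following is a minimal sketch, assuming a table of assets with lon, lat and replacement_cost columns (hypothetical names, not those of the FOEN database) and a crude kilometres-to-degrees conversion:

```python
import numpy as np
import pandas as pd

def cell_stats(assets: pd.DataFrame, cell_km: float):
    """Number of non-empty cells, plus mean and CoV of replacement cost per cell."""
    deg = cell_km / 111.0  # rough km-to-degree conversion; adequate for a sketch
    cells = (assets.assign(ix=np.floor(assets["lon"] / deg),
                           iy=np.floor(assets["lat"] / deg))
                   .groupby(["ix", "iy"])["replacement_cost"].sum())
    return len(cells), cells.mean(), cells.std() / cells.mean()

# e.g. for the four grids considered: for s in (5, 2, 1, 0.5): print(cell_stats(df, s))
```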

Fig. 1

Spatial distribution of exposure within the considered cantons. The table on the bottom right reports the number of buildings and the building density in each of the two cantons. It also reports the mean (in billion CHF) and coefficient of variation (CoV) of replacement cost among grid cells (of different sizes: 5 km × 5 km, 2 km × 2 km, 1 km × 1 km, 0.5 km × 0.5 km) that contain buildings

4 Simple exposure aggregation strategies

To avoid confusion, we will henceforth use the term exposure aggregation to refer to asset relocation and/or merging into single macro-assets, as described earlier. The exposure aggregation takes place as a pre-processing step before the risk analysis. This is not to be confused with loss aggregation, by which we refer to the summation of losses pertaining to assets that belong to specific subsets, with the objective of computing loss estimates for said subsets, e.g. loss aggregation by municipality or by line of business.

Firstly, we conduct the exposure aggregation as follows (a code sketch of steps 1–3 is given after the list):

  1. a regular grid with a certain spatial resolution is defined;

  2. all assets located within a grid cell are relocated to its centroid;

  3. all relocated assets at the centroid belonging to a specific structural typology (and, if required, loss aggregation level; see Footnote 1) are merged into single macro-assets. This is done by summing up their replacement costs and treating them as a single asset;

  4. the site conditions at the cell centroid are defined according to one of three options described below.
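The first three steps amount to a snap-to-grid followed by a group-and-sum operation. Below is a minimal Python sketch under that reading, assuming a pandas table with lon, lat, typology and replacement_cost columns (illustrative names, not the actual ERM-CH23 implementation):

```python
import numpy as np
import pandas as pd

def aggregate_exposure(assets: pd.DataFrame, cell_km: float = 2.0) -> pd.DataFrame:
    """Relocate assets to grid-cell centroids and merge them into macro-assets."""
    deg = cell_km / 111.0  # rough km-to-degree conversion; adequate for a sketch
    out = assets.copy()
    # steps 1-2: snap each asset to the centroid of its grid cell
    out["lon"] = (np.floor(out["lon"] / deg) + 0.5) * deg
    out["lat"] = (np.floor(out["lat"] / deg) + 0.5) * deg
    # step 3: merge same-typology assets at the same centroid into macro-assets,
    # summing replacement costs and keeping track of how many buildings merged
    return (out.groupby(["lon", "lat", "typology"], as_index=False)
               .agg(replacement_cost=("replacement_cost", "sum"),
                    n_buildings=("typology", "size")))
```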

Note that in this study we use a regular grid for exposure aggregation; nevertheless, any polygon layer (such as administrative units) or irregular mesh can be used for this purpose. We explore four alternative spatial grid resolutions, i.e. 5 km × 5 km, 2 km × 2 km, 1 km × 1 km and 0.5 km × 0.5 km. The site conditions for ERM-CH23 involve three values for each of two different intensity measures (spectral accelerations at 0.3 s and 0.6 s), i.e. the site amplification factor AF, the site-to-site variability (φs2s), and the single-station variability (φss). These values have been computed at high resolution and are available in the form of raster maps (Bergamo et al. 2023). We explore three alternative definitions of the site conditions at the cell centroid as per point (4) above (a code sketch of options 2 and 3 follows the list):

  1. the site conditions are extracted from the raster maps at the grid cell centroid location;

  2. the site conditions are extracted from the raster maps at the original locations of all buildings in the grid cell; then, the average values of AF, φs2s, and φss are assigned as site conditions for the centroid;

  3. the site conditions are extracted from the raster maps at the original locations of all buildings in the grid cell; then, the modal values (i.e. the most frequently occurring values) of AF, φs2s, and φss are assigned as site conditions for the centroid.
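The sketch below illustrates options (2) and (3), deriving the centroid site conditions from the values already sampled at the building locations (option (1) instead samples the rasters directly at the centroid). The per-building table of the six site parameters is an assumed input, and the column layout is ours:

```python
import pandas as pd

def cell_site_conditions(bldg_site_params: pd.DataFrame, option: str) -> pd.Series:
    """bldg_site_params: one row per building, one column per site parameter
    (AF, phi_s2s, phi_ss at SA(0.3s) and SA(0.6s))."""
    if option == "mean":   # option (2): average each parameter over the buildings
        return bldg_site_params.mean()
    if option == "modal":  # option (3): most frequent value of each parameter;
        # buildings falling in the same raster pixel share identical values,
        # so a plain mode works; mode() may return ties, we keep the first row
        return bldg_site_params.mode().iloc[0]
    raise ValueError(f"unknown option: {option}")
```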

For each case, we compute errors for each municipality in the two cantons as (AAL − AALref) / AALref, where AALref denotes the reference AAL value obtained from the analysis with the building-by-building exposure. Figure 2 shows the empirical cumulative distribution functions (ECDFs) derived from the samples of individual municipality AAL errors. It is no surprise that a higher-resolution exposure achieves lower errors. When a 0.5 km × 0.5 km grid resolution is chosen, municipality AAL errors are for the most part lower than 30% (and usually lower than 15%), regardless of the choice of approximation for the site conditions. On the other hand, if a 5 km × 5 km resolution is used, the errors can be quite large and even exceed 100%. In the bottom right panel of Fig. 2, the resolution is fixed to 2 km × 2 km, and the three different site condition choices are compared. Interestingly, when the site conditions are averaged (mean), the errors appear to be the smallest in absolute terms, but strongly biased towards negative values (i.e. underestimation of AAL). As seen in the bottom right panel, the AAL errors are negative in ∼ 90% of municipalities when mean site parameters are used (2 km × 2 km resolution). The choice of modal site condition values seems to provide a more balanced result, i.e. lower errors compared to the centroid approach, but also less bias towards negative values compared to the mean approach.
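The error metric and its ECDF are straightforward to compute; a minimal sketch with hypothetical input arrays:

```python
import numpy as np

def aal_errors(aal_agg: np.ndarray, aal_ref: np.ndarray) -> np.ndarray:
    """Relative AAL error per municipality, (AAL - AALref) / AALref."""
    return (aal_agg - aal_ref) / aal_ref

def ecdf(errors: np.ndarray):
    """Return the (x, y) points of the empirical CDF of the error sample."""
    x = np.sort(errors)
    return x, np.arange(1, x.size + 1) / x.size
```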

Fig. 2

ECDFs of Zurich and Valais municipality AAL errors. The bottom right panel refers to an exposure aggregation with a 2 km × 2 km resolution and compares the different site condition definitions. The other three panels refer to the site conditions definition indicated in the plot and compare different aggregation resolutions

5 Exposure aggregation with K-means clustering of site condition parameters

As shown in the previous section, exposure aggregation at fine spatial resolutions, such as 0.5 km × 0.5 km, reduces errors to acceptable, yet still non-negligible, levels. At the same time, the finer the resolution, the smaller the gain in computational speed. To improve upon this, we first propose a strategy to reduce the bias associated with the misassignment of site condition parameters. To this end, we carry out a K-means (MacQueen 1967) clustering analysis on the site condition variables at the building locations within each grid cell. An example is shown in Fig. 3, and the procedure works as follows (a code sketch is given after the list):

  1. The six parameters to be clustered are the amplification factors and the site-to-site and single-station standard deviations for SA(0.3s) and SA(0.6s). A number of clusters Nc to be identified is first selected by the user. Then, the K-means clustering algorithm is applied to the sample of site parameters at the locations of all buildings within each cell, to identify the Nc clusters.

  2. The buildings at locations belonging to an individual cluster are grouped together. The Nc individual building clusters are then relocated to distinct adjacent locations near the cell centroid (a few meters apart, just to allow assigning different site parameters as input). If the aggregation grid is too coarse, or if the exposure is being aggregated within larger admin unit polygons, the center of mass of the exposure could be used instead of the cell centroid (Dabbeek et al. 2021). For the spatial resolutions considered here, such an approach was not deemed advantageous given that the gradient of rock hazard is not particularly steep.

  3. The site parameter values of the associated K-means cluster centroid are assigned to the location of each building cluster.

  4. For each cluster, all buildings of a given structural typology are merged into one macro-asset, by summing up their replacement costs (as in the previous section).
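A minimal sketch of the per-cell clustering step using scikit-learn follows. Standardizing the six parameters before clustering is our own assumption, not a documented choice of ERM-CH23:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

def cluster_cell_sites(site_params: np.ndarray, n_clusters: int = 5):
    """site_params: (n_buildings, 6) array of AF, phi_s2s and phi_ss
    at SA(0.3s) and SA(0.6s), sampled at the original building locations."""
    scaler = StandardScaler()
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0)
    # label each building with the cluster its site conditions belong to
    labels = km.fit_predict(scaler.fit_transform(site_params))
    # back-transform the cluster centroids to physical units; these are the
    # site parameter values assigned to each relocated building cluster
    centroids = scaler.inverse_transform(km.cluster_centers_)
    return labels, centroids
```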

Fig. 3

K-means clustering approach for minimizing errors pertaining to the mismatch of site conditions. Left: example of a 2 km × 2 km grid cell; the black dots represent buildings, whereas the color scale indicates the variation of each of the six site parameters from higher (blue) to lower (red) values. Right: visualization of K-means clusters for the example grid cell. The colored dots represent five cluster centroids

This procedure minimizes (depending on the number of selected clusters) the distance between the site conditions at the original building locations and the site conditions assigned after aggregation. This strategy works regardless of the size of the grid cell (although larger grid cells might feature a larger range of site conditions, hence requiring more clusters). It enables the use of significantly coarser resolutions since, unlike the site conditions, which can change drastically within short distances, the gradient of rock hazard is typically smoother. Therefore, relocating assets by a couple of kilometers typically has a minor impact on the hazard values and risk estimates. In the following, we will refer to this type of exposure aggregation as KMi, where i denotes the number of clusters used.

6 Adjustment of loss ratio uncertainty

Besides the error in site conditions discussed previously, exposure aggregation also induces an artificial correlation of the loss ratio across buildings, as a result of the common ground motion input in each grid cell and the common loss ratio (given ground motion) simulation for each macro-asset. Here, we also take action to mitigate the second of these two effects, by reducing the uncertainty in the vulnerability functions to account for the fact that they refer to macro-assets rather than single buildings. Vulnerability functions provide the loss ratio (LR) as a function of the ground motion intensity measure (IM). Usually, for a given IM level k, the loss ratio is provided as a distribution f(LR | IM = k), e.g. as illustrated in Fig. 4. This distribution refers to the loss ratio experienced by a single building. In the case of a macro-asset, its loss ratio is the sum of the individual asset losses (each of which is a random variable), divided by the sum of the individual replacement costs, as given by Eq. (1):

$$ LR_{m} \mid IM{=}k = \frac{\sum_{i=1}^{n} L_{i} \mid IM{=}k}{\sum_{i=1}^{n} RC_{i}} $$
(1)

where LRm refers to the loss ratio of the macro-asset, Li to the loss experienced by asset i, and RCi to the replacement cost of asset i. If losses (given intensity k) are perfectly correlated, then the original single-building loss ratio distribution can be maintained. Otherwise, the loss ratio variance of a macro-asset comprised of n individual assets (all belonging to the same building class, therefore featuring identically distributed loss ratios given intensity) is given as:

$$ \operatorname{Var}\left(LR_{m} \mid IM{=}k\right) = \operatorname{Var}\left(\frac{\sum_{i=1}^{n} L_{i} \mid IM{=}k}{\sum_{i=1}^{n} RC_{i}}\right) = \frac{1}{\left(\sum_{i=1}^{n} RC_{i}\right)^{2}} \cdot \operatorname{Var}\left(\sum_{i=1}^{n} L_{i} \mid IM{=}k\right) $$
(2)

The variance of the summation of the n individual assets given intensity can be calculated as:

$$ \operatorname{Var}\left(\sum_{i=1}^{n} L_{i} \mid IM{=}k\right) = \sum_{i=1}^{n} \operatorname{Var}\left(L_{i} \mid IM{=}k\right) + 2 \sum_{i<j} \operatorname{Cov}\left(L_{i} \mid IM{=}k,\; L_{j} \mid IM{=}k\right) $$
(3)

If we assume that the losses given intensity of the n assets are uncorrelated, the second term of Eq. (3) becomes zero. Further assuming that the n assets, which have identical loss ratio distributions, also have equal replacement costs RCi = RC*, implies identical loss distributions f(Li | IM = k) = f(L* | IM = k). We can then rewrite Eq. (2) as follows:

$$ \operatorname{Var}\left(LR_{m} \mid IM{=}k\right) = \frac{n \cdot \operatorname{Var}\left(L^{*} \mid IM{=}k\right)}{\left(n \cdot RC^{*}\right)^{2}} = \frac{1}{n} \cdot \frac{\operatorname{Var}\left(L^{*} \mid IM{=}k\right)}{{RC^{*}}^{2}} = \frac{1}{n} \cdot \operatorname{Var}\left(LR^{*} \mid IM{=}k\right) $$
(4)

In reality, some correlation between assets of the same type is expected to be present. This can be easily accommodated within this framework, e.g. through Monte Carlo simulation. Herein, we assume no correlation, given that we want to compare against the benchmark building-by-building case, where no such correlation was introduced. Moreover, instead of assuming equal replacement costs, one could define a distribution of the relative replacement costs within a grid cell. However, since such a distribution is likely to vary strongly in space, the equal replacement cost approximation is adopted here for simplicity.

The macro-asset vulnerability curves can then be defined with the updated variance around their mean loss ratio if a parametric uncertainty model is used (e.g. a beta or lognormal distribution). If a non-parametric model is desired, the probability mass of loss ratios can be obtained through Monte Carlo simulation. In that case, m macro-asset loss simulations are carried out, in each of which losses are simulated for all n assets comprising the macro-asset and then summed up to obtain the total macro-asset loss. By dividing by the total replacement cost each time, one ends up with m samples of the macro-asset loss ratio, which can be used to build a non-parametric probability mass function.
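The Monte Carlo route also provides a quick numerical check of the 1/n variance scaling of Eq. (4). The sketch below assumes a beta-distributed single-building loss ratio with illustrative mean and CoV values (not taken from the ERM-CH23 vulnerability model), equal replacement costs, and uncorrelated losses:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n, m = 20, 100_000                  # buildings per macro-asset, simulations
mean_lr, cov_lr = 0.25, 0.8         # illustrative single-building LR statistics
var_lr = (cov_lr * mean_lr) ** 2
# beta distribution parameters matching the assumed mean and variance
a = mean_lr * (mean_lr * (1 - mean_lr) / var_lr - 1)
b = a * (1 - mean_lr) / mean_lr
# m simulations, each drawing n uncorrelated loss ratios; with equal
# replacement costs the macro-asset LR is simply the mean of the n draws
lr_macro = stats.beta(a, b).rvs(size=(m, n), random_state=rng).mean(axis=1)
print(lr_macro.var(), var_lr / n)   # should nearly coincide, per Eq. (4)
```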

For the case study explored here, we estimated vulnerability curves for macro-assets comprised of 1, 4, 20 and 85 buildings. These four curves were assigned to macro-assets comprising 1, 2–9, 10–39 and ≥ 40 buildings in the exposure, respectively. Figure 4 illustrates the proposed reduction of the variability around the macro-asset loss ratio, using an example building typology (M5_L of the ERM-CH23 taxonomy, i.e. low-rise unreinforced brick masonry). We will henceforth refer to this adjustment strategy as aggv.

Fig. 4

Left: schematic illustration of the proposed adjustment; Middle: vulnerability curves for macro-assets comprised of 1, 4, 20, and 85 assets of the M5_L typology. The mean vulnerability curve remains unaltered, while the uncertainty decreases with increasing number of assets being combined; Right: coefficient of variation of the loss ratio given intensity level for each of the four macro-asset cases

7 Results and discussion

Figure 5 below shows municipality AAL errors obtained with the best performing (in terms of precision) case among the simple exposure aggregation strategies described earlier, i.e. the 0.5 km × 0.5 km aggregation using modal site parameters. These errors are contrasted with three other choices, all on a much coarser 2 km × 2 km grid: (a) KM5 + aggv, (b) KM5, and (c) KM3 + aggv. One can see that the proposed strategies drastically reduce the AAL errors. The aggv adjustment does not appear to have much of an effect on AAL errors (but it does on loss exceedance curves, as will be shown later on), while choosing five K-means clusters over three leads to only minor improvement. In Fig. 6, the AAL errors are also presented spatially. The 2 km × 2 km KM5 + aggv case results in only four municipalities with AAL errors > 10% (all of them < 20%), as opposed to the simpler aggregation approach, where errors can locally exceed even 100%.

Fig. 5

Left: histogram of Zurich and Valais municipality AAL errors under two different aggregation strategies; Right: ECDFs of Zurich and Valais municipality AAL errors, similar to Fig. 2, for four different aggregation strategies

Fig. 6

Choropleth maps of municipality AAL errors [%] for two aggregation strategies; top: 0.5 km × 0.5 km modal, bottom: 2 km × 2 km KM5 + aggv

The above analysis highlights the superiority of the proposed exposure aggregation strategy when looking at AAL results at the municipality level. Of course, the errors are expected to be smaller at larger scales for all cases. Figure 7 compares AAL and loss exceedance curve errors at the cantonal level (for our two testbeds of canton Zurich and canton Valais). A first observation is that the AAL errors (inset panels) are relatively small in all cases. This is in line with expectations and previous studies (e.g. Bazzurro and Park 2007; Dabbeek et al. 2021). That said, a notable bias is observed when mean site parameters are used. This agrees with our previous finding that the municipality AAL errors were biased towards negative values when mean site amplification was assigned (the negative municipality errors compound to introduce bias in the cantonal AAL, as opposed to the other cases, where the unbiased errors tend to average out). Regarding the loss exceedance curves, the use of mean site parameters leads to significant bias, whereas the other three cases tested appear to be within acceptable precision. Note that the large errors seen at the shortest return periods (10–50 years for Switzerland) are not concerning, as they refer to tiny loss estimates (where small absolute differences manifest as large percentage deviations). They are still presented here for transparency and to demonstrate the trend.

Fig. 7

AAL and loss exceedance curve errors for canton Zurich (left) and canton Valais (right) for different exposure aggregation strategies

At smaller scales, such as the municipality level in our case study, aggregation is known to distort the shape of loss exceedance curves through the unintended introduction of correlation in the input ground motion and loss ratio realizations between assets. In Fig. 8, we report the obtained loss errors by return period. The black dots in the boxplots denote the errors of individual municipalities. In the top left panel, which refers to the 0.5 km × 0.5 km modal site parameters aggregation case, we observe the expected trend of underestimation at short return periods that gradually moves towards overestimation at longer return periods. The crossover to actual overestimation is not reached in Fig. 7 or Fig. 8. This is partly because of the site condition bias (as discussed earlier, mostly in the mean site parameters case, with some residual bias in the other cases as well) that shifts the curves towards the left, and partly because of the limited range of return periods computed and shown here. Had we computed losses for return periods above 2000 years, the trend towards larger (in absolute terms) errors would be expected to continue. Similar trends were found by Dabbeek et al. (2021). Moving to the 2 km × 2 km KM5 aggregation case in the top right plot, we see an improvement in the central tendency of the error, due to the correction of the site mismatch bias. However, the errors remain large and the spread around the central tendency remains significant, allowing for errors even beyond 100% for a few municipalities. When we also apply the aggv adjustment (bottom panel), we see a further significant improvement, both in the central tendency and in the spread around it. A rather disheartening observation is that, despite the notable improvements achieved with the proposed exposure aggregation strategies and the rather fine (compared to most studies) 2 km × 2 km aggregation resolution with 5 cluster points within each cell, significant errors remain in the municipality loss exceedance curves.

Fig. 8

Loss errors [%] by return period. The black dots in the boxplots denote the errors of individual municipalities. Comparison between 0.5 km × 0.5 km modal site parameters aggregation (top left), 2 km × 2 km KM5 aggregation (top right), and 2 km × 2 km KM5 + aggv aggregation (bottom)

However, at this point it is important to note a few caveats. Firstly, we have not considered spatial correlation of the intra-event ground motion residuals in our analysis (since this was computationally infeasible for the building-by-building analysis). Accounting for that correlation would have a similar effect on the loss distribution (i.e. reduction at short return periods, increase at long return periods) to what we observe as a result of the aggregation. Although it is unclear how the magnitude of the effect of ground motion spatial correlation compares to that of exposure aggregation, this suggests that the actual error is expected to be smaller. In other words, we expect that not modelling the ground motion spatial correlation when employing an aggregated exposure should counteract, to some extent, the effects of the correlation artificially added through exposure aggregation. This follows the conclusions of Bazzurro and Park (2007).

Secondly, our last analysis looked at rather small administrative units (municipalities). Some of these (often the ones where the largest errors are seen) number only a few hundred buildings (about a quarter of the municipalities analyzed have fewer than 500 buildings, the smallest one numbering only 12). At that scale, a regional risk model is in any case likely to carry many more biases than the ones studied here and should therefore be used with extreme caution. For instance, the building typology of each individual asset in the exposure is usually unknown and is assigned following regional statistics. Likewise, the vulnerability curves associated with each building typology are meant to represent the mean behavior of the building class, with individual buildings likely to diverge significantly from it. If the typology statistics and building class vulnerability curves are unbiased representations of the portfolio, these issues are mitigated when the portfolios analyzed are sufficiently large and the biases associated with individual assets average out. However, for limited portfolios or exposure datasets (such as small municipalities), one is likely to reach the limits of capability and credibility of most seismic risk models.

8 Computational considerations

Given that the sole objective of exposure aggregation is to ensure tractability of the analysis, we lastly report on the computational gains (in terms of runtime and output size) achieved through the various strategies explored herein. We emphasize that these gains should be viewed as indicative only, since they depend on a multitude of factors, such as the specifications of the computational infrastructure, the software and underlying methodology used to run the analysis, various analysis configuration settings (e.g. number of simulations, datasets to be saved in the output, etc.), the size of the exposure, and the model itself. The analyses presented in this article were carried out with the OpenQuake engine v14 on a 256-core (AMD EPYC 7742) server with 2 TB of DDR4 RAM (3200 MHz). In each case, we measured the time required to complete the analysis, as well as the size of the HDF5 datastore file created by OpenQuake. The event-based risk analysis on the aforementioned server using the building-by-building exposure took 8.7 h for canton Zurich and 4.4 h for canton Valais. The generated HDF5 files for the two cantons occupied 29.5 and 28.9 GB, respectively. These demands highlight the need to seek performance optimizations, especially considering that the analyses presented in this study used only a small subset of the overall ERM-CH23 logic tree. Table 1 below shows the overall gains for each aggregation case, using the benchmark building-by-building analysis as the reference. The 2 km × 2 km KM5 + aggv case that was eventually selected and used for ERM-CH23 allows a 78- and 32-fold runtime reduction in the analyses performed for the cantons of Zurich and Valais, respectively. Similarly, the output size was reduced by 45 and 31 times, respectively. The larger improvements for canton Zurich are a result of its higher building density, which makes aggregation more effective in reducing the size of the exposure. Although, as stressed earlier, the gains reported in Table 1 are not necessarily generalizable to every calculation, the huge improvement in computational load is evident.

Table 1 Computational gains achieved through different aggregation strategies

9 Conclusions

This paper discusses the effects of exposure aggregation in regional earthquake risk analyses and explores strategies for their mitigation. Taking advantage of an extensive geo-localized database of building objects, assembled by the Swiss Federal Office for the Environment (FOEN), we compared risk estimates obtained using different aggregation approaches to the building-by-building benchmark. In line with previous studies, we observed significant errors in terms of AAL at small scales (municipalities), which are largely averaged out when looking at larger exposure datasets (cantons). We further looked at different ways of assigning site condition parameters to the aggregated buildings and observed that mean parameter values led to lower absolute errors per municipality, but that these errors were skewed towards negative values, leading to a larger underestimation of AAL at the cantonal level. Modal site parameters led to less skewing of AAL errors and provided arguably more balanced results.

We then proposed two adjustments for further optimizing exposure aggregation. The first one involves the identification of site parameter clusters within the spatial aggregation geometry (square grid cells in this article) and the assignment of the cluster centroid parameters to macro-assets assigned to each cluster. The second one reduces the loss ratio given intensity level variance Var(LR|IM = k) of the vulnerability curves, depending on the number of buildings being aggregated at each location. We showed that the former adjustment corrects the bias associated with the misassignment of site conditions. On the other hand, the second adjustment reduces the distortion of loss exceedance curves that happens as a result of aggregation.

Although not explored here, further optimization of the exposure aggregation can be foreseen, e.g. by aggregating over meshes different from the regular grid we opted for in this study. For instance, an irregular triangular mesh that is denser in areas of higher building density and coarser elsewhere could be advantageous. Such a strategy would limit the number of locations to be analyzed in sparsely built areas that contribute less to the overall loss. Of course, it would also decrease precision at local scales in these areas, so the modeler would need to decide on a case-by-case basis depending on the intended application. Similarly, the aggregation mesh could take into account the spatial pattern of site parameters (e.g. amplification factors or Vs30 values), so that zones of similar site amplification are aggregated together. The latter might be somewhat redundant if the K-means clustering approach proposed here is used, although a combination of the two could perhaps achieve even better results (e.g. if site conditions are relatively uniform within a cell, fewer clusters would be needed to achieve the same accuracy).

While computational capabilities keep increasing, so do model complexity and the interest in exploring more uncertainties. Therefore, model simplifications such as exposure aggregation, which can decrease computational demands by orders of magnitude while minimizing the trade-off with precision, will continue to be crucial. The methodology and insights presented here should provide guidance to risk modelers seeking to improve the tractability of their models.