1 Introduction

The seismicity in Iceland is related to the Mid-Atlantic plate boundary, which roughly bisects the country from south to north. Within Iceland, the boundary shifts eastwards in the south and back westwards in the north through two complex fracture zones. The one in the south is called the South Iceland Seismic Zone and lies in the middle of the largest agricultural region in the country. The other, in the north, is called the Tjörnes Fracture Zone (Einarsson 1991). The largest earthquakes in the country have occurred within these zones, and are mostly associated with strike-slip motion at shallow depth (5–10 km). They are commonly in the surface-wave magnitude (Ms) range of 6.0–7.0. Larger earthquakes are unlikely owing to the fault mechanism, crust strength and crust thickness. This conclusion is supported by damage surveys of historical earthquakes and geological evidence. Descriptions of the effects and consequences of historical destructive earthquakes date back to the eleventh century AD, and can be found in old manuscripts and documents (Halldórsson et al. 2013; Vogfjörð et al. 2013). It should be noted that all the destructive historical earthquakes in Iceland (pre-1900) that have been assigned a magnitude value refer to the surface-wave magnitude scale, although the uncertainty in those values is large (Halldórsson et al. 2013). Only the more recent earthquakes have been assigned a moment magnitude (Mw) value, which is the more common scale today.

In the South Iceland Seismic Zone, earthquakes tend to occur in sequences. One such sequence occurred in 1784, when two earthquakes of magnitude Ms7.1 and 6.7 struck within 2 days on faults approximately 25 km apart. In 1896, five earthquakes of magnitude Ms6.9, 6.7, 6.0, 6.5 and 6.0 struck within 2 weeks, with 50 km between the westernmost and easternmost faults (average fault distance 12.5 km) (Sigbjörnsson and Rupakhety 2014). More recently, two earthquakes (Mw6.5 and 6.5) with a fault-to-fault distance of 15 km struck within 4 days of each other in June 2000.

Despite Iceland’s high seismic hazard, which is comparable to the hazard experienced in Southern Europe, its population has been historically small and the inhabited areas sparse. As a result, the consequences of major earthquakes on the island have been low on the international scale, and have received little attention. In recent times, villages have been established in locations that used to be rural, and a greater number of structures and infrastructure have been built, creating a larger exposure to losses from natural hazards. Since 2000, three earthquakes of Mw greater than 6.0 have occurred in the country. All three events caused significant damage and raised the need to develop suitable mitigation strategies.

The development of such strategies requires a better understanding of the seismic vulnerability of buildings in Iceland. In this study, the Global Earthquake Model (GEM) project (Rossetto et al. 2014) definition of vulnerability is adopted, according to which seismic vulnerability represents the expected losses (i.e., economic loss, casualties, injuries or downtime) suffered by the building inventory for a given event. Seismic vulnerability can be assessed either directly, by the use of post-earthquake loss data (Maqsood et al. 2016; Thráinsson and Sigbjörnsson 1994; Wesson et al. 2004), or indirectly, by coupling the fragility of the building inventory with damage-to-loss functions (Eleftheriadou and Karabinis 2011; Jara et al. 1992; Sun Yang et al. 1989). In this study, the seismic vulnerability of the building inventory in South Iceland is directly assessed from economic loss data collected in the aftermath of the sequence of the June 2000 earthquakes. This empirical method is preferred in the literature as it is based on the actual performance of buildings instead of simulations (Rossetto and Elnashai 2003; Silva et al. 2014) or expert judgement (ATC-13 1985; Halldórsson et al. 2013; Jaiswal et al. 2013; Vogfjörð et al. 2013). Nonetheless, the reliability of this method depends on the quality and quantity of the data. Existing studies using post-earthquake data commonly rely on large databases that focus on damaged buildings (Pitilakis et al. 2014); undamaged buildings are therefore under-represented, which introduces a bias.

In Iceland, by contrast, house insurance against natural catastrophes is compulsory. This means that, due to insurance obligations, the repair cost for each damaged building is recorded after each destructive earthquake. Therefore, this study uses a unique database which is both large and complete. The database includes information regarding the repair and replacement cost, structural characteristics and location of every single building in the area affected by the two successive June 2000 earthquakes in South Iceland (see Fig. 1). The data are obtained by merging a complete official property database with a loss database based on insurance claims. In addition to the loss data, valuable ground motion intensity data were also recorded by 20 permanent strong ground motion accelerometers distributed in the affected region, with source-to-station distances of less than 50 km (Thorarinsson et al. 2002). The stations are located in the main villages and at selected farms, bridge sites and power plants (see Fig. 1). These ground motion intensity data have been used to evaluate area-specific Ground Motion Prediction Equations (GMPE) (Ólafsson and Sigbjörnsson 2004; Rupakhety and Sigbjörnsson 2009).

Fig. 1 Map of South Iceland showing the two June 2000 South Iceland earthquakes and all residential buildings within the dotted red grid cell, which are used in the study. It also shows the locations of the strong motion accelerometer stations. The names on the map show the locations of villages, with Selfoss being the biggest one. The map is based on data from the National Land Survey of Iceland

The vulnerability surveys were carried out in the aftermath of the second event, and hence the losses recorded in the areas affected by both earthquakes are those accrued over the two events. Earthquake sequences, in which two or more major earthquakes of similar size strike within a short period, are well known in many earthquake-prone areas and, as here, damage or loss data for the events are recorded only after the sequence. This bias notwithstanding, such loss data have been used in the past to derive empirical fragility and vulnerability functions. In these cases, two alternative approaches are observed in the literature. In the first, the area which has been affected by both events is identified using recorded losses and field observations, and the data points in that area are used for the vulnerability analysis (Bessason and Bjarnason 2016). The alternative approach links the damage or loss at any location with the maximum intensity measure value from the sequence of events. The latter method can combine loss data from regions affected by significant ground motions from several events of the sequence with loss data from regions affected predominantly by a single event.

The present study develops a novel methodology based on beta regression modelling, which relies on distinguishing the regions affected by two events from the regions affected by a single event and develops vulnerability curves for each area. The proposed methodology is applied to the high-quality Icelandic database in order to assess the vulnerability of low-rise residential buildings in the South Iceland Seismic Zone. Low-rise buildings dominate the building stock in the area. In what follows, the ground motion intensity in the affected regions is presented, followed by a description of the database and the application of the methodology.

2 The two June 2000 South Iceland earthquakes

In Fig. 1, the epicentre and fault locations of the two earthquakes that affected South Iceland are depicted. The first earthquake (Mw6.5) struck on June 17, 2000, at 15:41 (GMT). It was a right-lateral strike-slip earthquake, with the fault striking in the north–south direction and an approximate focal depth of 6.3 km. Subsurface fault mapping based on the micro-earthquakes showed an approximately 12.5 km-long and 10 km-deep vertical fault rupture (N7°A) (Vogfjörð et al. 2013). The highest recorded Peak Ground Acceleration (PGA) was 0.64 g, 5.7 km from the fault. In the following, this earthquake is termed Eq. 1.

The second earthquake (Mw6.5) struck on June 21, 2000, at 00:52 (GMT). This earthquake was also a right-lateral strike-slip earthquake, with the fault striking in the north–south direction (N1°A) and with an approximate focal depth of 5.3 km. Subsurface fault mapping based on the micro-earthquakes showed an approximately 16.5 km-long and 7–9 km-deep, almost vertical fault rupture (Vogfjörð et al. 2013). The highest recorded PGA was 0.84 g, at a bridge site 3.1 km from the fault. Time histories and response spectra from these events can be found in the ISESD database (Ambraseys et al. 2002). In the following, this earthquake is termed Eq. 2.

In addition to these two earthquakes, aftershocks were felt in the area. The largest one was ML5.0, while all the others were of magnitude less than ML4.5. These are, however, ignored in the present study.

In vulnerability assessment, it is necessary to use an intensity measure type that correlates well with the observed damage. In the analytical vulnerability literature, most recent studies consider intensity measures based on spectral acceleration or spectral displacement at representative structural periods to be the most effective intensity measures for vulnerability assessment (D’Ayala and Meslem 2013). Indeed, such measures have also been used in the empirical vulnerability literature. However, unless a spectral intensity measure is evaluated at a specific structural period value (e.g. T = 0.2 s), the use of spectral values in empirical vulnerability studies introduces an as-yet un-modelled epistemic uncertainty, i.e. in the estimation of the structural period of the affected buildings. In the current study, this is further complicated by the fact that the natural period of the structures could change between the two earthquakes, i.e. the fundamental period of vibration of a building when the second earthquake hits may have changed owing to damage sustained in the first earthquake.

Hence, in the current study, as the affected buildings are low-rise and stiff, with low natural periods, the ground motion intensity is expressed in terms of the PGA, which is representative of the short-period part of a response spectrum. Since instrumental recordings are available for only a few sites, it is necessary to use a ground motion prediction equation (GMPE) to estimate the PGA levels at each location. Very few GMPEs are based on Icelandic strong motion data. From these, the GMPE of Rupakhety and Sigbjörnsson (2009) is chosen, which used the ground motion records from both events of interest to construct the equation, in the form:

$$\log_{10} (PGA) = - 1.038 + 0.387 \cdot {\text{M}}_{\text{w}} - 1.159 \cdot \log_{10} \left( \sqrt{{\text{H}}^{2} + 2.6^{2}} \right) + 0.123 \cdot S + \varepsilon \cdot 0.287 \quad \left( {\text{m/s}}^{2} \right)$$

where H is the distance to the surface trace of the fault in km and S is a site factor which takes the value 0 for rock sites and 1 for stiff soil sites. The last term is an error/scatter term where ε follows a standard normal distribution with mean equal to zero and standard deviation equal to 1, i.e. ε ~ N(0,1). Following common practice, the PGA level at a given location is estimated as the median PGA from Eq. (1), ignoring the error term. The adopted GMPE is based on both horizontal peak components from each station. Most of the strong motion recordings used in constructing Eq. (1) are from Icelandic earthquakes, but the database has been augmented by records from continental Europe and the Middle East (Rupakhety and Sigbjörnsson 2009). The main characteristic of the GMPE expressed by Eq. (1) is that it predicts a relatively high PGA in the near-fault area, whilst the attenuation with distance is stronger than generally found in well-known GMPEs of similar form. This higher attenuation with distance in Iceland compared to other seismic regions has been explained by the existence of young, fissured and low-quality rock in the seismic source area, which damps the propagating seismic waves faster than more solid rock (Ólafsson 2013; Sigbjörnsson et al. 2009).
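As an illustration of how Eq. (1) is evaluated, the following minimal Python sketch computes the PGA for a given magnitude, fault distance and site factor. The function name `median_pga` and the example inputs are assumptions made here for illustration only; the coefficients are those of Eq. (1), and setting ε = 0 returns the median.

```python
import math


def median_pga(mw, h_km, stiff_soil=False, epsilon=0.0):
    """Evaluate Eq. (1) and return the PGA in m/s^2.

    mw         -- moment magnitude
    h_km       -- distance to the surface trace of the fault (km)
    stiff_soil -- True for S = 1 (stiff soil), False for S = 0 (rock)
    epsilon    -- standard normal variate; 0 gives the median PGA
    """
    s = 1.0 if stiff_soil else 0.0
    log10_pga = (
        -1.038
        + 0.387 * mw
        - 1.159 * math.log10(math.sqrt(h_km ** 2 + 2.6 ** 2))
        + 0.123 * s
        + epsilon * 0.287
    )
    return 10.0 ** log10_pga


# Hypothetical inputs: an Mw6.5 event, rock site, at 5 km and 20 km from the fault.
G = 9.81  # m/s^2, assumed conversion factor to units of g
for distance_km in (5.0, 20.0):
    pga = median_pga(6.5, distance_km)
    print(f"H = {distance_km:4.1f} km: median PGA = {pga:.2f} m/s^2 ({pga / G:.2f} g)")
```

Because the site term enters additively in log space, a stiff soil site scales the rock-site median by a constant factor of 10^0.123, independent of magnitude and distance.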

The soil conditions at the building locations are not provided in either the property database or the loss database. However, at construction sites in Iceland, the presence of a thin organic soil layer at the top, typically 1–3 m thick, is expected. It is standard practice today to excavate and remove this soil and either to build directly on rock or to place the foundation on a 1–2 m thick compacted gravel fill. In older buildings, the organic soil was not always removed, and therefore tilting and sagging can be a problem. Nevertheless, in some areas of South Iceland (mainly those closest to the coast), there are thick alluvial sediments of stiff bonded sand and gravel. At these sites, some level of soil interaction/amplification is expected. The geological map of South Iceland, which depicts locations and areas of sediments, is used here to determine the soil conditions at each building site. Soil investigations have shown that the stiffness of the sediment sites commonly corresponds to soil class B in Eurocode 8 (Bessason and Erlingsson 2011). Buildings not located in these sediment areas are assumed to be founded on bedrock (i.e., soil class A in Eurocode 8), although in some cases a local soil investigation might indicate other soil conditions.

3 Description of the loss database

3.1 Property database and building typologies

All properties in Iceland are registered in an official database, Registers Iceland (Icelandic Property Registers), which contains detailed information regarding the real estate. This information includes the municipality, land code, property code, street address, postcode, geographical coordinates, use of property, construction year, number of storeys, floor area, main building material, valuation for taxation, and valuation for fire insurance (reconstruction value). In the database, this information is presented in terms of dwellings, and it is noted that a residential building can have one or more dwellings (Icelandic Property Registers).

Table 1 shows that approximately 85% of all dwellings in Iceland are found in reinforced concrete (RC) buildings, and that 54% of all dwellings are located in low rise (1–3 storeys) RC buildings. Although the official database does not include information on structural load-bearing systems, it can be stated that the great majority of residential buildings in Iceland have structural walls for resisting lateral seismic forces (Bessason et al. 2012). This is true for RC and timber buildings, as well as masonry buildings. Typically, RC buildings are cast in place and timber and masonry buildings are built on site. The masonry buildings are built of unreinforced manufactured hollow pumice blocks in walls and tied together with rigid RC floors. Figure 2 shows examples of typical Icelandic low-rise RC buildings. Moment resisting frames with or without infill walls are rare.

Table 1 Classification of dwellings in Iceland in both urban and rural areas based on data from Icelandic Property Registers (Bessason et al. 2014)
Fig. 2 Examples of how buildings are defined in the study: (a) a townhouse with five street addresses is classified as five buildings; (b) an apartment building (block) with three street addresses and three staircases, each with six dwellings, is classified as three buildings

Nonetheless, the distribution in classes of dwellings in the affected area in South Iceland is different from the picture depicted in Table 1. The area consists mainly of agricultural land with many farms and few small villages and service centres. The vast majority of residential buildings are low-rise single-family buildings, but there are also two-family buildings, town-houses and apartment buildings (blocks). Buildings higher than 3 storeys are rare. Timber buildings are also more frequent than the percentage indicated in Table 1. The present study is restricted to these low-rise residential buildings, which represent the overwhelming majority of the buildings in the studied area.

The post-earthquake database includes losses in individual dwellings instead of buildings. In apartment buildings, the loss due to global structural damage is assigned to each dwelling; by contrast, the loss due to non-structural damage is estimated separately for each dwelling. In this study, the total loss from both non-structural and structural damage suffered by each building is of interest. For this reason, the structural and non-structural losses of dwellings with identical street addresses are aggregated to express the total loss suffered by a single building. Figure 2a shows an example of a townhouse with five street addresses, which is defined as five buildings in this study. Figure 2b shows an apartment building with three street addresses and three staircases, where each staircase has six dwellings. In the current study, such a structure is classified as three buildings.
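The aggregation of dwellings into buildings can be sketched as a simple group-by on the street address. The field names (`street_address`, `structural_loss`, `non_structural_loss`, `replacement_value`) and the sample records below are hypothetical and do not reflect the actual schema of the property or loss databases; summing the replacement values per address is likewise an assumption of this sketch.

```python
from collections import defaultdict


def aggregate_dwellings(dwellings):
    """Sum dwelling-level records that share a street address into one building."""
    buildings = defaultdict(
        lambda: {"repair_cost": 0.0, "replacement_value": 0.0, "n_dwellings": 0}
    )
    for d in dwellings:
        b = buildings[d["street_address"]]
        # Total loss = structural + non-structural loss of every dwelling.
        b["repair_cost"] += d["structural_loss"] + d["non_structural_loss"]
        b["replacement_value"] += d["replacement_value"]
        b["n_dwellings"] += 1
    return dict(buildings)


# Hypothetical records: two dwellings share one address, a third stands alone.
dwellings = [
    {"street_address": "Example Street 1", "structural_loss": 10.0,
     "non_structural_loss": 5.0, "replacement_value": 100.0},
    {"street_address": "Example Street 1", "structural_loss": 10.0,
     "non_structural_loss": 0.0, "replacement_value": 100.0},
    {"street_address": "Example Street 3", "structural_loss": 0.0,
     "non_structural_loss": 2.0, "replacement_value": 150.0},
]
buildings = aggregate_dwellings(dwellings)
for address, record in buildings.items():
    print(address, record)
```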

In earlier vulnerability studies for South Iceland (Bessason and Bjarnason 2016; Bessason et al. 2012, 2014), buildings in the area affected by the 2008 Ölfus Earthquake (Mw6.3) were classified into five building typologies based on their construction material and their seismic design code. The RC buildings were classified according to whether they were designed and built before 1980 (Pre1980), when the seismic design code was introduced in Iceland, or after implementation of this code (Post1980). Similarly, two sub-classes, Pre1980 and Post1980, were considered for the timber buildings. Finally, only one class was defined for masonry buildings, which were built mainly before 1980. Overall, Pre1980 low-rise buildings were found to perform reasonably well in strong seismic events despite the lack of seismic design. Nonetheless, the new seismic code specified a minimum reinforcement of structural walls, which increases the overall strength of the buildings built after 1980, and it coincided with two other improvements in building techniques that are unrelated to seismic design but affect the seismic response of buildings. Firstly, the concrete strength was increased in RC structures to improve their weathering resistance. Secondly, the finish of foundations was improved for concrete and timber buildings.

Figure 3 depicts the distribution of the inventory in the affected area according to construction material, construction age and number of stories, features described in the literature as influencing the seismic performance of buildings in South Iceland. It can be noted that the majority of the buildings were built before 1980. Most buildings are made of reinforced concrete (RC), and one-storey buildings dominate the database. The largest subclass of buildings is one-storey RC Pre1980 buildings. It should be mentioned that the oldest building in the dataset was built in 1875, and in total only 23 buildings were built before 1900.

Fig. 3 Summary of the buildings in the affected area according to their number of stories, age and construction material

3.2 Loss data and sub-classification

Natural catastrophe insurance of buildings is mandatory in Iceland and is administered by the Iceland Catastrophe Insurance (ICI). Therefore, after catastrophic events such as large earthquakes, the repair and replacement cost for every damaged building is determined by trained assessors in order to settle the individual insurance claims. For the studied earthquake sequence, the loss in each dwelling was estimated in the aftermath of the second event. The main steps were as follows:

  1. A property owner reported damage to their local insurance company, which informed ICI;

  2. Assessors, working in pairs, prepared for the assessment work by familiarizing themselves with technical drawings and other related information about the damaged property;

  3. Assessors performed a first inspection of the property, documenting all building damage, marking them on technical drawings and taking photos;

  4. Assessors prepared a damage assessment report. The reports included a description of the damage and a cost estimate for the repairs.

The loss data were classified into five sub-categories of structural and non-structural damage, as depicted in Table 2. It is important to keep in mind that the non-structural damage includes damage to all fixtures, as well as technical systems (plumbing, electrical installations etc.), but does not include damage to loose household equipment such as furniture, TVs, computers, etc. Although previous studies (Bessason and Bjarnason 2016; Bessason et al. 2014) have shown that non-structural loss dominates the overall loss, this study focuses on the total repair cost (i.e., the aggregated loss in all five sub-categories) normalised by the replacement value taken from the official property database. This ratio is the loss ratio of a building, also widely known as the damage factor (DF):

$$Loss = \frac{{{\text{Estimated}}\,{\text{total}}\,{\text{repair}}\,{\text{cost}}}}{{{\text{Replacement}}\,{\text{value}}}}$$
Table 2 Sub-categories of damage used in the survey after the two June 2000 South-Iceland earthquakes

The loss cannot be greater than 1 (Loss = 100%), and in practice the expression “total damage” was assigned to residential buildings whose estimated repair cost exceeded 70% of their replacement value. In these cases, the full replacement cost is paid to the owner. In this study, a loss of 100% was used whenever the owners received the full replacement cost, even though the actual repair cost was estimated to be lower. The derived vulnerability curves therefore incorporate the local policy for insurance pay-outs.
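The loss definition and the pay-out rule can be sketched as follows. This is a minimal illustration which assumes that the 70% rule was applied uniformly to every building; the function name `loss_ratio` and the constant name are chosen here for illustration and do not come from the loss database.

```python
TOTAL_DAMAGE_THRESHOLD = 0.70  # repair cost above 70% of replacement value


def loss_ratio(repair_cost, replacement_value):
    """Loss (damage factor) of one building, including the pay-out rule.

    Buildings whose estimated repair cost exceeds 70% of the replacement
    value are treated as total damage and assigned a loss of 1.0.
    """
    if replacement_value <= 0:
        raise ValueError("replacement_value must be positive")
    ratio = repair_cost / replacement_value
    if ratio > TOTAL_DAMAGE_THRESHOLD:
        return 1.0
    return ratio


# Hypothetical repair costs for a building with a replacement value of 100.
for cost in (0.0, 30.0, 75.0):
    print(cost, loss_ratio(cost, 100.0))
```

Under this rule no loss values fall strictly between 70% and 100%, which should be kept in mind when interpreting the upper tail of the loss data.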

According to ICI, the replacement value reported in the database is the same as the fire insurance value of a building, and is the depreciated replacement value plus the cost of removing the destroyed building. The depreciation is based on age, building material and general condition. On the other hand, the repair cost (loss) is in general not depreciated.

Table 3 presents an overview of the data by providing the number of undamaged, damaged and totally damaged buildings after the June 2000 earthquake sequence, aggregated in bins of the maximum median PGA level from the two events (maxPGA), for the five building classes defined in Sect. 3.1. Overall, there are 4754 buildings in the affected area (maxPGA ≥ 0.05 g), 793 of which were damaged. Although no residential buildings collapsed, 30 buildings were replaced (Bessason et al. 2012) because they suffered extreme damage (more than 70% loss ratio). A detailed description of the main failure modes of each building class in the studied sequence of events is beyond the scope of this study, and the reader is referred to other published studies (Halldórsson et al. 2013; Vogfjörð et al. 2013). It can be noted that the buildings are unevenly distributed across the acceleration bins, with the majority of the data being in the lower bins, which is a common observation in post-earthquake databases (Rossetto et al. 2013). It can also be noted that the proportion of damaged buildings increases with increasing maxPGA level.

Table 3 Classification of the loss data after the two June 2000 South-Iceland earthquakes

3.3 Variability in loss data

Having established an overall increase in the number of damaged residential buildings with increasing ground motion intensity, the scatter of loss in the affected residential buildings is explored next. In Fig. 4, the scatter of the loss suffered by individual buildings is depicted. In agreement with the observations in Table 3, most residential buildings were affected by low maxPGA levels and suffered no or low loss. In Fig. 4, clusters of data can be seen, corresponding to the villages, such as Selfoss, Hella and Hvolsvöllur, as shown in Fig. 1.

Fig. 4 Scatter plot of undamaged and damaged buildings with losses varying in the interval (0,100%). UD means undamaged (no loss) buildings, D means damaged buildings and TD means totally damaged buildings (total loss)

The scatter appears to be significant for both low and high maxPGA levels. Even within the same village, the losses range from no loss to extreme damage. Similarly, for large maxPGA levels, the loss values range from no loss to total loss. The variability in the loss of individual buildings is too large to yield reliable vulnerability curves for predicting future losses, and building-by-building data are not useful for identifying trends. Although this has been observed in loss data from other earthquakes, the sources of this variability are not well understood and should be the focus of future research. For the needs of this study, the large variability in the data is taken into account by developing a novel methodology based on data aggregation, as presented in what follows.

4 Proposed vulnerability assessment methodology for sequence of earthquakes

The aim of the empirical vulnerability assessment (Rossetto et al. 2013) is to construct a statistical model capable of predicting seismic losses given the most important explanatory variables, which typically include the intensity measure and various structural characteristics of the building stock. The complexity of the examined sequence of events also required investigating whether the loss in the areas affected by the sequence of events is significantly higher than in the areas affected by a single event. The main challenge faced in this study was shown to be the high variability of the available building-by-building loss data (see Fig. 4). This challenge is addressed by developing the novel four-step framework depicted in Fig. 5, which aggregates the data in such a way that trends in the loss data can be identified for the various explanatory variables. The proposed framework leads to the construction of seismic vulnerability curves for specific geographical areas with a certain distribution of residential buildings according to their structural characteristics, deviating from existing studies (Maqsood et al. 2016), which assessed the vulnerability of specific building classes. Central to the proposed framework is an iterative approach, whereby adaptive meshing is used to separate the area affected by the earthquakes into grid cells. The aggregation approach is based on two main assumptions, which are explained in greater detail in what follows. For each grid cell, the data are aggregated and then the ground motion intensity level which characterizes the cell is determined along with the distribution of each building class. Moreover, it is assessed whether the cell has been affected overall by only one or by both earthquake events.

Fig. 5 Framework of the proposed seismic vulnerability assessment methodology for sequence of earthquakes

Having identified the optimum data aggregation assumptions, an exploratory analysis follows which aims to identify the important explanatory variables to be included in subsequent statistical models. The models are fitted to the data and goodness-of-fit tests are employed to identify the model which best assesses the seismic vulnerability of an area affected by a sequence of events. The sensitivity of the results to the aggregation assumptions can be explored by repeating these steps for different data aggregation assumptions. In what follows, the proposed methodology is applied to the Icelandic database, and the results for alternative assumptions can be found in the “Appendix”.

4.1 Loss data aggregation

The data are aggregated by location, using an adaptive spatial grid constructed as follows. The maxPGA from the two events is estimated on an equi-spaced, dense grid of coordinates (1 km × 1 km) within the affected area (i.e., 80 km in the east–west direction × 100 km in the south–north direction). It should be noted that the median PGA for each event and grid coordinate is estimated from the GMPE expressed by Eq. (1), accounting for the soil conditions obtained from the relevant geological map of Iceland (Jóhannesson et al. 1982) and ignoring the error term in the GMPE. The spatial grid is then constructed by partitioning the available area into non-overlapping grid cells, each of which includes a unique sample of the aforementioned grid coordinates. For each grid cell of this spatial grid, the standard deviation of the maxPGA estimates for its sample of coordinates does not exceed a pre-specified threshold (\(threshold_{stdev}\)). The partitioning is constructed recursively by dividing the available area into 4 equi-area grid cells and examining whether the observations in each grid cell satisfy the standard deviation threshold. If the standard deviation of the maxPGA in a grid cell is below the preselected threshold, the cell is not partitioned further; otherwise, it is divided into 4 equi-area sub-grid cells. The process is repeated until all grid cells have maxPGA estimates with a standard deviation below the threshold.
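The recursive partitioning can be sketched as a quadtree-style subdivision. The code below is a minimal illustration under stated assumptions and is not the implementation used in the study: the function name `partition`, the use of the population standard deviation, the `max_depth` safeguard, the omission of empty quadrants and the synthetic maxPGA field are all choices made here for illustration.

```python
from statistics import pstdev


def partition(bounds, points, threshold_stdev, max_depth=12):
    """Recursively split a rectangle into 4 equi-area cells.

    bounds -- (x0, y0, x1, y1) of the rectangle
    points -- list of (x, y, max_pga) grid coordinates inside the rectangle
    Returns a list of (bounds, points) leaf cells whose maxPGA standard
    deviation does not exceed threshold_stdev.
    """
    x0, y0, x1, y1 = bounds
    values = [pga for _, _, pga in points]
    if len(values) < 2 or pstdev(values) <= threshold_stdev or max_depth == 0:
        return [(bounds, points)]

    xm, ym = (x0 + x1) / 2.0, (y0 + y1) / 2.0
    quadrants = {
        (False, False): ((x0, y0, xm, ym), []),
        (True, False): ((xm, y0, x1, ym), []),
        (False, True): ((x0, ym, xm, y1), []),
        (True, True): ((xm, ym, x1, y1), []),
    }
    # Assign every point to exactly one quadrant by comparing with the midpoint.
    for p in points:
        quadrants[(p[0] >= xm, p[1] >= ym)][1].append(p)

    cells = []
    for sub_bounds, sub_points in quadrants.values():
        if sub_points:  # quadrants without grid coordinates are dropped here
            cells.extend(
                partition(sub_bounds, sub_points, threshold_stdev, max_depth - 1)
            )
    return cells


# Synthetic 8 km x 8 km field: maxPGA (g) decays away from a fault along x = 0.
points = [(x + 0.5, y + 0.5, 0.8 / (1 + x)) for x in range(8) for y in range(8)]
cells = partition((0.0, 0.0, 8.0, 8.0), points, threshold_stdev=0.05)
print(len(cells), "grid cells")
```

In this synthetic field the cells come out small next to the fault, where the maxPGA gradient is steep, and large further away, which is the qualitative behaviour expected of a threshold on the within-cell standard deviation.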

The threshold is selected arbitrarily. In this study, three thresholds (i.e., 0.05, 0.10 and 0.15 g) were selected. Table 4 shows the total number of grid cells obtained for the three thresholds. Overall, the smaller the \(threshold_{stdev}\) is, the smaller the sample of buildings in each grid cell. For example, for the smallest 0.05 g threshold, the largest number of grid cells (No = 177) is obtained, associated with the largest percentage (i.e., 89%) of grid cells with small building samples (i.e., ≤ 20 buildings) and 31% of the grid cells have a single building. The proportion of grid cells with small samples reduces with the increase in the threshold to 0.1 and 0.15 g but remains above 50%. A dramatic decrease of the grid cells with a single residential building to below 10% is instead achieved by increasing the threshold. For this study, the \(threshold_{stdev}\) is selected equal to 0.10 g, as it provides a large sample size of grid cells with an adequate building aggregation at each grid cell. Figure 6 depicts the partitioned area for the selected \(threshold_{stdev}\). The grid cells appear to be smaller in size for the areas close to the two faults and larger in the areas further away from the faults. In “Appendix”, a more thorough discussion on the differences obtained when using alternative thresholds is presented.

Table 4 Number of grid cells obtained from the partition of the affected area for the three standard deviation thresholds
Fig. 6 Map of the affected area with the locations of the residential buildings, the minPGA contours of the 17th and 21st June 2000 seismic events and the adaptive grid cells for a maxPGA standard deviation threshold of 0.10 g, assuming that the minPGA threshold is 0.10 g

4.2 Exploratory analysis

Having partitioned the affected area, the loss and a number of potentially statistically significant explanatory variables (see Table 5) were determined from the aggregated data in each grid cell. In this study, the loss is estimated as the ratio of the sum of the repair costs of the residential buildings located in each grid cell to the sum of their replacement costs:

$$loss_{j} = \frac{{\sum\nolimits_{i = 1}^{{N_{j} }} {repair\,cost_{i} } }}{{\sum\nolimits_{i = 1}^{{N_{j} }} {replacement\,cost_{i} } }}$$

where Nj is the total number of buildings in a given grid cell j.
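The loss ratio defined above can be illustrated with a minimal sketch. The function name and the cost values below are hypothetical and serve only to show the arithmetic; they are not taken from the loss database.

```python
def grid_cell_loss(repair_costs, replacement_costs):
    """Loss of one grid cell: total repair cost over total replacement cost."""
    if len(repair_costs) != len(replacement_costs):
        raise ValueError("one repair cost and one replacement cost per building")
    total_replacement = sum(replacement_costs)
    if total_replacement <= 0:
        raise ValueError("total replacement cost must be positive")
    return sum(repair_costs) / total_replacement


# Hypothetical grid cell with three buildings (arbitrary monetary units).
repair = [2.0, 0.0, 1.0]
replacement = [20.0, 30.0, 10.0]
cell_loss = grid_cell_loss(repair, replacement)  # 3 / 60 = 0.05
```

Note that this ratio of sums weights each building by its replacement cost, so it generally differs from the average of the building-level loss ratios (which would be about 0.067 for the hypothetical values above).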

Table 5 Variables used in the empirical vulnerability assessment

In Table 5, both continuous and categorical explanatory variables can be noted. The continuous variables include the maxPGA for each grid cell as well as the area of each grid cell (in km2). The maxPGA for each grid cell has been estimated as the maximum of the median PGA levels of the two events for the buildings in the examined grid cell. The median PGA levels have been estimated by the GMPE expressed by Eq. (1), which accounts for the soil conditions, ignoring the error term. The categorical variables include the variable Event, which expresses whether a grid cell has been affected by a single event or both events of the June 2000 sequence, and the building characteristic (termed ‘Class’). This latter categorical variable was defined in three different ways, i.e. according to the main construction material (Mat), construction age (Age), or a combination of construction material and age (Mat-Age).

The potential importance of the explanatory variables in the prediction of the loss in a grid cell is examined next. One of the key questions in this study is whether an area of cumulative damage from both events can be identified and separated from the area affected only by a single event. The area is expected to include at least part of the area between the two faults. Its size and exact boundaries are determined in this study as follows. The minimum PGA (minPGA) level from the two June earthquakes is estimated for each building. The value is compared to a pre-specified minPGA threshold (\(threshold_{\hbox{min} PGA}\)). If the minPGA level for a given building is above the threshold, the building is considered to have been affected by both events. Otherwise, the building is considered to have been affected by a single event, without distinguishing whether it is the 17th June or the 21st June earthquake.

The proportion of buildings in a given grid cell affected by a single or both events determines whether that grid cell should be classified as affected by a single event or both events. The selection of the proportion is based on judgement. For the needs of this study it is considered that if over 50% of the buildings in a grid cell have been affected by both events then the grid cell is classified as affected by both events. The \(threshold_{\hbox{min} PGA}\) is also an arbitrary value. For the needs of this study, three values (0.05, 0.10 and 0.15 g) are considered. Table 6 depicts the number of grid cells affected by single and both events for the three \(threshold_{\hbox{min} PGA}\) as well as the three \(threshold_{stdev}\). Essentially, for the smallest \(threshold_{\hbox{min} PGA}\) (i.e., 0.05 g), most of the area (i.e., 88–96% of the grid cells) has been affected by both events. For the second threshold, 40–60% of the grid cells have been affected by both events. For the third threshold, most grid cells (≥ 79%) have been affected by a single event and only a small number of grid cells have been affected by both events.
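The two-step classification rule described above (a building-level minPGA check followed by a majority rule at grid-cell level) can be sketched as follows. The function names, the default arguments and the PGA pairs are hypothetical; only the 0.10 g threshold and the 50% rule come from the text.

```python
def building_hit_by_both(pga_event1, pga_event2, threshold_min_pga=0.10):
    """True when the smaller of the two PGA levels (in g) exceeds the threshold."""
    return min(pga_event1, pga_event2) > threshold_min_pga


def classify_cell(building_pgas, threshold_min_pga=0.10, share=0.5):
    """Label a grid cell 'Both' when more than `share` of its buildings
    were affected by both events, otherwise 'Single'."""
    if not building_pgas:
        raise ValueError("a grid cell must contain at least one building")
    n_both = sum(
        building_hit_by_both(pga1, pga2, threshold_min_pga)
        for pga1, pga2 in building_pgas
    )
    return "Both" if n_both / len(building_pgas) > share else "Single"


# Hypothetical cell: (PGA from 17th June, PGA from 21st June) per building, in g.
cell = [(0.30, 0.12), (0.25, 0.11), (0.40, 0.06)]
label_010 = classify_cell(cell, threshold_min_pga=0.10)  # 2 of 3 buildings -> "Both"
label_015 = classify_cell(cell, threshold_min_pga=0.15)  # 0 of 3 buildings -> "Single"
```

The same hypothetical cell changes class when the threshold is raised from 0.10 g to 0.15 g, which illustrates why the share of cells classified as affected by both events drops as the minPGA threshold increases.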

Table 6 Number of grid cells obtained from the partition of the affected area for the three standard deviation thresholds and minPGA threshold

This study identifies the case 0.10 g_0.10 g (i.e., the minPGA threshold selected equal to 0.10 g and the standard deviation threshold also equal to 0.10 g) as optimum as it provides a reasonable sample size of grid cells with less than 10% of grid cells having a single building. Furthermore, the minPGA threshold equal to 0.10 g is considered to result in a large enough intensity to expect cumulative damage from two events. In Fig. 6, it can be noted that the area affected by both events based on \(threshold_{\hbox{min} PGA}\) equal to 0.10 g is large and includes the faults from both events. It should be noted that the vulnerability curves based on the four alternative threshold combinations, highlighted (in bold) in Table 6, are also explored in the “Appendix”.

In the vulnerability and fragility literature, the statistical models fit to the data typically linearly relate the natural logarithm of the ground motion intensity with the response variable (i.e., either loss or probability of damage) through a link function. In this study, the link function is expressed in terms of the logit function (see Sect. 4.3.1 for an analytical description of the model and its components). For the needs of the exploratory analysis, the logit of the total loss suffered by the grid cells is plotted against the maxPGA separately for the grid cells affected by a single or both events in Fig. 7. The beta distribution is not defined for loss equal to zero or 1. For this reason, the no-loss data values have been given the very small value of \(10^{-5}\) and the complete-loss data have been given a value equal to 0.99. It can be noted that the loss increases with the increase in maxPGA levels, highlighting the importance of the selected intensity measure level in predicting the seismic loss. Despite the considerable uncertainty, the loss in the grid cells assumed to be affected by both events appears on average to be higher than the loss suffered by the grid cells affected by a single event. This confirms the presence of an area of cumulative loss from the two seismic events on the 17th June and 21st June 2000.
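The treatment of boundary values before taking the logit can be sketched as below. The replacement values (10⁻⁵ for zero loss and 0.99 for complete loss) are those stated in the text; the function names are hypothetical.

```python
import math


def clamp_loss(loss, zero_value=1e-5, one_value=0.99):
    """Replace the boundary losses 0 and 1, for which the beta
    distribution is not defined, by the values stated in the text."""
    if loss <= 0.0:
        return zero_value
    if loss >= 1.0:
        return one_value
    return loss


def logit(p):
    """Logit of a proportion strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def logit_loss(loss):
    """Logit of a grid-cell loss after boundary replacement."""
    return logit(clamp_loss(loss))


no_loss = logit_loss(0.0)      # logit of 1e-5, roughly -11.5
full_loss = logit_loss(1.0)    # logit of 0.99 = ln(99), roughly 4.6
```

The two replacement values are not symmetric on the logit scale, so grid cells with zero loss sit much further from the bulk of the data than cells with complete loss would.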

Fig. 7 Logit of the loss for each grid cell plotted against ln(maxPGA) for the areas initially assumed to have been affected by a single event and by both events, for case 0.10 g_0.10 g

The importance of the area (in km2) of each grid cell in the prediction of the loss is also examined in Fig. 8. Specifically, the logit of the total loss suffered by each grid cell is plotted against the area of the cell. Despite the substantial uncertainty in the loss, an overall reduction in the expected loss with increasing area can be observed. This is not surprising given that the larger sized grid cells (see Fig. 6) are associated with lower maxPGA values.

Fig. 8 Logit of the loss for each grid cell plotted against the area (in km2) of the grid cells considered to have been affected by a single event and by both events, for case 0.10 g_0.10 g

Overall, the aforementioned discussion shows that, at least for the examined case, all three explanatory variables ‘maxPGA’, ‘Event’ and ‘Area’ are potentially statistically significant. For this reason they should be added to the statistical model, which is constructed in what follows. Furthermore, the importance of considering the building characteristics in empirical fragility and vulnerability assessment of natural disasters is well-documented (Rossetto et al. 2013). The available database includes information regarding the construction material and age of the building inventory in the affected area. It is not, however, clear whether both characteristics are important in predicting the loss from the available aggregated data. For this reason, three different classification schemes of the building inventory are used here. According to the first scheme, the buildings in the affected area are sub-divided into 5 classes based on their construction material and age (‘Mat-Age’) as depicted in Table 5. Secondly, the buildings in each grid cell are classified into three classes according to their material (‘Mat’) as shown in Table 5. Finally, the buildings in each grid cell are classified into two classes (‘Age’), based on their age, as a proxy for seismic design level (see Table 5).

4.3 Selection of statistical model

Having identified the potentially important explanatory variables, multiple statistical models are constructed in this section. Their goodness of fit to the post-earthquake data is also assessed in order to identify the best-fitted model for case 0.10 g_0.10 g.

4.3.1 Statistical model

The seismic vulnerability of a grid cell is determined by a statistical model that expresses the loss in terms of ‘maxPGA’, ‘Event’ and ‘Class’. Given that loss is a continuous variable bounded in the unit interval of (0,1), it is assumed that loss follows a beta distribution, which has probability density function, expected value and variance, respectively:

$$\begin{array}{*{20}l} {f\left( {l;\mu ,\varphi } \right) = \frac{\varGamma \left( \varphi \right)}{{\varGamma \left( {\mu \varphi } \right)\varGamma \left( {\left( {1 - \mu } \right)\varphi } \right)}}l^{\mu \varphi - 1} \left( {1 - l} \right)^{{\left( {1 - \mu } \right)\varphi - 1}} } \hfill & {0 < l < 1} \hfill \\ {E\left[ {L;\mu ,\varphi } \right] = \mu } \hfill & {0 < \mu < 1} \hfill \\ {\text{var} \left[ {L;\mu ,\varphi } \right] = \frac{{\mu \left( {1 - \mu } \right)}}{1 + \varphi }} \hfill & {\varphi > 0} \hfill \\ \end{array}$$

In Eq. (4), μ is the mean value and φ is the precision. A beta regression model links μ and possibly φ with a systematic component that is a function of a vector of explanatory variables.
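As a quick numerical check of the mean-precision parameterisation, the density can be evaluated with the standard beta shape parameters a = μφ and b = (1 − μ)φ. The sketch below uses only the Python standard library; the function names and the values of μ and φ are illustrative assumptions.

```python
import math


def beta_pdf(l, mu, phi):
    """Beta density in the mean-precision form, with shape
    parameters a = mu * phi and b = (1 - mu) * phi."""
    if not 0.0 < l < 1.0:
        raise ValueError("the density is defined for 0 < l < 1")
    a = mu * phi
    b = (1.0 - mu) * phi
    # Work on the log scale to avoid overflow in the gamma functions.
    log_norm = math.lgamma(phi) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1.0) * math.log(l) + (b - 1.0) * math.log(1.0 - l))


def beta_variance(mu, phi):
    """Variance of the beta distribution in the mean-precision form."""
    return mu * (1.0 - mu) / (1.0 + phi)


# With mu = 0.5 and phi = 2 the shape parameters are a = b = 1,
# i.e. the uniform distribution on (0, 1) with variance 1/12.
uniform_density = beta_pdf(0.3, mu=0.5, phi=2.0)
uniform_variance = beta_variance(0.5, 2.0)
```

For a fixed mean, a larger φ gives a smaller variance, which is why φ is referred to as the precision.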

The mean value, μ, is related to the explanatory variables through a link function, g1(.):

$$\mu = g_{1}^{ - 1} \left( {\eta_{1} } \right)$$

where η1 is a function of the explanatory variables. In this study, the logit link function is adopted:

$$g_{1}^{{}} \left( \mu \right) = \text{logit} \left( \mu \right) = \ln \left( {\frac{\mu }{1 - \mu }} \right)$$

Similarly, φ can also be considered as a constant intercept or a function of the explanatory variables, η2, through a link function, g2:

$$\varphi = g_{2}^{ - 1} \left( {\eta_{2} } \right)$$

The form of η2 is determined by inspecting the plot of residuals against η1. If the residuals appear to be randomly distributed, the use of a constant precision is adequate. The presence of heteroskedasticity, i.e. the increase or decrease of the scatter of the residuals with the increase in η1, indicates the need for a more complex expression of the precision which accounts for explanatory variables. In this study, the link function of the precision is expressed in the form:

$$g_{2} \left( \varphi \right) = \ln \left( \varphi \right)$$
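The two link functions can be inverted to move from the linear predictors back to the distribution parameters: μ is the inverse logit of η1 and φ is the exponential of η2. A minimal sketch, with hypothetical function names and arbitrary predictor values, is:

```python
import math


def inv_logit(eta):
    """Inverse of the logit link: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-eta))


def mean_and_precision(eta1, eta2):
    """Mean loss and precision implied by the two linear predictors."""
    mu = inv_logit(eta1)   # inverse of g1(mu) = logit(mu)
    phi = math.exp(eta2)   # inverse of g2(phi) = ln(phi)
    return mu, phi


# Arbitrary predictor values chosen for illustration only.
mu, phi = mean_and_precision(eta1=0.0, eta2=math.log(9.0))
variance = mu * (1.0 - mu) / (1.0 + phi)  # 0.25 / 10 = 0.025
```

Both inverse links guarantee admissible parameter values (0 < μ < 1 and φ > 0) for any real-valued linear predictor, which is the practical reason for working on the logit and log scales.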

Having determined the main properties of the statistical model, η1 and η2 need to be expressed as functions of the explanatory variables. Based on the above exploratory analysis, the working form of η1 is set to include both the ground motion intensity (‘maxPGA’) as well as the variable which captures whether the area has been affected by a single or both events (‘Event’). It should be noted that this study differs in that it incorporates the building characteristics directly in the statistical model. This study focuses on estimating the seismic loss in a given grid cell, which has a particular composition of buildings, e.g., 70% RC, 20% timber and 10% masonry. Following the three classification schemes depicted in Table 5, the working form of η1 is written as:

$$\eta_{1} = \left\{ {\begin{array}{*{20}l} {\theta_{0} + \theta_{1} *ln(maxPGA) \, + \theta_{2} *Event + \theta_{3} *Masonry + \theta_{4} *Timber\_Pre1980 +\, \theta_{5} *Timber\_Post1980 + \theta_{6} *RC\_Pre1980 + \theta_{7} *RC\_Post1980} \hfill \\ \quad\qquad{\theta_{0} + \theta_{1} *ln(maxPGA) \, + \theta_{2} *Event + \theta_{3} *Masonry + \theta_{4} *Timber + \theta_{5} *RC} \hfill \\ {\theta_{0} + \theta_{1} *ln(maxPGA) \, + \theta_{2} *Event + \theta_{3} *Pre1980 + \theta_{4} *Post1980} \hfill \\ \end{array} } \right.$$

It should be noted that, in Eq. (9), the explanatory variables associated with the distribution of the building characteristics are not independent as their sum is constant and is equal to 1 for each grid cell. This violates one of the main assumptions on which a meaningful regression is based, i.e. the need for the explanatory variables to be independent. To address this issue, the constant-sum explanatory variables, xi, are transformed through an isometric log-ratio transformation (Bruno et al. 2015). The transformed explanatory variables can be written as:

$$z_{i} = \sqrt {\frac{D - i}{D - i + 1}} \ln \left( {\frac{{x_{i} }}{{\sqrt[{D - i}]{{\prod\limits_{j = i + 1}^{D} {x_{j} } }}}}} \right),\quad i = 1, \ldots ,D - 1$$

where D is the total number of constant-sum explanatory variables. D = 5 for the classification of buildings according to ‘Mat-Age’ (see Table 5). D = 3 for the classification of buildings according to ‘Mat’. Finally, D = 2 for the classification of buildings according to ‘Age’. A description of the variables (xi) which are used in Eq. (10) is given in Table 5. Ideally, all building classes of a given scheme should be present in a grid cell in order for the transformation in Eq. (10) to be meaningful. This, however, is an unrealistic expectation for the available database, where it is not uncommon to have grid cells where at least one building class is not represented. The aggregation of the buildings according to a single structural characteristic (i.e., only material or only age) instead of two reduces the number of missing classes in the grid cells but does not eliminate them. The proportions of unrepresented classes in a given grid cell are set equal to 0.001 (Neocleous et al. 2011) in order to make the transformation possible.
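The transformation in Eq. (10), including the replacement of unrepresented classes by 0.001, can be sketched as follows. This is an illustrative implementation only: the function name is hypothetical, the ordering of the classes within the input list is an assumption, and the proportions are not re-normalised after the zero replacement, since the text does not state whether this was done.

```python
import math


def ilr(proportions, zero_value=0.001):
    """Isometric log-ratio coordinates z_1..z_{D-1} of D constant-sum
    proportions, following Eq. (10). Zero proportions are replaced by
    `zero_value` (assumption: no re-normalisation afterwards)."""
    x = [zero_value if v == 0 else v for v in proportions]
    D = len(x)
    z = []
    for i in range(1, D):                  # i = 1, ..., D - 1
        rest = x[i:]                       # x_{i+1}, ..., x_D
        # Log of the geometric mean of the remaining D - i components.
        log_gmean = sum(math.log(v) for v in rest) / (D - i)
        scale = math.sqrt((D - i) / (D - i + 1))
        z.append(scale * (math.log(x[i - 1]) - log_gmean))
    return z


# Composition quoted in the text: 70% RC, 20% timber, 10% masonry
# (the order of the classes is chosen here for illustration).
z_mat = ilr([0.7, 0.2, 0.1])   # D = 3 gives two coordinates
z_age = ilr([1.0, 0.0])        # D = 2, one class unrepresented
```

The D proportions are mapped to D − 1 unconstrained coordinates, which removes the constant-sum dependence among the explanatory variables; a cell with equal proportions in all classes maps to coordinates equal to zero.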

The working form of η1 after the transformation of the constant-sum explanatory variables is depicted in Eq. (11). The first three expressions correspond to the three classification schemes (i.e., ‘Mat-Age’, ‘Mat’ or ‘Age’) presented in Table 5.

$$\eta_{1} = \left\{ {\begin{array}{*{20}l} {\theta_{0} + \theta_{1} *ln(maxPGA) \, + \theta_{2} *Event + \theta_{3} *z_{Mat - Age,1} + \theta_{4} *z_{Mat - Age,2} + \theta_{5} *z_{Mat - Age,3} + \theta_{6} *z_{Mat - Age,4} } \hfill \\ \quad\qquad {\theta_{0} + \theta_{1} *ln(maxPGA) \, + \theta_{2} *Event + \theta_{3} *z_{Mat,1} + \theta_{4} *z_{Mat,2} } \hfill \\ {\theta_{0} + \theta_{1} *ln(maxPGA) \, + \theta_{2} *Event + \theta_{3} *z_{Age,1} } \hfill \\ {\theta_{0} + \theta_{1} *ln(maxPGA) \, + \theta_{2} *Event} \hfill \\ {\theta_{0} + \theta_{1} *ln(maxPGA) \, } \hfill \\ {\theta_{0} + \theta_{1} *ln(maxPGA) \, + \theta_{2} *Event + \theta_{3} *z_{Mat,1} + \theta_{4} *z_{Mat,2} + \theta_{5} *Area} \hfill \\ \end{array} } \right.$$

Apart from exploring which building classification scheme fits the data best, the influence on the fit of including the variables ‘Event’ and ‘Area’ is also explored. Thus, six statistical models are constructed (termed ‘M.EMatAge’–‘M.EMatA’) using the various forms of η1 expressed by Eq. (11) and a constant η2 (see Table 7), aiming to identify the model which fits the data best. Two more models are also tested where η2 is related to ‘Event’. The models are fitted to the data via the ‘betareg’ package in ‘R’. The Akaike Information Criterion (AIC) (2011) is estimated for each fit and the model that yields the lowest AIC value is identified as the best fit to the data.
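The AIC-based selection can be illustrated with the usual definition AIC = 2k − 2 ln L, where k is the number of estimated parameters and ln L is the maximised log-likelihood. The model labels and the log-likelihood values below are invented for illustration and do not correspond to the values in Table 7.

```python
def aic(log_likelihood, n_params):
    """Akaike Information Criterion: 2k - 2 ln(L)."""
    return 2 * n_params - 2 * log_likelihood


# Hypothetical fits: label -> (maximised log-likelihood, number of parameters).
candidates = {
    "small": (100.0, 3),
    "medium": (106.0, 6),
    "large": (106.5, 8),
}

aic_values = {name: aic(ll, k) for name, (ll, k) in candidates.items()}
best_model = min(aic_values, key=aic_values.get)  # lowest AIC is preferred
```

In this made-up example the largest model has the highest log-likelihood but is not selected, because the gain in fit does not offset the penalty of two additional parameters.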

Table 7 The main models fitted to the data and their corresponding AIC values

4.3.2 Application: case 0.10 g_0.10 g

Similar to the previous Sect. 4.2, a detailed analysis focuses only on the case where both thresholds have values equal to 0.10 g. In Table 7, ‘M.EMat’, whose explanatory variables include ‘maxPGA’, ‘Event’, and the distribution of buildings according to their construction material, appears to provide the best fit for the data.

In Table 7, the differences in the AIC values appear to be small for the examined models. For this reason, a likelihood ratio test is used to compare the different models in order to explore the significance of ‘Event’ and the different building classification schemes. In Table 8, the test on ‘M.EMat’ and ‘M.0’ yields a p value well below the 0.05 threshold.

Table 8 Identifying the best fitted model by examining the p value of a likelihood ratio test

This indicates that ‘M.EMat’, which accounts for ‘Event’ and the building classes according to their construction material, provides a better fit than a model which includes only the maxPGA. By contrast, the p value of the test on ‘M.EMatAge’ and ‘M.0’ is equal to 0.05, which means that there is not enough evidence to justify the statistically better fit of the more complex model ‘M.EMatAge’ when compared to the simpler model ‘M.0’. Similarly, there is not enough evidence to prefer the more complex ‘M.EMatAge’ over the simpler ‘M.EAge’ (p value = 0.30). This reinforces the observation in Table 7 that ‘M.EMat’ provides a better fit than ‘M.EMatAge’, ‘M.EAge’ and ‘M.0’. With regard to ‘Area’, the likelihood ratio test on ‘M.EMat’ and ‘M.EMatA’ yields a p value above the 0.05 threshold, indicating that the addition of this variable in the model is not necessary.
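For readers unfamiliar with the likelihood ratio test used above, a minimal standard-library sketch is given below. It assumes nested models and the usual chi-square reference distribution, with degrees of freedom equal to the number of additional parameters. The log-likelihood values are hypothetical and are not those underlying Table 8.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1,
    built from the recurrence Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1)."""
    if x <= 0.0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)                 # Q(1, y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))      # Q(1/2, y)
    while a < df / 2.0:
        q += math.exp(a * math.log(y) - y - math.lgamma(a + 1.0))
        a += 1.0
    return q


def likelihood_ratio_test(ll_simple, ll_complex, extra_params):
    """Test statistic and p value for a simple model nested in a complex one."""
    statistic = 2.0 * (ll_complex - ll_simple)
    return statistic, chi2_sf(statistic, extra_params)


# Hypothetical log-likelihoods; the complex model has three extra parameters.
statistic, p_value = likelihood_ratio_test(100.0, 106.0, extra_params=3)
```

A p value below 0.05 would be read, as in the text, as evidence that the additional explanatory variables improve the fit beyond what would be expected by chance.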

The importance of the ‘Event’ variable in the formulation of η1 as well as η2 is explored next. To do that, the assumption that η2 is constant (see Table 7) is relaxed in models ‘M.EMat’ and ‘M.0’ and the precision, φ, is written as a function of ‘Event’ (see ‘M.E.E’ and ‘M.EMat.E’ in Table 7):

$$\eta_{2} = \theta_{0}^{\prime } + \theta_{1}^{\prime } *Event$$

Figure 9 depicts the plots of residuals against η1 for the four aforementioned models. The plot depicts a marked increase in the variability of the residuals with the increase in η1 for ‘M.0’. This apparent heteroskedasticity is seen to reduce by fitting ‘M.E.E’ to the data. Similarly, the residuals of ‘M.EMat’ show signs of heteroskedasticity that are somewhat moderated by relaxing the constant precision assumption in ‘M.EMat.E’. These visual observations reinforce the updated η2 formulation in Eq. (12). By comparing the AIC values in Table 7, ‘M.EMat.E’ appears to be the model which fits the data best. It should be noted that the likelihood ratio test on ‘M.EMat.E’ and ‘M.E.E’ confirms the improvement in the fit obtained by adding the material classes of the buildings to the model. The likelihood ratio test between ‘M.EMat.E’ and ‘M.Mat’ in Table 8 also confirms the statistical significance of ‘Event’ in the model.

Fig. 9 Plots of residuals for ‘M.EMat’, ‘M.E.E’, ‘M.0’ and ‘M.EMat.E’ against η1

5 Results and discussion

The model ‘M.EMat.E’ is identified as the one that provides the best fit to the data. Its regression coefficients as well as their standard errors are presented in Table 9. In what follows, a sensitivity analysis is used to discuss the usefulness of this model in predicting the loss for grid cells comprising different distributions of building types.

Table 9 Regression coefficients for the best fitted model ‘M.EMat.E’, for 0.10 g_0.10 g

In Table 10, five different distributions of buildings in a given grid cell are listed. C1 is based on the construction material distribution of low-rise buildings in Iceland (see Table 1). C2 assumes that the construction material distribution in the grid cell is identical to the distribution of low-rise buildings according to their material in the South Iceland Seismic Zone (see Table 2). Finally, the last three cases (C3–C5) each reflect a grid cell whose buildings are all of a single construction material.

Table 10 Distribution of residential buildings according to their material in a grid cell

Figure 10 depicts the vulnerability curves and their corresponding 90% pointwise bootstrap prediction intervals for a grid cell assuming it was either affected by both events or by a single event. For all 5 cases, the mean vulnerability curves for the grid cell affected by both events appear to be higher than the curves obtained for the areas affected by a single event. The prediction intervals constructed for the grid cell affected by both events appear to be wider than their counterparts for the areas affected by a single event. These wide intervals for buildings affected by both events illustrate the uncertainty in the mean curves, which must be kept in mind when comparing the two cases: ‘Single’ and ‘Both’.

Fig. 10 Vulnerability curves and their 90% prediction intervals constructed by fitting ‘M.EMat.E’ with the 5 cases of building distribution in classes according to their material for a representative grid cell

It is interesting to note that the lower boundaries of the prediction intervals are essentially at zero loss, which reflects the presence of many grid cells with zero loss across the full range of maxPGA. In general, the 90% prediction intervals are very wide as compared to other studies (Rossetto et al. 2014). It is also noted that the vulnerability curves appear to be rather flat with increasing maxPGA, which may be explained by the fact that the main losses were related to non-structural damage (Jóhannesson et al. 1982; Rupakhety and Sigbjörnsson 2009) and not structural damage (analysed here). The data indicate that despite high PGA values the structural system of the buildings performed well during the 2000 sequence of earthquakes. The overall losses were small and no residential buildings collapsed. This can be attributed to the design of timber buildings in Iceland to withstand high wind loads (i.e., the fundamental value of the base wind velocity according to Eurocode EN 1991 1-4 4.2 (1991) and the Icelandic national annexes is vb,0 = 36 m/s for the South Iceland Seismic Zone) and the low-rise RC structural wall buildings which also have an inherently strong lateral load resisting system. The comparison of cases C1–C2 shows that the differences in the distribution of the building inventory nationally and in South Iceland do not lead to notable differences in the vulnerability curves.

By comparing cases C3–C5, it can be noted that the grid cell with exclusively masonry buildings appears to have suffered the largest loss irrespective of whether it has been affected by both events or by a single event. By contrast, the grid cell which includes exclusively timber buildings appears to have suffered the smallest loss, followed closely by the grid cell with exclusively RC buildings. The wider prediction intervals associated with the grid cells having exclusively masonry buildings can be partially attributed to the overall small number of masonry buildings, especially for higher maxPGA levels, as well as to their inherently uncertain seismic behavior, which has also been noted elsewhere in the literature (Maqsood et al. 2016).

In Fig. 11, the vulnerability curves for the best fitting models are compared with existing Icelandic vulnerability curves from two studies. Thráinsson (1992) constructed a multi-linear vulnerability curve for pre-1980 RC buildings based on data from the 1936 Dalvík and 1976 Kópasker earthquakes in North-East Iceland. The ground motion intensity in this curve is expressed in terms of MMI, which is transformed here (Vogfjörð et al. 2010) to PGA in order for the curve to be directly comparable to the ones constructed in this study. Previous studies (Bessason and Bjarnason 2016; Bessason et al. 2012, 2014) produced fragility curves for Masonry, Pre1980 RC, Post1980 RC, Pre1980 Timber and Post1980 Timber buildings using post-disaster data from both the 2000 sequence of events and the 2008 South Iceland earthquake (Mw6.3). It should be mentioned that buildings in the region between the two earthquake faults of the 2000 sequence were removed from the fragility analysis in the latter study, which therefore predominantly combined data for buildings affected only by single events. In a following paper (Bessason and Rupakhety 2017), these fragility functions were used to construct vulnerability curves, which can be compared to this study. Overall, the existing vulnerability curves compare well to the results of this study as they appear to be included in the prediction intervals constructed here.

Fig. 11 Comparison of the vulnerability curves constructed in this study with existing curves for masonry, RC and timber buildings affected by a single event

The vulnerability model constructed in this study can be used in vulnerability or risk assessment of other regions of Iceland when the earthquake magnitude is similar to or below Mw6.5. At larger magnitudes higher losses can be expected. The PGA may not necessarily become much larger in bigger events but the ground motion intensity will increase with more significant load cycles and longer duration. On the other hand, the model is believed to give conservative loss estimates for lower magnitude earthquakes (Mw < 6.5).

6 Conclusions

This paper describes a new earthquake vulnerability methodology which can be used to predict the seismic losses in areas affected by sequential earthquake events. The proposed methodology adopts an adaptive grid approach that allows a distinction to be made between buildings affected by two events and buildings affected by a single event, even when event loss data is collected after the second earthquake.

The methodology has been applied here to a detailed building-by-building and complete loss database which was recorded in the aftermath of two Mw6.5 earthquakes, which hit South Iceland in 2000. The database covers almost 5000 residential buildings and includes both structural and non-structural losses, excluding the loss to household contents. The ground motion intensity for each building is expressed in terms of PGA and future work is expected to highlight the sensitivity of the methodology to the way used to predict the intensity levels as well as other ground motion intensity measures, which will account for the effect on damage of ground motion frequency content and duration.

The methodology constructed vulnerability curves for areas with known compositions of building types (defined by proportions of buildings with different construction material) according to whether they have been affected by a single or both seismic events. It has been shown that the mean loss for the average building stock composition (55% RC, 36% Timber, 9% Masonry) is less than 10% of the replacement value for all maxPGA levels when exposed to a single event. When the same building stock is exposed to strong ground motion from two events, the losses are on average higher but overall less than 25% of the replacement value. It was also shown that masonry buildings are more vulnerable than both RC and Timber buildings. Finally, significant uncertainty was associated with each prediction of loss for a given maxPGA level, highlighting the need to further explore the source of this uncertainty and how it can be potentially reduced.