1 Introduction

The main purpose of seismic risk assessment is to estimate the probability of occurrence of seismic actions (hazard) and that of the expected damage (risk). Concerning design and retrofitting, the aim is to act on the design and construction of structures to decrease the expected damage, thereby reducing the catastrophic effects of earthquakes on society. Proper assessment and design methodologies require knowledge of how seismic actions, in terms of intensity measures (IMs), affect structural response. Variables representing this response are commonly known as engineering demand parameters (EDPs).

Several IMs have been proposed to better understand the seismic performance of buildings. In this respect, a relevant milestone has been the response spectra theory; see for instance Housner (1941, 1970), Housner and Jennings (1982) and Newmark and Hall (1982). Based on this, IMs considering the damped response of single-degree-of-freedom (SDoF) systems were introduced. This knowledge, combined with the increasing capacity of computers and the collection of strong motion databases, has made it possible to perform probabilistic calculations that were previously unthinkable. In this way, statistical analyses of several spectrum-based IMs and EDPs (obtained through advanced nonlinear structural models) have recently been performed (Padgett et al. 2008; Jalayer et al. 2012; Ebrahimian et al. 2015, 2021; Bojórquez et al. 2017a,b; Pinzón et al. 2020; Du and Padgett 2021; O’Reilly 2021).

Efficiency, sufficiency and steadfastness are analysed and discussed in this article. These statistical properties are desirable for IMs when they are employed to predict the seismic response of civil structures. In short, efficiency is related to the predicting power; sufficiency means that the IM renders the conditional response distribution independent of other earthquake characteristics; steadfastness, introduced herein, is related to stability of efficiency for predicting the behavior of building populations with increasing diversity.

From the statistical point of view, the most desirable feature that an IM should exhibit is efficiency (Shome and Cornell 1999; Giovenale et al. 2004; Luco and Cornell 2007). An IM is considered efficient if it reduces the dispersion of the parameter that represents the structural response (EDP). Efficiency can be quantified through the dispersion of IM-EDP points; the lower this scattering the higher the efficiency. Several IM variables, that have been commonly used to quantify efficiency, are explored herein (Padgett et al. 2008; Ebrahimian et al. 2015; Ebrahimian and Jalayer 2021). In addition, new IMs based on average spectral values are also introduced.

As commented above, a sufficient IM makes the statistical properties of a cloud of IM-EDP pairs independent of other earthquake parameters (mainly those related to the rupture of the Earth's crust at the source of the event). This property can be quantified through the bivariate analysis of the residuals of a cloud of IM-EDP points and a specific seismological parameter (Luco and Cornell 2007). If these variables are not correlated, the IM is sufficient with respect to the seismological parameter. Note that a lack of sufficiency can be seen as an opportunity to increase the efficiency of the IM by applying multi-regression analysis, as will be shown later (Elefante et al. 2010); in this case, the seismological parameters become information variables, IVs. In general, for the purpose of this research, an IV can be any property that varies from simulation to simulation.

The last desirable IM statistical property analysed in this article, which, to the best of the authors’ knowledge, has not heretofore been systematically investigated, is steadfastness with respect to variations of structural properties. This means that the efficiency of steadfast IMs remains stable after grouping results of building models with different physical characteristics. Note that a steadfast IM could be used to reduce the number of structural calculations when estimating seismic risk at urban level.

When assessing seismic risk, the most desirable feature for EDPs is that they quantify, as well as possible, the performance level of structures affected by earthquake-induced ground motions. One of the most widely used EDPs for estimating this performance level is the maximum inter-story drift ratio, MIDR (Mayes 1995). Many other variables, like the maximum displacement at the roof, base shear, maximum floor acceleration (MFA), or damage indices, can be considered as EDPs.

It is worth noting that both IMs and EDPs are random variables showing high dispersion. In the case of IMs, this is due to the complexity of the strong motion influencing a point at the ground level where the acceleration is recorded. Regarding EDPs, randomness comes not only from the seismic input itself, but also from the mechanical properties of the materials, the geometric features of the structure and directionality effects, amongst other aspects (Vargas et al. 2018). Therefore, efficiency, sufficiency and steadfastness should be measured by considering several sources of uncertainty. For instance, if a specific structure is analysed, uncertainties related to the mechanical properties of the materials and the seismic action are the ones carrying more variability into the response (Vamvatsikos and Fragiadakis 2010; Jalayer et al. 2015; Vargas-Alzate et al. 2019a). In the case of urban environments, information related to the geometrical distribution of the buildings in the study area is also required (Silva et al. 2015; Vargas-Alzate et al. 2020). It is worth mentioning that the quantification of statistical properties in simulations that consider random sources of high variability can lead to identifying new properties of the IMs, as in the case of steadfastness.

There are many statistical approaches to extract information from IM-EDP pairs on the expected damage to buildings (Vamvatsikos and Cornell 2002; Jalayer 2003; Jalayer and Cornell 2009; Vargas et al. 2013; Jalayer et al. 2015; Vargas-Alzate et al. 2019a). The main differences between them lie in the sampling strategy used to obtain these pairs, the selection procedure of the strong motion records, and the IMs and EDPs selected. In general, results suggest that the selection of the most suitable IM to analyse the probabilistic distribution of an EDP highly depends on the structural type and seismogenic environment under consideration. In this article, the cloud analysis approach has been considered to analyze the relationship between IMs and EDPs (Jalayer et al. 2015).

Within the framework of the KaIROS project (https://cordis.europa.eu/project/id/799553), several numerical tools to deal with the aforementioned uncertainties have been developed. These tools are used herein to analyse the statistical relationships between a group of IMs and the response of buildings in terms of MIDR and MFA. Specifically, a wide variety of probabilistic numerical models representing the behavior of Reinforced Concrete (RC) frame buildings is used as a testbed (Vargas-Alzate et al. 2019b). In order to characterize seismic hazard, a group of ground motion records has been selected and scaled to get response values within and beyond the elastic limit. Once building models and seismic actions have been characterized, EDPs are calculated using Nonlinear Dynamic Analysis (NLDA) as the numerical tool. The latter is considered the most reliable approach for numerically simulating the seismic response of buildings.

Then, from the ground motion set, a group of basic IMs is obtained and correlated with both EDPs (MIDR and MFA). As pointed out above, several statistical properties have been analysed: (1) the competence to predict EDPs (efficiency); (2) the sufficiency of IMs with respect to seismological parameters; and (3) the ability to group buildings with different features without increasing the variability of the bivariate distribution (steadfastness). It is shown that there is a set of IMs exhibiting efficiency, sufficiency and steadfastness. For insufficient IMs, vector-valued IMs with increased efficiency have been developed (Elefante et al. 2010).

Note that the quantification of the expected damage of buildings can be improved if IM-EDP pairs exhibiting steadfastness and high-level of correlation are considered. This is relevant for assessment and design. For instance, fragility functions obtained from these highly correlated pairs would provide more accurate estimations on the expected damage state of structures. Moreover, safety factors used in design could be better quantified. Thus, this article focuses primarily on identifying ideal IMs, which are the ones that exhibit the highest efficiency and, for practicality, sufficiency and steadfastness.

2 Probabilistic structural models (exposure)

The success of developing suitable building models depends on the quantity and quality of information regarding the mechanical and geometrical properties of the structural elements. When the analysis is oriented to urban environments, it is also important to know about the geographical distribution of as many physical properties as possible. Of great importance is to know, for instance, the number of structures belonging to each structural type, distribution of the number of stories and spans, azimuthal position of buildings (Vargas-Alzate et al. 2021), amongst many others. In addition, the variability level of the mechanical and geometrical properties of the elements, as well as that of acting loads, should be available. A more detailed discussion of several of these random variables is provided below.

2.1 Gravity loads and mechanical properties

As commented above, to properly analyse the probabilistic seismic behavior of a group of buildings, it is necessary to consider the aleatory uncertainties regarding input variables. Concerning gravity loads and mechanical features of the materials, the following properties have been considered as random variables: live loads, LL, permanent loads, DL, compressive strength of concrete, fc, tensile strength of steel, fy, elastic modulus of concrete, Ec, and elastic modulus of steel, Es. Continuous Gaussian distributions have been assumed for these variables; mean values, µ, standard deviations, σ, and coefficient of variation, c.o.v. of each variable are shown in Table 1. It is worth mentioning that properties assigned to LL correspond to a fraction (33%) of the design value (Eurocode 8).

Table 1 Mean values, µ, standard deviations, σ, and coefficient of variation, c.o.v., of the random variables related to the structural models

2.2 Geometrical properties

Concerning geometrical properties, the following random variables have been considered: number of stories, \(N_{st}\), number of spans, \(N_{sp}\), story height, \(H_{st}\), and span length, \(S_{l}\). \(N_{st}\) and \(N_{sp}\) follow a uniform discrete distribution in the interval (3, 13) and (3, 6), respectively; \(H_{st}\) and \(S_{l}\) are distributed uniformly in the interval (2.8, 3.2) m and (4, 6) m, respectively.
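
The sampling of these four variables can be sketched as follows. This is only an illustrative sketch, not the KaIROS implementation: the function name `sample_geometry`, the dictionary keys and the fixed seed are hypothetical, and the end points of the discrete intervals are assumed to be inclusive.

```python
import random

def sample_geometry(rng):
    """Draw one set of geometrical properties for a frame model."""
    return {
        "N_st": rng.randint(3, 13),     # number of stories, discrete uniform (ends assumed inclusive)
        "N_sp": rng.randint(3, 6),      # number of spans, discrete uniform (ends assumed inclusive)
        "H_st": rng.uniform(2.8, 3.2),  # story height in m, continuous uniform
        "S_l": rng.uniform(4.0, 6.0),   # span length in m, continuous uniform
    }

rng = random.Random(0)  # fixed seed, used here only to make the sketch reproducible
models = [sample_geometry(rng) for _ in range(1160)]
```

Each entry of `models` would then feed the rules used to size columns and beams.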

In order to assign the cross-section dimensions of the first story columns, the following equation has been used:

$$W_{c} \;{\text{or}}\; D_{c} = c_{1} *\ln \left( {N_{st} } \right) + c_{2} *S_{l} + c_{3} *{\Phi }_{1,0} + c_{4}$$
(1)

where \(c_{i}\) are coefficients that may be adjusted depending on the data distribution of the analysed area. For this study, \(c_{1} =\) 0.4, \(c_{2} =\) 0.05, \(c_{3} =\) 0.01 and \(c_{4} = -0.35\); \({\Phi }_{1,0}\) represents the standard normal distribution. Note that columns are not necessarily square, that is, one random sample is generated for the width, \(W_{c}\), and one for the depth, \(D_{c}\), of the element (Eq. 1). For upper stories, each side of the columns decreases systematically by 5 cm every three stories. Values generated are rounded to the nearest multiple of 5 cm to be consistent with real dimensions of RC elements.
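
A minimal sketch of Eq. 1 and of the rounding and tapering rules is given below. Several points are assumptions made only for illustration: the result of Eq. 1 is taken to be in metres, the 5 cm reduction is applied in blocks of three stories (stories 1 to 3 unchanged, stories 4 to 6 reduced by 5 cm, and so on), and the function names are hypothetical.

```python
import math
import random

C1, C2, C3, C4 = 0.4, 0.05, 0.01, -0.35  # coefficients reported for this study

def round_to_5cm(length_m):
    """Round a length in metres to the nearest multiple of 0.05 m."""
    return round(round(length_m / 0.05) * 0.05, 2)

def first_story_side(n_st, s_l, phi):
    """Eq. 1 for one side of a first-story column; phi is a standard normal sample."""
    return C1 * math.log(n_st) + C2 * s_l + C3 * phi + C4

def column_sides(n_st, s_l, rng):
    """Return (W_c, D_c) for every story, from the first story upwards."""
    w1 = first_story_side(n_st, s_l, rng.gauss(0.0, 1.0))  # independent sample for the width
    d1 = first_story_side(n_st, s_l, rng.gauss(0.0, 1.0))  # independent sample for the depth
    sides = []
    for story in range(n_st):
        reduction = 0.05 * (story // 3)  # assumed: 5 cm less after every block of three stories
        sides.append((round_to_5cm(w1 - reduction), round_to_5cm(d1 - reduction)))
    return sides

example_sides = column_sides(8, 5.0, random.Random(0))
```

For an 8-story model with 5 m spans and a zero random term, Eq. 1 gives about 0.73 m, which the rounding rule turns into a 0.75 m side.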

The width of beams, \(W_{b}\), depends on the number of stories and span length of the building model, and it has been calculated by using the following equation:

$$W_{b} = b_{1} *N_{st} + b_{2} *S_{l} + b_{3}$$
(2)

where \(b_{i}\) are coefficients that depend on the characteristics of the study zone. For the hypothetical case study, \(b_{1} =\) 0.01, \(b_{2} =\) 0.02 and \(b_{3} =\) 0.19. The depth of the beams, \(D_{b}\), is obtained using the following equation:

$$D_{b} = g_{1} \;{*}\;N_{st} + g_{2} \;{*}\;S_{l} + g_{3}$$
(3)

where \(g_{1} =\) 0.01, \(g_{2} =\) 0.05 and \(g_{3} =\) 0.17. Notice that no random term is considered for beams.
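
As a short worked example of Eqs. 2 and 3, the sketch below evaluates the beam section for a given number of stories and span length. The function name is hypothetical, and the results are assumed to be in metres.

```python
B1, B2, B3 = 0.01, 0.02, 0.19  # coefficients of Eq. 2 (beam width)
G1, G2, G3 = 0.01, 0.05, 0.17  # coefficients of Eq. 3 (beam depth)

def beam_section(n_st, s_l):
    """Deterministic beam width and depth (no random term is used for beams)."""
    w_b = B1 * n_st + B2 * s_l + B3
    d_b = G1 * n_st + G2 * s_l + G3
    return w_b, d_b

w_b, d_b = beam_section(8, 5.0)
```

For an 8-story model with 5 m spans, this gives a beam of about 0.37 m by 0.50 m.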

As pointed out above, within the framework of the KaIROS project, a software package has been developed for probabilistic seismic analyses, using the Ruaumoko program as structural solver (Carr 2000). This package makes it possible to generate probabilistic MDoF systems representing the behavior of RC frame buildings (see also Vargas-Alzate et al. 2019b); this module has been adapted to simulate the seismic response of other structural types located in other seismic environments (Pinzón et al. 2020). Using this algorithm, 1160 structural models have been generated (this number is related to the availability of records within the ground motion database, as will be shown later).

The modified Takeda hysteresis law (Otani 1974) has been used to represent the in-cycle behavior of the structural elements (beams and columns). This law makes it possible to consider strength degradation due to large deformations and an excessive number of inelastic cycles. To do so, a strength degradation function, which depends on the ductility reached by the structural elements, as well as a degrading coefficient associated with the number of inelastic cycles, have been considered in the hysteretic model (see Carr 2000). For stiffness degradation, the degrading factor has been fixed to 0.4. The yielding surfaces are defined by the bending moment-axial load interaction diagram for columns and bending moment–curvature for beams. P-Delta effects have been considered in the numerical model, as the structures analysed may experience deformations beyond the elastic range, sometimes exceeding their capacity to withstand gravity loads.

Coefficients considered in Eqs. (1) to (3), as well as the mechanical property values (Table 1), have been selected to approximate the expected behavior of RC structures located in earthquake-prone regions. In order to verify this, three main aspects from the generated models have been analysed: (1) the fundamental period (\(T_{1}\)); (2) the horizontal stiffness; and (3) the nonlinear behavior given the design spectrum.

Regarding \(T_{1}\), the ones obtained from the generated models have been compared with those computed by means of the classical rough approximation \(T_{1} = N_{st} /10\). Figure 1(Left) shows the evolution of \(T_{1}\) as a function of the number of stories for the 1160 building models; it can be seen that \(T_{1}\) values agree with those expected for the building typology analysed. Figure 1(Right) shows a sketch of one hundred building models generated according to the conditions explained above.

Fig. 1

Left: Variation of the fundamental period as a function of the number of storeys. Right: one-hundred building models generated with KaIROS software

In seismic design, when defining the horizontal stiffness of a building, it is common practice to limit the expected MIDR given a design spectrum (generally for a 475-year return period) provided by the seismic regulation of the area; this MIDR is generally limited to a maximum value of 0.01 (Eurocode 8). The most widely used methodology in practice to estimate this expected MIDR is the response spectrum method (Taghavi and Miranda 2010; Gupta and Hall 2017). Accordingly, the MIDR distribution of the generated models has been estimated by considering as seismic hazard the design spectrum type 1 (Eurocode 8), for a soil type A, and a basic acceleration, \(a_{b}\), equal to 0.4 g. Figure 2 shows the distribution of the MIDRs after applying the response spectrum method; it can be seen that none of the generated models exceeds the threshold of 0.01 given the design level considered.

Fig. 2

MIDR distribution using the response spectrum method

Regarding the nonlinear behavior, the capacity curves of the generated models have been calculated by using the adaptive pushover analysis (Satyarno 2000). This type of structural analysis has been employed herein since it allows an automatic identification of the last point in the capacity curve, \(\delta_{u}\). The longitudinal steel percentages have been assigned to the structural elements so that the life safety damage state, assumed herein to be \(\delta_{u}\), is not exceeded by the expected demand in terms of the performance point, \(\delta_{m}\). It is worth mentioning that, when assigning the reinforcement, minimum steel percentages equal to 1% for columns and 0.5% for beams have been adopted.

Then, the calculated capacity curves are transformed into capacity spectra (see Fig. 3a) and the respective \(\delta_{m}\) values have been obtained by considering the design spectrum described above; procedure A of ATC-40 (ATC-40 1996, see chapter 8) has been employed to obtain \(\delta_{m}\). Thus, it has been checked whether the expected demand exceeds the ultimate displacement capacity of the structure (in terms of \(\delta_{u}\)). To do so, the ratio between \(\delta_{m}\) and \(\delta_{u}\) has been calculated for each of the capacity spectra; it has been found that in only two cases is this ratio slightly higher than one (which indicates that \(\delta_{m}\) exceeds \(\delta_{u}\), see Fig. 3b). In other words, the vast majority of the structures generated do not exceed the limit state related to life safety for the design spectrum with a 475-year return period, which is an important requirement established in current design guidelines.

Fig. 3

Left: capacity spectra vs design spectrum type 1, soils A (Reference). Right: performance ratios

Based on the above verifications, it can be said that the structural models generated have been developed to meet the main design requirements of RC buildings located in earthquake-prone regions.

3 Seismic demand (hazard)

Seismic hazard is an important random variable that should be considered to properly estimate expected damage and risk. In fact, the main source of uncertainty when assessing the seismic response of civil structures is the variability of ground motions. The randomness in both seismic actions and, consequently, in the structural response increases along with the severity of the earthquake.

Nowadays, there are excellent accelerometric databases at global, regional and country scale that have made it possible to quantify uncertainties in seismic hazard. For instance, the Istituto Nazionale di Geofisica e Vulcanologia, INGV, provides the Engineering Strong Motion Database (ESM) and includes a link to the most important accelerometric databases in the world (Luzi et al. 2016a, b).

However, the availability of strong ground motion records covering high-intensity intervals at specific response periods is a common shortcoming found in current databases. Note that an improper sampling of ground motions can trigger excessive scaling of lower intensity records to fit them into high-intensity intervals. Besides, scaling the same record to different intensity levels would introduce false correlation between IMs and EDPs. Therefore, a proper selection of strong motions is paramount to the success of probabilistic structural analyses. The procedure employed to face these issues is detailed below.

There are several methodologies to properly select ground motion records from a database that, for instance, are consistent with a specific site-dependent spectral shape (Haselton et al. 2012). For the purpose of this study, the most important requirement is to have enough ground motion records which are scaled (if necessary) to analyse the generated structural models at different performance levels. Nonetheless, it is also important to avoid excessive scaling of the records so that bias introduced will be negligible. According to this, the main steps followed to pick and set up a proper suite of accelerograms are described in the following:

  • Step 1 Choose the strong ground motion database. The ESM database (Luzi et al. 2016a, b) compiled by the INGV has been used to select the ground motion records; this database contains records of earthquakes that occurred in Europe

  • Step 2 Identify the IM for scaling the ground motion record set

  • Step 3 Define suitable intensity bands for the selected IM

  • Step 4 Compute the selected IM for each ground motion record

  • Step 5 Sort the records in descending order as a function of the calculated IM values

  • Step 6 If needed, the ground motion record with the highest IM value is scaled so that it falls into the highest intensity band; if this record naturally fulfills the interval condition, it is not scaled; this step is repeated with the subsequent records until having enough ground motion inputs belonging to this interval

  • Step 7 Step 6 is repeated for each interval considered; the scale factors estimated in Step 6 (if any) are calculated so that IM values tend to be uniformly distributed within the intensity band
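
Steps 5 to 7 can be sketched as follows. This is one possible reading of the procedure, not the code used in the study: it assumes that the IM scales linearly with the record scale factor, that each record is used only once, and that scaled records are aimed at evenly spaced targets inside the band. The function name, arguments and the synthetic IM values are hypothetical.

```python
import random

def select_and_scale(im_values, band_edges, n_per_band):
    """Assign records to intensity bands, scaling only when needed.

    im_values  : IM of each unscaled record (positive values)
    band_edges : ascending band limits, e.g. [0.0, 0.1, 0.2, ...]
    n_per_band : number of records wanted in each band
    Returns a list of (record_index, band_index, scale_factor).
    """
    # Step 5: sort record indices by descending IM
    order = sorted(range(len(im_values)), key=lambda i: im_values[i], reverse=True)
    n_bands = len(band_edges) - 1
    if len(order) < n_bands * n_per_band:
        raise ValueError("not enough records to use each one only once")
    picks, cursor = [], 0
    for band in range(n_bands - 1, -1, -1):  # Steps 6 and 7: highest band first
        lo, hi = band_edges[band], band_edges[band + 1]
        for k in range(n_per_band):
            idx = order[cursor]
            cursor += 1
            im = im_values[idx]
            if lo < im <= hi:
                factor = 1.0  # the record already lies in the band, so it is not scaled
            else:
                # assumed rule: evenly spaced targets so that IMs spread over the band
                target = hi - (k + 0.5) * (hi - lo) / n_per_band
                factor = target / im
            picks.append((idx, band, factor))
    return picks

rng = random.Random(0)
ims = [rng.lognormvariate(-3.0, 1.0) for _ in range(200)]  # synthetic IM values, illustration only
edges = [0.0, 0.1, 0.2, 0.3, 0.4]
picks = select_and_scale(ims, edges, n_per_band=10)
```

Because every record is consumed once, no record is scaled to two different intensity levels, which is the source of spurious IM-EDP correlation that the procedure tries to avoid.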

It seems reasonable to select (Step 2) an IM highly correlated with the structural response so that, by increasing or decreasing this variable, the structural response changes accordingly. Therefore, it is more appropriate to consider IMs calculated from the spectral response of SDoF systems, since they have been shown to be highly correlated with the MIDR (Miranda 2000). In this respect, several researchers have shown that it is more efficient to use an IM based on the geometric mean of spectral acceleration values around the fundamental period, \({Sa}_{avg}\), than to use only the spectral acceleration value associated with this period, \(Sa({T}_{1})\) (Bianchini et al. 2009; Bojórquez and Iervolino 2011; Kazantzi and Vamvatsikos 2015; Eads et al. 2015; Adam et al. 2017). Analogous to this concept, the arithmetic mean of spectral acceleration values around the fundamental period, AvSa, has been considered herein as the IM to select and scale ground motion records. The arithmetic mean has been preferred over the geometric one since it tends to provide a more efficient and steadfast IM, as will be shown later.

Because of the probabilistic approach adopted in this study, there is not a single structural model but a group of them which, in addition, have highly variable fundamental periods (see Fig. 1). Therefore, the period range for averaging the spectral ordinates of the IM is established from the dynamic properties of the entire population of buildings (Vargas-Alzate et al. 2019b). In this way, ground motion records whose mean spectral acceleration in the interval (0.2–1.6) s lies in a band limited by two intensity levels are selected and scaled. This interval corresponds to 0.7 times the minimum and 1.12 times the maximum period observed in Fig. 1a; after several runs, these limits are the ones minimizing the dispersion in most of the IM-EDP clouds analysed. Note that these limits tend to be broader when analysing structures sharing a similar number of stories.
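
The two averaged IMs discussed above can be compared with a short sketch. It assumes that the response spectrum is available at equally spaced periods, so that a plain mean of the ordinates inside the window is adequate; the function name and the example spectrum (a simple 1/T decay) are hypothetical.

```python
import math

def average_sa(periods, sa, t_min=0.2, t_max=1.6):
    """Arithmetic mean (AvSa) and geometric mean (Sa_avg) of Sa over [t_min, t_max]."""
    window = [s for t, s in zip(periods, sa) if t_min <= t <= t_max]
    if not window:
        raise ValueError("no spectral ordinates inside the averaging window")
    av_sa = sum(window) / len(window)
    sa_avg = math.exp(sum(math.log(s) for s in window) / len(window))
    return av_sa, sa_avg

periods = [round(0.1 * k, 1) for k in range(1, 21)]  # 0.1 s to 2.0 s
sa = [0.5 / t for t in periods]                      # synthetic spectrum, illustration only
av_sa, sa_avg = average_sa(periods, sa)
```

The arithmetic mean is never smaller than the geometric one, so AvSa gives relatively more weight to the largest ordinates in the window.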

The intensity levels (Step 3) defining the upper and lower limits of each band range from 0.035 to 0.7 g at intervals of 0.035 g (i.e., 20 bands are defined). The highest limit has been selected so that approximately 10% of the models exceed their capacity to withstand gravity loads.
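
The band layout can be written out as a quick arithmetic check. The sketch assumes that each intensity level is the upper limit of a band whose lower limit is the previous level, with the first band starting at 0 g; the variable names are hypothetical, and the value of 58 records per band is the one reported for the selected set.

```python
STEP = 0.035  # band width in g

levels = [round(STEP * k, 3) for k in range(1, 21)]   # 0.035, 0.070, ..., 0.700 g
bands = [(round(hi - STEP, 3), hi) for hi in levels]  # assumed: first band starts at 0 g
records_per_band = 58
total_records = len(bands) * records_per_band
```

With 20 bands and 58 records per band, the total is 1160 records, one per structural model.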

By means of the selection and scaling process described above, the horizontal component related to the east–west direction of 1160 earthquake records (58 records per band) of the ESM database have been obtained (the entire database contains 1163 ground motion records). Figure 4 (Left) shows the response spectra of the 1160 selected records. The distribution of magnitudes, M, and epicentral distances, ED, of these records is shown in Fig. 4 (Right).

Fig. 4

Left: Response spectra of the 1160 selected and scaled accelerograms. Right: Distribution of magnitudes and epicentral distances

4 Intensity measures

Increased computing power and mathematical background, along with proper numerical models, make it possible to extract valuable information on variables related to strong seismic actions (IMs) as well as on the response of civil structures (EDPs). Ideally, an IM should contain enough information about the ground motion so that the structural response can be predicted with confidence (Pejovic and Jankovic 2015). Note that an IM may depend on the properties of the ground motion alone, or on both the ground motion and the structural characteristics.

4.1 Spectral-based intensity measures

The dynamic response of structures subjected to ground motions induced by earthquakes has been correlated to the peak response of equivalent SDoF systems. The study of this simplified model gave rise not only to the response spectra but also to spectral IMs. It has been recognized that efficient IMs would be defined by response spectral ordinates. This is why response spectral ordinates are extensively used to quantify seismic hazard at a site. Such ordinates are calculated from the maximum time-history response after solving the dynamic equilibrium equation for SDoF systems given a set of specific period values:

$$m\ddot{u}\left( t \right) + c\dot{u}\left( t \right) + ku\left( t \right) = - m\ddot{u}_{g} \left( t \right)$$
(4)

where \(\ddot u\left( t \right)\), \(\dot{u}\left( t \right)\) and \(u\left( t \right)\) are the acceleration, velocity and displacement time-history responses of the SDoF, respectively; \({\ddot u_g}\left( t \right)\) is the acceleration ground motion; m, c, and k represent mass, damping and stiffness of the system, respectively. IMs from 1 to 6 described in Table 2 belong to this category.
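
A minimal sketch of how spectral ordinates can be obtained from Eq. 4 is shown below. It integrates the equation per unit mass with Newmark's average acceleration method (γ = 1/2, β = 1/4), which is an assumption made here for illustration and not necessarily the solver used in the study. The function names and the sinusoidal input are hypothetical.

```python
import math

def sdof_displacement(ag, dt, period, damping=0.05):
    """Relative displacement history of a linear SDoF system (Eq. 4 divided by m)."""
    w = 2.0 * math.pi / period
    k, c = w * w, 2.0 * damping * w  # stiffness and damping per unit mass
    gamma, beta = 0.5, 0.25
    u, v, a = 0.0, 0.0, -ag[0]       # system at rest; initial acceleration from equilibrium
    k_hat = k + gamma * c / (beta * dt) + 1.0 / (beta * dt * dt)
    history = [u]
    for i in range(len(ag) - 1):
        p_hat = (-ag[i + 1]
                 + u / (beta * dt * dt) + v / (beta * dt) + (0.5 / beta - 1.0) * a
                 + c * (gamma * u / (beta * dt) + (gamma / beta - 1.0) * v
                        + dt * (0.5 * gamma / beta - 1.0) * a))
        u_new = p_hat / k_hat
        v_new = (gamma * (u_new - u) / (beta * dt) + (1.0 - gamma / beta) * v
                 + dt * (1.0 - 0.5 * gamma / beta) * a)
        a_new = (u_new - u) / (beta * dt * dt) - v / (beta * dt) - (0.5 / beta - 1.0) * a
        u, v, a = u_new, v_new, a_new
        history.append(u)
    return history

def pseudo_acceleration_spectrum(ag, dt, periods, damping=0.05):
    """Sa(T) = (2*pi/T)^2 * max|u(t)| for each period."""
    spectrum = []
    for period in periods:
        sd = max(abs(x) for x in sdof_displacement(ag, dt, period, damping))
        spectrum.append((2.0 * math.pi / period) ** 2 * sd)
    return spectrum

dt = 0.01
ag = [math.sin(2.0 * math.pi * i * dt) for i in range(1001)]  # synthetic 1 Hz input, 1 m/s^2
sa = pseudo_acceleration_spectrum(ag, dt, periods=[0.2, 1.0])
```

For this synthetic input, the oscillator tuned to the excitation period (T = 1 s) shows a much larger ordinate than the stiff one (T = 0.2 s), which is the resonance effect that spectral IMs are meant to capture.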

Table 2 Intensity measures description

4.2 Energy-based intensity measures

For both the design and the performance assessment of civil structures, the peak response of physical quantities like displacement, velocity or acceleration has been widely used. However, several researchers have found that the expected damage of structures is strongly tied to the amount of energy introduced to the system (see for instance Benavent-Climent et al. 2004; Yazgan 2012; Cheng et al. 2015; Güllü et al. 2019). In this respect, the equivalent velocity spectrum is a representation of the amount of energy introduced to a set of SDoF systems. This energy can be calculated by rewriting Eq. 4 in terms of energy. That is, each term of this equation is multiplied by the differential increment of displacement (\(\dot{u}dt\)) and then integrated over the time interval (0, t), as follows (Akiyama 1985; Bertero and Uang 1992):

$$m{\int }_{0}^{t}\ddot{u}\dot{u}\;dt+c{\int }_{0}^{t}{\dot{u}}^{2}\;dt+k{\int }_{0}^{t}u\dot{u}\;dt=-m{\int }_{0}^{t}{\ddot{u}}_{g}\dot{u}\;dt$$
(5)

where \({E}_{k}=m{\int }_{0}^{t}\ddot{u}\dot{u}\;dt\) is the kinetic energy; \({E}_{\xi }=c{\int }_{0}^{t}{\dot{u}}^{2}\;dt\) is the energy dissipated by the inherent damping; \({E}_{a}=k{\int }_{0}^{t}u\dot{u}\;dt\) is the energy absorbed by the system; and \({E}_{I}=-m{\int }_{0}^{t}{\ddot{u}}_{g}\dot{u}\;dt\) is the energy introduced into the system by the ground motion. The latter, normalized with respect to the mass of the structure, is commonly expressed in terms of equivalent velocity, \(VE\), as follows:

$$VE(T_{1})=\sqrt{2{E}_{I}(T_{1})}$$
(6)

Calculating \(VE\) for several elastic oscillators will produce the equivalent velocity spectra. IMs 7 and 8 described in Table 2 are obtained from the energy introduced into the system.
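
The energy balance of Eq. 5 and the equivalent velocity of Eq. 6 can be illustrated with the sketch below for a linear SDoF system. All quantities are taken per unit mass, the response is integrated with Newmark's average acceleration method, and the energy terms are evaluated at the end of the record; these are assumptions made for this sketch, and the function names and the sinusoidal input are hypothetical.

```python
import math

def input_energy_terms(ag, dt, period, damping=0.05):
    """Energy terms of Eq. 5 per unit mass at the end of the record."""
    w = 2.0 * math.pi / period
    k, c = w * w, 2.0 * damping * w
    u, v, a = 0.0, 0.0, -ag[0]
    e_input = e_damping = 0.0
    k_hat = k + 2.0 * c / dt + 4.0 / (dt * dt)  # Newmark with gamma = 1/2, beta = 1/4
    for i in range(len(ag) - 1):
        p_hat = -ag[i + 1] + (4.0 / (dt * dt) + 2.0 * c / dt) * u + (4.0 / dt + c) * v + a
        u_new = p_hat / k_hat
        v_new = 2.0 * (u_new - u) / dt - v
        a_new = 4.0 * (u_new - u) / (dt * dt) - 4.0 * v / dt - a
        v_mid = 0.5 * (v + v_new)
        e_input += -0.5 * (ag[i] + ag[i + 1]) * (u_new - u)  # work of -ag over the step
        e_damping += c * v_mid * v_mid * dt                   # damping dissipation over the step
        u, v, a = u_new, v_new, a_new
    return {"E_I": e_input, "E_xi": e_damping, "E_k": 0.5 * v * v, "E_a": 0.5 * k * u * u}

def equivalent_velocity(ag, dt, period, damping=0.05):
    """VE = sqrt(2 * E_I), with E_I per unit mass (Eq. 6)."""
    e_input = input_energy_terms(ag, dt, period, damping)["E_I"]
    return math.sqrt(2.0 * max(e_input, 0.0))

dt = 0.01
ag = [math.sin(2.0 * math.pi * i * dt) for i in range(1001)]  # synthetic 1 Hz input, 1 m/s^2
terms = input_energy_terms(ag, dt, period=0.8)
ve = equivalent_velocity(ag, dt, period=0.8)
```

With this step-wise evaluation, the input energy equals the sum of the kinetic, damping and absorbed terms up to round-off, which gives a simple consistency check on the integration.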

4.3 IMs based on direct computations of the ground motion record

The IMs presented above take into account the spectral response of SDoF systems. In other words, they consider not only the characteristics of ground motions but also some related to the structural properties. However, several IMs can be obtained from direct computations of the ground motion record. The appeal of this type of IM is that it does not depend on the dynamic properties of the structures. In this way, a generic fragility function could be developed to estimate the seismic performance of buildings with very different structural properties. However, this independence from the structural properties may decrease the efficiency for predicting EDPs. IMs from 9 to 18 described in Table 2 belong to this category.

5 Statistical analysis of IM-EDP pairs

Once the seismic hazard—intensity measures—and exposure—building models—have been defined and characterized, 1160 NLDAs have been performed. The Ruaumoko software has been used to perform the structural analyses (Carr 2000).

It has been found that there are a number of simulations in which the capacity of the structure to withstand gravity loads is exceeded. In other words, the outcome of the analysis indicates structural collapse. These results are considered to be governed by chaos. In a few words, chaos is a phenomenon in which even deterministic models may lead to unpredictable outcomes due to the sensitivity of the governing dynamic equations to initial conditions. It has been observed that MIDR diverges after a collapse mechanism is activated. This fact agrees with the behavior of chaotic systems, which are predictable for a short time and then become excessively random. After analysing the response of building models exhibiting collapse, it has been noticed that this short time is in the order of a few hundredths of a second. A deeper view on chaotic systems and their properties can be found in Strogatz (2018). Thus, results related to collapse have been excluded from the statistical analysis. These outcomes have been identified as simulations yielding a Park and Ang damage index higher than 1 (Park and Ang 1985). In recent and current research, this criterion has been shown to be very efficient in detecting structural collapses (Vargas-Alzate et al. 2019b).

Note that, if collapses were identified as those samples exceeding a MIDR (or MFA) limit, it would be advisable to split the data into censored and uncensored observations (see for instance Jalayer and Cornell 2009; Martins and Silva 2020). For the sake of simplicity, samples exhibiting collapse, according to the damage index limit described above, have been excluded from the statistical analysis. A potential problem of this simplification is that a significant number of the removed samples may have very similar IM values, which may affect the statistical analysis (Aschheim et al. 2019). To check whether this problem occurs, the coefficient of variation of each of the IMs related to the excluded samples has been calculated; the minimum observed value has been 0.23, which is significant with respect to that of the entire population (0.56), meaning there is no concentration around specific IM values. In any case, note that the number of excluded simulations could be reduced if the upper limit of the IM for scaling the ground motion records were lowered. In addition, simulations in which PGA > 1.5 g, PGV > 3 m/s, PGD > 1 m or \(Sa\left( {T_{1} } \right)\) > 4 g have also been excluded from the statistical analyses. Thus, 119 (≈10%) out of the 1160 generated samples have been excluded from the entire dataset. Consequently, 1041 cases are used to perform the statistical analysis of the IMs. Of these, 141 are results in the elastic domain and 900 are results in the inelastic domain.
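
The exclusion rules and the dispersion check can be sketched as follows. The dictionary keys, the three example samples and the use of the population standard deviation in the coefficient of variation are assumptions made only for illustration; accelerations are taken in m/s².

```python
import statistics

G = 9.81  # m/s^2

def is_excluded(sample):
    """True when a simulation is left out of the statistical analysis."""
    return (sample["park_ang"] > 1.0     # collapse according to the damage index
            or sample["pga"] > 1.5 * G   # m/s^2
            or sample["pgv"] > 3.0       # m/s
            or sample["pgd"] > 1.0       # m
            or sample["sa_t1"] > 4.0 * G)

def coefficient_of_variation(values):
    """Population standard deviation divided by the mean."""
    return statistics.pstdev(values) / statistics.mean(values)

samples = [  # synthetic values, illustration only
    {"park_ang": 0.4, "pga": 3.0, "pgv": 0.5, "pgd": 0.1, "sa_t1": 8.0},
    {"park_ang": 1.3, "pga": 6.0, "pgv": 1.2, "pgd": 0.3, "sa_t1": 15.0},  # collapse
    {"park_ang": 0.7, "pga": 16.0, "pgv": 1.0, "pgd": 0.2, "sa_t1": 20.0},  # PGA above 1.5 g
]
kept = [s for s in samples if not is_excluded(s)]
excluded = [s for s in samples if is_excluded(s)]
cov_excluded_pga = coefficient_of_variation([s["pga"] for s in excluded])
```

A low coefficient of variation among the excluded samples would signal that they cluster around specific IM values, which is the situation the check is meant to rule out.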

In the following, the resulting clouds of IM-EDP points are used to analyse several statistical properties related to the data variability.

5.1 Efficiency

Multi-regression models make it possible to define the optimal combination of several \(IV\)s to explain an EDP. In this study, IV-EDP relationships have been characterized by means of both multi-linear and multi-nonlinear regression models in the log–log space. As with linear least squares, nonlinear regression is based on determining values that minimize the sum of the squares of the residuals. In this sense, the following general linear least-square model allows several types of multi-regression analysis with respect to \(y\):

$$y = \alpha_0+\mathop \sum \limits_{i = 1}^{{N_{IV} }} \mathop \sum \limits_{j = 1}^{n} \alpha_{{j + \left( {i - 1} \right)*n}} z_{j} + \varepsilon$$
(7)

where \(N_{IV}\) represents the number of \(IV\)s considered in the regression model; n represents the polynomial degree of the model; \(\alpha_{0}\), \(\alpha_{1}\), …, \(\alpha_{{n* {N_{IV}}}}\) are the coefficients providing the best fit between model and data; \(z_{j}\) are basis functions; \(\varepsilon\) represents the residuals. It can easily be seen that polynomial regression falls within this model, with \(z_{1} = x\), \(z_{2} = x^2\), …, \(z_{n} = x^{n}\) (Chapra 2017). Substituting \(y = \ln EDP\) and \(z_{j} = \left( {\ln IV_{i} } \right)^{j}\) in Eq. 7, the general linear least-squares model using polynomial functions can be used to extract statistical information from IV-EDP pairs according to the following equation (Eq. 8):

$$\ln EDP = \alpha_0+\mathop \sum \limits_{i = 1}^{{N_{IV} }} \mathop \sum \limits_{j = 1}^{n} \alpha_{{j + \left( {i - 1} \right)*n}} \left( {\ln IV_{i} } \right)^{j} + \varepsilon$$
(8)

This equation allows multi-linear (n = 1) and -quadratic (n = 2) regression models in the log–log space. In this way, several sources of information can be considered simultaneously to better predict a specific EDP. Further information regarding the development and implementation of this type of polynomial models in the log–log space can be found in Chapra (2017).
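As an illustration of Eq. 8, the following minimal sketch fits the polynomial model in the log–log space by solving the normal equations with plain Python. It is not the implementation used in this study: the function names, the hand-written Gaussian elimination and the synthetic data are assumptions made only for this example, and a QR- or SVD-based solver would be preferable for ill-conditioned problems.

```python
import math


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b]; rows are copied so the inputs stay untouched.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def design_row(iv_values, n):
    """Row [1, ln IV_1, ..., (ln IV_1)^n, ln IV_2, ...], ordered as in Eq. 8."""
    row = [1.0]
    for iv in iv_values:
        log_iv = math.log(iv)
        row.extend(log_iv ** j for j in range(1, n + 1))
    return row


def fit_loglog_polynomial(iv_samples, edp_samples, n=1):
    """Least-squares coefficients alpha_0 ... alpha_{n*N_IV} of Eq. 8."""
    rows = [design_row(ivs, n) for ivs in iv_samples]
    y = [math.log(edp) for edp in edp_samples]
    k = len(rows[0])
    # Normal equations: (Z^T Z) alpha = Z^T y.
    ztz = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    zty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve_linear_system(ztz, zty)


# Hypothetical, noise-free data generated from known coefficients (N_IV = 1, n = 2).
im_values = [0.05, 0.1, 0.2, 0.4, 0.8, 1.6]
edp_values = [math.exp(-4.0 + 1.2 * math.log(v) + 0.1 * math.log(v) ** 2)
              for v in im_values]
alpha = fit_loglog_polynomial([[v] for v in im_values], edp_values, n=2)
print([round(a, 6) for a in alpha])
```

Each sample is passed as a list of \(IV\) values, so the same function covers the single-IM case (Eq. 9) and the vector-valued arrangements.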

Thus, for \(N_{IV}\) = 1 Eq. 8 adopts the following form:

$$\ln EDP = \alpha_0+\mathop \sum \limits_{j = 1}^{n} \alpha_{j} \left( {\ln IM} \right)^{j} + \varepsilon$$
(9)

\(IV_{1}\) becomes IM in this equation.

In terms of the residuals, \(\varepsilon\), there are several variables that can be used to quantify the variability of IM-EDP clouds. For instance, Ebrahimian et al. (2015) considered the logarithmic standard deviation of the regression, \(S_{y/x}\), and the coefficient of determination, \(R^{2}\), to quantify the efficiency of IMs (for a perfect fit, \(S_{y/x}\) = 0 and \(R^{2}\) = 1, signifying that the adopted function explains 100% of the data variability). These two variables (\(R^{2}\) and \(S_{y/x}\)) are used herein to estimate the variability when analysing relationships involving IM-EDP pairs. The higher \(R^{2}\) or the lower \(S_{y/x}\), the lower the variability when predicting an EDP given an IM.

A better-known and structure-specific measure of efficiency is the logarithmic standard deviation of the fragility curve, \(\beta\), called proficiency (i.e., the logarithmic standard deviation of the regression divided by the slope of the regression; see Padgett et al. (2008) and Ebrahimian and Jalayer (2021)). This measure has also been calculated to analyse the efficiency of the IMs. In summary, the IM providing the highest \(R^{2}\) and the lowest \(S_{y/x}\) and \(\beta\) is the most efficient one, in the sense that it reduces the variability in seismic risk estimations. In the following, the variables described in Sect. 4 are used as basic IMs and MIDR as the EDP.
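The three efficiency measures can be sketched for the linear case (n = 1 in Eq. 9) as follows. This is only an illustrative example: the function name and the sample values are hypothetical, and \(S_{y/x}\) is computed here with n − 2 degrees of freedom, which is an assumption of the sketch rather than a detail stated above.

```python
import math


def efficiency_metrics(im, edp):
    """Fit ln EDP = a0 + a1 ln IM and return R^2, S_y/x and beta."""
    x = [math.log(v) for v in im]
    y = [math.log(v) for v in edp]
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - y_mean) ** 2 for yi in y)
    r2 = 1.0 - ss_res / ss_tot
    # Assumption: standard error of the estimate with n - 2 degrees of freedom.
    s_yx = math.sqrt(ss_res / (n - 2))
    # Proficiency: dispersion of the regression divided by its slope.
    beta = s_yx / slope
    return {"slope": slope, "intercept": intercept,
            "r2": r2, "s_yx": s_yx, "beta": beta}


# Hypothetical IM-EDP pairs, chosen only to exercise the function.
im_values = [math.exp(v) for v in (0.0, 1.0, 2.0, 3.0)]
edp_values = [math.exp(v) for v in (0.1, 0.9, 2.1, 2.9)]
print(efficiency_metrics(im_values, edp_values))
```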

Results from two types of regression models have been compared (linear and nonlinear, i.e. n = 1 and n = 2 in Eq. 9, respectively). The reason for considering a quadratic regression model is that several IM-EDP pairs seem to exhibit a nonlinear dependence. In this regard, several authors have called into question the use of linear regression when characterizing these pairs (e.g. Baker 2007). It has been stated that a linear relationship may not be reasonable for the entire IM range of interest. To address this problem, the regression should be limited to the IM range where the linear form is reasonable, or piecewise linear relationships may be fitted (e.g. Mackie and Stojadinovic 2003). Herein, a quadratic regression model has been fitted instead. It is worth mentioning that, when an IM-EDP set indicates that a linear fit is adequate to characterize the relationship over the entire interval, the quadratic model automatically drives the coefficient of the nonlinear term close to zero.

Figure 5 shows the relationships between IM-EDP pairs in the log–log space for all the basic IMs described in Table 2. \(R^{2}\) values for the linear (\(R_{L}^{2}\)) and nonlinear (\(R_{NL}^{2}\)) regression models are presented in this figure. Table 3 summarizes and complements the results shown in Fig. 5; subscripts ‘L’ and ‘NL’ in the \(R^{2}\) and \(S_{y/x}\) variables stand for the linear and nonlinear regression models, respectively.

Fig. 5

\({R}^{2}\) calculated with both linear and nonlinear regression models; MIDR has been considered as EDP

Table 3 Summary of the statistical analysis considering MIDR as the EDP

It can be observed that IMs related to the velocity are the most efficient ones for predicting the MIDR of the analysed systems. It is also important to note that most of these variables are well fitted by both linear and nonlinear regression models. In other words, no significant variation of the dispersion has been observed between the two models. In any case, note that the polynomial arrangement presented in Eq. 9 allows fitting linear functions if that is what the data suggest; for these cases, \(\alpha_{2}\) tends to zero. Nevertheless, although similar levels of variability are observed for most of the IMs used (i.e. between linear and nonlinear regression models), it will be shown that including the nonlinear term in the regression model strongly influences the derivation of analytical fragility functions. It can also be seen in Table 3 that AvSv is the most efficient IM for explaining the MIDR.

5.2 Sufficiency

A sufficient IM makes the statistical properties of a cloud of IM-EDP pairs independent of other parameters related to the rupture (e.g., magnitude, M, epicentral distance, ED, etc.). This property can be defined through the correlation between the residuals of a cloud of IM-EDP pairs and the seismological parameter of interest, SP; the higher this correlation the lower the sufficiency of the IM.

The p value between the aforementioned residuals and SPs has been used to verify the sufficiency of the IMs presented in Fig. 5 (Luco and Cornell 2007). Accordingly, the p values between the residuals of each IM-EDP cloud and the magnitude (\({pv}_{M}\)) and epicentral distance (\({pv}_{ED}\)) have been calculated; Table 3 also shows these results. It can be seen that most of the IMs present p values lower than 0.05, meaning that they are not sufficient with respect to the corresponding SP. These p values have been calculated from the residuals provided by the linear regression model. However, similar conclusions can be drawn when the quadratic regression approach is used.

Regarding the group of IMs exhibiting sufficiency, only AvSd and VE(T1) meet this condition simultaneously with respect to M and ED; the most efficient of them is AvSd.
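A sufficiency check of this kind can be sketched as a significance test on the slope obtained when the regression residuals are regressed on the seismological parameter. The sketch below is a simplified illustration, not the procedure used to build Table 3: the function name and sample values are hypothetical, and the p value uses a normal approximation to Student's t distribution, which is an assumption that is only reasonable for large samples such as the 1041 cases considered here.

```python
import math


def sufficiency_p_value(residuals, sp):
    """Two-sided p value for the slope of residuals regressed on sp.

    Assumption: normal approximation to Student's t (large samples).
    """
    n = len(residuals)
    x_mean = sum(sp) / n
    e_mean = sum(residuals) / n
    sxx = sum((x - x_mean) ** 2 for x in sp)
    sxy = sum((x - x_mean) * (e - e_mean) for x, e in zip(sp, residuals))
    slope = sxy / sxx
    intercept = e_mean - slope * x_mean
    ss_res = sum((e - intercept - slope * x) ** 2 for x, e in zip(sp, residuals))
    std_err = math.sqrt(ss_res / (n - 2) / sxx)
    if std_err == 0.0:
        # Degenerate case: the residuals lie exactly on a straight line.
        return 0.0 if slope != 0.0 else 1.0
    z = slope / std_err
    return math.erfc(abs(z) / math.sqrt(2.0))


# Hypothetical magnitudes and residuals, for illustration only.
magnitudes = [5.0, 6.0, 7.0, 8.0, 9.0]
trend_free = [0.1, -0.2, 0.2, -0.2, 0.1]     # no trend with magnitude
trending = [-0.21, -0.09, 0.01, 0.09, 0.21]  # clear trend with magnitude
print(sufficiency_p_value(trend_free, magnitudes))
print(sufficiency_p_value(trending, magnitudes))
```

With the 0.05 threshold used above, the first set of residuals would be read as sufficient with respect to the magnitude and the second as not sufficient.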

Lack of sufficiency indicates that some IMs can gain efficiency when combined in a multi-regression model (see Eq. 8) with the respective seismological parameter (Elefante et al. 2010). This is analogous to the concept of vector-valued IMs (Baker 2007). Several researchers have applied this refinement to develop enhanced IMs (IM*), which consist of a vector of IVs providing more efficient estimates of EDPs (Modica and Stafford 2014; Yakhchalian et al. 2015, 2021; Bojórquez et al. 2015; Ge and Zhou 2018; Zengin and Abrahamson 2020). The amount of efficiency that can be gained by an IM depends on the correlation between the residuals of the IM-EDP cloud and the SP: the higher this correlation, the higher the efficiency that can be gained.

Thus, for n = 1 and \({N}_{IV}\)=2, Eq. 8 can be used to include SPs in the linear regression analysis as follows:

$$\mathrm{ln}\left(EDP\right)={\alpha }_{0}+{\alpha }_{1}*\mathrm{ln}\left(IM\right)+{\alpha }_{2}*\mathrm{ln}(SP)+\varepsilon$$
(10)

For n = 2 and \({N}_{IV}\)=2, Eq. 8 adopts the following form:

$$\mathrm{ln}\left(EDP\right)={\alpha }_{0}+{\alpha }_{1}*\mathrm{ln}\left(IM\right)+{\alpha }_{2}*({\mathrm{ln}(IM))}^{2}+{\alpha }_{3}*\mathrm{ln}\left(SP\right)+{\alpha }_{4}*({\mathrm{ln}(SP))}^{2}+\varepsilon$$
(11)

According to these arrangements, \({R}^{2}\), \({S}_{y/x}\) and \(\beta\) are recalculated by replacing SP with M and with ED in Eqs. 10 and 11; Tables 4 and 5 show the respective results. Note that the proficiency terms (\(\beta\)) coming from multiple regression models have been calculated by considering that the slope of the linear regression analysis is \({\alpha }_{1}\) in Eqs. 10 and 11.

Table 4 Summary of the statistical analysis considering MIDR as the EDP
Table 5 Summary of the statistical analysis considering MIDR as the EDP

It is important to note that several of the IM*s that include M in the arrangements presented in Eqs. 10 or 11 become sufficient with respect to ED; this is less clear when ED is included in these equations (i.e. the resultant vector-valued IM is not sufficient with respect to the magnitude, see Table 5).

According to Tables 4 and 5, some IM*s could gain even more efficiency if an enhanced arrangement combining both M and ED together with the basic IM is used (i.e., \({N}_{IV}\)=3; n = 1, 2 in Eq. 8). For n = 1, the following linear regression model has been employed:

$$\mathrm{ln}\left(EDP\right)={\alpha }_{0}+{\alpha }_{1}*\mathrm{ln}\left(IM\right)+{\alpha }_{2}*\mathrm{ln}(M)+{\alpha }_{3}*\mathrm{ln}(ED)+\varepsilon$$
(12)

For the quadratic regression case (i.e. n = 2):

$$\mathrm{ln}\left(EDP\right)={\alpha }_{0}+{\alpha }_{1}*\mathrm{ln}\left(IM\right)+{\alpha }_{2}*({\mathrm{ln}(IM))}^{2}+{\alpha }_{3}*\mathrm{ln}\left(M\right)+{\alpha }_{4}*({\mathrm{ln}(M))}^{2}+{\alpha }_{5}*\mathrm{ln}\left(ED\right)+{\alpha }_{6}*({\mathrm{ln}(ED))}^{2}+\varepsilon$$
(13)

Table 6 shows the statistical properties of the IM*s by using the regression models presented in Eqs. 12 and 13. p values have been calculated again by considering the linear regression model. \(\beta\) has been calculated by using \({\alpha }_{1}\) as the slope of the linear regression arrangement. It is worth noting that, for all the multi-regression models used, AvSv has been present in the vector that forms the most efficient IM*.

Table 6 Summary of the statistical analysis considering MIDR as the EDP

5.3 Steadfastness

In Table 3, it can be seen that, except for AvSa, IMs defined in terms of average spectral values are the best correlated with MIDR. In general, IMs related to the spectral acceleration do not correlate well with MIDR. To further analyse this somewhat unexpected result, the efficiency measures have been recalculated by creating sub-categories according to the number of stories (see Fig. 6, left). For each sub-category, \({{R}}^{2}\), \({{S}}_{{y}/{x}}\) and \(\beta\) values have been obtained by using the quadratic regression model.

Fig. 6

Steadfastness analysis. Left: \({R}^{2}\), \({S}_{y/x}\) and \(\beta\) values for buildings grouped according to the number of storeys. Right: \({R}^{2}\), \({S}_{y/x}\) and \(\beta\) values for buildings grouped according to the accumulated number of stories. The EDP analysed has been MIDR

A significant increase of \({R}^{2}\), as well as a reduction of \({S}_{y/x}\) and \(\beta\) values, can be seen for AvSa. In other words, the efficiency of AvSa has increased considerably, yet AvSv is best correlated with MIDR for most of the disaggregated structures. For low-rise structures (3- and 4-storey buildings) the most efficient IM is AvSd; for high-rise structures (12- and 13-storey buildings) AvVE is the most efficient one.

The fact that an IM loses efficiency when IM-EDP pairs coming from building models with different numbers of stories are analysed simultaneously is a potential problem, since a higher number of structural calculations is then required to obtain reliable seismic risk estimations at urban level. In this article, an IM or an IM* that does not lose efficiency when results of buildings with different properties are aggregated is classified as steadfast. Another way to view steadfastness is in terms of sufficiency: a steadfast IM is sufficient with respect to the parameter analysed, in this case the number of stories.

To analyse the steadfastness of IMs based on average spectral values, buildings with different number of stories have been progressively aggregated. Figure 6 (Right) shows the evolution of \({R}^{2}\), \({S}_{y/x}\) and \(\beta\) values when aggregating buildings whose number of stories increases. In other words, an abscissa value corresponds to those buildings whose number of stories is equal to or less than it. An abscissa of 9, for instance, represents a subgroup of buildings having 9 stories or less. This way, \({R}^{2}\), \({S}_{y/x}\) or \(\beta\) corresponding to an abscissa value of 3 are the same in Fig. 6, left and right. Also, when the abscissa is 13 in Fig. 6 (Right), parameters related to the efficiency are the same as those shown in Table 3 for the corresponding IM.

In Fig. 6 (Right), it can also be seen that AvSa shows a certain steadfastness when buildings having 3 to 6 stories are grouped. This result is consistent with the approach followed by several methods for seismic risk analysis, which group buildings with a similar number of stories into the same structural type. For instance, in previous seismic risk studies (see, for instance, Lantada et al. 2009), RC Low-, Mid- and High-rise building types, having 1 to 3, 4 to 6 and more than 6 stories, respectively, have been used to perform seismic risk estimations. This sort of grouping allows the same fragility function to be used for every building belonging to the same typology when estimating, for instance, the exceedance probability of a certain damage level.

AvSv, AvSd and AvVE are variables less prone to lose efficiency when grouping structures having different number of stories. Therefore, these IMs are candidates for considerably reducing the number of structural calculations when estimating seismic risk, since the same fragility function may represent an important variety of buildings (even the entire set analysed in this research, if desired) no matter the number of stories. For instance, classical Low- Mid- and High-rise typologies could be grouped within a comprehensive structural typology based on AvSv, which is not only the most efficient but also one of the IMs showing stability; it means that this IM is steadfast.

A simplified way to measure steadfastness is to quantify how much the efficiency decreases when grouping a set of buildings with different properties. To do so, the proficiency measure, \(\beta\), has been selected in this article. Thus, for a set of buildings whose number of stories varies from I to S, the ratio between the average of the proficiency values for buildings grouped according to the number of stories, \({\overline{\beta }}_{N_{ST}}\), and the proficiency estimated from the entire set of buildings, \({\beta }_{I-S}\), is proposed as a measure of steadfastness (see Eq. 14).

$${\eta }_{I-S}=\frac{{\overline{\beta }}_{N_{ST}}}{{\beta }_{I-S}}$$
(14)

where \({\beta }_{N_{ST}}\) is a vector of S-I+1 elements containing \(\beta\) values calculated by grouping the structures according to the number of stories (see Fig. 6, left). Thus, \({\eta }_{3-13}\) provides an estimation of the amount of efficiency lost by an IM when grouping buildings from I = 3 to S = 13 stories, with respect to subgroups of structures sharing the same number of stories. In this way, if \({\eta }_{I-S}\) is close to one, the IM is steadfast for grouping buildings having from \(I\) to \(S\) stories.
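Equation 14 reduces to a few lines of code. In the sketch below, the function name and the \(\beta\) values are hypothetical and serve only to show the arithmetic.

```python
def steadfastness(beta_by_storeys, beta_all):
    """Eq. 14: mean of the per-storey beta values over the pooled beta."""
    betas = list(beta_by_storeys.values())
    return (sum(betas) / len(betas)) / beta_all


# Hypothetical proficiency values for 3- to 5-storey buildings.
beta_by_storeys = {3: 0.30, 4: 0.32, 5: 0.34}
beta_pooled = 0.40  # hypothetical beta for the three groups analysed together
eta = steadfastness(beta_by_storeys, beta_pooled)
print(round(eta, 3))
```

Here the mean per-storey proficiency is 0.32, so \(\eta\) is 0.8: pooling the three groups increases \(\beta\) by a quarter with respect to the per-storey average.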

Tables 3, 4, 5 and 6 (last columns) show the steadfastness estimation for all the basic IMs described in Sect. 4, and their vector-valued versions. It can be seen that \(Sa({T}_{1})\) and AvSa are the IMs showing the lowest steadfastness of the analysed group.

It is worth mentioning that the scaling criteria used in this article (see Sect. 3) allow similar demand levels to be reached for building models grouped according to the number of stories. That is, if the 5th, 50th and 95th percentiles of the MIDR distribution are calculated by grouping data based on the number of stories, similar values are observed (see Fig. 7).

Fig. 7

Percentiles 5, 50 and 95 for the MIDR distribution as a function of the number of stories

5.4 \({Sa}_{avg}\) versus AvSa

Several researchers have shown that it is more efficient to use an IM based on the geometric mean of spectral acceleration values around the fundamental period, \({Sa}_{avg}\), than \(Sa\left({T}_{1}\right)\) (Bianchini et al. 2009; Bojórquez and Iervolino 2011; Kazantzi and Vamvatsikos 2015; Eads et al. 2015; Adam et al. 2017). Note that \({Sa}_{avg}\) is analogous to AvSa, since both IMs average spectral acceleration ordinates over a period range around the fundamental period of the structure. The main difference between them is that \({Sa}_{avg}\) uses the geometric mean whilst AvSa uses the arithmetic one. In this section, both IMs are compared in terms of efficiency and steadfastness. Figure 8 (Left) shows \({R}^{2}\), \({S}_{y/x}\) and \(\beta\) values for AvSa and \({Sa}_{avg}\) when buildings are grouped according to the number of stories.

Fig. 8

Efficiency and steadfastness comparison between AvSa and \({Sa}_{avg}\)

In terms of \({R}^{2}\) and \({S}_{y/x}\), it can be seen that for low-rise buildings (from 3 to 5 stories), the MIDR tends to exhibit less variability when predicted by \({Sa}_{avg}\); for higher buildings, AvSa is the IM providing less variability. In terms of \(\beta\), only for 3-storey buildings \({Sa}_{avg}\) produces less dispersion than AvSa. Regarding steadfastness, Fig. 8 (Right) shows the evolution of the efficiency-related parameters when grouping buildings of different number of stories; all variables indicate that \({Sa}_{avg}\) loses more efficiency than AvSa.

The comparison between the use of arithmetic versus geometric mean has also been performed by considering AvSd, AvSv and AvVE; similar conclusions have been found. For these reasons, throughout this article, it has been preferred to analyse the evolution of average-based IMs using the arithmetic mean rather than the geometric one.
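The difference between the two averaging rules can be illustrated with a short sketch. The function names and the spectral ordinates are hypothetical; the period range and spacing over which the ordinates are sampled are not reproduced here.

```python
import math


def av_sa(sa_ordinates):
    """Arithmetic mean of the spectral ordinates (AvSa-type average)."""
    return sum(sa_ordinates) / len(sa_ordinates)


def sa_avg(sa_ordinates):
    """Geometric mean of the spectral ordinates (Sa_avg-type average)."""
    return math.exp(sum(math.log(s) for s in sa_ordinates) / len(sa_ordinates))


# Hypothetical spectral accelerations (in g) around the fundamental period.
ordinates = [0.2, 0.4, 0.8, 0.6, 0.3]
print(av_sa(ordinates), sa_avg(ordinates))
```

For positive ordinates the geometric mean never exceeds the arithmetic one, and it is less sensitive to a single large ordinate.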

5.5 Why are velocity-related IMs more efficient at predicting the MIDR?

Throughout this study, it has been shown that velocity- and energy-based IMs have the highest explanatory capacity in the above statistical relationships; that is, the maximum efficiency and steadfastness to predict MIDR. It seems reasonable to wonder why these IMs are better proxies than the rest. To address this question, let us consider two single-peak triangular acceleration inputs, \({\ddot{u}}_{g}\left(t\right)\), having the same PGA but different durations (Fig. 9).

Fig. 9

Two single peak triangular acceleration inputs; same PGA, different durations

While the maximum input force is the same, i.e. PGA times the mass m, the total input force is:

$${\int }_{{t}_{1}}^{{t}_{2}}m{\ddot{u}}_{g}dt={\int }_{0}^{PGV}md{\dot{u}}_{g}=mPGV$$
(15)

This integral equals the PGV that is obtained at the end of the interval, multiplied by the mass. Therefore, although the maximum force is the same for the two pulses, the total input force is larger for the one having the largest PGV, which corresponds to the dashed line in Fig. 9. In other words, while PGA is associated with the maximum force, PGV represents the total input force; in this case, over a half cycle. Since the deformation of a structure is more closely related to the total input force, it is to be expected that IMs based on velocity are better predictors of the MIDR than those based on acceleration. Further insight into this fact and into their mutual relationship can be gained by considering the input energy, \({E}_{I}\), which is rewritten herein for the sake of convenience:

$${E}_{I}=-m{\int }_{0}^{t}{\ddot{u}}_{g}\dot{u}dt$$
(16)

Integrating by parts Eq. 16 yields:

$${E}_{I}=-m\left[{\dot{u}}_{g}\dot{u}-{\int }_{0}^{t}{\dot{u}}_{g}\ddot{u}dt\right]$$
(17)

which can be put in the form:

$${E}_{I}=-m\left[{\dot{u}}_{g}\dot{u}-{\int }_{0}^{t}{\dot{u}}_{g}d\dot{u}\right]$$
(18)

since \(d\dot{u}=\ddot{u}dt\).

In Eq. 18, the first term is an oscillatory function around zero energy. The backbone curve of the energy is constituted by the integral term, which gives the increasing component of the energy input. Recall that \({\dot{u}}_{g}\) is related to the ground motion while \(\dot{u}\) is related to the structural response. Be that as it may, the meaning of this last expression of the input energy, with its two terms, is clear: the energy input can be expressed solely in terms of the ground and structural velocities. This fact may help to understand why velocity-based IMs are good descriptors of inelastic structural responses and of damage represented in terms of the deformation field of the structure.
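Returning to the two pulses of Fig. 9, the argument behind Eq. 15 can be checked numerically. The sketch below assumes symmetric triangular pulses and uses hypothetical values for the PGA, the durations and the mass; it integrates the acceleration with the trapezoidal rule to obtain the velocity reached at the end of each pulse.

```python
def triangular_pulse(pga, duration, steps=1000):
    """Samples (t, a) of a symmetric single-peak triangular pulse (assumed shape)."""
    dt = duration / steps
    times = [i * dt for i in range(steps + 1)]
    half = duration / 2.0
    accel = [pga * (1.0 - abs(t - half) / half) for t in times]
    return times, accel


def velocity_at_end(times, accel):
    """Trapezoidal integral of the acceleration over the whole pulse."""
    return sum(0.5 * (a0 + a1) * (t1 - t0)
               for t0, t1, a0, a1 in zip(times, times[1:], accel, accel[1:]))


mass = 1000.0  # kg, hypothetical
pga = 3.0      # m/s^2, identical for both pulses
for duration in (0.2, 0.4):  # s, hypothetical durations
    t, a = triangular_pulse(pga, duration)
    pgv = velocity_at_end(t, a)
    # Same maximum force m*PGA, different total input m*PGV (Eq. 15).
    print(duration, mass * max(a), mass * pgv)
```

For a symmetric triangle the integral is PGA times the duration divided by two, so doubling the duration doubles the PGV while the peak force is unchanged.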

According to the previous reasoning, one would expect EDPs related to the maximum acceleration exhibited by the structure to be better predicted by acceleration-based IMs than by velocity-based ones. For instance, MFA, which is an EDP highly associated with damage in non-structural components (Elenas and Meskouris 2001), is better described by IMs based on acceleration than on velocity (see Fig. 10). In this figure, it can be seen that the IM that best explains MFA is the PGA. It can also be seen that the quadratic regression model fits the cloud of points much better than the linear one. Tables 7, 8, 9 and 10 present statistical results regarding efficiency, sufficiency and steadfastness when the EDP is MFA.

Fig. 10

\({R}^{2}\) calculated with both linear and nonlinear regression models; MFA has been considered as EDP

Table 7 Summary of the statistical analysis considering MFA as the EDP
Table 8 Summary of the statistical analysis considering MFA as the EDP
Table 9 Summary of the statistical analysis considering MFA as the EDP
Table 10 Summary of the statistical analysis considering MFA as the EDP

By way of example, the steadfastness evolution of the most efficient IMs to predict MFA is presented in Fig. 11. The evolution of AvSa is also shown; again, it exhibits moderate efficiency in predicting the response of buildings sharing the same number of stories. However, this efficiency is lost when grouping even a limited variety of buildings with different numbers of stories. In other words, this IM is not steadfast with respect to MFA.

Fig. 11

Steadfastness analysis. Left: \({R}^{2}\), \({S}_{y/x}\) and \(\beta\) values for buildings grouped according to the number of stories. Right: \({R}^{2}\), \({S}_{y/x}\) and \(\beta\) values for buildings grouped according to the accumulated number of stories. The EDP analysed has been MFA

In terms of sufficiency, when the EDP is MFA, \(Sa({T}_{1})\) is the only IM exhibiting this statistical property with respect to M (none of the IMs has shown sufficiency with respect to ED). For this reason, the statistical properties of \(Sa({T}_{1})\) remain similar when it is integrated into more advanced vector-valued arrangements. Note that the lower the p value of the basic IM, the higher the efficiency gained with the advanced IM*s.

6 Linear versus quadratic regression analysis to derive fragility curves

From sets of IM-EDP points, it is possible to derive fragility functions according to the cloud analysis approach (Jalayer et al. 2015). This methodology requires calculating the best-fit curve for a set of IM-EDP realizations in the log–log space. The resultant curve is used to estimate the median of a parametric statistical distribution, given an IM value. The variability of this parametric distribution is estimated as the logarithmic standard deviation of the regression (\({S}_{y/x}\)). In this way, the probability of exceeding a certain damage threshold given an IM value can be calculated. These thresholds are particular realizations of the engineering demand parameter under consideration, \(EDP_{C}\). The variability of the fragility functions is directly related to the dispersion of the IM-EDP points: the higher this dispersion, the higher the uncertainty when defining whether the structure presents one specific damage state or another. That is why it is important to identify efficient IMs.
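The cloud-analysis steps described above can be sketched as follows, assuming that the parametric distribution is lognormal (the usual choice in cloud analysis, taken here as an assumption). The function name and the regression coefficients are hypothetical; only the dispersion of 0.334 and the threshold of 0.02 are values quoted in this article.

```python
import math
from statistics import NormalDist


def fragility(im, edp_c, alpha, s_yx):
    """P(EDP > edp_c | IM = im), assuming ln EDP ~ Normal(fitted curve, s_yx).

    alpha holds [a0, a1] for the linear fit or [a0, a1, a2] for the quadratic one.
    """
    log_im = math.log(im)
    # Fitted curve in the log-log space: a0 + a1*ln(IM) + a2*ln(IM)^2 + ...
    ln_median = sum(a * log_im ** j for j, a in enumerate(alpha))
    z = (math.log(edp_c) - ln_median) / s_yx
    return 1.0 - NormalDist().cdf(z)


# Hypothetical coefficients; the median EDP equals the threshold at IM = 1.
alpha_linear = [math.log(0.02), 1.0]
for im in (0.5, 1.0, 2.0):
    print(im, fragility(im, 0.02, alpha_linear, 0.334))
```

Passing a third coefficient switches to the quadratic fit, so the linear and nonlinear fragility curves can be compared point by point for the same IM values.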

In general, the aforementioned fitting is performed using linear regression analysis. However, as shown in Fig. 5, some IM-EDP clouds are slightly better represented by a quadratic regression analysis; in the case of MFA (see Fig. 10), this tendency is much clearer.

Figure 12 (Left) shows the linear and nonlinear fittings in the log–log space for the IM-EDP relationship between AvSv and MIDR. While no significant changes are observed with respect to \({S}_{y/x}\) (0.334 for the linear and 0.329 for the nonlinear fitting), the probabilities provided by the fragility functions are affected by the type of regression analysis. In order to assess how much this probability of exceedance may vary, Fig. 12 (Right) compares fragility functions for AvSv using linear and nonlinear curves as best fit. The damage threshold to be exceeded is \(MIDR_{C}\) = 0.02; this threshold can be related to a damage state identified as extensive. The absolute value of the difference between both curves is also shown in this figure (blue dashed line). This function shows that the difference in probability may reach values of the order of 0.22.

Fig. 12

Left: linear and nonlinear fittings considering AvSv as IM and MIDR as EDP. All the data have been included in the regression analyses. Right: Fragility curves considering both fittings; the blue dashed line indicates the absolute value of the difference in probabilities between both fragility functions

Note that using a quadratic function can cause problems when estimating probabilities of exceedance before the minimum value of the parabola (for parabolas that open upwards) or after its maximum value (for parabolas that open downwards). In Fig. 12, for instance, the quadratic function providing the best fit is a vertical parabola opening upwards. This parabola has its minimum at a point located in the log–log space at (-12.0779, -12.1363) which, in the linear space, corresponds to (5.6837e-06 m/s, 5.3613e-06). In practical terms, the minimum of the parabola fitted to the data (Fig. 12, left) will not produce significant problems, since there are no damage states associated with such small MIDR values.
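The location of the turning point follows directly from the coefficients of the quadratic fit in Eq. 9 with n = 2. The sketch below uses hypothetical function names and hypothetical coefficients; the last lines simply convert the log–log coordinates quoted above back to the linear space.

```python
import math


def parabola_vertex(alpha):
    """Vertex (ln IM, ln EDP) of ln EDP = a0 + a1*ln(IM) + a2*ln(IM)^2."""
    a0, a1, a2 = alpha
    x_v = -a1 / (2.0 * a2)
    y_v = a0 - a1 ** 2 / (4.0 * a2)
    return x_v, y_v


def to_linear_space(x_v, y_v):
    """Convert log-log coordinates to (IM, EDP) values."""
    return math.exp(x_v), math.exp(y_v)


# Hypothetical coefficients: a2 > 0, so the parabola opens upwards (minimum).
print(parabola_vertex((1.0, -4.0, 2.0)))

# Log-log coordinates of the minimum quoted for the AvSv-MIDR fit.
print(to_linear_space(-12.0779, -12.1363))
```

A fitted curve is only monotonic on one side of this point, so it is worth checking that the vertex lies outside the IM range used to evaluate the fragility function.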

If the IM-EDP relationship is analysed considering AvSa as IM, the curve resulting from the nonlinear regression analysis is a vertical parabola opening downwards (see Fig. 13, left). Its maximum in the log–log space is located at (9.9796, -1.8129) which, in the linear space, corresponds to (2.1582e+04 g, 0.1632). In practical terms, it is unlikely to have such extreme acceleration values, or this MIDR without reaching collapse. In Fig. 13 (Right), fragility functions for AvSa are derived using linear and nonlinear curves as best fit. The damage threshold to be exceeded is again \(MIDR_{C}\) = 0.02. The absolute value of the difference between both curves is also shown in this figure (blue dashed line). This function indicates that the difference in probability may reach values of the order of 0.23.

Fig. 13

Left: linear and nonlinear fittings considering AvSa as IM and MIDR as EDP. All the data have been considered in the regression analyses. Right: Fragility curves considering both fittings; the blue dashed line indicates the absolute value of the difference in probabilities between both fragility functions

Fragility curves considering AvSa clearly reflect the high dispersion of this IM-EDP relationship when all the data are analysed simultaneously. This makes the identification of damage states more uncertain. However, as commented above, if subcategories are created by grouping buildings with a similar number of stories, the dispersion between AvSa-MIDR pairs is reduced and, consequently, the quality of the fragility functions based on AvSa improves. For instance, Fig. 14 (Left) shows the linear and nonlinear regression fittings when considering data associated with RC mid-rise structures, i.e. building models ranging from 4 to 6 stories. A substantial reduction of the dispersion can be observed, which translates into an improvement of the quality of the fragility function (Fig. 14, right); for the nonlinear fitting, \({S}_{y/x}\) has decreased from 0.585, when all data are considered, to 0.415 for this subcategory.

Fig. 14

Left: linear and nonlinear fittings considering AvSa as IM and MIDR as EDP. Only mid-rise structures have been considered in the regression analyses. Right: Fragility curves considering both fittings; the blue dashed line indicates the absolute value of the difference in probabilities between both fragility functions

In Fig. 14, it is also important to observe that the absolute value of the difference between the fragility functions (Blue dashed line) has reduced. This line shows that the difference in probability may reach values in the order of 0.15. Also, the resulting parabola now opens upwards.

In spite of the reduction of \({S}_{y/x}\) observed in Fig. 14 (Right), if fragility functions are derived for the same typology (mid-rise buildings) considering AvSv as IM, the \({S}_{y/x}\) values are 0.31 and 0.303 for the linear and nonlinear fittings, respectively. This indicates, again, the superior efficiency of IMs based on spectral velocity quantities to predict EDPs related to the deformation field of the structure.

7 Discussion

Variables related to the acceleration response of SDoF systems are commonly used in seismic design or risk assessment of civil structures. For instance, when applying the response spectrum method, which is one of the most widely used tools for seismic design, the acceleration design spectrum is commonly used to estimate seismic-induced forces. This approach is practical since most structural analysis techniques are applications based on the Newtonian-force concept. That is, the effects of equivalent static forces of various origins, static or even dynamic, are linearly combined to verify a number of stress conditions. In the case of seismic risk assessment, IMs based on the spectral acceleration have been commonly used to quantify the variability in the structural response. However, as demonstrated in this article, the seismic response and damage of structural elements can be better related to the energy entering into them than to the forces exerted upon them (this is true when the damage is quantified through EDPs associated with the deformation field of the structure). It is important to note that this energy can be regarded as a function of the velocity of the system, as shown in Eq. 18. Therefore, the results presented in this article indicate that earthquake design projects, as well as seismic risk assessments, would be more reliable if they were tied to IMs based on spectral displacement, velocity or equivalent velocity rather than acceleration. For instance, when analysing buildings according to the number of storeys, AvSv has been shown to diminish \({S}_{y/x}\) for RC structures ranging from 5 to 11 storeys; AvSd not only minimizes \({S}_{y/x}\) for rigid structures (3- and 4-storey buildings) but is also sufficient with respect to M and ED; the most efficient IM for high-rise structures (12- and 13-storey buildings) is AvVE. For design purposes, the outcomes of the present statistical analysis should allow a better quantification of the safety factors when acting loads are combined.

As discussed throughout this article, an ‘ideal’ IM should exhibit efficiency, sufficiency and steadfastness. The sufficiency property is the most difficult to fulfil for the 18 basic IMs presented herein. However, multiple regression models make it possible to overcome this issue. Thus, considering linear regression analysis for the sake of simplicity, the following IM* meets all the statistical properties described in this article for an ‘ideal’ IM, when predicting MIDR:

$$\mathrm{AvSv}\cdot {M}^{0.48}$$
(19)

and for predicting MFA:

$$\mathrm{PGA}\cdot {M}^{0.29}$$
(20)

These IM*s have been derived by considering coefficients given by Eq. 10. In both arrangements, the exponential of the independent term \((e^{{\alpha }_{0}})\) has been set to 1; also, \({\alpha }_{1}\) and \({\alpha }_{2}\) have been normalized by \({\alpha }_{1}\).
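The normalization described above can be illustrated with a minimal sketch. It assumes that Eq. 10 has the log-linear form \(\ln(\mathrm{EDP})={\alpha }_{0}+{\alpha }_{1}\ln(\mathrm{IM})+{\alpha }_{2}\ln(M)\), which is an assumption made here for illustration. Under that form, setting \(e^{{\alpha }_{0}}=1\) and dividing the exponents by \({\alpha }_{1}\) gives \(\mathrm{IM}\cdot {M}^{{\alpha }_{2}/{\alpha }_{1}}\). The function name `combined_im_exponent` and all data below are hypothetical and synthetic; they are not the results of this study.

```python
import numpy as np

def combined_im_exponent(im, m, edp):
    """Fit ln(EDP) = a0 + a1*ln(IM) + a2*ln(M) by ordinary least squares.

    Returns (a0, a1, a2, a2/a1). The ratio a2/a1 is the exponent of M
    once both exponents are normalized by a1 (assumed form of Eq. 10).
    """
    design = np.column_stack([np.ones_like(im), np.log(im), np.log(m)])
    coeffs, *_ = np.linalg.lstsq(design, np.log(edp), rcond=None)
    a0, a1, a2 = coeffs
    return a0, a1, a2, a2 / a1

# Synthetic cloud (NOT data from the article): the true exponent ratio is 0.48.
rng = np.random.default_rng(0)
n = 500
im = rng.lognormal(mean=-1.0, sigma=0.6, size=n)   # hypothetical IM values
m = rng.uniform(5.0, 7.5, size=n)                  # hypothetical magnitudes
ln_edp = -4.0 + 0.9 * np.log(im) + 0.9 * 0.48 * np.log(m) + rng.normal(0.0, 0.01, n)
edp = np.exp(ln_edp)

a0, a1, a2, exponent = combined_im_exponent(im, m, edp)
im_star = im * m**exponent   # combined measure, with exp(a0) set to 1
```

With real IM, M and EDP arrays in place of the synthetic ones, `exponent` would play the role of the powers of M appearing in Eqs. 19 and 20, provided the assumed regression form matches Eq. 10.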

In the context of earthquake engineering, probabilistic seismic hazard analysis (PSHA, Cornell 1968) has long been, and remains, the reference method for estimating the seismic hazard at a site; see for instance Baker (2008). This method has been widely used for almost 50 years in many applications of earthquake engineering (Mulargia et al. 2017). However, a number of researchers have questioned the validity of PSHA to properly quantify the expected seismic actions in urban environments, since many damaging earthquakes have occurred in regions rated as relatively low-risk areas (Stein et al. 2012; Mulargia et al. 2017). In this sense, using an IM highly correlated to the structural response, and revisiting some hypotheses of PSHA with respect to the spatiotemporal distribution of earthquakes, could help improve the accuracy of seismic risk estimations. Note that some IMs based on average spectral values presented in this study can be easily incorporated within the conceptual framework of PSHA. To do so, it would be necessary to develop ground motion predictive models to estimate the mean and dispersion values of these IMs. It is quite likely that, because of the steadfastness shown by these IMs, the ground motion predictive models would be less dependent on the fundamental period of the system. Recently, Kohrangi et al. (2017) and Davalos and Miranda (2018) have presented ground motion predictive models for \({Sa}_{avg}\).

8 Conclusions

Advanced methodologies to quantify the probability of occurrence of a certain seismic damage state in buildings are based on the analysis of IM-EDP relationships. These relationships should be obtained using NLDA, since this is the most comprehensive numerical tool to estimate the dynamic behavior of a structure. It is also important to consider the main sources of uncertainty involved in seismic analysis. Therefore, a relevant objective for seismic engineering is to find which IM is the most capable of predicting a specific EDP. Identifying this IM will contribute to reducing uncertainties when quantifying the seismic risk of the building stock in both rural and urban environments. This reduction is one of the key aspects of mitigating seismic risk worldwide, since stakeholders as well as policy- and decision-makers could prioritize actions to reduce this risk based on more reliable estimations. The primary goal of this article has been to identify the most efficient IMs to predict MIDR and MFA. Other desirable properties expected in an ‘ideal’ IM, like sufficiency and steadfastness, have also been quantified. A set of 18 basic IMs was considered in the analysis; some of them are original proposals of this research.

The building stock in rural and urban environments is composed of very different structural typologies (Martins and Silva 2020). Materials, configuration, asymmetry level, and rise are physical properties generally considered to define these typologies, because the bivariate distribution of IM-EDP pairs is expected to change when they vary. This demands the definition of a wide variety of structural types in order to cover all the buildings observed in contemporary human settlements. However, the number of types can be reduced by employing IMs that are insensitive to variations in some of the aforementioned physical properties. This insensitivity can be measured by means of a statistical property identified herein as steadfastness. An IM that exhibits steadfastness allows several building types to be grouped without significantly increasing the dispersion of the data. Hence, for instance, the same fragility function can be used to quantify damage to a wide variety of structures, which reduces the number of calculations needed when estimating seismic risk. In other words, these IMs allow a large variety of buildings to be grouped within an enhanced structural type, since the dispersion of the IM-EDP pairs remains stable. For instance, similar \({R}^{2}\), \({S}_{y/x}\) and \(\beta\) values have been observed whether or not results are grouped by the number of storeys. This has been detected after quantifying the steadfastness of several IMs to predict MIDR and MFA under variations in the number of storeys. Further research should verify whether these IMs also exhibit steadfastness for other structural types such as steel, masonry or wood buildings. The seismic response of other types of infrastructure can also be analysed with these enhanced IMs (Pinyol et al. 2021).
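The grouping argument above can be checked numerically with a short sketch. It assumes that \({S}_{y/x}\) denotes the standard error of a straight-line least-squares fit of \(\ln(\mathrm{EDP})\) on \(\ln(\mathrm{IM})\), and it uses the ratio of pooled to mean per-group dispersion as an illustrative indicator; this ratio is a hypothetical choice, not the steadfastness measure defined in this article. All names and data are synthetic.

```python
import numpy as np

def regression_dispersion(im, edp):
    """Return (S_yx, R2) of a straight-line fit of ln(EDP) on ln(IM)."""
    x, y = np.log(im), np.log(edp)
    slope, intercept = np.polyfit(x, y, 1)
    resid = y - (slope * x + intercept)
    s_yx = np.sqrt(np.sum(resid**2) / (len(x) - 2))
    r2 = 1.0 - np.sum(resid**2) / np.sum((y - y.mean())**2)
    return s_yx, r2

def pooled_to_group_ratio(groups):
    """groups maps a label (e.g. number of storeys) to (im, edp) arrays.

    A ratio close to 1 means that pooling the groups barely increases the
    dispersion; a ratio well above 1 means that grouping matters.
    """
    group_s = [regression_dispersion(im, edp)[0] for im, edp in groups.values()]
    im_all = np.concatenate([im for im, _ in groups.values()])
    edp_all = np.concatenate([edp for _, edp in groups.values()])
    pooled_s, _ = regression_dispersion(im_all, edp_all)
    return pooled_s / np.mean(group_s)

def synthetic_groups(rng, offsets, n=300, sigma=0.3):
    """Build synthetic IM-EDP clouds; offsets shift ln(EDP) per group."""
    groups = {}
    for storeys, offset in offsets.items():
        im = rng.lognormal(mean=-1.0, sigma=0.6, size=n)
        ln_edp = -4.0 + offset + 1.0 * np.log(im) + rng.normal(0.0, sigma, n)
        groups[storeys] = (im, np.exp(ln_edp))
    return groups

rng = np.random.default_rng(1)
# Same IM-EDP relation for every group: pooling should not add dispersion.
stable = synthetic_groups(rng, {3: 0.0, 8: 0.0, 13: 0.0})
# Group-dependent relation: pooling inflates the dispersion.
shifting = synthetic_groups(rng, {3: -0.5, 8: 0.0, 13: 0.5})

ratio_stable = pooled_to_group_ratio(stable)
ratio_shifting = pooled_to_group_ratio(shifting)
```

In this synthetic setting, `ratio_stable` stays near 1 while `ratio_shifting` is clearly larger, which mirrors the qualitative behavior expected from steadfast and non-steadfast IMs respectively.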

One of the most comprehensive approaches that currently exists for the derivation of fragility functions is cloud analysis (Jalayer et al. 2015). This procedure is based on the statistical analysis of a cloud of IM-EDP points. A key aspect of this procedure is the type of regression analysis employed to represent the mean curve of the cloud. This article has compared the use of linear and nonlinear regression models; differences of the order of 0.2 in probability have been observed. In the author’s view, fragility curves obtained by fitting a nonlinear model will more adequately represent the structural performance when the IM varies. However, attention should be paid to the maximum or minimum values of the fitted parabola to avoid calculating unrealistic probabilities.
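A minimal sketch of a cloud-based fragility estimate with a linear and a quadratic mean curve is given next. It assumes the usual lognormal formulation, in which the probability of exceeding a threshold is \(\Phi \left((\mu (\ln \mathrm{IM})-\ln {\mathrm{EDP}}_{c})/\beta \right)\), with \(\mu\) the fitted mean in log space and \(\beta\) the residual standard deviation. The function names, the threshold of 0.01 and the synthetic cloud are hypothetical and do not reproduce the models or results of this article.

```python
import numpy as np
from math import erf, sqrt

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def cloud_fragility(im, edp, edp_threshold, degree=1):
    """Fit a polynomial of the given degree to the (ln IM, ln EDP) cloud.

    Returns a function im_value -> P(EDP > edp_threshold | IM = im_value),
    the polynomial coefficients (highest power first) and beta.
    """
    x, y = np.log(im), np.log(edp)
    coeffs = np.polyfit(x, y, degree)
    resid = y - np.polyval(coeffs, x)
    beta = np.sqrt(np.sum(resid**2) / (len(x) - (degree + 1)))

    def prob_exceed(im_value):
        mu = np.polyval(coeffs, np.log(im_value))
        return normal_cdf((mu - np.log(edp_threshold)) / beta)

    return prob_exceed, coeffs, beta

# Synthetic cloud (NOT data from the article), mildly curved in log space.
rng = np.random.default_rng(2)
im = rng.lognormal(mean=-1.0, sigma=0.7, size=400)
x = np.log(im)
edp = np.exp(-4.5 + 1.2 * x + 0.15 * x**2 + rng.normal(0.0, 0.3, 400))

frag_lin, _, beta_lin = cloud_fragility(im, edp, edp_threshold=0.01, degree=1)
frag_quad, c_quad, beta_quad = cloud_fragility(im, edp, edp_threshold=0.01, degree=2)

# Vertex of the fitted parabola, expressed as an IM value. If it lies inside
# the range of interest, the mean curve is not monotonic there and the
# resulting probabilities may be unrealistic.
vertex_im = np.exp(-c_quad[1] / (2.0 * c_quad[0]))
vertex_inside_data = im.min() <= vertex_im <= im.max()
```

Evaluating `frag_lin` and `frag_quad` over the same grid of IM values shows how the choice of regression model changes the fragility curve, and `vertex_inside_data` flags the situation that the caution about the fitted parabola refers to.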

Regarding IMs obtained from direct computations of the ground motion record, it has also been shown that those related to velocity are the most efficient to predict the MIDR; in the case of the MFA, IMs related to acceleration play this role. Note that these IMs do not depend on the dynamic properties of the analysed structures. This implies that buildings with very different structural properties can be analysed using the same fragility function. In other words, these IMs are steadfast according to the definition presented in this research. However, they generally produce more dispersion than those considering structural properties.

It has been observed that the effectiveness of insufficient IMs may increase when they are combined in a multi-regression model with seismological parameters like M and ED. Based on this concept, it would be of interest to analyse whether new improved IM*s can be developed using other IVs (for instance the ones presented in Table 1 or those described in Sect. 2.2). In this respect, polynomial regression models allow much better fits to some EDPs (e.g. MFA) than linear regression ones. This added sophistication in the type of regression model may help increase the accuracy of EDP predictions.

EDPs are important variables to consider when assessing seismic risk. They should describe the expected damage of a structure as well as possible. MIDR and MFA have been considered as EDPs in this research. They are related to the damage state of a single storey (the one exhibiting the highest demand), but they may fail to describe the global damage state of the structure. Other EDPs, like the damage index of Park and Ang (1985), which considers the response of all the structural elements, should be explored. In doing so, the quantification of the damage state of a structure affected by an earthquake is expected to be more reliable.

Results presented in this article meet the FAIR data principles (Wilkinson et al. 2016). Thus, potential users can find, access, interoperate with and re-use the data presented herein. A text file containing the main results, as well as a document describing the data arrangement, can be downloaded from the following website: http://kairoseq.upc.edu. A MATLAB code has also been included, which can be used to reproduce some results presented in this article. This code can be easily operated or adapted, allowing potential users to develop further ad-hoc applications.